“Data archiving is the process of moving data that is no longer actively used to a separate data storage device for long-term retention. Data archives consist of older data that is still important and necessary for future reference. Data archives are indexed and have search capabilities so that data can be easily located and retrieved.” –Whatis.com
Having been around for more than 25 years, data archiving is definitely not one of those “tech buzzwords”. It doesn’t compete for mindshare like social networking and iPads. And it doesn’t make for stimulating dinner party conversation. But in the last few years data archiving has become an increasingly important topic for IT departments. So why is it interesting now?
There are two reasons. First, the need for data archiving has increased significantly as a result of a step change in regulation and the desire of businesses to retain more data and keep it accessible for much longer. Second, the variety and volume of data make everything more challenging from a technical perspective: setting up the processes to capture the data from different files (for example, XML) and databases, physically storing the data, and then providing fast, ad-hoc access to it. This may not sound like much, but a business that cannot meet these needs can be shut down.
A large broadcasting company faced just these types of challenges. In their case, the amount of data being added each month to their core business systems, such as CRM (customer relationship management) and billing, was causing operational issues: billing runs were taking too long, and customer service representatives were experiencing poor response times while on the telephone with customers. In short, data archiving issues were directly affecting the company from both a service perspective and a financial perspective. Resolving them was high on the business agenda.
Naturally, the first potential solution the company explored was to add hardware to the existing operational systems and thereby remove the need to archive any data at all. However, given the amount of data being added to the databases each month, this would have meant a continual series of expensive upgrades with limited benefits. The problem lay in the internal architecture of the databases holding the data: they simply could not scale linearly. After much testing, the company concluded that the only practical and cost-effective answer was to reduce the size of the large database tables. Although required by law to store a full 7 years of data, the broadcasting company needed to keep only 18 months’ worth in the operational systems. The rest could be held in an archive. The vast majority of the data (billions of records) was concentrated in a few large “fact” tables, such as billing or contact events. The analysis showed that archiving this data and removing it from the original operational systems would deliver significant performance and response-time benefits.
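The retention rule described above (18 months in the operational systems, 7 years in total) can be sketched as a simple date check. This is only an illustration in Python, not the company's actual logic: the function names, the tier labels, and the choice to call anything older than 7 years "expired" are assumptions made for the example.

```python
from datetime import date

OPERATIONAL_MONTHS = 18  # from the article: kept in the operational systems
RETENTION_YEARS = 7      # from the article: legally required retention


def months_before(day: date, months: int) -> date:
    """Return the date `months` calendar months before `day`.

    Simplification: the day of month is clamped to 28 so the result is
    always a valid date.
    """
    total = day.year * 12 + (day.month - 1) - months
    year, month_index = divmod(total, 12)
    return date(year, month_index + 1, min(day.day, 28))


def storage_tier(event_date: date, today: date) -> str:
    """Classify a record as 'operational', 'archive', or 'expired'.

    The 'expired' label is an assumption: the article does not say what
    happens to data older than the 7-year retention period.
    """
    if event_date >= months_before(today, OPERATIONAL_MONTHS):
        return "operational"
    if event_date >= months_before(today, RETENTION_YEARS * 12):
        return "archive"
    return "expired"
```

For example, with `today = date(2024, 6, 15)`, a billing event dated 1 January 2023 stays in the operational tier, while one dated 1 January 2020 belongs in the archive.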
There was a downside to this solution, however. Many thousands of users within the company still needed access to all of the data in real time, and many of them were customer service representatives who demanded sub-second response times. This was made more challenging by the fact that another set of users wanted to be able to combine data from the archive with other data held in both operational and business intelligence systems: a sophisticated “federated query”.
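To make the idea of a federated query concrete, here is a minimal sketch in which a single SQL statement reads from two separate databases. It uses Python's built-in `sqlite3` module purely as a stand-in; the table name `billing_events`, its columns, the sample rows, and the helper `billing_history` are all hypothetical, and none of this represents the company's systems or any vendor's interface.

```python
import sqlite3

# Two in-memory SQLite databases stand in for the two systems.
conn = sqlite3.connect(":memory:")  # "main" plays the operational database
conn.execute("ATTACH DATABASE ':memory:' AS archive")  # separate archive database

for schema in ("main", "archive"):
    conn.execute(
        f"CREATE TABLE {schema}.billing_events "
        "(customer_id INTEGER, billed_on TEXT, amount REAL)"
    )

# Hypothetical sample rows: recent data is operational, older data is archived.
conn.executemany(
    "INSERT INTO main.billing_events VALUES (?, ?, ?)",
    [(1, "2024-05-01", 30.0), (2, "2024-05-01", 45.0)],
)
conn.executemany(
    "INSERT INTO archive.billing_events VALUES (?, ?, ?)",
    [(1, "2019-05-01", 25.0), (1, "2020-05-01", 27.5)],
)


def billing_history(customer_id: int) -> list:
    """Return one customer's billing rows from both databases, oldest first."""
    query = """
        SELECT billed_on, amount, 'operational' AS source
        FROM main.billing_events WHERE customer_id = ?
        UNION ALL
        SELECT billed_on, amount, 'archive' AS source
        FROM archive.billing_events WHERE customer_id = ?
        ORDER BY billed_on
    """
    return conn.execute(query, (customer_id, customer_id)).fetchall()
```

Calling `billing_history(1)` returns the two archived rows followed by the operational one, so the caller never needs to know where each record physically lives.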
The combination of these technical challenges meant that there simply wasn’t an off-the-shelf package available. The only company that could solve all aspects of their archiving requirements was Ab Initio.
Using the Ab Initio Co>Operating System, the company’s developers built parallel applications called “graphs” to unload the billions of records to be archived from their operational databases. The extracted data was written in parallel to an Ab Initio Indexed Compressed Flat File (ICFF) system. ICFF systems not only load data many times faster than traditional databases, but also require only a small fraction of the disk space to hold it, which is ideal for an archive. To provide access to the archived data, Ab Initio supplied both a web service interface and an ANSI-standard SQL interface for ad-hoc queries. The SQL interface also supports federated queries across the archive and the operational databases. In summary: an end-to-end solution.
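The article does not describe how ICFF works internally, so the following is not Ab Initio's format. It is only a toy Python sketch of the general idea the name suggests: records are stored in compressed blocks, and a small index records which blocks contain each key, so a lookup decompresses only the blocks it needs. The function names, the JSON encoding, and the block size are all assumptions chosen for illustration.

```python
import json
import zlib

BLOCK_SIZE = 2  # records per compressed block; tiny on purpose for the example


def build_archive(records, key):
    """Pack records into compressed blocks and index each key value to its blocks."""
    blocks = []
    index = {}
    for start in range(0, len(records), BLOCK_SIZE):
        chunk = records[start:start + BLOCK_SIZE]
        block_no = len(blocks)
        blocks.append(zlib.compress(json.dumps(chunk).encode("utf-8")))
        for record in chunk:
            index.setdefault(record[key], set()).add(block_no)
    return blocks, index


def lookup(blocks, index, key, value):
    """Decompress only the blocks that the index says contain `value`."""
    hits = []
    for block_no in sorted(index.get(value, ())):
        chunk = json.loads(zlib.decompress(blocks[block_no]))
        hits.extend(record for record in chunk if record[key] == value)
    return hits
```

In this sketch the compression keeps the stored data small, while the index avoids scanning every block on each query; those are the two properties the article attributes to the archive.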
The project has been extremely successful for the broadcasting company. They have significantly improved the performance and responsiveness of their operational systems, and at the same time have sub-second access to terabytes of archived data. Given that the majority of their core data has now been moved from Tier-1 storage to compressed data held on inexpensive disk, it has also delivered real cost savings for the business.
Perhaps there is some sizzle in the old topic of data archiving, after all!