Take, for comparison purposes, the process of building an airplane. While an airplane may have hundreds of thousands of parts, each airplane coming off the assembly line is approximately the same as the previous one. The parts are predesigned and manufactured to fit together. The architecture of the plane is well known (two wings, a fuselage, landing gear…), and at design time, the engineering needed to choose sizes and shapes and materials is all known. All airplanes fly through air, and the properties of air on Earth are well documented. So as hard as it is to build airplanes, it is a relatively repeatable task that has predictable results.
This is not so for EDWs. Each EDW is different from all others. Each has a completely different set of inputs and outputs, and different methods for putting all the data together into an understandable structure. In an attempt to create conformity and reuse, customers have moved toward using “industry-standard models”. While this sounds like a good idea, it is akin to saying that airplanes, helicopters, and rocket ships can all be manufactured from a single master blueprint. Yes, they’re all conceptually the same, but it’s the details that kill.
And kill they do. In the EDW world, the details start with requirements gathering: The industry-standard model is morphed into a physical model. The physical model is turned into specifications. The specifications get turned into spreadsheets of rules and mappings. The spreadsheets get sent offshore where they are turned into code. The code is sent back where, often for the first time, it is run against actual data. Surprise, surprise: the code doesn’t match the data, and the team gets to start over. And over, and over again. This process is like starting the manufacture of an airplane before knowing whether it is going to fly through air or water. Even with the best architects and engineers, errors are going to be common and costly.
To compound the problem, because there are so many streams of data coming into and going out of the EDW, the EDW is deconstructed into a large number of “feeds,” and an army of people is employed to work separately on each one. To manage the large number of people, each feed is further deconstructed into distinct steps, and a different person is typically assigned to each step. The result is an enormous number of handoffs from one person to the next, and very little reuse, since each feed is developed separately. All of this multiplies the time needed to build an EDW. And time equals money. Using the standard approach with standard technologies, EDWs are very expensive to build.
But there is a better way.
Ab Initio has attacked this problem by applying “first principles” thinking, and the results are startling. Instead of a feed taking 3 to 4 months to implement, the Ab Initio approach takes 2 to 3 weeks, and sometimes just days! Ab Initio calls this approach building a metadata-driven warehouse (MDW).
An MDW is a collection of reusable Ab Initio-based applications that can process data from the source file all the way to the consolidated EDW model. This includes all the feed-processing complexity that is usually built by hand: file checking, data enrichment, filtering, validation, cleansing, key management, history management, aggregation, mapping, archiving, recovery, and model loading. Each of these EDW activities has been abstracted to a level where the same template application can be used on a wide variety of feeds simply by changing the associated metadata.
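To make the template idea concrete, here is a minimal sketch in plain Python. Nothing in it comes from Ab Initio’s actual products: the function name, the metadata schema, and the sample feed are all invented for illustration. The point is only that a single generic application, parameterized by metadata, can serve many different feeds.

```python
# Hypothetical sketch of a metadata-driven feed template.
# The metadata declares the rules; the template supplies the machinery.

def run_feed(records, metadata):
    """Validate and map records according to a feed's metadata."""
    out = []
    for rec in records:
        # Validation: drop any record that fails a declared rule.
        if not all(rule(rec) for rule in metadata["validations"]):
            continue
        # Mapping: build the target record from declared target-field rules.
        out.append({target: fn(rec) for target, fn in metadata["mappings"].items()})
    return out

# Metadata for one (invented) feed: the analyst declares rules;
# no per-feed code is written.
customer_feed = {
    "validations": [lambda r: r.get("id") is not None,
                    lambda r: r.get("balance", 0) >= 0],
    "mappings": {"customer_key": lambda r: r["id"],
                 "balance_usd": lambda r: round(r["balance"], 2)},
}

records = [{"id": 7, "balance": 10.456}, {"id": None, "balance": 5}]
print(run_feed(records, customer_feed))
# → [{'customer_key': 7, 'balance_usd': 10.46}]
```

Pointing the same template at a second feed would require only a second metadata dictionary, with no new code written. That, in miniature, is the idea behind driving a warehouse from metadata.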
By design, the MDW enables an analyst to specify and test the processing for a feed at the beginning of the cycle – usually without any development involvement at all. This approach is based on using metadata to drive the system. This is the same metadata that the analyst would normally have to specify in a document, such as the target data model, file formats, keys, mapping rules, and so on. Instead of writing a document and passing it along to a development team, the analyst, using the MDW, can enter the metadata directly, and then run and test the resulting application immediately on real data, avoiding the long and costly development and test iterations. The savings in time are enormous, and the quality of the resulting system is much higher since it passes through many fewer hands.
The Ab Initio technology was designed, from the beginning, to support this approach. From the user’s perspective, the savings begin with the construction of the MDW and the benefits are ongoing. For example, all the entered metadata is held in Ab Initio’s metadata repository, called the Enterprise Meta>Environment (EME), providing the necessary management capabilities, such as version control, data lineage, impact analysis, data quality, and access security. The EME gives management an integrated view into their systems that they would never have otherwise. They can ask questions about how fields in reports were calculated, tracing all the way back to operational systems. They can quickly answer questions about how data propagates around the system – answers that are necessary for auditors and for predicting maintenance efforts. And they can get a handle on the quality of their data, and how that quality impacts downstream systems. In short, management can finally understand the systems they have paid to build and that were previously black boxes.
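The lineage and impact-analysis questions above can themselves be sketched in a few lines. This is not the EME’s model or API, which are far richer; it is only a hypothetical illustration of the kind of question that becomes answerable once field-to-field mappings are captured as metadata rather than buried in code.

```python
# Hypothetical sketch of lineage and impact analysis over recorded mappings.
# Each edge says: the second field is computed from the first (invented names).
mappings = [
    ("ops.accounts.bal", "edw.account.balance"),
    ("edw.account.balance", "report.exposure.total"),
    ("ops.trades.qty", "edw.position.quantity"),
]

def lineage(field, edges):
    """Trace a field back to the operational sources it derives from."""
    sources = [src for src, dst in edges if dst == field]
    if not sources:
        return {field}          # no upstream edge: an operational source
    result = set()
    for src in sources:
        result |= lineage(src, edges)
    return result

def impact(field, edges):
    """List every downstream field affected if this field changes."""
    downstream = set()
    for src, dst in edges:
        if src == field:
            downstream |= {dst} | impact(dst, edges)
    return downstream

print(lineage("report.exposure.total", mappings))   # → {'ops.accounts.bal'}
print(impact("ops.accounts.bal", mappings))
```

With the mappings held as data, both the auditor’s question (“where did this report figure come from?”) and the maintenance question (“what breaks if this source field changes?”) reduce to graph traversals.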
Ab Initio has helped many customers build MDWs. And while each Ab Initio-built MDW is able to meet the specific and differing needs of those customers, the results are repeatable and predictable.