Enterprise Meta>Environment
The Ab Initio Metadata System

IT infrastructure is the central nervous system of modern businesses, and management needs to know everything about it. What information flows through it, what does that information represent, how accurate is it, how does it flow from one place to another, how is it processed, and where is it stored? This is what metadata is about: it is “information about information.”

But getting that metadata is not so simple. While there are products that claim to address this need, they have taken a very academic approach. Indeed, the concept of “information about information” raises the question of what “information” is in the first place. So these metadata products have focused on defining concepts and how the concepts relate to each other. While these concepts eventually connect with actual information, the connections are tenuous. This metadata must be manually entered by humans and is therefore subjective, incomplete, subject to human error, and inevitably obsolete, since it trails the real systems, which are changing constantly.

Ab Initio has taken a very different approach by focusing on operational metadata. Operational metadata is actionable by business and IT management. It is about the systems that process data, the applications in those systems, and the rules in those applications. It is about the datasets throughout the enterprise, what is in them, how they got there, and who uses them. It is about the quality of the data, and how that quality has changed over time. It is about all the many things in all the IT systems.

Ab Initio also ties this operational metadata together with business metadata – business definitions, created by business people, of the various pieces of information throughout the enterprise. The result is a true enterprise metadata management system – the Ab Initio Enterprise Meta>Environment, or EME.

An enterprise metadata management system must be many things to many people:

  • The CFO needs to be able to tell regulators what a field in a report means and what the source of the data in it is.
  • The CIO wants to know about the IT systems – hardware and software – in the company. Who owns this system? What systems does it depend on? What systems depend on it? What is the level of data quality across these systems, and how does it change from one system to the next?
  • The business analyst who is helping a division head manage her business needs a business glossary that will help her find the pieces of data she needs to pull together for an analysis by 5 PM today.
  • The operations staff wants to know what happened in production, today and in the past. What jobs ran successfully? How long did they take? How much data was processed? How much spare capacity is available? How good is the quality of the data?
  • The systems architect is concerned with the inventory of applications, data tables, files, and messages that make up the company’s systems. How do they all connect? What produces what? What reads what? What depends on what?
  • Application developers want to know the history of changes to their code. What does the data look like now? Who fixed what? When? How? Why? What got released? What work is still in progress?

There is no end to these kinds of questions. Getting useful answers quickly is essential. These are questions that can be answered with the Ab Initio Enterprise Meta>Environment.

Different metadata in different contexts

The term “metadata” has different meanings across industries. Ab Initio uses the term “metadata” in the context of the business computing world. In the image processing world, for example, it means something altogether different: information such as when an image was captured, what kind of device took the picture, what the lighting was, and so on. Web pages have metadata too, such as the language the page was written in, the tools used to create it, and where to find more information on its topic.

Navigating and understanding metadata

Ab Initio’s metadata graphical user interface, the EME Metadata Portal, allows one to start at any point in the system and explore in any direction one chooses. All this metadata is presented at the appropriate level of detail for each audience. Business users are not overwhelmed with technical minutiae when they are trying to answer business questions, while developers and operational staff can easily find the details of interest to them.

Consider a file that the EME has identified as the ultimate source for a calculation used in a report. What can the EME tell you, a user, about this file? Through Ab Initio’s approach of relating elements of metadata, one to the other, you can glean interesting and important information about the file from the intuitive graphical interface, including:

  • Which applications use the file
  • Its record format
  • Its data quality
  • Its size over time
  • The documented, expected values for each of its fields
  • The actual census of values found
  • The stewards (and their managers) responsible for its governance
  • Documentation about its business meaning and the use of each of its fields
  • Its relationship to logical models and similar datasets, including database tables and messages
  • A list of programs that read or write the dataset

Below is a screen shot of the EME in the process of navigating metadata. The underlying screen is a lineage diagram that displays a number of datasets and their processing relationships. Each of the overlays shows different types of metadata that have all been linked together to the same metadata element.

[Screen shot: EME lineage diagram with overlays. The EME can show data lineage and: data steward information, operational statistics, data profile results, conceptual definitions, dataset details, mapping specifications, entity relationships, data quality metrics, data quality heatmap, and semantic models.]

Metadata integration

Capturing so much metadata and storing it in separate buckets would be an accomplishment in and of itself, but the EME does more than that. It establishes relationships between elements of metadata, which effectively enriches their value, revealing deeper meaning about the business to the real-world users of metadata at a company.

The challenge, of course, is how to gather all this metadata in a way that is actually useful. In large, complex organizations with heterogeneous, distributed (even global) environments, this challenge is particularly hard. There are issues of scalability and integration. How to gather metadata from such a disparate set of sources and technologies? How to process so much information? How to store it and display it intelligently, without overwhelming the user or dumbing down the content? How to marry metadata across lines of business, countries, even languages?

The EME integrates all the different kinds of metadata stored in it and, as a result, multiplies the value of each. For example, this integration enables end-to-end data lineage across technologies, consolidated operational statistics for comprehensive capacity planning, and fully linked data profile statistics and data quality metrics.

To begin with, all information about the definition and execution of Ab Initio applications is automatically captured and loaded into the EME. This includes business rules, data structures, application structure, documentation, and run-time statistics. Because users build end-to-end operational applications with the Co>Operating System, everything about those applications is automatically captured.

This metadata is then integrated with external metadata through a combination of the EME’s Metadata Importer and sophisticated metadata processing with the Co>Operating System.

Ab Initio’s support for combining metadata from multiple sources allows metadata from one source system to be enriched with metadata from other sources. For example, the Metadata Importer might load the core details of database tables and columns from a database catalog, then enrich the metadata with descriptions and logical links from a modeling tool, and finally link the imported metadata to data quality metrics. The Metadata Importer can load external metadata such as:

  • Reporting tools: MicroStrategy, Business Objects, Cognos, …
  • Modeling tools: ERwin, ERstudio, Rational Architect, …
  • Database system catalogs for all major and most minor relational database management systems
  • Tabular metadata, usually stored in spreadsheets using either predefined templates or customer-specific layouts
  • Industry-standard protocols for metadata exchanges, including the Common Warehouse Metamodel (CWM) XML Metadata Interchange (XMI) format

Non-standard and custom metadata sources can also be imported and integrated into the EME. Users can apply the Co>Operating System’s powerful data processing capabilities to arbitrarily complex sources of metadata. The Co>Operating System can extract metadata from these non-standard systems, process it as necessary, and load and integrate it with other metadata in the EME.
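
The enrichment described above, where core catalog details are combined with modeling-tool descriptions and data quality metrics, can be pictured with a small sketch. This is an illustration only and not the Metadata Importer's actual interface: the function name enrich, the (table, column) keys, and the sample values are all assumptions made for the example.

```python
# Illustrative sketch only: none of these names come from an Ab Initio API.

def enrich(catalog, model_descriptions, quality_metrics):
    """Merge three hypothetical metadata sources keyed by (table, column)."""
    merged = {}
    for key, core in catalog.items():
        record = dict(core)  # start from the database catalog entry
        record["description"] = model_descriptions.get(key)  # from a modeling tool
        record["quality"] = quality_metrics.get(key)  # link to data quality metrics
        merged[key] = record
    return merged


# Hypothetical sample inputs.
catalog = {
    ("customer", "id"): {"type": "integer"},
    ("customer", "email"): {"type": "varchar(255)"},
}
model_descriptions = {("customer", "email"): "Primary contact address"}
quality_metrics = {("customer", "email"): {"percent_valid": 97.5}}

enriched = enrich(catalog, model_descriptions, quality_metrics)
```

Columns that appear only in the catalog keep their core details and simply carry empty description and quality entries, so one source can be loaded before the others are available.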

Many types of metadata

The EME integrates a very wide range of metadata and is fully extensible. The home page of the Metadata Portal allows the user to navigate directly to the type of metadata of interest:

From this page you can select an area of interest and dive in to see:

Metadata about projects and applications. The EME stores and manages all information about Ab Initio projects and the applications they contain. Projects are organized in hierarchies and can be shared or kept private. The EME keeps track of which projects reference other projects, as well as tracking all objects within a project.

Details about application versions. The EME maintains complete version information and history about every detail of Ab Initio applications. Differences between versions of graphs, record formats, and transform rules are displayed graphically. Users can see details about the exact versions that are being used in production.

Users, groups, locks, and permissions. The EME provides access control management for all metadata. Furthermore, as part of a complete source code management system, the EME’s exclusive locking mechanism for whole applications or pieces of applications prevents developers from interfering with each other.

Hierarchical organization of metadata. Metadata can be organized into arbitrary hierarchies and folders to help capture business meaning and to provide targeted navigation.

Data dictionaries. The EME supports the creation of one or more data dictionaries or conceptual data models. Data dictionaries can be a simple hierarchical list of business terms, or a more complex semantic model with complex relationships between business terms.

Enterprise-wide deployments typically have multiple data dictionaries – one for each division or product area, as well as an enterprise model. In the EME, divisional business terms link directly to columns and fields, and have relationships back into the enterprise model. This allows companies to harmonize business concepts across the enterprise without forcing each division to abandon its own data dictionary.

Metadata from reporting tools. The EME imports metadata from all the major business intelligence (BI) reporting tools, including MicroStrategy, Business Objects, and Cognos. This includes details about reports and report fields, as well as internal reporting objects such as Facts, Metrics, Attributes, and Aggregates. Lineage queries can trace the calculations of various report fields back through the BI tools into the data mart or data warehouse, and from there all the way back to the ultimate sources.

Metadata from database systems. The EME imports metadata (schemas, tables, columns, views, keys, indices, and stored procedures) from many database systems. The EME performs lineage analysis through multiple levels of views and stored procedures. For large database systems, the EME is often the only way to understand the interrelationship of database tables, views, and procedures – especially for impact analysis queries, table reuse exercises, and consolidation projects.

Metadata from files. The EME imports metadata about files, including complex hierarchical record formats such as XML and COBOL copybooks.

End-to-end data lineage. The EME builds complete models of the flow of data through an enterprise by harvesting metadata from a large number of different operational systems, reporting tools, database systems, ETL products, SQL scripts, etc. This integrated model allows users to query the system about data lineage – how data was computed, and what is impacted by a change.
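
Conceptually, lineage and impact queries are traversals of a directed graph of data flows. The sketch below is a minimal illustration under that assumption and says nothing about how the EME is implemented; the class LineageGraph, its methods, and the dataset names are hypothetical.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical model: nodes are datasets, edges are data flows."""

    def __init__(self):
        self._downstream = defaultdict(set)
        self._upstream = defaultdict(set)

    def add_flow(self, source, target):
        """Record that data flows from source to target."""
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    @staticmethod
    def _reach(start, edges):
        """Breadth-first search returning every node reachable from start."""
        seen = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def sources_of(self, element):
        """Lineage query: everything the element was computed from."""
        return self._reach(element, self._upstream)

    def impacted_by(self, element):
        """Impact query: everything affected by a change to the element."""
        return self._reach(element, self._downstream)


graph = LineageGraph()
graph.add_flow("orders.csv", "staging.orders")
graph.add_flow("staging.orders", "warehouse.sales")
graph.add_flow("customers.db", "warehouse.sales")
graph.add_flow("warehouse.sales", "report.revenue")

upstream = graph.sources_of("report.revenue")
downstream = graph.impacted_by("orders.csv")
```

The same graph answers both questions: following edges backward gives the sources of a report field, and following them forward gives the impact of a change.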

System diagrams. The EME stores graphical pictures that can represent system diagrams or other diagrams of metadata organization. In the Metadata Portal, clicking on a “hot-linked” graphical item within a diagram navigates the user to the connected metadata object.

Logical models. The EME imports logical and physical models from common modeling tools. It models links from logical models to physical models, which are then merged with the schema information in the actual databases.

Domains and reference data. The EME stores reference data, including domains and reference code values. It can be the primary manager for certain reference data, or can track and maintain a copy of reference data from a different system. It also supports code mappings between logical domain values and multiple physical encodings.

Data profiles. The EME stores data profile results and links them with datasets and individual fields. Many statistics are computed, such as common values and data distributions. These statistics can be computed on demand or automatically as part of an Ab Initio application.
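
As a rough picture of what a field-level profile might contain, the sketch below computes a few of the statistics mentioned above (common values and a simple value distribution). The function profile_field and the shape of its result are assumptions for illustration, not the EME's profile format.

```python
from collections import Counter


def profile_field(values, top_n=3):
    """Compute simple profile statistics for one field (hypothetical layout)."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(counts),
        "common_values": counts.most_common(top_n),
    }


# Hypothetical sample of a country-code field.
profile = profile_field(["US", "US", "DE", None, "FR", "US", "DE"])
```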

Operational statistics. The Co>Operating System produces runtime statistics for every job and for every dataset that is read or written. These statistics can be stored in the EME for trend analysis, capacity planning, and general operational queries.

Data quality metrics. To support a comprehensive data quality program, Ab Initio computes data quality statistics and error aggregates and stores them in the EME. The EME can analyze and display data quality metrics for datasets and for collections of datasets. Data quality metrics can also be combined with data lineage to see a “heat map” showing where there are data quality problems in the enterprise.

Pre-development specifications. The EME can capture mapping specifications as part of the development process. The Metadata Portal allows analysts to specify existing or proposed sources and targets along with arbitrary mapping expressions. By using the EME for defining mappings, users can see how the mappings fit into a larger enterprise lineage picture.

These specifications can then be used to guide a development team and to permanently record requirements. After production deployment, the EME will continue to show these specifications in lineage diagrams alongside their actual implementations.

Data masking rules. The EME stores data masking rules, which can then be applied to data flowing through Ab Initio applications. Ab Initio provides many built-in rules, and users can define their own custom masking algorithms. These rules can be associated with fields or columns, or with business terms in the conceptual model. When linked at the conceptual level, data masking rules are automatically applied to the corresponding physical columns and fields.
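
The idea of attaching a masking rule to a business term and having it apply to every linked physical field can be sketched as two lookups. Everything here is hypothetical: the rule functions, the term names, and the field-to-term links are invented for the example and are not Ab Initio's built-in rules.

```python
import hashlib


def mask_hash(value):
    """Replace a value with a short, repeatable SHA-256 digest prefix."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def mask_last4(value):
    """Hide all but the last four characters."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


# Rules are attached to business terms (the conceptual level)...
term_rules = {"Email Address": mask_hash, "Card Number": mask_last4}
# ...and physical fields are linked to those terms.
field_terms = {"cust_email": "Email Address", "cc_num": "Card Number"}


def mask_record(record):
    """Apply the rule of each field's linked term; leave other fields as-is."""
    masked = {}
    for name, value in record.items():
        rule = term_rules.get(field_terms.get(name))
        masked[name] = rule(value) if rule and value is not None else value
    return masked


masked = mask_record(
    {"cust_email": "a@example.com", "cc_num": "4111111111111111", "city": "Boston"}
)
```

Because the rule lives on the term, linking a new column to "Card Number" is enough for it to be masked; no per-column rule needs to be written.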

Data stewards and metadata about people and groups. The EME stores metadata about people and groups. This metadata can be linked to other metadata objects to document data governance roles such as data stewardship. Metadata about people and groups can be automatically imported from external systems such as corporate LDAP servers.

Built-in and custom metadata reports. The EME provides many built-in reports. Users can also define custom reports that run against metadata stored in the EME and that are accessible from the Metadata Portal.

Custom metadata. Users can extend the EME schema to allow a wide variety of additional metadata to be integrated into the EME. Schema extensions include the addition of attributes to existing objects, as well as the creation of new metadata objects that can be linked to other existing metadata. Users can easily customize the EME user interface to allow for tabular and graphical views on both standard and custom metadata.

The EME is an open system

The EME is an open system based on industry-standard technologies:

  • A published, extensible relational schema. The EME comes preconfigured with a rich metaschema that contains a wide variety of types of metadata. The metaschema can be customized and extended with custom tables and columns to support a variety of user-defined metadata. The EME manages these extensions and customizations in concert with the built-in metadata objects, and provides full customization of screens and reports.
  • A standard commercial relational database (currently Oracle, DB2, or Microsoft SQL Server), which holds all of the business metadata and summaries of the operational and technical metadata. Technical metadata is stored in an object data store accessible via ODBC.
  • A graphical user interface that can be hosted in any standard web browser. In addition, the EME supports navigation to external repositories of detailed metadata, such as document management systems, image databases, and third-party products.
  • A three-tier architecture using commonly available application server technology. On top of the database is a standard Java application server (currently WebSphere, WebLogic, JBoss, or Apache Tomcat) that manages security, calculates role-based views, and implements the workflows around metadata maintenance.
  • Support for external reporting tools. While the EME supports a wide range of built-in reports through the Metadata Portal, third-party reporting products can also directly access the metadata in the database for custom reports. The relational schema is fully documented and comes with preconfigured database views to support these reporting tools.
  • Web services APIs to enable service-oriented architecture and metadata as a service. These interfaces allow external systems to query business metadata as well as to submit metadata change requests. External systems can also subscribe to changes to the metadata, thereby enabling the EME to send messages when any approved change occurs. For example, if the EME is managing valid values, the approval workflow (described later) can send messages to external operational systems to update their cached lookups of these valid values.
  • Metadata exports. In addition to the data access interfaces, the EME can also export metadata in a number of ways. For example:
    • Virtually every EME tabular screen can be converted into an Excel spreadsheet with the click of a mouse.
    • The EME can export metadata in the emerging standard for metadata exchange, CWM XMI.
    • The EME can generate a Business Objects universe and populate it with metadata.

Metadata governance

The EME provides sophisticated governance processes that can be customized to meet the needs of large enterprises.

For technical metadata (applications and business rules), the EME supports a complete source code management system, with checkin/checkout, locking, versioning, branching, and differencing.

For business and operational metadata, the EME comes with a built-in metadata governance workflow, including work queues, approvals, and audit trails. The EME can also interface with external approval workflow tools. The EME’s proposal/approval workflow mechanism is based on changesets. Users create changesets to propose metadata additions, updates, and/or deletions, and then submit them for approval.

Below is a screen shot of the changeset submission process:

When a user submits a changeset for approval, the EME sends an email message to the appropriate metadata stewards. These stewards can inspect the proposed changes and approve or reject them. If approved, the changeset is applied and becomes visible to the general user population.

The EME also supports integration of changesets via its web services API, as well as with external workflow approval/BPM systems, such as Oracle’s AquaLogic. In this case the external workflow system is responsible for communicating items in work queues, documenting communications, managing escalations, and resolving final status.

All approved changesets result in new versions of metadata in the EME. The EME maintains a complete history of all previous versions and their details.
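
The proposal/approval cycle described in this section (propose, submit, approve or reject, new version on approval, full history retained) can be modeled in a few lines. This is a simplified sketch under stated assumptions, not the EME's workflow engine: the Changeset and MetadataStore classes, the status names, and the sample business term are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Changeset:
    """Hypothetical changeset: maps a metadata key to its new value (None = delete)."""
    author: str
    changes: dict
    status: str = "draft"


class MetadataStore:
    """Keeps every approved version plus an audit trail of workflow events."""

    def __init__(self):
        self.versions = [{}]  # version 0 is the empty starting state
        self.audit_trail = []

    @property
    def current(self):
        return self.versions[-1]

    def submit(self, changeset):
        changeset.status = "submitted"
        self.audit_trail.append(("submitted", changeset.author))

    def review(self, changeset, steward, approve):
        if changeset.status != "submitted":
            raise ValueError("only submitted changesets can be reviewed")
        if not approve:
            changeset.status = "rejected"
            self.audit_trail.append(("rejected", steward))
            return
        # Approval creates a new version; earlier versions stay untouched.
        new_version = dict(self.current)
        for key, value in changeset.changes.items():
            if value is None:
                new_version.pop(key, None)
            else:
                new_version[key] = value
        self.versions.append(new_version)
        changeset.status = "approved"
        self.audit_trail.append(("approved", steward))


store = MetadataStore()
accepted = Changeset("analyst", {"term:Revenue": "Recognized income, net of returns"})
store.submit(accepted)
store.review(accepted, "steward", approve=True)

declined = Changeset("analyst", {"term:Revenue": None})
store.submit(declined)
store.review(declined, "steward", approve=False)
```

Only the approved changeset produces a new version; the rejected one leaves the current metadata unchanged but still appears in the audit trail.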

A final word

Enterprise metadata management was long a goal of large companies, but an unattainable, impractical one. Passive “repositories” (in many cases simply glorified data dictionaries) held only a fraction of the relevant metadata and soon became stale, out-of-date “islands” of metadata. The organizations that most needed a comprehensive approach to managing metadata – complex, global companies with inherent problems of scalability, with diverse metadata sources, with security issues that cross lines of business, and with huge amounts of information to display and navigate – were the least likely to succeed.

But Ab Initio's Enterprise Meta>Environment has finally made enterprise metadata management possible, even in the largest of companies. Some examples:

  • A major global bank is finally able to meet the requests of its regulator that its accounting numbers be verifiable. A full-scale data quality program across the enterprise, including quality measurements at various points of the data lineage, is being rolled out using the EME.
  • A major financial institution is saving tens of millions of dollars on the replacement of a key software system because the EME has enabled a complete understanding of how the legacy code worked, and empowered business and IT to collaborate in describing how the replacement system should function. Many person-years of planned effort were simply eliminated.
  • Several multinational enterprises with incredibly complex IT environments – operations in as many as 100 countries, thousands of disparate systems, and hundreds of thousands of files, database tables, and messages – are using the EME to inventory every piece of data and to define its meaning and value; they’re using the EME as an asset management system. These companies realize that data items are assets to be tracked, just like cars, buildings, and office equipment.

The Ab Initio EME didn’t happen overnight, and it didn’t come from an ivory tower: it’s the result of years of serious engagement with companies like these.
