The heart of the Co>Operating System is a “dataflow engine.” This engine drives a large library of data processing “components” that manipulate the data flowing through an application. Applications are designed, implemented, and maintained graphically through Ab Initio’s Graphical Development Environment™ (GDE®).
The core principle of the Co>Operating System is that applications are designed and developed in the way most people would design a system on a whiteboard (or even on a napkin). Easily recognizable icons are used to represent the input and output sources, which are then combined with processing boxes and arrows to define the overall processing flow. By selecting the appropriate components from an extensive library and “wiring them up,” you create an Ab Initio® application.
Ab Initio seamlessly integrates the design and execution of applications: the drawing is the application. And the resulting application can be batch, near real-time, or real-time in nature, or even a combination of all of these – all united into one consistent and powerful computing environment.
The graphical dataflow approach means that Ab Initio can be used to build the vast majority of business applications – from operational systems, distributed application integration, and complex event processing to data warehousing and data quality management systems. But graphical design and development addresses just one part of the challenge. These applications must also meet significant operational and management requirements.
Historically, graphical programming technologies yielded pretty pictures that fell apart when it came to addressing these real-world requirements. The Co>Operating System is a genuinely different beast – it actually works. Here are some sample deployments and their underlying requirements:
You might have noticed that all these examples say “one of the world’s largest.” Why have all these major corporations chosen Ab Initio? Because not only is Ab Initio software intuitive and easy to use, it also stands up to the most complex application logic and to huge amounts of data. And it does this with high performance and remarkable robustness. This combination is unique.
These are just some examples of specific deployments at these customers, and these deployments tend to be very broad. Meeting the requirements of all these applications requires many capabilities, including:
The only way to meet these requirements is to have an architecture designed from the beginning to meet them all – at the same time. Once an architecture has been set, it is practically impossible to add fundamental capabilities. The Co>Operating System was designed from the beginning to have all these capabilities. It is a robust, proven technology that is successfully used for a very wide range of complex data processing applications.
The core of an Ab Initio application is a dataflow graph, or “graph” for short. A graph consists of components that are connected together through data “flows.”
For example, the following graph is a simple application that reads each record from a flat file containing customer transactions, and then reformats the data (by applying rules) before sorting it and then rolling up (or aggregating) the data. The results of this processing are then written out to a table in a database, such as Oracle.
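As a rough illustration only, the shape of such a graph can be mimicked in plain Python, with one function per component. This is not Ab Initio code: the field names, the reformatting rule, and the in-memory lists standing in for the flat file and the database table are all assumptions made for the sketch.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical flat-file contents: "customer_id,amount" per line.
RAW_LINES = ["c2,10.50", "c1,4.00", "c2,1.25", "c1,6.00"]


def read_records(lines):
    """Input component: parse each line into a record (a dict)."""
    for line in lines:
        customer_id, amount = line.split(",")
        yield {"customer_id": customer_id, "amount": amount}


def reformat(records):
    """Reformat component: apply per-record rules (here, normalise the types)."""
    for rec in records:
        yield {
            "customer_id": rec["customer_id"].upper(),
            "amount_cents": round(float(rec["amount"]) * 100),
        }


def sort_records(records, key):
    """Sort component: order the records by the given key."""
    return sorted(records, key=itemgetter(key))


def rollup(records, key):
    """Rollup component: aggregate already-sorted records per key."""
    for value, group in groupby(records, key=itemgetter(key)):
        yield {key: value, "total_cents": sum(r["amount_cents"] for r in group)}


def run_graph(lines):
    """Wire the components together; the returned list stands in for the table."""
    reformatted = reformat(read_records(lines))
    return list(rollup(sort_records(reformatted, "customer_id"), "customer_id"))


if __name__ == "__main__":
    for row in run_graph(RAW_LINES):
        print(row)
```

Each function consumes the output of the one before it, which is the same "boxes and arrows" structure as the graph, just written as text.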
Through the use of highly configurable components, the Ab Initio Co>Operating System provides all the fundamental building blocks for business data processing, including:
In addition, the Co>Operating System comes with a large library of built-in components for dealing with virtually every type of data source or target. This library includes components for:
Small applications may have 3 to 5 components. Large applications may have a hundred components. Very large applications may have many thousands of components. Applications can also consist of many graphs, each of which may have many components.
Throughout the spectrum – from the smallest applications to the largest – Ab Initio applications exploit reusable rules and components, enabling rapid response to changing business needs.
Below the component level, the Co>Operating System can apply practically any type of user-specified rules and logic to any type of data, and those rules are specified graphically. This is a significant departure from other technologies. To the extent that other technologies provide graphical specification of rules, they support only simple rules. When rules get complex, the user quickly runs into walls that cannot be overcome. In the end, the user frequently has to leave the technology for anything complex and instead use traditional programming methods (Java, C++, scripting, stored procedures, Perl, ...).
Not so with the Co>Operating System. Once users have the Co>Operating System, they will find that it is easier to specify complex logic from within the Co>Operating System than from without. This has tremendous implications for productivity, ease of use, and transparency – it is easier and faster to specify rules graphically, and it is easier for business people to understand them.
In the Co>Operating System, you specify rules using Ab Initio’s Data Manipulation Language (DML), and these rules go inside “transformation” components, which are the basic building blocks for processing data. There are transformation components for mapping one kind of record structure to another, for aggregating data, for filtering data, for normalizing and denormalizing structures, for joining multiple record streams, and more. Each of these components can apply rules specified in DML to the records as they are processed.
Below is a screen shot of a simple rule that computes the output of a mapping component. On the left are the fields coming into the component, on the right are the fields coming out, and in the middle are the rules.
Individual rules can be built with the “Expression Editor”, shown below. Expressions can be arbitrarily large and complex, and DML has a large library of built-in operators and functions.
Ab Initio also supports an alternate set of user interfaces for the specification of business rules. These interfaces are aimed at less technical users, often business analysts and subject matter experts. They use business terms rather than technical terms, and organize rules in a familiar spreadsheet-like manner:
With the Co>Operating System, the data is what it or the user wants it to be – the Co>Operating System does not force data to be translated into a limited set of built-in formats that it understands. Instead, the Co>Operating System processes data in its native format and does this identically on all supported operating systems. So, the Co>Operating System, and all of Ab Initio’s components, can natively process mainframe data on Unix and Windows servers, XML on mainframes, complex hierarchical and bit-packed structures everywhere, international data with automatic transcoding, and so on. Given the same data and the same application, the Co>Operating System will produce the same results, regardless of where the application is running.
Applications can read and write from a heterogeneous set of systems, and the data can have very different formats at different points in the application. The example below shows an application that reads mainframe data from a flat file, reads XML data from an MQ queue, processes that data with different intermediate formats, and outputs the results to a flat file using an international codeset.
The Co>Operating System knows how to get data formats from wherever they are stored: database catalogs, schema definition products, Cobol copybooks, XML DTDs or XSDs, WSDL, spreadsheets, or in-house data format systems.
Here is what the record format for the intermediate flow marked “ASCII hierarchical” in the example above might look like:
Finally, the Co>Operating System and its components know how to automatically translate formats as necessary. For example, if EBCDIC and packed decimal data is presented to a component writing to an Oracle database, the component will automatically translate the data to ASCII and decimals if the columns in the database have those formats.
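To make the kind of translation involved more concrete, here is a small Python sketch of decoding an EBCDIC text field and a packed-decimal field by hand. It is not how the Co>Operating System does it internally. The choice of code page 037, the function names, and the sample bytes are assumptions for illustration; real mainframe data may use a different EBCDIC code page.

```python
from decimal import Decimal


def ebcdic_to_text(raw: bytes) -> str:
    """Decode EBCDIC bytes into a Python string (code page 037 is assumed)."""
    return raw.decode("cp037")


def unpack_packed_decimal(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a packed-decimal field.

    Each byte holds two decimal digits, except the last byte, whose low
    nibble is the sign (0xD means negative; anything else is treated as
    non-negative in this sketch). `scale` is the number of implied
    decimal places.
    """
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)
    sign_nibble = raw[-1] & 0x0F

    value = int("".join(str(d) for d in digits))
    if sign_nibble == 0x0D:
        value = -value
    return Decimal(value).scaleb(-scale)


if __name__ == "__main__":
    print(ebcdic_to_text(b"\xC8\xC5\xD3\xD3\xD6"))
    print(unpack_packed_decimal(b"\x12\x34\x5C", scale=2))
```

A component that knows both the source record format and the target column types can apply conversions like these automatically, which is the behavior described above.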
The Co>Operating System was designed from the ground up to achieve maximum performance and scalability. Every aspect of the Co>Operating System has been optimized to get maximum performance from your hardware. And you don't need “cloud” technology because the Co>Operating System naturally distributes processing across farms of servers.
The Co>Operating System is typically at least 4 to 5 times faster than the next fastest technology. This includes programs hand-coded in Java and C++: it is the rare programmer who can write a program that runs as fast as the Ab Initio version! What’s more, this special programmer will be the only one in an organization who can do this with Java or C++, and he or she will spend weeks coding something that can be accomplished in just days with Ab Initio. Usually, such talented programmers don't do mere programming for long; they are tapped to do design, architecture, and even project management.
How does Ab Initio achieve both scalability and performance? What is Ab Initio’s “secret sauce”? There are many ingredients, but the most important ones are architecture and fanatical attention to all details. The Co>Operating System’s architecture was designed from “first principles” to enable scalable computing.
The Co>Operating System’s architecture is known as “shared-nothing.” Because the Co>Operating System does not require the CPUs to share anything, they can run completely independently from each other. This allows a single application to span as many CPUs on as many servers as desired. The Co>Operating System provides facilities for distributing workload evenly across many CPUs, so most applications achieve linear scalability. This means that doubling the number of CPUs leads to a doubling of performance. Ten times more CPUs means ten times more performance. The Co>Operating System combines data parallelism and pipeline parallelism to create the maximum number of opportunities to execute pieces of an application concurrently.
The simple example below shows how an application might partition data so that the Score component can run in parallel across many CPUs (and servers). The Partition by Round-robin component splits the customer data into equal streams of data, like someone dealing out a pack of cards. The Co>Operating System then runs multiple instances of the Score component, each instance working on one stream of data. As each record is output from each instance of the Score program, the Interleave component merges the streams back together before writing the result to the output file. It’s as simple as that.
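The same partition, process, and merge pattern can be sketched in a few lines of Python. This is an analogy rather than Ab Initio code: a thread pool stands in for the CPUs and servers, the scoring rule is invented, and the merge here simply takes one record from each stream in turn, which may not match the ordering behavior of the real Interleave component.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_round_robin(records, ways):
    """Deal records out to `ways` partitions, like dealing a pack of cards."""
    partitions = [[] for _ in range(ways)]
    for i, rec in enumerate(records):
        partitions[i % ways].append(rec)
    return partitions


def score(record):
    """Hypothetical scoring rule; stands in for the Score component."""
    return {**record, "score": record["balance"] // 100}


def score_partition(partition):
    """One instance of Score, working through its own stream of data."""
    return [score(rec) for rec in partition]


def interleave(partitions):
    """Merge the streams back together, one record from each in turn."""
    merged = []
    iterators = [iter(p) for p in partitions]
    while iterators:
        still_active = []
        for it in iterators:
            try:
                merged.append(next(it))
                still_active.append(it)
            except StopIteration:
                pass  # this stream is exhausted; drop it
        iterators = still_active
    return merged


def run(records, ways=3):
    """Partition, score each partition concurrently, then merge."""
    parts = partition_round_robin(records, ways)
    with ThreadPoolExecutor(max_workers=ways) as pool:
        scored = list(pool.map(score_partition, parts))
    return interleave(scored)


if __name__ == "__main__":
    customers = [{"id": i, "balance": i * 150} for i in range(7)]
    for row in run(customers):
        print(row)
```

Note that `score` never looks at any record other than its own, so the instances share nothing and the number of partitions can be changed freely.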
Shared-nothing architectures work wonderfully as long as there are no bottlenecks. But a single bottleneck will ruin the whole thing. This is where attention to detail matters. Any part of the system that might cause a serial bottleneck must be carefully designed and implemented to never get in the way. All algorithms in all components must be optimal under many different scenarios. All communication and data transport between partitions and components must use the most efficient channels. Only when all these details, and many more, are attended to, will the system scale.
Another critical detail is the execution of business rules. If the system is designed properly, this is where a majority of the CPU time should be spent. The Co>Operating System’s transformation engine is what runs the business rules. This engine has a special design called a “just-in-time compiler.” This means that the business rules are compiled by the Co>Operating System at the last possible moment, when all the information about what they are supposed to do is finally available. The “just-in-time” aspect yields tremendous flexibility, and the “compiler” is highly optimized to run traditional business logic with maximum performance. This is why it is hard for a programmer using traditional technologies to compete with the Co>Operating System.
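The general idea of compiling a rule late and then reusing the compiled form for every record can be shown with a toy Python example. This only illustrates the "compile once at run time, apply many times" pattern using Python's built-in `compile` and `eval`. It says nothing about how the Co>Operating System's compiler actually works, and the rule text and field names are made up.

```python
def compile_rule(expression: str):
    """Compile a rule expression once and return a per-record function.

    The expression is only known at run time (for example, after
    parameters have been resolved), so it is compiled at that point and
    the compiled code object is reused for every record.
    """
    code = compile(expression, "<rule>", "eval")

    def apply(record: dict):
        # Record fields are visible to the rule as variable names.
        # Built-ins are disabled here; evaluating untrusted text with
        # eval is still unsafe and is done only for illustration.
        return eval(code, {"__builtins__": {}}, record)

    return apply


if __name__ == "__main__":
    total = compile_rule("price * quantity")
    band = compile_rule("'HIGH' if amount > 100 else 'LOW'")
    print(total({"price": 3, "quantity": 4}))
    print(band({"amount": 250}))
```

Parsing and compiling happen once per rule, not once per record, which is where the per-record cost is saved.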
The Co>Operating System is a distributed peer-to-peer processing system. It must be installed on all the servers that will be part of running an application. Each of these servers may be running a different operating system (Unix, Linux and zLinux, Windows, or z/OS).
Here’s how the Co>Operating System manages a distributed set of processes across a network of servers:
1. The Co>Operating System is started on the "Master" server, which reads in application definitions, data formats, logic rules, parameters, ...
2. The Co>Operating System starts "agents" on other servers.
3. The master process informs each agent of components to be executed on that server and provides everything the components need to know to do their work.
4. Agents start components and tell them the rules, record formats, parameters, etc.
5. Components connect the data flows and begin processing data. Agents monitor the processing.
As you can see in these diagrams, a single Ab Initio application can run inside a single server or across a network of servers. The definition of the application, the graphical diagram specified by the developer, is the same in both cases. An application will run across a different set of servers just by changing the specification of which component runs where – there are no changes to the application itself. An application can therefore be rehosted from one platform to another, or from a single server to a farm of servers, for example, with no changes.
The Co>Operating System runs equally well on Unix, Windows, Linux and zLinux, and z/OS. Furthermore, a single application can run on any mix of these platforms. Every component in an Ab Initio application can be run on any platform on which the Co>Operating System has been installed (with the exception of a few platform-specific components, such as VSAM access on mainframes). The Co>Operating System takes care of the complexity of moving data between the machines, thereby providing a seamless middleware and processing capability. Furthermore, the assignment of components to target machines can be changed before every single run of an application.
Any component or group of components can be assigned to a different computer resource:
The needs for building and operating batch and real-time applications are very different, and their technologies have been different as well. Not so with the Co>Operating System. The Co>Operating System has a single architecture that applies equally well to batch, real-time, and web services (SOA) systems. Instead of requiring you to have a multitude of technologies and different development teams for each system, with potentially multiple implementations of the same business logic, the Co>Operating System, with Continuous>Flows, lets you use one technology, one development team, and one encoding of business logic that can be reused across different systems.
With the Co>Operating System, whether an application is batch or real-time depends on what the application reads and writes. Here is the same application, shown first as a batch system (hooked up to flat files for the input and output), then as a real-time queuing system (MQ for the input and JMS for the output), and finally as a web service (getting service requests from an outside system and returning results to that same system):
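As a loose analogy in Python, the principle is that the processing logic is written once against an abstract source and sink, and only the wiring changes. Everything here is hypothetical: the rule, the function names, an in-memory text stream standing in for a flat file, and Python's `queue.Queue` standing in for a message queue such as MQ.

```python
import io
import queue


def business_logic(record: str) -> str:
    """Hypothetical rule, shared unchanged by both deployments."""
    return record.strip().upper()


def run_application(source, sink):
    """The 'application': read records from any iterable, write to any sink."""
    for record in source:
        sink(business_logic(record))


def queue_source(q, stop=None):
    """Adapt a queue into an iterable; the `stop` sentinel ends the stream."""
    while True:
        message = q.get()
        if message is stop:
            return
        yield message


def demo():
    # Batch: the source is a file-like object read line by line.
    batch_out = []
    run_application(io.StringIO("alpha\nbeta\n"), batch_out.append)

    # Real-time: the source is a queue that delivers messages as they arrive.
    q = queue.Queue()
    for message in ["alpha", "beta", None]:
        q.put(message)
    realtime_out = []
    run_application(queue_source(q), realtime_out.append)

    return batch_out, realtime_out


if __name__ == "__main__":
    print(demo())
```

`business_logic` and `run_application` are identical in both cases; whether the run is batch or real-time is decided entirely by what is plugged in at each end.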
While Ab Initio enables you to build end-to-end applications completely with the Graphical Development Environment, and run those applications completely within the Co>Operating System, users often have existing applications or 3rd party products that run fine and that are not worth reimplementing. Ab Initio makes it easy to reuse those existing applications, whether they were coded in C, C++, Cobol, Java, shell scripts, or whatever. In fact, the Co>Operating System makes it possible to integrate those applications into environments they were not originally designed for.
Legacy code is integrated into Ab Initio applications by turning it into components that behave just like all other Ab Initio components. For most legacy programs, this is easy – all that is required is a specification for the program’s inputs and outputs, along with command-line parameters. The example below shows how you can take a Cobol program that can read and write flat files and plug it into an Ab Initio application. The Cobol code, along with its Copybooks and JCL, is turned into an Ab Initio component; the component is placed in an Ab Initio application that spans Unix servers and a mainframe; and the application connects to various data sources and targets (SAS and a database). Finally, for performance, the workload is partitioned so that the Cobol code can have multiple instances running in parallel.
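The wrapping idea (describe a program's inputs, outputs, and command line, then treat it like any other component) can be sketched generically in Python with `subprocess`. This is not the Ab Initio mechanism. The "legacy program" below is a stand-in one-line script, and `make_component` is a hypothetical helper name.

```python
import subprocess
import sys


def make_component(command):
    """Wrap an external program as a component.

    The program is assumed to read records from standard input, one per
    line, and write records to standard output, one per line.
    """
    def component(records):
        result = subprocess.run(
            command,
            input="".join(rec + "\n" for rec in records),
            capture_output=True,
            text=True,
            check=True,  # raise if the wrapped program exits with an error
        )
        return result.stdout.splitlines()

    return component


# Stand-in for a legacy program: upper-cases whatever it reads.
LEGACY_COMMAND = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

if __name__ == "__main__":
    legacy = make_component(LEGACY_COMMAND)
    print(legacy(["acct-001", "acct-002"]))
```

Because the wrapped program only sees its own input stream, several instances could each be given a separate partition of the data, which is the basis for running legacy code in parallel as described above.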
One of the hardest parts of building and maintaining real systems is dealing with unexpected or bad data. It happens all the time, and when fed such data, most applications behave badly – if you're lucky, they just crash. If you're unlucky, these apps may do something strange with the bad data, and it’s possible that nobody will find out until the bad results have gone to downstream systems and contaminated them. Cleaning up that kind of mess takes time and really eats into productivity.
The Co>Operating System is inherently resilient to bad data and constantly checks the data to verify that it meets a variety of user- and system-specified criteria. If the data does not conform to those criteria, it is not allowed to move forward without some kind of intervention. And it is easy for the developer to build into the application automated intervention and reporting. The intervention can be designed to correct erroneous data and feed it back into the system, or it can be designed to segregate all related data entities so that they can be processed later in a consistent manner.
Below is an example of how a developer builds an application in Ab Initio that (1) captures problem data, (2) runs the problem data through a “cleanup” set of rules, (3) merges the cleansed data back into the original process, (4) sequesters data that cannot be cleansed, and finally (5) generates a report about the problem data that is (6) sent via an MQ queue.
All Ab Initio components that process data in interesting ways have optional “Reject”, “Error”, and “Log” ports. The Reject port gets all problem data; the Error port gets a corresponding record that describes each problem; and the Log port gets debugging information. This makes it possible to build extremely robust data processing pipelines.
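A minimal Python sketch of the same port layout might look like the following. The function name, the shape of the error records, and the log message are assumptions made for the example; they are not Ab Initio's actual formats.

```python
def run_with_ports(records, rule):
    """Apply `rule` to each record and route results to four 'ports'.

    Returns (out, reject, error, log):
      out    - successfully processed records
      reject - the original records that caused a problem
      error  - one description per rejected record
      log    - debugging information about the run
    """
    out, reject, error, log = [], [], [], []
    for index, rec in enumerate(records):
        try:
            out.append(rule(rec))
        except Exception as exc:
            # Bad data does not stop the run; it is set aside with a reason.
            reject.append(rec)
            error.append({"index": index, "message": f"{type(exc).__name__}: {exc}"})
    log.append(f"processed={len(records)} ok={len(out)} rejected={len(reject)}")
    return out, reject, error, log


if __name__ == "__main__":
    def parse_amount(rec):
        return {"id": rec["id"], "amount": int(rec["amount"])}

    data = [
        {"id": 1, "amount": "10"},
        {"id": 2, "amount": "abc"},  # problem record
        {"id": 3, "amount": "7"},
    ]
    for port in run_with_ports(data, parse_amount):
        print(port)
```

Keeping the rejected record and its error description on separate outputs means a downstream cleanup step can work on the data while a reporting step works on the reasons.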
Servers fail. Networks go down. Databases refuse to load. The more servers participating in an application, the higher the probability that there will be a system failure of some type. It is hard to build applications that can reliably recover from such failures. While many developers think they can, in practice you don't discover whether the architecture is robust until it has survived a variety of failures. There can be much pain while climbing this learning curve.
The Co>Operating System, on the other hand, was designed from the beginning to reliably recover from all these types of failures. As long as the environment has been configured so that crucial data is not lost, the Co>Operating System’s checkpoint/restart facility will be able to restart an application where it stopped before the failure, even if the application spans networks, multiple servers, and multiple databases, and regardless of whether the application runs in batch or real-time. The Co>Operating System uses a two-phase-commit-like mechanism that has much higher performance than the industry-standard XA protocol. For environments that require the XA protocol, the Co>Operating System supports this as well, even across multiple databases and queuing technologies from different vendors.
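The general checkpoint/restart idea can be illustrated with a small, single-process Python sketch. It is not the Co>Operating System's mechanism and it is not a two-phase commit. It only shows the principle of committing output and position together so that a rerun resumes from the last committed point. The file layout, function names, and checkpoint interval are assumptions.

```python
import json
import os


def save_atomically(path, state):
    """Write the state to a temporary file, then swap it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run_with_checkpoint(records, process, checkpoint_path, every=2):
    """Process records, committing output and position every `every` records.

    If `process` raises, work done since the last commit is discarded.
    Calling the function again resumes from the last committed position.
    """
    state = {"next_index": 0, "output": []}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)  # resume from the last committed point

    pending = []
    for i in range(state["next_index"], len(records)):
        pending.append(process(records[i]))  # may raise on a failure
        if (i + 1) % every == 0 or i + 1 == len(records):
            state = {"next_index": i + 1, "output": state["output"] + pending}
            save_atomically(checkpoint_path, state)
            pending = []
    return state["output"]
```

If a run fails partway through, calling `run_with_checkpoint` again with the same `checkpoint_path` skips the records that were already committed and redoes only the uncommitted ones.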
Because of the nature of the problems they are solving, Ab Initio applications can become very large and complex. However, most systems include many pieces that are very similar to other pieces, or for which there may be many variations on a theme. With traditional programming technologies, developers frequently end up making many copies of these pieces and then making minor modifications to each copy. The result is a maintenance and productivity nightmare.
Ab Initio provides a number of mechanisms to radically increase reuse of these application pieces. Application pieces of all types can be stored in the Ab Initio Enterprise Meta>Environment (EME®), a centralized repository, and reused both within and across applications. Here are examples of what you can centrally store, manage, and reuse:
Ab Initio’s reuse capability is very powerful. Reused application pieces can link back to and track changes to the centralized version from which they came. Furthermore, these pieces can support locally applied customizations while still tracking the centralized version.
Below is an example of just one of these reusable application modules: the subgraph. A subgraph can contain the components of a subsection of an application (graph). There is no limit to the size or complexity of a subgraph, and subgraphs can be nested. Subgraphs behave in all the ways that normal components do. And they can be saved in a library for reuse across many applications.
This subgraph corresponds to the error-handling portion of the application described earlier:
Notice that the components in this subgraph do not explicitly connect to any input or output components. Instead, they connect to the input and output ports of the “Standard Error Handling Process” subgraph.
Here is the same application as before, now built with the new, reusable error-handling subgraph (subgraphs are visually identified in the GDE by the extra border line surrounding them):
The Co>Operating System is the base software for the entire Ab Initio architecture. All the key Ab Initio architectural concepts manifest themselves in the Co>Operating System. And because all of Ab Initio’s technologies incorporate the Co>Operating System in one way or another, they incorporate those concepts in a consistent and seamless manner.
The results of this architecture are that:
These things, and more, are possible because the Co>Operating System was designed from the beginning with a single architecture to achieve these results.