Ab Initio addresses a wide spectrum of real-time applications – ranging from “mini-batch” processing, to high-volume “asynchronous messaging”, to service-oriented architecture (SOA) applications, to low-latency “request/response” applications – all with a single technology: the Co>Operating System’s Continuous>Flows facility.
The Co>Operating System is a “dataflow engine”: it flows streams of records, or transactions, through a sequence of “components”. Each component performs some computation on its input records (applying business rules, for example) to produce output records. Complex processing logic is decomposed into easily understood steps, with each step carried out by a different component. Records flow through the necessary set of components until processing is complete.
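The dataflow model can be sketched in a few lines of Python (purely illustrative; real Ab Initio components are configured graphically, and the component names and rules here are invented). Each “component” consumes a stream of records and yields transformed records:

```python
# A minimal dataflow sketch: each "component" is a generator that
# consumes a stream of records and yields transformed records.
# Component names and the business rule are invented for this example.

def parse(lines):
    for line in lines:
        fields = line.strip().split(",")
        yield {"account": fields[0], "amount": float(fields[1])}

def apply_business_rule(records):
    for rec in records:
        rec["flagged"] = rec["amount"] > 1000  # example rule
        yield rec

def format_output(records):
    for rec in records:
        yield f'{rec["account"]},{rec["amount"]},{rec["flagged"]}'

# Records flow through the chained components until processing completes.
raw = ["A1,250.0", "A2,5000.0"]
result = list(format_output(apply_business_rule(parse(raw))))
```

Chaining generators this way mirrors the pipeline structure: each step is isolated, easily understood, and composable with the others.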
This dataflow model is perfectly suited to both batch and real-time processing. Most of a batch application and its corresponding real-time application may be similar, if not identical; it is the endpoint components that determine whether the application is batch or real-time. A batch application connects to flat files and static database table components. A real-time application connects to message queues, web services, RPCs, CICS servers, and/or special-purpose components (usually via TCP sockets).
Because batch and real-time applications have so much in common, and because both use a single technology – the Co>Operating System – Ab Initio delivers significantly lower complexity and higher performance. Lower complexity translates into higher productivity, and higher performance lowers costs.
In most cases the business-logic components between the input and output components stay the same, meaning that the same business logic can be reused across batch and real-time applications. This has big productivity implications. With non-Ab Initio approaches, the technologies and methodologies for batch and real-time processing are usually very different, so the same business logic must be reimplemented multiple times across the range of batch and real-time uses. With Ab Initio, you develop the business logic just once and then plug it into Ab Initio’s batch and real-time infrastructures.
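This reuse can be sketched in Python (with invented names; this is not Ab Initio code): the same business-logic function is fed either by a flat-file-style source (batch) or by a queue-style source (real-time), and only the endpoints differ.

```python
import queue

# Hypothetical sketch: the business-logic component is written once,
# and only the endpoint components differ between batch and real-time.

def enrich(record):                      # shared business logic
    return {**record, "total": record["qty"] * record["price"]}

def batch_source(lines):                 # stand-in for a flat-file component
    for line in lines:
        qty, price = line.split(",")
        yield {"qty": int(qty), "price": float(price)}

def realtime_source(q):                  # stand-in for a message-queue component
    while True:
        msg = q.get()
        if msg is None:                  # sentinel used only in this demo;
            break                        # a real service runs indefinitely
        yield msg

# Batch run:
batch_out = [enrich(r) for r in batch_source(["2,3.0", "1,9.5"])]

# Real-time run over the identical enrich() logic:
q = queue.Queue()
for msg in ({"qty": 4, "price": 2.5}, None):
    q.put(msg)
rt_out = [enrich(r) for r in realtime_source(q)]
```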
Application architects are often challenged by the need to meet the seemingly conflicting performance requirements of these different modes – for example, high throughput for batch processing and low latency for request/response processing.
The Ab Initio Co>Operating System’s Continuous>Flows capability is a single technology that is effectively used by customers in all these modes. This is because the Co>Operating System was architected, from the beginning, to meet all the requirements of these different approaches.
There are two primary differences in Ab Initio between batch (including mini-batch) and real-time applications:
Termination: Batch applications terminate once they have finished processing all the data in their inputs. After termination, no system resources remain associated with a batch application.
Once started, real-time applications stay up indefinitely to receive and process transactions that arrive on their inputs. If there is a lull in the flow of new transactions, a real-time application waits until new transactions appear.
Checkpointing and recovery: Batch applications take checkpoints at predetermined locations in an application; a checkpoint is not successfully taken until all the data that passes through one of these locations has been saved to disk. Recovery is simply restarting the application at the last successful checkpoint.
Real-time applications can take checkpoints between transactions – as often as every transaction, or less frequently based on other criteria (such as elapsed time or number of transactions). A restarted application automatically picks up after the last transaction that was successfully checkpointed.
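The idea can be sketched as follows, assuming a simple in-memory checkpoint store (in a real deployment the checkpoint is durably persisted, and the mechanism is far more general; all names here are illustrative):

```python
# Illustrative sketch of per-transaction checkpointing and restart.

def process(txn):
    return txn * 2                           # stand-in business logic

def run(transactions, checkpoint, every=2):
    """Process transactions, recording progress every `every` transactions."""
    start = checkpoint.get("next", 0)        # resume after the last checkpoint
    out = []
    for i in range(start, len(transactions)):
        out.append(process(transactions[i]))
        if (i + 1) % every == 0:
            checkpoint["next"] = i + 1       # commit progress
    return out

ckpt = {}
first = run([1, 2, 3, 4, 5], ckpt)           # checkpoint now covers txns 1-4
# If the application crashed here, txn 5's work would not yet be committed;
# a restart automatically picks up at the first un-checkpointed transaction:
restarted = run([1, 2, 3, 4, 5], ckpt)       # reprocesses only txn 5
```

Checkpointing less often (a larger `every`) lowers per-transaction overhead at the cost of more reprocessing after a failure, which is exactly the latency-versus-recovery-time trade-off described later in this section.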
Ab Initio’s Continuous>Flows provides interfaces to a wide range of real-time systems:
Ab Initio applications can easily implement web services in a service-oriented architecture (SOA). This is accomplished through an Ab Initio-provided servlet that is installed in a standard application server (WebSphere, WebLogic, IIS, or Apache/Tomcat) of the customer’s choosing. The servlet provides a registration mechanism for associating particular web service calls with specific Ab Initio applications. When a web service call comes in, the servlet communicates with the Ab Initio application via TCP (the Ab Initio application runs in its own set of processes, outside the application server) and returns the results from the Ab Initio application to the original requestor.
The Co>Operating System also provides full support for parsing messages defined via WSDL.
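The registration-and-forwarding pattern described above can be sketched generically (this is not the actual servlet; the registry, wire format, and all names are invented for illustration):

```python
import socket
import threading

# Illustrative sketch: a registry maps service names to backend TCP
# endpoints; the front end forwards each request and relays the reply.

REGISTRY = {}                             # service name -> (host, port)

def register(service, host, port):
    REGISTRY[service] = (host, port)

def dispatch(service, payload):
    host, port = REGISTRY[service]
    with socket.create_connection((host, port)) as s:
        s.sendall(payload.encode() + b"\n")
        return s.makefile().readline().strip()

# A stand-in for the separately running backend application:
def backend(server_sock):
    conn, _ = server_sock.accept()
    with conn, conn.makefile("rw") as f:
        req = f.readline().strip()
        f.write(req.upper() + "\n")       # trivial "business logic"
        f.flush()

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
threading.Thread(target=backend, args=(srv,), daemon=True).start()
register("echo-service", "127.0.0.1", srv.getsockname()[1])
reply = dispatch("echo-service", "hello")
srv.close()
```

The key structural point survives the simplification: the request handler and the backend run as separate processes, coupled only by the registry and a TCP connection.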
Commercial queuing products and web services are relatively new to the industry, and their performance rates are modest. Customers with large messaging volumes, or whose needs pre-date the existence of commercial queuing products, have built their own in-house high-performance messaging solutions.
Continuous>Flows supports robust interfacing to these legacy solutions through special processing components (“Universal Subscribe” and “Universal Publish”). These components call custom C++ subroutines that interface with the legacy messaging system. The components also handle rollback and recovery in the event of failures.
Ab Initio, in concert with special-purpose messaging systems, can achieve extremely high performance rates – sustained rates of over 500,000 messages per second in mission-critical applications have been measured.
Furthermore, the Universal Subscribe and Universal Publish components are used in the same way as the Continuous>Flows components for third-party queuing products. This gives users the option of switching from an in-house queuing solution to a third-party queuing product with minimal changes to their Ab Initio applications.
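The universal-subscribe pattern can be sketched as a generic component driving user-supplied hooks (shown in Python for brevity, although the custom subroutines described above are written in C++; the hook names are invented for this example):

```python
# Illustrative sketch: a generic subscribe component calls user-supplied
# hooks that wrap an in-house messaging API, acknowledging each message
# on success and rolling it back on failure so it can be redelivered.

class UniversalSubscribe:
    def __init__(self, hooks):
        self.hooks = hooks                    # connect/get/ack/rollback

    def pump(self, handler, limit):
        self.hooks["connect"]()
        delivered = []
        for _ in range(limit):
            msg = self.hooks["get"]()
            try:
                delivered.append(handler(msg))
                self.hooks["ack"](msg)        # commit the dequeue
            except Exception:
                self.hooks["rollback"](msg)   # message will be redelivered
        return delivered

# An in-memory fake of a legacy messaging system:
inbox = ["m1", "m2"]
acked = []
hooks = {
    "connect": lambda: None,
    "get": inbox.pop,                         # stand-in for the legacy API
    "ack": acked.append,
    "rollback": lambda m: inbox.append(m),
}
sub = UniversalSubscribe(hooks)
results = sub.pump(lambda m: m.upper(), limit=2)
```

Because the component, not the handler, owns acknowledgment and rollback, swapping the hooks for a third-party queuing API would leave the application logic untouched.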
The Co>Operating System checkpointing facility provides robust handling of application failure. A checkpoint allows an application to commit changes to multiple databases and input and output systems (queues). In the event of an application failure, the Co>Operating System “rolls back” the environment to the last successful checkpoint. Once the underlying problem has been cleared up (a database refusing loads, exhausted disk space, a network failure, …), the application can be restarted, and it automatically picks up after the last successfully processed transaction.
Most checkpointing schemes are notoriously expensive computationally, and developers’ efforts to minimize that cost often result in complex, unstable applications. The Co>Operating System was architected both for extremely high performance and for robustness. It provides a number of checkpointing alternatives that trade off transaction latency against recovery time. In all cases, the Co>Operating System guarantees that every transaction is ultimately written once and only once to all target devices (databases, files, and queues).
The Co>Operating System provides two basic mechanisms for checkpointing. The better known is the XA standard protocol for two-phase commit. The Co>Operating System includes a transaction manager that coordinates commits across disparate databases and queuing products, and can even batch transactions into a single commit to increase throughput.
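The two-phase-commit idea behind XA can be sketched as follows (a real XA transaction manager is considerably more involved; this is only the vote-then-commit skeleton, with invented names):

```python
# Sketch of two-phase commit: every participant must vote "yes" in the
# prepare phase before any participant is told to commit; otherwise all
# participants roll back.

class Participant:
    def __init__(self, name, healthy=True):
        self.name, self.healthy, self.state = name, healthy, "idle"

    def prepare(self):                    # phase 1: vote to commit
        self.state = "prepared" if self.healthy else "aborted"
        return self.healthy

    def commit(self):                     # phase 2: make the change durable
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"

def two_phase_commit(participants):
    if all(p.prepare() for p in participants):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "rolled_back"

db = Participant("db")
mq = Participant("queue")
outcome = two_phase_commit([db, mq])      # both vote yes, so both commit

failed = two_phase_commit([Participant("db2"),
                           Participant("bad", healthy=False)])
```

The prepare round is what makes the protocol expensive: every device must durably promise the commit before any device performs it, which is one source of the overhead discussed next.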
However, XA has its limitations: it has high computational overhead; it is complex to administer; and it is not supported by all desired input and output devices. As a consequence, most Ab Initio users depend on the Co>Operating System’s native checkpointing mechanism, which is very much like two-phase commit. This mechanism works with all input and output devices (databases, files, and queues) to which Ab Initio connects, works across heterogeneous and distributed servers, and delivers high performance with extreme robustness. Ab Initio has even built into its connector components ways of compensating for the limitations of certain third-party queuing products – for example, in some products a queue-manager crash can result in transactions being delivered more than once.
With the Co>Operating System’s native checkpointing mechanism, developers can control the frequency of checkpoints in a number of ways – by number of transactions, by elapsed time, or in response to an event such as a special token in the message stream. Furthermore, they can control the degree of transactional consistency across multiple output devices at checkpoint time. Default settings yield very high performance; transactions are never lost; and the correctness of the application is never sacrificed.
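A checkpoint-trigger policy along these lines might look like the following sketch (the class and parameter names are illustrative, not actual Co>Operating System settings):

```python
import time

# Sketch of configurable checkpoint triggers: by transaction count,
# by elapsed time, or by a special token in the message stream.

class CheckpointPolicy:
    def __init__(self, every_n=None, every_secs=None, token=None):
        self.every_n, self.every_secs, self.token = every_n, every_secs, token
        self.count = 0
        self.last = time.monotonic()

    def should_checkpoint(self, msg):
        self.count += 1
        if self.token is not None and msg == self.token:
            return self._fire()           # event-driven trigger
        if self.every_n is not None and self.count >= self.every_n:
            return self._fire()           # count-driven trigger
        if (self.every_secs is not None
                and time.monotonic() - self.last >= self.every_secs):
            return self._fire()           # time-driven trigger
        return False

    def _fire(self):
        self.count = 0
        self.last = time.monotonic()
        return True

policy = CheckpointPolicy(every_n=3, token="EOD")
fired = [policy.should_checkpoint(m) for m in ["a", "b", "c", "d", "EOD"]]
```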
The Co>Operating System takes seriously the operational requirements of mission-critical, real-time applications, and provides a number of mechanisms to address them.
The Continuous>Flows approach to real-time processing brings together the productivity of graphical dataflow development, a truly general model for connecting to data streams, and robust mechanisms for reliably checkpointing work and coordinating transactions.