
Real-time business data processing is challenging, and few software products fully address its requirements. Ab Initio does.

Ab Initio addresses a wide spectrum of real-time applications, ranging from “mini-batch”, to high volume “asynchronous messaging”, to service-oriented architecture (SOA) applications, to low latency “request/response” applications, all with a single technology – the Co>Operating System’s Continuous>Flows facility.

The Co>Operating System is a “dataflow engine”. This engine flows streams of records, or transactions, through a sequence of “components”. Each component performs a specific computation on its input records (applying business rules, for example) to produce output records. Complex processing logic is decomposed into easily understood steps, with each step carried out by a different component. Records flow through the necessary set of components until the processing is complete.
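
The dataflow idea can be illustrated with a minimal sketch. The component names and record layout below are hypothetical, invented for illustration; real Ab Initio graphs are built graphically, not with this API:

```python
# Minimal illustration of a dataflow pipeline: records stream through a
# sequence of components, each carrying out one step of the processing.
# All names here are hypothetical, not Ab Initio's actual interfaces.

def read_records(lines):
    """Source component: parse raw lines into records."""
    for line in lines:
        fields = line.strip().split(",")
        yield {"account": fields[0], "amount": float(fields[1])}

def apply_business_rules(records):
    """Transform component: flag large transactions."""
    for rec in records:
        rec["flagged"] = rec["amount"] > 1000.0
        yield rec

def write_records(records):
    """Sink component: collect (or publish) the processed records."""
    return list(records)

raw = ["A1,250.00", "A2,1500.00"]
out = write_records(apply_business_rules(read_records(raw)))
```

Because each step is a separate component, the complex logic decomposes into small, independently understandable pieces, exactly as described above.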

This dataflow model is well suited to both batch and real-time processing. Most of a batch application and its corresponding real-time application may be similar, if not identical; it is the endpoint components that determine whether the application is batch or real-time. A batch application connects to flat files and static database tables. A real-time application connects to messaging queues, web services, RPCs, CICS servers, and/or special-purpose components (usually via TCP sockets).

With Ab Initio, the fact that batch and real-time applications have so much in common, and that they both use a single technology – the Co>Operating System – results in significantly lower complexity and higher performance. Lower complexity translates to higher productivity, and higher performance lowers costs.

Reusability of business logic across batch and real-time applications

In most cases the business-logic components between the input and output components stay the same, meaning that the same business logic can be reused across batch and real-time applications. This has significant productivity implications. With non-Ab Initio approaches, the technologies and methodologies for batch and real-time processing are usually very different, so the same business logic must be reimplemented several times across the range of batch and real-time uses. With Ab Initio, you develop the business logic just once and then plug it into Ab Initio’s batch and real-time infrastructures.
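
The reuse pattern can be sketched as follows: one shared business-logic function, plugged into a batch runner (finite input) and a real-time runner (a queue consumed until shutdown). The names and the queue-sentinel convention are illustrative assumptions, not Ab Initio mechanisms:

```python
# Sketch: the same business-logic component is reused unchanged; only the
# endpoint components differ. All names are illustrative.
import queue

def enrich(record):
    """Shared business logic used by both batch and real-time runs."""
    record["net"] = record["gross"] - record["fees"]
    return record

def run_batch(input_records):
    """Batch endpoints: read a finite input, return all results."""
    return [enrich(r) for r in input_records]

def run_realtime(message_queue):
    """Real-time endpoints: consume from a queue until a sentinel arrives."""
    results = []
    while True:
        msg = message_queue.get()
        if msg is None:          # sentinel: shut down
            break
        results.append(enrich(msg))
    return results

batch_out = run_batch([{"gross": 100.0, "fees": 2.5}])

q = queue.Queue()
q.put({"gross": 100.0, "fees": 2.5})
q.put(None)
rt_out = run_realtime(q)
```

The business logic in `enrich` never changes between the two modes; only the surrounding endpoints do.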

Achieving performance for different real-time execution models

Application architects are often challenged by the need to meet seemingly conflicting performance requirements:

  • Batch applications need to process data as efficiently as possible. A batch job may take a long time to run because there are so many transactions to process, and none of the results are available until the job has completed. But while it is running, a batch job is expected to process a very high number of records per second.
  • “Mini-batch” applications are collections of batch jobs that individually process small volumes of data. However, there may be thousands or even tens of thousands of such small jobs that run each day. By limiting the amount of data processed by a job, the response time for each job is minimized. This approach also allows the same applications to process very large data volumes efficiently in a traditional batch setting. (Managing tens of thousands of jobs a day presents its own set of complexities, which are addressed by Ab Initio’s Conduct>It.)
  • Asynchronous messaging applications connect to message queues and also need to process transactions as efficiently as possible. However, the downstream systems usually expect their response messages within a few seconds to tens of seconds. Indeed, if an asynchronous application can respond within a second or two, it can support interactive use.
  • “Request/response” or synchronous messaging applications are expected to process a transaction as soon as it shows up and to respond as quickly as possible, usually with a latency of less than a second. If multiple such applications work together to process a transaction, individual applications may need to turn around responses in tenths to hundredths of a second. Ab Initio directly addresses this “sweet spot” of reliably performing meaningful units of work in the tens of milliseconds range (in contrast to the extremes that some narrow, specialized systems go to).

The Ab Initio Co>Operating System’s Continuous>Flows capability is a single technology that is effectively used by customers in all these modes. This is because the Co>Operating System was architected, from the beginning, to meet all the requirements of these different approaches.

There are two primary differences in Ab Initio between batch (including mini-batch) and real-time applications:

  • Termination: Batch applications terminate once they have finished processing all the data in their inputs. After termination, no system resources remain associated with a batch application.

    Once started, real-time applications stay up indefinitely to receive and process transactions that arrive on their inputs. If there is a lull in the flow of new transactions, a real-time application waits until new transactions appear.

  • Checkpointing and recovery: Batch applications take checkpoints at predetermined locations in an application, and a checkpoint is not considered successfully taken until all data that passes through those locations has been saved to disk. Recovery is simply restarting the application at the last successful checkpoint.

    Real-time applications can take checkpoints between transactions, as often as every transaction or less frequently based on other criteria (such as elapsed time or number of transactions). A restarted application automatically picks up at the last transaction that was successfully checkpointed.
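
The restart behavior can be illustrated with a toy model: state is committed to a checkpoint store every N transactions, and a restarted run resumes from the last committed position. This sketches the idea only; the variable names, the dictionary-backed store, and the checkpoint interval are assumptions, not Co>Operating System internals:

```python
# Toy model of transaction-level checkpointing: commit progress to a
# checkpoint store every N transactions; on restart, resume from the
# last checkpoint rather than the beginning.

CHECKPOINT_EVERY = 3

def process(txns, checkpoint_store):
    start = checkpoint_store.get("position", 0)   # resume point
    total = checkpoint_store.get("total", 0.0)
    for i in range(start, len(txns)):
        total += txns[i]
        if (i + 1) % CHECKPOINT_EVERY == 0:
            checkpoint_store["position"] = i + 1  # durable commit point
            checkpoint_store["total"] = total
    return total

store = {}
txns = [10.0, 20.0, 30.0, 40.0]
try:
    # Simulate a crash after the first checkpoint (position 3)
    process(txns[:3] + [None], store)   # None forces a TypeError "crash"
except TypeError:
    pass
# Restart: only transactions after the checkpoint (index 3 on) are replayed
result = process(txns, store)
```

After the simulated crash, the store holds position 3, so the restarted run replays only the fourth transaction and still produces the correct total.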

Interfacing with a wide range of real-time systems

Ab Initio’s Continuous>Flows provides interfaces to a wide range of real-time systems:

  • 3rd party queuing systems: IBM MQ, JMS, TIBCO Rendezvous, and Microsoft MQ. Ab Initio provides components for directly connecting to all of these queuing systems.
  • Web services: WebSphere, WebLogic, IIS, and Apache/Tomcat
  • Ab Initio queuing and RPC for low latency and high volume point-to-point connections between Ab Initio applications
  • Legacy / in-house messaging software

Native support for web services and SOA

Ab Initio applications can easily implement web services in a service-oriented architecture (SOA). This is accomplished through an Ab Initio-provided servlet that is installed in a standard application server (WebSphere, WebLogic, IIS, or Apache/Tomcat) of the customer’s choosing. The servlet provides a registration mechanism for associating particular web service calls with specific Ab Initio applications. When a web service call comes in, the servlet communicates with the Ab Initio application via TCP (the Ab Initio application runs in its own set of processes and is outside the application server) and returns to the original requestor the results returned by the Ab Initio application.
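
The registration-and-dispatch idea can be sketched as follows. A registry maps web service names to backends; incoming calls are dispatched to the registered backend and its reply is relayed to the caller. In the real deployment the backend is a separate set of processes reached over TCP; here a plain function stands in for that hop, and all names are illustrative:

```python
# Sketch of the servlet's registration mechanism: service names are
# registered against backend applications; calls are dispatched to the
# registered backend and the response is relayed back. Illustrative only.

registry = {}

def register(service_name, backend):
    """Associate a web service name with a backend handler."""
    registry[service_name] = backend

def dispatch(service_name, request):
    """What the servlet does: look up the backend, forward, relay reply."""
    backend = registry.get(service_name)
    if backend is None:
        return {"status": 404, "body": "unknown service"}
    return {"status": 200, "body": backend(request)}

# A stand-in for an Ab Initio application reachable over TCP:
def score_customer(request):
    return {"customer": request["customer"], "score": 0.87}

register("scoreCustomer", score_customer)
reply = dispatch("scoreCustomer", {"customer": "C42"})
```

Keeping the application outside the application server, as the document describes, means the servlet is only a thin dispatcher; the heavy processing happens in the backend processes.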

The Co>Operating System also provides full support for parsing messages defined via WSDL.

Interfacing with special purpose and legacy messaging systems

Commercial queuing products and web services are relatively new to the industry, and their performance rates are modest. Customers with large messaging volumes, or whose needs pre-date the existence of commercial queuing products, have built their own in-house high-performance messaging solutions.

Continuous>Flows supports robust interfacing to these legacy solutions through special processing components (“Universal Subscribe” and “Universal Publish”). These components call custom C++ subroutines that interface with the legacy messaging system. The components also handle rollback and recovery in the event of failures.
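
The kind of contract such a component could impose on custom interface code can be sketched as read/commit/rollback operations: messages are acknowledged only when the surrounding checkpoint commits, and rolled-back messages become available for redelivery. This interface is hypothetical (and in Python rather than C++), not Ab Initio's actual API:

```python
# Hypothetical adapter contract for a "Universal Subscribe"-style
# component: read a message, commit acknowledgements at checkpoint time,
# and roll back on failure so messages are redelivered.
from abc import ABC, abstractmethod

class LegacyQueueAdapter(ABC):
    @abstractmethod
    def read(self):
        """Return the next message, or None if the queue is empty."""

    @abstractmethod
    def commit(self):
        """Acknowledge all messages read since the last commit."""

    @abstractmethod
    def rollback(self):
        """Make unacknowledged messages available for redelivery."""

class InMemoryAdapter(LegacyQueueAdapter):
    def __init__(self, messages):
        self._pending = list(messages)
        self._inflight = []

    def read(self):
        if not self._pending:
            return None
        msg = self._pending.pop(0)
        self._inflight.append(msg)
        return msg

    def commit(self):
        self._inflight.clear()

    def rollback(self):
        # Redeliver unacknowledged messages in their original order
        self._pending = self._inflight + self._pending
        self._inflight = []

adapter = InMemoryAdapter(["m1", "m2"])
first = adapter.read()
adapter.rollback()          # failure before checkpoint: m1 is redelivered
redelivered = adapter.read()
```

Separating read from commit is what lets the component handle rollback and recovery on behalf of the custom code, as described above.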

Ab Initio, in concert with special-purpose messaging systems, can achieve extremely high performance rates – sustained rates of over 500,000 messages per second in mission-critical applications have been measured.

Furthermore, the Universal Subscribe and Universal Publish components are used in just the same way as Continuous>Flows components for 3rd party queuing products. This provides users with the option of switching from their in-house queuing solution to a 3rd party queuing product with minimal changes to their Ab Initio applications.

Robustness in the event of failures

The Co>Operating System checkpointing facility provides robust handling of application failure. A checkpoint allows an application to commit changes to multiple databases and input and output systems (queues). In the event of an application failure, the Co>Operating System does a “rollback” of the environment back to the last successful checkpoint. Once the underlying problem has been cleared up (a database that refuses to load, a full disk, a network failure, …), the application can be restarted, and it will automatically pick up after the last successfully processed transaction.

Most checkpointing schemes are notoriously expensive from a computational perspective, and developers’ efforts to minimize that cost often result in complex, unstable applications. The Co>Operating System was architected both for extremely high performance and for robustness. It provides a number of checkpointing alternatives that trade off transaction latency against recovery time. In all cases, the Co>Operating System guarantees that every transaction will ultimately be written once and only once to all target devices (databases, files, and queues).

The Co>Operating System provides two basic mechanisms for checkpointing. The best known is the XA standard protocol for two-phase commit. The Co>Operating System includes a transaction manager that will coordinate commits across disparate databases and queuing products, and can even batch transactions into a single commit to increase throughput.
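
The two-phase commit protocol itself is easy to sketch: a coordinator asks every participant to prepare, and only if all vote yes does it tell them to commit; otherwise it rolls them all back. This is a generic illustration of the protocol, not the Co>Operating System's transaction manager:

```python
# Generic sketch of two-phase commit across multiple participants
# (e.g. databases and queues). Illustrative only.

class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        """Phase 1: vote on whether this participant can commit."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"

def two_phase_commit(participants):
    if all(p.prepare() for p in participants):   # phase 1: prepare
        for p in participants:                   # phase 2: commit
            p.commit()
        return True
    for p in participants:                       # any "no" vote: roll back
        p.rollback()
    return False

db = Participant("database")
mq = Participant("queue")
ok1 = two_phase_commit([db, mq])                 # all vote yes: commit

other = Participant("queue2")
bad = Participant("failing-db", can_commit=False)
ok2 = two_phase_commit([other, bad])             # one vote no: roll back
```

The prepare phase is what makes the commit atomic across disparate systems: no participant commits until every participant has promised it can.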

However, XA has its limitations: it has high computational overhead; it is complex to administer; and it is not supported by all desired input and output devices. As a consequence, most Ab Initio users depend on the Co>Operating System’s native checkpointing mechanism, which is very much like two-phase commit. This mechanism works with all input and output devices (databases, files, and queues) to which Ab Initio connects, works across heterogeneous and distributed servers, and is highly performant and extremely robust. Ab Initio has even built into its connector components ways of compensating for limitations of certain 3rd party queuing products – for example, in certain products a queue manager crash can result in transactions being delivered more than once.

With the Co>Operating System’s native checkpointing system, developers can control the frequency of checkpoints in a number of ways, such as the number of transactions processed or the elapsed time, or in response to an event such as a special token in the message stream. Furthermore, they can control the degree of transactional consistency at checkpoint time across multiple output devices. Default settings yield very high performance; transactions are never lost; and the correctness of the application is never sacrificed.
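
The checkpoint-frequency criteria just described can be sketched as a small trigger policy: checkpoint after N transactions, after T seconds, or when a special token appears in the message stream. The class, its parameters, and the token value are illustrative assumptions:

```python
# Sketch of checkpoint-trigger criteria: transaction count, elapsed time,
# or a special token in the message stream. Names are illustrative.
import time

class CheckpointPolicy:
    def __init__(self, max_txns=100, max_seconds=5.0, token="CHECKPOINT"):
        self.max_txns = max_txns
        self.max_seconds = max_seconds
        self.token = token
        self._count = 0
        self._last = time.monotonic()

    def should_checkpoint(self, message):
        """Return True when any configured trigger condition is met."""
        self._count += 1
        due = (
            message == self.token
            or self._count >= self.max_txns
            or time.monotonic() - self._last >= self.max_seconds
        )
        if due:                       # reset counters after a checkpoint
            self._count = 0
            self._last = time.monotonic()
        return due

policy = CheckpointPolicy(max_txns=3, max_seconds=60.0)
decisions = [policy.should_checkpoint(m) for m in ["t1", "t2", "t3"]]
token_hit = policy.should_checkpoint("CHECKPOINT")
```

Tuning `max_txns` and `max_seconds` is exactly the latency-versus-recovery-time trade-off the document describes: frequent checkpoints shorten recovery but add per-transaction overhead.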

Operational robustness

The Co>Operating System seriously addresses operational requirements for mission-critical, real-time applications. Some example mechanisms include:

  • Running multiple instances of a real-time application simultaneously for load-balancing and failover
  • Bringing down pieces of an application system so that updated modules can be initiated without suspending a 7×24 nonstop system
  • Pooling connections to databases to limit resource usage
  • “Folding” multiple components into a single process to lower CPU overhead and memory footprint
  • Using “micrographs” – dynamically loadable graph logic – to dramatically reduce the use of operating system resources
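
Of the mechanisms above, connection pooling is the easiest to illustrate: a fixed-size pool caps how many database connections an application holds, and connections are reused rather than opened per request. This is a generic sketch of the technique, not Ab Initio's pooling implementation:

```python
# Generic sketch of connection pooling: a fixed-size pool limits resource
# usage, and released connections are reused. Illustrative only.
import queue

class ConnectionPool:
    def __init__(self, size, connect):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(connect())

    def acquire(self, timeout=None):
        # Blocks if all connections are in use, capping resource usage
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

opened = []
def fake_connect():
    """Stand-in for opening a real database connection."""
    conn = {"id": len(opened)}
    opened.append(conn)
    return conn

pool = ConnectionPool(size=2, connect=fake_connect)
c1 = pool.acquire()
c2 = pool.acquire()
pool.release(c1)
c3 = pool.acquire()   # reuses the released connection, opens nothing new
```

No matter how many requests arrive, the application never holds more than the pool size in open connections.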


The Continuous>Flows approach to real-time processing brings together the productivity increases of graphical dataflow implementation, a truly general model for connecting to data streams, and robust mechanisms for reliably checkpointing work and coordinating transactions.