
As the number and complexity of IT applications within a business grows, the importance of operational management grows as well. And the expectations of the business for timely and reliable results grow even more. But, as any operations manager will tell you, getting those timely and reliable results is easier said than done when there are thousands and thousands of moving parts across multiple inter-dependent applications that may span several servers and geographic locations.

To be successful, the operational team needs to:

  • Understand, articulate, and enforce all the key dependencies within an application and across applications. For example, the team needs to be able to say that B can only run when A has completed, and that C should run if A fails.
  • Manage all the actions that can trigger a part of the process. A trigger could be a specific time on a particular day, the arrival of one or more files, the availability of one or more resources – or, perhaps, a combination of all of these.
  • Proactively monitor all the dependencies so that alerts can be automatically raised and sent to the appropriate people. Alerts should be triggered if a dependency or event has not been satisfied within stated times, allowing business service level agreements (SLAs) to be tracked and reported against.
  • Monitor the low-level processing characteristics of key parts of an application, such as the number of records being rejected, the latency of messages being processed, or the amount of CPU time being consumed by the processing logic. Again, alerts should be raised when thresholds are exceeded.
  • Develop and test the end-to-end operational process in a dedicated test environment before formally promoting the process to the production environment.
  • Record and analyze detailed operational statistics, such as actual start/end times for each part of an application over time, so that processing trends can be identified to support capacity-planning activities.
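The dependency rule in the first bullet ("B can only run when A has completed, and C should run if A fails") can be sketched in plain Python. All names here are illustrative; this is not Ab Initio code or syntax.

```python
# Minimal sketch of inter-task dependencies:
# B runs only after A completes; C is the failure path for A.
# Task names and bodies are hypothetical.

def run_a():
    return "ok"                      # pretend A succeeds

def run_b():
    return "B ran"

def run_c():
    return "C ran (recovery)"

def execute():
    results = {}
    try:
        results["A"] = run_a()
        results["B"] = run_b()       # B depends on A's success
    except Exception:
        results["C"] = run_c()       # C runs only if A fails
    return results

print(execute())                     # {'A': 'ok', 'B': 'B ran'}
```

A real operational tool evaluates rules like these declaratively across thousands of tasks; the sketch only shows the shape of the dependency logic.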

Ab Initio’s Conduct>It provides all these capabilities, and more.

Conduct>It is a process automation facility that provides the monitoring and execution environment for deploying complex applications in complex environments. It facilitates the definition of arbitrary, hierarchical job steps for large, multistage applications, as well as the dependencies, sequencing, and scheduling of these job steps. These applications can be composed of Ab Initio graphs and job definitions, as well as custom executables and third-party products, all of which are managed by Conduct>It.

Conduct>It has two main elements. The first is a process automation server, called the Operational Console, that provides monitoring and job control in complex processing environments. The second, used where sophisticated process management logic is required, is the ability to graphically develop and execute complex control flow logic.

Let’s look at the Operational Console first.

Operational Console

The Operational Console provides a wide range of capabilities that are essential to daily operations. These include job scheduling, monitoring, alerting – and performing job-level actions such as starting, stopping, and rerunning jobs. Across all applications, the Operational Console collects, integrates, and manages the associated operational metadata, helping the operational team and business analysts in planning and maintaining efficient operations.

It all starts with the Home page on the Operational Console’s browser-based interface, as shown below. The Home page provides a summary of all of today’s jobs by application, system, or host server, showing whether they are running (green), completed (blue), failed (red), or waiting (yellow). It also lists all the issues or warnings that have been raised and are still outstanding.

From the Home page you may drill down to see different types of information for any job in the environment – the reason for a failure, what a job is waiting on, when a job completed, or when it is expected to complete. For example, you may wish to see all the jobs related to a specific application to understand how it is progressing:

This monitoring screen shot shows the dependencies between the different tasks within the selected application and the progress of each task. At any stage you can drill down further to see the tracking details of one of the tasks:

As shown above, low-level tracking information on CPU seconds consumed, as well as record and data volumes processed, is available for every component within a specific run of an Ab Initio job. Alternatively, you may wish to understand how one of your jobs has performed over time, together with a trend line for planning purposes:

The Operational Console collects a wide range of statistics for each task, from the ability to meet the stated SLAs through to the amount of user and system CPU time being consumed.

However, it doesn’t stop there. Using the full power of the Ab Initio Data Manipulation Language (DML), the operations team can also define their own operational probes – called “custom metrics” – for alerting and tracking purposes. You can add probes without changing or disturbing an application, and they have no impact on the application’s execution. These metrics can be computed using any combination of tracking information for any flow or component within a graph. Hence, it is easy to add a custom metric that reports and alerts on the number of records processed by a specific component, or the amount of CPU time it is consuming, or the latency of the real-time messages being processed.
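The idea behind a custom metric is a threshold check over tracking data. The real feature is expressed in Ab Initio DML over component tracking values; the Python below is only an illustrative sketch, and every name in it is invented.

```python
# Sketch of threshold-based custom metrics over tracking data.
# Metric names, values, and thresholds are hypothetical examples.

def check_metric(name, value, threshold):
    """Return an alert string when a tracked value exceeds its threshold."""
    if value > threshold:
        return f"ALERT: {name} = {value} exceeds threshold {threshold}"
    return None

tracking   = {"rejected_records": 1200, "message_latency_ms": 85}
thresholds = {"rejected_records": 1000, "message_latency_ms": 250}

alerts = [a for m, v in tracking.items()
          if (a := check_metric(m, v, thresholds[m]))]
print(alerts)    # one alert: rejected_records exceeded its threshold
```

The point is that the metric is computed from tracking data the system already collects, so the monitored application itself is untouched.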

All of the Operational Console’s monitoring capabilities are available for Ab Initio jobs, whether they have been initiated by the Operational Console or by a third-party scheduler.

For customers who don’t have access to a corporate scheduler, the Operational Console also provides full day/time, event, and file-based scheduling capabilities, allowing sophisticated applications to be fully scheduled without the need to write and maintain traditional scripts. The following screen shot shows the same application shown earlier, but with all the time- and event-based dependencies shown against the expanded tasks:
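Combining time-, event-, and file-based triggers amounts to a readiness check over several conditions. The following Python sketch shows the shape of such a check; the function and parameter names are hypothetical and bear no relation to the Operational Console's actual configuration.

```python
# Sketch of a combined trigger: a job is eligible to run only when
# the scheduled time has passed, all expected input files have
# arrived, and all upstream events have fired. Illustrative only.

import datetime
import os

def ready_to_run(now, start_time, required_files, event_flags):
    """True when every triggering condition is satisfied."""
    return (now >= start_time
            and all(os.path.exists(f) for f in required_files)
            and all(event_flags.values()))

now   = datetime.datetime(2024, 1, 1, 6, 0)
start = datetime.datetime(2024, 1, 1, 5, 0)
print(ready_to_run(now, start, [], {"upstream_feed_done": True}))   # True
```

A scheduler evaluates conditions like these continuously and releases the job the moment the last one is met, which is what removes the need for hand-written polling scripts.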

Because dependencies between tasks can become extremely complex in large applications, Conduct>It also provides a fully graphical environment to help developers define advanced job control flow.

A control flow is a way of expressing detailed logic about the sequence of execution. It uses a collection of connected tasks, called a plan, to describe what should be run – the connection between these tasks specifies an execution dependency (run this before that, for example):

The above plan shows that Task 2 can execute only after Task 1 has completed. It also shows that a condition is subsequently evaluated (“Should the intra day process run?”), and if found to be “No”, then the end-of-day position is calculated for each office by iterating over the highlighted “subplan”. A subplan, as the name suggests, is itself a collection of tasks and dependencies.
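The structure of that plan – sequential dependency, a conditional branch, and a subplan iterated per office – can be sketched as ordinary control flow. This is a shape-only illustration in Python, not Conduct>It's graphical plan format, and all names are invented.

```python
# Sketch of a plan: Task 2 runs after Task 1, a condition selects a
# branch, and a subplan (itself a collection of tasks) iterates over
# offices. Structure only; not Conduct>It syntax.

def task1():
    return "task1 done"

def task2():
    return "task2 done"

def end_of_day_subplan(office):
    return f"position calculated for {office}"

def run_plan(intraday, offices):
    log = [task1(), task2()]          # Task 2 depends on Task 1
    if intraday:                      # "Should the intra-day process run?"
        log.append("intra-day process run")
    else:
        # subplan: the same group of tasks, executed once per office
        log.extend(end_of_day_subplan(o) for o in offices)
    return log

print(run_plan(False, ["London", "Tokyo"]))
```

In the graphical environment the same structure is drawn as connected task boxes rather than written as code, which is what keeps large plans legible to the operations team.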

Custom tasks can also be triggered on the failure of other tasks – this is illustrated in the above plan by the “Error Actions” task, which is run if Task 2 fails for any reason. In a similar manner, each task can have “methods” associated with it that are executed when certain events occur, such as “At Start”, “At Success”, “At Shutdown”, “At Failure”, or “At Trigger”, allowing reporting and logging capabilities to be easily added to the end-to-end process.
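Per-task "methods" are essentially event hooks fired at points in a task's lifecycle. The sketch below shows the idea with plain callbacks; the hook names mirror the events mentioned above, but the function signatures are hypothetical, not the Conduct>It API.

```python
# Sketch of per-task event methods ("At Start", "At Success",
# "At Failure") as optional callbacks. Illustrative names only.

def run_task(action, at_start=None, at_success=None, at_failure=None):
    """Run an action, invoking whichever event hooks are attached."""
    events = []
    if at_start:
        events.append(at_start())
    try:
        action()
        if at_success:
            events.append(at_success())
    except Exception as exc:
        if at_failure:
            events.append(at_failure(exc))
    return events

events = run_task(
    action=lambda: None,                           # task body succeeds
    at_start=lambda: "logged: task starting",
    at_success=lambda: "logged: task succeeded",
    at_failure=lambda e: f"alert raised: {e}",
)
print(events)   # ['logged: task starting', 'logged: task succeeded']
```

Because the hooks sit outside the task body, logging and reporting can be bolted onto an existing process without modifying the tasks themselves.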

With plans, Conduct>It provides a development-time framework for deconstructing a complex application into manageable units of work that constitute a single, recoverable system. These plans are then available to be scheduled, monitored, and managed using the Operational Console. The result is a sophisticated end-to-end operational environment.