Triggering diagnostic data collection and healing actions via Heal’s Action API
Enterprises must always be ready to expect the unexpected. The world today is poised at the cusp of a significant change in the way people conduct business, with more people working remotely, using broadband and cable services, watching streaming platforms, and banking and shopping online, among other things. It has become imperative for enterprises to be available 24×7, minimize the downtime of critical services and avoid outages caused by high loads in their data centres.
In our previous blogs, we spoke about how Heal helps you move from “Break and Fix” to “Predict and Prevent” via its patented technique of learning workload-behaviour correlations. Although self-healing platforms provide all the information and context needed to perform an accurate root cause analysis and minimize MTTR (Mean Time to Resolve), their true focus is not on fixing outages but on preventing incidents altogether.
In this blog, we talk about an integral cog in the Heal machine that helps your enterprise self-heal – the Heal Action API.
Heal Data Architecture Recap
As detailed in our previous blog on Heal Data Architecture, our MLE (Machine Learning Engine) generates anomalies, which are processed by an Action Trigger that issues notifications and integrates with ITSM tools to execute orchestration workflows.
Fig 1: The flows in bold black show the anomalies generated by the MLE being sent to the Action Trigger, which initiates notifications/ITSM workflows or a pre-configured forensic/healing action via the Agent Controller
In addition, the Action Trigger performs two all-important functions that give Heal the edge over other AIOps tools. It sends a trigger to an Agent Controller, which (a) facilitates the collection of diagnostic data (Just-in-Time Forensics) that aids the Proactive Healing of early warning signals and the intelligent rectification of problems, and (b) executes healing actions on target servers to bring anomalous system parameters back to normal.
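To make this division of labour concrete, here is a minimal Java sketch of an Action Trigger that always takes the notification/ITSM path and, when an action is pre-configured for the anomalous KPI, asks an Agent Controller to run forensics and/or healing on the target server. Every type and method here (Anomaly, AgentController, ActionTrigger, runForensics, runHealing) is hypothetical and not part of Heal's published API; the sketch only illustrates the routing described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical anomaly as emitted by the MLE: which KPI deviated, on which server.
record Anomaly(String kpi, String server) {}

// Hypothetical Agent Controller facade that dispatches actions to agents on target servers.
interface AgentController {
    String runForensics(String server, String script); // returns collected diagnostic output
    String runHealing(String server, String script);   // returns the healing result
}

// Hypothetical Action Trigger: always notifies, optionally dispatches forensic/healing actions.
class ActionTrigger {
    private final AgentController controller;
    private final Map<String, String> forensicScripts; // KPI -> forensic script
    private final Map<String, String> healingScripts;  // KPI -> healing script
    final List<String> log = new ArrayList<>();         // record of what was triggered

    ActionTrigger(AgentController controller,
                  Map<String, String> forensicScripts,
                  Map<String, String> healingScripts) {
        this.controller = controller;
        this.forensicScripts = forensicScripts;
        this.healingScripts = healingScripts;
    }

    void onAnomaly(Anomaly a) {
        // 1. Notifications / ITSM integration happen for every anomaly.
        log.add("notify:" + a.kpi() + "@" + a.server());
        // 2. Just-in-Time Forensics, only if a forensic action is configured for this KPI.
        String forensic = forensicScripts.get(a.kpi());
        if (forensic != null) {
            log.add("forensics:" + controller.runForensics(a.server(), forensic));
        }
        // 3. Healing action, only if one is configured for this KPI.
        String healing = healingScripts.get(a.kpi());
        if (healing != null) {
            log.add("healing:" + controller.runHealing(a.server(), healing));
        }
    }
}
```

With a forensic script configured for a CPU KPI but no healing script, an anomaly on that KPI produces a notification followed by a forensic run, and nothing more.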
Fig 2: The Agent Controller triggers Forensic and Healing actions on the target servers
The configuration of these orchestration workflows, forensic triggers and healing actions is done via the Action API. In this blog, we dive into the 3 types of healing that Heal can perform, and how the Action API supports its proactive and autonomous healing capabilities.
The Three Types of Healing
Heal offers 3 types of healing actions: autonomous, proactive and projected. As the graphic illustrates, each is suited to a specific scenario.
In this blog, we discuss healing scenarios pertinent to the Action API:
- Autonomous: The Action API lets you configure actions that automatically raise tickets and orchestrate remedial workflows, the most common of which include shaping workloads to reduce resource contention. In cloud environments, these can also include triggering scripts that dynamically change infrastructure and deployment configurations.
- Proactive: The Action API lets you configure forensic actions for specific anomalies. These actions collect the relevant diagnostic data so that the user can initiate healing with all the required information and context at hand.
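One way to picture these two scenarios is as a per-KPI configuration stating whether Heal should act on its own (autonomous) or gather diagnostics for a user to act on (proactive). The Java sketch below assumes a hypothetical ActionConfig record and HealingPlanner class; Heal's actual configuration format is not described in this post, and the step names are illustrative only.

```java
import java.util.List;
import java.util.Map;

// The two modes the Action API is used for here (projected healing is out of scope).
enum HealingMode { AUTONOMOUS, PROACTIVE }

// Hypothetical per-KPI action configuration: the mode plus the script to run.
record ActionConfig(HealingMode mode, String script) {}

final class HealingPlanner {
    private final Map<String, ActionConfig> configByKpi;

    HealingPlanner(Map<String, ActionConfig> configByKpi) {
        this.configByKpi = configByKpi;
    }

    // Returns the ordered steps taken for an anomaly on the given KPI.
    List<String> planFor(String kpi) {
        ActionConfig cfg = configByKpi.get(kpi);
        if (cfg == null) {
            // Nothing configured: the signal is raised and a manual RCA is needed.
            return List.of("raise-signal", "manual-rca");
        }
        return switch (cfg.mode()) {
            // Autonomous: raise a ticket and run the remedial workflow without waiting for a user.
            case AUTONOMOUS -> List.of("raise-ticket", "run:" + cfg.script());
            // Proactive: collect diagnostics so a user can start healing with full context.
            case PROACTIVE -> List.of("collect:" + cfg.script(), "await-user");
        };
    }
}
```

For example, a workload-shaping script could be configured as AUTONOMOUS for a database-connection KPI, while a heap-dump script is configured as PROACTIVE for a JVM memory KPI.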
Overview of the Action API
There are 4 main calls in the Action API:
Here is a typical flow of how these APIs are invoked when a signal is raised by MLE:
Whenever a new signal is generated in the system, the Signal_Created event creates a ticket or incident in the ITSM tool. When the information collected by a forensic script is needed as input to, or should drive, your healing script, you can integrate your code with the Anomaly_Forensic event; the forensic output is then displayed on the Signals screen. When you want a healing action to be executed using the forensic output exposed on the Signals screen, you can integrate your code with the Anomaly_Heal event. In the absence of forensics and a pre-configured healing action, a manual RCA has to be performed to ascertain and fix the root cause of a signal.
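The flow above can be sketched as an event dispatcher in which user code is registered against the event it integrates with. The event names mirror those in the text (Signal_Created, Anomaly_Forensic, Anomaly_Heal), but the ActionEvent enum, the EventHandler interface and the SignalFlow class are hypothetical illustrations, not Heal's actual classes.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Events named in the post; the enum itself is a hypothetical stand-in.
enum ActionEvent { SIGNAL_CREATED, ANOMALY_FORENSIC, ANOMALY_HEAL }

// Hypothetical hook that user code implements for one event.
@FunctionalInterface
interface EventHandler {
    // Receives the signal id and any forensic output gathered so far (may be null).
    String handle(String signalId, String forensicOutput);
}

final class SignalFlow {
    private final Map<ActionEvent, EventHandler> handlers = new EnumMap<>(ActionEvent.class);

    void register(ActionEvent event, EventHandler handler) {
        handlers.put(event, handler);
    }

    // Runs the flow for one new signal and returns a trace of what happened.
    List<String> onSignal(String signalId) {
        List<String> trace = new ArrayList<>();
        EventHandler ticket = handlers.get(ActionEvent.SIGNAL_CREATED);
        if (ticket != null) {
            // Signal_Created: e.g. open a ticket or incident in the ITSM tool.
            trace.add(ticket.handle(signalId, null));
        }
        String forensicOutput = null;
        EventHandler forensic = handlers.get(ActionEvent.ANOMALY_FORENSIC);
        if (forensic != null) {
            // Anomaly_Forensic: collect diagnostics and show them on the Signals screen.
            forensicOutput = forensic.handle(signalId, null);
            trace.add("signals-screen:" + forensicOutput);
        }
        EventHandler heal = handlers.get(ActionEvent.ANOMALY_HEAL);
        if (heal != null) {
            // Anomaly_Heal: the healing action can take the forensic output as input.
            trace.add(heal.handle(signalId, forensicOutput));
        }
        if (forensic == null && heal == null) {
            // Neither forensics nor healing configured: fall back to a manual RCA.
            trace.add("manual-rca:" + signalId);
        }
        return trace;
    }
}
```

Keying handlers by event keeps each piece of integration code focused on the single event it cares about, and the manual-RCA fallback appears only when nothing beyond ticketing is configured.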
Usage and Invocation of the Action APIs
The Action Trigger library is exposed so that you can create your own module or plugin in ‘.jar’ format. Create your module for the event that requires the integration. Once your module is ready, you can integrate it with the Action APIs by simply copying the JAR file into the plugins folder of the Action Trigger and modifying the plugin details JSON file to point to your implementation of the action logic.
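At its core, a mechanism like this means loading a named class from a JAR at run time. The Java sketch below does that with the standard URLClassLoader and reflection. The ActionPlugin interface, the PluginLoader and SampleTicketPlugin classes, and the assumption that the class name is read from the plugin details JSON are all hypothetical, since the post does not document the real plugin contract.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical contract a custom plugin would implement.
interface ActionPlugin {
    String execute(Map<String, String> eventData);
}

// Hypothetical example plugin a team might package into its own JAR.
class SampleTicketPlugin implements ActionPlugin {
    @Override
    public String execute(Map<String, String> eventData) {
        return "ticket for " + eventData.getOrDefault("signalId", "unknown");
    }
}

final class PluginLoader {
    // Loads `className` (assumed to come from the plugin details JSON) from the
    // given JARs and instantiates it via its no-argument constructor.
    // The class loader is deliberately left open: the plugin may load further
    // classes or resources from its JAR for as long as it is in use.
    static ActionPlugin load(List<Path> jars, String className)
            throws MalformedURLException, ReflectiveOperationException {
        List<URL> urls = new ArrayList<>();
        for (Path jar : jars) {
            urls.add(jar.toUri().toURL());
        }
        ClassLoader loader = new URLClassLoader(
                urls.toArray(new URL[0]), PluginLoader.class.getClassLoader());
        // asSubclass throws ClassCastException if the class is not an ActionPlugin.
        Class<? extends ActionPlugin> cls =
                Class.forName(className, true, loader).asSubclass(ActionPlugin.class);
        return cls.getDeclaredConstructor().newInstance();
    }
}
```

Java's standard ServiceLoader is a common alternative when plugins declare their implementations in META-INF/services rather than in a separate JSON file.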
For instance, the JSON to be added to details-plugins.json for an enterprise, XR Travels, that wants to create a ticket in its support system on a notification event would be as follows:
Conclusion
Heal’s Action API makes it extremely easy and intuitive for your enterprise to integrate with ITSM systems, collect diagnostic data and execute custom healing actions whenever an anomaly occurs in your system. This makes Heal a powerful self-healing enabler for your enterprise, allowing you to tackle a plethora of business challenges while delivering on the promise of fewer outages and higher availability.
In our next blog, we focus on another dimension of healing that helps you plan for unexpected traffic and scale up pre-emptively. With projected healing, we help you project transaction growth trends and view the corresponding infrastructure and system requirements, so that you can forecast capacity intelligently. More on that in our next technology blog. Keep reading!