A batch job is a program or a set of programs that HEAL processes in sequence without requiring human intervention. Batch jobs typically perform common or repetitive tasks, such as creating backup files or running a series of calculations periodically. Batch jobs can fail or run slower than expected. In an environment where multiple batch jobs run at different times of the day across multiple systems, keeping track of all of them is tedious. Batch Job Monitoring (BJM) uses ML-based insights to help users track the progress of the batch jobs running in their environments and highlights any abnormal behavior.

A batch job is a set of multiple background processes. You can set up batch jobs to suit your infrastructure and requirements. Packaged software can ship with several background processes, and you can group these processes together as batch jobs. Each process is identified by a process identifier, which may have arguments. A batch job can be scheduled or run ad hoc.

Navigating to Batch Job Monitoring

If the Batch Job Monitoring module is installed and configured, you can view the BJM link. Select the icon, and then select External Links. HEAL supports multiple external links, and multiple applications can use batch jobs. HEAL displays all the batch jobs in external links. Select a specific batch job application. The Batch Job URL opens in a new tab, where you can view the BJM application.

Viewing Batch Job Summary

1. Select any date to view the details of jobs scheduled on that date.
2. Select any of the available groups from the Filter by Group box to view jobs for that group.
3. Select any of the available servers from the Filter by SOL box to view jobs for that SOL Id. This displays the batch jobs per server.
4. At any point in time, a job can have one of the following statuses:

A) Yet to Run – It indicates a job is scheduled to run today.

B) Running – It indicates a job is running right now.

C) Not Started – It indicates a job was scheduled to be running right now, but for some reason, it is not.

D) Completed – It indicates a job has completed its run. A run ends with one of these sub-statuses:

Success – Job execution is successful.

Failure – Job execution failed.

Unknown – Due to an internal technical error or a status code mapping issue, the application cannot determine whether the job ended successfully or in error.
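The status rules above can be sketched in a few lines of Python. This is only an illustration, not HEAL's implementation: the `JobRun` fields, the `job_status` function, and the exit-code sets are hypothetical names, and treating "Yet to Run" as "scheduled start is still in the future" is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class JobRun:
    """Hypothetical record of one job run; field names are assumptions."""
    scheduled_start: datetime
    actual_start: Optional[datetime] = None
    end: Optional[datetime] = None
    exit_code: Optional[int] = None


def job_status(
    run: JobRun,
    now: datetime,
    success_codes=frozenset({0}),   # assumed mapping of exit codes to Success
    failure_codes=frozenset({1}),   # assumed mapping of exit codes to Failure
) -> Tuple[str, Optional[str]]:
    """Return (status, sub_status) following the status list above."""
    if run.end is not None:
        # The run has finished; decide the sub-status from the exit code.
        if run.exit_code in success_codes:
            return ("Completed", "Success")
        if run.exit_code in failure_codes:
            return ("Completed", "Failure")
        # Unmapped or missing exit code: outcome cannot be determined.
        return ("Completed", "Unknown")
    if run.actual_start is not None:
        return ("Running", None)
    if now >= run.scheduled_start:
        # Should be running by now, but has not started.
        return ("Not Started", None)
    return ("Yet to Run", None)
```

An exit code that appears in neither set falls through to Unknown, which mirrors the status code mapping issue described for that sub-status.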

5. This displays the top five groups with the highest number of failed jobs.

Group name – This lists names of the top five worst performing groups.

Total Jobs – This lists total number of jobs belonging to a group.

Completed – This lists the number of jobs from a group which completed the execution.

Time Taken – This lists the time taken in seconds to execute the jobs from a group which completed the execution. This is the time difference between ‘Last job end time’ and ‘First job start time’.

Failed – This lists the percentage of failed jobs from a group.

6. This displays the top five SOL Ids with the highest number of failed jobs.

SOL Id – This lists names of the top five worst performing SOLs.

Total Jobs – This lists total number of jobs belonging to a server.

Completed – This lists the number of jobs from a server which completed the execution.

Total Time Taken – This lists the time taken in seconds to execute the jobs from a SOL which completed the execution. This is the time difference between ‘Last job end time’ and ‘First job start time’.

Failed – This lists the percentage of failed jobs from a SOL.
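As a rough illustration of how the summary columns relate to each other, the sketch below computes Total Jobs, Completed, Time Taken, and Failed for one group or SOL. The `summarize_group` function and the dictionary keys are hypothetical, and computing the failed percentage against the total number of jobs (rather than only the completed ones) is an assumption, since the text does not specify the denominator.

```python
from datetime import datetime


def summarize_group(jobs):
    """Summarize jobs for one group or SOL.

    `jobs` is a list of dicts with hypothetical keys:
    'start' and 'end' (datetime or None) and 'failed' (bool).
    """
    total = len(jobs)
    # A job counts as completed once it has an end time.
    completed = [j for j in jobs if j["end"] is not None]

    if completed:
        first_start = min(j["start"] for j in completed)
        last_end = max(j["end"] for j in completed)
        # Time Taken = 'Last job end time' - 'First job start time', in seconds.
        time_taken = (last_end - first_start).total_seconds()
    else:
        time_taken = None

    failed = sum(1 for j in completed if j["failed"])
    # Assumption: percentage is relative to all jobs in the group.
    failed_pct = 100.0 * failed / total if total else 0.0

    return {
        "total_jobs": total,
        "completed": len(completed),
        "time_taken_seconds": time_taken,
        "failed_percent": failed_pct,
    }
```

Note that Time Taken is the wall-clock span from the first start to the last end, so jobs that overlap in time are not counted twice.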

A job process runs on a server. Log files are created per server. Log file names are unique across the servers.

If the SOL name is long, only the first few characters are displayed. Hover over the SOL name to see the full name.

You can filter the data by server name. Select the server name in Filter by SOL box.

You can filter the data by group name. Select the group name in the Filter by Group box. One group can be part of multiple SOLs.

You can download the group or server summary as a CSV file. Select the icon, and then select Download as CSV file.

Viewing Job Details

Select Job Details to view details of all the jobs configured.

Select see more in the Job Summary screen to view all the jobs belonging to that server. Select the page numbers to navigate between pages.

Select the SOL name in the Job Summary screen to view all the jobs belonging to that server. Select the page numbers to navigate between pages. The SOL name appears in the SOL Id box.

Select a group name in the Job Summary screen to view all the jobs in that group.

Select a status bar in the Job Summary screen to view all the jobs with that status. For example, selecting the Success status bar displays the jobs that completed successfully.

Similarly, selecting the Completed status bar displays the jobs that have completed their run.

You can search for a group or a server by its full or partial name. Details of all the jobs that are part of the matching group or server are displayed.
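The search behavior described here amounts to a substring match on group and server names. A minimal sketch, assuming a hypothetical `search_jobs` function, job records with `group` and `server` keys, and case-insensitive matching (the text does not say whether the search is case-sensitive):

```python
def search_jobs(jobs, query):
    """Return jobs whose group or server name contains `query`.

    `jobs` is a list of dicts with hypothetical keys 'group' and 'server'.
    Matching is case-insensitive, which is an assumption.
    """
    needle = query.strip().lower()
    if not needle:
        # An empty query matches everything.
        return list(jobs)
    return [
        job for job in jobs
        if needle in job["group"].lower() or needle in job["server"].lower()
    ]
```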

Viewing Process Details

Select a job id in the Job Details screen to view the process details associated with the job. A process can have arguments. The Process Details section displays the process name, the arguments for the process, and the number of servers on which the process executes. Select a process to view its details and the paths of the servers.

If there are no processes associated with a job, the following screen is displayed when you select the job id.

Viewing Historical Data

Select Historical Data to view the jobs for past dates. If you manually select the present date, the following screen is displayed.

If you select a run on date other than the present date in the Job Summary screen and then select Historical Data, the date remains the same. Otherwise, the historical data displayed is for yesterday.
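The date carry-over rule can be expressed as a small function. This is a sketch of the rule as stated, not HEAL's code; `historical_default_date` is a hypothetical name, and it models only the date shown when Historical Data is first opened, not a date picked manually afterwards.

```python
from datetime import date, timedelta


def historical_default_date(run_on_date: date, today: date) -> date:
    """Date shown when Historical Data is opened from Job Summary."""
    if run_on_date != today:
        # A run on date other than the present date is carried over unchanged.
        return run_on_date
    # Otherwise the historical view falls back to yesterday.
    return today - timedelta(days=1)
```

For example, with a present date of 15 March 2024, a run on date of 10 March 2024 is kept, while a run on date equal to the present date yields 14 March 2024.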

There are two views in which you can see the historical data for jobs – Table View and Tile View.

In Tile view, there is a widget per group defined in the system.

Select 1 day to view historical data for the last 1 day relative to the run on date you select.

Hover over a SOL Id to view the full name of the SOL.

Select see more to go to the Job Details screen for the respective group.

You can download the group or server summary as a CSV file. Select the icon, and then select Download as CSV file.

Select 30 days to view historical data for the past 30 days relative to the run on date you select.

Hover over a data point to view the following details for a particular date.

Total Jobs – Total number of jobs on a particular day.

Failed Jobs – Total number of failed jobs on a particular day.

Jobs Processed – Total number of processed jobs on a particular day.

Total Time Taken – Time taken in minutes to process the jobs.

Batch Job Alerts

HEAL raises batch job alerts when:

A batch job does not start at the expected start time (delay in starting a batch job).

A batch job takes more time than expected to complete (delay in completing a batch job). This can happen because the job started late, ran slowly, or both.

A batch job deviates from its normal behavior.
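The first two alert conditions are simple time comparisons and can be sketched as below. The `batch_job_alerts` function, its parameters, the alert labels, and the optional `grace` tolerance are all hypothetical; the third condition (deviation from normal behavior) relies on HEAL's ML-based insights and is not modeled here.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def batch_job_alerts(
    expected_start: datetime,
    expected_duration: timedelta,
    now: datetime,
    actual_start: Optional[datetime] = None,
    actual_end: Optional[datetime] = None,
    grace: timedelta = timedelta(0),  # assumed tolerance before alerting
) -> List[str]:
    """Return the delay alerts that apply to one batch job run."""
    alerts = []
    expected_end = expected_start + expected_duration

    # Start delay: the job started late, or has not started and is overdue.
    start_ref = actual_start if actual_start is not None else now
    if start_ref > expected_start + grace:
        alerts.append("start_delay")

    # Completion delay: the job ended late, or is unfinished past its
    # expected end. A late start, a slow run, or both can cause this.
    end_ref = actual_end if actual_end is not None else now
    if end_ref > expected_end + grace:
        alerts.append("completion_delay")

    return alerts
```

Measuring the completion delay against the expected end time, rather than against the run duration alone, reflects the statement that a late start by itself can cause a completion delay.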
