In the world of digital banking, maintaining a seamless and efficient online experience is paramount. However, even the most robust systems can encounter issues that disrupt service and degrade performance.
Let us look at a recent incident that affected the eBanking services of one of our customers, highlighting how critical database management is and the steps taken to resolve the issue.
Issue Overview:
Our monitoring systems flagged an unusual surge in database activity, marked by high physical and logical reads, even though the incoming workload was normal. The surge began to affect the performance of eBanking services. It was first detected in the evening and triggered alerts across multiple metrics, including CPU utilization and disk I/O operations on the database servers.
Unraveling the Cause:
Initial analysis pointed to a SQL query that was running without a WHERE clause. With no filter to narrow the result, the database performed a full table scan, reading every row of the table to retrieve the required data. Such operations are resource-intensive and can significantly degrade database performance, as was evident in the elevated physical reads and the resulting strain on system resources.
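The difference between a full table scan and a filtered read can be seen in a query plan. The following is a minimal sketch using Python's built-in sqlite3 module, not the bank's actual database or query: the transactions table, its columns, and the idx_tx_account index are hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical schema and data, used only to illustrate the effect.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "id INTEGER PRIMARY KEY, account_id INTEGER, amount REAL)"
)
conn.execute("CREATE INDEX idx_tx_account ON transactions (account_id)")
conn.executemany(
    "INSERT INTO transactions (account_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)


def plan(sql, params=()):
    """Return the plan descriptions SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


# Aggregate with no WHERE clause: every row has to be visited.
unfiltered_plan = plan("SELECT SUM(amount) FROM transactions")

# Same aggregate restricted to one account: the index narrows the read.
filtered_plan = plan(
    "SELECT SUM(amount) FROM transactions WHERE account_id = ?", (42,)
)

print(unfiltered_plan)
print(filtered_plan)
```

In SQLite, the first plan begins with SCAN and the second with SEARCH. Other database engines word their plans differently, but they draw the same distinction between reading the whole table and reading only the matching rows.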
Observations and Impact:
The timeline of events painted a concerning picture:
- A drastic increase in physical and logical reads, indicating that the database was processing more data than usual.
- A spike in CPU utilization on the database server, with usage rising to 74%, a sign that the server was struggling to manage the load.
- An increase in Disk I/O operations, suggesting that the server was performing more read and write operations to the disk due to the inefficient query.
These symptoms collectively pointed towards an underlying inefficiency in the database operation, which needed immediate attention to prevent further degradation of service quality.
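A rough sketch of how several simultaneous metric deviations might be correlated into one alert is given here. It is not a description of the monitoring product's actual logic. The baseline samples, the metric names, and the three-standard-deviation threshold are all assumptions made for illustration; only the 74% CPU reading comes from the incident.

```python
from statistics import mean, stdev


def is_anomalous(baseline, current, threshold=3.0):
    """Flag a reading more than `threshold` standard deviations above the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return current > mu
    return (current - mu) / sigma > threshold


# Hypothetical baseline samples for a normal evening workload.
baselines = {
    "cpu_percent": [31, 35, 33, 36, 34, 32],
    "physical_reads_per_s": [1200, 1350, 1280, 1310, 1260, 1330],
    "disk_io_ops_per_s": [400, 420, 410, 430, 415, 405],
}

# Readings during the incident; only the CPU figure is taken from the report.
current = {
    "cpu_percent": 74,
    "physical_reads_per_s": 9800,
    "disk_io_ops_per_s": 2100,
}

flagged = sorted(
    name for name, value in current.items()
    if is_anomalous(baselines[name], value)
)

# Treat two or more simultaneous anomalies as one correlated incident.
correlated_incident = len(flagged) >= 2
print(flagged, correlated_incident)
```

Grouping the alerts this way points the investigation at a single cause, here the database workload, rather than at three separate problems.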
Addressing the Issue:
Despite the impact, no immediate corrective action was taken. Instead, a comprehensive analysis was conducted to identify the root cause and develop a long-term solution. The analysis determined that a query performing an aggregate function without a WHERE clause was causing the unnecessary full table scans.
Implementing a Solution:
To address this inefficiency, a patch was developed and released to the bank. The update replaced the problematic query with a more efficient database sequence. The change was aimed at optimizing query execution by introducing a filter mechanism that removes the need for full table scans, thereby reducing physical reads and easing the strain on the database servers.
The incident served as a stark reminder of the importance of efficient query design and of how much damage an overlooked detail in a SQL query can do. It underscores the need for continuous monitoring, timely analysis, and proactive maintenance to ensure the resilience and performance of eBanking services. Through swift identification and resolution of the issue, we restored services to their optimal performance, reaffirming our commitment to providing a seamless banking experience to our customers.
Lessons Learned:
- The critical role of efficient SQL query design in database performance.
- The importance of continuous monitoring and alert systems in identifying potential issues.
- The need for a swift and effective response mechanism to address performance issues and rectify their root cause.
This incident has provided valuable insights into database management practices, prompting a review and reinforcement of systems to prevent similar occurrences in the future.
About HEAL Software
HEAL Software is a renowned provider of AIOps (Artificial Intelligence for IT Operations) solutions. HEAL Software’s dedication to leveraging AI and automation empowers IT teams to address IT challenges, enhance incident management, reduce downtime, and ensure seamless IT operations. Through the analysis of extensive data, our solutions provide real-time insights, predictive analytics, and automated remediation, thereby enabling proactive monitoring and solution recommendation. Other features include anomaly detection, capacity forecasting, root cause analysis, and event correlation. With its state-of-the-art AIOps solutions, HEAL Software consistently drives digital transformation and delivers significant value to businesses across diverse industries.