{"id":37586,"date":"2025-07-31T19:36:08","date_gmt":"2025-07-31T19:36:08","guid":{"rendered":"https:\/\/www.newsbeep.com\/ca\/37586\/"},"modified":"2025-07-31T19:36:08","modified_gmt":"2025-07-31T19:36:08","slug":"uncomplicate-cloud-management-with-ai-led-observability","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/ca\/37586\/","title":{"rendered":"Uncomplicate Cloud Management with AI-Led Observability"},"content":{"rendered":"<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\">By Vijay Verma, Persistent Systems<\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\">Enterprises pivot to the cloud with various goals, such as cost reduction, competitive agility, or scaling AI, among others. But the likelihood of these expectations falling flat remains high because enterprises are yet to align their cloud operating model with business goals.<\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\">For instance, they source monitoring tools to flag drifts in application behavior, performance, and responsiveness based on predetermined thresholds, like they would in an on-premises setup. However, cloud environments are dynamic, especially hybrid deployments, with containers and <a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_self\" href=\"https:\/\/www.itprotoday.com\/software-development\/what-are-microservices-\" rel=\"nofollow noopener\">microservices<\/a> that rapidly assemble and disassemble.<\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\">Static thresholds create high-dimensional data that only adds to the complexity. 
Without risk-based prioritization, this data deluge creates numerous alerts that lead to <a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/www.techtarget.com\/whatis\/definition\/alert-fatigue\" rel=\"nofollow noopener\">alert fatigue<\/a>, false positives\/negatives, and noise. A <a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/www.bcg.com\/publications\/2021\/an-effective-it-monitoring-strategy-is-now-a-business-imperative\" rel=\"nofollow noopener\">BCG<\/a> survey revealed that more than 14% of organizations deploy 50 or more monitoring tools. More than 70% of the data they collect is unnecessary, leading to:<\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\">Inflated costs: The global market for commercial IT monitoring grew at a compound annual rate of 11.2% between 2018 and 2020, while the total IT market grew by less than 3%.<\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\">Risks of downtime: Failing to predict issues or remediate them promptly increases the likelihood of <a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_self\" href=\"https:\/\/www.itprotoday.com\/it-operations\/how-to-cut-the-hidden-costs-of-it-downtime\" rel=\"nofollow noopener\">downtime<\/a>. 
Global 2000 companies lose an estimated <a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/www.splunk.com\/en_us\/campaigns\/the-hidden-costs-of-downtime.html\" rel=\"nofollow noopener\">$400 billion annually<\/a> to system failure or crawl, compounded by the loss of customer trust and reputational damage.<\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\">Complexity loops: With numerous tools, teams, and dynamic environments, 74% of CIOs state that the growing complexity makes efficient management extremely challenging.<\/p>\n<p data-component=\"related-article\" class=\"RelatedArticle\">Related:<a class=\"RelatedArticle-RelatedContent\" href=\"https:\/\/www.itprotoday.com\/hybrid-cloud\/the-great-cloud-reversal-why-it-teams-are-moving-back-to-dedicated-infrastructure\" target=\"_self\" data-discover=\"true\" rel=\"nofollow noopener\">The Great Cloud Reversal: Why IT Teams Are Moving Back to Dedicated Infrastructure<\/a><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\">Proactively identifying and remediating operational issues requires AI-infused observability systems driven by machine learning (ML), natural language processing (NLP), and generative AI (GenAI). With foundational visibility, these AI-embedded systems can help triage issues, improve performance, reduce expenses, and sustainably achieve cloud objectives.<\/p>\n<p>The Missing Part: Data Analysis &amp; Actionable Insights<\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\">As cloud monitoring evolves into observability, the health, performance, and behavior of applications, systems, and infrastructure are gauged through telemetry data, such as events, logs, and traces that are correlated with timestamps, resource IDs, and error codes. 
Observability platforms help IT teams with holistically represented signals from multiple cloud environments that can map how different elements mutually interact to influence performance, security, and management.<\/p>\n<p data-component=\"related-article\" class=\"RelatedArticle\">Related:<a class=\"RelatedArticle-RelatedContent\" href=\"https:\/\/www.itprotoday.com\/it-management\/edge-computing-trends-adoption-challenges-and-future-outlook\" target=\"_self\" data-discover=\"true\" rel=\"nofollow noopener\">Edge Computing Trends: Adoption, Challenges, and Future Outlook<\/a><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\">Although observability platforms consolidate signals, they do not provide a path to remediation or insights that predict failures or detect anomalies, leaving cloud monitoring still reactive. It still requires a human to analyze these signals. These systems inherently cannot map how changes made in one part of the cloud estate impact the larger ecosystem, or how, for instance, a utilization strategy can help meet financial and sustainability goals.<\/p>\n<p>Fixing the &#8216;Observability Gap&#8217; with AI<\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\">The classic dilemma with having a lot of data is knowing what to do with it. While observability consolidates telemetry in a single-pane view to remove data silos and point toward a real-time trend, it still does not answer how to pre-empt or triage risks. This know-how exists as tribal knowledge or is primarily dependent on a trial-and-error method, creating bottlenecks to timely resolution. 
An <a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/uk.newsroom.ibm.com\/IBM-Report-Soaring-Data-Breach-Disruption-Drive-Costs-to-Record-Levels\" rel=\"nofollow noopener\">IBM study<\/a> found that more than half of organizations globally had severe or high-level staffing shortages and experienced an average of $1.76 million in higher security breach costs as a result.<\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\">Solutions exist that help clients simplify cloud observability with conversational AI that interacts with observability data, eliminating the need to sift manually through alerts and dashboards. The underlying AI correlates all four signal types (metrics, events, logs, and traces) and leverages <a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_self\" href=\"https:\/\/www.itprotoday.com\/it-operations\/when-to-use-opentelemetry-and-ebpf-for-modern-observability\" rel=\"nofollow noopener\">OpenTelemetry<\/a> to create a unified view of signals across cloud environments.<\/p>\n<p data-component=\"related-article\" class=\"RelatedArticle\">Related:<a class=\"RelatedArticle-RelatedContent\" href=\"https:\/\/www.itprotoday.com\/cloud-computing\/cloud-computing-vs-cloud-networking-related-but-distinct-it-domains\" target=\"_self\" data-discover=\"true\" rel=\"nofollow noopener\">Cloud Computing vs. Cloud Networking: Related but Distinct IT Domains<\/a><\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\">Embedding AI into cloud observability can help:<\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\">Identify issues with weak drifts: ML models learn from baseline behavior (e.g., daily traffic patterns, resource consumption, number of inference requests) to flag minor deviations that point to a brewing issue. 
They analyze historical data as well as real-time telemetry to pick up anomalies, even with weak signals, triggering mitigation plans to pre-empt issues such as latency spikes, scalability bottlenecks, security breaches, or cost overruns. AI helps reduce false alerts, i.e., signals that point to a non-issue or signals that fail to point to a critical issue, with up to 98% accuracy, and can potentially <a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/venturebeat.com\/security\/crowdstrikes-ai-slashes-soc-workloads-over-40-hours-a-week\/\" rel=\"nofollow noopener\">save 40 hours of manual triage per week<\/a>.<\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\">Significantly improve resolution times: By cross-referencing a large volume of telemetry data (logs, traces, metrics) to identify hidden dependencies, AI can sort alerts by severity, allowing teams to cut noise and focus on the ones that point to a real incident. Organizations that deploy AI into observability are <a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_blank\" href=\"https:\/\/www.splunk.com\/en_us\/blog\/devops\/state-of-observability-2024-reveals-how-leaders-outpace-their-peers.html\" rel=\"nofollow noopener\">2.3 times more<\/a> likely than non-AI observability users to detect and mitigate issues within minutes or hours, rather than weeks or even months. 
AI can help teams collaborate more effectively by translating technical issues into actionable steps and can also help prevent similar issues from recurring.<\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\">Tap into competitive agility: AI observability systems provide a deeper overlay of the infrastructure, application, and data layers, using time-series forecasting to identify issues such as gradual resource leaks and spikes in latency, and proactively predicting failure before critical thresholds are breached. AI can automatically trigger corrective measures or alert the relevant teams to make modifications before an issue causes downtime. This frees up developer bandwidth to focus on the core job instead of troubleshooting incidents; organizations with mature AI observability generate 2.6x more code on demand than their non-AI counterparts.<\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\">Improve scalability and flexibility: AI can map service dependencies to identify cascading failures across distributed systems. Traditional monitoring setups often fail to detect these issues because they evaluate components only in isolation. 
By handling petabytes of data across distributed systems, AI can analyze the dynamic usage patterns of containers, <a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link\" target=\"_self\" href=\"https:\/\/www.itprotoday.com\/cloud-computing\/kubernetes-1-32-penelope-introduces-key-innovations-for-open-source-cloud-deployment\" rel=\"nofollow noopener\">Kubernetes<\/a>, and serverless architectures in the cloud, adapting to changing contexts and providing real-time insights into how components behave, which supports scalability and flexibility.<\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\">Create context-aware alerting: AI can help prioritize alerts based on business impact (e.g., payment gateway errors over non-critical service delays) and suppress low-priority alerts during peak business hours, ensuring the team&#8217;s resources and bandwidth are dedicated to handling business-critical exceptions.<\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\">Ensure continuous learning: Models retrain with new data, adapting to infrastructure changes (e.g., dynamically scaling Kubernetes clusters), leading to self-healing, self-learning cloud operations that are proactive and fault-resistant.<\/p>\n<p>Way Forward for AI-Shy Enterprises<\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\">AI-based observability is still evolving and has not yet achieved full maturity. Despite the benefits, organizations express concerns about safeguarding privacy and sensitive information when implementing AI for observability. 
CIOs acknowledge that AI-enhanced observability is still in its developmental phase, and that comprehending how AI models process data and derive insights while ensuring security can be challenging.<\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\">To address these challenges, organizations can implement safeguards, even if they involve trade-offs. For instance, issues related to governance, security, and compliance can be mitigated through techniques such as data redaction and private hosting to restrict the dissemination of sensitive data. However, achieving complete control over the workings of the models remains an ongoing endeavor. On the reliability front, continuous training and learning have improved the accuracy of root cause predictions and the accompanying analysis.<\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\">Enabling AI to take autonomous actions based on the proactive identification of anomalies or metric drifts can create self-healing, cost-optimized, and performance-tuned cloud environments, significantly reducing the need for manual intervention and allowing teams to focus on strategic initiatives.<\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\">About the author:<\/p>\n<p class=\"ContentParagraph ContentParagraph_align_left\" data-testid=\"content-paragraph\">Vijay Verma is Chief Revenue Officer \u2013 Service Lines at <a class=\"ContentText-BodyTextChunk ContentText-BodyTextChunk_link ContentText-BodyTextChunk_italic\" target=\"_blank\" href=\"https:\/\/www.persistent.com\/\" rel=\"nofollow noopener\">Persistent Systems<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"By Vijay Verma, Persistent Systems Enterprises pivot to the cloud with various goals, such as cost reduction, 
competitive&hellip;\n","protected":false},"author":2,"featured_media":37587,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[21],"tags":[49,48,285,61],"class_list":{"0":"post-37586","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-computing","8":"tag-ca","9":"tag-canada","10":"tag-computing","11":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/posts\/37586","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/comments?post=37586"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/posts\/37586\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/media\/37587"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/media?parent=37586"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/categories?post=37586"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/ca\/wp-json\/wp\/v2\/tags?post=37586"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}