Survey on Models and Techniques for Root-Cause Analysis

01/30/2017
by   Marc Solé, et al.

Automation and computer intelligence to support complex human decisions become essential to manage large and distributed systems in the Cloud and IoT era. Understanding the root cause of an observed symptom in a complex system has been a major problem for decades. As industry dives into the IoT world and the amount of data generated per year grows at an amazing speed, an important question is how to find appropriate mechanisms for determining root causes that can handle huge amounts of data or provide valuable feedback in real time. While many survey papers aim to summarize the landscape of techniques for modelling system behavior and inferring the root cause of a problem based on the resulting models, none of them focuses on analyzing how the different techniques in the literature fit growing requirements in terms of performance and scalability. In this survey, we provide a review of root-cause analysis, focusing on these particular aspects. We also provide guidance for choosing the best root-cause analysis strategy depending on the requirements of a particular system and application.

research
05/26/2021

Anomaly Detection and Failure Root Cause Analysis in (Micro)Service-Based Cloud Applications: A Survey

The momentum gained by microservices and cloud-native software architect...
research
01/10/2018

BigRoots: An Effective Approach for Root-cause Analysis of Stragglers in Big Data System

Stragglers are commonly believed to have a great impact on the performan...
research
12/12/2021

Sage: Leveraging ML to Diagnose Unpredictable Performance in Cloud Microservices

Cloud applications are increasingly shifting from large monolithic servi...
research
06/11/2019

ROOT I/O compression algorithms and their performance impact within Run 3

The LHC's Run3 will push the envelope on data-intensive workflows and, si...
research
09/11/2023

PACE-LM: Prompting and Augmentation for Calibrated Confidence Estimation with GPT-4 in Cloud Incident Root Cause Analysis

In recent years, the transition to cloud-based platforms in the IT secto...
research
04/21/2022

Mining Root Cause Knowledge from Cloud Service Incident Investigations for AIOps

Root Cause Analysis (RCA) of any service-disrupting incident is one of t...
research
08/13/2018

Simple Root Cause Analysis by Separable Likelihoods

Root Cause Analysis for Anomalies is challenging because of the trade-of...
