Generic and Robust Root Cause Localization for Multi-Dimensional Data in Online Service Systems

05/05/2023
by Zeyan Li, et al.

Localizing root causes for multi-dimensional data is critical to ensuring the reliability of online service systems. When a fault occurs, only the measure values within specific attribute combinations are abnormal. Such attribute combinations are substantial clues to the underlying root causes and are thus called root causes of multi-dimensional data. This paper proposes PSqueeze, a generic and robust root cause localization approach for multi-dimensional data. We propose a generic property of root causes for multi-dimensional data, the generalized ripple effect (GRE). Based on it, we propose a novel probabilistic clustering method and a robust heuristic search method. Moreover, we identify the importance of determining external root causes and, for the first time in the literature, propose an effective method for doing so. Our experiments on two real-world datasets with 5400 faults show that PSqueeze outperforms the baselines by 32.89% in F1-score, and its F1-score in determining external root causes reaches 0.90. Furthermore, case studies in several production systems demonstrate that PSqueeze is helpful for fault diagnosis in the real world.
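The abstract does not define GRE precisely, so the sketch below only illustrates the general intuition behind a "ripple effect": when an attribute combination is the root cause, its descendant leaves deviate from their forecasts in a similar way, while unaffected leaves do not. Everything here is an assumption for illustration: the two attributes (province, ISP), the made-up values in `LEAVES`, the deviation score 2(f − v)/(f + v), the thresholds, and the helpers `deviation_score`, `candidate_combos`, and `naive_localize`. The search is a brute-force stand-in, not PSqueeze's probabilistic clustering or heuristic search.

```python
from itertools import combinations

# Hypothetical leaf data: (province, isp) -> (real value v, forecast value f).
# All numbers are made up for illustration.
LEAVES = {
    ("Beijing", "ISP-A"): (50.0, 100.0),
    ("Beijing", "ISP-B"): (100.0, 200.0),
    ("Shanghai", "ISP-A"): (150.0, 150.0),
    ("Shanghai", "ISP-B"): (80.0, 80.0),
}


def deviation_score(v, f):
    """Assumed deviation score 2(f - v) / (f + v); 0 means the leaf matches its forecast."""
    return 0.0 if f + v == 0 else 2.0 * (f - v) / (f + v)


def matches(item, combo):
    """True if item falls under combo; None in combo acts as the wildcard '*'."""
    return all(c is None or c == x for c, x in zip(combo, item))


def candidate_combos(leaves):
    """Enumerate every attribute combination (at least one fixed value) covering some leaf."""
    combos = set()
    for leaf in leaves:
        for k in range(1, len(leaf) + 1):
            for idx in combinations(range(len(leaf)), k):
                combos.add(tuple(leaf[i] if i in idx else None for i in range(len(leaf))))
    return combos


def naive_localize(leaves, abnormal=0.2, tol=0.1):
    """Brute-force stand-in: keep combinations whose descendants are all abnormal
    and deviate alike, then report only the most general ones."""
    scores = {leaf: deviation_score(v, f) for leaf, (v, f) in leaves.items()}
    candidates = []
    for combo in candidate_combos(leaves):
        ds = [scores[leaf] for leaf in leaves if matches(leaf, combo)]
        # Ripple-effect-style check: every descendant is abnormal with similar scores.
        if all(abs(d) >= abnormal for d in ds) and max(ds) - min(ds) <= tol:
            candidates.append(combo)
    # Prefer combinations with the fewest fixed attribute values.
    candidates.sort(key=lambda c: (sum(x is not None for x in c), str(c)))
    picked = []
    for combo in candidates:
        # Skip combinations already covered by a more general pick.
        if not any(matches(combo, p) for p in picked):
            picked.append(combo)
    return picked


print(naive_localize(LEAVES))  # [('Beijing', None)]
```

In this toy data both Beijing leaves drop to half their forecasts (deviation score 2/3 each) while the Shanghai leaves are normal, so the sketch reports the combination Province=Beijing, ISP=* rather than the two more specific leaves. Exhaustive enumeration like this grows combinatorially with the number of attributes, which is one motivation for the heuristic search the abstract mentions.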

research
05/20/2022

RiskLoc: Localization of Multi-dimensional Root Causes by Weighted Risk

Failures and anomalies in large-scale software systems are unavoidable i...
research
01/31/2023

BALANCE: Bayesian Linear Attribution for Root Cause Localization

Root Cause Analysis (RCA) plays an indispensable role in distributed dat...
research
03/30/2022

CMMD: Cross-Metric Multi-Dimensional Root Cause Analysis

In large-scale online services, crucial metrics, a.k.a., key performance...
research
04/07/2020

DiagNet: towards a generic, Internet-scale root cause analysis solution

Diagnosing problems in Internet-scale services remains particularly diff...
research
08/08/2022

Constructing Large-Scale Real-World Benchmark Datasets for AIOps

Recently, AIOps (Artificial Intelligence for IT Operations) has been wel...
research
09/13/2023

Finding Morton-Like Layouts for Multi-Dimensional Arrays Using Evolutionary Algorithms

The layout of multi-dimensional data can have a significant impact on th...
research
05/13/2022

Automatic Root Cause Quantification for Missing Edges in JavaScript Call Graphs (Extended Version)

Building sound and precise static call graphs for real-world JavaScript ...
