FIRED: a fine-grained robust performance diagnosis framework for cloud applications

09/05/2022
by   Ruyue Xin, et al.
0

To run a cloud application with the required service quality, operators have to continuously monitor the cloud application's run-time status, detect potential performance anomalies, and diagnose the root causes of anomalies. However, existing models of performance anomaly detection often suffer from low re-usability and robustness due to the diversity of system-level metrics being monitored and the lack of high-quality labeled monitoring data for anomalies. Moreover, the current coarse-grained analysis models make it difficult to locate system-level root causes of the application performance anomalies for effective adaptation decisions. We provide a FIne-grained Robust pErformance Diagnosis (FIRED) framework to tackle those challenges. The framework offers an ensemble of several well-selected base models for anomaly detection using a deep neural network, which adopts weakly-supervised learning considering fewer labels exist in reality. The framework also employs a real-time fine-grained analysis model to locate dependent system metrics of the anomaly. Our experiments show that the framework can achieve the best detection accuracy and algorithm robustness, and it can predict anomalies in four minutes with F1 score higher than 0.8. In addition, the framework can accurately localize the first root causes, and with an average accuracy higher than 0.7 of locating first four root causes.

READ FULL TEXT

page 9

page 16

research
09/06/2022

CausalRCA: Causal Inference based Precise Fine-grained Root Cause Localization for Microservice Applications

For microservice applications with detected performance anomalies, local...
research
08/20/2023

Demystifying the Performance of Data Transfers in High-Performance Research Networks

High-speed research networks are built to meet the ever-increasing needs...
research
07/20/2023

Performance Issue Identification in Cloud Systems with Relational-Temporal Anomaly Detection

Performance issues permeate large-scale cloud service systems, which can...
research
05/26/2021

Anomaly Detection and Failure Root Cause Analysis in (Micro)Service-Based Cloud Applications: A Survey

The momentum gained by microservices and cloud-native software architect...
research
11/24/2021

Efficient Anomaly Detection Using Self-Supervised Multi-Cue Tasks

Deep anomaly detection has proven to be an efficient and robust approach...
research
06/17/2019

Slicing the IO execution with ReLayTracer

Analyzing IO performance anomalies is a crucial task in various computin...
research
09/26/2019

RADE: Resource-Efficient Supervised Anomaly Detection Using Decision Tree-Based Ensemble Methods

Decision-tree-based ensemble classification methods (DTEMs) are a preval...

Please sign up or login with your details

Forgot password? Click here to reset