Adaptive Performance Anomaly Detection for Online Service Systems via Pattern Sketching

01/09/2022
by Zhuangbin Chen, et al.

To ensure the performance of online service systems, their status is closely monitored with various software and system metrics. Performance anomalies are performance degradation issues (e.g., slow responses) in these service systems. Existing methods for anomaly detection over the metrics often lack interpretability, which is vital for engineers and analysts to take remediation actions. Moreover, they cannot effectively accommodate ever-changing services in an online fashion. To address these limitations, in this paper we propose ADSketch, an interpretable and adaptive performance anomaly detection approach based on pattern sketching. ADSketch achieves interpretability by identifying groups of anomalous metric patterns, each representing a particular type of performance issue, so that the underlying issue can be recognized immediately when a similar pattern emerges again. In addition, an adaptive learning algorithm is designed to embrace unprecedented patterns induced by service updates or user behavior changes. The proposed approach is evaluated with public data as well as industrial data collected from a representative online service system in Huawei Cloud. The experimental results show that ADSketch outperforms state-of-the-art approaches by a significant margin and demonstrate the effectiveness of the online algorithm in discovering new patterns. Furthermore, our approach has been successfully deployed in industrial practice.
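The abstract does not spell out ADSketch's algorithm, so the following Python sketch only illustrates the two ideas it names: matching new metric subsequences against stored, labeled patterns (so a recurring issue is recognized by name) and registering previously unseen shapes as new patterns (adaptivity). It is not the authors' method. Z-normalized Euclidean distance, the nearest-centroid rule, the `new_pattern_threshold` cutoff, and every name here (`PatternSketch`, `add_pattern`, `classify`, `scan`, the labels) are hypothetical choices made for illustration.

```python
import numpy as np


def znorm(x):
    """Z-normalize a subsequence so patterns are compared by shape, not level or scale."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    if std < 1e-8:  # flat segment: carries no shape information
        return np.zeros_like(x)
    return (x - x.mean()) / std


class PatternSketch:
    """Hypothetical pattern store: one z-normalized centroid per labeled metric pattern.

    Labels such as "normal" or "spike-anomaly" stand in for the groups of
    anomalous patterns described in the abstract; the names are illustrative.
    """

    def __init__(self, window, new_pattern_threshold):
        self.window = window
        self.threshold = new_pattern_threshold  # assumed Euclidean-distance cutoff
        self.patterns = []  # list of (centroid, label) pairs

    def add_pattern(self, subseq, label):
        if len(subseq) != self.window:
            raise ValueError("subsequence length must equal the window size")
        self.patterns.append((znorm(subseq), label))

    def nearest(self, subseq):
        """Return (label, distance) of the closest stored pattern."""
        z = znorm(subseq)
        dists = [float(np.linalg.norm(z - centroid)) for centroid, _ in self.patterns]
        i = int(np.argmin(dists))
        return self.patterns[i][1], dists[i]

    def classify(self, subseq):
        """Match a subsequence to a known pattern, or register it as a new one.

        The fallback branch is a simple stand-in for adaptive learning: a shape
        farther than `threshold` from every stored pattern becomes a new group
        that an engineer could inspect and label later.
        """
        if self.patterns:
            label, dist = self.nearest(subseq)
            if dist <= self.threshold:
                return label
        new_label = f"new-{len(self.patterns)}"
        self.add_pattern(subseq, new_label)
        return new_label

    def scan(self, series):
        """Classify every sliding window of a metric series."""
        series = np.asarray(series, dtype=float)
        return [self.classify(series[i:i + self.window])
                for i in range(len(series) - self.window + 1)]


sketch = PatternSketch(window=8, new_pattern_threshold=2.0)
sketch.add_pattern(np.sin(np.linspace(0, 2 * np.pi, 8)), "normal")
sketch.add_pattern([0, 0, 0, 0, 5, 0, 0, 0], "spike-anomaly")

print(sketch.classify([0, 0, 0, 0, 9, 0, 0, 0]))  # spike-anomaly (same shape, larger scale)
print(sketch.classify(3 * np.sin(np.linspace(0, 2 * np.pi, 8)) + 10))  # normal
print(sketch.classify([0, 0, 0, 0, 1, 1, 1, 1]))  # new-2 (level shift not seen before)
print(sketch.classify([0, 0, 0, 0, 1, 1, 1, 1]))  # new-2 (now recognized)
```

Z-normalization makes the match depend on shape alone, which is why a taller spike still maps to the stored "spike-anomaly" group; a real system would also need a way to decide whether a newly registered pattern is normal or anomalous, which this sketch leaves to a human.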

08/19/2023

Practical Anomaly Detection over Multivariate Monitoring Metrics for Online Services

As modern software systems continue to grow in terms of complexity and v...
07/20/2023

Performance Issue Identification in Cloud Systems with Relational-Temporal Anomaly Detection

Performance issues permeate large-scale cloud service systems, which can...
12/13/2021

Challenges and Solutions to Build a Data Pipeline to Identify Anomalies in Enterprise System Performance

We discuss how VMware is solving the following challenges to harness dat...
01/05/2021

Label Augmentation via Time-based Knowledge Distillation for Financial Anomaly Detection

Detecting anomalies has become increasingly critical to the financial se...
10/21/2021

DeLag: Detecting Latency Degradation Patterns in Service-based Systems

Performance debugging in production is a fundamental activity in modern ...
02/26/2022

Regional-Local Adversarially Learned One-Class Classifier Anomalous Sound Detection in Global Long-Term Space

Anomalous sound detection (ASD) is one of the most significant tasks of ...
08/27/2021

Graph-based Incident Aggregation for Large-Scale Online Service Systems

As online service systems continue to grow in terms of complexity and vo...