Practical Anomaly Detection over Multivariate Monitoring Metrics for Online Services

08/19/2023
by Jinyang Liu, et al.

As modern software systems grow in complexity and volume, anomaly detection on multivariate monitoring metrics, which profile systems' health status, becomes increasingly critical and challenging. In particular, the dependencies between different metrics and their historical patterns play a critical role in prompt and accurate anomaly detection. Existing approaches fall short of industrial needs because they cannot capture such information efficiently. To fill this gap, we propose CMAnomaly, an anomaly detection framework for multivariate monitoring metrics based on a collaborative machine. The proposed collaborative machine is a mechanism that captures pairwise interactions along both the feature and temporal dimensions with linear time complexity. Cost-effective models can then be employed to leverage both the dependencies between monitoring metrics and their historical patterns for anomaly detection. The proposed framework is extensively evaluated on both public data and industrial data collected from a large-scale online service system of Huawei Cloud. The experimental results demonstrate that CMAnomaly achieves an average F1 score of 0.9494, outperforming state-of-the-art baseline models by 6.77% to 10.68%. We also share our experience of deploying CMAnomaly in Huawei Cloud.
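The abstract does not spell out how pairwise interactions can be captured in linear time. One well-known route, used by factorization machines, is an algebraic identity that replaces the double sum over metric pairs with a single pass over the metrics. The sketch below illustrates that identity only; it is an assumption that the collaborative machine rests on a similar idea, and the names `pairwise_naive`, `pairwise_linear`, `x` (one time step of metric values), and `V` (a latent vector per metric) are hypothetical and not taken from the paper.

```python
import random


def pairwise_naive(x, V):
    """Quadratic cost: sum over all pairs i < j of <V[i], V[j]> * x[i] * x[j]."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total


def pairwise_linear(x, V):
    """Linear cost in the number of metrics, using the identity
    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ].
    """
    k = len(V[0])
    total = 0.0
    for f in range(k):
        s = 0.0   # running sum of v_if * x_i
        sq = 0.0  # running sum of (v_if * x_i)^2
        for xi, vi in zip(x, V):
            t = vi[f] * xi
            s += t
            sq += t * t
        total += s * s - sq
    return 0.5 * total


# Illustrative data only: 6 hypothetical metrics with 3 latent factors each.
random.seed(0)
n_metrics, k = 6, 3
x = [random.gauss(0, 1) for _ in range(n_metrics)]
V = [[random.gauss(0, 1) for _ in range(k)] for _ in range(n_metrics)]

# Both functions compute the same interaction term, up to rounding.
print(abs(pairwise_naive(x, V) - pairwise_linear(x, V)) < 1e-9)
```

The naive version visits every metric pair, so its cost grows quadratically with the number of metrics, while the rewritten version touches each metric once per latent factor. The same rearrangement could in principle be applied along the time axis of a sliding window, though how CMAnomaly combines the feature and temporal dimensions is not described in the abstract.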

