Maat: Performance Metric Anomaly Anticipation for Cloud Services with Conditional Diffusion

08/15/2023
by Cheryl Lee, et al.

Ensuring the reliability and user satisfaction of cloud services necessitates prompt anomaly detection followed by diagnosis. Existing techniques for anomaly detection focus solely on real-time detection, meaning that anomaly alerts are issued as soon as anomalies occur. However, anomalies can propagate and escalate into failures, making faster-than-real-time anomaly detection highly desirable for expediting downstream analysis and intervention. This paper proposes Maat, the first work to address anomaly anticipation of performance metrics in cloud services. Maat adopts a novel two-stage paradigm for anomaly anticipation, consisting of metric forecasting and anomaly detection on forecasts. The metric forecasting stage employs a conditional denoising diffusion model to enable multi-step forecasting in an auto-regressive manner. The detection stage extracts anomaly-indicating features based on domain knowledge and applies an isolation forest with incremental learning to detect upcoming anomalies. Thus, our method can uncover anomalies that better conform to human expertise. Evaluation on three publicly available datasets demonstrates that Maat can anticipate anomalies ahead of real time while performing comparably to, or more effectively than, state-of-the-art real-time anomaly detectors. We also present cases highlighting Maat's success in forecasting abnormal metrics and discovering anomalies.
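
To make the two-stage paradigm concrete, the following is a minimal sketch of the control flow only: forecast several steps ahead auto-regressively, then run detection on the forecast rather than on observed data. It is not Maat's implementation. The one-step predictor here is a simple slope extrapolation standing in for the conditional diffusion model, and the detector is a z-score rule standing in for the feature extraction and isolation forest. All function names, the window size, and the threshold are hypothetical choices made for illustration.

```python
from statistics import mean, stdev
from typing import Callable, List, Sequence, Tuple


def one_step_linear(window: Sequence[float]) -> float:
    """Stand-in one-step forecaster (NOT a diffusion model):
    extrapolate the slope between the last two points."""
    return window[-1] + (window[-1] - window[-2])


def forecast_autoregressive(
    history: Sequence[float],
    horizon: int,
    window_size: int,
    predict: Callable[[Sequence[float]], float] = one_step_linear,
) -> List[float]:
    """Stage 1: multi-step forecasting in an auto-regressive manner.
    Each prediction is appended to the input window for the next step."""
    window = list(history[-window_size:])
    forecast = []
    for _ in range(horizon):
        nxt = predict(window)
        forecast.append(nxt)
        window = window[1:] + [nxt]  # slide the window over the new prediction
    return forecast


def anticipate(
    history: Sequence[float],
    horizon: int,
    window_size: int,
    z_threshold: float = 3.0,  # assumed value, for illustration only
    predict: Callable[[Sequence[float]], float] = one_step_linear,
) -> Tuple[List[float], List[int]]:
    """Stage 2: detect anomalies on the forecast.
    Stand-in detector (NOT an isolation forest): flag forecast points whose
    z-score against the observed history exceeds z_threshold.
    Returns the forecast and the 0-based steps ahead that were flagged."""
    mu, sigma = mean(history), stdev(history)
    forecast = forecast_autoregressive(history, horizon, window_size, predict)
    flagged = [
        step
        for step, value in enumerate(forecast)
        if sigma > 0 and abs(value - mu) / sigma > z_threshold
    ]
    return forecast, flagged


if __name__ == "__main__":
    # A metric that oscillates, then starts ramping up at the end.
    metric = [10.0, 11.0, 10.0, 11.0, 10.0, 11.0, 12.0, 14.0]
    forecast, flagged = anticipate(metric, horizon=3, window_size=2)
    print("forecast:", forecast)
    print("flagged steps ahead:", flagged)
```

The point of the sketch is the ordering: because detection consumes forecast values, an alert can be raised before the anomalous values are actually observed. How early and how accurately this works depends entirely on the quality of the forecaster and detector, which is where the paper's diffusion model and isolation forest come in.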

research
01/09/2018

Precision and Recall for Range-Based Anomaly Detection

Classical anomaly detection is principally concerned with point-based an...
research
01/24/2020

RePAD: Real-time Proactive Anomaly Detection for Time Series

During the past decade, many anomaly detection approaches have been intr...
research
06/14/2019

Intelligent Anomaly Detection and Mitigation in Data Centers

Data centers play a key role in today's Internet. Cloud applications are...
research
05/18/2020

Anomaly Detection in Cloud Components

Cloud platforms, under the hood, consist of a complex inter-connected st...
research
04/06/2020

Moving Metric Detection and Alerting System at eBay

At eBay, there are thousands of product health metrics for different dom...
research
06/19/2023

Machine Learning for Real-Time Anomaly Detection in Optical Networks

This work proposes a real-time anomaly detection scheme that leverages t...
research
12/24/2020

Improving Predictability of User-Affecting Metrics to Support Anomaly Detection in Cloud Services

Anomaly detection systems aim to detect and report attacks or unexpected...
