Online Self-Evolving Anomaly Detection in Cloud Computing Environments

11/16/2021
by Haili Wang, et al.

Modern cloud computing systems contain hundreds to thousands of computing and storage servers. This scale, combined with ever-growing system complexity, poses a key challenge for failure and resource management in dependable cloud computing. Autonomic failure detection is a crucial technique for understanding emergent, cloud-wide phenomena and for self-managing cloud resources to assure system-level dependability. To detect failures, we need to monitor cloud execution and collect runtime performance data. These data are usually unlabeled, so a prior failure history is not always available in production clouds. In this paper, we present a self-evolving anomaly detection (SEAD) framework for cloud dependability assurance. Our framework self-evolves by recursively exploring newly verified anomaly records and continuously updating the anomaly detector online. A distinct advantage of our framework is that cloud system administrators only need to check a small number of detected anomalies, and their decisions are leveraged to update the detector. The detector thus evolves as system hardware is upgraded, the software stack is updated, and user workloads change. Moreover, we design two types of detectors, one for general anomaly detection and the other for type-specific anomaly detection. With the help of self-evolving techniques, our detectors achieve 88.94% sensitivity and 94.60% specificity on average, which makes them suitable for real-world deployment.
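For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how sensitivity (the fraction of true anomalies that are detected) and specificity (the fraction of normal records correctly left unflagged) are conventionally computed from binary labels. It uses the standard textbook definitions and is not taken from the paper's implementation; the function name `sensitivity_specificity` and the toy labels are hypothetical.

```python
import math


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Convention assumed here: 1 = anomaly, 0 = normal.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # anomalies missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normals passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms

    # Guard against empty classes instead of dividing by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    return sensitivity, specificity


# Hypothetical toy data: 3 true anomalies, 5 normal records.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.4f}, specificity={spec:.4f}")
```

In this toy example two of the three anomalies are detected and four of the five normal records are left unflagged, so a detector can score well on one metric and poorly on the other, which is why the abstract reports both.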

