Intelligent Anomaly Detection and Mitigation in Data Centers

06/14/2019
by   Ashkan Aghdai, et al.

Data centers play a key role in today's Internet. Cloud applications are mainly hosted in multi-tenant, warehouse-scale data centers. Anomalies pose a serious threat to data center operations. If not properly controlled, a simple anomaly can spread throughout the data center and result in a cascading failure. Amazon AWS has been affected by such incidents recently. Although some solutions have been proposed to detect anomalies and prevent cascading failures, they mainly rely on application-specific metrics and case-based diagnosis. Given the variety of applications in a multi-tenant data center, these solutions cannot detect anomalies in a timely manner. In this paper, we design an application-agnostic anomaly detection scheme. More specifically, our design uses a highly distributed data mining scheme over network-level traffic metrics to detect anomalies. Once anomalies are detected, simple actions are taken to mitigate the damage. This confines errors and prevents cascading failures before administrators intervene.


research
09/18/2019

Anomaly Detection As-a-Service

Cloud systems are complex, large, and dynamic systems whose behavior mus...
research
08/15/2023

Maat: Performance Metric Anomaly Anticipation for Cloud Services with Conditional Diffusion

Ensuring the reliability and user satisfaction of cloud services necessi...
research
02/19/2019

Secure Distributed Dynamic State Estimation in Wide-Area Smart Grids

Smart grid is a large complex network with a myriad of vulnerabilities, ...
research
11/03/2020

Heartbeat Diagnosis of Performance Anomaly in OpenMP Multi-Threaded Systems

This paper presents a novel heartbeat diagnosis regarding performance an...
research
03/09/2021

Learning Dependencies in Distributed Cloud Applications to Identify and Localize Anomalies

Operation and maintenance of large distributed cloud applications can qu...
research
06/12/2014

Event and Anomaly Detection Using Tucker3 Decomposition

Failure detection in telecommunication networks is a vital task. So far,...
research
02/18/2023

Anomaly Detection of UAV State Data Based on Single-class Triangular Global Alignment Kernel Extreme Learning Machine

Unmanned Aerial Vehicles (UAVs) are widely used and meet many demands in...
