Anomaly Detection and Failure Root Cause Analysis in (Micro)Service-Based Cloud Applications: A Survey

05/26/2021
by Jacopo Soldani, et al.

The momentum gained by microservices and cloud-native software architectures has pushed today's enterprise IT towards multi-service applications. The proliferation of services and service interactions within applications, which often consist of hundreds of interacting services, makes it harder to detect failures and to identify their possible root causes, even though both are crucial to promptly recovering and fixing applications. Various techniques have been proposed to promptly detect failures based on their symptoms, viz., by observing anomalous behaviour in one or more application services, as well as to analyse the logs or monitored performance of such services to determine the possible root causes of observed anomalies. The objective of this survey is to provide a structured overview and a qualitative analysis of currently available techniques for anomaly detection and root cause analysis in modern multi-service applications. Some open challenges and research directions stemming from the analysis are also discussed.


