Enhancing Failure Propagation Analysis in Cloud Computing Systems

08/30/2019
by Domenico Cotroneo, et al.

To plan for failure recovery, the designers of cloud systems need to understand how their systems can fail. Unfortunately, analyzing the failure behavior of such systems can be difficult and time-consuming because of the large volume of events, non-determinism, and the reuse of third-party components. To address these issues, we propose a novel approach that combines fault injection with anomaly detection to identify the symptoms of failures. We evaluated the proposed approach on the OpenStack cloud computing platform. We show that our model can significantly improve the accuracy of failure analysis, in terms of both false positives and false negatives, at a low computational cost.

