Fault Injection Analytics: A Novel Approach to Discover Failure Modes in Cloud-Computing Systems

09/30/2020
by Domenico Cotroneo, et al.

Cloud computing systems fail in complex and unforeseen ways due to unexpected combinations of events and interactions between hardware and software components. Fault injection is an effective means of bringing out these failures in a controlled environment. However, fault injection experiments produce massive amounts of data, and analyzing these data manually is inefficient and error-prone, since the analyst can miss severe failure modes that are not yet known. This paper introduces a new paradigm, fault injection analytics, that applies unsupervised machine learning to execution traces of the injected system to ease the discovery and interpretation of failure modes. We evaluated the proposed approach in fault injection experiments on the OpenStack cloud computing platform, where we show that it can accurately identify failure modes at a low computational cost.
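
To make the idea of grouping execution traces without labels concrete, here is a minimal sketch. It is not the method used in the paper, whose algorithm is not described in this abstract. It assumes each fault injection experiment yields a trace that can be reduced to a list of event names, and it swaps in a simple greedy "leader" clustering over cosine distance as a stand-in for whatever unsupervised technique the authors apply. The event names, the `cluster_traces` helper, and the `threshold` value are all hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine_distance(a: Counter, b: Counter) -> float:
    """Cosine distance between two event-count vectors (0 = identical mix)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty trace as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def cluster_traces(traces, threshold=0.3):
    """Greedy leader clustering (illustrative stand-in, not the paper's method).

    Each trace joins the first cluster whose leader is within `threshold`;
    otherwise it starts a new cluster and becomes that cluster's leader.
    """
    leaders = []  # one representative count vector per cluster
    labels = []   # cluster index assigned to each trace, in input order
    for trace in traces:
        vec = Counter(trace)
        for idx, leader in enumerate(leaders):
            if cosine_distance(vec, leader) <= threshold:
                labels.append(idx)
                break
        else:
            leaders.append(vec)
            labels.append(len(leaders) - 1)
    return labels


# Hypothetical traces: event names are invented for illustration only.
traces = [
    ["api_call", "db_write", "vm_boot", "ok"],
    ["api_call", "db_write", "vm_boot", "ok"],
    ["api_call", "db_write", "timeout", "timeout", "error"],
    ["api_call", "db_write", "timeout", "error"],
    ["api_call", "volume_attach_failed", "error"],
]
labels = cluster_traces(traces)
print(labels)
```

Under these assumptions, each resulting cluster is a candidate failure mode: an analyst would inspect one representative trace per cluster rather than every experiment. The threshold controls how coarse the grouping is and would need tuning on real data.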


