DeepAI AI Chat
Log In Sign Up

Fault Injection Analytics: A Novel Approach to Discover Failure Modes in Cloud-Computing Systems

09/30/2020
by   Domenico Cotroneo, et al.
0

Cloud computing systems fail in complex and unexpected ways due to unexpected combinations of events and interactions between hardware and software components. Fault injection is an effective means to bring out these failures in a controlled environment. However, fault injection experiments produce massive amounts of data, and manually analyzing these data is inefficient and error-prone, as the analyst can miss severe failure modes that are yet unknown. This paper introduces a new paradigm (fault injection analytics) that applies unsupervised machine learning on execution traces of the injected system, to ease the discovery and interpretation of failure modes. We evaluated the proposed approach in the context of fault injection experiments on the OpenStack cloud computing platform, where we show that the approach can accurately identify failure modes with a low computational cost.

READ FULL TEXT

page 2

page 3

page 8

08/30/2019

Enhancing Failure Propagation Analysis in Cloud Computing Systems

In order to plan for failure recovery, the designers of cloud systems ne...
01/19/2022

ThorFI: A Novel Approach for Network Fault Injection as a Service

In this work, we present a novel fault injection solution (ThorFI) for v...
06/29/2021

Enhancing the Analysis of Software Failures in Cloud Computing Systems with Deep Learning

Identifying the failure modes of cloud computing systems is a difficult ...
01/18/2023

Run-time Failure Detection via Non-intrusive Event Analysis in a Large-Scale Cloud Computing Platform

Cloud computing systems fail in complex and unforeseen ways due to unexp...
02/05/2015

OS-level Failure Injection with SystemTap

Failure injection in distributed systems has been an important issue to ...
07/26/2018

FINJ: A Fault Injection Tool for HPC Systems

We present FINJ, a high-level fault injection tool for High-Performance ...
04/29/2018

Investigating Power Outage Effects on Reliability of Solid-State Drives

Solid-State Drives (SSDs) are recently employed in enterprise servers an...