Exhaustive Exploration of the Failure-oblivious Computing Search Space

10/25/2017
by Thomas Durieux, et al.

High availability of software systems requires automated handling of crashes in the presence of errors. Failure-oblivious computing is one technique that aims to achieve high availability. We note that failure-obliviousness has not yet been studied in depth, and there are very few studies that help explain why failure-oblivious techniques work. For failure-oblivious computing to have an impact in practice, we need to deeply understand failure-oblivious behaviors in software. In this paper, we design and perform an experiment that analyzes the size and the diversity of failure-oblivious behaviors. Our experiment consists of exhaustively computing the search space of 16 field failures of large-scale open-source Java software. The outcome of this experiment is a much better understanding of what really happens when failure-oblivious computing is used, which opens promising new research directions.

research
09/07/2022

Reflections on Software Failure Analysis

Failure studies are important in revealing the root causes, behaviors, a...
research
06/27/2022

Towards a Failure-Aware SDLC for Internet of Things

Internet of Things systems carry substantial engineering risks including...
research
04/19/2023

Availability Model of a 5G-MEC System

Multi-access Edge Computing (MEC) is one of the enabling technologies of...
research
08/18/2019

Feedback-based, Automated Failure Testing of Microservice-based Applications

Modern distributed applications are moving toward a microservice archite...
research
10/13/2021

Detection Software Content Failures Using Dynamic Execution Information

Modern software systems become too complex to be tested and validated. D...
research
12/15/2022

Calculation of the High-Energy Neutron Flux for Anticipating Errors and Recovery Techniques in Exascale Supercomputer Centres

The age of exascale computing has arrived and the risks associated with ...
research
06/11/2022

Rare event failure test case generation in Learning-Enabled-Controllers

Machine learning models have prevalent applications in many real-world p...
