PREVENT: An Unsupervised Approach to Predict Software Failures in Production

08/25/2022
by   Giovanni Denaro, et al.

This paper presents PREVENT, an approach for predicting and localizing failures in distributed enterprise applications by combining unsupervised techniques. Software failures can have dramatic consequences in production, so predicting and localizing failures is an essential step for activating healing measures that limit their disruptive consequences. State-of-the-art techniques predict many failures from anomalous combinations of system metrics, judged against either rules provided by domain experts or supervised learning models. However, both approaches limit the effectiveness of current techniques to well-understood types of failures, namely those that can be captured with predefined rules or observed while training supervised models. PREVENT integrates the core ingredients of unsupervised approaches into a novel approach that predicts failures and localizes failing resources without requiring either predefined rules or training with observed failures. Experiments with PREVENT on a commercially-compliant distributed cloud system indicate that PREVENT provides more stable and reliable predictions, earlier than or comparably to supervised learning approaches, without requiring long and often impractical training with failures.

research
11/21/2019

Predicting Failures in Multi-Tier Distributed Systems

Many applications are implemented as multi-tier software systems, and ar...
research
01/27/2022

Prediction of GPU Failures Under Deep Learning Workloads

Graphics processing units (GPUs) are the de facto standard for processin...
research
09/12/2022

A Causal-based Approach to Explain, Predict and Prevent Failures in Robotic Tasks

Robots working in real environments need to adapt to unexpected changes ...
research
08/25/2020

Failures and Fixes: A Study of Software System Incident Response

This paper presents the results of a research study related to software ...
research
02/17/2021

DepOwl: Detecting Dependency Bugs to Prevent Compatibility Failures

Applications depend on libraries to avoid reinventing the wheel. Librari...
research
01/06/2019

Combining Unsupervised and Supervised Learning for Asset Class Failure Prediction in Power Systems

In power systems, an asset class is a group of power equipment that has ...
research
05/12/2019

Automating chaos experiments in production

Distributed systems often face transient errors and localized component ...
