Predicting Failures in Multi-Tier Distributed Systems

by Leonardo Mariani, et al.

Many applications are implemented as multi-tier software systems and executed on distributed infrastructures, such as cloud infrastructures, to benefit from the cost reduction that derives from dynamically allocating resources on demand. In these systems, failures are becoming the norm rather than the exception. Predicting their occurrence and locating the responsible faults are essential enablers of preventive and corrective actions that can mitigate the impact of failures and significantly improve the dependability of the systems. Current failure prediction approaches suffer from either false positives or limited accuracy, and do not produce enough information to effectively locate the responsible faults. In this paper, we present PreMiSE, a lightweight and precise approach to predict failures and locate the corresponding faults in multi-tier distributed systems. PreMiSE blends anomaly-based and signature-based techniques to identify multi-tier failures that affect performance indicators, with high precision and a low false positive rate. The experimental results that we obtained on a cloud-based IP Multimedia Subsystem indicate that PreMiSE can indeed predict and locate possible failure occurrences with high precision and low overhead.




