Enhancing the Analysis of Software Failures in Cloud Computing Systems with Deep Learning

06/29/2021
by Domenico Cotroneo, et al.

Identifying the failure modes of cloud computing systems is a difficult and time-consuming task, due to the growing complexity of such systems and the large volume and noisiness of failure data. This paper presents a novel approach for analyzing failure data from cloud systems, in order to relieve human analysts of manually fine-tuning the data for feature engineering. The approach leverages Deep Embedded Clustering (DEC), a family of unsupervised clustering algorithms based on deep learning, which uses an autoencoder to optimize data dimensionality and inter-cluster variance. We applied the approach in the context of the OpenStack cloud computing platform, both on the raw failure data and in combination with an anomaly detection pre-processing algorithm. The results show that the performance of the proposed approach, in terms of purity of clusters, is comparable to, and in some cases better than, that of manually fine-tuned clustering, thus avoiding the need for deep domain knowledge and reducing the effort to perform the analysis. In all cases, the proposed approach performs better than unsupervised clustering applied to data without feature engineering. Moreover, the distribution of failure modes obtained with the proposed approach is closer to the actual frequency of the failure modes.
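
The abstract reports results in terms of cluster purity. As a minimal sketch, assuming the commonly used definition of purity (the fraction of samples that belong to the majority class of their cluster), the metric can be computed as shown here. The function name `cluster_purity` and the failure-mode labels in the example are hypothetical and chosen for illustration only; the paper's exact evaluation procedure may differ.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, true_labels):
    """Return the purity of a clustering, a value in (0, 1].

    Assumes the standard definition: for each cluster, count the samples
    of its most frequent true label, sum those counts over all clusters,
    and divide by the total number of samples.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if len(cluster_ids) == 0:
        raise ValueError("at least one sample is required")

    # Count how often each true label occurs inside each cluster.
    label_counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        label_counts[cluster][label] += 1

    # Sum the size of the majority label in every cluster.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(cluster_ids)


# Hypothetical example: eight failure records grouped into three clusters.
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
labels = ["timeout", "timeout", "crash", "crash", "crash", "hang", "hang", "timeout"]
purity = cluster_purity(clusters, labels)
print(f"purity = {purity:.2f}")
```

A purity of 1.0 means every cluster contains a single failure mode. Purity tends to rise as the number of clusters grows, so it is most informative when the compared clusterings use a similar number of clusters.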
