FIXME: Enhance Software Reliability with Hybrid Approaches in Cloud

02/17/2021
by Jinho Hwang, et al.

With the promise of reliability in the cloud, more enterprises are migrating to it. Continuous integration/continuous deployment (CI/CD) in the cloud connects developers, who need to deliver value faster and more transparently, with site reliability engineers (SREs), who need to run applications reliably. SREs feed development issues back to developers, and developers commit fixes and trigger the CI/CD pipeline to redeploy. The release cycle is more continuous than ever, so code reaches production faster and with more automation. To provide this higher level of agility, cloud platforms have become more complex, adding deeper layers of virtualization for flexibility. However, reliability does not come for free with all this complexity. Software engineers and SREs must deal with a wider spectrum of information from the virtualized layers. Therefore, providing correlated information backed by true-positive evidence is critical for quickly identifying the root cause of issues and reducing mean time to recover (MTTR), a key performance metric for SREs. Similarity-, knowledge-, and statistics-driven approaches have each been effective, but as data volumes and types grow, any individual approach is limited in its ability to correlate semantic relations across different data sources. In this paper, we introduce FIXME, which enhances software reliability for enterprises with hybrid diagnosis approaches. Our evaluation results show that the hybrid diagnosis approach is about 17% better in precision. The results are helpful for both practitioners and researchers developing hybrid diagnosis in highly dynamic cloud environments.
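The abstract does not say how FIXME combines its diagnosis approaches. As a minimal sketch only, assuming each approach (similarity-, knowledge-, and statistics-driven) assigns a score to every candidate root cause and that the scores are fused by a simple weighted sum, the Python below shows how a hybrid ranking can improve precision when one signal on its own points at the wrong cause. The functions `hybrid_rank` and `precision_at_k`, the candidate names, scores, and weights are all hypothetical illustrations. They are not the paper's method or data.

```python
from typing import Dict, List, Set

# Hypothetical per-approach scores: approach name -> {candidate root cause -> score in [0, 1]}.
Scores = Dict[str, Dict[str, float]]


def hybrid_rank(candidates: List[str], scores: Scores, weights: Dict[str, float]) -> List[str]:
    """Rank candidate root causes by a weighted sum of per-approach scores.

    A candidate missing from an approach's score table contributes 0 for that approach.
    """
    def combined(candidate: str) -> float:
        return sum(weights[approach] * table.get(candidate, 0.0)
                   for approach, table in scores.items())

    # Highest combined score first.
    return sorted(candidates, key=combined, reverse=True)


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked candidates that are true root causes."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for c in top if c in relevant) / len(top)


# Toy incident data (invented for illustration only).
candidates = ["db-pool-exhausted", "dns-timeout", "oom-kill"]
scores: Scores = {
    "similarity": {"db-pool-exhausted": 0.6, "dns-timeout": 0.7, "oom-kill": 0.1},
    "knowledge": {"db-pool-exhausted": 0.8, "dns-timeout": 0.2, "oom-kill": 0.3},
    "statistics": {"db-pool-exhausted": 0.7, "dns-timeout": 0.3, "oom-kill": 0.5},
}
true_root_causes = {"db-pool-exhausted"}

# Similarity alone ranks the wrong candidate first.
similarity_only = hybrid_rank(candidates, {"similarity": scores["similarity"]}, {"similarity": 1.0})
# Equal weights across all three approaches.
hybrid = hybrid_rank(candidates, scores, {name: 1 / 3 for name in scores})

print(similarity_only)  # ['dns-timeout', 'db-pool-exhausted', 'oom-kill']
print(hybrid)           # ['db-pool-exhausted', 'dns-timeout', 'oom-kill']
print(precision_at_k(similarity_only, true_root_causes, k=1))  # 0.0
print(precision_at_k(hybrid, true_root_causes, k=1))           # 1.0
```

Equal weights are simply the most basic fusion rule. Weights could instead be tuned on labeled incidents, and other schemes such as rank averaging are just as plausible. The toy numbers say nothing about the paper's own evaluation.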

05/25/2023

Empowering Practical Root Cause Analysis by Large Language Models for Cloud Incidents

Ensuring the reliability and availability of cloud services necessitates...
11/05/2021

CloudRCA: A Root Cause Analysis Framework for Cloud Computing Platforms

As business of Alibaba expands across the world among various industries...
07/04/2020

Towards Semantic Detection of Smells in Cloud Infrastructure Code

Automated deployment and management of Cloud applications relies on desc...
10/12/2020

Carbon to Diamond: An Incident Remediation Assistant System From Site Reliability Engineers' Conversations in Hybrid Cloud Operations

Conversational channels are changing the landscape of hybrid cloud servi...
02/09/2019

Performance Modeling of Microservice Platforms Considering the Dynamics of the underlying Cloud Infrastructure

Microservice architecture has transformed the way developers are buildin...
05/09/2022

Architectural Partitioning and Deployment Modeling on Hybrid Clouds

The hybrid cloud idea is increasingly gaining momentum because it brings...
02/25/2021

Migration of CMSWEB Cluster at CERN to Kubernetes

The CMS experiment heavily relies on the CMSWEB cluster to host critical...
