Sage: Leveraging ML to Diagnose Unpredictable Performance in Cloud Microservices

12/12/2021
by Yu Gan, et al.

Cloud applications are increasingly shifting from large monolithic services to complex graphs of loosely-coupled microservices. Despite their advantages, microservices also introduce cascading QoS violations in cloud applications, which are difficult to diagnose and correct. We present Sage, an ML-driven root cause analysis system for interactive cloud microservices. Sage leverages unsupervised learning models to circumvent the overhead of trace labeling, determines the root cause of unpredictable performance online, and applies corrective actions to restore performance. In experiments on both dedicated local clusters and large GCE clusters, we show that Sage achieves high root cause detection accuracy and predictable performance.

research
01/01/2021

Sage: Using Unsupervised Learning for Scalable Performance Debugging in Microservices

Cloud applications are increasingly shifting from large monolithic servi...
research
05/27/2021

Sinan: Data-Driven, QoS-Aware Cluster Management for Microservices

Cloud applications are increasingly shifting from large monolithic servi...
research
09/11/2023

PACE-LM: Prompting and Augmentation for Calibrated Confidence Estimation with GPT-4 in Cloud Incident Root Cause Analysis

In recent years, the transition to cloud-based platforms in the IT secto...
research
04/24/2018

Seer: Leveraging Big Data to Navigate the Increasing Complexity of Cloud Debugging

Performance unpredictability in cloud services leads to poor user experi...
research
01/30/2017

Survey on Models and Techniques for Root-Cause Analysis

Automation and computer intelligence to support complex human decisions ...
research
05/12/2023

Monitoring and Adapting ML Models on Mobile Devices

ML models are increasingly being pushed to mobile devices, for low-laten...
research
04/26/2015

Monitoring Extreme-scale Lustre Toolkit

We discuss the design and ongoing development of the Monitoring Extreme-...
