Recommending Root-Cause and Mitigation Steps for Cloud Incidents using Large Language Models

01/10/2023
by Toufique Ahmed, et al.

Incident management for cloud services is a complex, multi-step process that has a large impact on both service health and developer productivity. On-call engineers need a significant amount of domain knowledge and manual effort to root cause and mitigate production incidents. Recent advances in artificial intelligence have resulted in state-of-the-art large language models like GPT-3.x (both GPT-3.0 and GPT-3.5), which have been used to solve a variety of problems ranging from question answering to text summarization. In this work, we conduct the first large-scale study to evaluate the effectiveness of these models in helping engineers root cause and mitigate production incidents. We carry out a rigorous study at Microsoft on more than 40,000 incidents and compare several large language models in zero-shot, fine-tuned, and multi-task settings using semantic and lexical metrics. Lastly, our human evaluation with actual incident owners shows the efficacy and future potential of using artificial intelligence for resolving cloud incidents.
