CausalRCA: Causal Inference based Precise Fine-grained Root Cause Localization for Microservice Applications
For microservice applications with detected performance anomalies, localizing root causes from monitoring data is important for enabling rapid recovery and loss mitigation. Existing research mainly focuses on coarse-grained localization of the faulty service. However, fine-grained root cause localization, which identifies not only the faulty service but also the root cause metric within that service, is more helpful for operators fixing application anomalies, and it is also more challenging. Recently, causal inference (CI) based methods have become popular, but the CI methods currently in use have limitations, such as assuming linear causal relations. Therefore, this paper presents a framework named CausalRCA that implements fine-grained, automated, and real-time root cause localization. CausalRCA uses a gradient-based causal structure learning method to generate weighted causal graphs and a root cause inference method to localize root cause metrics. We conduct coarse-grained and fine-grained root cause localization experiments to validate the localization performance of CausalRCA. Experimental results show that CausalRCA achieves the best localization accuracy among the compared methods; for example, the average AC@3 of fine-grained root cause metric localization in the faulty service is 0.719, an average improvement of 17% over the baseline methods.
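The abstract reports accuracy as AC@3 without defining it. As a rough illustration, the sketch below assumes the definition commonly used in root cause localization work: AC@k is the fraction of anomaly cases in which the true root cause appears among the top k ranked candidates. The function name `ac_at_k`, the single-root-cause simplification, and the toy rankings are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def ac_at_k(rankings: Sequence[Sequence[str]],
            true_causes: Sequence[str],
            k: int) -> float:
    """Fraction of cases whose true root cause is within the top-k ranking.

    Assumes exactly one true root cause per anomaly case (a simplification).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if len(rankings) != len(true_causes):
        raise ValueError("rankings and true_causes must have the same length")
    if not rankings:
        raise ValueError("at least one anomaly case is required")

    # Count the cases where the ground-truth metric is ranked in the first k slots.
    hits = sum(
        1 for ranked, truth in zip(rankings, true_causes)
        if truth in ranked[:k]
    )
    return hits / len(rankings)


# Toy data (made up): each inner list ranks candidate metrics for one anomaly case,
# most suspicious first.
rankings = [
    ["cpu", "memory", "latency", "network"],
    ["latency", "network", "cpu", "memory"],
    ["memory", "cpu", "network", "latency"],
    ["network", "latency", "memory", "cpu"],
]
true_causes = ["memory", "memory", "memory", "cpu"]

print(ac_at_k(rankings, true_causes, 1))  # 0.25
print(ac_at_k(rankings, true_causes, 3))  # 0.5
```

Under this assumed definition, an average AC@3 of 0.719 would mean that the true root cause metric is among the top three candidates in roughly 72% of the evaluated cases.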