An Adaptive Resilience Testing Framework for Microservice Systems

12/25/2022
by Tianyi Yang, et al.

Resilience testing, which measures the ability to minimize service degradation caused by unexpected failures, is crucial for microservice systems. The current practice for resilience testing relies on manually defining rules for different microservice systems. Due to the diverse business logic of microservices, there are no one-size-fits-all microservice resilience testing rules. As the number and dynamics of microservices and failures grow, manual configuration runs into scalability and adaptivity issues. To overcome these two issues, we empirically compare the impacts of common failures in resilient and unresilient deployments of a benchmark microservice system. Our study demonstrates that a resilient deployment can block the propagation of degradation from system performance metrics (e.g., memory usage) to business metrics (e.g., response latency). In this paper, we propose AVERT, the first AdaptiVE Resilience Testing framework for microservice systems. AVERT first injects failures into microservices and collects the available monitoring metrics. It then ranks all the monitoring metrics according to their contributions to the overall service degradation caused by the injected failures. Lastly, AVERT produces a resilience index based on how much the degradation in system performance metrics propagates to the degradation in business metrics. The higher the degradation propagation, the lower the resilience of the microservice system. We evaluate AVERT on two open-source benchmark microservice systems. The experimental results show that AVERT can accurately and efficiently test the resilience of microservice systems.
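
To make the idea of a propagation-based resilience index concrete, the following is a minimal sketch, not AVERT's actual algorithm, which the abstract does not specify. It assumes that each metric's degradation can be approximated by the relative shift of its mean after failure injection, and that the index is one minus the ratio of business-metric degradation to system-metric degradation. The function names `degradation` and `resilience_index`, the clipping to [0, 1], and the sample numbers are all hypothetical.

```python
from statistics import mean


def degradation(baseline, faulty):
    """Relative shift of a metric's mean after failure injection, clipped to [0, 1].

    Assumption: a simple mean shift stands in for whatever degradation
    measure the paper actually uses.
    """
    base = mean(baseline)
    if base == 0:
        return 0.0
    change = abs(mean(faulty) - base) / abs(base)
    return min(change, 1.0)


def resilience_index(system_metrics, business_metrics):
    """Return a score in [0, 1]; higher means less degradation propagated.

    Both arguments map a metric name to a (baseline, faulty) pair of samples.
    Assumption: propagation is the ratio of average business degradation to
    average system degradation, capped at 1.
    """
    sys_deg = mean(degradation(b, f) for b, f in system_metrics.values())
    biz_deg = mean(degradation(b, f) for b, f in business_metrics.values())
    if sys_deg == 0:
        # The injected failure did not visibly degrade system metrics.
        return 1.0
    propagation = min(biz_deg / sys_deg, 1.0)
    return 1.0 - propagation


# Hypothetical samples: memory usage (%) and response latency (ms).
system = {"memory_usage": ([50, 52, 48], [90, 95, 85])}
resilient_business = {"response_latency": ([100, 100, 100], [104, 108, 100])}
unresilient_business = {"response_latency": ([100, 100, 100], [180, 180, 180])}

resilient_score = resilience_index(system, resilient_business)
unresilient_score = resilience_index(system, unresilient_business)
print(resilient_score, unresilient_score)
```

In this toy setup both deployments see the same memory pressure, but only the unresilient one lets it reach response latency, so it receives the lower score. This matches the abstract's statement that higher degradation propagation means lower resilience.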

