FIRM: An Intelligent Fine-Grained Resource Management Framework for SLO-Oriented Microservices

08/19/2020
by Haoran Qiu, et al.

Modern user-facing, latency-sensitive web services comprise numerous distributed, intercommunicating microservices that promise to simplify software development and operation. However, multiplexing compute resources across microservices remains challenging in production, because contention for shared resources can cause latency spikes that violate the service-level objectives (SLOs) of user requests. This paper presents FIRM, an intelligent fine-grained resource management framework for predictable sharing of resources across microservices that drives up overall utilization. FIRM leverages online telemetry data and machine-learning methods to adaptively (a) detect and localize the microservices that cause SLO violations, (b) identify the low-level resources in contention, and (c) take actions to mitigate SLO violations via dynamic reprovisioning. Experiments across four microservice benchmarks demonstrate that FIRM reduces SLO violations by up to 16x while reducing the overall requested CPU limit by up to 62%. Moreover, FIRM improves performance predictability by reducing tail latencies by up to 11x.
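
The abstract does not spell out which machine-learning models FIRM uses, so the following is only a minimal sketch of step (a), detecting and localizing an SLO violation, under stated assumptions. It assumes each request trace is a hypothetical dict mapping a microservice name to its latency (ms) on the request's critical path, and that end-to-end latency is the sum of those spans. It also substitutes a simple ratio-to-median heuristic for FIRM's learned models. The names `p99`, `localize_culprit`, `traces`, and `slo_ms` are illustrative, not part of FIRM.

```python
from statistics import median, quantiles


def p99(samples):
    """99th-percentile latency (ms) of a list of at least two samples."""
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    return quantiles(samples, n=100, method="inclusive")[98]


def localize_culprit(traces, slo_ms):
    """Hypothetical heuristic (not FIRM's ML method): for every request whose
    end-to-end latency exceeds the SLO, blame the microservice whose latency
    is largest relative to that service's median across all traces.
    Assumes every trace has the same services and all latencies are > 0."""
    services = list(traces[0].keys())
    baseline = {s: median(t[s] for t in traces) for s in services}
    blame = {s: 0 for s in services}
    for t in traces:
        # Assumption: end-to-end latency = sum of critical-path span latencies.
        if sum(t.values()) > slo_ms:
            worst = max(services, key=lambda s: t[s] / baseline[s])
            blame[worst] += 1
    # Return the most-blamed service, or None if no request violated the SLO.
    return max(blame, key=blame.get) if any(blame.values()) else None
```

For example, with a 50 ms SLO, one slow request dominated by the database span is attributed to `db`:

```python
traces = [
    {"frontend": 5, "search": 10, "db": 20},
    {"frontend": 5, "search": 11, "db": 21},
    {"frontend": 6, "search": 10, "db": 80},
    {"frontend": 5, "search": 12, "db": 19},
]
print(localize_culprit(traces, slo_ms=50))  # db
print(p99([sum(t.values()) for t in traces]) > 50)  # True: tail latency exceeds the SLO
```

Comparing each span with its own median, rather than its absolute latency, keeps a naturally slow service from being blamed for every violation. This is a deliberately simple stand-in for the learned localization the abstract describes.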
