Dagger: Accelerating RPCs in Cloud Microservices Through Tightly-Coupled Reconfigurable NICs

06/02/2021
by Nikita Lazarev, et al.

The ongoing shift of cloud services from monolithic designs to microservices creates high demand for efficient, high-performance datacenter networking stacks optimized for fine-grained workloads. Commodity networking systems based on software stacks and peripheral NICs introduce high overheads when delivering small messages. We present Dagger, a hardware acceleration fabric for cloud RPCs based on FPGAs, where the accelerator is closely coupled with the host processor over a configurable memory interconnect. The three key design principles of Dagger are: (1) offloading the entire RPC stack to an FPGA-based NIC, (2) leveraging memory interconnects instead of PCIe buses as the interface with the host CPU, and (3) making the acceleration fabric reconfigurable, so it can accommodate the diverse needs of microservices. We show that the combination of these principles significantly improves the efficiency and performance of cloud RPC systems while preserving their generality. Dagger achieves 1.3-3.8x higher per-core RPC throughput compared to both highly optimized software stacks and systems using specialized RDMA adapters. It also scales up to 84 Mrps with 8 threads on 4 CPU cores, while maintaining state-of-the-art μs-scale tail latency. We also demonstrate that large third-party applications, like memcached and MICA KVS, can be easily ported to Dagger with minimal changes to their codebase, bringing their median and tail KVS access latency down to 2.8-3.5 μs and 5.4-7.8 μs, respectively. Finally, we show that Dagger is beneficial for multi-tier end-to-end microservices with different threading models by evaluating it using an 8-tier application implementing a flight check-in service.
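To put the scaling figure in perspective, the arithmetic below breaks the quoted peak of 84 Mrps (8 threads on 4 CPU cores) into per-thread and per-core rates. It is a rough sketch only: it assumes the load is split evenly across threads, which the abstract does not state, and the function name and the derived time-per-RPC figure are illustrative rather than taken from the paper.

```python
# Figures quoted in the abstract (peak values, "up to").
TOTAL_MRPS = 84.0  # aggregate throughput in millions of RPCs per second
THREADS = 8
CORES = 4


def throughput_breakdown(total_mrps, threads, cores):
    """Return (Mrps per thread, Mrps per core, mean ns between RPCs per thread).

    Assumption (not from the paper): load is spread evenly across threads.
    """
    per_thread = total_mrps / threads
    per_core = total_mrps / cores
    # 1 s / (per_thread * 1e6 RPCs) expressed in nanoseconds.
    ns_per_rpc = 1e3 / per_thread
    return per_thread, per_core, ns_per_rpc


if __name__ == "__main__":
    per_thread, per_core, ns_per_rpc = throughput_breakdown(
        TOTAL_MRPS, THREADS, CORES
    )
    print(f"{per_thread:.1f} Mrps per thread")
    print(f"{per_core:.1f} Mrps per core")
    print(f"~{ns_per_rpc:.0f} ns between RPCs on each thread")
```

Under that even-split assumption, each thread sustains about 10.5 Mrps, which leaves under 100 ns on average between consecutive RPCs on a thread. This is a throughput budget, not a latency figure; the microsecond-scale latencies quoted in the abstract are a separate measurement.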

research
07/16/2020

Dagger: Towards Efficient RPCs in Cloud Microservices with Near-Memory Reconfigurable NICs

Cloud applications are increasingly relying on hundreds of loosely-coupl...
research
10/23/2020

The nanoPU: Redesigning the CPU-Network Interface to Minimize RPC Tail Latency

The nanoPU is a new networking-optimized CPU designed to minimize tail l...
research
10/16/2022

QStack: Re-architecting User-space Network Stack to Optimize CPU Efficiency and Service Quality

TCP/IP network stack is irreplaceable for Web services in datacenter fro...
research
01/07/2023

Duet: Creating Harmony between Processors and Embedded FPGAs

The demise of Moore's Law has led to the rise of hardware acceleration. ...
research
11/05/2019

uqSim: Scalable and Validated Simulation of Cloud Microservices

Current cloud services are moving away from monolithic designs and towar...
research
02/07/2020

Breaking Band: A Breakdown of High-performance Communication

The critical path of internode communication on large-scale systems is c...
research
06/03/2022

Nezha: Deployable and High-Performance Consensus Using Synchronized Clocks

This paper presents a high-performance consensus protocol, Nezha, design...
