Lamda: The Last Mile of the Datacenter Network Does Matter

11/11/2022
by Qiang Li, et al.

In this paper, we conduct systematic measurement studies to show that the high memory bandwidth consumption of modern distributed applications can lead to a significant drop in network throughput and a large increase in tail latency in high-speed RDMA networks. We identify the root cause as heavy contention for memory bandwidth between application processes and network processes. This contention leads to frequent packet drops at the NICs of receiving hosts, which trigger the network's congestion control mechanism and eventually degrade network performance. To tackle this problem, we make the key observation that in a distributed storage service, the vast majority of the data received from the network is eventually written to high-speed storage media (e.g., SSDs) by the CPU. We therefore propose to bypass host memory when processing received data, circumventing this performance bottleneck entirely. In particular, we design Lamda, a novel receiver cache processing system that uses a small amount of CPU cache to process data received from the network at line rate. We implement a prototype of Lamda and evaluate its performance extensively in a Clos-based testbed. Results show that for distributed storage applications, Lamda improves network throughput by 4.7 ... improves network throughput by up to 17 ... size under memory bandwidth pressure, respectively. Lamda can also be applied to latency-sensitive HPC applications, reducing their communication latency by 35.1
