Dart: Divide and Specialize for Fast Response to Congestion in RDMA-based Datacenter Networks

05/28/2018
by   Jaichen Xue, et al.

Though Remote Direct Memory Access (RDMA) promises to reduce datacenter network latencies significantly compared to TCP (e.g., 10x), end-to-end congestion control in the presence of incasts is a challenge. Targeting the full generality of the congestion problem, previous schemes rely on slow, iterative convergence to the appropriate sending rates (e.g., TIMELY takes 50 RTTs). We leverage the finding, reported in several papers, that most congestion in datacenter networks occurs at the receiver. Accordingly, we propose a divide-and-specialize approach, called Dart, which isolates the common case of receiver congestion and further subdivides the remaining in-network congestion into the simpler spatially-localized case and the harder spatially-dispersed case. For receiver congestion, Dart proposes direct apportioning of sending rates (DASR), in which a receiver for n senders directs each sender to cut its rate by a factor of n, converging in only one RTT. For the spatially-localized case, because RDMA disallows packet reordering, Dart adds novel switch hardware for in-order flow deflection (IOFD), which provides a fast (under one RTT), lightweight response. For the uncommon spatially-dispersed case, Dart falls back to DCQCN. Small-scale testbed measurements and at-scale simulations, respectively, show that Dart achieves 60% lower 99th-percentile latency, and similar and 58 and TIMELY and DCQCN
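
As a rough illustration of the DASR idea described in the abstract, the sketch below has a receiver split its link capacity evenly among its n active senders. It assumes each sender was previously transmitting at the receiver's full line rate, so that cutting by a factor of n yields an equal share. The function name, the sender labels, and the 100 Gbps figure are hypothetical and do not come from the paper; the actual mechanism may differ.

```python
def apportion_rates(line_rate_gbps, senders):
    """Return the rate each sender is told to use (hypothetical helper).

    Assumption: every sender was sending at the receiver's line rate,
    so dividing by n = len(senders) is the "cut by a factor of n" step.
    """
    n = len(senders)
    if n == 0:
        return {}
    share = line_rate_gbps / n
    # One notification per sender; no iterative convergence is modeled here.
    return {sender: share for sender in senders}


# Four senders incast onto one receiver with an assumed 100 Gbps link.
rates = apportion_rates(100.0, ["s1", "s2", "s3", "s4"])
```

In this sketch the per-sender shares sum to the receiver's line rate after a single round of notifications, which mirrors the one-RTT convergence the abstract claims for the receiver-congestion case.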

research
09/25/2018

Pulser: Fast Congestion Response using Explicit Incast Notifications for Datacenter Networks

Datacenter applications frequently cause incast congestion, which degrad...
research
12/28/2021

PowerTCP: Pushing the Performance Limits of Datacenter Networks

Increasingly stringent throughput and latency requirements in datacenter...
research
04/30/2023

SFC: Near-Source Congestion Signaling and Flow Control

State-of-the-art congestion control algorithms for data centers alone do...
research
11/15/2022

Low Latency Techniques for Mobile Backhaul over DOCSIS

The mobile network operators (MNOs) are looking into economically viable...
research
12/27/2021

Machine Learning in Congestion Control: A Survey on Selected Algorithms and a New Roadmap to their Implementation

With the emergence of new technologies, computer networks are becoming m...
research
01/22/2022

Sliding Window Challenge Process for Congestion Detection

Many prominent smart-contract applications such as payment channels, auc...