Slytherin: Dynamic, Network-assisted Prioritization of Tail Packets in Datacenter Networks

07/05/2018
by   Hamed Rezaei, et al.

Datacenter applications demand both low latency and high throughput. Interactive applications (e.g., Web Search) demand low tail latency for their short messages because of their partition-aggregate software architecture, while many data-intensive applications (e.g., Map-Reduce) require high throughput for long flows as they move vast amounts of data across the network. Recent proposals improve the latency of short flows and the throughput of long flows by addressing the shortcomings of existing packet scheduling and congestion control algorithms, respectively. We make the key observation that long tails in the Flow Completion Times (FCT) of short flows result from packets that suffer congestion at more than one switch along their paths in the network. Our proposal, Slytherin, specifically targets packets that have suffered congestion at multiple points and prioritizes them in the network. Slytherin leverages the ECN mechanism, which is widely used in existing datacenters, to identify such tail packets, and dynamically prioritizes them using existing priority queues. Compared to existing state-of-the-art packet scheduling proposals, Slytherin achieves 18.6 without any loss of throughput. Further, Slytherin reduces the 99th percentile queue length in switches by about 2x on average.
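
The idea of using ECN marks to spot tail packets and promoting them with existing priority queues can be illustrated with a toy model. The sketch below is not the authors' implementation: the `Packet` and `Switch` classes, the two-queue layout, the packet-count marking threshold `mark_threshold`, and the rule "a packet that arrives already ECN-marked goes to the high-priority queue" are all assumptions made for illustration, since the abstract does not specify these details.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    flow_id: int
    seq: int
    ecn_marked: bool = False  # set when some switch on the path saw congestion


class Switch:
    """Toy output port: two strict-priority queues plus threshold-based ECN marking."""

    def __init__(self, mark_threshold: int = 3):
        # Hypothetical marking threshold, counted in packets of the low-priority queue.
        self.mark_threshold = mark_threshold
        self.high = deque()
        self.low = deque()

    def enqueue(self, pkt: Packet) -> None:
        if pkt.ecn_marked:
            # Assumption: a mark set upstream means the packet already suffered
            # congestion once, so it is treated as a tail packet and promoted.
            self.high.append(pkt)
            return
        if len(self.low) >= self.mark_threshold:
            # Queue is past the threshold: mark the packet as having seen congestion.
            pkt.ecn_marked = True
        self.low.append(pkt)

    def dequeue(self):
        # Strict priority: serve promoted packets before everything else.
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None


if __name__ == "__main__":
    sw = Switch(mark_threshold=2)
    sw.enqueue(Packet(flow_id=1, seq=0))
    sw.enqueue(Packet(flow_id=1, seq=1))
    sw.enqueue(Packet(flow_id=2, seq=0, ecn_marked=True))  # congested upstream
    print(sw.dequeue())  # the upstream-marked packet leaves first
```

In this sketch a packet marked at one switch would be promoted at the next switch where it arrives, which is one simple way to approximate "prioritize packets that face congestion at more than one point".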

