CAFT: Congestion-Aware Fault-Tolerant Load Balancing for Three-Tier Clos Data Centers

10/01/2020
by Sultan Alanazi, et al.

Production data centers carry workloads whose flow sizes range from latency-sensitive mice flows to long-lived elephant flows. However, the predominant load balancing scheme in data center networks, equal-cost multi-path (ECMP), is agnostic to path conditions and performs poorly in asymmetric topologies, resulting in low throughput and high latency. In this paper, we propose CAFT, a distributed congestion-aware fault-tolerant load balancing protocol for 3-tier data center networks. It first collects, in real time, the complete congestion information of two subsets of the set of all possible paths between any two hosts. Then, the congestion information of the best path in each subset is carried across the switches, during the Transmission Control Protocol (TCP) connection process, to make the path selection decision. Having two candidate paths improves the robustness of CAFT to asymmetries caused by link failures. Large-scale ns-3 simulations show that CAFT outperforms Expeditus in mean flow completion time (FCT) and network throughput in both symmetric and asymmetric scenarios.
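The two-candidate selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Path` type, the scalar `congestion` metric, the `up` flag, and the functions `best_of` and `select_path` are all hypothetical names introduced here. The abstract does not say how the two subsets are formed, how congestion is measured, or how the information is carried during the TCP connection process, so the sketch assumes the per-path values are already available.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Path:
    """Hypothetical path record; fields are assumptions, not from the paper."""
    name: str
    congestion: float  # assumed scalar metric, lower is better
    up: bool = True    # False if some link on the path has failed


def best_of(subset: Sequence[Path]) -> Optional[Path]:
    """Return the least congested live path in one subset, or None."""
    live = [p for p in subset if p.up]
    return min(live, key=lambda p: p.congestion) if live else None


def select_path(subset_a: Sequence[Path],
                subset_b: Sequence[Path]) -> Optional[Path]:
    """Pick the better of the two per-subset candidates.

    Keeping one candidate per subset means a failure that removes
    every path in one subset still leaves a usable candidate from
    the other subset.
    """
    candidates = [c for c in (best_of(subset_a), best_of(subset_b))
                  if c is not None]
    return min(candidates, key=lambda p: p.congestion) if candidates else None


# Illustrative values only; they are not measurements from the paper.
subset_a = [Path("a1", 0.7), Path("a2", 0.2, up=False)]
subset_b = [Path("b1", 0.5), Path("b2", 0.9)]
chosen = select_path(subset_a, subset_b)
```

In this example the least congested path of the first subset has failed, so the candidates are `a1` and `b1`, and the less congested `b1` is chosen. ECMP, by contrast, would hash a flow onto a path without consulting congestion or failure state.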
