Saath: Speeding up CoFlows by Exploiting the Spatial Dimension

11/16/2021
by Akshay Jajoo, et al.

Coflow scheduling improves the performance of data-intensive applications by improving their network performance. State-of-the-art Coflow schedulers in essence approximate classic online Shortest-Job-First (SJF) scheduling, which was designed for a single CPU, in a distributed setting, without coordinating how the flows of a Coflow are scheduled at individual ports. As a result, they suffer from two performance drawbacks: (1) the flows of a Coflow may suffer from the out-of-sync problem: they may be scheduled at different times and drift apart, which negatively affects the Coflow completion time (CCT); (2) FIFO scheduling of flows at each port bears no notion of SJF, leading to suboptimal CCT. We propose SAATH, an online Coflow scheduler that overcomes these drawbacks by explicitly exploiting the spatial dimension of Coflows. In SAATH, the global scheduler schedules the flows of a Coflow using an all-or-none policy, which mitigates the out-of-sync problem. To order the Coflows within each queue, SAATH uses a Least-Contention-First (LCoF) policy, which we show extends the gist of SJF to the spatial dimension, complemented with starvation freedom. Our evaluation using an Azure testbed and simulations of two production cluster traces shows that, compared to Aalo, SAATH reduces the CCT in the median (P90) case by 1.53x (4.5x) and 1.42x (37x), respectively.
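
The two ideas named in the abstract, all-or-none scheduling and Least-Contention-First ordering, can be illustrated with a small sketch. This is not the authors' implementation: the `Coflow` class, the `contention` metric (here assumed to be the number of other Coflows that share at least one port), and the `schedule_round` function are hypothetical names chosen for illustration, and the sketch leaves out priority queues, flow sizes, and starvation freedom.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coflow:
    name: str
    ports: frozenset  # ports touched by the Coflow's flows (hypothetical model)


def contention(c, coflows):
    """Assumed metric: how many other Coflows share at least one port with c."""
    return sum(1 for o in coflows if o is not c and c.ports & o.ports)


def schedule_round(coflows):
    """Pick Coflows for one round.

    Least contention first: Coflows are visited in increasing contention
    (ties broken by name). All-or-none: a Coflow is admitted only if every
    port it needs is still free; otherwise none of its flows are scheduled.
    """
    order = sorted(coflows, key=lambda c: (contention(c, coflows), c.name))
    busy = set()
    chosen = []
    for c in order:
        if c.ports.isdisjoint(busy):
            busy |= c.ports
            chosen.append(c.name)
    return chosen


coflows = [
    Coflow("A", frozenset({"p1", "p2", "p3"})),  # blocks B, C, and D
    Coflow("B", frozenset({"p1"})),
    Coflow("C", frozenset({"p2"})),
    Coflow("D", frozenset({"p3", "p4"})),
]

print(schedule_round(coflows))  # ['B', 'C', 'D']
```

In this toy input, Coflow A contends with three other Coflows, so it is ordered last and waits, while B, C, and D each run with all of their ports at once. Scheduling A first would instead have blocked all three.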


research
08/25/2021

A Case for Sampling Based Learning Techniques in Coflow Scheduling

Coflow scheduling improves data-intensive application performance by imp...
research
01/22/2020

To schedule or not to schedule: when no-scheduling can beat the best-known flow scheduling algorithm in datacenter networks

Conventional wisdom for minimizing the average flow completion time (AFC...
research
08/24/2021

The Case for Task Sampling based Learning for Cluster Job Scheduling

The ability to accurately estimate job runtime properties allows a sched...
research
12/17/2018

Coflow Scheduling in Data Centers: Routing and Bandwidth Allocation

In distributed computing frameworks like MapReduce, Spark, and Dryad, a ...
research
08/07/2019

Redundancy Scheduling in Systems with Bi-Modal Job Service Time Distribution

Queuing systems with redundant requests have drawn great attention becau...
research
07/18/2018

HyLine: a Simple and Practical Flow Scheduling for Commodity Datacenters

Today's datacenter networks (DCNs) have been built upon multipath topolo...
