Terra: Scalable Cross-Layer GDA Optimizations

04/17/2019
by Jie You, et al.

Geo-distributed analytics (GDA) frameworks transfer large datasets over the wide-area network (WAN). Yet existing frameworks often ignore the WAN topology. This disconnect between WAN-bound applications and the WAN itself results in missed opportunities for cross-layer optimizations. In this paper, we present Terra to bridge this gap. Instead of decoupled WAN routing and GDA transfer scheduling, Terra applies scalable cross-layer optimizations to minimize WAN transfer times for GDA jobs. We present a two-pronged approach: (i) a scalable algorithm for joint routing and scheduling to make fast decisions; and (ii) a scalable, overlay-based enforcement mechanism that avoids expensive switch rule updates in the WAN. Together, they enable Terra to react quickly and in an application-aware manner to WAN uncertainties such as large bandwidth fluctuations and failures. Integration with the FloodLight SDN controller and Apache YARN, and evaluation on 4 workloads and 3 WAN topologies, show that Terra improves the average completion times of GDA jobs by 1.55x-3.43x. GDA jobs running with Terra meet 2.82x-4.29x more deadlines and can quickly react to WAN-level events in an application-aware manner.


research
08/25/2021

Node-Based Job Scheduling for Large Scale Simulations of Short Running Jobs

Diverse workloads such as interactive supercomputing, big data analysis,...
research
04/09/2018

Prompt Scheduling for Selfish Agents

We give a prompt online mechanism for minimizing the sum of [weighted] c...
research
11/11/2018

On SDN-Enabled Online and Dynamic Bandwidth Allocation for Stream Analytics

Data communication in cloud-based distributed stream data analytics ofte...
research
05/02/2018

GraphIt - A High-Performance DSL for Graph Analytics

The performance bottlenecks of graph applications depend not only on the...
research
09/03/2022

HammingMesh: A Network Topology for Large-Scale Deep Learning

Numerous microarchitectural optimizations unlocked tremendous processing...
research
05/22/2018

DRESS: Dynamic RESource-reservation Scheme for Congested Data-intensive Computing Platforms

In the past few years, we have envisioned an increasing number of busine...
research
05/22/2018

Cache-based Multi-query Optimization for Data-intensive Scalable Computing Frameworks

In modern large-scale distributed systems, analytics jobs submitted by v...
