Do the Hard Stuff First: Scheduling Dependent Computations in Data-Analytics Clusters

by Robert Grandl, et al.

We present a scheduler that improves cluster utilization and job completion times by packing tasks that have multi-resource requirements and inter-dependencies. While the problem is algorithmically very hard, we achieve near-optimality on the job DAGs that appear in production clusters at a large enterprise and in benchmarks such as TPC-DS. A key insight is that carefully handling the long-running tasks and those with tough-to-pack resource needs will produce good-enough schedules. However, which subset of tasks to treat carefully is not clear (and is intractable to discover). Hence, we offer a search procedure that evaluates various possibilities and outputs a preferred schedule order over tasks. An online component enforces the schedule orders desired by the various jobs running on the cluster. In addition, it packs tasks, overbooks the fungible resources, and guarantees bounded unfairness for a variety of desirable fairness schemes. Relative to state-of-the-art schedulers, we speed up 50
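To make the "hard stuff first" intuition concrete, here is a minimal sketch in Python. It is not the search procedure or the online component described in the abstract; it is a plain greedy list scheduler on a single machine, swapped in for illustration. The `Task` type, the `hardness` score (duration times largest resource fraction), and the toy workload are all hypothetical. The sketch only shows that placing a long task early, and packing short dependent tasks around it, can shorten the makespan compared with running short tasks first.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    duration: float
    demand: tuple      # (cpu, mem) as fractions of machine capacity (assumed units)
    deps: tuple = ()   # names of tasks that must finish before this one starts


def hardness(task):
    # Hypothetical score: long tasks with large resource demands rank highest.
    return task.duration * max(task.demand)


def schedule(tasks, capacity=(1.0, 1.0), priority=hardness):
    """Greedy list scheduling: among ready tasks, start the highest-priority
    ones that fit in the free resources. Returns (start_times, makespan)."""
    by_name = {t.name: t for t in tasks}
    pending = set(by_name)
    free = list(capacity)
    running = []           # min-heap of (finish_time, task_name)
    done, start, now = set(), {}, 0.0
    while pending or running:
        ready = sorted(
            (by_name[n] for n in pending if set(by_name[n].deps) <= done),
            key=lambda t: (-priority(t), t.name),
        )
        for t in ready:
            if all(d <= f + 1e-9 for d, f in zip(t.demand, free)):
                free = [f - d for f, d in zip(free, t.demand)]
                start[t.name] = now
                heapq.heappush(running, (now + t.duration, t.name))
                pending.discard(t.name)
        if not running:
            raise ValueError("unschedulable: dependency cycle or oversized task")
        # Advance time to the next task completion and release its resources.
        now, finished = heapq.heappop(running)
        done.add(finished)
        free = [f + d for f, d in zip(free, by_name[finished].demand)]
    return start, now


# Toy DAG: one long task plus two short chains (S1 -> S2, S3 -> S4).
tasks = [
    Task("L", 10.0, (0.5, 0.5)),
    Task("S1", 1.0, (0.5, 0.5)),
    Task("S2", 1.0, (0.5, 0.5), deps=("S1",)),
    Task("S3", 1.0, (0.5, 0.5)),
    Task("S4", 1.0, (0.5, 0.5), deps=("S3",)),
]

hard_start, hard_makespan = schedule(tasks)
short_start, short_makespan = schedule(tasks, priority=lambda t: -t.duration)
print(hard_makespan, short_makespan)
```

In this toy workload the hardness-first order starts `L` immediately and fits the short chains beside it, while the shortest-first order delays `L` until the short tasks have drained. A real cluster scheduler would also have to handle many machines, many jobs, and fairness, which this sketch ignores.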



PingAn: An Insurance Scheme for Job Acceleration in Geo-distributed Big Data Analytics System

Geo-distributed data analysis in a cloud-edge system is emerging as a da...

Two stage cluster for resource optimization with Apache Mesos

As resource estimation for jobs is difficult, users often overestimate t...

Linear Time Algorithms for Multiple Cluster Scheduling and Multiple Strip Packing

We study the Multiple Cluster Scheduling problem and the Multiple Strip ...

The Case for Task Sampling based Learning for Cluster Job Scheduling

The ability to accurately estimate job runtime properties allows a sched...

A Sum-of-Ratios Multi-Dimensional-Knapsack Decomposition for DNN Resource Scheduling

In recent years, to sustain the resource-intensive computational needs f...

DeepPlace: Learning to Place Applications in Multi-Tenant Clusters

Large multi-tenant production clusters often have to handle a variety of...

Towards Collaborative Optimization of Cluster Configurations for Distributed Dataflow Jobs

Analyzing large datasets with distributed dataflow systems requires the ...