Efficient Two-Level Scheduling for Concurrent Graph Processing

by Jin Zhao, et al.

With the rapidly growing demand for graph processing in real-world settings, graph processing systems must efficiently handle massive numbers of concurrent jobs. Although existing work can efficiently handle a single graph processing job, it incurs substantial redundant memory accesses under concurrency because it ignores the correlations among the jobs' data accesses. Motivated by this observation, we propose a two-level scheduling strategy in this paper, which improves data access efficiency and accelerates the convergence of concurrent jobs. First, correlations-aware job scheduling allows concurrent jobs to process the same graph data in cache, which fundamentally alleviates the problem of the CPU repeatedly accessing the same graph data in memory. Second, multiple priority-based data scheduling supports prioritized iteration for concurrent jobs, based on a global priority generated from the individual priority of each job. In addition, we adopt block priority instead of fine-grained priority to schedule graph data, which decreases the computation cost. In particular, two-level scheduling significantly advances over the state of the art because it works in the interlayer between data and systems.
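
The two levels described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each job reports one priority per graph block and that the global block priority is the sum of those per-job priorities; the abstract does not say how individual priorities are combined. The function names, job names, and block identifiers are all hypothetical.

```python
from collections import defaultdict


def global_block_priorities(job_priorities):
    """Combine per-job block priorities into one global priority per block.

    Assumption: priorities are combined by summation. The abstract only
    says the global priority is generated from each job's own priority.
    """
    totals = defaultdict(float)
    for priorities in job_priorities.values():
        for block, priority in priorities.items():
            totals[block] += priority
    return dict(totals)


def schedule(job_priorities):
    """Return (block, jobs) pairs in descending global priority.

    Data level: blocks are ordered by global priority (ties broken by name).
    Job level: every job that needs a block is grouped with it, so the
    block can be loaded once and processed by all of those jobs together.
    """
    totals = global_block_priorities(job_priorities)
    order = sorted(totals, key=lambda block: (-totals[block], block))
    plan = []
    for block in order:
        jobs = sorted(job for job, p in job_priorities.items() if block in p)
        plan.append((block, jobs))
    return plan


if __name__ == "__main__":
    # Hypothetical jobs and block priorities, for illustration only.
    example = {
        "pagerank": {"b0": 0.9, "b1": 0.2},
        "sssp": {"b1": 0.5, "b2": 0.4},
        "bfs": {"b0": 0.3, "b1": 0.6},
    }
    for block, jobs in schedule(example):
        print(block, jobs)
```

Working at block granularity means the sort covers only as many entries as there are blocks rather than vertices, which is where the reduced computation cost mentioned in the abstract would come from in a sketch like this one.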



SEH: Size Estimate Hedging for Single-Server Queues

For a single server system, Shortest Remaining Processing Time (SRPT) is...

A Scalable Deep Reinforcement Learning Model for Online Scheduling Coflows of Multi-Stage Jobs for High Performance Computing

Coflow is a recently proposed networking abstraction to help improve the...

Workflow Scheduling in the Cloud with Weighted Upward-rank Priority Scheme Using Random Walk and Uniform Spare Budget Splitting

We study a difficult problem of how to schedule complex workflows with p...

Flow-time Optimization For Concurrent Open-Shop and Precedence Constrained Scheduling Models

Scheduling a set of jobs over a collection of machines is a fundamental ...

Differential Approximation and Sprinting for Multi-Priority Big Data Engines

Today's big data clusters based on the MapReduce paradigm are capable of...

Hybrid Job-driven Scheduling for Virtual MapReduce Clusters

It is cost-efficient for a tenant with a limited budget to establish a v...

A Case Study: Using Genetic Algorithm for Job Scheduling Problem

Nowadays, DevOps pipelines of huge projects are getting more and more co...