Scheduling of Graph Queries: Controlling Intra- and Inter-query Parallelism for a High System Throughput

10/20/2021
by Matthias Hauck, et al.

The vast amounts of data used in social, business, and traffic networks, in biology, and in other natural sciences are often managed as graph-based data sets, ranging from a few thousand to billions of vertices and trillions of edges. Typical applications that use such data execute either one or a few complex queries, or many small queries at the same time, interactively or as batch jobs. Graph processing is also inherently complex: data sets can differ substantially (scale-free vs. constant degree), and algorithms exhibit diverse behavior (computational intensity, local or global, push- or pull-based). This work addresses multi-query execution by automatically controlling the degree of parallelization, with the overall objectives of high system utilization, low synchronization cost, and highly efficient concurrent execution. The underlying concept is threefold: (1) sampling is used to determine graph statistics, (2) parallelization constraints are derived from algorithm and system properties, and (3) suitable work packages are generated based on the previous two aspects. We evaluate the proposed concept using different algorithms on synthetic and real-world data sets, with up to 16 concurrent sessions (queries). The results demonstrate robust performance across these configurations; in particular, performance is always close to, or even slightly ahead of, that of manually optimized implementations. Moreover, the concept matches manually optimized implementations under extreme configurations, which require either full parallelization (few large queries) or completely sequential execution (many small queries), which shows that it incurs particularly low overhead.
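The three steps named in the abstract (sampling, parallelization constraints, work packages) can be illustrated with a minimal Python sketch. It is not the authors' implementation: every function name, the adjacency-list representation, the fair-share rule `cores // sessions`, and the `min_edges_per_task` threshold are assumptions chosen only to make the idea concrete.

```python
import random
from typing import Dict, List, Tuple

# Assumed representation: adjacency list, vertex id -> list of neighbours.
Graph = Dict[int, List[int]]


def sample_degree_stats(graph: Graph, sample_size: int, seed: int = 0) -> Tuple[float, int]:
    """Step 1 (hypothetical): estimate mean and max degree from a vertex sample."""
    rng = random.Random(seed)
    vertices = list(graph)
    sample = rng.sample(vertices, min(sample_size, len(vertices)))
    degrees = [len(graph[v]) for v in sample]
    if not degrees:
        return 0.0, 0
    return sum(degrees) / len(degrees), max(degrees)


def choose_parallelism(estimated_edges: float, cores: int,
                       sessions: int, min_edges_per_task: int) -> int:
    """Step 2 (hypothetical): bound the thread count for one query.

    The bound is the smaller of the query's fair share of the cores and
    the number of tasks that still carry enough work to be worthwhile.
    """
    fair_share = max(1, cores // max(1, sessions))
    useful = max(1, int(estimated_edges // min_edges_per_task))
    return min(fair_share, useful)


def make_work_packages(graph: Graph, n_packages: int) -> List[List[int]]:
    """Step 3 (hypothetical): split vertices into packages of similar edge count."""
    vertices = sorted(graph)
    total_edges = sum(len(graph[v]) for v in vertices)
    target = max(1.0, total_edges / max(1, n_packages))
    packages: List[List[int]] = []
    current: List[int] = []
    load = 0
    for v in vertices:
        current.append(v)
        load += len(graph[v])
        # Close the package once it reaches the target, keeping one slot
        # free so that the remaining vertices land in the last package.
        if load >= target and len(packages) < n_packages - 1:
            packages.append(current)
            current, load = [], 0
    if current:
        packages.append(current)
    return packages


if __name__ == "__main__":
    # Small star graph: vertex 0 is a hub, vertices 1..8 are leaves.
    star: Graph = {0: list(range(1, 9)), **{v: [0] for v in range(1, 9)}}
    mean_deg, max_deg = sample_degree_stats(star, sample_size=9)
    threads = choose_parallelism(mean_deg * len(star), cores=16,
                                 sessions=4, min_edges_per_task=4)
    print(mean_deg, max_deg, threads, make_work_packages(star, threads))
```

In this sketch the two extremes mentioned in the abstract fall out of the same rule: with many concurrent sessions the fair share shrinks to one thread (sequential execution), while a single large query is allowed to use all cores.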


