Cut to Fit: Tailoring the Partitioning to the Computation

04/20/2018
by Iacovos Kolokasis, et al.

Social graph analytics applications are very often built on off-the-shelf analytics frameworks. These frameworks, however, are profiled and optimized for the general case and must perform acceptably on all kinds of graphs. This paper investigates how knowledge of the application and the dataset can help optimize performance with minimal effort. We concentrate on the impact of partitioning strategies on the performance of computations on social graphs. We evaluate six graph partitioning algorithms on a set of six social graphs, using four standard graph algorithms and measuring a set of five partitioning metrics. We analyze the performance of each partitioning strategy with respect to (i) the properties of the graph dataset, (ii) each analytics computation, and (iii) the number of partitions. We find that communication cost is the best predictor of performance for most, but not all, analytics computations. We also find that the best partitioning strategy for one kind of algorithm may not be the best for another, and that optimizing for the general case of all algorithms may not select the optimal partitioning strategy for a given graph algorithm. We conclude with insights on selecting the right data partitioning strategy, whose impact on the performance of large graph analytics computations is significant; certainly enough to warrant tailoring the partitioning strategy to the computation and to the dataset.
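To make the idea of a partitioning metric concrete, the following is a minimal sketch of how metrics such as balance and communication cost could be computed for an edge partitioning. It is not the paper's implementation: the source-hash strategy, the function names `hash_edge_partition` and `partition_metrics`, and the metric definitions themselves (in particular, communication cost taken as the total number of copies of vertices that appear in more than one partition) are assumptions chosen for illustration.

```python
from collections import defaultdict


def hash_edge_partition(edges, num_partitions):
    """Assign each edge to a partition by hashing its source vertex id.

    Illustrative strategy only; assumes integer vertex ids.
    """
    return {edge: edge[0] % num_partitions for edge in edges}


def partition_metrics(assignment, num_partitions):
    """Compute simple quality metrics for an edge -> partition assignment.

    The definitions below are assumptions for this sketch, not the
    paper's exact formulas. Assumes at least one edge.
    """
    sizes = [0] * num_partitions      # number of edges per partition
    replicas = defaultdict(set)       # vertex -> partitions holding a copy

    for (src, dst), part in assignment.items():
        sizes[part] += 1
        replicas[src].add(part)
        replicas[dst].add(part)

    mean_size = sum(sizes) / num_partitions
    # Vertices whose edges span more than one partition.
    cut_vertices = [v for v, parts in replicas.items() if len(parts) > 1]

    return {
        # Largest partition relative to the mean; 1.0 is perfectly balanced.
        "balance": max(sizes) / mean_size,
        "cut_vertices": len(cut_vertices),
        # Assumed proxy: every copy of a cut vertex must be kept in sync.
        "comm_cost": sum(len(replicas[v]) for v in cut_vertices),
        # Average number of copies per vertex.
        "replication_factor": sum(len(p) for p in replicas.values()) / len(replicas),
    }


# Tiny hypothetical graph: five directed edges over four vertices.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0)]
metrics = partition_metrics(hash_edge_partition(edges, 2), 2)
print(metrics)
```

Swapping in a different assignment function and comparing the resulting dictionaries is one way to see how a strategy trades balance against communication on a given graph.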

research
01/03/2022

Clustering-based Partitioning for Large Web Graphs

Graph partitioning plays a vital role in distributed large-scale web grap...
research
09/09/2022

Machine Learning-based Selection of Graph Partitioning Strategy Using the Characteristics of Graph Data and Algorithm

Analyzing large graph data is an essential part of many modern applicati...
research
03/02/2019

GAP: Generalizable Approximate Graph Partitioning Framework

Graph partitioning is the problem of dividing the nodes of a graph into ...
research
06/30/2020

Lachesis: Automated Generation of Persistent Partitionings for UDF-Centric Analytics

Persistent partitioning is effective in avoiding expensive shuffling ope...
research
04/01/2020

Assessing Impact of Data Partitioning for Approximate Memory in C/C++ Code

Approximate memory is a technique to mitigate the performance gap betwee...
research
07/06/2020

Prioritized Restreaming Algorithms for Balanced Graph Partitioning

Balanced graph partitioning is a critical step for many large-scale dist...
research
09/26/2019

Heuristics for Symmetric Rectilinear Matrix Partitioning

Partitioning sparse matrices and graphs is a common and important proble...
