Kudu: An Efficient and Scalable Distributed Graph Pattern Mining Engine

05/08/2021
by Jingji Chen, et al.

This paper proposes Kudu, a general distributed execution engine with a well-defined abstraction that can be integrated with existing single-machine graph pattern mining (GPM) systems. With this approach, the programming interfaces and code written for existing GPM systems do not change, and Kudu transparently enables distributed execution. The key novelty is the extendable embedding, which can express pattern enumeration algorithms and enables fine-grained task scheduling. To make scheduling efficient, we propose a novel BFS-DFS hybrid exploration method that generates sufficient concurrent tasks without incurring high memory consumption. Kudu's computation and communication are further optimized with several techniques. We implemented two scalable distributed GPM systems by porting Automine and GraphPi to Kudu. Our evaluation shows that Kudu-based systems outperform state-of-the-art graph partition-based GPM systems by up to three orders of magnitude, achieve similar or even better performance than the fastest graph replication-based systems, and scale to large datasets with graph partitioning.
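
The abstract does not spell out how the BFS-DFS hybrid exploration works, so the following is only a minimal single-machine sketch of the general idea, not Kudu's implementation. It assumes a common reading of the technique: partial embeddings are extended in bounded-size chunks (the BFS part, giving a batch of independent tasks), and each full chunk is explored to the next level before more embeddings are generated (the DFS part, bounding memory). The names `extend`, `count_cliques_hybrid`, and `chunk_size` are hypothetical, and clique counting is used only as a simple stand-in pattern.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]      # adjacency sets of an undirected graph
Embedding = Tuple[int, ...]      # partial match: vertices in increasing order


def extend(graph: Graph, emb: Embedding) -> List[Embedding]:
    """Extend a partial clique by one vertex adjacent to all its vertices.

    Only vertices larger than the last one are added, so each clique is
    enumerated exactly once (a simple symmetry-breaking rule).
    """
    candidates = set(graph[emb[0]])
    for v in emb[1:]:
        candidates &= graph[v]
    return [emb + (u,) for u in sorted(candidates) if u > emb[-1]]


def count_cliques_hybrid(graph: Graph, k: int, chunk_size: int) -> int:
    """Count k-cliques with a chunked BFS-DFS hybrid exploration (sketch).

    At most one buffer of `chunk_size` embeddings is alive per level, so
    memory is bounded by roughly k * chunk_size embeddings, while every
    chunk is a batch of independent tasks that could run concurrently.
    """

    def explore(chunk: List[Embedding], level: int) -> int:
        if level == k:
            return len(chunk)          # every embedding is a full match
        total = 0
        buffer: List[Embedding] = []
        for emb in chunk:              # BFS within the chunk
            for child in extend(graph, emb):
                buffer.append(child)
                if len(buffer) == chunk_size:
                    # DFS step: go deeper before generating more children
                    total += explore(buffer, level + 1)
                    buffer = []
        if buffer:
            total += explore(buffer, level + 1)
        return total

    roots = [(v,) for v in sorted(graph)]
    total = 0
    for i in range(0, len(roots), chunk_size):
        total += explore(roots[i:i + chunk_size], 1)
    return total
```

In this sketch, `chunk_size=1` degenerates to plain DFS and a very large `chunk_size` degenerates to level-by-level BFS; intermediate values trade available parallelism against memory. The count itself does not depend on the chunk size.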

research
01/18/2021

DFOGraph: An I/O- and Communication-Efficient System for Distributed Fully-out-of-Core Graph Processing

With the magnitude of graph-structured data continually increasing, grap...
research
04/06/2020

Peregrine: A Pattern-Aware Graph Mining System

Graph mining workloads aim to extract structural properties of a graph b...
research
08/04/2021

UniGPS: A Unified Programming Framework for Distributed Graph Processing

The industry and academia have proposed many distributed graph processin...
research
08/21/2020

DwarvesGraph: A High-Performance Graph Mining System with Pattern Decomposition

Graph mining tasks, which focus on extracting structural information fro...
research
03/02/2020

Graph3S: A Simple, Speedy and Scalable Distributed Graph Processing System

Graph is a ubiquitous structure in many domains. The rapidly increasing ...
research
09/14/2018

Graph Pattern Mining and Learning through User-defined Relations (Extended Version)

In this work we propose R-GPM, a parallel computing framework for graph ...
research
06/21/2018

Experimental Analysis of Distributed Graph Systems

This paper evaluates eight parallel graph processing systems: Hadoop, Ha...