Kaleido: An Efficient Out-of-core Graph Mining System on A Single Machine

05/23/2019
by   Cheng Zhao, et al.
Graph mining is one of the most important categories of graph algorithms. However, exploring the subgraphs of an input graph produces a huge amount of intermediate data. The 'think like a vertex' programming paradigm, pioneered by Pregel, was designed for graph computation problems like PageRank and cannot readily formulate mining problems. Existing mining systems like Arabesque and RStream need large amounts of computing and memory resources. In this paper, we present Kaleido, an efficient single-machine, out-of-core graph mining system that treats disks as an extension of memory. Kaleido treats the intermediate data in graph mining tasks as a tensor and adopts a succinct data structure for it. Kaleido uses the eigenvalue of the adjacency matrix of a subgraph to solve subgraph isomorphism problems efficiently, under the acceptable constraint that a subgraph has fewer than 9 vertices. To store large intermediate data, Kaleido implements half-memory-half-disk storage. Compared with two state-of-the-art mining systems, Arabesque and RStream, Kaleido outperforms them by a geometric mean of 12.3× and 40.0×, respectively.
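
The abstract does not say how the eigenvalue test is implemented, so the following is only a minimal sketch of the general idea of a spectral fingerprint, not Kaleido's actual code. It assumes small, unlabeled, undirected subgraphs given as 0/1 adjacency matrices, and it compares characteristic polynomials (computed exactly with the Faddeev-LeVerrier recurrence), which is equivalent to comparing the full multiset of eigenvalues. The names `adjacency`, `char_poly`, and `same_spectrum` are hypothetical.

```python
def adjacency(n, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected graph."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1
    return a


def matmul(a, b):
    """Plain integer matrix product for small square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(a):
    """Coefficients of det(xI - A), highest degree first.

    Uses the Faddeev-LeVerrier recurrence; for integer matrices every
    division below is exact, so integer arithmetic suffices.
    """
    n = len(a)
    coeffs = [1]
    m = [[0] * n for _ in range(n)]
    for k in range(1, n + 1):
        m = matmul(a, m)                 # M_k = A * M_{k-1} + c * I
        for i in range(n):
            m[i][i] += coeffs[-1]
        am = matmul(a, m)
        trace = sum(am[i][i] for i in range(n))
        coeffs.append(-trace // k)       # next coefficient
    return tuple(coeffs)


def same_spectrum(a, b):
    """True if both graphs have the same multiset of eigenvalues."""
    return len(a) == len(b) and char_poly(a) == char_poly(b)


# A triangle written with two different vertex orderings matches itself,
# and differs from a 3-vertex path.
tri_a = adjacency(3, [(0, 1), (1, 2), (2, 0)])
tri_b = adjacency(3, [(2, 0), (0, 1), (1, 2)])
path3 = adjacency(3, [(0, 1), (1, 2)])
```

The fingerprint is invariant under vertex relabeling, so isomorphic subgraphs always match. The converse does not hold in general: the star on 5 vertices and a 4-cycle plus an isolated vertex share the same spectrum without being isomorphic. An equal spectrum is therefore a necessary condition only, and how Kaleido turns the test into a correct one under its fewer-than-9-vertices constraint is not described in the abstract.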

research
04/06/2020

Peregrine: A Pattern-Aware Graph Mining System

Graph mining workloads aim to extract structural properties of a graph b...
research
07/03/2020

Finding Densest k-Connected Subgraphs

Dense subgraph discovery is an important graph-mining primitive with a v...
research
07/24/2018

An Efficient System for Subgraph Discovery

Subgraph discovery in a single data graph---finding subsets of vertices ...
research
12/08/2022

Efficient Strategies for Graph Pattern Mining Algorithms on GPUs

Graph Pattern Mining (GPM) is an important, rapidly evolving, and comput...
research
11/16/2019

Pangolin: An Efficient and Flexible Graph Mining System on CPU and GPU

There is growing interest in graph mining algorithms such as motif count...
research
01/23/2019

Fast and Robust Distributed Subgraph Enumeration

We study the classic subgraph enumeration problem under distributed sett...
research
02/14/2022

Gain-loss ratio of storing intermediate data from workflows

Sequentially, the systematic processing of a significant amount of data ...
