PartitionedVC: Partitioned External Memory Graph Analytics Framework for SSDs

by Kiran Kumar Matam, et al.
University of Southern California

Graph analytics is at the heart of a broad range of applications such as drug discovery, page ranking, transportation systems, and recommendation systems. When the graph size exceeds the memory size, out-of-core graph processing is needed. For the widely used external memory graph processing systems, accessing storage becomes the bottleneck. We observe that nearly all graph algorithms have a dynamically varying number of active vertices that must be processed in each iteration. However, existing graph processing frameworks, such as GraphChi, load the entire graph in each iteration even if only a small fraction of the graph is active. This limitation is due to the structure of the data storage used by these systems. In this work, we propose a compressed sparse row (CSR) based graph storage that is more amenable to selectively loading only the few active vertices in each iteration. However, CSR-based graph processing suffers from random update propagation to many target vertices. To address this challenge, we propose a multi-log update mechanism that logs updates separately rather than directly updating the active edge in a graph. Our proposed multi-log system maintains a separate log for each vertex interval. This separation lets us process each vertex interval efficiently by loading only the corresponding log. Compared with the current state-of-the-art out-of-core graph processing framework, our evaluation results show that the PartitionedVC framework improves performance by up to 16.40×, 1.13×, 1.64×, 1.38×, and 2.76× for the widely used breadth-first search, PageRank, community detection, graph coloring, and maximal independent set applications, respectively.
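The combination of CSR storage and per-interval update logs can be illustrated with a small in-memory sketch. This is not the PartitionedVC implementation: the function names (`build_csr`, `bfs_multilog`), the `interval_size` parameter, and the use of Python lists in place of SSD-resident files are all assumptions made for illustration. The sketch runs breadth-first search, touches only the adjacency lists of active vertices, and appends each outgoing update to the log of the interval that owns the target vertex.

```python
def build_csr(num_vertices, edges):
    """Build CSR arrays (row_ptr, col_idx) from a list of (src, dst) edges."""
    counts = [0] * num_vertices
    for src, _ in edges:
        counts[src] += 1
    row_ptr = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        row_ptr[v + 1] = row_ptr[v] + counts[v]
    col_idx = [0] * len(edges)
    next_slot = row_ptr[:-1].copy()  # next free position in each row
    for src, dst in edges:
        col_idx[next_slot[src]] = dst
        next_slot[src] += 1
    return row_ptr, col_idx


def bfs_multilog(num_vertices, edges, source, interval_size):
    """BFS where updates are appended to one log per vertex interval.

    Hypothetical sketch: logs are in-memory lists here, standing in for
    the per-interval logs a real out-of-core system would keep on storage.
    """
    row_ptr, col_idx = build_csr(num_vertices, edges)
    dist = [None] * num_vertices
    num_intervals = (num_vertices + interval_size - 1) // interval_size

    # One log per vertex interval; each entry is (target, proposed_distance).
    logs = [[] for _ in range(num_intervals)]
    logs[source // interval_size].append((source, 0))

    while any(logs):
        next_logs = [[] for _ in range(num_intervals)]
        for log in logs:
            if not log:
                continue  # no updates for this interval, so nothing to load
            # Apply logged updates; a vertex becomes active on its first visit.
            active = []
            for target, d in log:
                if dist[target] is None:
                    dist[target] = d
                    active.append(target)
            # Read only the CSR rows of active vertices and log their updates.
            for v in active:
                for e in range(row_ptr[v], row_ptr[v + 1]):
                    dst = col_idx[e]
                    next_logs[dst // interval_size].append((dst, dist[v] + 1))
        logs = next_logs
    return dist


if __name__ == "__main__":
    example_edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
    print(bfs_multilog(6, example_edges, source=0, interval_size=2))
```

In this sketch an interval whose log is empty is skipped entirely, which mirrors the idea that only intervals with pending updates need to be loaded in a given iteration.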




BigSparse: High-performance external graph analytics

We present BigSparse, a fully external graph analytics system that picks...

Fully Dynamic Maximal Independent Set with Polylogarithmic Update Time

We present the first algorithm for maintaining a maximal independent set...

Dynamic Maximal Independent Set

Given a stream S of insertions and deletions of edges of an underlying g...

TurboGraph++: A Scalable and Fast Graph Analytics System

Existing distributed graph analytics systems are categorized into two ma...

GraphMP: I/O-Efficient Big Graph Analytics on a Single Commodity Machine

Recent studies showed that single-machine graph processing systems can b...

A Structure-aware Approach for Efficient Graph Processing

With the advent of the big data, graph are processed in an iterative man...

Simplified Algorithms for Order-Based Core Maintenance

Graph analytics attract much attention from both research and industry c...
