TurboGraph++: A Scalable and Fast Graph Analytics System

by seongyun, et al.

Existing distributed graph analytics systems fall into two main groups: those that focus on efficiency at the risk of out-of-memory errors, and those that focus on scale-up with a fixed memory budget at the cost of performance. The former keep a partitioned graph resident in the memory of each machine and use in-memory processing, while the latter store the partitioned graph in the external memory of each machine and exploit streaming processing. Gemini and Chaos are the state-of-the-art distributed graph systems in the former and latter groups, respectively. We present TurboGraph++, a scalable and fast graph analytics system that efficiently processes large graphs by exploiting external memory for scale-up without compromising efficiency. First, TurboGraph++ provides a new graph processing abstraction that efficiently supports, within a fixed memory budget, neighborhood analytics that require processing the multi-hop neighborhoods of vertices, such as triangle counting and local clustering coefficient computation. Second, TurboGraph++ provides a balanced and buffer-aware partitioning scheme that ensures balanced workloads across machines at reasonable cost. Lastly, TurboGraph++ leverages three-level parallel and overlapping processing to fully utilize three hardware resources in a cluster: CPU, disk, and network. Extensive experiments show that TurboGraph++ scales well to very large graphs, like Chaos, while achieving performance comparable to Gemini.
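
To make the neighborhood analytics named in the abstract concrete, here is a minimal in-memory Python sketch of triangle counting and local clustering coefficient computation. It is not TurboGraph++'s abstraction or API: it deliberately ignores the fixed memory budget, external-memory storage, and distribution across machines that the system targets, and the function names build_adjacency, local_triangles, triangle_count, and local_clustering are hypothetical. What it does show is why these workloads are neighborhood-heavy: each vertex must look up the adjacency lists of its neighbors, which reaches into its 2-hop neighborhood.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map (vertex -> set of neighbors).

    Hypothetical helper; self-loops are ignored and duplicate edges collapse.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_triangles(adj):
    """Count the triangles each vertex participates in.

    For every pair of neighbors (a, b) of v, checking whether b is in adj[a]
    touches a's adjacency list, i.e. v's 2-hop neighborhood.
    """
    tri = {v: 0 for v in adj}
    for v, nbrs in adj.items():
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                tri[v] += 1
    return tri


def triangle_count(adj):
    """Total triangles: each triangle is counted once at each of its 3 vertices."""
    return sum(local_triangles(adj).values()) // 3


def local_clustering(adj):
    """Local clustering coefficient: 2 * T(v) / (d(v) * (d(v) - 1)), 0.0 if d(v) < 2."""
    tri = local_triangles(adj)
    lcc = {}
    for v, nbrs in adj.items():
        d = len(nbrs)
        lcc[v] = 0.0 if d < 2 else 2.0 * tri[v] / (d * (d - 1))
    return lcc


if __name__ == "__main__":
    adj = build_adjacency([(0, 1), (1, 2), (2, 0), (2, 3)])
    print(triangle_count(adj))
    print(local_clustering(adj))
```

For the graph with edges (0,1), (1,2), (2,0), and (2,3), triangle_count returns 1, vertices 0 and 1 have a local clustering coefficient of 1.0, vertex 2 has 1/3, and vertex 3 has 0.0. Because the sketch holds the entire adjacency map in memory, it corresponds to the in-memory style of processing described above; a system working under a fixed memory budget would instead have to stream or page these adjacency lists from external memory.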


Hybrid Edge Partitioner: Partitioning Large Power-Law Graphs under Memory Constraints

Distributed systems that manage and process graph-structured data intern...

System G Distributed Graph Database

Motivated by the need to extract knowledge and value from interconnected...

Experimental Analysis of Distributed Graph Systems

This paper evaluates eight parallel graph processing systems: Hadoop, Ha...

BigSparse: High-performance external graph analytics

We present BigSparse, a fully external graph analytics system that picks...

Multi-Dimensional Balanced Graph Partitioning via Projected Gradient Descent

Motivated by performance optimization of large-scale graph processing sy...

PartitionedVC: Partitioned External Memory Graph Analytics Framework for SSDs

Graph analytics are at the heart of a broad range of applications such ...

Graph Sampling with Distributed In-Memory Dataflow Systems

Given a large graph, a graph sample determines a subgraph with similar c...
