TurboGraph++: A Scalable and Fast Graph Analytics System

11/17/2020
by   seongyun, et al.
0

Existing distributed graph analytics systems are categorized into two main groups: those that focus on efficiency with a risk of out-of memory error and those that focus on scale-up with a fixed memory budget and a sacrifice in performance. While the former group keeps a partitioned graph resident in memory of each machine and uses an in-memory processing technique, the latter stores the partitioned graph in external memory of each machine and exploits a streaming processing technique. Gemini and Chaos are the state-of-the-art distributed graph systems in each group, respectively. We present TurboGraph++, a scalable and fast graph analytics system which efficiently processes large graphs by exploiting external memory for scale-up without compromising efficiency. First, TurboGraph++ provides a new graph processing abstraction for efficiently supporting neighborhood analytics that requires processing multi-hop neighborhoods of vertices, such as triangle counting and local clustering coefficient computation, with a fixed memory budget. Second, TurboGraph++ provides a balanced and buffer-aware partitioning scheme for ensuring balanced workloads across machines with reasonable cost. Lastly, TurboGraph++ leverages three-level parallel and overlapping processing for fully utilizing three hardware resources, CPU, disk, and network, in a cluster. Extensive experiments show that TurboGraph++ is designed to scale well to very large graphs, like Chaos, while its performance is comparable to Gemini.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro