An I/O-Efficient Disk-based Graph System for Scalable Second-Order Random Walk of Large Graphs

03/30/2022
by   Hongzheng Li, et al.

Random walks are widely used in graph analysis tasks, most commonly in their first-order form. However, as a simplification of real-world problems, the first-order random walk models higher-order structures in the data poorly. Recently, applications based on the second-order random walk (e.g., Node2vec, Second-order PageRank) have attracted growing interest. Because of the complexity of second-order random walk models and the limits of memory, such applications do not scale on a single machine. Existing disk-based graph systems are well suited only to first-order random walk models and incur expensive disk I/Os when executing second-order random walks. This paper introduces GraSorw, an I/O-efficient disk-based graph system for scalable second-order random walks on large graphs. First, to eliminate massive light vertex I/Os, we develop a bi-block execution engine that converts random I/Os into sequential I/Os by applying a new triangular bi-block scheduling strategy, bucket-based walk management, and skewed walk storage. Second, to improve I/O utilization, we design a learning-based block loading model that leverages the advantages of both the full-load and on-demand load methods. Finally, we conduct extensive experiments on six large real datasets as well as several synthetic datasets. The empirical results demonstrate that GraSorw reduces the end-to-end time cost of popular tasks by more than one order of magnitude compared to existing disk-based graph systems.
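To make the I/O problem concrete, here is a minimal sketch of one second-order step using the standard Node2vec transition weights. It is an illustration only, not GraSorw's implementation: the function names, the in-memory `adj` dictionary, and the parameters `p` and `q` (Node2vec's return and in-out parameters) are assumptions made for this example. The point is that choosing the next vertex requires the adjacency list of the previous vertex as well as the current one, and on disk those two lists may sit in different blocks.

```python
import random

def second_order_weights(adj, prev, cur, p, q):
    """Unnormalized Node2vec weights for each neighbor of `cur`.

    `adj` is a hypothetical in-memory adjacency dict; a disk-based system
    would have to fetch adj[prev] and adj[cur] from storage instead.
    """
    prev_neighbors = set(adj[prev])  # second-order: the previous vertex's list is needed too
    weights = []
    for nxt in adj[cur]:
        if nxt == prev:
            weights.append(1.0 / p)   # step back to the previous vertex
        elif nxt in prev_neighbors:
            weights.append(1.0)       # stay at distance 1 from the previous vertex
        else:
            weights.append(1.0 / q)   # move to distance 2 from the previous vertex
    return weights

def second_order_step(adj, prev, cur, p=1.0, q=1.0, rng=random):
    """Sample the next vertex of a walk that reached `cur` from `prev`."""
    weights = second_order_weights(adj, prev, cur, p, q)
    return rng.choices(adj[cur], weights=weights, k=1)[0]

# Small undirected example graph (hypothetical data).
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
next_vertex = second_order_step(adj, prev=0, cur=1, p=2.0, q=4.0)
```

A first-order walk would only read `adj[cur]`; the extra lookup of `adj[prev]` at every step is what makes second-order walks expensive when the graph does not fit in memory.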


