GraphVite: A High-Performance CPU-GPU Hybrid System for Node Embedding

03/02/2019
by   Zhaocheng Zhu, et al.

Learning continuous representations of nodes has recently attracted growing interest in both academia and industry, owing to their simplicity and effectiveness in a variety of applications. Most existing node embedding algorithms and systems can process networks with hundreds of thousands or a few million nodes. However, scaling them to networks with tens of millions or even hundreds of millions of nodes remains a challenging problem. In this paper, we propose GraphVite, a high-performance CPU-GPU hybrid system for training node embeddings, by co-optimizing the algorithm and the system. On the CPU end, augmented edge samples are generated in parallel by online random walks on the network and serve as the training data. On the GPU end, a novel parallel negative sampling scheme is proposed to leverage multiple GPUs to train node embeddings simultaneously, without much data transfer or synchronization. Moreover, an efficient collaboration strategy is proposed to further reduce the synchronization cost between CPUs and GPUs. Experiments on multiple real-world networks show that GraphVite is highly efficient. On a single machine with 4 GPUs, it takes only about one minute to train on a network with 1 million nodes and 5 million edges, and around 20 hours on a network with 66 million nodes and 1.8 billion edges. Compared to the current fastest system, GraphVite is about 50 times faster without any sacrifice in performance.
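
To make the CPU-side step concrete, the following is a minimal single-threaded sketch of generating augmented edge samples from random walks. It is an illustration under stated assumptions, not GraphVite's actual implementation: the function names, the adjacency-list input, and the `walk_length` and `distance` parameters are hypothetical, and the real system runs this step in parallel and online.

```python
import random
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]  # assumed input format: node -> list of neighbors


def random_walk(adj: Graph, start: int, length: int, rng: random.Random) -> List[int]:
    """Take up to `length` uniform random steps from `start`."""
    walk = [start]
    for _ in range(length):
        neighbors = adj[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def augmented_edge_samples(
    adj: Graph, walk_length: int, distance: int, rng: random.Random
) -> List[Tuple[int, int]]:
    """Emit (source, target) pairs for nodes at most `distance` steps apart on a walk.

    With distance=1 every pair is an edge of the graph; larger values add
    "augmented" pairs that connect nodes reachable within a few hops.
    """
    samples = []
    for start in adj:  # one walk per node in this sketch
        walk = random_walk(adj, start, walk_length, rng)
        for i, u in enumerate(walk):
            for v in walk[i + 1 : i + 1 + distance]:
                samples.append((u, v))
    return samples


# Toy graph (hypothetical data) used only to exercise the sketch.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
samples = augmented_edge_samples(adj, walk_length=4, distance=2, rng=random.Random(0))
print(len(samples), samples[:5])
```

In a hybrid setup along the lines the abstract describes, batches of such pairs would be handed to the GPU workers as positive training samples, with negative samples drawn on the GPU side.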

research
05/28/2020

A Distributed Multi-GPU System for Large-Scale Node Embedding at Tencent

Scaling node embedding systems to efficiently process networks in real-w...
research
05/18/2021

OpenGraphGym-MG: Using Reinforcement Learning to Solve Large Graph Optimization Problems on MultiGPU Systems

Large scale graph optimization problems arise in many fields. This paper...
research
11/25/2022

HexaMesh: Scaling to Hundreds of Chiplets with an Optimized Chiplet Arrangement

2.5D integration is an important technique to tackle the growing cost of...
research
10/13/2021

Scalable Graph Embedding Learning On A Single GPU

Graph embedding techniques have attracted growing interest since they co...
research
02/24/2020

Optimizing High Performance Markov Clustering for Pre-Exascale Architectures

HipMCL is a high-performance distributed memory implementation of the po...
research
04/10/2023

Faster Lead Optimization Mapper Algorithm for Large-Scale Relative Free Energy Perturbation

In recent years, free energy perturbation (FEP) calculations have garner...
research
08/27/2023

SPEED: Streaming Partition and Parallel Acceleration for Temporal Interaction Graph Embedding

Temporal Interaction Graphs (TIGs) are widely employed to model intricat...
