A Distributed Multi-GPU System for Large-Scale Node Embedding at Tencent

05/28/2020
by Wanjing Wei, et al.

Scaling node embedding systems to efficiently process real-world networks, which often contain hundreds of billions of edges and high-dimensional node features, remains a challenging problem. In this paper, we present a high-performance multi-GPU node embedding system that uses hybrid model-data parallel training. We propose a hierarchical data partitioning strategy and an embedding training pipeline that together optimize both communication and memory usage on a GPU cluster. Because our random walk engine is decoupled from our embedding training engine, the two stages can be scheduled flexibly to fully utilize all computing resources on the cluster. We evaluate the system on real-world and synthetic networks with various node embedding tasks. Using 40 NVIDIA V100 GPUs on a network with over two hundred billion edges and one billion nodes, our implementation finishes one training epoch in only 200 seconds. We also achieve a 5.9x-14.4x average speedup over the current state-of-the-art multi-GPU, single-machine node embedding system, with competitive or better accuracy on open datasets.
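The paper does not publish its source code, but the decoupled walk/train design described in the abstract maps naturally onto a producer-consumer pipeline. The sketch below is a minimal, single-process Python/NumPy illustration of that idea under stated assumptions: a random-walk engine streams walks into a bounded queue while a skip-gram-with-negative-sampling trainer consumes them, so neither stage blocks the other. All names here (`walk_worker`, `train_worker`, `sgns_step`) are hypothetical; the actual system distributes both stages across multi-GPU machines using a hierarchical data partitioning that this toy version does not attempt to model.

```python
"""Minimal single-process sketch of a decoupled walk/train pipeline.

Hypothetical illustration only; the real system in the paper runs these
stages on a GPU cluster with hierarchical data partitioning.
"""
import queue
import random
import threading

import numpy as np


def walk_worker(adj, num_walks, walk_len, out_q):
    """Random-walk engine: streams fixed-length walks into a queue."""
    nodes = list(adj)
    for _ in range(num_walks):
        walk = [random.choice(nodes)]
        for _ in range(walk_len - 1):
            nbrs = adj[walk[-1]]
            if not nbrs:
                break
            walk.append(random.choice(nbrs))
        out_q.put(walk)
    out_q.put(None)  # signal end of stream


def sgns_step(emb, ctx, center, context, negatives, lr=0.025):
    """One skip-gram-with-negative-sampling SGD update."""
    v = emb[center]
    for node, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = ctx[node]
        score = 1.0 / (1.0 + np.exp(-np.dot(v, u)))
        g = lr * (label - score)
        ctx[node] += g * v  # update context embedding in place
        v += g * u          # update center embedding in place


def train_worker(emb, ctx, num_nodes, window, num_neg, in_q):
    """Embedding trainer: consumes walks and applies SGNS updates."""
    while (walk := in_q.get()) is not None:
        for i, center in enumerate(walk):
            for j in range(max(0, i - window), min(len(walk), i + window + 1)):
                if i == j:
                    continue
                negs = np.random.randint(0, num_nodes, size=num_neg)
                sgns_step(emb, ctx, center, walk[j], negs)


if __name__ == "__main__":
    # Toy graph: a 6-node ring.
    n = 6
    adj = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
    dim = 16
    emb = (np.random.rand(n, dim) - 0.5) / dim  # center embeddings
    ctx = np.zeros((n, dim))                    # context embeddings
    q = queue.Queue(maxsize=1024)  # pipeline buffer between the two engines
    walker = threading.Thread(target=walk_worker, args=(adj, 200, 10, q))
    walker.start()
    train_worker(emb, ctx, n, window=2, num_neg=3, in_q=q)
    walker.join()
    print("trained embeddings shape:", emb.shape)
```

The bounded queue is the point of the sketch: because the walk producer and the embedding consumer share only a buffer, each stage can run at its own pace, which is one plausible reading of how the paper's decoupled engines keep both random-walk generation and training hardware busy at the same time.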
