Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems

03/12/2020
by Weijie Zhao, et al.

Neural networks in ads systems usually take input from multiple sources, e.g., query-ad relevance, ad features, and user portraits. These inputs are encoded into one-hot or multi-hot binary features, with typically only a tiny fraction of nonzero feature values per example. Deep learning models in the online advertising industry can have terabyte-scale parameters that fit in neither the GPU memory nor the CPU main memory of a single computing node. For example, a sponsored online advertising system can contain more than 10^11 sparse features, making the neural network a massive model with around 10 TB of parameters. In this paper, we introduce a distributed GPU hierarchical parameter server for massive-scale deep learning ads systems. We propose a hierarchical workflow that uses GPU high-bandwidth memory (HBM), CPU main memory, and SSDs as a three-layer storage hierarchy. All neural network training computations are performed on GPUs. Extensive experiments on real-world data confirm the effectiveness and the scalability of the proposed system. A 4-node hierarchical GPU parameter server can train a model more than 2X faster than a 150-node in-memory distributed parameter server in an MPI cluster. In addition, the price-performance ratio of our proposed system is 4-9 times better than an MPI-cluster solution.
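
The key observation behind such a hierarchy is that any given training batch touches only a tiny working set of the 10^11 sparse parameters, so lookups can cascade through the storage tiers and promote hot embeddings upward. Below is a minimal Python sketch of that lookup path under those assumptions; it is not the paper's implementation. The class name, the plain-dict tiers, and the 8-wide embeddings are illustrative stand-ins, and a real system would use pinned-memory transfers, sharded hash tables, and cache eviction policies rather than Python dictionaries.

```python
# Illustrative sketch of a 3-tier hierarchical parameter lookup.
# All names here are hypothetical, not the paper's actual API.
import numpy as np

EMB_DIM = 8  # embedding width per sparse feature (illustrative)

class HierarchicalParameterServer:
    """Looks up sparse-feature embeddings through three storage tiers:
    GPU HBM (hot cache) -> CPU main memory (warm cache) -> SSD (full model)."""

    def __init__(self):
        self.hbm = {}   # stands in for the GPU HBM cache
        self.dram = {}  # stands in for CPU main memory
        self.ssd = {}   # stands in for the SSD-resident full parameter set

    def lookup(self, feature_id: int) -> np.ndarray:
        # Fast path: parameter already cached on the GPU.
        if feature_id in self.hbm:
            return self.hbm[feature_id]
        # Warm path: pull from CPU memory and promote into HBM.
        if feature_id in self.dram:
            vec = self.dram[feature_id]
            self.hbm[feature_id] = vec
            return vec
        # Slow path: read from SSD (or initialize a new embedding),
        # then promote through DRAM into HBM so training stays on the GPU.
        vec = self.ssd.setdefault(
            feature_id, np.random.randn(EMB_DIM).astype(np.float32) * 0.01
        )
        self.dram[feature_id] = vec
        self.hbm[feature_id] = vec
        return vec

ps = HierarchicalParameterServer()
batch = [17, 42, 17]  # multi-hot sparse feature ids in one example
emb = np.stack([ps.lookup(f) for f in batch])
print(emb.shape)      # (3, 8): only the ids actually touched are materialized
```

The design choice this toy mirrors is that capacity grows and bandwidth shrinks down the hierarchy, so the working set is kept as close to the GPU as possible while the full 10 TB model only ever needs to reside on SSD.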


Related research

01/05/2022 · Communication-Efficient TeraByte-Scale Model Training Framework for Online Advertising
Click-Through Rate (CTR) prediction is a crucial component in the online...

11/29/2018 · Data-parallel distributed training of very large models beyond GPU capacity
GPUs have limited memory and it is difficult to train wide and/or deep m...

12/10/2022 · Elixir: Train a Large Language Model on a Small GPU Cluster
In recent years, the number of parameters of one deep learning (DL) mode...

10/17/2022 · A GPU-specialized Inference Parameter Server for Large-Scale Deep Recommendation Models
Recommendation systems are of crucial importance for a variety of modern...

05/28/2020 · A Distributed Multi-GPU System for Large-Scale Node Embedding at Tencent
Scaling node embedding systems to efficiently process networks in real-w...

02/13/2020 · Training Large Neural Networks with Constant Memory using a New Execution Algorithm
Widely popular transformer-based NLP models such as BERT and GPT have en...

02/10/2020 · Learning@home: Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts
Many recent breakthroughs in deep learning were achieved by training inc...
