Horovod: fast and easy distributed deep learning in TensorFlow

by   Alexander Sergeev, et al.

Training modern deep learning models requires large amounts of computation, often provided by GPUs. Scaling computation from one GPU to many can enable much faster training and research progress but entails two complications. First, the training library must support inter-GPU communication. Depending on the particular methods employed, this communication may entail anywhere from negligible to significant overhead. Second, the user must modify his or her training code to take advantage of inter-GPU communication. Depending on the training library's API, the modification required may be either significant or minimal. Existing methods for enabling multi-GPU training under the TensorFlow library entail non-negligible communication overhead and require users to heavily modify their model-building code, leading many researchers to avoid the whole mess and stick with slower single-GPU training. In this paper we introduce Horovod, an open source library that improves on both obstructions to scaling: it employs efficient inter-GPU communication via ring reduction and requires only a few lines of modification to user code, enabling faster, easier distributed training in TensorFlow. Horovod is available under the Apache 2.0 license at https://github.com/uber/horovod.


page 1

page 2

page 3

page 4


Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters

Deep learning models can take weeks to train on a single GPU-equipped ma...


As deep neural networks become more complex and input datasets grow larg...

TensorX: Extensible API for Neural Network Model Design and Deployment

TensorX is a Python library for prototyping, design, and deployment of c...

AdaNet: A Scalable and Flexible Framework for Automatically Learning Ensembles

AdaNet is a lightweight TensorFlow-based (Abadi et al., 2015) framework ...

DeepLab2: A TensorFlow Library for Deep Labeling

DeepLab2 is a TensorFlow library for deep labeling, aiming to provide a ...

MIGPerf: A Comprehensive Benchmark for Deep Learning Training and Inference Workloads on Multi-Instance GPUs

New architecture GPUs like A100 are now equipped with multi-instance GPU...

Non-Determinism in TensorFlow ResNets

We show that the stochasticity in training ResNets for image classificat...

Code Repositories


Distributed training framework for TensorFlow.

view repo

Please sign up or login with your details

Forgot password? Click here to reset