Libra: In-network Gradient Aggregation for Speeding up Distributed Sparse Deep Training

05/11/2022
by Heng Pan, et al.

Distributed sparse deep learning has been widely used in many internet-scale applications. Network communication is one of the major hurdles to training performance. In-network gradient aggregation on programmable switches is a promising way to speed it up. Nevertheless, existing in-network aggregation solutions are designed for distributed dense deep training and fall short when used for sparse deep training. To address this gap, we present Libra, which builds on our key observation that parameter update frequencies in distributed sparse deep training are extremely biased. Specifically, Libra offloads only the aggregation of "hot" parameters, i.e., those updated frequently, onto programmable switches. To enable this offloading and achieve high aggregation throughput, we propose solutions to the challenges of hot parameter identification, parameter orchestration, floating-point summation on switches, and system reliability. We implemented Libra on Intel Tofino switches and integrated it with PS-lite. Finally, we evaluated Libra through extensive experiments and show that it speeds up gradient aggregation by 1.5-4 times.
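To make the key idea concrete, the following is a minimal Python sketch, not Libra's actual implementation and not part of the PS-lite API, of the hot/cold parameter split described above: profile per-key update frequencies from sparse gradient pushes, then mark a small head of frequently updated keys as "hot" for in-switch aggregation while the long tail stays on the parameter servers. The names profile_updates, partition_parameters and HOT_FRACTION are illustrative assumptions.

# Sketch of hot-parameter identification for in-network aggregation.
# Assumption: a small fraction of parameter keys receives most sparse updates.
from collections import Counter
from typing import Iterable

HOT_FRACTION = 0.05  # assumed size of the "hot" head, tunable per workload


def profile_updates(update_batches: Iterable[Iterable[int]]) -> Counter:
    """Count how often each parameter key appears in sparse gradient pushes."""
    freq = Counter()
    for batch in update_batches:
        freq.update(batch)
    return freq


def partition_parameters(freq: Counter, hot_fraction: float = HOT_FRACTION):
    """Return (hot_keys, cold_keys): the top hot_fraction of keys by update
    frequency are candidates for switch aggregation; the rest are left to
    ordinary parameter-server aggregation."""
    ranked = [k for k, _ in freq.most_common()]
    n_hot = max(1, int(len(ranked) * hot_fraction))
    return set(ranked[:n_hot]), set(ranked[n_hot:])


if __name__ == "__main__":
    # Toy trace: parameter keys touched by four sparse gradient pushes.
    trace = [[1, 2, 3], [1, 2, 7], [1, 2, 9], [1, 5, 2]]
    hot, cold = partition_parameters(profile_updates(trace), hot_fraction=0.3)
    print("hot keys -> switch aggregation:", sorted(hot))
    print("cold keys -> parameter servers:", sorted(cold))

A practical system would likely re-profile and refresh the hot set periodically, since update frequencies can drift over the course of training; how Libra does this, and how it handles floating-point summation and reliability on the switch, is detailed in the full paper.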


