Flare: Flexible In-Network Allreduce

06/29/2021
by Daniele De Sensi, et al.

The allreduce operation is one of the most commonly used communication routines in distributed applications. To improve its bandwidth and reduce network traffic, the operation can be accelerated by offloading it to network switches, which aggregate the data received from the hosts and send the aggregated result back to them. However, existing solutions offer limited customization and can perform suboptimally with custom operators and data types, with sparse data, or when reproducibility of the aggregation is a concern. To address these problems, we design a flexible programmable switch that uses PsPIN, a RISC-V architecture implementing the sPIN programming model, as a building block. We then design, model, and analyze different algorithms for executing the aggregation on this architecture, showing performance improvements over state-of-the-art approaches.
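To make the semantics of a switch-offloaded allreduce concrete, the following is a minimal Python sketch of the idea described above: each host sends a vector to a single simulated switch, which combines the vectors element-wise and returns the same result to every host. The function name `switch_allreduce`, the single-switch topology, and the pluggable `op` argument are assumptions made for illustration; this is not the PsPIN handler code or any algorithm from the paper.

```python
from typing import Callable, List, Sequence


def switch_allreduce(
    host_vectors: Sequence[Sequence[float]],
    op: Callable[[float, float], float] = lambda a, b: a + b,
) -> List[List[float]]:
    """Simulate one switch aggregating equally sized vectors from all hosts.

    Hypothetical helper for illustration only: a real switch works on
    packets and has limited memory, which this sketch ignores.
    """
    if not host_vectors:
        raise ValueError("at least one host vector is required")
    length = len(host_vectors[0])
    if any(len(vec) != length for vec in host_vectors):
        raise ValueError("all host vectors must have the same length")

    # Aggregate in a fixed host order, so repeated runs combine the
    # elements in the same sequence.
    aggregated = list(host_vectors[0])
    for vec in host_vectors[1:]:
        aggregated = [op(a, b) for a, b in zip(aggregated, vec)]

    # The switch sends the aggregated result back to every host.
    return [list(aggregated) for _ in host_vectors]


if __name__ == "__main__":
    hosts = [[1, 2], [3, 4], [5, 6]]
    print(switch_allreduce(hosts))          # element-wise sum
    print(switch_allreduce(hosts, op=max))  # custom operator
```

Passing a different `op` stands in for the custom operators mentioned above. Aggregating in a fixed order is one simple way to get the same floating-point result on every run, since floating-point addition is not associative; the sketch does not model sparse data.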

research
03/28/2019

SwitchAgg: A Further Step Towards In-Network Computation

Many distributed applications adopt a partition/aggregation pattern to a...
research
05/11/2022

Libra: In-network Gradient Aggregation for Speeding up Distributed Sparse Deep Training

Distributed sparse deep learning has been widely used in many internet-s...
research
01/17/2022

Efficient Data-Plane Memory Scheduling for In-Network Aggregation

As the scale of distributed training grows, communication becomes a bott...
research
05/10/2023

P4SGD: Programmable Switch Enhanced Model-Parallel Training on Generalized Linear Models on Distributed FPGAs

Generalized linear models (GLMs) are a widely utilized family of machine...
research
08/10/2018

Efficient Measurement on Programmable Switches Using Probabilistic Recirculation

Programmable network switches promise flexibility and high throughput, e...
research
01/13/2021

ZipLine: In-Network Compression at Line Speed

Network appliances continue to offer novel opportunities to offload proc...
research
04/05/2021

Meta-level issues in Offloading: Scoping, Composition, Development, and their Automation

This paper argues for an accelerator development toolchain that takes in...
