TACOS: Topology-Aware Collective Algorithm Synthesizer for Distributed Training

04/11/2023
by William Won, et al.

Collective communications are an indispensable part of distributed training. Running a topology-aware collective algorithm is crucial for optimizing communication performance because it minimizes network congestion. Today, such algorithms exist only for a small set of simple topologies. This limits which topologies can be employed in training clusters and makes it difficult to handle irregular topologies caused by network failures. In this paper, we propose TACOS, an automated synthesizer of topology-aware collective algorithms for arbitrary input network topologies. The All-Reduce algorithm synthesized by TACOS was 3.73x faster than the baselines, and TACOS synthesized collective algorithms for a 512-NPU system in just 6.1 minutes.
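To make "synthesizing a collective algorithm for an arbitrary topology" concrete, here is a minimal Python sketch. It is not the TACOS algorithm, only a naive greedy scheduler used for illustration, and the function name `synthesize_all_gather` is hypothetical. It assumes that each NPU starts with one chunk, that every directed link carries one chunk per time step at identical cost, and that a chunk received during a step can only be forwarded in a later step. Given any strongly connected directed topology, it emits a step-by-step All-Gather schedule. An All-Reduce could then be composed as a Reduce-Scatter followed by an All-Gather.

```python
from collections import defaultdict


def synthesize_all_gather(num_npus, links):
    """Hypothetical greedy, time-stepped All-Gather synthesizer (illustration only).

    Assumptions: NPU n initially holds chunk n; each directed (src, dst) link
    carries at most one chunk per step; chunks arrive at the end of a step.
    Returns a list of steps, each a list of (src, dst, chunk) transfers.
    """
    out_links = defaultdict(list)
    for src, dst in links:
        out_links[src].append(dst)

    # holds[n] is the set of chunk ids NPU n currently has.
    holds = [{n} for n in range(num_npus)]
    schedule = []
    while any(len(h) < num_npus for h in holds):
        step = []
        # Chunks already scheduled to arrive at each NPU during this step,
        # so two links never deliver the same chunk to the same NPU.
        incoming = [set() for _ in range(num_npus)]
        for src in range(num_npus):
            for dst in out_links[src]:
                # Greedily pick the lowest-numbered chunk the destination still needs.
                candidates = sorted(holds[src] - holds[dst] - incoming[dst])
                if candidates:
                    chunk = candidates[0]
                    incoming[dst].add(chunk)
                    step.append((src, dst, chunk))
        if not step:
            # No link can make progress: some chunk can never reach some NPU.
            raise ValueError("topology is not strongly connected")
        # Deliver all transfers at the end of the step.
        for _, dst, chunk in step:
            holds[dst].add(chunk)
        schedule.append(step)
    return schedule


if __name__ == "__main__":
    ring = [(i, (i + 1) % 4) for i in range(4)]
    for step_id, step in enumerate(synthesize_all_gather(4, ring)):
        print(step_id, step)
```

On a 4-NPU unidirectional ring this sketch needs 3 steps, while adding the reverse links lets it finish in 2. The schedule adapts to whatever links the topology actually provides, which is the property a topology-aware synthesizer is meant to exploit.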


