Learning@home: Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts

02/10/2020
by   Maksim Riabinin, et al.

Many recent breakthroughs in deep learning were achieved by training increasingly large models on massive datasets. However, training such models can be prohibitively expensive. For instance, the Megatron language model with 8.3B parameters was trained on a GPU cluster worth $25 million. As a result, most researchers cannot afford to train state-of-the-art models and contribute to their development. Hypothetically, a researcher could crowdsource the training of large neural networks with thousands of regular PCs provided by volunteers. The raw computing power of ten thousand $2500 desktops dwarfs that of a $25M server pod, but one cannot utilize that power efficiently with conventional distributed training methods. In this work, we propose Learning@home: a neural network training paradigm designed to handle millions of poorly connected participants. We analyze the performance, reliability, and architectural constraints of this paradigm and compare it against existing distributed training techniques.

research
07/11/2022

Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts

Many recent breakthroughs in deep learning were achieved by training inc...
research
06/13/2022

Modern Distributed Data-Parallel Large-Scale Pre-training Strategies For NLP models

Distributed deep learning is becoming increasingly popular due to the ex...
research
06/18/2021

Distributed Deep Learning in Open Collaborations

Modern deep learning applications require increasingly more compute to t...
research
11/09/2021

How to Train Your Neural Network: A Comparative Evaluation

The field of deep learning has witnessed a remarkable shift towards extr...
research
07/08/2020

Distributed Training of Deep Learning Models: A Taxonomic Perspective

Distributed deep learning systems (DDLS) train deep neural network model...
research
03/12/2020

Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems

Neural networks of ads systems usually take input from multiple resource...
research
11/12/2019

92c/MFlops/s, Ultra-Large-Scale Neural-Network Training on a PIII Cluster

Artificial neural networks with millions of adjustable parameters and a ...
