Project CGX: Scalable Deep Learning on Commodity GPUs

11/16/2021
by   Ilia Markov, et al.
0

The ability to scale out training workloads has been one of the key performance enablers of deep learning. The main scaling approach is data-parallel GPU-based training, which has been boosted by hardware and software support for highly efficient inter-GPU communication, in particular via bandwidth overprovisioning. This support comes at a price: there is an order of magnitude cost difference between "cloud-grade" servers with such support, relative to their "consumer-grade" counterparts, although server-grade and consumer-grade GPUs can have similar computational envelopes. In this paper, we investigate whether the expensive hardware overprovisioning approach can be supplanted via algorithmic and system design, and propose a framework called CGX, which provides efficient software support for communication compression. We show that this framework is able to remove communication bottlenecks from consumer-grade multi-GPU systems, in the absence of hardware support: when training modern models and tasks to full accuracy, our framework enables self-speedups of 2-3X on a commodity system using 8 consumer-grade NVIDIA RTX 3090 GPUs, and enables it to surpass the throughput of an NVIDIA DGX-1 server, which has similar peak FLOPS but benefits from bandwidth overprovisioning.

READ FULL TEXT

page 10

page 16

research
11/29/2018

Data-parallel distributed training of very large models beyond GPU capacity

GPUs have limited memory and it is difficult to train wide and/or deep m...
research
09/03/2023

FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs

The rapid growth of memory and computation requirements of large languag...
research
10/25/2021

AxoNN: An asynchronous, message-driven parallel framework for extreme-scale deep learning

In the last few years, the memory requirements to train state-of-the-art...
research
08/07/2017

PowerAI DDL

As deep neural networks become more complex and input datasets grow larg...
research
05/21/2019

Performance Analysis of Deep Learning Workloads on Leading-edge Systems

This work examines the performance of leading-edge systems designed for ...
research
09/22/2021

GPU4S: Embedded GPUs in Space – Latest Project Updates

Following the trend of other safety-critical industries like automotive ...
research
07/08/2020

Accelerating Multigrid-based Hierarchical Scientific Data Refactoring on GPUs

Rapid growth in scientific data and a widening gap between computational...

Please sign up or login with your details

Forgot password? Click here to reset