8-Bit Approximations for Parallelism in Deep Learning

11/14/2015
by   Tim Dettmers, et al.
0

The creation of practical deep learning data-products often requires parallelization across processors and computers to make deep learning feasible on large data sets, but bottlenecks in communication bandwidth make it difficult to attain good speedups through parallelism. Here we develop and test 8-bit approximation algorithms which make better use of the available bandwidth by compressing 32-bit gradients and nonlinear activations to 8-bit approximations. We show that these approximations do not decrease predictive performance on MNIST, CIFAR10, and ImageNet for both model and data parallelism and provide a data transfer speedup of 2x relative to 32-bit parallelism. We build a predictive model for speedups based on our experimental data, verify its validity on known speedup data, and show that we can obtain a speedup of 50x and more on a system of 96 GPUs compared to a speedup of 23x for 32-bit. We compare our data types with other methods and show that 8-bit approximations achieve state-of-the-art speedups for model parallelism. Thus 8-bit approximation is an efficient method to parallelize convolutional networks on very large systems of GPUs.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/08/2018

Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform

The training process of Deep Neural Network (DNN) is compute-intensive, ...
research
06/22/2020

LAMP: Large Deep Nets with Automated Model Parallelism for Image Segmentation

Deep Learning (DL) models are becoming larger, because the increase in m...
research
03/13/2013

Parallelizing Probabilistic Inference: Some Early Explorations

We report on an experimental investigation into opportunities for parall...
research
04/11/2020

Bit-Parallel Vector Composability for Neural Acceleration

Conventional neural accelerators rely on isolated self-sufficient functi...
research
09/15/2023

Neural Network Exemplar Parallelization with Go

This paper presents a case for exemplar parallelism of neural networks u...
research
06/30/2020

Accelerating Binarized Neural Networks via Bit-Tensor-Cores in Turing GPUs

Despite foreseeing tremendous speedups over conventional deep neural net...
research
11/13/2022

FullPack: Full Vector Utilization for Sub-Byte Quantized Inference on General Purpose CPUs

Although prior art has demonstrated negligible accuracy drop in sub-byte...

Please sign up or login with your details

Forgot password? Click here to reset