An argument in favor of strong scaling for deep neural networks with small datasets

07/24/2018
by Renato L. de F. Cunha, et al.

In recent years, with the popularization of deep learning frameworks and large datasets, researchers have started parallelizing their models in order to train faster. This is crucially important because they typically explore many hyperparameters in order to find the best ones for their applications. This process is time consuming and, consequently, speeding up training improves productivity. One approach to parallelizing deep learning models, followed by many researchers, is based on weak scaling: the minibatch size increases as new GPUs are added to the system. In addition, new learning rate schedules have been proposed to fix the optimization issues that arise with large minibatch sizes. In this paper, however, we show that the recommendations provided by recent work do not apply to models that lack large datasets. In fact, we argue in favor of using strong scaling to achieve reliable performance in such cases. We evaluated our approach with up to 32 GPUs and show that weak scaling not only fails to match the accuracy of the sequential model, but also fails to converge most of the time. Meanwhile, strong scaling achieves good scalability while retaining exactly the same accuracy as the sequential implementation.
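To make the distinction between the two regimes concrete, the sketch below (not taken from the paper; the function names, the base minibatch size of 256, and the base learning rate of 0.1 are illustrative assumptions) contrasts how the per-GPU minibatch, the global minibatch, and the learning rate are typically set under weak versus strong scaling.

```python
# Illustrative sketch of weak vs. strong scaling configurations.
# All specific values here are assumptions for demonstration only.

def weak_scaling_config(base_batch_size, base_lr, num_gpus):
    """Weak scaling: each GPU keeps the base minibatch, so the effective
    (global) minibatch grows with the number of GPUs. The learning rate
    is often scaled linearly to compensate for the larger minibatch."""
    per_gpu_batch = base_batch_size
    global_batch = base_batch_size * num_gpus
    lr = base_lr * num_gpus
    return per_gpu_batch, global_batch, lr

def strong_scaling_config(base_batch_size, base_lr, num_gpus):
    """Strong scaling: the global minibatch is fixed at the sequential
    value and split across GPUs, so no learning-rate adjustment is
    needed."""
    per_gpu_batch = base_batch_size // num_gpus
    global_batch = base_batch_size
    lr = base_lr
    return per_gpu_batch, global_batch, lr

if __name__ == "__main__":
    for gpus in (1, 8, 32):
        print(gpus,
              weak_scaling_config(256, 0.1, gpus),
              strong_scaling_config(256, 0.1, gpus))
```

Under strong scaling the global minibatch, and hence the gradient computed at each step, is the same as in the single-GPU case, which is why the accuracy matches the sequential run exactly.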

