Data optimization for large batch distributed training of deep neural networks

12/16/2020
by   Shubhankar Gahlot, et al.

Distributed training is common practice in deep learning (DL) as data and models grow. Current practice for distributed training of deep neural networks faces two challenges at scale: communication bottlenecks, and deteriorating model accuracy as the global batch size increases. Existing solutions focus on improving message-exchange efficiency and on techniques that tweak batch sizes and models during training. The loss of training accuracy typically occurs because the loss function becomes trapped in a local minimum. We observe that the loss landscape is shaped by both the model and the training data, and we propose a data optimization approach that uses machine learning to implicitly smooth the loss landscape, resulting in fewer local minima. Our approach filters out data points that are less important to feature learning, enabling us to speed up the training of models at larger batch sizes while improving accuracy.
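The abstract does not specify the filtering criterion, so the sketch below is only an illustration of importance-based data pruning, not the authors' method. It scores each training example by its per-sample loss under a proxy model and keeps the highest-scoring fraction before large-batch training; the proxy model, the loss-based score, and the `keep_fraction` parameter are all assumptions made for this example.

```python
import torch
from torch.utils.data import DataLoader, Subset

def filter_dataset(proxy_model, dataset, keep_fraction=0.7, device="cpu"):
    """Rank samples by per-sample loss under a proxy model and keep the
    highest-loss fraction. Assumption: low-loss points contribute little
    to feature learning and can be dropped before large-batch training."""
    proxy_model.eval()
    loss_fn = torch.nn.CrossEntropyLoss(reduction="none")
    scores = []
    loader = DataLoader(dataset, batch_size=256, shuffle=False)
    with torch.no_grad():
        for x, y in loader:
            logits = proxy_model(x.to(device))
            scores.append(loss_fn(logits, y.to(device)).cpu())
    scores = torch.cat(scores)
    k = int(keep_fraction * len(dataset))
    keep_idx = torch.topk(scores, k).indices.tolist()
    return Subset(dataset, keep_idx)
```

Under these assumptions, a DataLoader built over the returned subset can then be trained with a larger global batch size than the full dataset would tolerate.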
