Accelerating Data Loading in Deep Neural Network Training

10/02/2019
by Chih-Chieh Yang, et al.

Data loading can dominate deep neural network training time on large-scale systems. We present a comprehensive study on accelerating data loading performance in large-scale distributed training. We first identify performance and scalability issues in current data loading implementations. We then propose optimizations to the data loader design that utilize CPU resources. We use an analytical model to characterize the impact of data loading on the overall training time and establish the performance trend as we scale up distributed training. Our model suggests that the I/O rate limits the scalability of distributed training, which inspires us to design a locality-aware data loading method. By utilizing software caches, our method can drastically reduce the data loading communication volume compared with the original implementation. Finally, we evaluate the proposed optimizations with various experiments, achieving a more than 30x speedup in data loading on 256 nodes with 1,024 learners.
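The abstract only names the technique, but the core idea behind locality-aware data loading with software caches can be sketched concretely: each learner is assigned a fixed shard of the dataset and keeps the samples it has already fetched in a local cache, so shared storage is read at most once per sample per job. Below is a minimal Python sketch under those assumptions; CachedDataset, local_shard, and fetch_fn are illustrative names, not the paper's actual API.

```python
import random

class CachedDataset:
    """Wraps a backing store; repeat reads are served from a local software cache."""
    def __init__(self, fetch_fn, num_samples):
        self.fetch_fn = fetch_fn      # reads one sample from shared storage
        self.num_samples = num_samples
        self.cache = {}               # software cache: index -> sample

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        if idx not in self.cache:     # cold miss: one shared-storage read per sample
            self.cache[idx] = self.fetch_fn(idx)
        return self.cache[idx]

def local_shard(num_samples, rank, world_size):
    """Give each learner a fixed shard so its cached samples stay useful across epochs."""
    per_rank = num_samples // world_size
    start = rank * per_rank
    return list(range(start, start + per_rank))

def epoch_indices(shard, seed):
    """Shuffle within the local shard only, preserving cache locality."""
    rng = random.Random(seed)
    order = list(shard)
    rng.shuffle(order)
    return order

if __name__ == "__main__":
    # Toy backing store: "fetching" sample i just computes i * 2.
    dataset = CachedDataset(fetch_fn=lambda i: i * 2, num_samples=16)
    shard = local_shard(len(dataset), rank=0, world_size=4)
    for epoch in range(2):
        for i in epoch_indices(shard, seed=epoch):
            _ = dataset[i]            # epoch 0: cache misses; epoch 1: all hits
```

Shuffling only within a learner's local shard trades some global randomness for cache reuse: after the first epoch, every access on a learner is a cache hit, which is what drives down the data loading communication volume.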

Related Research

04/12/2021 · Accelerating Neural Network Training with Distributed Asynchronous and Selective Optimization (DASO)
With increasing data and model complexities, the time required to train ...

03/06/2020 · Communication Optimization Strategies for Distributed Deep Learning: A Survey
Recent trends in high-performance computing and deep learning lead to a ...

03/10/2020 · Communication-Efficient Distributed Deep Learning: A Comprehensive Survey
Distributed deep learning becomes very common to reduce the overall trai...

01/25/2023 · Accelerating Domain-aware Deep Learning Models with Distributed Training
Recent advances in data-generating techniques led to an explosive growth...

10/01/2019 · Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos
Deep video recognition is more computationally expensive than image reco...

06/15/2023 · Sampling-Based Techniques for Training Deep Neural Networks with Limited Computational Resources: A Scalability Evaluation
Deep neural networks are superior to shallow networks in learning comple...

11/10/2015 · Reducing the Training Time of Neural Networks by Partitioning
This paper presents a new method for pre-training neural networks that c...
