Analyzing and Mitigating Data Stalls in DNN Training

07/14/2020
by Jayashree Mohan, et al.

Training Deep Neural Networks (DNNs) is resource-intensive and time-consuming. While prior research has explored many ways of reducing DNN training time, the impact of the input data pipeline, i.e., fetching raw data items from storage and pre-processing them in memory, has remained relatively unexplored. This paper makes the following contributions: (1) We present the first comprehensive analysis of how the input data pipeline affects the training time of widely used computer-vision and audio DNNs, which typically involve complex data preprocessing. We analyze nine models across three tasks and four datasets while varying factors such as the amount of memory, the number of CPU threads, the storage device, and the GPU generation, on servers that are part of a large production cluster at Microsoft. We find that in many cases, DNN training time is dominated by data stall time: time spent waiting for data to be fetched and preprocessed. (2) We build a tool, DS-Analyzer, that precisely measures data stalls using a differential technique and performs predictive what-if analysis on data stalls. (3) Finally, based on the insights from our analysis, we design and implement three simple but effective techniques in a data-loading library, CoorDL, to mitigate data stalls. Our experiments on a range of DNN tasks, models, datasets, and hardware configurations show that when PyTorch uses CoorDL instead of the state-of-the-art DALI data-loading library, DNN training time is reduced significantly (by as much as 5x on a single server).
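To make "data stall time" concrete, the sketch below approximates it in a plain PyTorch training loop by timing how long each iteration blocks on the DataLoader (fetch plus preprocessing) versus how long the training step itself takes. This is only an illustrative wall-clock approximation, not the paper's DS-Analyzer tool (which uses a differential technique); the FakeData dataset, ResNet-18 model, and all hyperparameters are placeholders.

import time

import torch
import torchvision
from torchvision import transforms

# Approximate data stall time by timing how long each iteration
# waits on the DataLoader versus how long the training step takes.
# Illustrative only; DS-Analyzer measures stalls more precisely.
# (With num_workers > 0 on platforms that spawn worker processes,
# this code should live under an `if __name__ == "__main__":` guard.)

transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.ToTensor(),
])
dataset = torchvision.datasets.FakeData(size=1024, transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=64, num_workers=4)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torchvision.models.resnet18().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.CrossEntropyLoss()

stall_time = 0.0
compute_time = 0.0
end = time.perf_counter()
for images, labels in loader:
    start = time.perf_counter()
    stall_time += start - end          # blocked on fetch + preprocess
    images, labels = images.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    if device.type == "cuda":
        torch.cuda.synchronize()       # flush async GPU work before timing
    end = time.perf_counter()
    compute_time += end - start

total = stall_time + compute_time
print(f"data stall: {stall_time:.1f}s ({100 * stall_time / total:.0f}% of epoch)")
print(f"compute:    {compute_time:.1f}s")

If the reported stall fraction is large, the epoch is input-bound: adding DataLoader workers, caching preprocessed items, or coordinating data loading across jobs (the kinds of techniques CoorDL applies) will shorten training time more than a faster GPU would.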

