Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines

02/17/2022
by Alexander Isenko, et al.

Preprocessing pipelines in deep learning aim to provide enough data throughput to keep the training processes busy. Maximizing resource utilization is becoming more challenging as the throughput of training processes increases with hardware innovations (e.g., faster GPUs, TPUs, and interconnects) and advanced parallelization techniques that yield better scalability. At the same time, the amount of training data needed to train increasingly complex models is growing. As a consequence, data preprocessing and provisioning are becoming a severe bottleneck in end-to-end deep learning pipelines. In this paper, we provide an in-depth analysis of data preprocessing pipelines from four different machine learning domains. We introduce a new perspective on efficiently preparing datasets for end-to-end deep learning pipelines and extract individual trade-offs to optimize throughput, preprocessing time, and storage consumption. Additionally, we provide an open-source profiling library that can automatically decide on a suitable preprocessing strategy to maximize throughput. By applying our generated insights to real-world use cases, we obtain a throughput increase of 3x to 13x compared to an untuned system while keeping the pipeline functionally identical. These findings show the enormous potential of data pipeline tuning.
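
The trade-off between throughput, preprocessing time, and storage consumption can be made concrete with a small sketch. The code below is not the paper's profiling library. It is a minimal illustration that assumes each candidate strategy has already been profiled, and that a suitable strategy is the one with the highest throughput that still fits a storage and preprocessing-time budget. The names `StrategyProfile`, `pick_strategy`, and `EXAMPLE_PROFILES`, the strategy labels, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class StrategyProfile:
    """Measured profile of one candidate preprocessing strategy (hypothetical)."""
    name: str
    throughput_sps: float    # samples per second delivered to the training process
    preprocessing_s: float   # one-time offline preprocessing time in seconds
    storage_gb: float        # size of the stored dataset in gigabytes


def pick_strategy(
    profiles: Sequence[StrategyProfile],
    max_storage_gb: float,
    max_preprocessing_s: float,
) -> Optional[StrategyProfile]:
    """Return the highest-throughput strategy that fits both budgets, or None."""
    feasible = [
        p for p in profiles
        if p.storage_gb <= max_storage_gb
        and p.preprocessing_s <= max_preprocessing_s
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda p: p.throughput_sps)


# Made-up measurements, for illustration only (not results from the paper).
EXAMPLE_PROFILES = [
    StrategyProfile("all-steps-online", 500.0, 0.0, 50.0),
    StrategyProfile("decode-offline", 1500.0, 1200.0, 400.0),
    StrategyProfile("all-steps-offline", 2500.0, 3000.0, 900.0),
]

if __name__ == "__main__":
    choice = pick_strategy(EXAMPLE_PROFILES, max_storage_gb=500.0,
                           max_preprocessing_s=3600.0)
    print(choice.name if choice else "no feasible strategy")
```

With a 500 GB storage budget, the fully offline variant is excluded despite its higher throughput, which is the kind of trade-off the abstract refers to. A real profiler would measure these values rather than take them as inputs.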
