An Overview of the Data-Loader Landscape: Comparative Performance Analysis

09/27/2022
by Iason Ofeidis, et al.

Dataloaders, which are responsible for moving data from storage into GPU memory while training machine learning models, may hold the key to drastically improving the performance of training jobs. Recent advances have shown promise not only by considerably decreasing training time but also by offering new features such as loading data from remote storage like S3. In this paper, we are the first to distinguish the dataloader as a separate component in the Deep Learning (DL) workflow and to outline its structure and features. Finally, we offer a comprehensive comparison of the available dataloading libraries, their trade-offs in terms of functionality, usability, and performance, and the insights derived from them.
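To make the dataloader's role concrete, the sketch below shows the canonical PyTorch pattern that libraries in this space build on or replace: a map-style `Dataset` wrapped in a `torch.utils.data.DataLoader` that uses worker processes and pinned host memory to overlap data loading with GPU compute. The dataset, tensor shapes, and parameter values here are illustrative assumptions, not the paper's benchmark configuration.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class RandomImageDataset(Dataset):
    """Illustrative map-style dataset that fabricates image-like tensors.

    A real pipeline would read and decode samples from local or remote
    storage here; this stand-in keeps the sketch self-contained.
    """
    def __init__(self, length=1024, shape=(3, 224, 224)):
        self.length = length
        self.shape = shape

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        x = torch.randn(self.shape)   # stand-in for a decoded image
        y = idx % 10                  # stand-in for a class label
        return x, y

def main():
    loader = DataLoader(
        RandomImageDataset(),
        batch_size=64,
        shuffle=True,
        num_workers=4,    # worker processes overlap I/O with GPU compute
        pin_memory=True,  # page-locked host memory speeds host-to-GPU copies
    )
    device = "cuda" if torch.cuda.is_available() else "cpu"
    for images, labels in loader:
        # non_blocking only takes effect when the source tensor is pinned
        images = images.to(device, non_blocking=True)
        # ... forward/backward pass would go here ...
        break

if __name__ == "__main__":
    main()
```

Specialized dataloading libraries typically preserve this iteration interface while changing how samples are stored, decoded, and prefetched, including streaming them directly from remote object stores, which is why they can often be swapped into an existing training loop with little code change.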


Related research

05/17/2022 · Moving Stuff Around: A study on efficiency of moving documents into memory for Neural IR models
When training neural rankers using Large Language Models, it's expected ...

01/07/2020 · High Performance I/O For Large Scale Deep Learning
Training deep learning (DL) models on petascale datasets is essential fo...

12/06/2018 · A Review of Homomorphic Encryption Libraries for Secure Computation
In this paper we provide a survey of various libraries for homomorphic e...

10/01/2020 · PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep Learning Clusters
DNN learning jobs are common in today's clusters due to the advances in ...

11/25/2022 · Deep Learning Training Procedure Augmentations
Recent advances in Deep Learning have greatly improved performance on va...

12/14/2018 · Data Provenance for Sport
Data analysts often discover irregularities in their underlying dataset,...
