High Performance I/O For Large Scale Deep Learning

01/07/2020
by Alex Aizman, et al.

Training deep learning (DL) models on petascale datasets is essential for achieving competitive and state-of-the-art performance in applications such as speech, video analytics, and object recognition. However, existing distributed filesystems were not developed for the access patterns and usability requirements of DL jobs. In this paper, we describe AIStore, a highly scalable, easy-to-deploy storage system, and WebDataset, a standards-based storage format and library that permits efficient access to very large datasets. We compare system performance experimentally using image classification workloads, with training data stored on a variety of backends: local SSDs, single-node NFS, and two identical bare-metal clusters, one running HDFS and the other running AIStore.
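As an illustration of what a standards-based format for sequential access could look like, the following minimal sketch packs samples into a tar archive and streams them back in a single pass. The abstract does not spell out the format, so the layout here is an assumption: a plain tar archive in which consecutive members sharing a name prefix form one sample. The helper names `write_shard` and `iter_samples` are hypothetical and are not part of the WebDataset library.

```python
import io
import tarfile


def write_shard(samples):
    """Pack samples into an in-memory tar archive (hypothetical helper).

    `samples` is a list of (key, {extension: bytes}) pairs; each field is
    stored as a tar member named "<key>.<extension>".
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for key, fields in samples:
            for ext, payload in fields.items():
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def iter_samples(shard_bytes):
    """Stream through a shard once, yielding one dict per sample.

    Assumption: consecutive members that share a key (the member name up
    to the first dot) belong to the same sample.
    """
    current_key, current = None, {}
    # Mode "r|" reads the archive as a forward-only stream, with no seeking.
    with tarfile.open(fileobj=io.BytesIO(shard_bytes), mode="r|") as tar:
        for member in tar:
            if not member.isfile():
                continue
            key, _, ext = member.name.partition(".")
            if key != current_key and current:
                yield current
                current = {}
            current_key = key
            current["__key__"] = key
            current[ext] = tar.extractfile(member).read()
    if current:
        yield current


shard = write_shard([
    ("sample0", {"jpg": b"image-bytes-0", "cls": b"3"}),
    ("sample1", {"jpg": b"image-bytes-1", "cls": b"7"}),
])
for sample in iter_samples(shard):
    print(sample["__key__"], sorted(k for k in sample if k != "__key__"))
```

Because the reader never seeks, the same loop would work over any byte stream, such as a local file or an HTTP response body, which is the property that makes large sequential reads a good fit for training jobs.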


