L3: Accelerator-Friendly Lossless Image Format for High-Resolution, High-Throughput DNN Training

08/18/2022
by Jonghyun Bae, et al.

The training process of deep neural networks (DNNs) is usually pipelined: data preparation runs on CPUs, followed by gradient computation on accelerators like GPUs. In an ideal pipeline, end-to-end training throughput is ultimately limited by the throughput of the accelerator, not by that of data preparation. In the past, DNN training pipelines achieved near-optimal throughput by using datasets encoded in a lightweight, lossy image format like JPEG. However, as high-resolution, losslessly-encoded datasets become more popular for applications requiring high accuracy, a performance problem arises in the data preparation stage due to low-throughput image decoding on the CPU. Thus, we propose L3, a custom lightweight, lossless image format for high-resolution, high-throughput DNN training. The decoding process of L3 is effectively parallelized on the accelerator, minimizing CPU intervention for data preparation during DNN training. For the Cityscapes dataset on an NVIDIA A100 GPU, L3 achieves 9.29x higher data preparation throughput than PNG, the most popular lossless image format, which leads to 1.71x higher end-to-end training throughput. Compared to JPEG and WebP, two popular lossy image formats, L3 provides up to 1.77x and 2.87x higher end-to-end training throughput on ImageNet, respectively, at equivalent metric performance.
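The pipeline structure the abstract describes, where the CPU only reads raw compressed bytes and the accelerator performs the decode, can be sketched as follows. This is a minimal illustration of the idea, not L3's actual API: RawBytesDataset, decode_on_gpu, and the .l3 file paths are hypothetical placeholders assumed for the example.

```python
# Sketch of a CPU-read / GPU-decode training input pipeline.
# The CPU workers do only cheap file I/O; the expensive lossless
# decode happens on the accelerator, which is the bottleneck shift
# the abstract attributes to L3. Requires a CUDA-capable GPU.
import torch
from torch.utils.data import Dataset, DataLoader

class RawBytesDataset(Dataset):
    """CPU side: return undecoded file bytes; no image decoding here."""
    def __init__(self, paths):
        self.paths = paths

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, i):
        with open(self.paths[i], "rb") as f:
            return torch.frombuffer(bytearray(f.read()), dtype=torch.uint8)

def decode_on_gpu(raw: torch.Tensor) -> torch.Tensor:
    # Placeholder for an accelerator-parallel lossless decoder like L3's.
    # A real decoder would launch many independent decode blocks per image;
    # here we only ship the bytes to the GPU and normalize them.
    return raw.cuda(non_blocking=True).float() / 255.0

# Placeholder file names; substitute real losslessly-encoded inputs.
loader = DataLoader(RawBytesDataset(["img0.l3", "img1.l3"]),
                    batch_size=1, num_workers=2, pin_memory=True)

for raw in loader:
    pixels = decode_on_gpu(raw[0])  # GPU decode overlaps with CPU reads
    # ... forward/backward pass on `pixels` would follow here ...
```

The design point is that the DataLoader workers never run a heavyweight codec: decoding is a data-parallel job on the GPU, which is what lets a format like L3 keep an A100's input pipeline saturated where CPU-side PNG decoding cannot.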
