μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching

04/13/2018
by Yosuke Oyama, et al.

NVIDIA cuDNN is a low-level library that provides GPU kernels frequently used in deep learning. Specifically, cuDNN implements several equivalent convolution algorithms whose performance and memory footprint may vary considerably depending on the layer dimensions. When cuDNN selects an algorithm automatically, the decision is made on a per-layer basis, so it often resorts to slower algorithms that fit the workspace size constraints. We present μ-cuDNN, a transparent wrapper library for cuDNN that divides a layer's mini-batch computation into several micro-batches. Based on Dynamic Programming and Integer Linear Programming, μ-cuDNN enables faster algorithms by decreasing the workspace requirements. At the same time, μ-cuDNN keeps the computational semantics unchanged, safely decoupling statistical efficiency from hardware efficiency. We demonstrate the effectiveness of μ-cuDNN over two frameworks, Caffe and TensorFlow, achieving speedups of 1.63x for AlexNet and 1.21x for ResNet-18 on a P100-SXM2 GPU. These results indicate that micro-batches can seamlessly increase the performance of deep learning while maintaining the same memory footprint.
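
To make the core idea concrete, below is a minimal sketch in C against the public cuDNN API of the micro-batching trick the abstract describes: running the same convolution over the mini-batch in several smaller calls, each needing only the workspace of a micro-batch. This is not the authors' implementation; the function name, parameters, and the assumption that the micro-batch size evenly divides the mini-batch are illustrative, and descriptor setup, error checking, and the paper's DP/ILP-based algorithm selection are omitted.

```c
/* Hedged sketch of micro-batched convolution, assuming float NCHW
 * tensors already on the GPU and N divisible by `micro`. */
#include <cudnn.h>

void conv_forward_microbatched(
    cudnnHandle_t handle,
    const float *x,                       /* full mini-batch input   */
    cudnnFilterDescriptor_t wDesc, const float *w,
    cudnnConvolutionDescriptor_t convDesc,
    float *y,                             /* full mini-batch output  */
    int N, int C, int H, int W,           /* input dimensions        */
    int K, int OH, int OW,                /* output dimensions       */
    int micro,                            /* micro-batch size        */
    cudnnConvolutionFwdAlgo_t algo,       /* algorithm chosen for a
                                             micro-batch, not for N  */
    void *workspace, size_t workspaceBytes)
{
    const float alpha = 1.0f, beta = 0.0f;

    /* Descriptors for one micro-batch: same layer shape, smaller N. */
    cudnnTensorDescriptor_t xSub, ySub;
    cudnnCreateTensorDescriptor(&xSub);
    cudnnCreateTensorDescriptor(&ySub);
    cudnnSetTensor4dDescriptor(xSub, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               micro, C, H, W);
    cudnnSetTensor4dDescriptor(ySub, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               micro, K, OH, OW);

    /* Run the same convolution micro-batch by micro-batch.  Each call
     * only needs the workspace for `micro` samples, so a faster
     * algorithm may fit within the same memory budget. */
    for (int b = 0; b < N; b += micro) {
        cudnnConvolutionForward(handle, &alpha,
                                xSub, x + (size_t)b * C * H * W,
                                wDesc, w, convDesc, algo,
                                workspace, workspaceBytes, &beta,
                                ySub, y + (size_t)b * K * OH * OW);
    }

    cudnnDestroyTensorDescriptor(xSub);
    cudnnDestroyTensorDescriptor(ySub);
}
```

The reason this can help is that, for many cuDNN convolution algorithms, the required workspace grows with the batch dimension of the input tensor, so shrinking the batch per call shrinks the workspace and may let a faster algorithm satisfy the same memory constraint. The computed result is unchanged because convolution is independent across samples in the batch.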
