Profiling based Out-of-core Hybrid Method for Large Neural Networks

07/11/2019
by Yuki Ito, et al.

GPUs are widely used to accelerate deep learning with neural networks (NNs). However, because GPU memory capacity is limited, it is difficult to implement efficient programs that compute large NNs on a GPU. To compute NNs that exceed GPU memory capacity, existing work has proposed data-swapping and recomputing methods. These methods, however, incur performance overhead from data movement or additional computation. To reduce this overhead, it is important to consider the characteristics of each layer, such as its size and recomputation cost. Following this direction, we propose the Profiling-based out-of-core Hybrid method (PoocH). PoocH determines which layers to swap or recompute based on runtime profiling. We implemented PoocH by extending the deep learning framework Chainer and evaluated its performance. With PoocH, we successfully computed an NN requiring 50 GB of memory on a single GPU with 16 GB of memory. Compared with in-core cases, performance degradation was 38% on an x86 machine and 28% on a POWER9 machine.
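
To make the idea of choosing between swapping and recomputing per layer concrete, the following is a minimal Python sketch. It is not PoocH's actual algorithm, which the abstract does not specify. The `LayerProfile` fields, the `choose_actions` function, the greedy cost-per-GB heuristic, and all numbers are hypothetical, chosen only so that the total matches the 50 GB network and 16 GB GPU mentioned above. The sketch also assumes that every layer size is positive and ignores the memory a recomputed layer still needs for its inputs.

```python
from dataclasses import dataclass


@dataclass
class LayerProfile:
    """Hypothetical per-layer measurements gathered by a profiling run."""
    name: str
    size_gb: float      # memory the layer's output occupies until backward
    swap_s: float       # measured overhead of swapping it out and back in
    recompute_s: float  # measured overhead of recomputing it in backward


def choose_actions(layers, capacity_gb):
    """Greedy illustration: evict the layers whose cheapest overhead per
    GB freed is lowest, until the resident data fits in GPU memory."""
    actions = {layer.name: "keep" for layer in layers}
    resident_gb = sum(layer.size_gb for layer in layers)

    def overhead_per_gb(layer):
        return min(layer.swap_s, layer.recompute_s) / layer.size_gb

    for layer in sorted(layers, key=overhead_per_gb):
        if resident_gb <= capacity_gb:
            break
        # Pick whichever method the profile says is cheaper for this layer.
        if layer.swap_s <= layer.recompute_s:
            actions[layer.name] = "swap"
        else:
            actions[layer.name] = "recompute"
        resident_gb -= layer.size_gb

    return actions, resident_gb


# Made-up profile: 50 GB of layer data, 16 GB of GPU memory.
profile = [
    LayerProfile("conv1", size_gb=20.0, swap_s=0.8, recompute_s=0.3),
    LayerProfile("conv2", size_gb=15.0, swap_s=0.4, recompute_s=0.9),
    LayerProfile("relu", size_gb=10.0, swap_s=0.5, recompute_s=0.05),
    LayerProfile("fc", size_gb=5.0, swap_s=0.3, recompute_s=0.6),
]
actions, resident_gb = choose_actions(profile, capacity_gb=16.0)
print(actions, resident_gb)
```

In this toy profile the cheap-to-recompute layers are marked for recomputation, the layer with expensive recomputation is swapped, and the smallest layer stays resident. A real planner would also have to account for how much swap traffic can overlap with computation, which this sketch leaves out.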

