Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization

10/07/2019
by   Paras Jain, et al.
12

Modern neural networks are increasingly bottlenecked by the limited capacity of on-device GPU memory. Prior work explores dropping activations as a strategy to scale to larger neural networks under memory constraints. However, these heuristics assume uniform per-layer costs and are limited to simple architectures with linear graphs, limiting their usability. In this paper, we formalize the problem of trading-off DNN training time and memory requirements as the tensor rematerialization optimization problem, a generalization of prior checkpointing strategies. We introduce Checkmate, a system that solves for optimal schedules in reasonable times (under an hour) using off-the-shelf MILP solvers, then uses these schedules to accelerate millions of training iterations. Our method scales to complex, realistic architectures and is hardware-aware through the use of accelerator-specific, profile-based cost models. In addition to reducing training cost, Checkmate enables real-world networks to be trained with up to 5.1× larger input sizes.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/30/2022

DELTA: Dynamically Optimizing GPU Memory beyond Tensor Recomputation

The further development of deep neural networks is hampered by the limit...
research
07/14/2020

Analyzing and Mitigating Data Stalls in DNN Training

Training Deep Neural Networks (DNNs) is resource-intensive and time-cons...
research
06/17/2023

Breaking On-device Training Memory Wall: A Systematic Survey

On-device training has become an increasingly popular approach to machin...
research
07/20/2020

Improving Memory Utilization in Convolutional Neural Network Accelerators

While the accuracy of convolutional neural networks has achieved vast im...
research
10/19/2022

Tempo: Accelerating Transformer-Based Model Training through Memory Footprint Reduction

Training deep learning models can be computationally expensive. Prior wo...
research
09/06/2022

Mimose: An Input-Aware Checkpointing Planner for Efficient Training on GPU

Larger deep learning models usually lead to higher model quality with an...

Please sign up or login with your details

Forgot password? Click here to reset