Rockmate: an Efficient, Fast, Automatic and Generic Tool for Re-materialization in PyTorch

07/03/2023
by   Xunyi Zhao, et al.

We propose Rockmate to control the memory requirements when training PyTorch DNN models. Rockmate is an automatic tool that starts from the model code and generates an equivalent model that uses a predefined amount of memory for activations, at the cost of a few re-computations. Rockmate automatically detects the structure of computational and data dependencies and rewrites the initial model as a sequence of complex blocks. We show that such a structure is widespread and can be found in many models in the literature (Transformer-based models, ResNet, RegNets, ...). This structure allows us to solve the problem quickly and efficiently, using an adaptation of Checkmate (general, but too slow on the whole model) at the level of individual blocks and an adaptation of Rotor (fast, but limited to sequential models) at the level of the sequence itself. We show through experiments on many models that Rockmate is as fast as Rotor and as efficient as Checkmate, and that in many cases it significantly lowers memory consumption for activations (by a factor of 2 to 5) for a rather negligible overhead (of the order of 10 ...). Rockmate is open source and available at https://github.com/topal-team/rockmate.
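To make the memory versus re-computation trade-off concrete, here is a minimal pure-Python sketch of re-materialization on a sequential chain of scalar layers. It is not Rockmate's API or its algorithm (which combines adaptations of Checkmate and Rotor). It is only a toy fixed-size segment checkpointing scheme, and the names `layer`, `layer_grad_input`, `backward_store_all`, and `backward_checkpointed` are hypothetical, chosen for illustration.

```python
import math


def layer(x, w):
    """Toy scalar layer: tanh(w * x)."""
    return math.tanh(w * x)


def layer_grad_input(x, w):
    """Derivative of tanh(w * x) with respect to x."""
    t = math.tanh(w * x)
    return w * (1.0 - t * t)


def backward_store_all(x0, weights):
    """Keep every activation; no re-computation is needed."""
    acts = [x0]
    for w in weights:
        acts.append(layer(acts[-1], w))
    forward_calls = len(weights)   # number of layer() evaluations
    peak = len(acts)               # activations held at once
    grad = 1.0
    for i in reversed(range(len(weights))):
        grad *= layer_grad_input(acts[i], weights[i])
    return grad, peak, forward_calls


def backward_checkpointed(x0, weights, segment):
    """Keep only each segment's input; recompute the rest during backward."""
    n = len(weights)
    checkpoints = {}
    x = x0
    forward_calls = 0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x     # input of layer i starts a segment
        x = layer(x, w)
        forward_calls += 1
    peak = len(checkpoints) + 1    # checkpoints plus the final output
    grad = 1.0
    for start in sorted(checkpoints, reverse=True):
        end = min(start + segment, n)
        # Re-materialize the inputs of layers start..end-1 from the checkpoint.
        local = [checkpoints[start]]
        for i in range(start, end - 1):
            local.append(layer(local[-1], weights[i]))
            forward_calls += 1
        # local[0] is the checkpoint itself, so it is not counted twice.
        peak = max(peak, len(checkpoints) + len(local) - 1)
        for i in reversed(range(start, end)):
            grad *= layer_grad_input(local[i - start], weights[i])
        del checkpoints[start]     # this segment is done; free its checkpoint
    return grad, peak, forward_calls


weights = [0.5 + 0.1 * i for i in range(16)]
g_all, peak_all, calls_all = backward_store_all(0.3, weights)
g_ckpt, peak_ckpt, calls_ckpt = backward_checkpointed(0.3, weights, segment=4)
print(f"store-all:    peak={peak_all}, layer calls={calls_all}")
print(f"checkpointed: peak={peak_ckpt}, layer calls={calls_ckpt}")
```

Both strategies return the same gradient. The checkpointed one holds fewer activations at its peak and pays for that with extra `layer` evaluations, which is the kind of trade-off the abstract describes, here on a far simpler model than the block-structured networks Rockmate targets.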

research
01/10/2023

Rethinking Voxelization and Classification for 3D Object Detection

The main challenge in 3D object detection from LiDAR point clouds is ach...
research
04/08/2022

ReservoirComputing.jl: An Efficient and Modular Library for Reservoir Computing Models

We introduce ReservoirComputing.jl, an open source Julia library for res...
research
06/08/2021

FastSeq: Make Sequence Generation Faster

Transformer-based models have made tremendous impacts in natural languag...
research
09/03/2019

Deep Equilibrium Models

We present a new approach to modeling sequential data: the deep equilibr...
research
03/13/2023

MetaTroll: Few-shot Detection of State-Sponsored Trolls with Transformer Adapters

State-sponsored trolls are the main actors of influence campaigns on soc...
research
06/13/2023

One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning

We present Generalized LoRA (GLoRA), an advanced approach for universal ...
research
01/24/2023

Model soups to increase inference without increasing compute time

In this paper, we compare Model Soups performances on three different mo...
