PaReprop: Fast Parallelized Reversible Backpropagation

06/15/2023
by Tyler Lixuan Zhu, et al.

The growing size of datasets and deep learning models has made faster and memory-efficient training crucial. Reversible transformers have recently been introduced as an exciting new method for extremely memory-efficient training, but they come with an additional computational overhead of activation re-computation in the backpropagation phase. We present PaReprop, a fast Parallelized Reversible Backpropagation algorithm that parallelizes this activation re-computation overhead in reversible training with the gradient computation itself in the backpropagation phase. We demonstrate the effectiveness of the proposed PaReprop algorithm through extensive benchmarking across model families (ViT, MViT, Swin and RoBERTa), data modalities (Vision, NLP), model sizes (from small to giant), and training batch sizes. Our empirical results show that PaReprop achieves up to 20% higher training throughput than vanilla reversible training, largely mitigating the theoretical overhead of 25% lower throughput from activation re-computation in reversible training. Project page: https://tylerzhu.com/pareprop.
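The scheduling idea described in the abstract can be illustrated with a small PyTorch sketch. Everything below (the additive-coupling RevBlock, the pareprop_backward helper, and the use of a second CUDA stream) is an illustrative assumption based on the abstract, not the authors' released implementation: while the default stream computes gradients for block i, a side stream re-computes the inputs of block i-1 by inverting it, so the re-computation overhead overlaps with gradient computation.

    # Minimal sketch of parallelized reversible backpropagation (assumed names,
    # not the authors' API). Requires only PyTorch; falls back to sequential
    # execution when CUDA is unavailable.
    import torch
    import torch.nn as nn


    class RevBlock(nn.Module):
        """Additive-coupling reversible block: y1 = x1 + F(x2), y2 = x2 + G(y1)."""

        def __init__(self, dim):
            super().__init__()
            self.F = nn.Sequential(nn.Linear(dim, dim), nn.GELU())
            self.G = nn.Sequential(nn.Linear(dim, dim), nn.GELU())

        def forward(self, x1, x2):
            y1 = x1 + self.F(x2)
            y2 = x2 + self.G(y1)
            return y1, y2

        @torch.no_grad()
        def invert(self, y1, y2):
            # Recompute the block inputs from its outputs (no stored activations).
            x2 = y2 - self.G(y1)
            x1 = y1 - self.F(x2)
            return x1, x2


    def pareprop_backward(blocks, y1, y2, dy1, dy2):
        """Backward over reversible blocks. The inversion (activation
        re-computation) for the next block to be processed is launched on a
        side CUDA stream, overlapping with the gradient computation of the
        current block on the default stream."""
        side = torch.cuda.Stream() if torch.cuda.is_available() else None

        # Inputs of the last block, recomputed once up front.
        x1, x2 = blocks[-1].invert(y1, y2)

        for i in range(len(blocks) - 1, -1, -1):
            blk = blocks[i]

            # Launch re-computation of block i-1's inputs in parallel.
            if i > 0:
                if side is not None:
                    side.wait_stream(torch.cuda.current_stream())
                    with torch.cuda.stream(side):
                        prev_x1, prev_x2 = blocks[i - 1].invert(x1, x2)
                else:
                    prev_x1, prev_x2 = blocks[i - 1].invert(x1, x2)

            # Gradient computation for block i on the default stream:
            # re-run its forward with grad enabled and backprop the output grads.
            with torch.enable_grad():
                x1_ = x1.detach().requires_grad_(True)
                x2_ = x2.detach().requires_grad_(True)
                o1, o2 = blk(x1_, x2_)
                torch.autograd.backward((o1, o2), (dy1, dy2))
            dy1, dy2 = x1_.grad, x2_.grad

            if i > 0:
                if side is not None:
                    torch.cuda.current_stream().wait_stream(side)
                x1, x2 = prev_x1, prev_x2

        return dy1, dy2

In this sketch the forward pass would be run without storing intermediate activations, keeping only the final (y1, y2); the explicit wait_stream calls are the only synchronization needed, since each inversion depends only on tensors already produced on the default stream.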

