GACT: Activation Compressed Training for General Architectures

06/22/2022
by Xiaoxuan Liu, et al.

Training large neural network (NN) models requires extensive memory resources, and Activation Compressed Training (ACT) is a promising approach to reduce the training memory footprint. This paper presents GACT, an ACT framework that supports a broad range of machine learning tasks and generic NN architectures with limited domain knowledge. By analyzing a linearized version of ACT's approximate gradient, we prove the convergence of GACT without prior knowledge of operator type or model architecture. To make training stable, we propose an algorithm that decides the compression ratio for each tensor by estimating its impact on the gradient at run time. We implement GACT as a PyTorch library that readily applies to any NN architecture. GACT reduces activation memory for convolutional NNs, transformers, and graph NNs by up to 8.1x, enabling training with a 4.2x to 24.7x larger batch size, with negligible accuracy loss.
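The core mechanism behind activation compressed training can be illustrated with a short PyTorch sketch. This is not GACT's actual API: the names quantize, dequantize, and CompressedLinearFn are hypothetical, and the fixed uniform quantizer below stands in for GACT's run-time, per-tensor compression-ratio selection. The sketch only shows the general pattern: save a compressed copy of the activation in the forward pass, then decompress it in the backward pass to compute an approximate weight gradient.

```python
import torch
import torch.nn.functional as F


def quantize(x, bits=8):
    """Uniform per-tensor quantization of an activation tensor (bits <= 8)."""
    qmax = (1 << bits) - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo).clamp(min=1e-8) / qmax
    q = torch.round((x - lo) / scale).to(torch.uint8)
    return q, lo, scale


def dequantize(q, lo, scale):
    """Reconstruct an approximate activation from its quantized form."""
    return q.float() * scale + lo


class CompressedLinearFn(torch.autograd.Function):
    """Linear layer that keeps only a compressed copy of its input activation."""

    @staticmethod
    def forward(ctx, x, weight, bias, bits):
        q, lo, scale = quantize(x, bits)
        # Save the compressed activation instead of x itself; this is where
        # the activation-memory saving comes from.
        ctx.save_for_backward(q, lo, scale, weight)
        ctx.has_bias = bias is not None
        return F.linear(x, weight, bias)

    @staticmethod
    def backward(ctx, grad_out):
        q, lo, scale, weight = ctx.saved_tensors
        x_hat = dequantize(q, lo, scale)          # approximate activation
        grad_x = grad_out @ weight                # (batch, in_features)
        grad_w = grad_out.t() @ x_hat             # approximate, uses x_hat
        grad_b = grad_out.sum(dim=0) if ctx.has_bias else None
        return grad_x, grad_w, grad_b, None       # no gradient for `bits`


# Usage: an 8-bit compressed linear layer on 2-D activations.
x = torch.randn(32, 128, requires_grad=True)
w = torch.randn(64, 128, requires_grad=True)
b = torch.zeros(64, requires_grad=True)
y = CompressedLinearFn.apply(x, w, b, 8)
y.sum().backward()
```

In GACT itself, the number of bits per tensor is not fixed as in this sketch; it is chosen adaptively by estimating each tensor's impact on the gradient at run time.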


