Hardware-Efficient Transformer Training via Piecewise Affine Operations

05/26/2023
by Atli Kosson et al.

Multiplications are responsible for most of the computational cost of neural network training and inference. Recent research has thus looked for ways to reduce the cost associated with them. Inspired by Mogami (2020), we replace multiplication with a cheap piecewise affine approximation that is achieved by adding the bit representations of the floating-point numbers together as integers. We show that transformers can be trained with the resulting modified matrix multiplications on both vision and language tasks with little to no performance impact, and without changes to the training hyperparameters. We further replace all non-linearities in the networks, making them fully and jointly piecewise affine in both inputs and weights. Finally, we show that we can eliminate all multiplications in the entire training process, including operations in the forward pass, backward pass, and optimizer update, demonstrating the first successful training of modern neural network architectures in a fully multiplication-free fashion.
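
To make the core operation concrete, the following is a minimal NumPy sketch, not the authors' implementation, of this kind of bit-level multiplication: the float32 bit patterns of the two operands are added as unsigned integers and the constant 0x3F800000 (the bit pattern of 1.0f, i.e. the exponent bias shifted into the exponent field) is subtracted. The helper names (approx_mul, approx_matmul) are illustrative, and edge cases such as zeros, subnormals, infinities/NaNs, and exponent overflow or underflow are deliberately ignored.

```python
import numpy as np

# Bit pattern of 1.0f: sign 0, exponent field 127 << 23, mantissa 0.
BIAS_OFFSET = np.uint32(0x3F800000)


def approx_mul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Element-wise piecewise affine approximation of a * b for float32 inputs."""
    a_bits = a.astype(np.float32).view(np.uint32)
    b_bits = b.astype(np.float32).view(np.uint32)
    # Adding the bit patterns adds the exponents and (approximately) the
    # log-mantissas; subtracting BIAS_OFFSET removes the doubled exponent bias.
    # With uint32 wrap-around the sign bits combine like an XOR, barring
    # exponent overflow/underflow.
    out_bits = a_bits + b_bits - BIAS_OFFSET
    return out_bits.view(np.float32)


def approx_matmul(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Matrix product whose scalar products use approx_mul; the accumulation
    stays an ordinary floating-point sum (additions are multiplication-free)."""
    prods = approx_mul(x[:, :, None], w[None, :, :])  # shape (m, k, n)
    return prods.sum(axis=1)                          # reduce over k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8)).astype(np.float32)
    w = rng.standard_normal((8, 3)).astype(np.float32)
    print(np.abs(x @ w - approx_matmul(x, w)).max())  # small deviation from exact
```

For normal, nonzero operands this underestimates the true product by at most roughly 11% in relative terms, and because the output is an affine function of the two inputs whenever the operand exponents are fixed, the operation is piecewise affine jointly in its operands, in the sense used above.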

Related research

09/24/2018
No Multiplication? No Floating Point? No Problem! Training Networks for Efficient Inference
For successful deployment of deep neural networks on highly resource-co...

12/07/2020
Deep Neural Network Training without Multiplications
Is multiplication really necessary for deep neural networks? Here we pro...

05/03/2018
Exploration of Numerical Precision in Deep Neural Networks
Reduced numerical precision is a common technique to reduce computationa...

06/21/2023
Iterated Piecewise Affine (IPA) Approximation for Language Modeling
In this work, we demonstrate the application of a simple first-order Tay...

10/22/2019
Neural Network Training with Approximate Logarithmic Computations
The high computational complexity associated with training deep neural n...

03/25/2018
Neural Nets via Forward State Transformation and Backward Loss Transformation
This article studies (multilayer perceptron) neural networks with an emp...

04/26/2015
Computational Cost Reduction in Learned Transform Classifications
We present a theoretical analysis and empirical evaluations of a novel s...
