Accelerating Training using Tensor Decomposition

09/10/2019
by Mostafa Elhoushi, et al.

Tensor decomposition is a well-known approach to reducing the latency and parameter count of a pre-trained model. In this paper, however, we propose using tensor decomposition to reduce the time needed to train a model from scratch. In our approach, we train the model from scratch (i.e., with randomly initialized weights) in its original architecture for a small number of epochs, then decompose the model, and then continue training the decomposed model until the end. An optional step in our approach converts the decomposed architecture back to the original architecture. We present results of applying this approach on both the CIFAR10 and ImageNet datasets, and show that there can be up to a 2x speedup in training time with an accuracy drop of up to 1.5%. The training acceleration approach is independent of hardware and is expected to yield similar speedups on both CPU and GPU platforms.
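The sketch below illustrates the train, decompose, then continue-training workflow described above. It is a minimal illustration, not the authors' implementation: a truncated-SVD factorization of a fully-connected layer stands in for the tensor decomposition used in the paper, and the model, rank, and training loop are hypothetical placeholders.

```python
# Minimal sketch of the train -> decompose -> continue-training workflow.
# The SVD-based low-rank factorization below is a stand-in for the paper's
# tensor decomposition; model, rank, and training details are placeholders.
import torch
import torch.nn as nn

def decompose_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Replace a Linear layer with two smaller layers via truncated SVD."""
    W = layer.weight.data                          # (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r = (U[:, :rank] * S[:rank]).contiguous()    # (out_features, rank)
    V_r = Vh[:rank, :].contiguous()                # (rank, in_features)

    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data = V_r
    second.weight.data = U_r
    if layer.bias is not None:
        second.bias.data = layer.bias.data
    return nn.Sequential(first, second)

# 1. Train the original architecture for a few epochs (training loop omitted).
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 1024),   # placeholder layer sized for CIFAR10 inputs
    nn.ReLU(),
    nn.Linear(1024, 10),
)
# ... train(model) for a few epochs ...

# 2. Decompose the large layer, then continue training the smaller model.
model[1] = decompose_linear(model[1], rank=64)
# ... train(model) for the remaining epochs ...
```

In this sketch, the paper's optional conversion back to the original architecture would correspond to multiplying the two factor matrices back into a single weight matrix.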


Related research

- Efficient GPT Model Pre-training using Tensor Train Matrix Representation (06/05/2023): Large-scale transformer models have shown remarkable performance in lang...
- Quantization Aware Factorization for Deep Neural Network Compression (08/08/2023): Tensor decomposition of convolutional and fully-connected layers is an e...
- Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition (12/19/2014): We propose a simple two-step approach for speeding up convolution layers...
- Tensor Yard: One-Shot Algorithm of Hardware-Friendly Tensor-Train Decomposition for Convolutional Neural Networks (08/10/2021): Nowadays Deep Learning became widely used in many economic, technical an...
- DAC: Data-free Automatic Acceleration of Convolutional Networks (12/20/2018): Deploying a deep learning model on mobile/IoT devices is a challenging t...
- Parameter-Efficient Mixture-of-Experts Architecture for Pre-trained Language Models (03/02/2022): The state-of-the-art Mixture-of-Experts (short as MoE) architecture has ...
- Transfer Learning Between Different Architectures Via Weights Injection (01/07/2021): This work presents a naive algorithm for parameter transfer between diff...
