Provable Convergence of Tensor Decomposition-Based Neural Network Training

03/13/2023
by Chenyang Li, et al.

Advanced tensor decompositions, such as tensor train (TT), have been widely studied for tensor decomposition-based neural network (NN) training, which is one of the most common model compression methods. However, training an NN with tensor decomposition always suffers from significant accuracy loss and convergence issues. In this paper, a holistic framework for tensor decomposition-based NN training is proposed by formulating TT decomposition-based NN training as a nonconvex optimization problem. This problem can be solved by the proposed tensor block coordinate descent (tenBCD) method, a gradient-free algorithm. Using the Kurdyka–Łojasiewicz (KŁ) property, the global convergence of tenBCD to a critical point at a rate of O(1/k) is established, where k is the number of iterations. The theoretical results extend to the popular residual neural networks (ResNets). The effectiveness and efficiency of the proposed framework are verified on an image classification dataset, where the proposed method converges efficiently during training and prevents overfitting.
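To make the compression idea concrete, here is a minimal NumPy sketch of the tensor-train format the abstract refers to: a weight matrix is reshaped into a higher-order tensor and represented by a chain of small TT cores. The shapes, ranks, and helper `tt_to_full` are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical example: a 256x256 weight matrix viewed as a 4-D tensor
# of shape (16, 16, 16, 16). In TT format, core G_k has shape
# (r_{k-1}, n_k, r_k), with boundary ranks r_0 = r_4 = 1.
shape = (16, 16, 16, 16)
ranks = (1, 4, 4, 4, 1)  # assumed TT ranks for this sketch

rng = np.random.default_rng(0)
cores = [rng.standard_normal((ranks[k], shape[k], ranks[k + 1]))
         for k in range(4)]

def tt_to_full(cores):
    """Contract the TT cores back into the full tensor."""
    full = cores[0]  # shape (1, n_1, r_1)
    for core in cores[1:]:
        # merge the trailing rank index of `full` with the
        # leading rank index of the next core
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.squeeze(axis=(0, -1))  # drop boundary ranks r_0, r_d

W = tt_to_full(cores)
print(W.shape)  # (16, 16, 16, 16)

tt_params = sum(c.size for c in cores)
full_params = int(np.prod(shape))
print(tt_params, full_params)  # 640 65536
```

The parameter count drops from 65,536 for the dense weight to 640 for the TT cores at rank 4, which illustrates why TT is attractive for model compression; the optimization difficulty the paper addresses comes from training such factored weights directly.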


