ADMMiRNN: Training RNN with Stable Convergence via An Efficient ADMM Approach

06/10/2020
by   Yu Tang, et al.

It is hard to train a Recurrent Neural Network (RNN) to stable convergence while avoiding vanishing and exploding gradients, because the weights in the recurrent unit are reused across timesteps. Moreover, RNNs are sensitive to the initialization of weights and biases, which makes training difficult. Being gradient-free and robust to poor conditioning, the Alternating Direction Method of Multipliers (ADMM) has become a promising alternative to traditional stochastic gradient algorithms for training neural networks. However, ADMM cannot be applied to RNNs directly, since the state in the recurrent unit is updated repeatedly over timesteps. This work therefore builds a new framework, ADMMiRNN, upon the unfolded form of the RNN to address these challenges simultaneously, and provides novel update rules together with a theoretical convergence analysis. Instead of applying vanilla ADMM, we explicitly derive the key update rules in each ADMMiRNN iteration using carefully constructed approximation techniques and solutions to each subproblem. Numerical experiments on MNIST and text classification tasks show that ADMMiRNN converges and outperforms the compared baselines. Furthermore, ADMMiRNN trains RNNs more stably than stochastic gradient algorithms, without vanishing or exploding gradients. Source code is available at https://github.com/TonyTangYu/ADMMiRNN.
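To make the unfolding-plus-splitting idea concrete, below is a minimal NumPy sketch of the general scheme, not the authors' algorithm: the unfolded RNN is split into per-timestep pre-activation and state variables tied by equality constraints, and training alternates between primal block updates and dual ascent on scaled multipliers. The dimensions, penalty rho, step size, toy regression loss, and single-gradient-step subproblem solvers are all illustrative assumptions; ADMMiRNN itself derives dedicated update rules for each subproblem.

```python
# Minimal ADMM-style sketch for an unfolded RNN (NOT ADMMiRNN's exact rules:
# the paper solves each subproblem with dedicated approximations, while here
# every block of the scaled augmented Lagrangian gets one gradient step).
import numpy as np

rng = np.random.default_rng(0)
T, d_in, d_h = 5, 4, 8             # timesteps, input dim, hidden dim (assumed)
x = rng.normal(size=(T, d_in))     # one toy input sequence
y = rng.normal(size=d_h)           # toy regression target on the final state

# Shared cell weights: s_t = tanh(W x_t + U s_{t-1} + b)
W = rng.normal(size=(d_h, d_in)) * 0.1
U = rng.normal(size=(d_h, d_h)) * 0.1
b = np.zeros(d_h)

# Split variables: pre-activations a_t and states s_t per timestep, tied by
# the constraints a_t = W x_t + U s_{t-1} + b and s_t = tanh(a_t).
s = np.zeros((T + 1, d_h))         # s[0] is the initial state
a = np.zeros((T, d_h))
lam = np.zeros((T, d_h))           # scaled duals for a_t = W x_t + U s_{t-1} + b
mu = np.zeros((T, d_h))            # scaled duals for s_t = tanh(a_t)
rho, lr = 1.0, 0.05                # penalty and step size (assumed values)

def lin(t):
    # Affine part of the cell; s[t] here is s_{t-1} in math indexing.
    return W @ x[t] + U @ s[t] + b

for it in range(200):
    # (1) Primal sweep: one gradient step per augmented-Lagrangian block.
    for t in range(T):
        # a_t-block: rho/2||a_t - lin(t) + lam_t||^2 + rho/2||s_t - tanh(a_t) + mu_t||^2
        g = rho * (a[t] - lin(t) + lam[t]) \
            - rho * (s[t + 1] - np.tanh(a[t]) + mu[t]) * (1 - np.tanh(a[t]) ** 2)
        a[t] -= lr * g
        # s_t-block (loss attached only to the final state)
        g = rho * (s[t + 1] - np.tanh(a[t]) + mu[t])
        if t == T - 1:
            g += s[t + 1] - y
        if t + 1 < T:              # s_t also enters the constraint at t+1
            g += -rho * U.T @ (a[t + 1] - lin(t + 1) + lam[t + 1])
        s[t + 1] -= lr * g
    # Weight block: gradient step on sum_t rho/2||a_t - lin(t) + lam_t||^2.
    gW, gU, gb = np.zeros_like(W), np.zeros_like(U), np.zeros_like(b)
    for t in range(T):
        r = -(a[t] - lin(t) + lam[t])   # lin(t) is linear in W, U, b
        gW += rho * np.outer(r, x[t]); gU += rho * np.outer(r, s[t]); gb += rho * r
    W -= lr * gW; U -= lr * gU; b -= lr * gb
    # (2) Dual ascent on the scaled multipliers (constraint residuals).
    for t in range(T):
        lam[t] += a[t] - lin(t)
        mu[t] += s[t + 1] - np.tanh(a[t])

print("final loss:", 0.5 * np.sum((s[T] - y) ** 2))
```

Note how the sketch avoids backpropagation through time entirely: no gradient is ever chained across timesteps, which is the structural reason an ADMM-style scheme sidesteps vanishing and exploding gradients.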


Related Research

01/27/2019 · Large-Scale Classification using Multinomial Regression and ADMM
We present a novel method for learning the weights in multinomial logist...

11/08/2019 · Learning-Accelerated ADMM for Distributed Optimal Power Flow
We propose a novel data-driven method to accelerate the convergence of A...

10/21/2017 · Zeroth-Order Online Alternating Direction Method of Multipliers: Convergence Analysis and Applications
In this paper, we design and analyze a new zeroth-order online algorithm...

05/31/2019 · ADMM for Efficient Deep Learning with Global Convergence
Alternating Direction Method of Multipliers (ADMM) has been used success...

12/22/2021 · A Convergent ADMM Framework for Efficient Neural Network Training
As a well-known optimization framework, the Alternating Direction Method...

08/11/2023 · Enhancing Generalization of Universal Adversarial Perturbation through Gradient Aggregation
Deep neural networks are vulnerable to universal adversarial perturbatio...

12/24/2020 · Sensitivity – Local Index to Control Chaoticity or Gradient Globally
In this paper, we propose a fully local index named "sensitivity" for ea...
