Stochastic Block-ADMM for Training Deep Networks

05/01/2021
by Saeed Khorram, et al.

In this paper, we propose Stochastic Block-ADMM as an approach to train deep neural networks in batch and online settings. Our method works by splitting a neural network into an arbitrary number of blocks and using auxiliary variables to connect these blocks while optimizing with stochastic gradient descent. This makes it possible to train deep networks with non-differentiable constraints, where conventional backpropagation is not applicable. One application is supervised feature disentangling, where our proposed DeepFacto inserts a non-negative matrix factorization (NMF) layer into the network. Since backpropagation only needs to be performed within each block, our approach alleviates vanishing gradients and offers the potential for parallelization. We prove the convergence of our proposed method and demonstrate its capabilities through experiments in supervised and weakly-supervised settings.
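The block-splitting idea in the abstract can be sketched in a few lines of NumPy. The sketch below is an illustrative assumption, not the paper's exact algorithm: it splits a two-layer regression network into two blocks linked by an auxiliary variable z (with a scaled dual variable u) and alternates gradient steps on each block with an exact update of z, using full-batch steps for clarity where the paper's method is stochastic/minibatch.

```python
import numpy as np

# Minimal block-ADMM training sketch (assumed details, not the authors'
# exact algorithm): a two-layer network split into two blocks that are
# linked by an auxiliary variable z with constraint relu(x @ W1) = z.

rng = np.random.default_rng(0)
n, d, hidden, k = 32, 8, 6, 2

# Synthetic regression data generated by a random "teacher" network.
x = rng.normal(size=(n, d))
y = np.maximum(x @ rng.normal(size=(d, hidden)), 0.0) @ rng.normal(size=(hidden, k))

W1 = 0.1 * rng.normal(size=(d, hidden))   # block 1: input -> hidden
W2 = 0.1 * rng.normal(size=(hidden, k))   # block 2: hidden -> output
z = np.maximum(x @ W1, 0.0)               # auxiliary variable, z ~= relu(x @ W1)
u = np.zeros_like(z)                      # scaled dual variable
rho, lr = 1.0, 0.02

def full_loss(W1, W2):
    # Loss of the assembled network, evaluated by a full forward pass.
    pred = np.maximum(x @ W1, 0.0) @ W2
    return 0.5 * np.mean(np.sum((pred - y) ** 2, axis=1))

loss_before = full_loss(W1, W2)
for _ in range(500):
    # Block 1: gradient step on the coupling penalty
    # (rho/2)||relu(x W1) - z + u||^2; backprop stays inside this block.
    pre = x @ W1
    r = np.maximum(pre, 0.0) - z + u
    W1 -= lr * rho * x.T @ (r * (pre > 0)) / n

    # z-update: exact minimizer of the quadratic
    # (1/2)||z W2 - y||^2 + (rho/2)||h - z + u||^2.
    h = np.maximum(x @ W1, 0.0)
    z = (y @ W2.T + rho * (h + u)) @ np.linalg.inv(W2 @ W2.T + rho * np.eye(hidden))

    # Block 2: gradient step on the data-fitting term (1/2)||z W2 - y||^2.
    W2 -= lr * z.T @ (z @ W2 - y) / n

    # Dual ascent on the residual of the constraint relu(x W1) = z.
    u += h - z

loss_after = full_loss(W1, W2)
print(loss_before, loss_after)
```

Because each block sees gradients only of its own local objective, the same pattern extends to an arbitrary number of blocks, and a block boundary can host a non-differentiable layer (such as the NMF layer in DeepFacto) since no gradient has to flow across it.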


Related research

11/14/2015
Efficient Training of Very Deep Neural Networks for Supervised Hashing
In this paper, we propose training very deep neural networks (DNNs) for ...

06/24/2018
Beyond Backprop: Alternating Minimization with co-Activation Memory
We propose a novel online algorithm for training deep feedforward neural...

06/06/2020
Frank-Wolfe optimization for deep networks
Deep neural networks are today one of the most popular choices in classif...

08/06/2017
Training of Deep Neural Networks based on Distance Measures using RMSProp
The vanishing gradient problem was a major obstacle for the success of d...

10/22/2019
Vanishing Nodes: Another Phenomenon That Makes Training Deep Neural Networks Difficult
It is well known that the problem of vanishing/exploding gradients is a ...

03/07/2019
On Transformations in Stochastic Gradient MCMC
Stochastic gradient Langevin dynamics (SGLD) is a widely used sampler fo...

06/21/2019
Backpropagation-Friendly Eigendecomposition
Eigendecomposition (ED) is widely used in deep networks. However, the ba...
