Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks

11/20/2017
by Ziming Zhang, et al.

By lifting the ReLU function into a higher dimensional space, we develop a smooth multi-convex formulation for training feed-forward deep neural networks (DNNs). This allows us to develop a block coordinate descent (BCD) training algorithm consisting of a sequence of numerically well-behaved convex optimizations. Using ideas from proximal point methods in convex analysis, we prove that this BCD algorithm will converge globally to a stationary point with R-linear convergence rate of order one. In experiments with the MNIST database, DNNs trained with this BCD algorithm consistently yielded better test-set error rates than identical DNN architectures trained via all the stochastic gradient descent (SGD) variants in the Caffe toolbox.
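The full paper gives the precise lifted objective and block updates; as a rough, hedged sketch of the general idea (not the authors' exact algorithm), the Python snippet below trains a one-hidden-layer regression network by alternating convex block updates: a nonnegative auxiliary activation block U stands in for the lifted ReLU and is solved by projected gradient descent on a convex quadratic, while the two weight blocks reduce to ridge regressions. All names (bcd_train, gamma, lam, h) and the squared-loss objective are illustrative assumptions, not taken from the paper.

import numpy as np

def bcd_train(X, Y, h=64, gamma=1.0, lam=1e-2, n_outer=50, n_inner=20, seed=0):
    # Illustrative BCD sketch (assumed simplification, not the paper's exact formulation):
    #   min_{W1, W2, U >= 0}  ||Y - W2 U||^2 + gamma ||U - W1 X||^2 + lam (||W1||^2 + ||W2||^2)
    # X: (d, N) inputs, Y: (c, N) targets, U: (h, N) lifted nonnegative activations.
    rng = np.random.default_rng(seed)
    d, _ = X.shape
    c, _ = Y.shape
    W1 = 0.1 * rng.standard_normal((h, d))
    W2 = 0.1 * rng.standard_normal((c, h))
    U = np.maximum(W1 @ X, 0.0)          # initialize the lifted block with a ReLU pass
    I_h, I_d = np.eye(h), np.eye(d)
    for _ in range(n_outer):
        # U-block: projected gradient on the convex quadratic in U with U >= 0.
        L = 2.0 * (np.linalg.norm(W2, 2) ** 2 + gamma)   # Lipschitz constant of the gradient
        target = W1 @ X
        for _ in range(n_inner):
            grad = 2.0 * W2.T @ (W2 @ U - Y) + 2.0 * gamma * (U - target)
            U = np.maximum(U - grad / L, 0.0)
        # W2-block: ridge regression of Y onto U (closed-form convex solve).
        W2 = np.linalg.solve(U @ U.T + lam * I_h, U @ Y.T).T
        # W1-block: ridge regression of U onto X (closed-form convex solve).
        W1 = np.linalg.solve(X @ X.T + (lam / gamma) * I_d, X @ U.T).T
    return W1, W2

def predict(W1, W2, X):
    # Standard feed-forward ReLU pass at test time.
    return W2 @ np.maximum(W1 @ X, 0.0)

# Toy usage on synthetic data (illustrative only).
rng = np.random.default_rng(1)
X = rng.standard_normal((10, 200))
Y = np.maximum(rng.standard_normal((3, 10)) @ X, 0.0)
W1, W2 = bcd_train(X, Y, h=32)
print(np.mean((predict(W1, W2, X) - Y) ** 2))

Each block update above minimizes a convex subproblem exactly or monotonically, which is the property the paper exploits to establish global convergence to a stationary point.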

Related research

03/24/2018 - A Proximal Block Coordinate Descent Algorithm for Deep Neural Network Training
Training deep neural networks (DNNs) efficiently is a challenge due to t...

05/23/2018 - A Unified Framework for Training Neural Networks
The lack of mathematical tractability of Deep Neural Networks (DNNs) has...

11/20/2017 - Block-Cyclic Stochastic Coordinate Descent for Deep Neural Networks
We present a stochastic first-order optimization algorithm, named BCSC, ...

06/26/2020 - Is SGD a Bayesian sampler? Well, almost
Overparameterised deep neural networks (DNNs) are highly expressive and ...

11/02/2022 - POLICE: Provably Optimal Linear Constraint Enforcement for Deep Neural Networks
Deep Neural Networks (DNNs) outshine alternative function approximators ...

10/28/2016 - Globally Optimal Training of Generalized Polynomial Neural Networks with Nonlinear Spectral Methods
The optimization problem behind neural networks is highly non-convex. Tr...
