Lifted Proximal Operator Machines

11/05/2018
by Jia Li, et al.

We propose a new optimization method for training feed-forward neural networks. By rewriting the activation function as an equivalent proximal operator, we approximate a feed-forward neural network by adding the proximal operators to the objective function as penalties; hence the name lifted proximal operator machine (LPOM). LPOM is block multi-convex in all layer-wise weights and activations, which allows us to use block coordinate descent to update the layer-wise weights and activations in parallel. Most notably, we use only the mapping of the activation function itself, rather than its derivatives, thus avoiding the vanishing and exploding gradient issues of gradient-based training methods. Our method is therefore applicable to a wide range of non-decreasing Lipschitz continuous activation functions, including saturating and non-differentiable ones. LPOM requires no auxiliary variables beyond the layer-wise activations, so it uses roughly the same amount of memory as stochastic gradient descent (SGD). We further prove the convergence of the layer-wise weight and activation updates. Experiments on the MNIST and CIFAR-10 datasets demonstrate the advantages of LPOM.
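The sketch below illustrates the general idea the abstract describes: treat the layer-wise activations as free variables, penalize the mismatch with the network mapping, and alternate block coordinate updates in which every step calls only the activation mapping phi, never its derivative. It is a minimal NumPy illustration under stated assumptions (a two-layer ReLU network, a squared loss, penalty weight mu, step size lr, and the specific fixed-point and surrogate-gradient update formulas are all choices made for this sketch); it is not the paper's exact LPOM formulation or algorithm.

# Minimal sketch of lifted, block-coordinate training in the spirit of LPOM.
# Assumptions for illustration only: ReLU activation, squared loss, mu, lr,
# and the particular update rules below are not the paper's exact method.
import numpy as np

rng = np.random.default_rng(0)
phi = lambda z: np.maximum(z, 0.0)   # any non-decreasing Lipschitz activation

# Tiny 2-layer network: X0 -> X1 -> X2, trained to match targets Y.
n, d0, d1, d2 = 32, 8, 16, 4
X0 = rng.standard_normal((d0, n))
Y  = rng.standard_normal((d2, n))
W1 = 0.1 * rng.standard_normal((d1, d0))
W2 = 0.1 * rng.standard_normal((d2, d1))
X1 = phi(W1 @ X0)                    # activations kept as free variables
X2 = phi(W2 @ X1)
mu, lr = 1.0, 1e-2

for it in range(200):
    # Activation updates: fixed-point iterations that call phi only.
    for _ in range(5):
        # Hidden activations: pulled toward phi(W1 X0), corrected by the layer above.
        X1 = phi(W1 @ X0 - W2.T @ (phi(W2 @ X1) - X2))
        # Output activations: trade the penalty off against the squared loss.
        X2 = phi(W2 @ X1 - (X2 - Y) / mu)
    # Weight updates: gradient steps on a surrogate whose gradient
    # involves phi but not phi'.
    W1 -= lr * (phi(W1 @ X0) - X1) @ X0.T
    W2 -= lr * (phi(W2 @ X1) - X2) @ X1.T

print("final fit error:", np.linalg.norm(phi(W2 @ phi(W1 @ X0)) - Y))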


Related research

09/25/2019 · Locally adaptive activation functions with slope recovery term for deep and physics-informed neural networks
02/16/2021 · Message Passing Descent for Efficient Machine Learning
04/20/2021 · Deep learning with transfer functions: new applications in system identification
05/20/2023 · A Framework for Provably Stable and Consistent Training of Deep Feedforward Networks
07/02/2020 · Efficient Proximal Mapping of the 1-path-norm of Shallow Networks
11/07/2018 · YASENN: Explaining Neural Networks via Partitioning Activation Sequences
12/07/2020 · Generalised Perceptron Learning
