Training a Two Layer ReLU Network Analytically

04/06/2023
by Adrian Barbu, et al.

Neural networks are usually trained with variants of gradient-descent-based optimization algorithms such as stochastic gradient descent or the Adam optimizer. Recent theoretical work shows that the critical points (points where the gradient of the loss is zero) of two-layer ReLU networks with the square loss are not all local minima. However, in this work we explore an algorithm for training two-layer neural networks with ReLU-like activation and the square loss that alternately finds the critical points of the loss function analytically for one layer while keeping the other layer and the neuron activation pattern fixed. Experiments indicate that this simple algorithm can find deeper optima than stochastic gradient descent or the Adam optimizer, obtaining significantly smaller training loss values on four out of the five real datasets evaluated. Moreover, the method is faster than the gradient-descent methods and has virtually no tuning parameters.
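
To see why freezing the activation pattern makes an analytic solution possible, note that for a standard two-layer parameterization f(x) = v'ReLU(Wx) with a fixed binary activation mask, the prediction is linear in the output weights v (for fixed W) and linear in the hidden weights W (for fixed v), so each half-step of the alternation reduces to an ordinary least-squares problem. The NumPy sketch below illustrates this idea under those assumptions; the function name, initialization, and iteration count are illustrative and this is not the authors' implementation.

    import numpy as np

    def fit_two_layer_relu(X, y, n_hidden=16, n_iters=20, seed=0):
        """Illustrative alternating least-squares updates for f(x) = v . relu(W x).
        Not the paper's algorithm, just a sketch of the closed-form half-steps."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        W = 0.1 * rng.standard_normal((n_hidden, d))   # hidden-layer weights
        v = 0.1 * rng.standard_normal(n_hidden)        # output-layer weights

        for _ in range(n_iters):
            # Freeze the ReLU activation pattern induced by the current W.
            A = (X @ W.T > 0).astype(X.dtype)          # n x n_hidden binary mask

            # Output layer: with W (and hence the hidden features) fixed, the
            # square loss is an ordinary least-squares problem in v.
            H = A * (X @ W.T)                          # ReLU features
            v, *_ = np.linalg.lstsq(H, y, rcond=None)

            # Hidden layer: with v and the activation mask fixed, the prediction
            # f(x_i) = sum_j v_j * a_ij * (w_j . x_i) is linear in the stacked
            # rows of W, so W also has a closed-form least-squares update.
            Phi = (A * v)[:, :, None] * X[:, None, :]  # n x n_hidden x d
            w_flat, *_ = np.linalg.lstsq(Phi.reshape(n, -1), y, rcond=None)
            W = w_flat.reshape(n_hidden, d)

        return W, v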

