Quadratic number of nodes is sufficient to learn a dataset via gradient descent

11/13/2019
by Biswarup Das, et al.

We prove that if an activation function satisfies certain mild conditions and the number of neurons in a two-layer fully connected neural network with this activation function exceeds a certain threshold, then gradient descent on the quadratic loss finds the optimal input-layer weights for a global minimum in linear time. This threshold is an improvement over previously obtained values. We hypothesise that this bound cannot be improved by the method used in this work.
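To make the setting concrete, below is a minimal sketch (in NumPy, not the paper's construction or proof) of the regime the abstract describes: a two-layer fully connected network in which only the input-layer weights are trained by gradient descent on the quadratic loss, with the hidden width chosen quadratic in the number of training samples. The activation, step size, data, and width constant are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Illustrative sketch only: a two-layer network f(x) = a^T sigma(W x),
# where only the input-layer weights W are trained by gradient descent
# on the quadratic loss, and the hidden width m is quadratic in the
# number of samples n.

rng = np.random.default_rng(0)

n, d = 20, 5                  # number of samples and input dimension (assumed)
m = n ** 2                    # hidden width: quadratic in the dataset size
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)     # unit-norm inputs (a common assumption)
y = rng.normal(size=n)

W = rng.normal(size=(m, d))                       # input-layer weights (trained)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # output-layer weights (kept fixed)

sigma = np.tanh                                   # smooth activation; an assumption

def dsigma(z):
    return 1.0 - np.tanh(z) ** 2

eta = 0.1                                         # step size; an assumption
for step in range(500):
    Z = X @ W.T                      # pre-activations, shape (n, m)
    pred = sigma(Z) @ a              # network outputs, shape (n,)
    resid = pred - y
    loss = 0.5 * np.sum(resid ** 2)  # quadratic loss
    # Gradient of the quadratic loss with respect to the input-layer weights W.
    grad_W = (resid[:, None] * dsigma(Z) * a[None, :]).T @ X   # shape (m, d)
    W -= eta * grad_W
    if step % 100 == 0:
        print(f"step {step:4d}  loss {loss:.6f}")
```

Under the paper's assumptions, "linear time" refers to the loss decreasing at a geometric rate in such an over-parametrised regime; the sketch above only illustrates the training setup, not the convergence argument.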


