Understanding How Over-Parametrization Leads to Acceleration: A case of learning a single teacher neuron

10/04/2020
by Jun-Kun Wang, et al.

Over-parametrization has become a popular technique in deep learning. It is observed that an over-parametrized, larger neural network needs fewer training iterations than a smaller one to achieve a certain level of performance – namely, over-parametrization leads to acceleration in optimization. However, although over-parametrization is widely used nowadays, little theory is available to explain this acceleration. In this paper, we propose understanding it by studying a simple problem first. Specifically, we consider a setting with a single teacher neuron with quadratic activation, where over-parametrization is realized by having multiple student neurons learn the data generated by the teacher neuron. We provably show that over-parametrization helps the iterates generated by gradient descent enter the neighborhood of a global optimal solution that achieves zero test error more quickly. On the other hand, we also point out an issue regarding the necessity of over-parametrization and study how the scaling of the output neurons affects the convergence time.
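The setting described above can be sketched in code. The following is a minimal illustration, not the paper's exact construction: the teacher produces labels y = (w*ᵀx)², and an over-parametrized student, a sum of m quadratic-activation neurons f(x) = Σⱼ (wⱼᵀx)², is fit by plain gradient descent on the squared loss. The dimensions, sample size, learning rate, and the absence of output-scaling coefficients are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 10, 5, 500                 # input dim, number of student neurons, sample size

# Teacher neuron with quadratic activation: y = (w_star^T x)^2
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)

X = rng.normal(size=(n, d))
y = (X @ w_star) ** 2

# Over-parametrized student: m neurons, small random initialization
W = 0.1 * rng.normal(size=(m, d))
lr = 0.01
init_loss = 0.5 * np.mean((((X @ W.T) ** 2).sum(axis=1) - y) ** 2)

for step in range(3000):
    Z = X @ W.T                      # pre-activations, shape (n, m)
    pred = (Z ** 2).sum(axis=1)      # sum of quadratic student outputs
    resid = pred - y
    # Gradient of 0.5 * mean(resid^2) w.r.t. W:
    # dL/dw_j = (2/n) * sum_i resid_i * (x_i^T w_j) * x_i
    grad = (2.0 / n) * (resid[:, None] * Z).T @ X
    W -= lr * grad

final_loss = 0.5 * np.mean((((X @ W.T) ** 2).sum(axis=1) - y) ** 2)
```

With several student neurons and a small random initialization, gradient descent drives the training loss well below its initial value; rerunning with m = 1 versus larger m is a quick way to probe the acceleration effect the paper studies.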


research
02/04/2021

A Local Convergence Theory for Mildly Over-Parameterized Two-Layer Neural Network

While over-parameterization is widely believed to be crucial for the suc...
research
05/30/2022

Excess Risk of Two-Layer ReLU Neural Networks in Teacher-Student Settings and its Superiority to Kernel Methods

While deep learning has outperformed other methods for various tasks, th...
research
06/02/2021

Learning a Single Neuron with Bias Using Gradient Descent

We theoretically study the fundamental problem of learning a single neur...
research
06/27/2020

Optimization and Generalization of Shallow Neural Networks with Quadratic Activation Functions

We study the dynamics of optimization and the generalization properties ...
research
05/19/2016

AMSOM: Adaptive Moving Self-organizing Map for Clustering and Visualization

Self-Organizing Map (SOM) is a neural network model which is used to obt...
research
09/30/2019

Over-parameterization as a Catalyst for Better Generalization of Deep ReLU network

To analyze deep ReLU network, we adopt a student-teacher setting in whic...
research
10/12/2021

Expressivity and Trainability of Quadratic Networks

Inspired by diversity of biological neurons, quadratic artificial neuron...
