On Learnability via Gradient Method for Two-Layer ReLU Neural Networks in Teacher-Student Setting

06/11/2021
by Shunta Akiyama, et al.

Deep learning empirically achieves high performance in many applications, but its training dynamics have not been fully understood theoretically. In this paper, we give a theoretical analysis of training two-layer ReLU neural networks in a teacher-student regression model, in which a student network learns an unknown teacher network through its outputs. We show that, with a specific regularization and sufficient over-parameterization, the student network can identify the parameters of the teacher network with high probability via gradient descent with a norm-dependent stepsize, even though the objective function is highly non-convex. The key theoretical tools are the measure representation of neural networks and a novel application of a dual certificate argument for sparse estimation on a measure space. We analyze the global minima and the global convergence property in this measure space.
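To make the setting concrete, below is a minimal sketch of teacher-student training of a two-layer ReLU network by gradient descent, in Python/NumPy. The paper's specific regularizer and norm-dependent stepsize schedule are not spelled out in the abstract, so plain L2 regularization and a stepsize that shrinks with the current parameter norm are illustrative stand-ins; all widths, sample sizes, and constants here are arbitrary choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d, m_teacher, m_student, n = 4, 2, 20, 256  # input dim, widths, sample size
lam = 1e-3                                  # L2 regularization strength (illustrative)

def forward(W, a, X):
    """Two-layer ReLU network: sum_j a_j * ReLU(<w_j, x>)."""
    return np.maximum(X @ W.T, 0.0) @ a

# Teacher network: parameters are fixed and unknown to the student.
W_teacher = rng.standard_normal((m_teacher, d))
a_teacher = rng.standard_normal(m_teacher)

# Over-parameterized student (m_student >> m_teacher) with small random init.
W = 0.1 * rng.standard_normal((m_student, d))
a = 0.1 * rng.standard_normal(m_student)

# The student observes only inputs X and the teacher's outputs y.
X = rng.standard_normal((n, d))
y = forward(W_teacher, a_teacher, X)

for step in range(3000):
    H = np.maximum(X @ W.T, 0.0)   # hidden activations, shape (n, m_student)
    r = H @ a - y                  # residuals
    # Gradients of (1/2n)||r||^2 + (lam/2)(||W||_F^2 + ||a||^2).
    g_a = H.T @ r / n + lam * a
    g_W = ((r[:, None] * (H > 0.0) * a).T @ X) / n + lam * W
    # Norm-dependent stepsize: here the step shrinks as the parameter norm
    # grows (a stand-in; the paper's exact schedule is not in the abstract).
    norm = np.sqrt(np.sum(W**2) + np.sum(a**2))
    eta = 0.2 / (1.0 + norm)
    a -= eta * g_a
    W -= eta * g_W

print("training MSE:", np.mean((forward(W, a, X) - y) ** 2))
```

In this sketch, gradient descent merely fits the teacher's outputs on the training sample; the paper's result is stronger: with the right regularization and stepsize, the over-parameterized student identifies the teacher's parameters with high probability.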


Related research

05/30/2022 · Excess Risk of Two-Layer ReLU Neural Networks in Teacher-Student Settings and its Superiority to Kernel Methods
  While deep learning has outperformed other methods for various tasks, th...

09/28/2018 · A theoretical framework for deep locally connected ReLU network
  Understanding theoretical properties of deep and locally connected nonli...

10/22/2019 · From complex to simple: hierarchical free-energy landscape renormalized in deep neural networks
  We develop a statistical mechanical approach based on the replica method...

09/30/2019 · Over-parameterization as a Catalyst for Better Generalization of Deep ReLU network
  To analyze deep ReLU network, we adopt a student-teacher setting in whic...

05/31/2019 · Luck Matters: Understanding Training Dynamics of Deep ReLU Networks
  We analyze the dynamics of training deep ReLU networks and their implica...

06/01/2020 · The Effects of Mild Over-parameterization on the Optimization Landscape of Shallow ReLU Neural Networks
  We study the effects of mild over-parameterization on the optimization l...

02/16/2019 · How Machine (Deep) Learning Helps Us Understand Human Learning: the Value of Big Ideas
  I use simulation of two multilayer neural networks to gain intuition int...
