Excess Risk of Two-Layer ReLU Neural Networks in Teacher-Student Settings and its Superiority to Kernel Methods

05/30/2022
by Shunta Akiyama, et al.

While deep learning has outperformed other methods on various tasks, theoretical frameworks that explain why have not been fully established. To address this issue, we investigate the excess risk of two-layer ReLU neural networks in a teacher-student regression model, in which a student network learns an unknown teacher network through its outputs. In particular, we consider a student network that has the same width as the teacher network and is trained in two phases: first by noisy gradient descent and then by vanilla gradient descent. Our result shows that the student network provably reaches a near-global optimal solution and outperforms any kernel method estimator (more generally, any linear estimator), including the neural tangent kernel approach, the random feature model, and other kernel methods, in the sense of the minimax optimal rate. The key concept behind this superiority is the non-convexity of the neural network model: even though the loss landscape is highly non-convex, the student network adaptively learns the teacher neurons.
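The two-phase procedure described above can be sketched roughly as follows. This is a minimal illustration only, assuming a squared-loss regression objective with Gaussian inputs; the width, step sizes, noise scale, and iteration counts are hypothetical choices for demonstration, and the paper's actual algorithm (including its noise schedule and any regularization) may differ in detail.

```python
# Illustrative sketch of the teacher-student setting with two-phase training:
# noisy gradient descent (phase 1) followed by vanilla gradient descent (phase 2).
# All hyperparameters below are hypothetical, not the values analyzed in the paper.
import torch

torch.manual_seed(0)

d, m, n = 10, 5, 1000  # input dimension, width (student width = teacher width), sample size

# Fixed (unknown) teacher network: x -> sum_j b_j * relu(<a_j, x>)
teacher_a = torch.randn(m, d)
teacher_b = torch.randn(m)

def teacher(x):
    return torch.relu(x @ teacher_a.T) @ teacher_b

# Training data generated by the teacher, with additive observation noise
X = torch.randn(n, d)
y = teacher(X) + 0.1 * torch.randn(n)

# Student network with the same width as the teacher
a = torch.randn(m, d, requires_grad=True)
b = torch.randn(m, requires_grad=True)

def student(x):
    return torch.relu(x @ a.T) @ b

def loss():
    return ((student(X) - y) ** 2).mean()

lr, noise = 1e-2, 1e-3

# Phase 1: noisy gradient descent -- each step adds a Gaussian perturbation,
# which helps the iterates explore the non-convex loss landscape.
for _ in range(500):
    ga, gb = torch.autograd.grad(loss(), (a, b))
    with torch.no_grad():
        a -= lr * ga + noise * torch.randn_like(a)
        b -= lr * gb + noise * torch.randn_like(b)

# Phase 2: vanilla gradient descent for local refinement around the solution found in phase 1.
for _ in range(500):
    ga, gb = torch.autograd.grad(loss(), (a, b))
    with torch.no_grad():
        a -= lr * ga
        b -= lr * gb

print("final training loss:", loss().item())
```

In this sketch the only difference between the two phases is the injected Gaussian noise, which is dropped in the second phase so that plain gradient descent can converge to the nearby (near-global) optimum.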

