Spatially heterogeneous learning by a deep student machine

02/15/2023
by Hajime Yoshino, et al.

Despite their spectacular successes, deep neural networks (DNNs) with huge numbers of adjustable parameters remain largely black boxes. To shed light on the hidden layers of DNNs, we study supervised learning by a DNN of width N and depth L, consisting of perceptrons with c inputs each, using a statistical-mechanics approach called the teacher-student setting. We consider an ensemble of student machines that exactly reproduce M sets of N-dimensional input/output relations provided by a teacher machine. We analyze this ensemble theoretically using the replica method (H. Yoshino (2020)) and numerically by performing greedy Monte Carlo simulations. The replica theory, which assumes high-dimensional data N ≫ 1, becomes exact in the 'dense limit' N ≫ c ≫ 1 and M ≫ 1 with fixed α = M/c. Both the theory and the simulations suggest that learning by the DNN is quite heterogeneous in network space: configurations of the machines are more strongly correlated within the layers closer to the input/output boundaries, while the central region remains much less correlated because of over-parametrization. Deep enough systems relax faster thanks to this less correlated central region. Remarkably, both the theory and the simulations suggest that the generalization ability of the student machines does not vanish even in the deep limit L ≫ 1, where the system becomes strongly over-parametrized. We also consider the impact of the effective dimension D (≤ N) of the data by incorporating the hidden manifold model (S. Goldt et al. (2020)) into our model. The replica theory implies that the loop corrections to the dense limit, which reflect correlations between different nodes of the network, are enhanced by decreasing either the width N or the effective dimension D of the data. Simulations suggest that both lead to significant improvements in generalization ability.
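For concreteness, here is a minimal Python/NumPy sketch of the setting the abstract describes: a teacher and a student with the same architecture (width N, depth L, sign-activation perceptrons with c inputs each), and a greedy Monte Carlo that accepts single-coupling flips only if the number of violated input/output constraints does not increase. The binary couplings, the random sparse wiring, and the specific update rule are simplifying assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of the teacher-student setting above:
# a depth-L, width-N network of sign-activation perceptrons with c inputs each,
# and a greedy (zero-temperature) Monte Carlo that only accepts coupling flips
# which do not increase the number of violated input/output constraints.
# Binary couplings and the random sparse wiring are simplifying assumptions.
import numpy as np

rng = np.random.default_rng(0)
N, L, c, M = 32, 4, 8, 16        # width, depth, inputs per perceptron, patterns


def make_wiring():
    """Fixed sparse wiring: each node reads c randomly chosen nodes of the layer below."""
    return [[rng.choice(N, size=c, replace=False) for _ in range(N)]
            for _ in range(L)]


def random_couplings():
    """Random binary couplings J[l][n, k] in {-1, +1}."""
    return [rng.choice([-1.0, 1.0], size=(N, c)) for _ in range(L)]


def forward(x, wiring, J):
    """Propagate a +-1 input vector through all L layers of perceptrons."""
    s = x
    for l in range(L):
        s = np.sign(np.array([J[l][n] @ s[wiring[l][n]] for n in range(N)]) + 1e-9)
    return s


# The teacher fixes the M input/output pairs the student must reproduce exactly.
# (Hidden-manifold variant, Goldt et al. 2020: draw X instead as
#  np.sign(C @ F / np.sqrt(D)) with latent coordinates C of shape (M, D)
#  and a feature matrix F of shape (D, N).)
wiring = make_wiring()               # teacher and student share the architecture
J_teacher = random_couplings()
X = rng.choice([-1.0, 1.0], size=(M, N))
Y = np.array([forward(x, wiring, J_teacher) for x in X])


def energy(J):
    """Number of output bits the student gets wrong over all M training patterns."""
    return sum(int(np.sum(forward(x, wiring, J) != y)) for x, y in zip(X, Y))


J_student = random_couplings()
E = energy(J_student)
for step in range(20000):
    l, n, k = rng.integers(L), rng.integers(N), rng.integers(c)
    J_student[l][n, k] *= -1.0       # propose a single coupling flip
    E_new = energy(J_student)
    if E_new <= E:
        E = E_new                    # accept: training error did not increase
    else:
        J_student[l][n, k] *= -1.0   # reject: undo the flip
    if E == 0:
        break                        # student reproduces the teacher exactly
print("remaining training errors:", E)
```

In this sketch, reaching zero training error corresponds to the constraint defining the ensemble in the abstract (students that exactly reproduce the teacher's M input/output relations); running it at different depths L and widths N is the kind of experiment the layer-resolved and finite-N statements refer to.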


