Is deeper better? It depends on locality of relevant features

05/26/2020
by Takashi Mori, et al.

It has been recognized that heavily overparameterized artificial neural networks exhibit surprisingly good generalization performance in various machine-learning tasks. Recent theoretical studies have attempted to unveil the mystery of overparameterization. In most of these previous works, overparameterization is achieved by increasing the width of the network, while the effect of increasing the depth has been less well understood. In this work, we investigate the effect of increasing the depth within the overparameterized regime. To gain insight into the advantage of depth, we introduce local and global labels as abstract but simple classification rules. It turns out that the locality of the feature relevant to a given classification rule plays an important role: our experimental results suggest that deeper is better for local labels, whereas shallower is better for global labels. We also compare the results of finite networks with those of the neural tangent kernel (NTK), which is equivalent to an infinitely wide network with a proper initialization and an infinitesimal learning rate. We show that the NTK does not correctly capture the depth dependence of the generalization performance, which indicates the importance of feature learning rather than lazy learning.
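The precise definitions of the local and global labels are given in the full text. As a rough, hypothetical illustration of the dichotomy (not the authors' exact construction), a local label can be taken to depend on only a few adjacent input coordinates, while a global label, such as a parity-like rule, depends on every coordinate:

```python
import numpy as np

rng = np.random.default_rng(0)

def local_label(x, k=3):
    # Hypothetical "local" rule: sign of the sum of the first k adjacent
    # coordinates -- only a small patch of the input is relevant.
    return np.sign(x[..., :k].sum(axis=-1))

def global_label(x):
    # Hypothetical "global" rule: the product of all coordinates
    # (a parity function on {-1, +1}^d) -- every coordinate is relevant.
    return np.prod(x, axis=-1)

X = rng.choice([-1.0, 1.0], size=(1000, 16))  # inputs on the hypercube
y_local, y_global = local_label(X), global_label(X)
```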
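For the infinite-width baseline, the NTK of a fully connected ReLU network can be computed in closed form by a layer-wise recursion (Jacot et al., 2018), which makes the depth dependence explicit. The following is a minimal sketch under the usual NTK parameterization; the architecture and normalization used in the paper's experiments may differ.

```python
import numpy as np

def ntk_relu(X1, X2, depth):
    """NTK of a fully connected ReLU network with `depth` hidden layers,
    computed via the standard layer-wise recursion."""
    sigma = X1 @ X2.T                        # Sigma^0(x, x') = x . x'
    diag1 = np.einsum("ij,ij->i", X1, X1)    # Sigma^0(x, x) for rows of X1
    diag2 = np.einsum("ij,ij->i", X2, X2)    # Sigma^0(x', x') for rows of X2
    theta = sigma.copy()                     # Theta^0 = Sigma^0
    for _ in range(depth):
        denom = np.sqrt(np.outer(diag1, diag2))
        cos = np.clip(sigma / denom, -1.0, 1.0)
        ang = np.arccos(cos)
        sigma_dot = (np.pi - ang) / np.pi                            # 2 E[relu'(u) relu'(v)]
        sigma = denom * (np.sin(ang) + (np.pi - ang) * cos) / np.pi  # 2 E[relu(u) relu(v)]
        theta = theta * sigma_dot + sigma    # Theta^h = Theta^{h-1} Sigma_dot^h + Sigma^h
        # With the c_sigma = 2 ReLU normalization, Sigma(x, x) is preserved
        # from layer to layer, so diag1 and diag2 need no update.
    return theta

# Kernel regression with the depth-3 NTK (hypothetical usage):
rng = np.random.default_rng(0)
X_train = rng.standard_normal((200, 16))
y_train = np.sign(X_train[:, :3].sum(axis=1))   # a "local" rule as above
X_test = rng.standard_normal((50, 16))
K = ntk_relu(X_train, X_train, depth=3)
k_star = ntk_relu(X_test, X_train, depth=3)
y_pred = k_star @ np.linalg.solve(K + 1e-6 * np.eye(len(K)), y_train)
```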


