Gradient Descent Converges to Ridgelet Spectrum

07/07/2020
by   Sho Sonoda, et al.
0

Deep learning achieves a high generalization performance in practice, despite the non-convexity of the gradient descent learning problem. Recently, the inductive bias in deep learning has been studied through the characterization of local minima. In this study, we show that the distribution of parameters learned by gradient descent converges to a spectrum of the ridgelet transform based on a ridgelet analysis, which is a wavelet-like analysis developed for neural networks. This convergence is stronger than those shown in previous results, and guarantees the shape of the parameter distribution has been identified with the ridgelet spectrum. In numerical experiments with finite models, we visually confirm the resemblance between the distribution of learned parameters and the ridgelet spectrum. Our study provides a better understanding of the theoretical background of an inductive bias theory based on lazy regimes.

READ FULL TEXT

page 7

page 15

page 16

page 17

research
03/30/2022

Convergence of gradient descent for deep neural networks

Optimization by gradient descent has been one of main drivers of the "de...
research
02/05/2022

The Implicit Bias of Gradient Descent on Generalized Gated Linear Networks

Understanding the asymptotic behavior of gradient-descent training of de...
research
07/26/2023

Controlling the Inductive Bias of Wide Neural Networks by Modifying the Kernel's Spectrum

Wide neural networks are biased towards learning certain functions, infl...
research
01/30/2019

Natural Analysts in Adaptive Data Analysis

Adaptive data analysis is frequently criticized for its pessimistic gene...
research
01/12/2021

A SOM-based Gradient-Free Deep Learning Method with Convergence Analysis

As gradient descent method in deep learning causes a series of questions...
research
02/25/2020

Coherent Gradients: An Approach to Understanding Generalization in Gradient Descent-based Optimization

An open question in the Deep Learning community is why neural networks t...
research
02/22/2020

On the Inductive Bias of a CNN for Orthogonal Patterns Distributions

Training overparameterized convolutional neural networks with gradient b...

Please sign up or login with your details

Forgot password? Click here to reset