
Neural Spectrum Alignment

by Dmitry Kopitkov et al.

The expressiveness of deep models was recently addressed via the connection between neural networks (NNs) and kernel learning, where the first-order dynamics of an NN during gradient-descent (GD) optimization are related to the gradient similarity kernel, also known as the Neural Tangent Kernel (NTK). In most works this kernel is considered time-invariant, with its properties defined entirely by the NN architecture and independent of the learning task at hand. In contrast, in this paper we empirically explore these properties along the optimization trajectory and show that in practical applications the NN kernel changes in a dramatic and meaningful way, with its top eigenfunctions aligning toward the target function learned by the NN. Moreover, these top eigenfunctions serve as a kind of basis for the NN output: the function represented by the NN is spanned almost completely by them throughout the entire optimization process. Further, since learning along the top eigenfunctions is typically fast, their alignment with the target function improves overall optimization performance. In addition, we study how the neural spectrum is affected by learning-rate decay, as typically done by practitioners, showing various trends in the kernel behavior. We argue that the presented phenomena may lead to a more complete theoretical understanding of NN learning.
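The quantities the abstract discusses can be made concrete on a toy model. The sketch below (not the paper's code; the network size, data, and target are illustrative assumptions) computes the empirical gradient similarity kernel K_ij = ∇f(x_i)·∇f(x_j) for a tiny one-hidden-layer network, eigendecomposes it, and measures how much of a target function's energy lies in the span of the top kernel eigenvectors — the "alignment" quantity tracked along optimization:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny one-hidden-layer network f(x) = v . tanh(W x); sizes are illustrative.
d_in, d_hid, n = 3, 16, 40
W = rng.normal(size=(d_hid, d_in)) / np.sqrt(d_in)
v = rng.normal(size=d_hid) / np.sqrt(d_hid)

X = rng.normal(size=(n, d_in))  # n input points

def per_sample_grads(X, W, v):
    """Rows are gradients of f(x_i) w.r.t. all parameters (the Jacobian)."""
    H = np.tanh(X @ W.T)                  # (n, d_hid) hidden activations
    dv = H                                # d f / d v
    S = (1 - H**2) * v                    # (n, d_hid), tanh' times outer weight
    dW = S[:, :, None] * X[:, None, :]    # (n, d_hid, d_in), d f / d W
    return np.concatenate([dW.reshape(len(X), -1), dv], axis=1)

J = per_sample_grads(X, W, v)   # (n, n_params)
K = J @ J.T                     # empirical NTK Gram matrix on the sample, (n, n)

# Spectrum: sort eigenvalues descending; eigenvectors act as discrete
# eigenfunctions evaluated on the sample.
eigvals, eigvecs = np.linalg.eigh(K)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# Fraction of a target y's energy captured by the top-k eigenvectors:
# the alignment measure is 1.0 when y lies fully in their span.
y = np.sin(X @ rng.normal(size=d_in))

def top_k_energy(y, eigvecs, k):
    c = eigvecs[:, :k].T @ y
    return float(c @ c) / float(y @ y)

print(top_k_energy(y, eigvecs, 5))  # in [0, 1]
```

Recomputing K at successive GD steps and re-running `top_k_energy` is the kind of probe that reveals the kernel's top eigenfunctions rotating toward the target during training.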

