DeepAI AI Chat
Log In Sign Up

Implicit Data-Driven Regularization in Deep Neural Networks under SGD

by   Xuran Meng, et al.
The University of Hong Kong

Much research effort has been devoted to explaining the success of deep learning. Random Matrix Theory (RMT) provides an emerging way to this end: spectral analysis of large random matrices involved in a trained deep neural network (DNN) such as weight matrices or Hessian matrices with respect to the stochastic gradient descent algorithm. In this paper, we conduct extensive experiments on weight matrices in different modules, e.g., layers, networks and data sets, to analyze the evolution of their spectra. We find that these spectra can be classified into three main types: Marčenko-Pastur spectrum (MP), Marčenko-Pastur spectrum with few bleeding outliers (MPB), and Heavy tailed spectrum (HT). Moreover, these discovered spectra are directly connected to the degree of regularization in the DNN. We argue that the degree of regularization depends on the quality of data fed to the DNN, namely Data-Driven Regularization. These findings are validated in several NNs, using Gaussian synthetic data and real data sets (MNIST and CIFAR10). Finally, we propose a spectral criterion and construct an early stopping procedure when the NN is found highly regularized without test data by using the connection between the spectra types and the degrees of regularization. Such early stopped DNNs avoid unnecessary extra training while preserving a much comparable generalization ability.


Traditional and Heavy-Tailed Self Regularization in Neural Network Models

Random Matrix Theory (RMT) is applied to analyze the weight matrices of ...

Heavy-Tailed Universality Predicts Trends in Test Accuracies for Very Large Pre-Trained Deep Neural Networks

Given two or more Deep Neural Networks (DNNs) with the same or similar a...

Spectra2pix: Generating Nanostructure Images from Spectra

The design of the nanostructures that are used in the field of nano-phot...

A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks

The gradient noise (GN) in the stochastic gradient descent (SGD) algorit...

Universal characteristics of deep neural network loss surfaces from random matrix theory

This paper considers several aspects of random matrix universality in de...

Beyond Random Matrix Theory for Deep Networks

We investigate whether the Wigner semi-circle and Marcenko-Pastur distri...