Speed Limits for Deep Learning

07/27/2023
by Inbar Seroussi, et al.

State-of-the-art neural networks require extreme computational power to train. It is therefore natural to ask whether they are trained optimally. Here we apply a recent advance in stochastic thermodynamics that bounds the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network, based on the ratio of their Wasserstein-2 distance and the entropy production rate of the dynamical process connecting them. Considering both gradient-flow and Langevin training dynamics, we provide analytical expressions for these speed limits for linear and linearizable neural networks, e.g., those described by the Neural Tangent Kernel (NTK). Remarkably, under plausible scaling assumptions on the NTK spectrum and the spectral decomposition of the labels, learning is optimal in a scaling sense. Our results are consistent with small-scale experiments with Convolutional Neural Networks (CNNs) and Fully Connected Neural Networks (FCNs) on CIFAR-10, which show a short, highly non-optimal initial regime followed by a longer optimal regime.
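For orientation, here is a minimal sketch of the kind of bound the abstract refers to, in the form such speed limits take for overdamped Langevin dynamics in the recent stochastic-thermodynamics literature; the notation (diffusion constant $D$, weight distributions $\rho_0$ and $\rho_\tau$, mean entropy production rate $\bar{\sigma}$) is illustrative and need not match the paper's:

$$
\Delta S_{\mathrm{tot}} \;\ge\; \frac{W_2^2(\rho_0, \rho_\tau)}{D\,\tau}
\quad\Longleftrightarrow\quad
\tau \;\ge\; \frac{W_2(\rho_0, \rho_\tau)}{\sqrt{D\,\bar{\sigma}}},
\qquad \bar{\sigma} \equiv \Delta S_{\mathrm{tot}} / \tau,
$$

where $W_2(\rho_0, \rho_\tau)$ is the Wasserstein-2 distance between the initial and final weight distributions and $\Delta S_{\mathrm{tot}}$ is the total entropy production of the training process over the time $\tau$. Training is optimal in this sense when the bound is approximately saturated, i.e., when the dynamics transports the weight distribution along a near-geodesic in Wasserstein space.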
