On Equivalent Optimization of Machine Learning Methods

02/17/2023
by William T. Redman, et al.

At the core of many machine learning methods resides an iterative optimization algorithm for their training. Such optimization algorithms often come with a plethora of choices regarding their implementation. In the case of deep neural networks, choices of optimizer, learning rate, batch size, etc. must be made. Despite the fundamental way in which these choices impact the training of deep neural networks, there exists no general method for identifying when they lead to equivalent, or non-equivalent, optimization trajectories. By viewing iterative optimization as a discrete-time dynamical system, we are able to leverage Koopman operator theory, where it is known that conjugate dynamics can have identical spectral objects. We find highly overlapping Koopman spectra associated with the application of online mirror descent and online gradient descent to specific problems, illustrating that such a data-driven approach can corroborate the recently discovered analytical equivalence between the two optimizers. We extend our analysis to feedforward, fully connected neural networks, providing the first general characterization of when choices of learning rate, batch size, layer width, data set, and activation function lead to equivalent, and non-equivalent, evolution of network parameters during training. Among our main results, we find that the learning-rate-to-batch-size ratio, layer width, nature of the data set (handwritten vs. synthetic), and activation function all affect the nature of conjugacy. Our data-driven approach is general and can be utilized broadly to compare the optimization of machine learning methods.
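
A data-driven comparison of this kind can be approximated with Dynamic Mode Decomposition (DMD), a standard way to estimate Koopman spectra from trajectory snapshots. The sketch below is illustrative only and is not the authors' implementation: the quadratic objective, the exponentiated-gradient form of mirror descent, the step sizes, and the trajectory length are all assumptions, and the paper's actual observables and spectral comparison may differ.

import numpy as np

def koopman_spectrum(traj):
    """Eigenvalues of the DMD operator fit to a trajectory of shape (dim, T)."""
    X, Y = traj[:, :-1], traj[:, 1:]
    K = Y @ np.linalg.pinv(X)            # least-squares linear (Koopman) approximation
    return np.sort_complex(np.linalg.eigvals(K))

def gradient_descent(grad, x0, lr=0.1, steps=60):
    """Plain gradient descent iterates, stacked column-wise."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - lr * grad(xs[-1]))
    return np.stack(xs, axis=1)

def mirror_descent(grad, x0, lr=0.1, steps=60):
    """Online mirror descent with a negative-entropy mirror map
    (exponentiated gradient) on the positive orthant -- an illustrative choice."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] * np.exp(-lr * grad(xs[-1])))
    return np.stack(xs, axis=1)

# Illustrative convex quadratic f(x) = 0.5 * x^T A x (not taken from the paper).
A = np.diag([2.0, 1.0, 0.5])
grad = lambda x: A @ x
x0 = np.array([1.2, 0.8, 1.5])

gd_spec = koopman_spectrum(gradient_descent(grad, x0))
md_spec = koopman_spectrum(mirror_descent(grad, x0))

print("Gradient descent Koopman spectrum:", np.round(gd_spec, 4))
print("Mirror descent Koopman spectrum:  ", np.round(md_spec, 4))

Overlapping (or diverging) eigenvalues of the two fitted operators then serve as the data-driven signal of equivalent (or non-equivalent) optimization trajectories.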

Related research

03/15/2020 · Stochastic gradient descent with random learning rate
We propose to optimize neural networks with a uniformly-distributed rand...

09/07/2021 · Revisiting Recursive Least Squares for Training Deep Neural Networks
Recursive least squares (RLS) algorithms were once widely used for train...

06/07/2016 · Systematic evaluation of CNN advances on the ImageNet
The paper systematically studies the impact of a range of recent advance...

07/07/2020 · Towards an Understanding of Residual Networks Using Neural Tangent Hierarchy (NTH)
Gradient descent yields zero training loss in polynomial time for deep n...

11/30/2019 · Learning Rate Dropout
The performance of a deep neural network is highly dependent on its trai...

03/18/2020 · Block Layer Decomposition schemes for training Deep Neural Networks
Deep Feedforward Neural Networks' (DFNNs) weights estimation relies on t...
