
Comment on Stochastic Polyak Step-Size: Performance of ALI-G
This is a short note on the performance of the ALI-G algorithm (Berrada et al., 2020) as reported in (Loizou et al., 2021). ALI-G (Berrada et al., 2020) and SPS (Loizou et al., 2021) are both adaptations of the Polyak step-size to optimize machine learning models that can interpolate the training data. The main algorithmic differences are that (1) SPS employs a multiplicative constant in the denominator of the learning rate while ALI-G uses an additive constant, and (2) SPS uses an iteration-dependent maximal learning rate while ALI-G uses a constant one. There are also differences in the analyses provided by the two works, with less restrictive assumptions proposed in (Loizou et al., 2021). In their experiments, (Loizou et al., 2021) did not use momentum for ALI-G (which is a standard part of the algorithm) or standard hyperparameter tuning (e.g. for the learning rate and regularization). Hence this note serves as a reference for the improved performance that ALI-G can obtain with well-chosen hyperparameters. In particular, we show that when training a ResNet-34 on CIFAR-10 and CIFAR-100, the performance of ALI-G can reach respectively 93.5 (+8 […]) […] method for training interpolating neural networks.
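As a concrete illustration of algorithmic difference (1), the two step-size rules can be sketched as below under the interpolation assumption (optimal loss value close to 0). This is a minimal sketch, not code from either paper: the constants `delta`, `eta_max`, `c`, and `gamma_max`, as well as the toy quadratic, are illustrative choices, and SPS's iteration-dependent maximal learning rate is kept fixed here for simplicity.

```python
import numpy as np

def alig_step_size(loss, grad, delta=1e-5, eta_max=1.0):
    """ALI-G-style step size: additive constant `delta` in the denominator
    and a constant maximal learning rate `eta_max`.
    Assumes interpolation, i.e. the optimal loss value is ~0."""
    return min(loss / (np.dot(grad, grad) + delta), eta_max)

def sps_step_size(loss, grad, c=0.5, gamma_max=1.0):
    """SPS-style step size: multiplicative constant `c` in the denominator.
    In SPS the cap may vary with the iteration; it is fixed here."""
    return min(loss / (c * np.dot(grad, grad)), gamma_max)

# One gradient step on the toy quadratic f(w) = 0.5 * ||w||^2,
# whose minimum value is 0 (so the Polyak numerator is just the loss).
w = np.array([3.0, -4.0])
loss = 0.5 * np.dot(w, w)   # 12.5
grad = w                    # gradient of f at w

eta_alig = alig_step_size(loss, grad)   # ~0.5: 12.5 / (25 + 1e-5)
eta_sps = sps_step_size(loss, grad)     # 1.0: 12.5 / (0.5 * 25)

w_alig = w - eta_alig * grad
w_sps = w - eta_sps * grad              # lands exactly at the minimum here
```

With these illustrative constants the multiplicative scaling in SPS doubles the raw Polyak step while the additive `delta` in ALI-G barely changes it; in practice both rules are also clipped by their respective maximal learning rates.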