
Cold Posteriors and Aleatoric Uncertainty
Recent work has observed that one can outperform exact inference in Baye...

On the Generalization Benefit of Noise in Stochastic Gradient Descent
It has long been argued that minibatch stochastic gradient descent can g...

Batch Normalization Biases Deep Residual Networks Towards Shallow Paths
Batch normalization has multiple benefits. It improves the conditioning ...

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study
We investigate how the final parameters found by stochastic gradient des...

Stochastic natural gradient descent draws posterior samples in function space
Natural gradient descent (NGD) minimises the cost function on a Riemanni...
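The abstract above describes NGD as minimisation on a Riemannian manifold, i.e. gradient steps preconditioned by the Fisher matrix. A toy sketch only (all names hypothetical; for illustration the Fisher is taken equal to the curvature of a quadratic loss, where one preconditioned step reaches the minimiser):

```python
import numpy as np

A = np.diag([100.0, 1.0])            # badly conditioned curvature
b = np.array([1.0, 1.0])

def grad(theta):
    """Gradient of the quadratic loss 0.5 * theta^T A theta - b^T theta."""
    return A @ theta - b

F = A                                 # Fisher ~ curvature in this toy setting
theta = np.zeros(2)
for _ in range(5):
    # natural gradient step with unit step size: theta <- theta - F^{-1} grad
    theta = theta - np.linalg.solve(F, grad(theta))
```

With the exact Fisher, the very first step lands on the minimiser `A^{-1} b`; plain gradient descent on this conditioning would need a tiny step size and many iterations.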

Decoding Decoders: Finding Optimal Representation Spaces for Unsupervised Similarity Tasks
Experimental evidence indicates that simple models outperform complex de...

Don't Decay the Learning Rate, Increase the Batch Size
It is common practice to decay the learning rate. Here we show one can u...
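The title states the recipe directly: wherever a schedule would divide the learning rate by some factor, multiply the batch size by that factor instead. A minimal sketch (illustrative names and milestone values, not the paper's exact experiments) showing that the two schedules keep the ratio learning rate / batch size, which governs the SGD noise scale, identical at every epoch:

```python
def schedules(lr0=0.1, batch0=128, milestones=(30, 60, 80), factor=5, epochs=90):
    """Per-epoch (lr, batch) pairs for the two equivalent schedules."""
    decay_lr, grow_batch = [], []
    for epoch in range(epochs):
        k = sum(epoch >= m for m in milestones)       # milestones passed so far
        decay_lr.append((lr0 / factor**k, batch0))    # classic lr decay
        grow_batch.append((lr0, batch0 * factor**k))  # batch-size growth instead
    return decay_lr, grow_batch

decay_lr, grow_batch = schedules()
# the ratio lr/batch matches at every epoch under both schedules:
assert all(abs(a[0] / a[1] - b[0] / b[1]) < 1e-12
           for a, b in zip(decay_lr, grow_batch))
```

The practical appeal is that larger batches admit more data parallelism, so the same effective annealing schedule can run in fewer parameter updates.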

A Bayesian Perspective on Generalization and Stochastic Gradient Descent
This paper tackles two related questions at the heart of machine learnin...
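A hedged note on the quantity at the centre of this line of work (as stated in the arXiv version of the paper; symbols: $\epsilon$ learning rate, $N$ training-set size, $B$ batch size): the scale of the gradient noise introduced by minibatching is approximately

```latex
g = \epsilon \left( \frac{N}{B} - 1 \right) \approx \frac{\epsilon N}{B}
\qquad (B \ll N),
```

and the paper reports an optimal batch size that scales linearly with both the learning rate and the training-set size, $B_{\mathrm{opt}} \propto \epsilon N$.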

Offline bilingual word vectors, orthogonal transformations and the inverted softmax
Usually bilingual word vectors are trained "online". Mikolov et al. show...
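The orthogonal transformation in the title is the orthogonal Procrustes solution: given paired source/target embedding matrices $X$, $Y$, the orthogonal $W$ minimising $\|XW - Y\|_F$ comes from the SVD of $X^\top Y$. A self-contained sketch on synthetic vectors (the data and dimensions here are illustrative, not the paper's bilingual corpora):

```python
import numpy as np

def orthogonal_map(X, Y):
    """Orthogonal Procrustes: argmin_W ||X @ W - Y||_F s.t. W orthogonal."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
R, _ = np.linalg.qr(rng.standard_normal((50, 50)))   # ground-truth rotation
X = rng.standard_normal((1000, 50))                   # "source language" vectors
Y = X @ R + 0.01 * rng.standard_normal((1000, 50))    # noisy "target" pairs
W = orthogonal_map(X, Y)
```

Because `W` is exactly orthogonal, the map preserves vector norms and cosine similarities in the source space, and on this synthetic data it recovers the generating rotation up to the noise level.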

Monte Carlo Sort for unreliable human comparisons
Algorithms which sort lists of real numbers into ascending order have be...
Samuel L. Smith