
Are all negatives created equal in contrastive instance discrimination?
Self-supervised learning has recently begun to rival supervised learning...
Learning Optimal Representations with the Decodable Information Bottleneck
We address the question of characterizing and finding optimal representa...
Theory of gating in recurrent neural networks
RNNs are popular dynamical models, used for processing sequential data. ...
Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs
Batch normalization (BatchNorm) has become an indispensable tool for tra...
The Early Phase of Neural Network Training
Recent studies have shown that many important aspects of neural network ...
Gating creates slow modes and controls phase-space complexity in GRUs and LSTMs
Recurrent neural networks (RNNs) are powerful dynamical models for data ...
How noise affects the Hessian spectrum in overparameterized neural networks
Stochastic gradient descent (SGD) forms the core optimization method for...
Mean-field Analysis of Batch Normalization
Batch Normalization (BatchNorm) is an extremely useful component of mode...
A high-bias, low-variance introduction to Machine Learning for physicists
Machine Learning (ML) is one of the most exciting and dynamic areas of m...
The information bottleneck and geometric clustering
The information bottleneck (IB) approach to clustering takes a joint dis...
Comment on "Why does deep and cheap learning work so well?" [arXiv:1608.08225]
In a recent paper, "Why does deep and cheap learning work so well?", Lin...
Supervised Learning with Quantum-Inspired Tensor Networks
Tensor networks are efficient representations of high-dimensional tensor...
The deterministic information bottleneck
Lossy compression and clustering fundamentally involve a decision about ...
An exact mapping between the Variational Renormalization Group and Deep Learning
Deep learning is a broad set of techniques that uses multiple layers of ...
David J. Schwab
