
The Deep Bootstrap: Good Online Learners are Good Offline Generalizers
We propose a new framework for reasoning about generalization in deep le...
Distributional Generalization: A New Kind of Generalization
We introduce a new notion of generalization – Distributional Generalizat...
Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems
The learning rate schedule can significantly affect generalization performan...
Optimal Regularization Can Mitigate Double Descent
Recent empirical and theoretical studies have shown that many learning a...
More Data Can Hurt for Linear Regression: Samplewise Double Descent
In this expository note we describe a surprising phenomenon in overparam...
Deep Double Descent: Where Bigger Models and More Data Hurt
We show that a variety of modern deep learning tasks exhibit a "doubled...
SGD on Neural Networks Learns Functions of Increasing Complexity
We perform an experimental study of the dynamics of Stochastic Gradient ...
Adversarial Robustness May Be at Odds With Simplicity
Current techniques in machine learning are so far unable to learn cl...
Algorithmic Polarization for Hidden Markov Models
Using a mild variant of polar codes we design linear compression schemes...
The Generic Holdout: Preventing False Discoveries in Adaptive Data Science
Adaptive data analysis has posed a challenge to science due to its abili...
Tracking the ℓ_2 Norm with Constant Update Time
The ℓ_2 tracking problem is the task of obtaining a streaming algorithm ...
General Strong Polarization
Arıkan's exciting discovery of polar codes has provided an altogether n...
Predicting Positive and Negative Links with Noisy Queries: Theory & Practice
Social networks and interactions in social media involve both positive a...
