
Fast Margin Maximization via Dual Acceleration
We present and analyze a momentum-based gradient method for training lin...

Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and Benign Overfitting
We consider interpolation learning in high-dimensional linear regression...

An Even More Optimal Stochastic Optimization Algorithm: Minibatching and Interpolation Learning
We present and analyze an algorithm for optimizing smooth and convex or ...

Eluder Dimension and Generalized Rank
We study the relationship between the eluder dimension for a function cl...

Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels
We study the relative power of learning with gradient descent on differe...

On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent
Recent work has highlighted the role of initialization scale in determin...

Adversarially Robust Learning with Unknown Perturbation Sets
We study the problem of learning predictors that are robust to adversari...

The Min-Max Complexity of Distributed Stochastic Convex Optimization with Intermittent Communication
We resolve the min-max complexity of distributed stochastic convex optim...

Does Invariant Risk Minimization Capture Invariance?
We show that the Invariant Risk Minimization (IRM) formulation of Arjovs...

Reducing Adversarially Robust Learning to Non-Robust PAC Learning
We study the problem of reducing adversarially robust learning to standa...

Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy
We provide a detailed asymptotic study of gradient flow trajectories and...

Predictive Value Generalization Bounds
In this paper, we study a bi-criterion framework for assessing scoring f...

On Uniform Convergence and Low-Norm Interpolation Learning
We consider an underdetermined noisy linear regression model where the m...

Minibatch vs Local SGD for Heterogeneous Distributed Learning
We analyze Local SGD (aka parallel or federated SGD) and Minibatch SGD i...

Efficiently Learning Adversarially Robust Halfspaces with Noise
We study the problem of learning adversarially robust halfspaces in the ...

Mirrorless Mirror Descent: A More Natural Discretization of Riemannian Gradient Flow
We present a direct (primal only) derivation of Mirror Descent as a "par...

Approximate is Good Enough: Probabilistic Variants of Dimensional and Margin Complexity
We present and study approximate notions of dimensional and margin compl...

Dropout: Explicit Forms and Capacity Control
We investigate the capacity control provided by dropout in various machi...

Fair Learning with Private Demographic Data
Sensitive attributes such as race are rarely available to learners in re...

Kernel and Rich Regimes in Overparametrized Models
A recent line of work studies overparametrized neural networks in the "k...

Is Local SGD Better than Minibatch SGD?
We study local SGD (also known as parallel SGD and federated averaging),...

Lower Bounds for Non-Convex Stochastic Optimization
We lower bound the complexity of finding ϵ-stationary points (with gradi...

A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case
A key element of understanding the efficacy of overparameterized neural ...

Open Problem: The Oracle Complexity of Convex Optimization with Limited Memory
We note that known methods achieving the optimal oracle complexity for f...

Guaranteed Validity for Empirical Approaches to Adaptive Data Analysis
We design a general framework for answering adaptive statistical queries...

Kernel and Deep Regimes in Overparametrized Models
A recent line of work studies overparametrized neural networks in the "k...

Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models
With an eye toward understanding complexity control in deep learning, we...

Semi-Cyclic Stochastic Gradient Descent
We consider convex SGD updates with a block-cyclic structure, i.e. where...

How do infinite width bounded norm networks look in function space?
We consider the question of what functions can be captured by ReLU netwo...

The Complexity of Making the Gradient Small in Stochastic Convex Optimization
We give nearly matching upper and lower bounds on the oracle complexity ...

VC Classes are Adversarially Robustly Learnable, but Only Improperly
We study the question of learning an adversarially robust predictor. We ...

From Fair Decision Making to Social Equality
The study of fairness in intelligent decision systems has mostly ignored...

On preserving non-discrimination when combining expert advice
We study the interplay between sequential decision making and avoiding d...

Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints
Classifiers can be trained with data-dependent constraints to satisfy fa...

A Tight Convergence Analysis for Stochastic Gradient Descent with Delayed Updates
We provide tight finite-time convergence bounds for gradient descent and...

Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate
Stochastic Gradient Descent (SGD) is a central tool in machine learning....

Implicit Bias of Gradient Descent on Linear Convolutional Networks
We show that gradient descent on full-width linear convolutional network...

Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks
Despite existing work on ensuring generalization of neural networks in t...

Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization
We suggest a general oracle-based framework that captures different para...

The Everlasting Database: Statistical Validity at a Fair Price
The problem of handling adaptivity in data analysis, intentional or not,...

Convergence of Gradient Descent on Separable Data
The implicit bias of gradient descent is not fully understood even in si...

Characterizing Implicit Bias in Terms of Optimization Geometry
We study the bias of generic optimization methods, including Mirror Desc...

Distributed Stochastic Multi-Task Learning with Graph Regularization
We propose methods for distributed graph-based multi-task learning that ...

The Implicit Bias of Gradient Descent on Separable Data
We show that gradient descent on an unregularized logistic regression pr...
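The truncated abstract above refers to this paper's main result: on linearly separable data, gradient descent on unregularized logistic regression diverges in norm but converges in direction to the max-margin (hard-SVM) separator. A minimal numerical sketch of that phenomenon (illustrative only — the dataset, step size, and iteration count are assumptions, not from the paper):

```python
import numpy as np

# Illustrative sketch (not the authors' code): gradient descent on
# unregularized logistic regression over a tiny linearly separable
# dataset. The iterates grow unboundedly in norm but converge in
# *direction* to the max-margin (hard-SVM) separator.

X = np.array([[2.0, 1.0], [1.5, 2.0], [-2.0, -1.0], [-1.0, -2.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])  # labels in {-1, +1}

w = np.zeros(2)
lr = 0.1
for _ in range(20000):
    margins = y * (X @ w)
    # Gradient of the logistic loss sum_i log(1 + exp(-y_i <x_i, w>)).
    grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).sum(axis=0)
    w -= lr * grad

direction = w / np.linalg.norm(w)
print(direction)  # the normalized iterate; all training margins are positive
```

Tracking `direction` over increasing iteration counts shows it stabilizing while `np.linalg.norm(w)` keeps growing (roughly logarithmically), which is the separation of scale and direction the paper analyzes.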

Implicit Regularization in Matrix Factorization
We study implicit regularization when optimizing an underdetermined quad...

The Marginal Value of Adaptive Gradient Methods in Machine Learning
Adaptive optimization methods, which perform local optimization with a m...

Efficient coordinate-wise leading eigenvector computation
We develop and analyze efficient "coordinate-wise" methods for finding t...

Stochastic Canonical Correlation Analysis
We tightly analyze the sample complexity of CCA, provide a learning algo...

Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data
Sketching techniques have become popular for scaling up machine learning...

Tight Complexity Bounds for Optimizing Composite Objectives
We provide tight upper and lower bounds on the complexity of minimizing ...
Nathan Srebro
Professor at the Toyota Technological Institute at Chicago; Professor (part time), Department of Computer Science and Committee on Computational and Applied Mathematics, University of Chicago