
Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy
We provide a detailed asymptotic study of gradient flow trajectories and...

Predictive Value Generalization Bounds
In this paper, we study a bicriterion framework for assessing scoring f...

On Uniform Convergence and Low-Norm Interpolation Learning
We consider an underdetermined noisy linear regression model where the m...

Minibatch vs Local SGD for Heterogeneous Distributed Learning
We analyze Local SGD (aka parallel or federated SGD) and Minibatch SGD i...

Efficiently Learning Adversarially Robust Halfspaces with Noise
We study the problem of learning adversarially robust halfspaces in the ...

Mirrorless Mirror Descent: A More Natural Discretization of Riemannian Gradient Flow
We present a direct (primal only) derivation of Mirror Descent as a "par...

Approximate is Good Enough: Probabilistic Variants of Dimensional and Margin Complexity
We present and study approximate notions of dimensional and margin compl...

Dropout: Explicit Forms and Capacity Control
We investigate the capacity control provided by dropout in various machi...

Fair Learning with Private Demographic Data
Sensitive attributes such as race are rarely available to learners in re...

Kernel and Rich Regimes in Overparametrized Models
A recent line of work studies overparametrized neural networks in the "k...

Is Local SGD Better than Minibatch SGD?
We study local SGD (also known as parallel SGD and federated averaging),...

Lower Bounds for Non-Convex Stochastic Optimization
We lower bound the complexity of finding ϵ-stationary points (with gradi...

A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case
A key element of understanding the efficacy of overparameterized neural ...

Open Problem: The Oracle Complexity of Convex Optimization with Limited Memory
We note that known methods achieving the optimal oracle complexity for f...

Guaranteed Validity for Empirical Approaches to Adaptive Data Analysis
We design a general framework for answering adaptive statistical queries...

Kernel and Deep Regimes in Overparametrized Models
A recent line of work studies overparametrized neural networks in the "k...

Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models
With an eye toward understanding complexity control in deep learning, we...

Semi-Cyclic Stochastic Gradient Descent
We consider convex SGD updates with a block-cyclic structure, i.e. where...

How do infinite width bounded norm networks look in function space?
We consider the question of what functions can be captured by ReLU netwo...

The Complexity of Making the Gradient Small in Stochastic Convex Optimization
We give nearly matching upper and lower bounds on the oracle complexity ...

VC Classes are Adversarially Robustly Learnable, but Only Improperly
We study the question of learning an adversarially robust predictor. We ...

From Fair Decision Making to Social Equality
The study of fairness in intelligent decision systems has mostly ignored...

On preserving non-discrimination when combining expert advice
We study the interplay between sequential decision making and avoiding d...

Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints
Classifiers can be trained with data-dependent constraints to satisfy fa...

A Tight Convergence Analysis for Stochastic Gradient Descent with Delayed Updates
We provide tight finite-time convergence bounds for gradient descent and...

Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate
Stochastic Gradient Descent (SGD) is a central tool in machine learning....

Implicit Bias of Gradient Descent on Linear Convolutional Networks
We show that gradient descent on full-width linear convolutional network...

Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks
Despite existing work on ensuring generalization of neural networks in t...

Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization
We suggest a general oraclebased framework that captures different para...

The Everlasting Database: Statistical Validity at a Fair Price
The problem of handling adaptivity in data analysis, intentional or not,...

Convergence of Gradient Descent on Separable Data
The implicit bias of gradient descent is not fully understood even in si...

Characterizing Implicit Bias in Terms of Optimization Geometry
We study the bias of generic optimization methods, including Mirror Desc...

Distributed Stochastic Multi-Task Learning with Graph Regularization
We propose methods for distributed graph-based multi-task learning that ...

The Implicit Bias of Gradient Descent on Separable Data
We show that gradient descent on an unregularized logistic regression pr...
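The result previewed here is that gradient descent on unregularized logistic regression over linearly separable data diverges in norm but converges in direction to the max-margin (hard-margin SVM) separator. A minimal numpy sketch of that behavior; the toy dataset, step size, and iteration count are illustrative choices, not from the paper:

```python
import numpy as np

# Toy separable data: the hard-margin SVM direction is (1, 0.5) normalized
# (support vectors are +-(1, 0.5); (+-3, 0) lie strictly outside the margin).
X = np.array([[1.0, 0.5], [3.0, 0.0], [-1.0, -0.5], [-3.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w_star = np.array([1.0, 0.5]) / np.linalg.norm([1.0, 0.5])

w = np.zeros(2)
lr = 0.1
for _ in range(100_000):
    margins = y * (X @ w)
    sig = np.exp(-np.logaddexp(0.0, margins))   # stable sigmoid(-margins)
    w -= lr * (-(y * sig) @ X)                  # grad of sum_i log(1+e^{-m_i})

direction = w / np.linalg.norm(w)
print(np.linalg.norm(w))    # large and still growing: the iterates diverge
print(direction @ w_star)   # close to 1: direction matches max-margin
```

The norm grows only logarithmically, so the directional convergence is slow, consistent with the logarithmic rates in this line of work.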

Implicit Regularization in Matrix Factorization
We study implicit regularization when optimizing an underdetermined quad...

The Marginal Value of Adaptive Gradient Methods in Machine Learning
Adaptive optimization methods, which perform local optimization with a m...

Efficient coordinate-wise leading eigenvector computation
We develop and analyze efficient "coordinate-wise" methods for finding t...

Stochastic Canonical Correlation Analysis
We tightly analyze the sample complexity of CCA, provide a learning algo...

Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data
Sketching techniques have become popular for scaling up machine learning...

Tight Complexity Bounds for Optimizing Composite Objectives
We provide tight upper and lower bounds on the complexity of minimizing ...

Efficient Distributed Learning with Sparsity
We propose a novel, efficient approach for distributed sparse learning i...

Global Optimality of Local Search for Low Rank Matrix Recovery
We show that there are no spurious local minima in the non-convex factor...
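The claim previewed here is that the factored (Burer-Monteiro style) objective for low-rank matrix recovery has no spurious local minima, so gradient descent from random initialization reaches the global optimum. A toy sketch of that phenomenon on the simplest case, full observation of a rank-1 PSD matrix (the paper's sensing setting is more general); sizes, step size, and iteration count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Rank-1 PSD ground truth with unit Frobenius norm.
v = rng.normal(size=(5, 1))
v /= np.linalg.norm(v)
M = v @ v.T

# Small random init: the origin is a saddle point, and random
# initialization escapes it almost surely.
U = 0.1 * rng.normal(size=(5, 1))
lr = 0.05
for _ in range(5000):
    R = U @ U.T - M
    U -= lr * (4 * R @ U)   # gradient of ||U U^T - M||_F^2 (R is symmetric)

print(np.linalg.norm(U @ U.T - M))   # ~0: global minimum reached
```

Note that U converges to ±v (the sign ambiguity is the only non-uniqueness in the rank-1 factorization), and either sign yields the same global minimizer U Uᵀ = M.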

Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations
We investigate the parameter-space geometry of recurrent neural networks...

Distributed Multi-Task Learning with Shared Representation
We study the problem of distributed multi-task learning with shared repr...

Reducing Runtime by Recycling Samples
Contrary to the situation with stochastic gradient descent, we argue tha...

Distributed Multi-Task Learning
We consider the problem of distributed multi-task learning, where each m...

Path-SGD: Path-Normalized Optimization in Deep Neural Networks
We revisit the choice of SGD for training deep neural networks by recons...

Norm-Based Capacity Control in Neural Networks
We investigate the capacity, convexity and characterization of a general...

In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning
We present experiments demonstrating that some other form of capacity co...

On Symmetric and Asymmetric LSHs for Inner Product Search
We consider the problem of designing locality sensitive hashes (LSH) for...
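One construction studied in this line of work reduces maximum inner product search (MIPS) to cosine similarity search via an asymmetric transform: scale the database into the unit ball, append sqrt(1 − ||x||²) to each point and a 0 to the normalized query. Inner products are preserved and all vectors become unit-norm, so standard random-hyperplane (sign) LSH applies. A small sketch, with illustrative dimensions and code length:

```python
import numpy as np

rng = np.random.default_rng(1)

def preprocess(X):
    # Append sqrt(1 - ||x||^2); requires every row to have norm <= 1.
    extra = np.sqrt(np.maximum(0.0, 1.0 - (X ** 2).sum(axis=1)))
    return np.hstack([X, extra[:, None]])

def transform_query(q):
    # Unit-normalize and append a 0, so transform_query(q) . preprocess(x)
    # equals (q / ||q||) . x for every database point x.
    q = q / np.linalg.norm(q)
    return np.append(q, 0.0)

X = rng.normal(size=(100, 8))
X /= np.linalg.norm(X, axis=1).max()   # scale database into the unit ball
q = rng.normal(size=8)

P, qt = preprocess(X), transform_query(q)
assert np.allclose(np.linalg.norm(P, axis=1), 1.0)   # all unit norm now
assert np.argmax(P @ qt) == np.argmax(X @ q)         # MIPS answer preserved

# Both sides are unit vectors, so random-hyperplane LSH for cosine applies.
H = rng.normal(size=(P.shape[1], 32))
db_codes, q_code = P @ H > 0, qt @ H > 0
scores = (db_codes == q_code).sum(axis=1)   # Hamming similarity of sketches
```

Candidates with the highest sketch agreement are then re-ranked by exact inner product; the asymmetry (different transforms for database and query) is the point, since the paper shows what symmetric and asymmetric hashes can and cannot achieve for this problem.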
Nathan Srebro
Professor at the Toyota Technological Institute at Chicago; part-time Professor in the Department of Computer Science and the Committee on Computational and Applied Mathematics at the University of Chicago.