
Is Local SGD Better than Minibatch SGD?
We study local SGD (also known as parallel SGD and federated averaging),...
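To make the comparison concrete, here is a minimal sketch of the two methods being compared, on a hypothetical least-squares problem (all data and hyperparameters below are illustrative assumptions, not taken from the paper): local SGD communicates once per round after several independent local steps, while minibatch SGD averages gradients at every step under the same total sample budget.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem (hypothetical data, for illustration only).
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)

def sgd_step(w, lr, batch):
    # One stochastic gradient step on the squared loss over `batch`.
    xb, yb = X[batch], y[batch]
    grad = xb.T @ (xb @ w - yb) / len(batch)
    return w - lr * grad

def local_sgd(workers=4, rounds=20, local_steps=5, lr=0.1):
    """Each worker runs `local_steps` of single-sample SGD independently,
    then the iterates are averaged -- one communication per round."""
    w = np.zeros(d)
    for _ in range(rounds):
        local_iterates = []
        for _ in range(workers):
            w_k = w.copy()
            for _ in range(local_steps):
                w_k = sgd_step(w_k, lr, rng.integers(0, n, size=1))
            local_iterates.append(w_k)
        w = np.mean(local_iterates, axis=0)
    return w

def minibatch_sgd(workers=4, rounds=20, local_steps=5, lr=0.1):
    """Same total sample budget, but gradients are averaged across
    workers at every step -- one communication per step."""
    w = np.zeros(d)
    for _ in range(rounds * local_steps):
        w = sgd_step(w, lr, rng.integers(0, n, size=workers))
    return w

w_local = local_sgd()
w_mb = minibatch_sgd()
```

Both variants use the same number of gradient evaluations; they differ only in how often the workers synchronize, which is exactly the trade-off the paper analyzes.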

Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks
Despite existing work on ensuring generalization of neural networks in t...

The Implicit Bias of Gradient Descent on Separable Data
We show that gradient descent on an unregularized logistic regression pr...

Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations
We investigate the parameter-space geometry of recurrent neural networks...

Implicit Regularization in Matrix Factorization
We study implicit regularization when optimizing an underdetermined quad...

The Marginal Value of Adaptive Gradient Methods in Machine Learning
Adaptive optimization methods, which perform local optimization with a m...

Stochastic Canonical Correlation Analysis
We tightly analyze the sample complexity of CCA, provide a learning algo...

Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data
Sketching techniques have become popular for scaling up machine learning...

Tight Complexity Bounds for Optimizing Composite Objectives
We provide tight upper and lower bounds on the complexity of minimizing ...

Efficient Distributed Learning with Sparsity
We propose a novel, efficient approach for distributed sparse learning i...

Global Optimality of Local Search for Low Rank Matrix Recovery
We show that there are no spurious local minima in the non-convex factor...

Distributed Multi-Task Learning with Shared Representation
We study the problem of distributed multi-task learning with shared repr...

Reducing Runtime by Recycling Samples
Contrary to the situation with stochastic gradient descent, we argue tha...

Distributed Multi-Task Learning
We consider the problem of distributed multi-task learning, where each m...

Norm-Based Capacity Control in Neural Networks
We investigate the capacity, convexity and characterization of a general...

On Symmetric and Asymmetric LSHs for Inner Product Search
We consider the problem of designing locality sensitive hashes (LSH) for...

Complexity of Inference in Graphical Models
It is well-known that inference in graphical models is hard in the worst...

Communication Efficient Distributed Optimization using an Approximate Newton-type Method
We present a novel Newton-type method for distributed optimization, whic...

Explicit Approximations of the Gaussian Kernel
We investigate training and using Gaussian kernel SVMs by approximating ...

Stochastic Optimization of PCA with Capped MSG
We study PCA as a stochastic optimization problem and propose a novel st...

Path-SGD: Path-Normalized Optimization in Deep Neural Networks
We revisit the choice of SGD for training deep neural networks by recons...

Maximum Likelihood Bounded Tree-Width Markov Networks
Chow and Liu (1968) studied the problem of learning a maximum-likelihood ...

Learning Sparse Low-Threshold Linear Classifiers
We consider the problem of learning a non-negative linear classifier wit...

Matrix reconstruction with the local max norm
We introduce a new family of matrix norms, the "local max" norms, genera...

Minimizing The Misclassification Error Rate Using a Surrogate Convex Loss
We carefully study how well minimizing convex surrogate loss functions, ...

In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning
We present experiments demonstrating that some other form of capacity co...

Sparse Prediction with the k-Support Norm
We derive a novel norm that corresponds to the tightest convex relaxatio...

Distribution-Dependent Sample Complexity of Large Margin Learning
We obtain a tight distribution-specific characterization of the sample c...

Clustering using Max-norm Constrained Optimization
We suggest using the max-norm as a convex surrogate constraint for clust...

Semi-supervised Learning with Density Based Distances
We present a simple, yet effective, approach to Semi-Supervised Learning...

Learning with the Weighted Trace-norm under Arbitrary Sampling Distributions
We provide rigorous guarantees on learning with the weighted trace-norm ...

Concentration-Based Guarantees for Low-Rank Matrix Reconstruction
We consider the problem of approximately reconstructing a partially-obse...

Tight Sample Complexity of Large-Margin Learning
We obtain a tight distribution-specific characterization of the sample c...

The Power of Asymmetry in Binary Hashing
When approximating binary similarity using the Hamming distance between ...

Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm
We obtain an improved finite-sample guarantee on the linear convergence ...
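The randomized Kaczmarz method named in this entry admits a very short implementation, sketched below on a hypothetical consistent linear system (sizes, seed, and iteration count are illustrative assumptions): each step samples a row with probability proportional to its squared norm — the weighted sampling of the title — and projects the iterate onto that row's hyperplane.

```python
import numpy as np

rng = np.random.default_rng(1)

# Consistent linear system A x = b (hypothetical sizes, for illustration).
m, n = 100, 10
A = rng.normal(size=(m, n))
x_true = rng.normal(size=n)
b = A @ x_true

def randomized_kaczmarz(A, b, iters=2000):
    """Sample row i with probability proportional to ||a_i||^2, then
    project the iterate onto the hyperplane {x : a_i . x = b_i}."""
    row_norms = np.sum(A ** 2, axis=1)
    probs = row_norms / row_norms.sum()
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        i = rng.choice(len(b), p=probs)
        # Orthogonal projection onto the i-th equation's solution set.
        x = x + (b[i] - A[i] @ x) / row_norms[i] * A[i]
    return x

x_hat = randomized_kaczmarz(A, b)
```

Viewed as SGD on the least-squares objective, the projection step corresponds to an importance-weighted stochastic gradient step with a particular row-dependent step size, which is the connection the paper develops.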

Efficient coordinate-wise leading eigenvector computation
We develop and analyze efficient "coordinate-wise" methods for finding t...

Distributed Stochastic Multi-Task Learning with Graph Regularization
We propose methods for distributed graph-based multi-task learning that ...

Characterizing Implicit Bias in Terms of Optimization Geometry
We study the bias of generic optimization methods, including Mirror Desc...

Convergence of Gradient Descent on Separable Data
The implicit bias of gradient descent is not fully understood even in si...

The Everlasting Database: Statistical Validity at a Fair Price
The problem of handling adaptivity in data analysis, intentional or not,...

Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization
We suggest a general oracle-based framework that captures different para...

Implicit Bias of Gradient Descent on Linear Convolutional Networks
We show that gradient descent on full-width linear convolutional network...

Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate
Stochastic Gradient Descent (SGD) is a central tool in machine learning....

A Tight Convergence Analysis for Stochastic Gradient Descent with Delayed Updates
We provide tight finite-time convergence bounds for gradient descent and...

Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints
Classifiers can be trained with data-dependent constraints to satisfy fa...

On preserving non-discrimination when combining expert advice
We study the interplay between sequential decision making and avoiding d...

The Complexity of Making the Gradient Small in Stochastic Convex Optimization
We give nearly matching upper and lower bounds on the oracle complexity ...

VC Classes are Adversarially Robustly Learnable, but Only Improperly
We study the question of learning an adversarially robust predictor. We ...

How do infinite width bounded norm networks look in function space?
We consider the question of what functions can be captured by ReLU netwo...

From Fair Decision Making to Social Equality
The study of fairness in intelligent decision systems has mostly ignored...
Nathan Srebro
Professor at the Toyota Technological Institute at Chicago; part-time Professor, Department of Computer Science and Committee on Computational and Applied Mathematics, University of Chicago