
Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability
We empirically demonstrate that full-batch gradient descent on neural ne...

When Is Generalizable Reinforcement Learning Tractable?
Agents trained by reinforcement learning (RL) often fail to generalize b...

Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning
We formally study how Ensemble of deep learning models can improve test ...

A law of robustness for two-layers neural networks
We initiate the study of the inherent tradeoffs between the size of a ne...

Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK
We consider the dynamic of gradient descent for learning a two-layer neu...

Feature Purification: How Adversarial Training Performs Robust Deep Learning
Despite the great empirical success of adversarial training to defend de...

When can Wasserstein GANs minimize Wasserstein Distance?
Generative Adversarial Networks (GANs) are widely used models to learn c...

Backward Feature Correction: How Deep Learning Performs Deep Learning
How does a 110-layer ResNet learn a high-complexity classifier using rel...

Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
Stochastic gradient descent with a large initial learning rate is a wide...

Complexity of Highly Parallel Non-Smooth Convex Optimization
A landmark result of non-smooth convex optimization is that gradient des...

What Can ResNet Learn Efficiently, Going Beyond Kernels?
How can neural networks such as ResNet efficiently learn CIFAR-10 with t...

Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without
We consider the non-stochastic version of the (cooperative) multi-player...

Can SGD Learn Recurrent Neural Networks with Provable Generalization?
Recurrent Neural Networks (RNNs) are among the most popular models in se...

Improved Path-length Regret Bounds for Bandits
We study adaptive regret bounds in terms of the variation of the losses ...

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
Neural networks have great success in many machine learning applications...

A Convergence Theory for Deep Learning via Over-Parameterization
Deep neural networks (DNNs) have demonstrated dominating performance in ...

Chasing Nested Convex Bodies Nearly Optimally
The convex body chasing problem, introduced by Friedman and Linial, is a...

Competitively Chasing Convex Bodies
Let F be a family of sets in some metric space. In the F-chasing problem...

On the Convergence Rate of Training Recurrent Neural Networks
Despite the huge success of deep learning, our understanding of how the ...

Statistical Convergence of the EM Algorithm on Gaussian Mixture Models
We study the convergence behavior of the Expectation Maximization (EM) a...

Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
Neural networks have many successful applications, while much less theor...

Algorithmic Framework for Model-based Reinforcement Learning with Theoretical Guarantees
While model-based reinforcement learning has empirically been shown to s...

The Well Tempered Lasso
We study the complexity of the entire regularization path for least squa...

Online Improper Learning with an Approximation Oracle
We revisit the question of reducing online learning to approximate optim...

Operator Scaling via Geodesically Convex Optimization, Invariant Theory and Polynomial Identity Testing
We propose a new second-order method for geodesically convex optimizatio...

Learning Mixtures of Linear Regressions with Nearly Optimal Complexity
Mixtures of Linear Regressions (MLR) is an important mixture model with ...

An Alternative View: When Does SGD Escape Local Minima?
Stochastic gradient descent (SGD) is widely used in machine learning. Al...

Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits
Regret bounds in online learning compare the player's performance to L^*...

Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations
We show that the (stochastic) gradient descent algorithm provides an imp...

Algorithmic Regularization in Over-parameterized Matrix Recovery
We study the problem of recovering a low-rank matrix X^* from linear meas...

Neon2: Finding Local Minima via First-Order Oracles
We propose a reduction for non-convex optimization that can (1) turn a s...

Near-Optimal Discrete Optimization for Experimental Design: A Regret Minimization Approach
The experimental design problem concerns the selection of k points from ...

An homotopy method for ℓ_p regression provably beyond self-concordance and in input-sparsity time
We consider the problem of linear regression where the ℓ_2^n norm loss (...

Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls
We propose a rank-k variant of the classical Frank-Wolfe algorithm to so...

A Nearly Instance Optimal Algorithm for Top-k Ranking under the Multinomial Logit Model
We study the active learning problem of top-k ranking from multi-wise co...

Provable Alternating Gradient Descent for Non-negative Matrix Factorization with Strong Correlations
Non-negative matrix factorization is a basic tool for decomposing data i...

Follow the Compressed Leader: Faster Online Learning of Eigenvectors and Faster MMWU
The online problem of computing the top eigenvector is fundamental to ma...

Recovery Guarantee of Non-negative Matrix Factorization via Alternating Updates
Non-negative matrix factorization is a popular tool for decomposing data...

Faster Principal Component Regression and Stable Matrix Chebyshev Approximation
We solve principal component regression (PCR), up to a multiplicative ac...

First Efficient Convergence for Streaming k-PCA: a Global, Gap-Free, and Near-Optimal Rate
We study streaming principal component analysis (PCA), that is to find, ...

Doubly Accelerated Methods for Faster CCA and Generalized Eigendecomposition
We study k-GenEV, the problem of finding the top k generalized eigenvect...

LazySVD: Even Faster SVD Decomposition Yet Without Agonizing Pain
We study k-SVD that is to obtain the first k singular vectors of a matri...

Approximate maximum entropy principles via Goemans-Williamson with applications to provable variational methods
The well known maximum-entropy principle due to Jaynes, which states tha...

Recovery guarantee of weighted low-rank approximation via alternating minimization
Many applications require recovering a ground truth low-rank matrix from...

Linear Algebraic Structure of Word Senses, with Applications to Polysemy
Word embeddings are ubiquitous in NLP and information retrieval, but it'...

RANDWALK: A Latent Variable Model Approach to Word Embeddings
Semantic word embeddings represent the meaning of a word via a vector, a...

A Theoretical Analysis of NDCG Type Ranking Measures
A central problem in ranking is to design a ranking measure for evaluati...