
Can Single-Shuffle SGD be Better than Reshuffling SGD and GD?
We propose matrix norm inequalities that extend the Recht-Ré (2012) conj...

Provably Efficient Algorithms for Multi-Objective Competitive RL
We study multi-objective reinforcement learning (RL) where an agent's re...

Why do classifier accuracies show linear trends under distribution shift?
Several recent studies observed that when classification models are eval...

Provably Efficient Online Agnostic Learning in Markov Games
We study online agnostic learning, a problem that arises in episodic mul...

Coping with Label Shift via Distributionally Robust Optimisation
The label shift problem refers to the supervised learning setting where ...

Contrastive Learning with Hard Negative Samples
We consider the question: how can you sample good negative examples for ...

Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes
We study minimax optimal reinforcement learning in episodic factored Mar...

SGD with shuffling: optimal rates without component convexity and large epoch requirements
We study without-replacement SGD for solving finite-sum optimization pro...

Stochastic Optimization with Non-stationary Noise
We investigate stochastic optimization problems under relaxed assumption...

On Tight Convergence Rates of Without-replacement SGD
For solving finite-sum optimization problems, SGD without replacement sa...

Strength from Weakness: Fast Learning Using Weak Supervision
We study generalization properties of weakly supervised learning. That i...

On Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions
We provide the first non-asymptotic analysis for finding stationary poin...

From Nesterov's Estimate Sequence to Riemannian Acceleration
We propose the first global accelerated gradient method for Riemannian m...

Why ADAM Beats SGD for Attention Models
While stochastic gradient descent (SGD) is still the de facto algorithm ...

Metrics Induced by Quantum Jensen-Shannon-Renyí and Related Divergences
We study symmetric divergences on Hermitian positive definite matrices g...

Nonconvex stochastic optimization on manifolds via Riemannian Frank-Wolfe methods
We study stochastic projection-free methods for constrained optimization...

Efficient Policy Learning for Non-Stationary MDPs under Adversarial Manipulation
A Markov Decision Process (MDP) is a popular model for reinforcement lea...

Are deep ResNets provably better than linear predictors?
Recently, a residual network (ResNet) with a single residual block has b...

Near Optimal Stratified Sampling
The performance of a machine learning system is usually evaluated by usi...

Flexible Modeling of Diversity with Strongly Log-Concave Distributions
Strongly log-concave (SLC) distributions are a rich class of discrete pr...

Analysis of Gradient Clipping and Adaptive Scaling with a Relaxed Smoothness Condition
We provide a theoretical explanation for the fast convergence of gradien...

Escaping Saddle Points with Adaptive Gradient Methods
Adaptive methods such as Adam and RMSProp are widely used in deep learni...

Deep-RBF Networks Revisited: Robust Classification with Rejection
One of the main drawbacks of deep neural networks, like many other class...

R-SPIDER: A Fast Riemannian Stochastic Optimization Algorithm with Curvature Independent Rate
We study smooth stochastic optimization problems on Riemannian manifolds...

Finite sample expressive power of small-width ReLU networks
We study universal finite sample expressivity of neural networks, define...

Efficiently testing local optimality and escaping saddles for ReLU networks
We provide a theoretical algorithm for checking local optimality and esc...

Random Shuffling Beats SGD after Finite Epochs
A long-standing problem in the theory of stochastic gradient descent (SG...

Towards Riemannian Accelerated Gradient Methods
We propose a Riemannian version of Nesterov's Accelerated Gradient algor...

Direct Runge-Kutta Discretization Achieves Acceleration
We study gradient-based optimization methods obtained by directly discre...

Non-Linear Temporal Subspace Representations for Activity Recognition
Representations that can compactly and effectively capture the temporal ...

Learning Determinantal Point Processes by Sampling Inferred Negatives
Determinantal Point Processes (DPPs) have attracted significant interest...

A Critical View of Global Optimality in Deep Learning
We investigate the loss surface of deep linear and nonlinear neural netw...

A Generic Approach for Escaping Saddle points
A central challenge to using first-order methods for optimizing nonconve...

Unsupervised robust nonparametric learning of hidden community properties
We consider learning of fundamental properties of communities in large n...

Global optimality conditions for deep neural networks
We study the error landscape of deep linear and nonlinear neural network...

An Alternative to EM for Gaussian Mixture Models: Batch and Stochastic Riemannian Optimization
We consider maximum likelihood estimation for Gaussian Mixture Models (G...

Sequence Summarization Using Order-constrained Kernelized Feature Subspaces
Representations that can compactly and effectively capture temporal evol...

Polynomial Time Algorithms for Dual Volume Sampling
We study dual volume sampling, a method for selecting k columns from an ...

Fast Mixing Markov Chains for Strongly Rayleigh Measures, DPPs, and Constrained Sampling
We study probability measures induced by set functions with constraints....

Stochastic Frank-Wolfe Methods for Nonconvex Optimization
We study Frank-Wolfe methods for nonconvex stochastic and finite-sum opt...

Geometric Mean Metric Learning
We revisit the task of learning a Euclidean metric from data. We approac...

Fast Sampling for Strongly Rayleigh Measures with Application to Determinantal Point Processes
In this note we consider sampling from (non-homogeneous) strongly Raylei...

Kronecker Determinantal Point Processes
Determinantal Point Processes (DPPs) are probabilistic models over all s...

Fast Stochastic Methods for Nonsmooth Nonconvex Optimization
We analyze stochastic algorithms for optimizing nonconvex, nonsmooth fin...

Directional Statistics in Machine Learning: a Brief Review
The modern data analyst must cope with data encoded in various forms, ve...

Combinatorial Topic Models using Small-Variance Asymptotics
Topic models have emerged as fundamental tools in unsupervised machine l...

Stochastic Variance Reduction for Nonconvex Optimization
We study nonconvex finite-sum problems and analyze stochastic variance r...

Fast Incremental Method for Nonconvex Optimization
We analyze a fast incremental aggregated gradient method for optimizing ...

First-order Methods for Geodesically Convex Optimization
Geodesic convexity generalizes the notion of (vector space) convexity to...

Gauss quadrature for matrix inverse forms with applications
We present a framework for accelerating a spectrum of machine learning a...
Suvrit Sra
Researcher at Massachusetts Institute of Technology (MIT); Co-founder and Chief AI Officer at macro-eyes