Suvrit Sra

research

∙ 07/10/2023

Invex Programs: First Order Algorithms and Their Convergence

Invex programs are a special kind of non-convex problems which attain gl...

0 Adarsh Barik, et al. ∙

research

∙ 06/01/2023

Transformers learn to implement preconditioned gradient descent for in-context learning

Motivated by the striking ability of transformers for in-context learnin...

0 Kwangjun Ahn, et al. ∙

research

∙ 05/25/2023

How to escape sharp minima

Modern machine learning applications have seen a remarkable success of o...

0 Kwangjun Ahn, et al. ∙

research

∙ 02/24/2023

On the Training Instability of Shuffling SGD with Batch Normalization

We uncover how SGD interacts with batch normalization and can exhibit un...

0 David X. Wu, et al. ∙

research

∙ 12/30/2022

Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control?

We study the task of learning state representations from potentially hig...

0 Yi Tian, et al. ∙

research

∙ 08/09/2022

Computing Brascamp-Lieb Constants through the lens of Thompson Geometry

This paper studies algorithms for efficiently computing Brascamp-Lieb co...

0 Melanie Weber, et al. ∙

research

∙ 06/22/2022

On a class of geodesically convex optimization problems solved via Euclidean MM methods

We study geodesically convex (g-convex) problems that can be written as ...

0 Suvrit Sra, et al. ∙

research

∙ 04/03/2022

Understanding the unstable convergence of gradient descent

Most existing analyses of (stochastic) gradient descent rely on the cond...

0 Kwangjun Ahn, et al. ∙

research

∙ 02/13/2022

Minimax in Geodesic Metric Spaces: Sion's Theorem and Algorithms

Determining whether saddle points exist or are approximable for nonconve...

0 Peiyuan Zhang, et al. ∙

research

∙ 12/29/2021

Time varying regression with hidden linear dynamics

We revisit a model for time-varying linear regression that assumes the u...

5 Ali Jadbabaie, et al. ∙

research

∙ 12/21/2021

Max-Margin Contrastive Learning

Standard contrastive learning approaches usually require a large number ...

0 Anshul Shah, et al. ∙

research

∙ 11/04/2021

A Riemannian Accelerated Proximal Extragradient Framework and its Implications

The study of accelerated gradient methods in Riemannian optimization has...

0 Jikai Jin, et al. ∙

research

∙ 10/20/2021

Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond

In distributed learning, local SGD (also known as federated averaging) a...

0 Chulhee Yun, et al. ∙

research

∙ 10/12/2021

On Convergence of Training Loss Without Reaching Stationary Points

It is a well-known fact that nonconvex optimization is computationally i...

9 Jingzhao Zhang, et al. ∙

research

∙ 03/12/2021

Can Single-Shuffle SGD be Better than Reshuffling SGD and GD?

We propose matrix norm inequalities that extend the Recht-Ré (2012) conj...

7 Chulhee Yun, et al. ∙

research

∙ 02/05/2021

Provably Efficient Algorithms for Multi-Objective Competitive RL

We study multi-objective reinforcement learning (RL) where an agent's re...

0 Tiancheng Yu, et al. ∙

research

∙ 12/31/2020

Why do classifier accuracies show linear trends under distribution shift?

Several recent studies observed that when classification models are eval...

0 Horia Mania, et al. ∙

research

∙ 10/28/2020

Provably Efficient Online Agnostic Learning in Markov Games

We study online agnostic learning, a problem that arises in episodic mul...

0 Yi Tian, et al. ∙

research

∙ 10/23/2020

Coping with Label Shift via Distributionally Robust Optimisation

The label shift problem refers to the supervised learning setting where ...

6 Jingzhao Zhang, et al. ∙

research

∙ 10/09/2020

Contrastive Learning with Hard Negative Samples

We consider the question: how can you sample good negative examples for ...

0 Joshua Robinson, et al. ∙

research

∙ 06/24/2020

Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes

We study minimax optimal reinforcement learning in episodic factored Mar...

11 Yi Tian, et al. ∙

research

∙ 06/12/2020

SGD with shuffling: optimal rates without component convexity and large epoch requirements

We study without-replacement SGD for solving finite-sum optimization pro...

0 Kwangjun Ahn, et al. ∙

research

∙ 06/08/2020

Stochastic Optimization with Non-stationary Noise

We investigate stochastic optimization problems under relaxed assumption...

0 Jingzhao Zhang, et al. ∙

research

∙ 04/18/2020

On Tight Convergence Rates of Without-replacement SGD

For solving finite-sum optimization problems, SGD without replacement sa...

0 Kwangjun Ahn, et al. ∙

research

∙ 02/19/2020

Strength from Weakness: Fast Learning Using Weak Supervision

We study generalization properties of weakly supervised learning. That i...

8 Joshua Robinson, et al. ∙

research

∙ 02/10/2020

On Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions

We provide the first non-asymptotic analysis for finding stationary poin...

0 Jingzhao Zhang, et al. ∙

research

∙ 01/24/2020

From Nesterov's Estimate Sequence to Riemannian Acceleration

We propose the first global accelerated gradient method for Riemannian m...

0 Kwangjun Ahn, et al. ∙

research

∙ 12/06/2019

Why ADAM Beats SGD for Attention Models

While stochastic gradient descent (SGD) is still the de facto algorithm ...

0 Jingzhao Zhang, et al. ∙

research

∙ 11/06/2019

Metrics Induced by Quantum Jensen-Shannon-Rényi and Related Divergences

We study symmetric divergences on Hermitian positive definite matrices g...

0 Suvrit Sra, et al. ∙

research

∙ 10/09/2019

Nonconvex stochastic optimization on manifolds via Riemannian Frank-Wolfe methods

We study stochastic projection-free methods for constrained optimization...

0 Melanie Weber, et al. ∙

research

∙ 07/22/2019

Efficient Policy Learning for Non-Stationary MDPs under Adversarial Manipulation

A Markov Decision Process (MDP) is a popular model for reinforcement lea...

4 Tiancheng Yu, et al. ∙

research

∙ 07/09/2019

Are deep ResNets provably better than linear predictors?

Recently, a residual network (ResNet) with a single residual block has b...

2 Chulhee Yun, et al. ∙

research

∙ 06/26/2019

Near Optimal Stratified Sampling

The performance of a machine learning system is usually evaluated by usi...

6 Tiancheng Yu, et al. ∙

research

∙ 06/12/2019

Flexible Modeling of Diversity with Strongly Log-Concave Distributions

Strongly log-concave (SLC) distributions are a rich class of discrete pr...

4 Joshua Robinson, et al. ∙

research

∙ 05/28/2019

Analysis of Gradient Clipping and Adaptive Scaling with a Relaxed Smoothness Condition

We provide a theoretical explanation for the fast convergence of gradien...

0 Jingzhao Zhang, et al. ∙

research

∙ 01/26/2019

Escaping Saddle Points with Adaptive Gradient Methods

Adaptive methods such as Adam and RMSProp are widely used in deep learni...

0 Matthew Staib, et al. ∙

research

∙ 12/07/2018

Deep-RBF Networks Revisited: Robust Classification with Rejection

One of the main drawbacks of deep neural networks, like many other class...

0 Pourya Habib Zadeh, et al. ∙

research

∙ 11/10/2018

R-SPIDER: A Fast Riemannian Stochastic Optimization Algorithm with Curvature Independent Rate

We study smooth stochastic optimization problems on Riemannian manifolds...

0 Jingzhao Zhang, et al. ∙

research

∙ 10/17/2018

Finite sample expressive power of small-width ReLU networks

We study universal finite sample expressivity of neural networks, define...

12 Chulhee Yun, et al. ∙

research

∙ 09/28/2018

Efficiently testing local optimality and escaping saddles for ReLU networks

We provide a theoretical algorithm for checking local optimality and esc...

18 Chulhee Yun, et al. ∙

research

∙ 06/26/2018

Random Shuffling Beats SGD after Finite Epochs

A long-standing problem in the theory of stochastic gradient descent (SG...

0 Jeff Z. HaoChen, et al. ∙

research

∙ 06/07/2018

Towards Riemannian Accelerated Gradient Methods

We propose a Riemannian version of Nesterov's Accelerated Gradient algor...

0 Hongyi Zhang, et al. ∙

research

∙ 05/01/2018

Direct Runge-Kutta Discretization Achieves Acceleration

We study gradient-based optimization methods obtained by directly discre...

0 Jingzhao Zhang, et al. ∙

research

∙ 03/27/2018

Non-Linear Temporal Subspace Representations for Activity Recognition

Representations that can compactly and effectively capture the temporal ...

0 Anoop Cherian, et al. ∙

research

∙ 02/15/2018

Learning Determinantal Point Processes by Sampling Inferred Negatives

Determinantal Point Processes (DPPs) have attracted significant interest...

0 Zelda Mariet, et al. ∙

research

∙ 02/10/2018

A Critical View of Global Optimality in Deep Learning

We investigate the loss surface of deep linear and nonlinear neural netw...

0 Chulhee Yun, et al. ∙

research

∙ 09/05/2017

A Generic Approach for Escaping Saddle points

A central challenge to using first-order methods for optimizing nonconve...

0 Sashank J Reddi, et al. ∙

research

∙ 07/11/2017

Unsupervised robust nonparametric learning of hidden community properties

We consider learning of fundamental properties of communities in large n...

0 Mikhail A. Langovoy, et al. ∙

research

∙ 07/08/2017

Global optimality conditions for deep neural networks

We study the error landscape of deep linear and nonlinear neural network...

0 Chulhee Yun, et al. ∙

research

∙ 06/10/2017

An Alternative to EM for Gaussian Mixture Models: Batch and Stochastic Riemannian Optimization

We consider maximum likelihood estimation for Gaussian Mixture Models (G...

0 Reshad Hosseini, et al. ∙
