
Efficiently testing local optimality and escaping saddles for ReLU networks
We provide a theoretical algorithm for checking local optimality and esc...

Finite sample expressive power of small-width ReLU networks
We study universal finite sample expressivity of neural networks, define...

Strength from Weakness: Fast Learning Using Weak Supervision
We study generalization properties of weakly supervised learning. That i...

Near Optimal Stratified Sampling
The performance of a machine learning system is usually evaluated by usi...

Flexible Modeling of Diversity with Strongly Log-Concave Distributions
Strongly log-concave (SLC) distributions are a rich class of discrete pr...

Efficient Policy Learning for Non-Stationary MDPs under Adversarial Manipulation
A Markov Decision Process (MDP) is a popular model for reinforcement lea...

Are deep ResNets provably better than linear predictors?
Recently, a residual network (ResNet) with a single residual block has b...

A Generic Approach for Escaping Saddle points
A central challenge to using first-order methods for optimizing nonconve...

Global optimality conditions for deep neural networks
We study the error landscape of deep linear and nonlinear neural network...

Diversity Networks: Neural Network Compression Using Determinantal Point Processes
We introduce Divnet, a flexible technique for learning networks with div...

An Alternative to EM for Gaussian Mixture Models: Batch and Stochastic Riemannian Optimization
We consider maximum likelihood estimation for Gaussian Mixture Models (G...

Polynomial Time Algorithms for Dual Volume Sampling
We study dual volume sampling, a method for selecting k columns from an ...

Sequence Summarization Using Order-constrained Kernelized Feature Subspaces
Representations that can compactly and effectively capture temporal evol...

Fast Mixing Markov Chains for Strongly Rayleigh Measures, DPPs, and Constrained Sampling
We study probability measures induced by set functions with constraints....

Stochastic Frank-Wolfe Methods for Nonconvex Optimization
We study Frank-Wolfe methods for nonconvex stochastic and finite-sum opt...

Geometric Mean Metric Learning
We revisit the task of learning a Euclidean metric from data. We approac...

Fast Sampling for Strongly Rayleigh Measures with Application to Determinantal Point Processes
In this note we consider sampling from (nonhomogeneous) strongly Raylei...

Kronecker Determinantal Point Processes
Determinantal Point Processes (DPPs) are probabilistic models over all s...

Fast Stochastic Methods for Nonsmooth Nonconvex Optimization
We analyze stochastic algorithms for optimizing nonconvex, nonsmooth fin...

Directional Statistics in Machine Learning: a Brief Review
The modern data analyst must cope with data encoded in various forms, ve...

Stochastic Variance Reduction for Nonconvex Optimization
We study nonconvex finite-sum problems and analyze stochastic variance r...

Fast Incremental Method for Nonconvex Optimization
We analyze a fast incremental aggregated gradient method for optimizing ...

First-order Methods for Geodesically Convex Optimization
Geodesic convexity generalizes the notion of (vector space) convexity to...

Gauss quadrature for matrix inverse forms with applications
We present a framework for accelerating a spectrum of machine learning a...

AdaDelay: Delay Adaptive Distributed Stochastic Convex Optimization
We study distributed stochastic convex optimization under the delayed gr...

Manifold Optimization for Gaussian Mixture Models
We take a new look at parameter estimation for Gaussian Mixture Models (...

On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants
We study optimization algorithms based on variance reduction for stochas...

Modular proximal optimization for multidimensional total-variation regularization
One of the most frequently used notions of "structured sparsity" is that...

Inference and Mixture Modeling with the Elliptical Gamma Distribution
We study modeling and inference with the Elliptical Gamma Distribution (...

Parallel and Distributed Block-Coordinate Frank-Wolfe Algorithms
We develop parallel and distributed Frank-Wolfe algorithms; the former o...

Randomized Nonlinear Component Analysis
Classical methods such as Principal Component Analysis (PCA) and Canonic...

Statistical estimation for optimization problems on graphs
Large graphs abound in machine learning, data mining, and several relate...

Riemannian Dictionary Learning and Sparse Coding for Positive Definite Matrices
Data encoded as symmetric positive definite (SPD) matrices frequently ar...

Convex Optimization for Parallel Energy Minimization
Energy minimization has been an intensely studied core problem in comput...

Fast projections onto mixed-norm balls with applications
Joint sparsity offers powerful structural cues for feature selection, es...

Positive definite matrices and the S-divergence
Positive definite matrices abound in a dazzling variety of applications....

Nonconvex proximal splitting: batch and incremental algorithms
Within the unmanageably large class of nonconvex optimization, we consid...

Sparse Inverse Covariance Estimation via an Adaptive Gradient-Based Method
We study the problem of estimating from data, a sparse approximation to ...

Combinatorial Topic Models using Small-Variance Asymptotics
Topic models have emerged as fundamental tools in unsupervised machine l...

Unsupervised robust nonparametric learning of hidden community properties
We consider learning of fundamental properties of communities in large n...

A Critical View of Global Optimality in Deep Learning
We investigate the loss surface of deep linear and nonlinear neural netw...

Learning Determinantal Point Processes by Sampling Inferred Negatives
Determinantal Point Processes (DPPs) have attracted significant interest...

Non-Linear Temporal Subspace Representations for Activity Recognition
Representations that can compactly and effectively capture the temporal ...

Direct Runge-Kutta Discretization Achieves Acceleration
We study gradient-based optimization methods obtained by directly discre...

Towards Riemannian Accelerated Gradient Methods
We propose a Riemannian version of Nesterov's Accelerated Gradient algor...

Random Shuffling Beats SGD after Finite Epochs
A longstanding problem in the theory of stochastic gradient descent (SG...

R-SPIDER: A Fast Riemannian Stochastic Optimization Algorithm with Curvature Independent Rate
We study smooth stochastic optimization problems on Riemannian manifolds...

Deep-RBF Networks Revisited: Robust Classification with Rejection
One of the main drawbacks of deep neural networks, like many other class...

Escaping Saddle Points with Adaptive Gradient Methods
Adaptive methods such as Adam and RMSProp are widely used in deep learni...

Analysis of Gradient Clipping and Adaptive Scaling with a Relaxed Smoothness Condition
We provide a theoretical explanation for the fast convergence of gradien...
Suvrit Sra
Researcher at Massachusetts Institute of Technology (MIT); Co-founder and Chief AI Officer at macroeyes