
-
Why do classifier accuracies show linear trends under distribution shift?
Several recent studies observed that when classification models are eval...
-
Provably Efficient Online Agnostic Learning in Markov Games
We study online agnostic learning, a problem that arises in episodic mul...
-
Coping with Label Shift via Distributionally Robust Optimisation
The label shift problem refers to the supervised learning setting where ...
-
Contrastive Learning with Hard Negative Samples
We consider the question: how can you sample good negative examples for ...
-
Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes
We study minimax optimal reinforcement learning in episodic factored Mar...
-
SGD with shuffling: optimal rates without component convexity and large epoch requirements
We study without-replacement SGD for solving finite-sum optimization pro...
-
Stochastic Optimization with Non-stationary Noise
We investigate stochastic optimization problems under relaxed assumption...
-
On Tight Convergence Rates of Without-replacement SGD
For solving finite-sum optimization problems, SGD without replacement sa...
-
Strength from Weakness: Fast Learning Using Weak Supervision
We study generalization properties of weakly supervised learning. That i...
-
On Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions
We provide the first non-asymptotic analysis for finding stationary poin...
-
From Nesterov's Estimate Sequence to Riemannian Acceleration
We propose the first global accelerated gradient method for Riemannian m...
-
Why ADAM Beats SGD for Attention Models
While stochastic gradient descent (SGD) is still the de facto algorithm ...
-
Metrics Induced by Quantum Jensen-Shannon-Rényi and Related Divergences
We study symmetric divergences on Hermitian positive definite matrices g...
-
Nonconvex stochastic optimization on manifolds via Riemannian Frank-Wolfe methods
We study stochastic projection-free methods for constrained optimization...
-
Efficient Policy Learning for Non-Stationary MDPs under Adversarial Manipulation
A Markov Decision Process (MDP) is a popular model for reinforcement lea...
-
Are deep ResNets provably better than linear predictors?
Recently, a residual network (ResNet) with a single residual block has b...
-
Near Optimal Stratified Sampling
The performance of a machine learning system is usually evaluated by usi...
-
Flexible Modeling of Diversity with Strongly Log-Concave Distributions
Strongly log-concave (SLC) distributions are a rich class of discrete pr...
-
Analysis of Gradient Clipping and Adaptive Scaling with a Relaxed Smoothness Condition
We provide a theoretical explanation for the fast convergence of gradien...
-
Escaping Saddle Points with Adaptive Gradient Methods
Adaptive methods such as Adam and RMSProp are widely used in deep learni...
-
Deep-RBF Networks Revisited: Robust Classification with Rejection
One of the main drawbacks of deep neural networks, like many other class...
-
R-SPIDER: A Fast Riemannian Stochastic Optimization Algorithm with Curvature Independent Rate
We study smooth stochastic optimization problems on Riemannian manifolds...
-
Finite sample expressive power of small-width ReLU networks
We study universal finite sample expressivity of neural networks, define...
-
Efficiently testing local optimality and escaping saddles for ReLU networks
We provide a theoretical algorithm for checking local optimality and esc...
-
Random Shuffling Beats SGD after Finite Epochs
A long-standing problem in the theory of stochastic gradient descent (SG...
-
Towards Riemannian Accelerated Gradient Methods
We propose a Riemannian version of Nesterov's Accelerated Gradient algor...
-
Direct Runge-Kutta Discretization Achieves Acceleration
We study gradient-based optimization methods obtained by directly discre...
-
Non-Linear Temporal Subspace Representations for Activity Recognition
Representations that can compactly and effectively capture the temporal ...
-
Learning Determinantal Point Processes by Sampling Inferred Negatives
Determinantal Point Processes (DPPs) have attracted significant interest...
-
A Critical View of Global Optimality in Deep Learning
We investigate the loss surface of deep linear and nonlinear neural netw...
-
A Generic Approach for Escaping Saddle points
A central challenge to using first-order methods for optimizing nonconve...
-
Unsupervised robust nonparametric learning of hidden community properties
We consider learning of fundamental properties of communities in large n...
-
Global optimality conditions for deep neural networks
We study the error landscape of deep linear and nonlinear neural network...
-
An Alternative to EM for Gaussian Mixture Models: Batch and Stochastic Riemannian Optimization
We consider maximum likelihood estimation for Gaussian Mixture Models (G...
-
Sequence Summarization Using Order-constrained Kernelized Feature Subspaces
Representations that can compactly and effectively capture temporal evol...
-
Polynomial Time Algorithms for Dual Volume Sampling
We study dual volume sampling, a method for selecting k columns from an ...
-
Fast Mixing Markov Chains for Strongly Rayleigh Measures, DPPs, and Constrained Sampling
We study probability measures induced by set functions with constraints....
-
Stochastic Frank-Wolfe Methods for Nonconvex Optimization
We study Frank-Wolfe methods for nonconvex stochastic and finite-sum opt...
-
Geometric Mean Metric Learning
We revisit the task of learning a Euclidean metric from data. We approac...
-
Fast Sampling for Strongly Rayleigh Measures with Application to Determinantal Point Processes
In this note we consider sampling from (non-homogeneous) strongly Raylei...
-
Kronecker Determinantal Point Processes
Determinantal Point Processes (DPPs) are probabilistic models over all s...
-
Fast Stochastic Methods for Nonsmooth Nonconvex Optimization
We analyze stochastic algorithms for optimizing nonconvex, nonsmooth fin...
-
Directional Statistics in Machine Learning: a Brief Review
The modern data analyst must cope with data encoded in various forms, ve...
-
Combinatorial Topic Models using Small-Variance Asymptotics
Topic models have emerged as fundamental tools in unsupervised machine l...
-
Stochastic Variance Reduction for Nonconvex Optimization
We study nonconvex finite-sum problems and analyze stochastic variance r...
-
Fast Incremental Method for Nonconvex Optimization
We analyze a fast incremental aggregated gradient method for optimizing ...
-
First-order Methods for Geodesically Convex Optimization
Geodesic convexity generalizes the notion of (vector space) convexity to...
-
Gauss quadrature for matrix inverse forms with applications
We present a framework for accelerating a spectrum of machine learning a...
-
Diversity Networks: Neural Network Compression Using Determinantal Point Processes
We introduce Divnet, a flexible technique for learning networks with div...
-
AdaDelay: Delay Adaptive Distributed Stochastic Convex Optimization
We study distributed stochastic convex optimization under the delayed gr...