
Unsupervised Image Matching and Object Discovery as Optimization
Learning with complete or partial supervision is powerful but relies on ...
Learning with Differentiable Perturbed Optimizers
Machine learning pipelines often rely on optimization procedures to make...
Structured Prediction with Partial Labelling through the Infimum Loss
Annotating datasets is one of the main costs in today's supervised lear...
On the Convergence of Adam and Adagrad
We provide a simple proof of the convergence of the optimization algorit...
Stochastic Optimization for Regularized Wasserstein Estimators
Optimal transport is a foundational problem in optimization, which allows...
Partially Encrypted Machine Learning using Functional Encryption
Machine learning on encrypted data has received a lot of attention thank...
On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport
Many tasks in machine learning and signal processing can be solved by mi...
Globally Convergent Newton Methods for Ill-conditioned Generalized Self-concordant Losses
In this paper, we study large-scale convex optimization algorithms based...
Localized Structured Prediction
Key to structured prediction is exploiting the problem structure to simp...
Marginal Weighted Maximum Log-likelihood for Efficient Learning of Perturb-and-Map models
We consider the structured-output prediction problem through probabilist...
Demucs: Deep Extractor for Music Sources with extra unlabeled data remixed
We study the problem of source separation for music using deep learning ...
A General Theory for Structured Prediction with Smooth Convex Surrogates
In this work we provide a theoretical framework for structured predictio...
A Universal Algorithm for Variational Inequalities Adaptive to Smoothness and Noise
We consider variational inequalities coming from monotone operators, a s...
Massively scalable Sinkhorn distances via the Nyström method
The Sinkhorn distance, a variant of the Wasserstein distance with entrop...
Nonlinear Acceleration of Deep Neural Networks
Regularized nonlinear acceleration (RNA) is a generic extrapolation sche...
Nonlinear Acceleration of CNNs
The Regularized Nonlinear Acceleration (RNA) algorithm is an acceleratio...
Overcomplete Independent Component Analysis via SDP
We present a novel algorithm for overcomplete independent components ana...
Implicit Regularization of Discrete Gradient Dynamics in Deep Linear Neural Networks
When optimizing over-parameterized models, such as deep neural networks,...
Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss
Neural networks trained to minimize the logistic (a.k.a. cross-entropy) ...
Fast Decomposable Submodular Function Minimization using Constrained Total Variation
We consider the problem of minimizing the sum of submodular set function...
Max-Plus Matching Pursuit for Deterministic Markov Decision Processes
We consider deterministic Markov decision processes (MDPs) and apply max...
AdaBatch: Efficient Gradient Aggregation Rules for Sequential and Parallel Stochastic Gradient Methods
We study a new aggregation operator for gradients coming from a mini-bat...
A Generic Approach for Escaping Saddle points
A central challenge to using first-order methods for optimizing nonconve...
Tracking the gradients using the Hessian: A new look at variance reducing stochastic methods
Our goal is to improve variance reducing stochastic methods through bett...
Combinatorial Penalties: Which structures are preserved by convex relaxations?
We consider the homogeneous and the non-homogeneous convex relaxations f...
Efficient Algorithms for Non-convex Isotonic Regression through Submodular Optimization
We consider the minimization of submodular functions subject to ordering...
Bridging the Gap between Constant Step Size Stochastic Gradient Descent and Markov Chains
We consider the minimization of an objective function given access to un...
On Structured Prediction Theory with Calibrated Convex Surrogate Losses
We provide novel theoretical insights on structured prediction in the co...
Optimal algorithms for smooth and strongly convex distributed optimization in networks
In this paper, we determine the optimal convergence rates for strongly c...
Stochastic Composite Least-Squares Regression with convergence rate O(1/n)
We consider the minimization of composite objective functions composed o...
Learning Determinantal Point Processes in Sublinear Time
We propose a new class of determinantal point processes (DPPs) which can...
Robust Discriminative Clustering with Sparse Regularizers
Clustering high-dimensional data often requires some form of dimensional...
Parameter Learning for Log-supermodular Distributions
We consider log-supermodular models on binary variables, which are proba...
PAC-Bayesian Theory Meets Bayesian Inference
We exhibit a strong link between frequentist PAC-Bayesian risk bounds an...
Online but Accurate Inference for Latent Variable Models with Local Gibbs Sampling
We study parameter inference in large-scale latent variable models. We f...
Beyond CCA: Moment Matching for Multi-View Models
We introduce three novel semi-parametric extensions of probabilistic can...
Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression
We consider the optimization of a quadratic objective function whose gra...
Rethinking LDA: moment matching for discrete ICA
We consider moment matching techniques for estimation in Latent Dirichle...
From Averaging to Acceleration, There is Only a Stepsize
We show that accelerated gradient descent, averaged gradient descent and...
Learning the Structure for Structured Sparsity
Structured sparsity has recently emerged in statistics, machine learning...
On the Equivalence between Kernel Quadrature Rules and Random Feature Expansions
We show that kernel-based quadrature rules for computing integrals can b...
Sequential Kernel Herding: Frank-Wolfe Optimization for Particle Filtering
Recently, the Frank-Wolfe optimization algorithm was suggested as a proc...
Constant Step Size Least-Mean-Square: Bias-Variance Trade-offs and Optimal Sampling Distributions
We consider the least-squares regression problem and provide a detailed ...
Sparse and spurious: dictionary learning with noise and outliers
A popular approach within the signal processing and machine learning com...
SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives
In this work we introduce a new optimisation method called SAGA in the s...
On The Sample Complexity of Sparse Dictionary Learning
In the synthesis model, signals are represented as sparse combinations ...
Sample Complexity of Dictionary Learning and other Matrix Factorizations
Many modern tools in machine learning and signal processing, such as spa...
Minimizing Finite Sums with the Stochastic Average Gradient
We propose the stochastic average gradient (SAG) method for optimizing t...
Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n)
We consider the stochastic approximation problem where a convex function...
Large-Margin Metric Learning for Partitioning Problems
In this paper, we consider unsupervised partitioning problems, such as c...
