
Unsupervised Image Matching and Object Discovery as Optimization
Learning with complete or partial supervision is powerful but relies on ...

Learning with Differentiable Perturbed Optimizers
Machine learning pipelines often rely on optimization procedures to make...

Structured Prediction with Partial Labelling through the Infimum Loss
Annotating datasets is one of the main costs in nowadays supervised lear...

On the Convergence of Adam and Adagrad
We provide a simple proof of the convergence of the optimization algorit...

Stochastic Optimization for Regularized Wasserstein Estimators
Optimal transport is a foundational problem in optimization, that allows...

Partially Encrypted Machine Learning using Functional Encryption
Machine learning on encrypted data has received a lot of attention thank...

On the Global Convergence of Gradient Descent for Overparameterized Models using Optimal Transport
Many tasks in machine learning and signal processing can be solved by mi...

Globally Convergent Newton Methods for Ill-conditioned Generalized Self-concordant Losses
In this paper, we study large-scale convex optimization algorithms based...

Localized Structured Prediction
Key to structured prediction is exploiting the problem structure to simp...

Marginal Weighted Maximum Log-likelihood for Efficient Learning of Perturb-and-Map models
We consider the structured-output prediction problem through probabilist...

Demucs: Deep Extractor for Music Sources with extra unlabeled data remixed
We study the problem of source separation for music using deep learning ...

A General Theory for Structured Prediction with Smooth Convex Surrogates
In this work we provide a theoretical framework for structured predictio...

A Universal Algorithm for Variational Inequalities Adaptive to Smoothness and Noise
We consider variational inequalities coming from monotone operators, a s...

Massively scalable Sinkhorn distances via the Nyström method
The Sinkhorn distance, a variant of the Wasserstein distance with entrop...

Nonlinear Acceleration of Deep Neural Networks
Regularized nonlinear acceleration (RNA) is a generic extrapolation sche...

Nonlinear Acceleration of CNNs
The Regularized Nonlinear Acceleration (RNA) algorithm is an acceleratio...

Overcomplete Independent Component Analysis via SDP
We present a novel algorithm for overcomplete independent components ana...

Implicit Regularization of Discrete Gradient Dynamics in Deep Linear Neural Networks
When optimizing overparameterized models, such as deep neural networks,...

Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss
Neural networks trained to minimize the logistic (a.k.a. cross-entropy) ...

Fast Decomposable Submodular Function Minimization using Constrained Total Variation
We consider the problem of minimizing the sum of submodular set function...

Max-Plus Matching Pursuit for Deterministic Markov Decision Processes
We consider deterministic Markov decision processes (MDPs) and apply max...

AdaBatch: Efficient Gradient Aggregation Rules for Sequential and Parallel Stochastic Gradient Methods
We study a new aggregation operator for gradients coming from a minibat...

A Generic Approach for Escaping Saddle points
A central challenge to using first-order methods for optimizing nonconve...

Tracking the gradients using the Hessian: A new look at variance reducing stochastic methods
Our goal is to improve variance reducing stochastic methods through bett...

Combinatorial Penalties: Which structures are preserved by convex relaxations?
We consider the homogeneous and the nonhomogeneous convex relaxations f...

Efficient Algorithms for Nonconvex Isotonic Regression through Submodular Optimization
We consider the minimization of submodular functions subject to ordering...

Bridging the Gap between Constant Step Size Stochastic Gradient Descent and Markov Chains
We consider the minimization of an objective function given access to un...

On Structured Prediction Theory with Calibrated Convex Surrogate Losses
We provide novel theoretical insights on structured prediction in the co...

Optimal algorithms for smooth and strongly convex distributed optimization in networks
In this paper, we determine the optimal convergence rates for strongly c...

Stochastic Composite Least-Squares Regression with convergence rate O(1/n)
We consider the minimization of composite objective functions composed o...

Learning Determinantal Point Processes in Sublinear Time
We propose a new class of determinantal point processes (DPPs) which can...

Robust Discriminative Clustering with Sparse Regularizers
Clustering high-dimensional data often requires some form of dimensional...

Parameter Learning for Log-supermodular Distributions
We consider log-supermodular models on binary variables, which are proba...

PAC-Bayesian Theory Meets Bayesian Inference
We exhibit a strong link between frequentist PAC-Bayesian risk bounds an...

Online but Accurate Inference for Latent Variable Models with Local Gibbs Sampling
We study parameter inference in large-scale latent variable models. We f...

Beyond CCA: Moment Matching for Multi-View Models
We introduce three novel semiparametric extensions of probabilistic can...

Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression
We consider the optimization of a quadratic objective function whose gra...

Rethinking LDA: moment matching for discrete ICA
We consider moment matching techniques for estimation in Latent Dirichle...

From Averaging to Acceleration, There is Only a Step-size
We show that accelerated gradient descent, averaged gradient descent and...

Learning the Structure for Structured Sparsity
Structured sparsity has recently emerged in statistics, machine learning...

On the Equivalence between Kernel Quadrature Rules and Random Feature Expansions
We show that kernel-based quadrature rules for computing integrals can b...

Sequential Kernel Herding: Frank-Wolfe Optimization for Particle Filtering
Recently, the Frank-Wolfe optimization algorithm was suggested as a proc...

Constant Step Size Least-Mean-Square: Bias-Variance Trade-offs and Optimal Sampling Distributions
We consider the least-squares regression problem and provide a detailed ...

Sparse and spurious: dictionary learning with noise and outliers
A popular approach within the signal processing and machine learning com...

SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives
In this work we introduce a new optimisation method called SAGA in the s...

On The Sample Complexity of Sparse Dictionary Learning
In the synthesis model signals are represented as a sparse combinations ...

Sample Complexity of Dictionary Learning and other Matrix Factorizations
Many modern tools in machine learning and signal processing, such as spa...

Minimizing Finite Sums with the Stochastic Average Gradient
We propose the stochastic average gradient (SAG) method for optimizing t...

Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n)
We consider the stochastic approximation problem where a convex function...

Large-Margin Metric Learning for Partitioning Problems
In this paper, we consider unsupervised partitioning problems, such as c...