
Screening for a Reweighted Penalized Conditional Gradient Method
The conditional gradient method (CGM) is widely used in large-scale spar...

A Continuized View on Nesterov Acceleration for Stochastic Gradient Descent and Randomized Gossip
We introduce the continuized Nesterov acceleration, a close variant of N...

Batch Normalization Orthogonalizes Representations in Deep Random Networks
This paper underlines a subtle property of batch-normalization (BN): Suc...

Max-Margin is Dead, Long Live Max-Margin!
The foundational concept of Max-Margin in machine learning is ill-posed ...

A Continuized View on Nesterov Acceleration
We introduce the "continuized" Nesterov acceleration, a close variant of...

Disambiguation of weak supervision with exponential convergence rates
Machine learning approached through supervised learning requires expensi...

Fast rates in structured prediction
Discrete supervised learning problems such as classification are often t...

Finding Global Minima via Kernel Approximations
We consider the global minimization of smooth functions based solely on ...

Variance-Reduced Methods for Machine Learning
Stochastic optimization lies at the heart of machine learning, and its c...

Deep Equals Shallow for ReLU Networks in Kernel Regimes
Deep networks are often considered to be more expressive than shallow on...

Non-parametric Models for Non-negative Functions
Linear models have shown great effectiveness and flexibility in many fie...

Consistent Structured Prediction with Max-Min Margin Markov Networks
Max-margin methods for binary classification such as the support vector ...

Dual-Free Stochastic Decentralized Optimization with Variance Reduction
We consider the problem of training machine learning models on distribut...

Structured and Localized Image Restoration
We present a novel approach to image restoration that leverages ideas fr...

Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model
In the context of statistical supervised learning, the noiseless linear ...

Principled Analyses and Design of First-Order Methods with Inexact Proximal Operators
Proximal operations are among the most common primitives appearing in bo...

ARIANN: Low-Interaction Privacy-Preserving Deep Learning via Function Secret Sharing
We propose ARIANN, a low-interaction framework to perform private traini...

An Optimal Algorithm for Decentralized Finite Sum Optimization
Modern large-scale finite-sum optimization relies on two key aspects: di...

Explicit Regularization of Stochastic Gradient Methods through Duality
We consider stochastic gradient methods under the interpolation regime w...

On the Convergence of Adam and Adagrad
We provide a simple proof of the convergence of the optimization algorit...

Structured Prediction with Partial Labelling through the Infimum Loss
Annotating datasets is one of the main costs in today's supervised lear...

Statistically Preconditioned Accelerated Gradient Method for Distributed Optimization
We consider the setting of distributed empirical risk minimization where...

Safe Screening for the Generalized Conditional Gradient Method
The conditional gradient method (CGM) has been widely used for fast spar...

Stochastic Optimization for Regularized Wasserstein Estimators
Optimal transport is a foundational problem in optimization that allows...

Learning with Differentiable Perturbed Optimizers
Machine learning pipelines often rely on optimization procedures to make...

Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss
Neural networks trained to minimize the logistic (a.k.a. cross-entropy) ...

On the Effectiveness of Richardson Extrapolation in Machine Learning
Richardson extrapolation is a classical technique from numerical analysi...

Music Source Separation in the Waveform Domain
Source separation for music is the task of isolating contributions, or s...

UniXGrad: A Universal, Adaptive Algorithm with Optimal Guarantees for Constrained Optimization
We propose a novel adaptive, accelerated algorithm for the stochastic co...

Demucs: Deep Extractor for Music Sources with extra unlabeled data remixed
We study the problem of source separation for music using deep learning ...

Towards closing the gap between the theory and practice of SVRG
Among the very first variance reduced stochastic methods for solving the...

Globally Convergent Newton Methods for Ill-conditioned Generalized Self-concordant Losses
In this paper, we study large-scale convex optimization algorithms based...

Max-Plus Matching Pursuit for Deterministic Markov Decision Processes
We consider deterministic Markov decision processes (MDPs) and apply max...

Fast Decomposable Submodular Function Minimization using Constrained Total Variation
We consider the problem of minimizing the sum of submodular set function...

An Accelerated Decentralized Stochastic Proximal Algorithm for Finite Sums
Modern large-scale finite-sum optimization relies on two key aspects: di...

Partially Encrypted Machine Learning using Functional Encryption
Machine learning on encrypted data has received a lot of attention thank...

Implicit Regularization of Discrete Gradient Dynamics in Deep Linear Neural Networks
When optimizing overparameterized models, such as deep neural networks,...

Unsupervised Image Matching and Object Discovery as Optimization
Learning with complete or partial supervision is powerful but relies on ...

Efficient Primal-Dual Algorithms for Large-Scale Multiclass Classification
We develop efficient algorithms to train ℓ_1-regularized linear classifi...

Beyond Least-Squares: Fast Rates for Regularized Empirical Risk Minimization through Self-Concordance
We consider learning methods based on the regularization of a convex emp...

A General Theory for Structured Prediction with Smooth Convex Surrogates
In this work we provide a theoretical framework for structured predictio...

A Universal Algorithm for Variational Inequalities Adaptive to Smoothness and Noise
We consider variational inequalities coming from monotone operators, a s...

Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions
We provide a novel computer-assisted technique for systematically analyz...

Asynchronous Accelerated Proximal Stochastic Gradient for Strongly Convex Distributed Finite Sums
In this work, we study the problem of minimizing the sum of strongly con...

Overcomplete Independent Component Analysis via SDP
We present a novel algorithm for overcomplete independent components ana...

A Note on Lazy Training in Supervised Differentiable Programming
In a series of recent theoretical works, it has been shown that strongly...

Massively scalable Sinkhorn distances via the Nyström method
The Sinkhorn distance, a variant of the Wasserstein distance with entrop...

Marginal Weighted Maximum Log-likelihood for Efficient Learning of Perturb-and-Map models
We consider the structured-output prediction problem through probabilist...

Approximating the Quadratic Transportation Metric in Near-Linear Time
Computing the quadratic transportation metric (also called the 2-Wassers...

SING: Symbol-to-Instrument Neural Generator
Recent progress in deep learning for audio synthesis opens the way to mo...