
On the Convergence of Adam and Adagrad
We provide a simple proof of the convergence of the optimization algorit...

Music Source Separation in the Waveform Domain
Source separation for music is the task of isolating contributions, or s...

Symplectic Recurrent Neural Networks
We propose Symplectic Recurrent Neural Networks (SRNNs) as learning algo...

Demucs: Deep Extractor for Music Sources with extra unlabeled data remixed
We study the problem of source separation for music using deep learning ...

Invariant Risk Minimization
We introduce Invariant Risk Minimization (IRM), a learning paradigm to e...

Scaling Laws for the Principled Design, Initialization and Preconditioning of ReLU Networks
In this work, we describe a set of rules for the design and initializati...

Cold Case: The Lost MNIST Digits
Although the popular MNIST dataset [LeCun et al., 1994] is derived from ...

Controlling Covariate Shift using Equilibrium Normalization of Weights
We introduce a new normalization technique that exhibits the fast conver...

On the Ineffectiveness of Variance Reduced Optimization for Deep Learning
The application of stochastic variance reduction to optimization has sho...

SING: Symbol-to-Instrument Neural Generator
Recent progress in deep learning for audio synthesis opens the way to mo...

AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization
Adaptive gradient methods such as AdaGrad and its variants update the st...

WNGrad: Learn the Learning Rate in Gradient Descent
Adjusting the learning rate schedule in stochastic gradient methods is a...

Adversarial Vulnerability of Neural Networks Increases With Input Dimension
Over the past four years, neural networks have proven vulnerable to adve...

Geometrical Insights for Implicit Generative Modeling
Learning algorithms for implicit generative models can optimize a variet...

Diagonal Rescaling For Neural Networks
We define a second-order neural network stochastic gradient training alg...

Wasserstein GAN
We introduce a new algorithm named WGAN, an alternative to traditional G...

Towards Principled Methods for Training Generative Adversarial Networks
The goal of this paper is not to introduce a single algorithm or method,...

Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond
We look at the eigenvalues of the Hessian of a loss function before and ...

Optimization Methods for Large-Scale Machine Learning
This paper provides a review and commentary on the past, present, and fu...

Discovering Causal Signals in Images
This paper establishes the existence of observable footprints that revea...

Unifying distillation and privileged information
Distillation (Hinton et al., 2015) and privileged information (Vapnik & ...

No Regret Bound for Extreme Bandits
Algorithms for hyperparameter optimization abound, all of which work wel...

A Lower Bound for the Optimization of Finite Sums
This paper presents a lower bound for optimizing a finite sum of n funct...

ICE: Enabling Non-Experts to Build Models Interactively for Large-Scale Lopsided Problems
Quick interaction between a human teacher and a learning machine present...

Counterfactual Reasoning and Learning Systems
This work shows how to leverage causal inference to understand the behav...

From Machine Learning to Machine Reasoning
A plausible definition of "reasoning" could be "algebraically manipulati...
read it
Léon Bottou
I received the Diplôme d'Ingénieur from the École Polytechnique (X84) in 1987, the Master of Mathematics, Applied Mathematics and Computer Science from the École Normale Supérieure in 1988, and a PhD in computer science from the University of Paris-Sud in 1991. I then went to AT&T Bell Laboratories, AT&T Labs, NEC Labs America, and Microsoft Research. I joined Facebook AI Research in March 2015.