
Implicit Regularization in Deep Learning: A View from Function Space
We approach the problem of implicit regularization in deep learning from...
Stochastic Hamiltonian Gradient Methods for Smooth Games
The success of adversarial formulations in machine learning has brought ...
Differentiable Causal Discovery from Interventional Data
Discovering causal relationships in data is a challenging task that invo...
Adversarial Example Games
The existence of adversarial examples capable of fooling trained neural ...
Adaptive Gradient Methods Converge Faster with Over-Parameterization (and you can do a line-search)
As adaptive gradient methods are typically used for training over-parame...
An Analysis of the Adaptation Speed of Causal Models
We consider the problem of discovering the causal process that generated...
Stochastic Polyak Stepsize for SGD: An Adaptive Learning Rate for Fast Convergence
We propose a stochastic variant of the classical Polyak stepsize (Polya...
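As a rough illustration of a Polyak-type stochastic stepsize, the sketch below runs SGD on a least-squares problem that interpolates the data, so every per-sample minimum f_i^* is zero. The stepsize form min{(f_i(x) - f_i^*) / (c * ||grad f_i(x)||^2), gamma_max}, the constants c = 0.5 and gamma_max = 10, and the helper name sps_sgd are assumptions made for this example, not details taken from the abstract.

```python
import numpy as np

def sps_sgd(A, b, steps=2000, c=0.5, gamma_max=10.0, seed=0):
    """SGD on f(x) = (1/n) sum_i 0.5*(a_i @ x - b_i)**2 with a stochastic
    Polyak-type stepsize (hypothetical helper). Assumes interpolation, i.e.
    each sample loss f_i can reach 0, so f_i^* = 0 is used below."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(steps):
        i = rng.integers(n)
        r = A[i] @ x - b[i]          # residual of sample i
        f_i = 0.5 * r ** 2           # sample loss (its minimum is 0)
        g = r * A[i]                 # gradient of the sample loss
        g_sq = g @ g
        if g_sq == 0.0:
            continue                 # sample already fit exactly
        # Polyak-type step: (f_i - f_i^*) / (c * ||g||^2), capped at gamma_max
        gamma = min(f_i / (c * g_sq), gamma_max)
        x -= gamma * g
    return x
```

With c = 0.5 the step on a squared residual reduces to 1/||a_i||^2, i.e. an exact projection onto the hyperplane a_i @ x = b_i (as in randomized Kaczmarz), unless the cap gamma_max binds.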
Accelerating Smooth Games by Manipulating Spectral Shapes
We use matrix iteration theory to characterize acceleration in smooth ga...
Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation
We consider stochastic second order methods for minimizing strongly-conv...
GEAR: Geometry-Aware Rényi Information
Shannon's seminal theory of information has been of paramount importance...
A Tight and Unified Analysis of Extragradient for a Whole Spectrum of Differentiable Games
We consider differentiable games: multi-objective minimization problems,...
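To make the extragradient update concrete, the sketch below compares plain simultaneous gradient descent-ascent with extragradient on the bilinear game min_x max_y x*y. This toy game and the step size eta = 0.1 are illustrative assumptions, not taken from the abstract. Extragradient first takes an extrapolation step, then updates from the original point using the vector field evaluated at the extrapolated point.

```python
import numpy as np

def vector_field(z):
    """Game vector field for min_x max_y x*y: (df/dx, -df/dy) = (y, -x)."""
    x, y = z
    return np.array([y, -x])

def gda(z0, eta=0.1, steps=1000):
    """Simultaneous gradient descent-ascent."""
    z = np.asarray(z0, dtype=float)
    for _ in range(steps):
        z = z - eta * vector_field(z)
    return z

def extragradient(z0, eta=0.1, steps=1000):
    """Extragradient: extrapolate, then update from the original point."""
    z = np.asarray(z0, dtype=float)
    for _ in range(steps):
        z_half = z - eta * vector_field(z)   # extrapolation step
        z = z - eta * vector_field(z_half)   # update uses the extrapolated field
    return z
```

For this game each gradient descent-ascent step multiplies the distance to the equilibrium (0, 0) by sqrt(1 + eta^2) > 1, while each extragradient step multiplies it by sqrt(1 - eta^2 + eta^4), which is below 1 whenever 0 < eta < 1.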
A Closer Look at the Optimization Landscapes of Generative Adversarial Networks
Generative adversarial networks have been very successful in generative ...
Gradient-Based Neural DAG Learning
We propose a novel score-based approach to learning a directed acyclic g...
Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates
Recent works have shown that stochastic gradient descent (SGD) achieves ...
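Line-search for SGD can be illustrated with a backtracking Armijo condition checked on the sampled loss f_i. The sketch below assumes the interpolation setting (a consistent least-squares system), resets the trial step to eta_max at every iteration, and uses illustrative constants (c = 0.5, halving factor 0.5, a small eta_min floor). These choices and the helper name sgd_armijo are assumptions, not necessarily the paper's exact procedure.

```python
import numpy as np

def sgd_armijo(A, b, steps=2000, eta_max=10.0, c=0.5, shrink=0.5,
               eta_min=1e-8, seed=0):
    """SGD on f(x) = (1/n) sum_i 0.5*(a_i @ x - b_i)**2, choosing each step
    by backtracking until the Armijo condition holds on the sampled f_i
    (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)

    def f_i(z, i):
        return 0.5 * (A[i] @ z - b[i]) ** 2

    for _ in range(steps):
        i = rng.integers(n)
        g = (A[i] @ x - b[i]) * A[i]        # gradient of f_i at x
        g_sq = g @ g
        if g_sq == 0.0:
            continue                        # sample already fit exactly
        eta = eta_max                       # reset the trial step each iteration
        fx = f_i(x, i)
        # Backtrack until f_i(x - eta*g) <= f_i(x) - c*eta*||g||^2; the eta_min
        # floor stops endless halving once residuals reach rounding level.
        while eta > eta_min and f_i(x - eta * g, i) > fx - c * eta * g_sq:
            eta *= shrink
        x = x - eta * g
    return x
```

On a squared residual with c = 0.5, the Armijo test accepts exactly the steps eta <= 1/||a_i||^2, so backtracking returns a step between half that value and that value, provided eta_max exceeds it.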
Implicit Regularization of Discrete Gradient Dynamics in Deep Linear Neural Networks
When optimizing over-parameterized models, such as deep neural networks,...
Reducing Noise in GAN Training with Variance Reduced Extragradient
Using large minibatches when training generative adversarial networks (...
Centroid Networks for Few-Shot Clustering and Unsupervised Few-Shot Classification
Traditional clustering algorithms such as K-means rely heavily on the na...
Predicting Tactical Solutions to Operational Planning Problems under Imperfect Information
This paper offers a methodological contribution at the intersection of m...
Quantifying Learning Guarantees for Convex but Inconsistent Surrogates
We study consistency properties of machine learning methods based on min...
A Modern Take on the Bias-Variance Tradeoff in Neural Networks
We revisit the bias-variance tradeoff for neural networks in light of mo...
Scattering Networks for Hybrid Representation Learning
Scattering networks are a class of designed Convolutional Neural Network...
Predicting Solution Summaries to Integer Linear Programs under Imperfect Information with Machine Learning
The paper provides a methodological contribution at the intersection of ...
Negative Momentum for Improved Game Dynamics
Games generalize the optimization paradigm by introducing different obje...
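The effect of negative momentum can be reproduced on the bilinear game min_x max_y x*y. The sketch below assumes alternating updates (x first, then y using the new x) with heavy-ball momentum beta. The game, the step size eta = 0.5, the momentum values and the helper name alternating_momentum are illustrative choices, not settings from the paper.

```python
import numpy as np

def alternating_momentum(beta, eta=0.5, steps=2000, x0=1.0, y0=1.0):
    """Alternating gradient descent-ascent with heavy-ball momentum on
    f(x, y) = x * y (x minimizes, y maximizes); returns the final (x, y).
    Hypothetical helper for illustration."""
    x_prev, x = x0, x0               # zero initial momentum
    y_prev, y = y0, y0
    for _ in range(steps):
        # x player: descent on df/dx = y, plus momentum term
        x_next = x - eta * y + beta * (x - x_prev)
        # y player: ascent on df/dy = x, evaluated at the new x
        y_next = y + eta * x_next + beta * (y - y_prev)
        x_prev, x = x, x_next
        y_prev, y = y, y_next
    return np.array([x, y])
```

With beta = 0 the alternating updates conserve the quantity x^2 + y^2 - eta*x*y, so the iterates circle the equilibrium without ever approaching (0, 0); with a small negative beta such as -0.1 they spiral in.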
Frank-Wolfe Splitting via Augmented Lagrangian Method
Minimizing a function over an intersection of convex sets is an importan...
A Variational Inequality Perspective on Generative Adversarial Nets
Stability has been a recurrent issue in training generative adversarial ...
A3T: Adversarially Augmented Adversarial Training
Recent research showed that deep neural networks are highly sensitive to...
Improved asynchronous parallel optimization analysis for stochastic incremental methods
As datasets continue to increase in size and multicore computer archite...
Adaptive Stochastic Dual Coordinate Ascent for Conditional Random Fields
This work investigates training Conditional Random Fields (CRF) by Stoch...
Parametric Adversarial Divergences are Good Task Losses for Generative Modeling
Generative modeling of high dimensional data like images is a notoriousl...
Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization
Due to their simplicity and excellent performance, parallel asynchronous...
A Closer Look at Memorization in Deep Networks
We examine the role of memorization in deep learning, drawing connection...
SEARNN: Training RNNs with Global-Local Losses
We propose SEARNN, a novel training algorithm for recurrent neural netwo...
On Structured Prediction Theory with Calibrated Convex Surrogate Losses
We provide novel theoretical insights on structured prediction in the co...
Joint Discovery of Object States and Manipulation Actions
Many human activities involve object manipulations aiming to modify the ...
Frank-Wolfe Algorithms for Saddle Point Problems
We extend the Frank-Wolfe (FW) optimization algorithm to solve constrain...
Convergence Rate of Frank-Wolfe for Non-Convex Objectives
We give a simple proof that the Frank-Wolfe algorithm obtains a stationa...
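The quantity behind stationarity guarantees for Frank-Wolfe is the FW gap g(x) = max over s in D of <grad f(x), x - s>, which comes for free from the linear minimization step. The sketch below runs Frank-Wolfe with the classical step size 2/(t+2) on an illustrative convex quadratic over the probability simplex and records the gap. The objective, the domain and the helper name frank_wolfe_simplex are assumptions, and the paper's non-convex analysis is not reproduced here.

```python
import numpy as np

def frank_wolfe_simplex(p, iters=2000):
    """Minimize f(x) = ||x - p||^2 over the probability simplex with
    Frank-Wolfe; return the final iterate and the list of FW gaps.
    Hypothetical helper for illustration."""
    d = p.shape[0]
    x = np.zeros(d)
    x[0] = 1.0                        # start at a vertex of the simplex
    gaps = []
    for t in range(iters):
        grad = 2.0 * (x - p)
        # Linear minimization oracle over the simplex: the vertex e_i
        # with the smallest gradient coordinate.
        s = np.zeros(d)
        s[np.argmin(grad)] = 1.0
        # FW gap: nonnegative, and zero exactly at a stationary point
        gaps.append(grad @ (x - s))
        gamma = 2.0 / (t + 2.0)
        x = (1.0 - gamma) * x + gamma * s   # convex combination stays feasible
    return x, gaps
```

Usage: with p = np.array([0.2, 0.3, 0.5]), the returned iterate approaches p and the smallest recorded gap shrinks roughly like 1/t.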
ASAGA: Asynchronous Parallel SAGA
We describe ASAGA, an asynchronous parallel version of the incremental g...
Minding the Gaps for Block Frank-Wolfe Optimization of Structured SVMs
In this paper, we propose several improvements on the block-coordinate F...
PAC-Bayesian Theory Meets Bayesian Inference
We exhibit a strong link between frequentist PAC-Bayesian risk bounds an...
Beyond CCA: Moment Matching for Multi-View Models
We introduce three novel semiparametric extensions of probabilistic can...
On the Global Linear Convergence of Frank-Wolfe Optimization Variants
The Frank-Wolfe (FW) optimization algorithm has lately regained popular...
Barrier Frank-Wolfe for Marginal Inference
We introduce a globally-convergent algorithm for optimizing the tree-rew...
Rethinking LDA: moment matching for discrete ICA
We consider moment matching techniques for estimation in Latent Dirichle...
Unsupervised Learning from Narrated Instruction Videos
We address the problem of automatically learning the main steps to compl...
Variance Reduced Stochastic Gradient Descent with Neighbors
Stochastic Gradient Descent (SGD) is a workhorse in machine learning, ye...
Sequential Kernel Herding: Frank-Wolfe Optimization for Particle Filtering
Recently, the Frank-Wolfe optimization algorithm was suggested as a proc...
On Pairwise Costs for Network Flow Multi-Object Tracking
Multi-object tracking has been recently approached with the min-cost net...
SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives
In this work we introduce a new optimisation method called SAGA in the s...
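A minimal sketch of the SAGA update on a smooth least-squares objective follows. It keeps a table holding the most recent gradient of every f_i and steps along grad f_j(x) - table[j] + mean(table), an unbiased estimate of the full gradient. The proximal step SAGA uses for composite objectives is omitted, the step size 1/(3L) with L = max_i ||a_i||^2 is an assumption for this example, and saga_least_squares is a hypothetical helper name.

```python
import numpy as np

def saga_least_squares(A, b, steps=10000, seed=0):
    """SAGA on f(x) = (1/n) sum_i 0.5*(a_i @ x - b_i)**2 (smooth case only,
    no proximal step). Hypothetical helper for illustration."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    L = np.max(np.sum(A ** 2, axis=1))    # largest smoothness constant among f_i
    gamma = 1.0 / (3.0 * L)               # assumed step size
    x = np.zeros(d)
    table = (A @ x - b)[:, None] * A      # stored gradient of each f_i at x = 0
    avg = table.mean(axis=0)              # running average of the table
    for _ in range(steps):
        j = rng.integers(n)
        g_new = (A[j] @ x - b[j]) * A[j]  # fresh gradient of f_j
        # Unbiased variance-reduced direction: g_new - old g_j + average
        x -= gamma * (g_new - table[j] + avg)
        avg += (g_new - table[j]) / n     # update the average before overwriting
        table[j] = g_new
    return x
```

Keeping the running average up to date costs O(d) per iteration, so the full table never needs to be re-summed.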
A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method
In this note, we present a new averaging technique for the projected sto...
Block-Coordinate Frank-Wolfe Optimization for Structural SVMs
We propose a randomized block-coordinate variant of the classic Frank-Wo...
