
Getting to the Point. Index Sets and Parallelism-Preserving Autodiff for Pointful Array Programming
We present a novel programming language design that attempts to combine ...

Complex Momentum for Learning in Games
We generalize gradient descent with momentum for learning in differentia...

Infinitely Deep Bayesian Neural Networks with Stochastic Differential Equations
We perform scalable approximate inference in a recently-proposed family ...

Oops I Took A Gradient: Scalable Sampling for Discrete Distributions
We propose a general and scalable approximate sampling strategy for prob...

Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering
Standard first-order stochastic optimization algorithms base their updat...

Teaching with Commentaries
Effective training of deep neural networks can be challenging, and there...

No MCMC for me: Amortized sampling for fast and stable training of energy-based models
Energy-Based Models (EBMs) present a flexible and appealing way to repre...

A Study of Gradient Variance in Deep Learning
The impact of gradient noise on training deep models is widely acknowled...

Learning Differential Equations that are Easy to Solve
Differential equations parameterized by neural networks become expensive...

SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models
Standard variational lower bounds used to train latent variable models p...

What went wrong and when? Instance-wise Feature Importance for Time-series Models
Multivariate time series models are poised to be used for decision suppo...

Cutting out the Middle-Man: Training and Evaluating Energy-Based Models without Sampling
We present a new method for evaluating and training unnormalized density...

Scalable Gradients for Stochastic Differential Equations
The adjoint sensitivity method scalably computes gradients of solutions ...

Neural Networks with Cheap Differential Operators
Gradients of neural networks can be computed efficiently for any archite...

Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One
We propose to reinterpret a standard discriminative classifier of p(y|x)...

Optimizing Millions of Hyperparameters by Implicit Differentiation
We propose an algorithm for inexpensive gradient-based hyperparameter op...

Efficient Graph Generation with Graph Recurrent Attention Networks
We propose a new family of efficient and expressive deep generative mode...

Understanding Undesirable Word Embedding Associations
Word embeddings are often criticized for capturing undesirable word asso...

Latent ODEs for Irregularly-Sampled Time Series
Time series with non-uniform intervals occur in many applications, and a...

Residual Flows for Invertible Generative Modeling
Flow-based generative models parameterize probability distributions thro...

Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions
Hyperparameter optimization can be formulated as a bilevel optimization ...

Invertible Residual Networks
Reversible deep networks provide useful theoretical guarantees and have ...

Towards Understanding Linear Word Analogies
A surprising property of word vectors is that vector algebra can often b...

FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models
A promising class of generative models maps points from a simple distrib...

Stochastic Combinatorial Ensembles for Defending Against Adversarial Examples
Many deep learning algorithms can be easily fooled with simple adversari...

Explaining Image Classifiers by Adaptive Dropout and Generative Infilling
Explanations of black-box classifiers often rely on saliency maps, which...

Scalable Recommender Systems through Recursive Evidence Chains
Recommender systems can be formulated as a matrix completion problem, pr...

Neural Ordinary Differential Equations
We introduce a new family of deep neural network models. Instead of spec...

Stochastic Hyperparameter Optimization through Hypernetworks
Machine learning models are often tuned by nesting optimization of model...

Isolating Sources of Disentanglement in Variational Autoencoders
We decompose the evidence lower bound to show the existence of a term me...

Inference Suboptimality in Variational Autoencoders
Amortized inference has led to efficient approximate inference for large...

Generating and designing DNA with deep generative models
We propose generative neural network methods to generate DNA sequences a...

Noisy Natural Gradient as Variational Inference
Combining the flexibility of deep learning with Bayesian uncertainty est...

Reinterpreting Importance-Weighted Autoencoders
The standard interpretation of importance-weighted autoencoders is that ...

Sticking the Landing: Simple, Lower-Variance Gradient Estimators for Variational Inference
We propose a simple and general variant of the standard reparameterized ...

Neural networks for the prediction of organic chemistry reactions
Reaction prediction remains one of the major challenges for organic chem...

Composing graphical models with neural networks for structured representations and fast inference
We propose a general modeling and inference framework that composes prob...

Convolutional Networks on Graphs for Learning Molecular Fingerprints
We introduce a convolutional neural network that operates directly on gr...

Early Stopping is Nonparametric Variational Inference
We show that unconverged stochastic gradient descent can be interpreted ...

Gradient-based Hyperparameter Optimization through Reversible Learning
Tuning hyperparameters of learning algorithms is hard because gradients ...

Warped Mixtures for Nonparametric Cluster Shapes
A mixture of Gaussians fit to a single curved or heavy-tailed cluster wi...

Optimally-Weighted Herding is Bayesian Quadrature
Herding and kernel herding are deterministic methods of choosing samples...

Avoiding pathologies in very deep networks
Choosing appropriate architectures and regularization strategies for dee...

Automatic Construction and Natural-Language Description of Nonparametric Regression Models
This paper presents the beginnings of an automatic statistician, focusin...

Structure Discovery in Nonparametric Regression through Compositional Kernel Search
Despite its importance, choosing the structural form of the kernel in no...

Additive Gaussian Processes
We introduce a Gaussian process model of functions which are additive. A...