
Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization
Standard dynamics models for continuous control make use of feedforward ...

Benchmarks for Deep Off-Policy Evaluation
Off-policy evaluation (OPE) holds the promise of being able to leverage ...

Offline Policy Selection under Uncertainty
The presence of uncertainty in policy evaluation significantly complicat...

RL Unplugged: Benchmarks for Offline Reinforcement Learning
Offline methods for reinforcement learning have the potential to help br...

DisARM: An Antithetic Gradient Estimator for Binary Latent Variables
Training models with discrete latent variables is challenging due to the...

Conservative Q-Learning for Offline Reinforcement Learning
Effectively leveraging large, previously collected datasets in reinforce...

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
In this tutorial article, we aim to provide the reader with the conceptu...

D4RL: Datasets for Deep Data-Driven Reinforcement Learning
The offline reinforcement learning (RL) problem, also referred to as bat...

Datasets for Data-Driven Reinforcement Learning
The offline reinforcement learning (RL) problem, also referred to as bat...

Meta-Learning without Memorization
The ability to learn new concepts with small amounts of data is a critic...

Behavior Regularized Offline Reinforcement Learning
In reinforcement learning (RL) research, it is common to assume access t...

Don't Blame the ELBO! A Linear VAE Perspective on Posterior Collapse
Posterior collapse in Variational Autoencoders (VAEs) arises when the va...

Energy-Inspired Models: Learning with Sampler-Induced Distributions
Energy-based models (EBMs) are powerful probabilistic models, but suffer...

Reinforcement Learning Driven Heuristic Optimization
Heuristic algorithms such as simulated annealing, Concorde, and METIS ar...

Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
Off-policy reinforcement learning aims to leverage experience collected ...

On Variational Bounds of Mutual Information
Estimating and optimizing Mutual Information (MI) is core to many proble...

Model-Based Reinforcement Learning for Atari
Model-free reinforcement learning (RL) can be used to learn effective po...

Learning to Walk via Deep Reinforcement Learning
Deep reinforcement learning suggests the promise of fully automated lear...

Soft Actor-Critic Algorithms and Applications
Model-free deep reinforcement learning (RL) algorithms have been success...

The Laplacian in RL: Learning Representations with Efficient Approximations
The smallest eigenvectors of the graph Laplacian are well-known to provi...

Doubly Reparameterized Gradient Estimators for Monte Carlo Objectives
Deep latent variable models have become a popular model choice due to th...

Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion
Integrating model-free and model-based approaches in reinforcement learn...

Guided evolutionary strategies: escaping the curse of dimensionality in random search
Many applications in machine learning require optimizing a function whos...

Smoothed Action Value Functions for Learning Gaussian Policies
State-action value functions (i.e., Q-values) are ubiquitous in reinforc...

The Mirage of Action-Dependent Baselines in Reinforcement Learning
Policy gradient methods are a widely used class of model-free reinforcem...

Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling
Recent advances in deep reinforcement learning have made significant str...

An online sequence-to-sequence model for noisy speech recognition
Generative models have long been the dominant approach for speech recogn...

Filtering Variational Objectives
When used as a surrogate objective for maximum likelihood estimation in ...

Learning Hard Alignments with Variational Inference
There has recently been significant interest in hard attention models fo...

Max-Pooling Loss Training of Long Short-Term Memory Networks for Small-Footprint Keyword Spotting
We propose a max-pooling based loss function for training Long Short-Ter...

REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models
Learning in models with discrete latent variables is challenging due to ...

Regularizing Neural Networks by Penalizing Confident Output Distributions
We systematically explore regularizing neural networks by penalizing low...

Compacting Neural Network Classifiers via Dropout Training
We introduce dropout compaction, a novel method for training feedforwar...