
Benchmarks for Deep Off-Policy Evaluation
Off-policy evaluation (OPE) holds the promise of being able to leverage ...
COMBO: Conservative Offline Model-Based Policy Optimization
Model-based algorithms, which learn a dynamics model from logged experie...
COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning
Reinforcement learning has been applied to a wide variety of robotics pr...
Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning
We identify an implicit under-parameterization phenomenon in value-based...
Conservative Safety Critics for Exploration
Safe exploration presents a major challenge in reinforcement learning (R...
One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL
While reinforcement learning algorithms can learn effective policies for...
OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning
Reinforcement learning (RL) has achieved impressive performance in a var...
Conservative Q-Learning for Offline Reinforcement Learning
Effectively leveraging large, previously collected datasets in reinforce...
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
In this tutorial article, we aim to provide the reader with the conceptu...
D4RL: Datasets for Deep Data-Driven Reinforcement Learning
The offline reinforcement learning (RL) problem, also referred to as bat...
Datasets for Data-Driven Reinforcement Learning
The offline reinforcement learning (RL) problem, also referred to as bat...
DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction
Deep reinforcement learning can learn effective policies for a wide rang...
Reward-Conditioned Policies
Reinforcement learning offers the promise of automating the acquisition ...
Model Inversion Networks for Model-Based Optimization
In this work, we aim to solve data-driven optimization problems, where t...
Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning
In this paper, we aim to develop a simple and scalable reinforcement lea...
Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
Off-policy reinforcement learning aims to leverage experience collected ...
Graph Normalizing Flows
We introduce graph normalizing flows: a new, reversible graph neural net...
Calibration of Encoder Decoder Models for Neural Machine Translation
We study the calibration of several state of the art neural machine tran...
Diagnosing Bottlenecks in Deep Q-learning Algorithms
Q-learning methods represent a commonly used class of algorithms in rein...
Aviral Kumar
