
Benchmarks for Deep Off-Policy Evaluation
Off-policy evaluation (OPE) holds the promise of being able to leverage ...

COMBO: Conservative Offline Model-Based Policy Optimization
Model-based algorithms, which learn a dynamics model from logged experie...

COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning
Reinforcement learning has been applied to a wide variety of robotics pr...

Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning
We identify an implicit under-parameterization phenomenon in value-based...

Conservative Safety Critics for Exploration
Safe exploration presents a major challenge in reinforcement learning (R...

One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL
While reinforcement learning algorithms can learn effective policies for...

OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning
Reinforcement learning (RL) has achieved impressive performance in a var...

Conservative Q-Learning for Offline Reinforcement Learning
Effectively leveraging large, previously collected datasets in reinforce...

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
In this tutorial article, we aim to provide the reader with the conceptu...

D4RL: Datasets for Deep Data-Driven Reinforcement Learning
The offline reinforcement learning (RL) problem, also referred to as bat...

Datasets for Data-Driven Reinforcement Learning
The offline reinforcement learning (RL) problem, also referred to as bat...

DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction
Deep reinforcement learning can learn effective policies for a wide rang...

Reward-Conditioned Policies
Reinforcement learning offers the promise of automating the acquisition ...

Model Inversion Networks for Model-Based Optimization
In this work, we aim to solve data-driven optimization problems, where t...

Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning
In this paper, we aim to develop a simple and scalable reinforcement lea...

Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
Off-policy reinforcement learning aims to leverage experience collected ...

Graph Normalizing Flows
We introduce graph normalizing flows: a new, reversible graph neural net...

Calibration of Encoder Decoder Models for Neural Machine Translation
We study the calibration of several state of the art neural machine tran...

Diagnosing Bottlenecks in Deep Q-learning Algorithms
Q-learning methods represent a commonly used class of algorithms in rein...
Aviral Kumar