
Off-Policy Evaluation via the Regularized Lagrangian
The recently proposed distribution correction estimation (DICE) family o...

RL Unplugged: Benchmarks for Offline Reinforcement Learning
Offline methods for reinforcement learning have the potential to help br...

Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization
Most reinforcement learning (RL) algorithms assume online access to the ...

D4RL: Datasets for Deep Data-Driven Reinforcement Learning
The offline reinforcement learning (RL) problem, also referred to as bat...

Datasets for Data-Driven Reinforcement Learning
The offline reinforcement learning (RL) problem, also referred to as bat...

BRPO: Batch Residual Policy Optimization
In batch reinforcement learning (RL), one often constrains a learned pol...

Reinforcement Learning via Fenchel-Rockafellar Duality
We review basic concepts of convex duality, focusing on the very general...

Imitation Learning via Off-Policy Distribution Matching
When performing imitation learning from expert demonstrations, distribut...

AlgaeDICE: Policy Gradient from Arbitrary Experience
In many real-world applications of reinforcement learning (RL), interact...

Behavior Regularized Offline Reinforcement Learning
In reinforcement learning (RL) research, it is common to assume access t...

Group-based Fair Learning Leads to Counter-intuitive Predictions
A number of machine learning (ML) methods have been proposed recently to...

Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning?
Hierarchical reinforcement learning has demonstrated significant success...

Multi-Agent Manipulation via Locomotion using Hierarchical Sim2Real
Manipulation and locomotion are closely related problems that are often ...

DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
In many real-world reinforcement learning applications, access to the en...

DeepMDP: Learning Continuous Latent Space Models for Representation Learning
Many reinforcement learning (RL) tasks provide the agent with high-dimen...

Lyapunov-based Safe Policy Optimization for Continuous Control
We study continuous action reinforcement learning problems in which it i...

Identifying and Correcting Label Bias in Machine Learning
Datasets often contain biases which unfairly disadvantage certain groups...

The Laplacian in RL: Learning Representations with Efficient Approximations
The smallest eigenvectors of the graph Laplacian are well-known to provi...

Near-Optimal Representation Learning for Hierarchical Reinforcement Learning
We study the problem of representation learning in goal-conditioned hier...

Data-Efficient Hierarchical Reinforcement Learning
Hierarchical reinforcement learning (HRL) is a promising approach to ext...

A Lyapunov-based Approach to Safe Reinforcement Learning
In many real-world reinforcement learning (RL) problems, besides optimiz...

Smoothed Action Value Functions for Learning Gaussian Policies
State-action value functions (i.e., Q-values) are ubiquitous in reinforc...

Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods
In this paper, we explore deep reinforcement learning algorithms for vis...

Path Consistency Learning in Tsallis Entropy Regularized MDPs
We study the sparse entropy-regularized reinforcement learning (ERL) pro...

MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks
We present MorphNet, an approach to automate the design of neural networ...

Trust-PCL: An Off-Policy Trust Region Method for Continuous Control
Trust region methods, such as TRPO, are often used to stabilize policy o...

Bridging the Gap Between Value and Policy Based Reinforcement Learning
We establish a new connection between value and policy based reinforceme...

Improving Policy Gradient by Exploring Underappreciated Rewards
This paper presents a novel form of policy gradient for model-free reinf...
Ofir Nachum