Critic Regularized Regression

06/26/2020
by Ziyu Wang, et al.

Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction. It addresses challenges with regard to the cost of data collection and safety, both of which are particularly pertinent to real-world applications of RL. Unfortunately, most off-policy algorithms perform poorly when learning from a fixed dataset. In this paper, we propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR). We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces – outperforming several state-of-the-art offline RL algorithms by a significant margin on a wide range of benchmark tasks.
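The abstract does not spell out the objective, so the following is only a minimal sketch of one way to read "critic-regularized regression": a behavior-cloning (regression) loss on dataset actions, where each transition is weighted by how favorably a learned critic rates it. The function names, the two weighting modes, the temperature `beta`, and the clipping value `max_weight` are illustrative assumptions, not details taken from the text above.

```python
import numpy as np


def crr_weights(q_data, q_policy_samples, mode="binary", beta=1.0, max_weight=20.0):
    """Per-transition weights derived from a critic (hypothetical helper).

    q_data:            shape (batch,), critic values Q(s, a) for dataset actions.
    q_policy_samples:  shape (batch, m), critic values Q(s, a_j) for m actions
                       a_j sampled from the current policy at the same states.
    """
    q_data = np.asarray(q_data, dtype=float)
    q_policy_samples = np.asarray(q_policy_samples, dtype=float)
    # Advantage estimate: dataset action value minus the mean value of policy actions.
    advantage = q_data - q_policy_samples.mean(axis=1)
    if mode == "binary":
        # Keep only transitions the critic rates above the policy's average.
        return (advantage > 0).astype(float)
    if mode == "exp":
        # Soft weighting; clipped for numerical stability (assumed value).
        return np.minimum(np.exp(advantage / beta), max_weight)
    raise ValueError(f"unknown mode: {mode!r}")


def crr_policy_loss(log_prob_data, weights):
    """Weighted negative log-likelihood of dataset actions under the policy."""
    log_prob_data = np.asarray(log_prob_data, dtype=float)
    return float(-np.mean(weights * log_prob_data))
```

In this sketch the critic acts as a filter on the regression target: with `mode="binary"` the policy imitates only dataset actions that the critic scores above the policy's own average, which keeps the learned policy close to actions actually present in the fixed dataset.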

12/26/2020

POPO: Pessimistic Offline Policy Optimization

Offline reinforcement learning (RL), also known as batch RL, aims to opt...
11/08/2021

Understanding the Effects of Dataset Characteristics on Offline Reinforcement Learning

In real world, affecting the environment by a weak policy can be expensi...
11/26/2019

Behavior Regularized Offline Reinforcement Learning

In reinforcement learning (RL) research, it is common to assume access t...
05/14/2022

QHD: A brain-inspired hyperdimensional reinforcement learning algorithm

Reinforcement Learning (RL) has opened up new opportunities to solve a w...
02/23/2021

MUSBO: Model-based Uncertainty Regularized and Sample Efficient Batch Optimization for Deployment Constrained Reinforcement Learning

In many contemporary applications such as healthcare, finance, robotics,...
05/12/2021

Interpretable performance analysis towards offline reinforcement learning: A dataset perspective

Offline reinforcement learning (RL) has increasingly become the focus of...
06/05/2020

Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization

Most reinforcement learning (RL) algorithms assume online access to the ...

Code Repositories