
Critic Regularized Regression

by Ziyu Wang et al.

Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction. It addresses challenges related to the cost of data collection and to safety, both of which are particularly pertinent to real-world applications of RL. Unfortunately, most off-policy algorithms perform poorly when learning from a fixed dataset. In this paper, we propose a novel offline RL algorithm that learns policies from data using a form of critic-regularized regression (CRR). We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces, outperforming several state-of-the-art offline RL algorithms by a significant margin on a wide range of benchmark tasks.
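At its core, CRR trains the policy by behavioral cloning weighted by a function of the critic's advantage estimate, so that only actions the critic judges favorably contribute to the regression. The sketch below illustrates the two weighting variants from the paper (binary indicator and clipped exponential) in a simplified discrete-action setting with NumPy; the function name, the mean-over-actions baseline, and the clip value are illustrative assumptions, and the paper itself estimates the baseline by sampling actions from the policy.

```python
import numpy as np

def crr_weights(q_values, action_idx, mode="exp", beta=1.0, clip=20.0):
    """Regression weights for dataset actions, sketching CRR's two variants.

    q_values:   (batch, n_actions) critic estimates Q(s, a) -- a
                discrete-action simplification for illustration.
    action_idx: (batch,) index of the action taken in the dataset.
    """
    v = q_values.mean(axis=1)                                   # baseline V(s)
    adv = q_values[np.arange(len(action_idx)), action_idx] - v  # advantage A(s, a)
    if mode == "binary":
        return (adv > 0).astype(float)           # keep only improving actions
    return np.minimum(np.exp(adv / beta), clip)  # exponential weight, clipped

# The cloning loss is then weighted per sample:
#   loss = -(weights * log_pi_of_dataset_actions).mean()
```

With the binary variant, CRR reduces to cloning only the actions whose estimated advantage is positive; the exponential variant instead up-weights high-advantage actions smoothly, with clipping guarding against exploding weights when the critic is overconfident.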




