Q-Value Weighted Regression: Reinforcement Learning with Limited Data

02/12/2021
by   Piotr Kozakowski, et al.
41

Sample efficiency and performance in the offline setting have emerged as significant challenges of deep reinforcement learning. We introduce Q-Value Weighted Regression (QWR), a simple RL algorithm that excels in these aspects. QWR is an extension of Advantage Weighted Regression (AWR), an off-policy actor-critic algorithm that performs very well on continuous control tasks, also in the offline setting, but has low sample efficiency and struggles with high-dimensional observation spaces. We perform an analysis of AWR that explains its shortcomings and use these insights to motivate QWR. We show experimentally that QWR matches the state-of-the-art algorithms both on tasks with continuous and discrete actions. In particular, QWR yields results on par with SAC on the MuJoCo suite and - with the same set of hyperparameters - yields results on par with a highly tuned Rainbow implementation on a set of Atari games. We also verify that QWR performs well in the offline RL setting.

READ FULL TEXT

page 3

page 8

research
05/17/2021

Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning

Offline Reinforcement Learning promises to learn effective policies from...
research
10/01/2019

Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning

In this paper, we aim to develop a simple and scalable reinforcement lea...
research
10/14/2021

Offline Reinforcement Learning with Soft Behavior Regularization

Most prior approaches to offline reinforcement learning (RL) utilize beh...
research
10/05/2021

You Only Evaluate Once: a Simple Baseline Algorithm for Offline RL

The goal of offline reinforcement learning (RL) is to find an optimal po...
research
10/23/2018

Efficient Eligibility Traces for Deep Reinforcement Learning

Eligibility traces are an effective technique to accelerate reinforcemen...
research
01/05/2023

Extreme Q-Learning: MaxEnt RL without Entropy

Modern Deep Reinforcement Learning (RL) algorithms require estimates of ...
research
09/16/2019

Deep Reinforcement Learning for Task-driven Discovery of Incomplete Networks

Complex networks are often either too large for full exploration, partia...

Please sign up or login with your details

Forgot password? Click here to reset