
Q-Value Weighted Regression: Reinforcement Learning with Limited Data

by Piotr Kozakowski et al.

Sample efficiency and performance in the offline setting have emerged as significant challenges for deep reinforcement learning. We introduce Q-Value Weighted Regression (QWR), a simple RL algorithm that excels in both respects. QWR extends Advantage Weighted Regression (AWR), an off-policy actor-critic algorithm that performs very well on continuous control tasks, including in the offline setting, but has low sample efficiency and struggles with high-dimensional observation spaces. We analyze AWR to explain these shortcomings and use the resulting insights to motivate QWR. We show experimentally that QWR matches state-of-the-art algorithms on tasks with both continuous and discrete actions. In particular, QWR yields results on par with SAC on the MuJoCo suite and, with the same set of hyperparameters, on par with a highly tuned Rainbow implementation on a set of Atari games. We also verify that QWR performs well in the offline RL setting.
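To illustrate the family of methods the abstract describes: in AWR-style weighted regression, the actor is trained by regressing onto observed actions with weights that grow exponentially in the estimated advantage, so high-value actions are imitated more strongly. A minimal NumPy sketch of such an actor objective is below; the function name, the `beta` temperature, and the weight clipping are illustrative assumptions, not the paper's exact formulation (QWR's specific contribution is how the Q-values are used to form these weights).

```python
import numpy as np

def weighted_regression_actor_loss(log_probs, q_values, value_baseline,
                                   beta=1.0, weight_cap=20.0):
    """Sketch of an advantage/Q-value weighted regression actor objective.

    log_probs:      log pi(a_i | s_i) for actions sampled from the replay buffer
    q_values:       critic estimates Q(s_i, a_i)
    value_baseline: state-value estimates V(s_i)

    Each action's log-likelihood is weighted by exp(advantage / beta),
    so the regression preferentially imitates high-value actions.
    The weights are capped for numerical stability (a common trick,
    assumed here rather than taken from the paper).
    """
    advantages = q_values - value_baseline
    weights = np.minimum(np.exp(advantages / beta), weight_cap)
    return -np.mean(weights * log_probs)
```

The exponential weighting is what lets the update stay fully off-policy: it is a supervised regression on logged actions rather than a policy-gradient step, which is why AWR-style methods transfer naturally to the offline setting.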
