Model-Augmented Q-learning

02/07/2021
by Youngmin Oh, et al.

In recent years, Q-learning has become indispensable for model-free reinforcement learning (MFRL). However, it suffers from well-known problems such as under- and overestimation bias of the value, which may adversely affect policy learning. To resolve this issue, we propose an MFRL framework that is augmented with components of model-based RL. Specifically, we propose to estimate not only the Q-values but also the transition and the reward with a shared network. We further use the reward estimated by the model estimators for Q-learning, which promotes interaction between the estimators. We show that the proposed scheme, called Model-augmented Q-learning (MQL), obtains a policy-invariant solution that is identical to the solution obtained by learning with the true reward. Finally, we also provide a trick to prioritize past experiences in the replay buffer by using model-estimation errors. We experimentally validate MQL built upon state-of-the-art off-policy MFRL methods, and show that MQL substantially improves their performance and convergence. The proposed scheme is simple to implement and does not require additional training cost.
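
The two ideas in the abstract, building the Q-learning target from an estimated reward and prioritizing replay by model-estimation error, can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function name `mql_targets_and_priorities`, its arguments, the squared-error form of the model error, and the choice to give higher-error transitions higher sampling probability are all hypothetical. The shared network that would produce `q_next`, `r_hat`, and `s_next_hat` is omitted.

```python
import numpy as np

def mql_targets_and_priorities(q_next, r_hat, r_true, s_next_hat, s_next,
                               done, gamma=0.99, eps=1e-6):
    """Hypothetical helper (not from the paper).

    q_next:     (B, A) Q-value estimates at the next state
    r_hat:      (B,)   reward predicted by the model estimator
    r_true:     (B,)   reward observed in the environment
    s_next_hat: (B, D) next state predicted by the model estimator
    s_next:     (B, D) next state observed in the environment
    done:       (B,)   1.0 where the episode terminated, else 0.0
    """
    # TD target built from the estimated reward rather than the observed one.
    td_target = r_hat + gamma * (1.0 - done) * q_next.max(axis=1)

    # Assumed model-estimation error: squared reward error plus
    # squared next-state prediction error, per transition.
    model_err = (r_hat - r_true) ** 2 + ((s_next_hat - s_next) ** 2).sum(axis=1)

    # Assumed prioritization rule: sampling probability proportional to error.
    priorities = model_err + eps
    probs = priorities / priorities.sum()
    return td_target, probs
```

The returned `probs` could be passed as the `p` argument of `numpy.random.Generator.choice` to draw replay indices; how the actual method maps errors to priorities should be checked against the full text.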

research
01/09/2020

Population-Guided Parallel Policy Search for Reinforcement Learning

In this paper, a new population-guided parallel learning scheme is propo...
research
10/05/2021

Imaginary Hindsight Experience Replay: Curious Model-based Learning for Sparse Reward Tasks

Model-based reinforcement learning is a promising learning strategy for ...
research
11/28/2019

Augmented Random Search for Quadcopter Control: An alternative to Reinforcement Learning

Model-based reinforcement learning strategies are believed to exhibit mo...
research
07/19/2013

Model-Based Policy Gradients with Parameter-Based Exploration by Least-Squares Conditional Density Estimation

The goal of reinforcement learning (RL) is to let an agent learn an opti...
research
05/22/2023

TOM: Learning Policy-Aware Models for Model-Based Reinforcement Learning via Transition Occupancy Matching

Standard model-based reinforcement learning (MBRL) approaches fit a tran...
research
10/25/2021

Operator Augmentation for Model-based Policy Evaluation

In model-based reinforcement learning, the transition matrix and reward ...
research
05/09/2021

Improving Cost Learning for JPEG Steganography by Exploiting JPEG Domain Knowledge

Although significant progress in automatic learning of steganographic co...
