Model-Based Policy Gradients with Parameter-Based Exploration by Least-Squares Conditional Density Estimation

07/19/2013
by Syogo Mori, et al.

The goal of reinforcement learning (RL) is to let an agent learn an optimal control policy in an unknown environment so that expected future rewards are maximized. The model-free RL approach learns the policy directly from data samples. Although using many samples tends to improve the accuracy of policy learning, collecting a large number of samples is often expensive in practice. The model-based RL approach, on the other hand, first estimates the transition model of the environment and then learns the policy based on the estimated transition model. Thus, if the transition model can be learned accurately from a small amount of data, the model-based approach can perform better than the model-free approach. In this paper, we propose a novel model-based RL method that combines a recently proposed model-free policy search method called policy gradients with parameter-based exploration and a state-of-the-art transition model estimator called least-squares conditional density estimation. Through experiments, we demonstrate the practical usefulness of the proposed method.
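
The pipeline described in the abstract (estimate a transition model from a few samples, then run parameter-based policy search on rollouts from that model) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the 1-D toy environment, the linear-Gaussian model fitted by ordinary least squares (a simple stand-in, not least-squares conditional density estimation), the linear policy, the return normalization, and the step sizes scaled by tau squared.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D environment (an assumption): s' = s + a + noise.
def true_step(s, a):
    return s + a + 0.1 * rng.standard_normal()

# 1) Collect a small batch of transitions with random states and actions.
S = rng.uniform(-1.0, 1.0, size=200)
A = rng.uniform(-1.0, 1.0, size=200)
S_next = np.array([true_step(s, a) for s, a in zip(S, A)])

# 2) Fit a linear-Gaussian transition model by ordinary least squares.
#    This is a stand-in for a conditional density estimator, not LSCDE.
X = np.column_stack([S, A])
w, *_ = np.linalg.lstsq(X, S_next, rcond=None)
noise_std = np.std(S_next - X @ w)

def model_return(theta, horizon=5):
    """Return of the linear policy a = theta * s, rolled out in the learned model."""
    s, total = 1.0, 0.0
    for _ in range(horizon):
        a = theta * s
        s = w[0] * s + w[1] * a + noise_std * rng.standard_normal()
        total -= s ** 2  # reward: stay close to the origin
    return total

# 3) Parameter-based exploration: policy parameters are drawn from
#    N(eta, tau^2), and the hyper-parameters eta and tau are updated.
eta, tau = 0.0, 0.5
for _ in range(200):
    z = rng.standard_normal(50)
    thetas = eta + tau * z
    returns = np.array([model_return(th) for th in thetas])
    # Normalized returns act as a baseline-corrected, scale-free signal.
    adv = (returns - returns.mean()) / (returns.std() + 1e-8)
    # Log-density gradients are z / tau (for eta) and (z^2 - 1) / tau (for tau);
    # both are multiplied by tau^2 here, an assumed step-size choice.
    eta += 0.5 * tau * np.mean(adv * z)
    tau = max(tau * (1.0 + 0.1 * np.mean(adv * (z ** 2 - 1.0))), 1e-3)

print("learned model coefficients:", w)
print("policy mean eta:", eta, "exploration std tau:", tau)
```

In this toy problem the best linear gain is close to -1, since it cancels the current state in one step. All policy updates use only rollouts from the fitted model, so no environment samples are needed beyond the initial 200 transitions.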


research
04/26/2023

Reinforcement Learning with Partial Parametric Model Knowledge

We adapt reinforcement learning (RL) methods for continuous control to b...
research
11/18/2018

Policy Optimization with Model-based Explorations

Model-free reinforcement learning methods such as the Proximal Policy Op...
research
01/15/2022

Physical Derivatives: Computing policy gradients by physical forward-propagation

Model-free and model-based reinforcement learning are two ends of a spec...
research
04/09/2021

Learning to Reweight Imaginary Transitions for Model-Based Reinforcement Learning

Model-based reinforcement learning (RL) is more sample efficient than mo...
research
08/24/2023

Bayesian Exploration Networks

Bayesian reinforcement learning (RL) offers a principled and elegant app...
research
05/22/2019

COBRA: Data-Efficient Model-Based RL through Unsupervised Object Discovery and Curiosity-Driven Exploration

Data efficiency and robustness to task-irrelevant perturbations are long...
research
02/07/2021

Model-Augmented Q-learning

In recent years, Q-learning has become indispensable for model-free rein...
