Learning to Reweight Imaginary Transitions for Model-Based Reinforcement Learning

04/09/2021
by Wenzhen Huang, et al.

Model-based reinforcement learning (RL) is more sample efficient than model-free RL because it uses imaginary trajectories generated by a learned dynamics model. When the model is inaccurate or biased, however, these trajectories can harm the training of the action-value and policy functions. To alleviate this problem, this paper proposes to adaptively reweight the imaginary transitions so as to reduce the negative effects of poorly generated trajectories. More specifically, we evaluate the effect of an imaginary transition by calculating the change in the loss computed on real samples when the transition is used to train the action-value and policy functions. Based on this evaluation criterion, we reweight each imaginary transition with a meta-gradient algorithm. Extensive experimental results demonstrate that our method outperforms state-of-the-art model-based and model-free RL algorithms on multiple tasks. Visualization of the changing weights further validates the necessity of the reweighting scheme.
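The evaluation criterion in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear value function, a squared-error loss, a single gradient step, and one free scalar weight per imaginary transition. All names (`td_grad`, `weight_meta_gradients`, `lr`, `meta_lr`) and the toy data are hypothetical. The sketch measures how the loss on real samples changes with each weight after one weighted update on the imaginary transitions, then nudges the weights in the direction that lowers the real loss.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def td_grad(w, x, y):
    """Gradient of 0.5 * (w.x - y)^2 with respect to w (assumed loss)."""
    err = dot(w, x) - y
    return [err * xi for xi in x]

def real_loss_grad(w, real):
    """Gradient of the mean squared-error loss over the real samples."""
    g = [0.0] * len(w)
    for x, y in real:
        for k, gk in enumerate(td_grad(w, x, y)):
            g[k] += gk / len(real)
    return g

def weight_meta_gradients(w, imagined, weights, real, lr):
    """Return (d L_real(w') / d weight_i for each i, w').

    w' = w - lr * sum_i weight_i * g_i is one weighted step on the
    imaginary transitions; the chain rule gives
    d L_real(w') / d weight_i = -lr * g_i . grad L_real(w').
    """
    grads = [td_grad(w, x, y) for x, y in imagined]
    w_new = [
        wk - lr * sum(s * g[k] for s, g in zip(weights, grads))
        for k, wk in enumerate(w)
    ]
    g_real = real_loss_grad(w_new, real)
    return [-lr * dot(g, g_real) for g in grads], w_new

# Toy data (hypothetical): the real samples follow y = 2 * x.
real = [([1.0], 2.0), ([2.0], 4.0)]
# One imaginary transition agrees with the real data, one contradicts it.
imagined = [([1.0], 2.0), ([1.0], -2.0)]
weights = [0.5, 0.5]
w = [0.0]

meta_grads, w_new = weight_meta_gradients(w, imagined, weights, real, lr=0.1)

# Move each weight against its meta-gradient and clip to [0, 1].
meta_lr = 0.3
new_weights = [
    min(1.0, max(0.0, s - meta_lr * mg)) for s, mg in zip(weights, meta_grads)
]
print(meta_grads, new_weights)
```

In this toy setting the transition that agrees with the real data receives a negative meta-gradient, so its weight grows, while the contradicting transition is down-weighted. A practical version would presumably differentiate through deep action-value and policy updates with an automatic differentiation library rather than use the closed-form gradient above.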


research
02/25/2018

Temporal Difference Models: Model-Free Deep RL for Model-Based Control

Model-free reinforcement learning (RL) is a powerful, general tool for l...
research
11/30/2020

Continuous Transition: Improving Sample Efficiency for Continuous Control Problems via MixUp

Although deep reinforcement learning (RL) has been successfully applied ...
research
05/28/2020

Domain Knowledge Integration By Gradient Matching For Sample-Efficient Reinforcement Learning

Model-free deep reinforcement learning (RL) agents can learn an effectiv...
research
06/14/2023

Off-policy Evaluation in Doubly Inhomogeneous Environments

This work aims to study off-policy evaluation (OPE) under scenarios wher...
research
08/30/2022

Model-Based Reinforcement Learning with SINDy

We draw on the latest advancements in the physics community to propose a...
research
12/24/2020

Assured RL: Reinforcement Learning with Almost Sure Constraints

We consider the problem of finding optimal policies for a Markov Decisio...
research
07/19/2013

Model-Based Policy Gradients with Parameter-Based Exploration by Least-Squares Conditional Density Estimation

The goal of reinforcement learning (RL) is to let an agent learn an opti...
