Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy

07/25/2022
by   Xiyao Wang, et al.

Model-based reinforcement learning (RL) achieves higher sample efficiency in practice than model-free RL by learning a dynamics model to generate samples for policy learning. Previous works learn a "global" dynamics model that fits the state-action visitation distribution of all historical policies. In this paper, however, we find that learning a global dynamics model does not necessarily benefit model prediction for the current policy, because the policy in use is constantly evolving and this evolution shifts the state-action visitation distribution during training. We theoretically analyze how the distribution of historical policies affects model learning and model rollouts. We then propose a novel model-based RL method, named Policy-adaptation Model-based Actor-Critic (PMAC), which learns a policy-adapted dynamics model through a policy-adaptation mechanism. This mechanism dynamically adjusts the historical policy mixture distribution so that the learned model continually adapts to the state-action visitation distribution of the evolving policy. Experiments on a range of continuous control environments in MuJoCo show that PMAC achieves state-of-the-art asymptotic performance and almost two times higher sample efficiency than prior model-based methods.
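
To make the idea of a "historical policy mixture distribution" concrete, the following is a minimal sketch and not the paper's algorithm. It assumes each stored transition is tagged with the index of the policy that collected it, and it contrasts a uniform mixture (the "global" model case) with a hypothetical geometric recency weighting that favours recent policies. The function names, the `decay` parameter, and the weighting rule are all illustrative assumptions; the abstract does not specify how PMAC computes its weights.

```python
import numpy as np


def mixture_weights(num_policies, decay=None):
    """Return sampling weights over historical policies 0..num_policies-1.

    decay=None gives the uniform mixture (a "global" model).
    0 < decay < 1 gives a hypothetical geometric recency weighting;
    this rule is an assumption for illustration, not PMAC's mechanism.
    """
    if decay is None:
        return np.full(num_policies, 1.0 / num_policies)
    ages = np.arange(num_policies - 1, -1, -1)  # newest policy has age 0
    weights = decay ** ages
    return weights / weights.sum()


def sample_batch(policy_ids, weights, batch_size, rng):
    """Sample transition indices so each policy's share follows `weights`."""
    policy_ids = np.asarray(policy_ids)
    counts = np.bincount(policy_ids, minlength=len(weights))
    # Split each policy's weight evenly among the transitions it collected.
    probs = weights[policy_ids] / counts[policy_ids]
    probs = probs / probs.sum()  # renormalize in case some policy has no data
    return rng.choice(len(policy_ids), size=batch_size, p=probs)
```

Under this sketch, a dynamics model trained on batches drawn with `decay < 1` sees more data from recent policies than one trained with the uniform mixture, which is one simple way to bias model learning toward the current policy's visitation distribution.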

