Dual policy as self-model for planning

06/07/2023
by Jaesung Yoo, et al.

Planning is a data-efficient decision-making strategy in which an agent selects candidate actions by exploring possible future states. To simulate future states when the action space is high-dimensional, knowledge of one's own decision-making strategy must be used to limit the number of actions explored. We refer to the model used to simulate one's decisions as the agent's self-model. While self-models are widely used implicitly in conjunction with world models to plan actions, it remains unclear how self-models should be designed. Inspired by current reinforcement learning approaches and neuroscience, we explore the benefits and limitations of using a distilled policy network as the self-model. In such dual-policy agents, a model-free policy and a distilled policy are used for model-free actions and planned actions, respectively. Our results on an ecologically relevant, parametric environment indicate that using a distilled policy network as the self-model stabilizes training, gives faster inference than using the model-free policy, promotes better exploration, and could learn a comprehensive understanding of the agent's own behaviors, at the cost of distilling a new network apart from the model-free policy.
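
To make the dual-policy idea concrete, here is a minimal tabular sketch. It is not the authors' implementation. It assumes a toy setting in which the model-free policy is a table of logits, the distilled policy is a second table fitted to match it, and planning expands only the few actions the distilled policy (the self-model) rates most likely. The environment sizes, the `world_model` dynamics, and all function names are hypothetical and chosen only for illustration.

```python
import math
import random

N_STATES, N_ACTIONS = 5, 4  # toy sizes (assumption, not from the paper)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill(teacher_logits, steps=2000, lr=1.0):
    """Fit student logits so that softmax(student) matches softmax(teacher) per state."""
    student = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(steps):
        for s in range(N_STATES):
            p_teacher = softmax(teacher_logits[s])
            p_student = softmax(student[s])
            # Gradient of the cross-entropy H(teacher, student) with respect
            # to the student logits is (p_student - p_teacher).
            for a in range(N_ACTIONS):
                student[s][a] -= lr * (p_student[a] - p_teacher[a])
    return student

def world_model(state, action):
    """Hypothetical deterministic dynamics: returns (next_state, reward)."""
    next_state = (state + action) % N_STATES
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def plan(state, self_model_logits, depth=3, top_k=2):
    """Depth-limited search that expands only the top_k actions the self-model favours."""
    if depth == 0:
        return 0.0, None
    probs = softmax(self_model_logits[state])
    candidates = sorted(range(N_ACTIONS), key=lambda a: probs[a], reverse=True)[:top_k]
    best_value, best_action = -math.inf, None
    for a in candidates:
        nxt, r = world_model(state, a)
        future, _ = plan(nxt, self_model_logits, depth - 1, top_k)
        if r + future > best_value:
            best_value, best_action = r + future, a
    return best_value, best_action

random.seed(0)
# Stand-in for a trained model-free policy (random logits, assumption).
teacher = [[random.gauss(0, 1) for _ in range(N_ACTIONS)] for _ in range(N_STATES)]
student = distill(teacher)          # the distilled policy used as the self-model
value, action = plan(0, student)    # a planned action chosen via the self-model
```

With `top_k=2` and `depth=3` the search visits at most 2 + 4 + 8 state-action pairs instead of 4 + 16 + 64, which is the sense in which a self-model limits the number of actions to be explored.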


