Thinker: Learning to Plan and Act

07/27/2023
by Stephen Chung, et al.

We propose the Thinker algorithm, a novel approach that enables reinforcement learning agents to autonomously interact with and utilize a learned world model. The Thinker algorithm wraps the environment with a world model and introduces new actions designed for interacting with the world model. These model-interaction actions enable agents to perform planning by proposing alternative plans to the world model before selecting a final action to execute in the environment. This approach eliminates the need for hand-crafted planning algorithms by enabling the agent to learn how to plan autonomously and allows for easy interpretation of the agent's plan with visualization. We demonstrate the algorithm's effectiveness through experimental results in the game of Sokoban and the Atari 2600 benchmark, where the Thinker algorithm achieves state-of-the-art performance and competitive results, respectively. Visualizations of agents trained with the Thinker algorithm demonstrate that they have learned to plan effectively with the world model to select better actions. The algorithm's generality opens a new research direction on how a world model can be used in reinforcement learning and how planning can be seamlessly integrated into an agent's decision-making process.
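
The wrapping idea described above can be illustrated with a small sketch. This is not the authors' implementation: the toy environment `LineEnv`, the hand-written model from `make_line_model`, the wrapper name `ModelWrappedEnv`, the `plan_steps` parameter, and the `reset_imagination` flag are all hypothetical names chosen for illustration. The model here is exact rather than learned, purely to keep the example short. The sketch only shows the general shape: the agent takes a fixed number of imagined steps in the model, and the action that follows is executed in the real environment.

```python
from typing import Callable, Dict, Optional, Tuple

Obs = Tuple[int, int, int]  # (real state, imagined state, planning steps left)


class LineEnv:
    """Hypothetical toy environment: walk along a line from 0 to a goal."""

    def __init__(self, goal: int = 3) -> None:
        self.goal = goal
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        # action 1 moves right, any other action moves left
        self.state += 1 if action == 1 else -1
        done = self.state == self.goal
        return self.state, (1.0 if done else 0.0), done


def make_line_model(goal: int) -> Callable[[int, int], Tuple[int, float]]:
    """Stand-in for a learned world model (assumption: it is exact here)."""

    def model(state: int, action: int) -> Tuple[int, float]:
        next_state = state + (1 if action == 1 else -1)
        return next_state, (1.0 if next_state == goal else 0.0)

    return model


class ModelWrappedEnv:
    """Wraps an environment so that each real step follows `plan_steps`
    imagined steps taken inside the model."""

    def __init__(self, env: LineEnv, model, plan_steps: int) -> None:
        self.env = env
        self.model = model
        self.plan_steps = plan_steps

    def _obs(self) -> Obs:
        return (self.real_state, self.imag_state, self.steps_left)

    def reset(self) -> Obs:
        self.real_state = self.env.reset()
        self.imag_state = self.real_state
        self.steps_left = self.plan_steps
        return self._obs()

    def step(self, action: int, reset_imagination: bool = False
             ) -> Tuple[Obs, float, bool, Dict[str, Optional[float]]]:
        if self.steps_left > 0:
            # Model-interaction step: only the imagined state changes.
            self.imag_state, imag_reward = self.model(self.imag_state, action)
            if reset_imagination:
                # Abandon the current plan and restart from the real state.
                self.imag_state = self.real_state
            self.steps_left -= 1
            return self._obs(), 0.0, False, {"imagined_reward": imag_reward}
        # Planning budget exhausted: execute the action in the real environment.
        self.real_state, reward, done = self.env.step(action)
        self.imag_state = self.real_state
        self.steps_left = self.plan_steps
        return self._obs(), reward, done, {"imagined_reward": None}


wrapped = ModelWrappedEnv(LineEnv(goal=3), make_line_model(3), plan_steps=2)
first_obs = wrapped.reset()
plan_obs_1, _, _, _ = wrapped.step(1)   # imagined step
plan_obs_2, _, _, _ = wrapped.step(1)   # imagined step
real_obs, real_reward, real_done, _ = wrapped.step(1)  # real step
```

From the agent's point of view the wrapped environment is an ordinary environment with a richer observation, so a standard reinforcement learning algorithm could in principle be trained on it without a hand-crafted planner. How the imagined rollouts are summarized for the agent is a design choice that this sketch leaves out.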

research
07/19/2017

Learning model-based planning from scratch

Conventional wisdom holds that model-based planning is a powerful approa...
research
06/26/2020

What can I do here? A Theory of Affordances in Reinforcement Learning

Reinforcement learning algorithms usually assume that all actions are al...
research
07/30/2020

Moody Learners – Explaining Competitive Behaviour of Reinforcement Learning Agents

Designing the decision-making processes of artificial agents that are in...
research
03/07/2022

Self-directed Learning of Action Models using Exploratory Planning

Complex, real-world domains may not be fully modeled for an agent, espec...
research
08/14/2023

Context-Aware Planning and Environment-Aware Memory for Instruction Following Embodied Agents

Accomplishing household tasks requires to plan step-by-step actions cons...
research
12/13/2019

Long-Term Planning and Situational Awareness in OpenAI Five

Understanding how knowledge about the world is represented within model-...
research
09/27/2017

The detour problem in a stochastic environment: Tolman revisited

We designed a grid world task to study human planning and re-planning be...
