Agents Explore the Environment Beyond Good Actions to Improve Their Model for Better Decisions

06/06/2023
by Matthias Unverzagt, et al.

Improving the decision-making capabilities of agents is a key challenge on the road to artificial intelligence. To improve the planning skills needed to make good decisions, MuZero's agent combines prediction by a network model with planning by a tree search that uses these predictions. MuZero's learning process can fail when predictions are poor yet planning depends on them. We take this as motivation to have the agent explore parts of the decision tree in the environment that it would otherwise not explore. The agent achieves this in three steps. First, it plans as usual to obtain an improved policy. Second, it randomly deviates from this policy at the beginning of each training episode. Third, it switches back to the improved policy at a random time step, so that it experiences the environment rewards associated with the improved policy, which are the basis for learning the correct value expectation. The simple board game Tic-Tac-Toe is used to illustrate how this approach can improve the agent's decision-making ability. The source code, written entirely in Java, is available at https://github.com/enpasos/muzero.
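
The deviate-then-switch-back scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the linked repository: the class `ExplorationSchedule`, its methods, the uniform draw of the switch step, and the choice of uniformly random legal moves during the deviation phase are all hypothetical. The improved policy is assumed to arrive from planning as an array of action probabilities.

```java
import java.util.Random;

/**
 * Hypothetical sketch: decides, per time step of one training episode,
 * whether the agent deviates randomly or follows the improved policy.
 * None of these names are taken from the enpasos/muzero repository.
 */
final class ExplorationSchedule {

    enum Mode { EXPLORE, IMPROVED_POLICY }

    /** Draws the random switch step uniformly from 0..maxSteps (an assumption). */
    static int drawSwitchStep(int maxSteps, Random rng) {
        return rng.nextInt(maxSteps + 1);
    }

    /** Steps before the switch step deviate; steps from the switch step on follow the improved policy. */
    static Mode modeAt(int step, int switchStep) {
        return step < switchStep ? Mode.EXPLORE : Mode.IMPROVED_POLICY;
    }

    /**
     * Samples an action index among the legal actions.
     * EXPLORE: uniform over legal actions (assumed form of "random deviation").
     * IMPROVED_POLICY: proportional to the planning result improvedPolicy.
     */
    static int selectAction(double[] improvedPolicy, boolean[] legal, Mode mode, Random rng) {
        double[] weights = new double[legal.length];
        double total = 0.0;
        for (int a = 0; a < legal.length; a++) {
            if (!legal[a]) {
                continue; // illegal actions keep weight 0
            }
            weights[a] = (mode == Mode.EXPLORE) ? 1.0 : improvedPolicy[a];
            total += weights[a];
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("no legal action with positive weight");
        }
        double r = rng.nextDouble() * total;
        int last = -1;
        for (int a = 0; a < legal.length; a++) {
            if (weights[a] <= 0.0) {
                continue;
            }
            last = a;
            r -= weights[a];
            if (r < 0.0) {
                return a;
            }
        }
        return last; // fallback for floating-point round-off
    }
}
```

In an episode loop, one would draw `switchStep` once at the start and call `modeAt(step, switchStep)` before each move. Only the steps played in `IMPROVED_POLICY` mode then yield rewards that correspond to the improved policy, which is the property the abstract relies on for learning the value expectation.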
