Agents Explore the Environment Beyond Good Actions to Improve Their Model for Better Decisions

06/06/2023

Improving the decision-making capabilities of agents is a key challenge on the road to artificial intelligence. To improve the planning skills needed to make good decisions, MuZero's agent combines prediction by a network model with planning by a tree search that uses those predictions. MuZero's learning process can fail when planning depends on predictions that are still poor. We use this as an impetus to get the agent to explore parts of the decision tree in the environment that it would otherwise not explore. The agent does this in three steps. First, it plans as usual to come up with an improved policy. Second, it randomly deviates from this policy at the beginning of each training episode. Third, it switches back to the improved policy at a random time step so that it experiences the rewards from the environment associated with the improved policy, which is the basis for learning the correct value expectation. The simple board game Tic-Tac-Toe is used to illustrate how this approach can improve the agent's decision-making ability. The source code, written entirely in Java, is available at https://github.com/enpasos/muzero.
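The three-step episode schedule described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the linked repository: the class `ExplorationSchedule`, its methods, and the parameter `plannedAction` (standing in for the action the tree search would pick under the improved policy) are hypothetical names, and the "random deviation" is assumed here to be a uniform choice among legal actions, which the abstract does not specify.

```java
import java.util.List;
import java.util.Random;

// Hypothetical sketch; names are illustrative and not taken from enpasos/muzero.
class ExplorationSchedule {

    // Draws the time step at which the agent returns to the improved policy.
    // Assumption: uniform over 0..maxSteps (inclusive); 0 means no exploration.
    static int sampleSwitchStep(int maxSteps, Random rng) {
        return rng.nextInt(maxSteps + 1);
    }

    // Chooses the action for time step t of a training episode.
    // Before switchStep the agent deviates from the improved policy
    // (assumed: uniform random legal action); from switchStep on it
    // plays the action proposed by planning.
    static int selectAction(int t, int switchStep, List<Integer> legalActions,
                            int plannedAction, Random rng) {
        if (t < switchStep) {
            return legalActions.get(rng.nextInt(legalActions.size()));
        }
        return plannedAction;
    }
}
```

In a training loop built on this sketch, the rewards observed from `switchStep` onward would be the ones generated under the improved policy, which is the part of the episode the abstract identifies as the basis for learning the value expectation.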

Related Research

Dual policy as self-model for planning

Decision Making in Non-Stationary Environments with Policy-Augmented Monte Carlo Tree Search

CAMMARL: Conformal Action Modeling in Multi Agent Reinforcement Learning

Adaptive Sampling using POMDPs with Domain-Specific Considerations

PackIt: A Virtual Environment for Geometric Planning

A simple method for decision making in robocup soccer simulation 3d environment

Introspective Agents: Confidence Measures for General Value Functions