Decision Making in Non-Stationary Environments with Policy-Augmented Monte Carlo Tree Search

02/25/2022
by Geoffrey Pettet, et al.

Decision-making under uncertainty (DMU) is present in many important problems. An open challenge is DMU in non-stationary environments, where the dynamics of the environment can change over time. Reinforcement Learning (RL), a popular approach for DMU problems, learns a policy by interacting with a model of the environment offline. Unfortunately, if the environment changes, the policy can become stale and take sub-optimal actions, and relearning the policy for the updated environment takes time and computational effort. An alternative is online planning approaches such as Monte Carlo Tree Search (MCTS), which perform their computation at decision time. Given the current environment, MCTS plans using high-fidelity models to determine promising action trajectories. These models can be updated as soon as environmental changes are detected, so the changes are incorporated into decision making immediately. However, MCTS can converge slowly in domains with large state-action spaces. In this paper, we present a novel hybrid decision-making approach that combines the strengths of RL and planning while mitigating their weaknesses. Our approach, called Policy-Augmented MCTS (PA-MCTS), integrates a policy's action-value estimates into MCTS, using the estimates to seed the action trajectories favored by the search. We hypothesize that PA-MCTS will converge more quickly than standard MCTS while making better decisions than the policy can make on its own when faced with non-stationary environments. We test our hypothesis by comparing PA-MCTS with pure MCTS and an RL agent applied to the classical CartPole environment. We find that PA-MCTS can achieve higher cumulative rewards than the policy in isolation under several environmental shifts while converging in significantly fewer iterations than pure MCTS.
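
The abstract does not spell out how the policy's action-value estimates are combined with the search. As a minimal sketch, assuming a simple convex combination of the two estimates at the root node, the idea could look like the following. The function names, the dictionary-based interface, and the weight `alpha` are hypothetical choices made for illustration and are not taken from the abstract.

```python
from typing import Dict, Hashable

Action = Hashable


def blend_action_values(
    policy_q: Dict[Action, float],
    search_returns: Dict[Action, float],
    alpha: float,
) -> Dict[Action, float]:
    """Blend per-action estimates from a learned policy and a tree search.

    Assumption (not stated in the abstract): the two estimates are mixed as
    alpha * policy_q[a] + (1 - alpha) * search_returns[a].
    alpha = 1 trusts only the (possibly stale) policy;
    alpha = 0 trusts only the online search.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if policy_q.keys() != search_returns.keys():
        raise ValueError("both estimates must cover the same actions")
    return {
        a: alpha * policy_q[a] + (1.0 - alpha) * search_returns[a]
        for a in policy_q
    }


def select_action(
    policy_q: Dict[Action, float],
    search_returns: Dict[Action, float],
    alpha: float,
) -> Action:
    """Pick the action with the highest blended estimate."""
    blended = blend_action_values(policy_q, search_returns, alpha)
    return max(blended, key=blended.get)


# Hypothetical CartPole-style example: the stale policy prefers "left",
# while search on the updated environment model prefers "right".
stale_policy_q = {"left": 1.0, "right": 0.0}
fresh_search_returns = {"left": 0.0, "right": 1.0}
chosen = select_action(stale_policy_q, fresh_search_returns, alpha=0.25)
```

With a small `alpha`, the search on the updated model dominates and overrides the stale policy; with a large `alpha`, the policy's estimates dominate. How such a weight should be chosen for a given environmental shift is not something this sketch addresses.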


research
05/06/2019

Combining Planning and Deep Reinforcement Learning in Tactical Decision Making for Autonomous Driving

Tactical decision making for autonomous driving is challenging due to th...
research
10/04/2022

Continuous Monte Carlo Graph Search

In many complex sequential decision making tasks, online planning is cru...
research
06/06/2023

Agents Explore the Environment Beyond Good Actions to Improve Their Model for Better Decisions

Improving the decision-making capabilities of agents is a key challenge ...
research
03/14/2022

Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies

Human decision making is well known to be imperfect and the ability to a...
research
06/15/2018

Sample-Efficient Deep RL with Generative Adversarial Tree Search

We propose Generative Adversarial Tree Search (GATS), a sample-efficient...
research
04/26/2021

Universal Off-Policy Evaluation

When faced with sequential decision-making problems, it is often useful ...
research
01/24/2023

Off-Policy Evaluation for Action-Dependent Non-Stationary Environments

Methods for sequential decision-making are often built upon a foundation...
