C-MCTS: Safe Planning with Monte Carlo Tree Search

05/25/2023
by Dinesh Parthasarathy, et al.
Many real-world decision-making tasks, such as safety-critical scenarios, cannot be fully described in a single-objective setting using the Markov Decision Process (MDP) framework because they include hard constraints. These tasks can instead be modeled with additional cost functions within the Constrained Markov Decision Process (CMDP) framework. Even though CMDPs have been extensively studied in the Reinforcement Learning literature, little attention has been given to sampling-based planning algorithms such as Monte Carlo Tree Search (MCTS) for solving them. Previous approaches use Monte Carlo cost estimates to avoid constraint violations. However, these estimates suffer from high variance, which results in conservative performance with respect to costs. We propose Constrained MCTS (C-MCTS), an algorithm that estimates cost using a safety critic. The safety critic is trained with Temporal Difference learning in an offline phase prior to agent deployment. During deployment, the critic limits exploration of the search tree and removes unsafe trajectories within MCTS. C-MCTS satisfies cost constraints while operating closer to the constraint boundary, achieving higher rewards than previous work. As a byproduct, the planner is more efficient, requiring fewer planning steps. Most importantly, we show that under model mismatch between the planner and the real world, our approach is less susceptible to cost violations than previous work.
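
The idea of a TD-trained safety critic that prunes unsafe actions can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a tabular critic in place of a learned one, SARSA-style TD(0) targets, a toy one-step dataset, and a simplified budget check that ignores discounting of the cost already incurred. The names `td0_cost_critic`, `safe_actions`, `budget`, and the toy transitions are all hypothetical.

```python
from collections import defaultdict


def td0_cost_critic(transitions, gamma=0.95, alpha=0.1, epochs=50):
    """Tabular TD(0) estimate of the discounted cost-to-go Q_c(s, a).

    transitions: offline list of (s, a, cost, s_next, a_next, done) tuples.
    """
    q_c = defaultdict(float)
    for _ in range(epochs):
        for s, a, cost, s_next, a_next, done in transitions:
            # Bootstrap from the next state-action pair unless the episode ended.
            target = cost if done else cost + gamma * q_c[(s_next, a_next)]
            q_c[(s, a)] += alpha * (target - q_c[(s, a)])
    return q_c


def safe_actions(q_c, state, actions, cost_so_far, budget):
    """Keep only actions whose predicted total cost stays within the budget."""
    return [a for a in actions if cost_so_far + q_c[(state, a)] <= budget]


# Toy offline data (assumed): from state 0, "left" is free, "right" costs 1.
transitions = [
    (0, "left", 0.0, None, None, True),
    (0, "right", 1.0, None, None, True),
]
q_c = td0_cost_critic(transitions)

# During tree expansion, a planner would consider only the filtered actions.
allowed = safe_actions(q_c, state=0, actions=["left", "right"],
                       cost_so_far=0.0, budget=0.5)
print(allowed)
```

In a full planner, a filter of this kind would be applied when a node is expanded, so that simulations are not spent on branches the critic predicts to be unsafe.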


