Measurable Monte Carlo Search Error Bounds

06/08/2021
by John Mern, et al.

Monte Carlo planners can often return sub-optimal actions, even if they are guaranteed to converge in the limit of infinite samples. Known asymptotic regret bounds do not provide any way to measure confidence in a recommended action at the conclusion of search. In this work, we prove bounds on the sub-optimality of Monte Carlo estimates for non-stationary bandits and Markov decision processes. These bounds can be directly computed at the conclusion of the search and do not require knowledge of the true action-value. The presented bounds hold for general Monte Carlo solvers meeting mild convergence conditions. We empirically test the tightness of the bounds through experiments on a multi-armed bandit and a discrete Markov decision process for both a simple solver and Monte Carlo tree search.
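To illustrate what a bound "computed at the conclusion of the search" can look like, here is a minimal Python sketch. It is not the bound proved in the paper: it swaps in a textbook Hoeffding confidence interval with a union bound over arms, and it assumes a stationary bandit with i.i.d. rewards in a known range and sample counts fixed in advance. The function name `hoeffding_suboptimality_bound` and its parameters are hypothetical, chosen only for this illustration.

```python
import math


def hoeffding_suboptimality_bound(means, counts, delta=0.05, reward_range=1.0):
    """Recommend the arm with the highest sample mean and bound its regret.

    Illustrative assumption (not the paper's result): rewards are i.i.d. and
    lie in an interval of width `reward_range`, so Hoeffding's inequality
    applies to each arm. With probability at least 1 - delta, every true mean
    lies inside its interval, and the returned gap upper-bounds the regret of
    the recommended arm. Only sample statistics are used.
    """
    k = len(means)
    if k == 0 or k != len(counts) or min(counts) < 1:
        raise ValueError("need one mean and a positive sample count per arm")

    # Two-sided Hoeffding radius per arm, with delta split across all k arms.
    radii = [
        reward_range * math.sqrt(math.log(2 * k / delta) / (2 * n))
        for n in counts
    ]

    # Recommended action: the arm with the highest empirical mean.
    best = max(range(k), key=lambda i: means[i])

    # Largest plausible value of any arm minus the smallest plausible value
    # of the recommended arm.
    upper = max(m + r for m, r in zip(means, radii))
    lower = means[best] - radii[best]
    return best, upper - lower


if __name__ == "__main__":
    arm, gap = hoeffding_suboptimality_bound([0.5, 0.7], [100, 100])
    print(f"recommended arm {arm}, sub-optimality at most {gap:.3f}")
```

The gap shrinks as the sample counts grow. Extending the idea to non-stationary estimates, as in tree search, is the setting the paper addresses and is not covered by this sketch.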


research
07/25/2020

Simulation Based Algorithms for Markov Decision Processes and Multi-Action Restless Bandits

We consider multi-dimensional Markov decision processes and formulate a ...
research
09/10/2018

Monte Carlo Tree Search for Verifying Reachability in Markov Decision Processes

The maximum reachability probabilities in a Markov decision process can ...
research
03/09/2020

Convex Hull Monte-Carlo Tree Search

This work investigates Monte-Carlo planning for agents in stochastic env...
research
06/08/2020

POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis

Monte-Carlo planning, as exemplified by Monte-Carlo Tree Search (MCTS), ...
research
11/01/2019

Generalized Mean Estimation in Monte-Carlo Tree Search

We consider Monte-Carlo Tree Search (MCTS) applied to Markov Decision Pr...
research
04/20/2017

Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds

Monte Carlo Tree Search (MCTS), most famously used in game-play artifici...
research
08/09/2014

Selecting Computations: Theory and Applications

Sequential decision problems are often approximately solvable by simulat...
