Robust and Adaptive Planning under Model Uncertainty

01/09/2019
by Apoorva Sharma, et al.

Planning under model uncertainty is a fundamental problem across many applications of decision making and learning. In this paper, we propose the Robust Adaptive Monte Carlo Planning (RAMCP) algorithm, which allows computation of risk-sensitive Bayes-adaptive policies that optimally trade off exploration, exploitation, and robustness. RAMCP formulates the risk-sensitive planning problem as a two-player zero-sum game, in which an adversary perturbs the agent's belief over the models. We introduce two versions of the RAMCP algorithm. The first, RAMCP-F, converges to an optimal risk-sensitive policy without having to rebuild the search tree as the underlying belief over models is perturbed. The second version, RAMCP-I, improves computational efficiency at the cost of losing theoretical guarantees, but is shown to yield empirical results comparable to RAMCP-F. RAMCP is demonstrated on an n-pull multi-armed bandit problem, as well as a patient treatment scenario.
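The two-player zero-sum framing can be illustrated with a small sketch. It is not the RAMCP algorithm itself: it assumes a finite set of candidate models, a fixed expected return per model for each policy, and a CVaR-style constraint under which the adversary may scale any model's belief weight by at most 1/alpha. The abstract specifies none of these details, and the names `adversarial_belief`, `robust_value`, `best_policy`, and `alpha` are hypothetical.

```python
from typing import Dict, List


def adversarial_belief(belief: List[float], returns: List[float], alpha: float) -> List[float]:
    """Adversary's best response for one policy (assumed CVaR-style constraint).

    The adversary moves belief mass onto the models with the lowest return,
    but may inflate each model's weight by at most a factor of 1/alpha.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    perturbed = [0.0] * len(belief)
    remaining = 1.0  # probability mass still to be assigned
    # Visit models from worst to best return, filling each up to its cap.
    for i in sorted(range(len(belief)), key=lambda k: returns[k]):
        mass = min(belief[i] / alpha, remaining)
        perturbed[i] = mass
        remaining -= mass
        if remaining <= 0.0:
            break
    return perturbed


def robust_value(belief: List[float], returns: List[float], alpha: float) -> float:
    """Expected return of a policy under the adversarially perturbed belief."""
    q = adversarial_belief(belief, returns, alpha)
    return sum(qi * ri for qi, ri in zip(q, returns))


def best_policy(belief: List[float], policies: Dict[str, List[float]], alpha: float) -> str:
    """Agent's move in the game: pick the policy with the highest robust value."""
    return max(policies, key=lambda name: robust_value(belief, policies[name], alpha))


# Hypothetical example: three models and two candidate policies.
belief = [0.5, 0.3, 0.2]
policies = {
    "aggressive": [10.0, 4.0, 0.0],  # high return unless the third model is true
    "cautious": [5.0, 5.0, 4.0],     # modest return under every model
}
risk_neutral_choice = best_policy(belief, policies, alpha=1.0)
risk_averse_choice = best_policy(belief, policies, alpha=0.5)
```

With `alpha=1.0` the adversary cannot change the belief, so the agent compares plain expected returns; a smaller `alpha` gives the adversary more room and pushes the agent toward the policy whose worst-case models hurt least.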


research
05/16/2023

Scale-Adaptive Balancing of Exploration and Exploitation in Classical Planning

Balancing exploration and exploitation has been an important problem in ...
research
02/10/2021

Risk-Averse Bayes-Adaptive Reinforcement Learning

In this work, we address risk-averse Bayes-adaptive reinforcement learnin...
research
05/14/2021

Thompson Sampling for Gaussian Entropic Risk Bandits

The multi-armed bandit (MAB) problem is a ubiquitous decision-making pro...
research
06/08/2020

POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis

Monte-Carlo planning, as exemplified by Monte-Carlo Tree Search (MCTS), ...
research
02/14/2012

Learning is planning: near Bayes-optimal reinforcement learning via Monte-Carlo tree search

Bayes-optimal behavior, while well-defined, is often difficult to achiev...
research
07/25/2018

A Minimax Tree Based Approach for Minimizing Detectability and Maximizing Visibility

We introduce and study the problem of planning a trajectory for an agent...
research
05/30/2022

Adaptive Learning for Discovery

In this paper, we study a sequential decision-making problem, called Ada...
