Action Selection for MDPs: Anytime AO* vs. UCT

09/26/2019
by Blai Bonet, et al.

In the presence of non-admissible heuristics, A* and other best-first algorithms can be converted into anytime optimal algorithms over OR graphs by simply continuing the search after the first solution is found. The same trick, however, does not work for best-first algorithms over AND/OR graphs, which must be able to expand leaf nodes of the explicit graph that are not necessarily part of the best partial solution. Anytime optimal variants of AO* must thus address an exploration-exploitation tradeoff: they cannot just "exploit", they must keep exploring as well. In this work, we develop one such variant of AO* and apply it to finite-horizon MDPs. This Anytime AO* algorithm eventually delivers an optimal policy while using non-admissible random heuristics that can be sampled, as when the heuristic is the cost of a base policy that can be sampled with rollouts. We then test Anytime AO* for action selection over large infinite-horizon MDPs that cannot be solved with existing off-line heuristic search and dynamic programming algorithms, and compare it with UCT.
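The OR-graph trick described above (keep searching after the first solution) can be illustrated with a minimal sketch. It assumes a finite graph with non-negative edge costs, and the function name `anytime_best_first` and the callback-based graph encoding are hypothetical. This is a generic anytime best-first search over an OR graph, not the authors' Anytime AO*, which additionally has to handle AND nodes and explicit exploration.

```python
import heapq
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, Tuple

Node = Hashable


def anytime_best_first(
    start: Node,
    is_goal: Callable[[Node], bool],
    successors: Callable[[Node], Iterable[Tuple[Node, float]]],
    h: Callable[[Node], float],
) -> Iterator[Tuple[float, List[Node]]]:
    """Best-first search on an OR graph that keeps going after the first solution.

    Yields (cost, path) every time a strictly cheaper solution is found.
    Assumes a finite graph with non-negative edge costs; h may be non-admissible.
    When the generator is exhausted, the last solution yielded is optimal.
    """
    tie = 0  # tie-breaker so the heap never compares nodes directly
    open_heap = [(h(start), tie, 0.0, start, [start])]
    best_g: Dict[Node, float] = {start: 0.0}
    incumbent = float("inf")  # cost of the best solution found so far

    while open_heap:
        _f, _, g, node, path = heapq.heappop(open_heap)
        if g > best_g[node]:
            continue  # stale entry: a cheaper path to this node was found later
        if g >= incumbent:
            continue  # with non-negative costs this node cannot beat the incumbent
        if is_goal(node):
            incumbent = g
            yield g, path
            continue  # do not stop: keep searching for cheaper solutions
        for nxt, cost in successors(node):
            g2 = g + cost
            # Reopen nxt whenever a cheaper path to it is found; this is what
            # lets the search recover from the misguidance of a non-admissible h.
            if g2 < best_g.get(nxt, float("inf")) and g2 < incumbent:
                best_g[nxt] = g2
                tie += 1
                heapq.heappush(open_heap, (g2 + h(nxt), tie, g2, nxt, path + [nxt]))


if __name__ == "__main__":
    # Toy graph: h overestimates the cost-to-go from B (100 instead of 2).
    graph = {"S": [("A", 1), ("B", 2)], "A": [("G", 10)], "B": [("G", 2)], "G": []}
    h_values = {"S": 0, "A": 0, "B": 100, "G": 0}
    sols = anytime_best_first("S", lambda n: n == "G", lambda n: graph[n], lambda n: h_values[n])
    print(list(sols))  # [(11.0, ['S', 'A', 'G']), (4.0, ['S', 'B', 'G'])]
```

On this toy graph the misleading heuristic makes the search report the cost-11 path first; continuing the search then finds the optimal cost-4 path through B. Pruning nodes whose path cost already matches or exceeds the incumbent keeps the continued search finite. As the abstract notes, this simple continuation does not carry over to AO* on AND/OR graphs, where tip nodes outside the best partial solution must also be expanded.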

