Heuristic Search Value Iteration for POMDPs

07/11/2012
by Trey Smith, et al.

We present a novel POMDP planning algorithm called heuristic search value iteration (HSVI). HSVI is an anytime algorithm that returns a policy and a provable bound on its regret with respect to the optimal policy. HSVI gets its power by combining two well-known techniques: attention-focusing search heuristics and piecewise linear convex representations of the value function. HSVI's soundness and convergence have been proven. On some benchmark problems from the literature, HSVI displays speedups of more than a factor of 100 with respect to other state-of-the-art POMDP value iteration algorithms. We also apply HSVI to a new rover exploration problem 10 times larger than most POMDP problems in the literature.
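As a rough illustration of what a piecewise linear convex value function representation looks like, the sketch below evaluates such a function at a belief as the maximum of dot products with a set of linear pieces (often called alpha vectors). This is not the HSVI algorithm itself, only the standard representation the abstract refers to; the function name `pwlc_value` and the two-state numbers are hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence


def pwlc_value(belief: Sequence[float], alpha_vectors: List[Sequence[float]]) -> float:
    """Evaluate a piecewise linear convex function at a belief.

    Each alpha vector defines one linear piece; the function value is the
    maximum of the dot products alpha . belief over all pieces.
    """
    return max(
        sum(a * b for a, b in zip(alpha, belief))
        for alpha in alpha_vectors
    )


# Hypothetical two-state example (numbers are illustrative assumptions).
alphas = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]

v_corner_0 = pwlc_value([1.0, 0.0], alphas)  # belief certain of state 0
v_corner_1 = pwlc_value([0.0, 1.0], alphas)  # belief certain of state 1
v_uniform = pwlc_value([0.5, 0.5], alphas)   # maximally uncertain belief

# Convexity: the value at the midpoint never exceeds the average of the
# values at the endpoints.
is_convex_here = v_uniform <= 0.5 * (v_corner_0 + v_corner_1)
```

In an anytime setting like the one the abstract describes, a planner that maintains both a lower and an upper estimate of the value at the initial belief can report the difference between them as a bound on regret; how HSVI computes those estimates is covered in the full text and is not reproduced here.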

research
01/30/2013

Solving POMDPs by Searching in Policy Space

Most algorithms for solving POMDPs iteratively improve a value function ...
research
11/25/2022

Operator Splitting Value Iteration

We introduce new planning and reinforcement learning algorithms for disc...
research
07/04/2012

Point-Based POMDP Algorithms: Improved Analysis and Implementation

Existing complexity bounds for point-based POMDP value iteration algorit...
research
06/06/2019

A novel approach to model exploration for value function learning

Planning and Learning are complementary approaches. Planning relies on d...
research
01/28/2022

Planning and Learning with Adaptive Lookahead

The classical Policy Iteration (PI) algorithm alternates between greedy ...
research
08/27/2019

Exploration-Enhanced POLITEX

We study algorithms for average-cost reinforcement learning problems wit...
research
03/19/2023

Going faster to see further: GPU-accelerated value iteration and simulation for perishable inventory control using JAX

Value iteration can find the optimal replenishment policy for a perishab...
