
Heuristic Search Value Iteration for POMDPs

07/11/2012
by Trey Smith, et al.

We present a novel POMDP planning algorithm called heuristic search value iteration (HSVI). HSVI is an anytime algorithm that returns a policy and a provable bound on its regret with respect to the optimal policy. HSVI derives its power by combining two well-known techniques: attention-focusing search heuristics and piecewise linear convex representations of the value function. HSVI's soundness and convergence have been proven. On some benchmark problems from the literature, HSVI runs more than 100 times faster than other state-of-the-art POMDP value iteration algorithms. We also apply HSVI to a new rover exploration problem 10 times larger than most POMDP problems in the literature.
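To make the idea of a regret bound concrete, here is a minimal sketch of how a pair of value-function bounds yields one. It assumes a lower bound stored as a set of alpha vectors (a piecewise linear convex function, evaluated as max over alpha · b) and an upper bound stored as per-state corner values plus a list of (belief, value) points. The upper bound here uses sawtooth interpolation, a swapped-in technique that is not necessarily the projection method used in the paper. All function names (`lower_bound`, `upper_bound`, `regret_bound`) are hypothetical. If the alpha vectors are values of executable policies and the upper bound is valid, the gap between the two bounds at the initial belief bounds the regret.

```python
from typing import List, Sequence, Tuple

Belief = Sequence[float]  # assumption: a belief is a list of state probabilities


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def lower_bound(alphas: List[Sequence[float]], b: Belief) -> float:
    """PWLC lower bound: the best alpha vector evaluated at belief b."""
    return max(dot(alpha, b) for alpha in alphas)


def upper_bound(corners: Sequence[float],
                points: List[Tuple[Belief, float]],
                b: Belief) -> float:
    """Sawtooth upper bound (hypothetical choice) from corner values and stored points."""
    base = dot(corners, b)  # linear interpolation of the per-state corner values
    best = base
    for b_i, v_i in points:
        # Largest c such that b - c * b_i stays non-negative in every state.
        c = min(b[s] / b_i[s] for s in range(len(b)) if b_i[s] > 0)
        # Lower the interpolated value in proportion to the point's improvement.
        best = min(best, base + c * (v_i - dot(corners, b_i)))
    return best


def regret_bound(alphas: List[Sequence[float]],
                 corners: Sequence[float],
                 points: List[Tuple[Belief, float]],
                 b0: Belief) -> float:
    """Width of the bound interval at the initial belief b0."""
    return upper_bound(corners, points, b0) - lower_bound(alphas, b0)


if __name__ == "__main__":
    # Toy two-state example with made-up numbers.
    alphas = [[0.0, 1.0], [1.0, 0.0]]
    corners = [2.0, 2.0]
    points = [([0.5, 0.5], 1.0)]
    print(regret_bound(alphas, corners, points, [0.5, 0.5]))  # prints 0.5
```

In an anytime planner of this kind, one would keep tightening both bounds and stop once the width at the initial belief falls below a chosen tolerance.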
