
Heuristic Search Value Iteration for POMDPs

by Trey Smith, et al.

We present a novel POMDP planning algorithm called heuristic search value iteration (HSVI). HSVI is an anytime algorithm that returns a policy and a provable bound on its regret with respect to the optimal policy. HSVI gets its power by combining two well-known techniques: attention-focusing search heuristics and piecewise linear convex representations of the value function. HSVI's soundness and convergence have been proven. On some benchmark problems from the literature, HSVI displays speedups of greater than 100 with respect to other state-of-the-art POMDP value iteration algorithms. We also apply HSVI to a new rover exploration problem 10 times larger than most POMDP problems in the literature.
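The piecewise linear convex (PWLC) representation mentioned above can be illustrated with a minimal sketch: the lower bound on the value function is stored as a set of alpha vectors, and the value of a belief is the maximum dot product over that set. The alpha vectors and the two-state belief space below are toy values chosen for illustration, not taken from the paper's experiments.

```python
import numpy as np

# Toy 2-state POMDP: a belief is b = [p(s0), p(s1)].
# HSVI's lower bound is piecewise linear convex (PWLC):
# a set of alpha vectors, with V(b) = max_alpha (alpha . b).
# These alpha vectors are illustrative only.
alpha_vectors = np.array([
    [10.0, -5.0],   # e.g. value of committing to one action
    [-5.0, 10.0],   # value of committing to the other action
    [1.0, 1.0],     # value of a cautious fallback action
])

def lower_bound(belief):
    """PWLC lower bound: max over alpha vectors of alpha . b."""
    return float(np.max(alpha_vectors @ belief))

def gap(upper_bound, belief):
    """Anytime regret bound at a belief: upper minus lower bound.
    HSVI explores until this gap at the initial belief is below epsilon."""
    return upper_bound(belief) - lower_bound(belief)
```

Note that the convexity of the max-of-linear-functions form is what lets HSVI improve the lower bound simply by adding new alpha vectors as search proceeds.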



