Discretized Approximations for POMDP with Average Cost

07/11/2012
by Huizhen Yu, et al.

In this paper, we propose a new lower approximation scheme for POMDPs with the discounted and average cost criteria. The approximating functions are determined by their values at a finite number of belief points, and can be computed efficiently using value iteration algorithms for finite-state MDPs. While several lower approximation schemes have been proposed earlier for discounted problems, ours appears to be the first of its kind for average cost problems. We focus primarily on the average cost case, and we show that the corresponding approximation can be computed efficiently using multi-chain algorithms for finite-state MDPs. We give a preliminary analysis showing that, regardless of the existence of the optimal average cost J in the POMDP, the approximation obtained is a lower bound of the liminf optimal average cost function, and can also be used to calculate an upper bound on the limsup optimal average cost function, as well as bounds on the cost of executing the stationary policy associated with the approximation. We show the convergence of the cost approximation when the optimal average cost is constant and the optimal differential cost is continuous.
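To make the point-based idea concrete, here is a minimal sketch of its discounted-cost analogue (the simpler of the two criteria in the abstract): value iteration over a fixed grid of belief points, where each Bayes-updated belief is replaced by a convex combination of grid points. Since the optimal discounted cost is concave in the belief, this interpolation yields a lower bound. The toy model, the grid, and the function names (belief_update, lower_bound_value, lower_approx_value_iteration) are illustrative assumptions, not the paper's algorithm or code.

```python
# Sketch (not the paper's exact scheme): grid-based lower-bound value
# iteration for a discounted-cost POMDP.  J* is concave in the belief, so
# J*(sum_i w_i B_i) >= sum_i w_i J*(B_i) for convex weights w, and the
# interpolated values remain lower bounds throughout value iteration.
import numpy as np
from scipy.optimize import linprog

def belief_update(b, P, O, a, o):
    """Bayes update of belief b after action a and observation o.
    Returns (updated belief, probability of observing o)."""
    pred = b @ P[a]                  # predicted next-state distribution
    b_next = O[a][:, o] * pred       # weight by observation likelihood
    z = b_next.sum()
    if z == 0.0:
        return b, 0.0                # observation impossible under b, a
    return b_next / z, z

def lower_bound_value(b, grid, J):
    """Tightest lower bound at belief b: maximize sum_i w_i J_i over convex
    weights w with sum_i w_i grid_i = b.  Feasible whenever the grid
    contains the simplex vertices."""
    n_pts = len(grid)
    A_eq = np.vstack([np.stack(grid, axis=1), np.ones(n_pts)])
    b_eq = np.append(b, 1.0)
    res = linprog(-J, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n_pts)
    assert res.success
    return -res.fun

def lower_approx_value_iteration(P, O, c, grid, gamma=0.95, iters=200):
    """Value iteration restricted to a finite grid of belief points."""
    n_a, n_o = len(P), O[0].shape[1]
    J = np.zeros(len(grid))
    for _ in range(iters):
        J_new = np.empty_like(J)
        for i, b in enumerate(grid):
            q = []
            for a in range(n_a):
                val = b @ c[a]       # expected immediate cost
                for o in range(n_o):
                    b2, p_o = belief_update(b, P, O, a, o)
                    if p_o > 0.0:
                        val += gamma * p_o * lower_bound_value(b2, grid, J)
                q.append(val)
            J_new[i] = min(q)        # cost minimization
        J = J_new
    return J

if __name__ == "__main__":
    # Toy 2-state / 2-action / 2-observation POMDP (all numbers made up).
    P = [np.array([[0.9, 0.1], [0.2, 0.8]]),
         np.array([[0.5, 0.5], [0.5, 0.5]])]
    O = [np.array([[0.8, 0.2], [0.3, 0.7]])] * 2
    c = [np.array([1.0, 0.0]), np.array([0.5, 0.5])]
    grid = [np.array([1.0, 0.0]), np.array([0.5, 0.5]), np.array([0.0, 1.0])]
    print(lower_approx_value_iteration(P, O, c, grid, iters=50))
```

The LP in lower_bound_value picks the convex-combination weights giving the tightest bound at each updated belief; a fixed interpolation rule would be cheaper per step and still preserve the lower-bound property, at the price of a looser approximation. The average cost case treated in the paper requires multi-chain MDP algorithms in place of this discounted value iteration.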
