The Geometry of Memoryless Stochastic Policy Optimization in Infinite-Horizon POMDPs

10/14/2021
by Johannes Müller, et al.

We consider the problem of finding the best memoryless stochastic policy for an infinite-horizon partially observable Markov decision process (POMDP) with finite state and action spaces, with respect to either the discounted or the mean reward criterion. We show that the (discounted) state-action frequencies and the expected cumulative reward are rational functions of the policy, whose degree is determined by the degree of partial observability. We then describe the optimization problem as a linear optimization problem in the space of feasible state-action frequencies, subject to polynomial constraints that we characterize explicitly. This allows us to address the combinatorial and geometric complexity of the optimization problem using recent tools from polynomial optimization. In particular, we demonstrate how the partial observability constraints can lead to multiple smooth and non-smooth local optimizers, and we estimate the number of critical points.
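
To make the notion of discounted state-action frequencies concrete, the following is a minimal sketch and is not taken from the paper. It assumes the common normalization in which the discounted state distribution rho solves rho = (1 - gamma) * mu + gamma * P_pi^T rho and the state-action frequency is eta(s, a) = rho(s) * tau(a | s), where tau is the state policy induced by the memoryless observation policy. All names (`solve`, `state_action_frequencies`, `expected_reward`, `beta`, `tau`) and the toy two-state POMDP are hypothetical and chosen for illustration. Exact `Fraction` arithmetic is used so that the output is visibly a rational expression in the policy entries, and the reward is a linear function of eta.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        # Pick a row with a nonzero pivot and swap it into place.
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def state_action_frequencies(pi, beta, P, mu, gamma):
    """Discounted state-action frequencies eta[s][a] of a memoryless policy.

    pi[o][a]   : observation policy (probability of action a given o)
    beta[s][o] : observation kernel (probability of o in state s)
    P[s][a][t] : transition probability from s to t under action a
    mu[s]      : initial state distribution
    gamma      : discount factor, 0 <= gamma < 1 (assumed normalization)
    """
    nS, nA, nO = len(P), len(P[0]), len(pi)
    # Effective state policy tau(a|s) = sum_o beta(o|s) pi(a|o).
    tau = [[sum(beta[s][o] * pi[o][a] for o in range(nO)) for a in range(nA)]
           for s in range(nS)]
    # State-to-state kernel under the policy.
    P_pi = [[sum(tau[s][a] * P[s][a][t] for a in range(nA)) for t in range(nS)]
            for s in range(nS)]
    # Solve (I - gamma * P_pi^T) rho = (1 - gamma) * mu.
    A = [[(1 if s == t else 0) - gamma * P_pi[t][s] for t in range(nS)]
         for s in range(nS)]
    rho = solve(A, [(1 - gamma) * m for m in mu])
    return [[rho[s] * tau[s][a] for a in range(nA)] for s in range(nS)]

def expected_reward(eta, r):
    """The reward is linear in the state-action frequencies."""
    return sum(eta[s][a] * r[s][a] for s in range(len(r)) for a in range(len(r[0])))

# Hypothetical toy POMDP: two states that emit the same single observation.
# Action 0 stays in the current state, action 1 switches to the other state.
beta = [[1], [1]]
P = [[[1, 0], [0, 1]],
     [[0, 1], [1, 0]]]
mu = [1, 0]
gamma = F(1, 2)
r = [[0, 0], [1, 1]]  # reward 1 whenever the system is in state 1

pi_half = [[F(1, 2), F(1, 2)]]
eta_half = state_action_frequencies(pi_half, beta, P, mu, gamma)
print(eta_half, expected_reward(eta_half, r))
```

In this sketch the policy enters only through `tau`, and the dependence of `eta` on the policy arises from solving a linear system whose coefficients are linear in the policy entries, which is where the rational structure comes from under the stated assumptions.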


research
05/27/2022

Solving infinite-horizon POMDPs with memoryless stochastic policies in state-action space

Reward optimization in fully observable Markov decision processes is equ...
research
07/23/2020

Batch Policy Learning in Average Reward Markov Decision Processes

We consider the batch (off-line) policy learning problem in the infinite...
research
01/16/2013

PEGASUS: A Policy Search Method for Large MDPs and POMDPs

We propose a new approach to the problem of searching a space of policie...
research
04/11/2022

Towards Painless Policy Optimization for Constrained MDPs

We study policy optimization in an infinite horizon, γ-discounted constr...
research
05/16/2022

Efficient Algorithms for Planning with Participation Constraints

We consider the problem of planning with participation constraints intro...
research
02/14/2012

Efficient Inference in Markov Control Problems

Markov control algorithms that perform smooth, non-greedy updates of the...
research
06/03/2011

Experiments with Infinite-Horizon, Policy-Gradient Estimation

In this paper, we present algorithms that perform gradient ascent of the...
