Active Trajectory Estimation for Partially Observed Markov Decision Processes via Conditional Entropy

04/04/2021
by Timothy L. Molloy, et al.

In this paper, we consider the problem of controlling a partially observed Markov decision process (POMDP) in order to actively estimate its state trajectory over a fixed horizon with minimal uncertainty. We pose a novel active smoothing problem in which the objective is to directly minimise the smoother entropy, that is, the conditional entropy of the (joint) state trajectory distribution of concern in fixed-interval Bayesian smoothing. Our formulation contrasts with prior active approaches that minimise the sum of conditional entropies of the (marginal) state estimates provided by Bayesian filters. By establishing a novel form of the smoother entropy in terms of the POMDP belief (or information) state, we show that our active smoothing problem can be reformulated as a (fully observed) Markov decision process with a value function that is concave in the belief state. The concavity of the value function is of particular importance since it enables the approximate solution of our active smoothing problem using piecewise-linear function approximations in conjunction with standard POMDP solvers. We illustrate the approximate solution of our active smoothing problem in simulation and compare its performance to alternative approaches based on minimising marginal state estimate uncertainties.
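The contrast between the joint (smoother) entropy and the sum of marginal (filter) entropies can be made concrete on a toy model. The following is a minimal sketch, not the paper's method: it assumes a hypothetical two-state, two-observation model with fixed (uncontrolled) dynamics, the numbers in `PRIOR`, `TRANS` and `OBS` are invented for illustration, and both quantities are computed by brute-force enumeration rather than through the belief-state form described above.

```python
import itertools
import math

# Hypothetical 2-state, 2-observation model (illustrative numbers, not from the paper).
PRIOR = [0.5, 0.5]                  # p(x_0)
TRANS = [[0.9, 0.1], [0.2, 0.8]]    # TRANS[i][j] = p(x_{k+1} = j | x_k = i)
OBS = [[0.8, 0.2], [0.3, 0.7]]      # OBS[j][y] = p(y_k = y | x_k = j)
HORIZON = 3                         # observations y_1, ..., y_T with T = 3


def joint_prob(xs, ys):
    """p(x_0..x_k, y_1..y_k) for one state sequence and one observation sequence."""
    p = PRIOR[xs[0]]
    for k in range(1, len(xs)):
        p *= TRANS[xs[k - 1]][xs[k]] * OBS[xs[k]][ys[k - 1]]
    return p


def smoother_entropy():
    """Conditional entropy H(X_{0:T} | Y_{1:T}) of the joint state trajectory."""
    h = 0.0
    for ys in itertools.product(range(2), repeat=HORIZON):
        probs = [joint_prob(xs, ys)
                 for xs in itertools.product(range(2), repeat=HORIZON + 1)]
        p_y = sum(probs)  # p(y_1..y_T)
        h -= sum(p * math.log(p / p_y) for p in probs if p > 0)
    return h


def sum_of_filter_entropies():
    """Sum over k = 0..T of the marginal conditional entropies H(X_k | Y_{1:k})."""
    total = 0.0
    for k in range(HORIZON + 1):
        for ys in itertools.product(range(2), repeat=k):
            # Marginalise the joint over x_0..x_{k-1} to get p(x_k, y_1..y_k).
            marg = [0.0, 0.0]
            for xs in itertools.product(range(2), repeat=k + 1):
                marg[xs[-1]] += joint_prob(xs, ys)
            p_y = sum(marg)  # p(y_1..y_k)
            total -= sum(p * math.log(p / p_y) for p in marg if p > 0)
    return total


if __name__ == "__main__":
    print("smoother entropy (nats):       ", smoother_entropy())
    print("sum of filter entropies (nats):", sum_of_filter_entropies())
```

By the chain rule for entropy, and because conditioning cannot increase entropy, the joint quantity is never larger than the sum of the marginal ones, so the two objectives generally differ. Enumeration scales exponentially in the horizon and is only viable for toy problems like this one.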

