A Minimum Relative Entropy Controller for Undiscounted Markov Decision Processes

by Pedro A. Ortega, et al.

Adaptive control problems are notoriously difficult to solve, even in the presence of plant-specific controllers. One way to bypass the intractable computation of the optimal policy is to restate the adaptive control problem as the minimization of the relative entropy of a controller that ignores the true plant dynamics from an informed controller. The solution is given by the Bayesian control rule, a set of equations characterizing a stochastic adaptive controller for the class of possible plant dynamics. Here, the Bayesian control rule is applied to derive BCR-MDP, a controller for solving undiscounted Markov decision processes with finite state and action spaces and unknown dynamics. In particular, we derive a non-parametric conjugate prior distribution over the policy space that encapsulates the agent's whole relevant history, and we present a Gibbs sampler to draw random policies from this distribution. Preliminary results show that BCR-MDP successfully avoids sub-optimal limit cycles due to its built-in mechanism to balance exploration versus exploitation.
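To make the idea of a stochastic adaptive controller more concrete, here is a minimal sketch of a Bayesian-control-rule-style loop on a toy problem. It is not BCR-MDP: it assumes a hypothetical two-armed bandit with only two candidate plants, a hand-written plant-specific controller for each (`BEST_ARM`), and a plain finite posterior in place of the non-parametric conjugate prior and Gibbs sampler described in the abstract. All names (`PLANTS`, `BEST_ARM`, `posterior_update`, `run`) are illustrative assumptions.

```python
import random

# Hypothetical toy setup (an assumption, not from the paper):
# PLANTS[k][a] is the reward probability of arm a under candidate plant k.
PLANTS = [(0.8, 0.2), (0.2, 0.8)]
# Plant-specific controllers: the arm we would pull if plant k were the true one.
BEST_ARM = [0, 1]


def posterior_update(weights, likelihoods):
    """Bayes update of the belief over candidate plants, renormalized."""
    unnormalized = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def run(true_plant, steps, seed=0):
    """Sample a plant from the belief, act with its controller, then update."""
    rng = random.Random(seed)
    weights = [1.0 / len(PLANTS)] * len(PLANTS)  # uniform prior over plants
    for _ in range(steps):
        # Draw one candidate plant at random according to the current belief.
        k = rng.choices(range(len(PLANTS)), weights=weights)[0]
        arm = BEST_ARM[k]
        # The true (unknown) plant generates the observation.
        reward = rng.random() < PLANTS[true_plant][arm]
        # Only the observation updates the belief; the chosen action is
        # treated as given rather than as evidence about the plant.
        likelihoods = [p[arm] if reward else 1.0 - p[arm] for p in PLANTS]
        weights = posterior_update(weights, likelihoods)
    return weights
```

Calling `run(true_plant=0, steps=200)` returns the final belief over the two candidate plants. Because actions are sampled from the belief rather than chosen greedily, the controller keeps trying the less likely plant's action while its belief is still uncertain, which is one simple way to picture the balance between exploration and exploitation.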



