A Minimum Relative Entropy Controller for Undiscounted Markov Decision Processes

02/07/2010
by Pedro A. Ortega, et al.

Adaptive control problems are notoriously difficult to solve, even when plant-specific controllers are available. One way to bypass the intractable computation of the optimal policy is to restate the adaptive control problem as the minimization of the relative entropy of a controller that ignores the true plant dynamics from an informed controller. The solution is given by the Bayesian control rule, a set of equations characterizing a stochastic adaptive controller for the class of possible plant dynamics. Here, the Bayesian control rule is applied to derive BCR-MDP, a controller for undiscounted Markov decision processes with finite state and action spaces and unknown dynamics. In particular, we derive a non-parametric conjugate prior distribution over the policy space that encapsulates the agent's whole relevant history, and we present a Gibbs sampler to draw random policies from this distribution. Preliminary results show that BCR-MDP successfully avoids sub-optimal limit cycles due to its built-in mechanism to balance exploration versus exploitation.
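
The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea of acting on a randomly sampled hypothesis about the plant. It is a generic posterior-sampling sketch, not BCR-MDP itself: it assumes known rewards, keeps Dirichlet counts over transitions (rather than the paper's conjugate prior over the policy space), and replaces the Gibbs sampler with a direct model draw followed by relative value iteration for the undiscounted (average-reward) criterion. All function names are hypothetical.

```python
import random

def sample_dirichlet(counts, rng):
    """Draw one probability vector from Dirichlet(counts) via gamma variates."""
    draws = [rng.gammavariate(c, 1.0) for c in counts]
    total = sum(draws)
    return [d / total for d in draws]

def sample_model(counts, rng):
    """Sample transitions P[s][a][s'] from the (assumed) Dirichlet posterior."""
    return [[sample_dirichlet(row, rng) for row in per_state]
            for per_state in counts]

def greedy_policy(P, R, sweeps=200):
    """Relative value iteration for the average-reward criterion.

    Assumption: rewards R[s][a] are known and the sampled chain is
    well-behaved enough for the iteration to settle.
    """
    n_states, n_actions = len(P), len(P[0])
    h = [0.0] * n_states  # relative (bias) values
    for _ in range(sweeps):
        q = [[R[s][a] + sum(P[s][a][t] * h[t] for t in range(n_states))
              for a in range(n_actions)] for s in range(n_states)]
        best = [max(row) for row in q]
        h = [v - best[0] for v in best]  # state 0 is the reference state
    return [max(range(n_actions), key=lambda a: q[s][a])
            for s in range(n_states)]

def posterior_sampling_step(state, counts, R, rng):
    """Sample one plausible model, then act as its optimal controller would."""
    model = sample_model(counts, rng)
    return greedy_policy(model, R)[state]

def update(counts, s, a, s_next):
    """Record an observed transition in the posterior counts."""
    counts[s][a][s_next] += 1
```

Because each action comes from a freshly sampled model, uncertain transitions are occasionally tried out while well-supported ones dominate, which is one simple way to picture how a stochastic controller can trade off exploration against exploitation.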


research
06/05/2023

Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space

Models of many real-life applications, such as queuing models of communi...
research
02/16/2010

Convergence of Bayesian Control Rule

Recently, new approaches to adaptive control have sought to reformulate ...
research
10/08/2020

Adaptive Shielding under Uncertainty

This paper targets control problems that exhibit specific safety and per...
research
08/03/2022

Bayesian regularization of empirical MDPs

In most applications of model-based Markov decision processes, the param...
research
11/06/2022

On learning history based policies for controlling Markov decision processes

Reinforcement learning (RL) folklore suggests that history-based function approx...
research
12/07/2017

Remarks on Bayesian Control Charts

There is a considerable amount of ongoing research on the use of Bayesia...
research
10/25/2018

Stochastic Control with Stale Information--Part I: Fully Observable Systems

In this study, we adopt age of information as a measure of the staleness...
