Bayesian regularization of empirical MDPs

08/03/2022
by Samarth Gupta, et al.

In most applications of model-based Markov decision processes, the parameters of the unknown underlying model are estimated from empirical data. Due to noise, the policy learned from the estimated model is often far from the optimal policy of the underlying model. When applied to the environment of the underlying model, the learned policy results in suboptimal performance, calling for solutions with better generalization. In this work we take a Bayesian perspective and regularize the objective function of the Markov decision process with prior information in order to obtain more robust policies. Two approaches are proposed, one based on L^1 regularization and the other on relative entropic regularization. We evaluate the proposed algorithms on synthetic simulations and on real-world search logs of a large-scale online shopping store. Our results demonstrate the robustness of regularized MDP policies against the noise present in the models.
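As a rough illustration of the relative-entropic variant, one can regularize value iteration so the learned policy stays close (in KL divergence) to a prior policy. The sketch below uses a synthetic random MDP; the toy model, the regularization weight `lam`, and all variable names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Sketch: KL (relative-entropy) regularized value iteration on a synthetic MDP.
# The objective trades off expected reward against KL(pi || prior) per state.
rng = np.random.default_rng(0)
S, A, gamma, lam = 5, 3, 0.9, 0.5   # states, actions, discount, KL strength

# Random "empirical" transition model P[s, a, s'] and rewards R[s, a].
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((S, A))

prior = np.full((S, A), 1.0 / A)    # prior policy to regularize toward

V = np.zeros(S)
for _ in range(1000):
    Q = R + gamma * P @ V           # Q[s, a] under current value estimate
    # Soft Bellman backup: V(s) = lam * log sum_a prior(a|s) * exp(Q(s,a)/lam)
    V_new = lam * np.log((prior * np.exp(Q / lam)).sum(axis=1))
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

# Regularized policy: pi(a|s) proportional to prior(a|s) * exp(Q(s,a)/lam)
pi = prior * np.exp((Q - V[:, None]) / lam)
pi /= pi.sum(axis=1, keepdims=True)
print(pi.round(3))
```

As `lam` grows, the policy collapses toward the prior; as `lam` shrinks, it approaches the greedy policy of the (noisy) empirical model, which is the behavior such regularization is meant to temper.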


research
12/01/2016

Optimizing Quantiles in Preference-based Markov Decision Processes

In the Markov decision process model, policies are usually evaluated by ...
research
02/02/2021

Stability-Constrained Markov Decision Processes Using MPC

In this paper, we consider solving discounted Markov Decision Processes ...
research
02/18/2022

A mixed-integer programming model for identifying intuitive ambulance dispatching policies

Markov decision process models and algorithms can be used to identify op...
research
09/15/2021

Synthesizing Policies That Account For Human Execution Errors Caused By State-Aliasing In Markov Decision Processes

When humans are given a policy to execute, there can be policy execution...
research
12/21/2020

Universal Policies for Software-Defined MDPs

We introduce a new programming paradigm called oracle-guided decision pr...
research
01/21/2017

Learning Policies for Markov Decision Processes from Data

We consider the problem of learning a policy for a Markov decision proce...
research
02/07/2010

A Minimum Relative Entropy Controller for Undiscounted Markov Decision Processes

Adaptive control problems are notoriously difficult to solve even in the...
