Thompson Sampling for Parameterized Markov Decision Processes with Uninformative Actions

05/13/2023
by Michael Gimelfarb, et al.

We study parameterized MDPs (PMDPs) in which the key parameters of interest are unknown and must be learned using Bayesian inference. A defining feature of such models is the presence of "uninformative" actions that provide no information about the unknown parameters. We contribute a set of assumptions for PMDPs, easily verified for many classes of problems such as queuing, inventory control, and dynamic pricing, under which Thompson sampling guarantees an asymptotically optimal expected regret bound of O(T^{-1}).
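To make the mechanism concrete, here is a minimal sketch of the Thompson sampling loop in a PMDP with an uninformative action, assuming a toy Beta-Bernoulli model. The unknown parameter theta, the threshold policy optimal_action, and the action labels are illustrative placeholders, not the paper's construction.

```python
import numpy as np

# A minimal sketch (illustrative, not the paper's model): the MDP is
# parameterized by an unknown Bernoulli parameter theta with a Beta prior.
# Action 1 is informative (it yields a demand observation); action 0 is
# "uninformative" and reveals nothing about theta.

rng = np.random.default_rng(0)
theta_true = 0.7          # unknown parameter to be learned
alpha, beta = 1.0, 1.0    # Beta(1, 1) prior over theta

def optimal_action(theta):
    # Stand-in for solving the MDP under the sampled parameter:
    # here, a simple threshold policy on theta.
    return 1 if theta > 0.5 else 0

T = 1000
for t in range(T):
    theta_hat = rng.beta(alpha, beta)   # 1. sample theta from the posterior
    a = optimal_action(theta_hat)       # 2. act optimally for the sample
    if a == 1:
        # Informative action: observe an outcome and update the posterior
        # by Bayes' rule (Beta-Bernoulli conjugacy).
        y = rng.random() < theta_true
        alpha += y
        beta += 1 - y
    # Uninformative action (a == 0): no observation, posterior unchanged.

print(f"posterior mean of theta: {alpha / (alpha + beta):.3f}")
```

The sketch makes the defining feature of these models visible: when the uninformative action is chosen, the posterior is left unchanged, so learning proceeds only through the informative actions.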


Related research

06/29/2014
Thompson Sampling for Learning Parameterized Markov Decision Processes
We consider reinforcement learning in parameterized Markov Decision Proc...

09/05/2015
Reinforcement Learning with Parameterized Actions
We introduce a model-free algorithm for learning in Markov decision proc...

03/22/2023
Strategy Synthesis in Markov Decision Processes Under Limited Sampling Access
A central task in control theory, artificial intelligence, and formal me...

02/08/2023
Learning How to Infer Partial MDPs for In-Context Adaptation and Exploration
To generalize across tasks, an agent should acquire knowledge from past ...

07/01/2016
Why is Posterior Sampling Better than Optimism for Reinforcement Learning?
Computational results demonstrate that posterior sampling for reinforcem...

10/19/2021
Stateful Offline Contextual Policy Evaluation and Learning
We study off-policy evaluation and learning from sequential data in a st...

06/11/2021
Safe Reinforcement Learning with Linear Function Approximation
Safety in reinforcement learning has become increasingly important in re...
