Analysis of Thompson Sampling for Partially Observable Contextual Multi-Armed Bandits

10/23/2021
by   Hongju Park, et al.

Contextual multi-armed bandits are classical models in reinforcement learning for sequential decision-making with individual (context) information. A widely used policy for bandits is Thompson Sampling, which selects control actions using samples from a data-driven probabilistic belief about the unknown parameters. For this computationally fast algorithm, performance analyses are available when contexts are fully observed. However, little is known about problems in which contexts are not fully observed. We propose a Thompson Sampling algorithm for partially observable contextual multi-armed bandits and establish theoretical performance guarantees. Technically, we show that the regret of the presented policy scales logarithmically with time and the number of arms, and linearly with the dimension. Further, we establish rates of learning the unknown parameters and provide illustrative numerical analyses.
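
To make the Thompson Sampling loop described above concrete, the following is a minimal sketch and not the authors' algorithm, whose details are not given in the abstract. It assumes a linear reward model with a single shared parameter, a Gaussian belief with identity prior precision, and noisy per-arm observations of the contexts that are used directly as regressors. All names (`GaussianBelief`, `select_arm`, `simulate`), the noise levels, and the ground-truth parameter are hypothetical choices for illustration.

```python
import math
import random


def cholesky(a):
    """Return lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_lower(low, b):
    """Solve L x = b by forward substitution."""
    x = [0.0] * len(b)
    for i in range(len(b)):
        x[i] = (b[i] - sum(low[i][k] * x[k] for k in range(i))) / low[i][i]
    return x


def solve_lower_transposed(low, b):
    """Solve L^T x = b by back substitution."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


class GaussianBelief:
    """Belief N(B^{-1} m, B^{-1}) over the parameter (assumed model, not the paper's)."""

    def __init__(self, dim):
        # Assumption: identity prior precision and zero prior mean.
        self.precision = [[float(i == j) for j in range(dim)] for i in range(dim)]
        self.moment = [0.0] * dim

    def sample(self, rng):
        low = cholesky(self.precision)
        mean = solve_lower_transposed(low, solve_lower(low, self.moment))
        z = [rng.gauss(0.0, 1.0) for _ in mean]
        noise = solve_lower_transposed(low, z)  # has covariance B^{-1}
        return [m + e for m, e in zip(mean, noise)]

    def update(self, observation, reward):
        # Rank-one update with the observed (noisy) regressor of the pulled arm.
        for i, yi in enumerate(observation):
            self.moment[i] += reward * yi
            for j, yj in enumerate(observation):
                self.precision[i][j] += yi * yj


def select_arm(belief, observations, rng):
    """Draw one parameter sample, then pick the arm with the highest sampled score."""
    theta = belief.sample(rng)
    scores = [sum(t * y for t, y in zip(theta, obs)) for obs in observations]
    return max(range(len(scores)), key=scores.__getitem__)


def simulate(true_theta=(1.0, -0.5), rounds=200, arms=3, seed=0):
    """Toy run: the learner sees only noisy observations, never the contexts."""
    rng = random.Random(seed)
    dim = len(true_theta)
    belief = GaussianBelief(dim)
    for _ in range(rounds):
        contexts = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(arms)]
        observations = [[c + rng.gauss(0.0, 0.5) for c in ctx] for ctx in contexts]
        arm = select_arm(belief, observations, rng)
        reward = sum(t * c for t, c in zip(true_theta, contexts[arm]))
        reward += rng.gauss(0.0, 0.1)
        belief.update(observations[arm], reward)
    return belief
```

Calling `simulate()` returns the belief after 200 rounds. Because the regressors are noisy observations rather than the contexts themselves, the belief in this sketch concentrates around the parameter of the observation-to-reward relation, which generally differs from `true_theta`.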


