Provably Efficient Learning in Partially Observable Contextual Bandit

08/07/2023
by Xueping Gong, et al.

In this paper, we investigate transfer learning in partially observable contextual bandits, where agents have limited knowledge from other agents and partial information about hidden confounders. We first convert the problem to identifying or partially identifying causal effects between actions and rewards through optimization problems. To solve these optimization problems, we discretize the original functional constraints on the unknown distributions into linear constraints, and sample compatible causal models by sequentially solving linear programs, obtaining causal bounds that account for estimation error. Our sampling algorithms provide desirable convergence guarantees for suitable sampling distributions. We then show how causal bounds can be applied to improve classical bandit algorithms, and how they affect regret with respect to the size of the action set and the function space. Notably, in the function-approximation setting, which allows us to handle general context distributions, our method improves the order of dependence on the function space size compared with prior work. We formally prove that our causally enhanced algorithms outperform classical bandit algorithms and achieve orders-of-magnitude faster convergence rates. Finally, we perform simulations that demonstrate the efficiency of our strategy compared to current state-of-the-art methods. This research has the potential to enhance the performance of contextual bandit agents in real-world applications where data is scarce and costly to obtain.
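The paper's algorithms are not reproduced here, but the final idea, using externally derived causal bounds to improve a classical bandit algorithm, can be sketched briefly. The following is a minimal, illustrative Python sketch (not the authors' implementation): it runs UCB1 on simulated Bernoulli arms, prunes arms whose causal upper bound falls below the best causal lower bound, and clips each arm's UCB index at its causal upper bound. All names, the Bernoulli environment, and the specific bound values are assumptions made for illustration.

```python
import math
import random

def causal_ucb1(true_means, causal_bounds, horizon, seed=0):
    """UCB1 enhanced with per-arm causal bounds (lo_a, hi_a) on mean reward.

    Illustrative sketch only. Arms whose causal upper bound lies below the
    best causal lower bound can never be optimal and are pruned before play;
    each remaining arm's UCB index is clipped at its causal upper bound.
    """
    rng = random.Random(seed)
    # Prune arms dominated under the causal bounds.
    best_lo = max(lo for lo, _ in causal_bounds)
    active = [a for a, (_, hi) in enumerate(causal_bounds) if hi >= best_lo]
    counts = {a: 0 for a in active}
    sums = {a: 0.0 for a in active}
    pulls = []
    for t in range(1, horizon + 1):
        def index(a):
            if counts[a] == 0:
                # Optimistic initialization, already capped by the causal bound.
                return causal_bounds[a][1]
            ucb = sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
            # Clip the confidence bound using the causal upper bound.
            return min(ucb, causal_bounds[a][1])
        a = max(active, key=index)
        reward = 1.0 if rng.random() < true_means[a] else 0.0  # Bernoulli arm
        counts[a] += 1
        sums[a] += reward
        pulls.append(a)
    return pulls, counts
```

With sufficiently tight causal bounds, suboptimal arms are eliminated before a single pull, which is the intuition behind the regret improvements the abstract claims; with vacuous bounds of (0, 1) the sketch reduces to plain UCB1.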


Related research

06/11/2020
Bandits with Partially Observable Offline Data
We study linear contextual bandits with access to a large, partially obs...

10/11/2019
Regret Analysis of Causal Bandit Problems
We study how to learn optimal interventions sequentially given causal in...

03/07/2021
Hierarchical Causal Bandit
Causal bandit is a nascent learning model where an agent sequentially ex...

11/02/2019
Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints
Recent advances in contextual bandit optimization and reinforcement lear...

09/16/2020
Causal Discovery for Causal Bandits utilizing Separating Sets
The Causal Bandit is a variant of the classic Bandit problem where an ag...

11/19/2017
Estimation Considerations in Contextual Bandits
Contextual bandit algorithms seek to learn a personalized treatment assi...

12/03/2021
Chronological Causal Bandits
This paper studies an instance of the multi-armed bandit (MAB) problem, ...
