Causal Discovery for Causal Bandits utilizing Separating Sets

09/16/2020
by Arnoud A. W. M. de Kroon, et al.

The causal bandit is a variant of the classic bandit problem in which an agent must identify the best action in a sequential decision-making process, where the reward distributions of the actions exhibit a non-trivial dependence structure governed by a causal model. All methods proposed in the literature thus far rely on exact prior knowledge of the causal model to obtain improved reward estimators. We formulate a new causal bandit algorithm that is the first to dispense with explicit prior causal knowledge, using instead the output of causal discovery algorithms. The algorithm relies on a new estimator based on separating sets, a causal structure well known in the causal discovery literature. We show that, given a separating set, this estimator is unbiased and has lower variance than the sample mean. We derive a concentration bound for it and construct a UCB-type algorithm based on this bound, as well as a Thompson sampling variant. We compare our algorithms with traditional bandit algorithms on simulated data, where they show a significant boost in performance.
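To illustrate the core idea behind such an estimator, here is a minimal sketch (not the paper's actual implementation; all function names and the toy causal model A → Z → Y are hypothetical). If a set of variables Z d-separates the chosen action A from the reward Y, then E[Y | Z=z, A=a] = E[Y | Z=z], so the conditional reward mean can be pooled across all arms; only the distribution P(Z | A=a) remains arm-specific. Pooling is what lowers the variance relative to the per-arm sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)

def separating_set_estimates(actions, z_vals, rewards, n_actions, n_z):
    """Estimate E[Y | do(a)] for each arm via a separating set Z.

    Assumes Z d-separates the action A from the reward Y, so that
    E[Y | Z=z, A=a] = E[Y | Z=z]: the conditional reward mean is pooled
    over ALL rounds, and only P(Z=z | A=a) is estimated per arm.
    """
    actions = np.asarray(actions)
    z_vals = np.asarray(z_vals)
    rewards = np.asarray(rewards)

    # Pooled estimate of E[Y | Z=z] using every observed round.
    ey_given_z = np.zeros(n_z)
    for z in range(n_z):
        mask = z_vals == z
        ey_given_z[z] = rewards[mask].mean() if mask.any() else 0.0

    # Arm-specific estimate of P(Z=z | A=a), with add-one smoothing.
    mu = np.zeros(n_actions)
    for a in range(n_actions):
        counts = np.bincount(z_vals[actions == a], minlength=n_z) + 1.0
        mu[a] = (counts / counts.sum()) @ ey_given_z
    return mu

# Toy simulation of a chain A -> Z -> Y: the two arms differ only
# in P(Z=1 | A=a), and the reward is Y = Z + noise.
n_rounds, p_z1 = 4000, np.array([0.2, 0.8])
a = rng.integers(0, 2, size=n_rounds)
z = (rng.random(n_rounds) < p_z1[a]).astype(int)
y = z + 0.1 * rng.standard_normal(n_rounds)

mu = separating_set_estimates(a, z, y, n_actions=2, n_z=2)
```

In a UCB-type variant one would add an exploration bonus derived from the estimator's concentration bound to each entry of `mu` before selecting the arm; in a Thompson sampling variant one would instead sample the arm means from a posterior consistent with the pooled estimates.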


Related research

- 12/03/2021, Chronological Causal Bandits: This paper studies an instance of the multi-armed bandit (MAB) problem, ...
- 06/05/2021, Causal Bandits with Unknown Graph Structure: In causal bandit problems, the action set consists of interventions on v...
- 06/27/2020, Local Causal Structure Learning and its Discovery Between Type 2 Diabetes and Bone Mineral Density: Type 2 diabetes (T2DM), one of the most prevalent chronic diseases, affe...
- 01/26/2023, Causal Bandits without Graph Learning: We study the causal bandit problem when the causal graph is unknown and ...
- 03/07/2021, Hierarchical Causal Bandit: Causal bandit is a nascent learning model where an agent sequentially ex...
- 02/20/2021, Causal Policy Gradients: Policy gradient methods can solve complex tasks but often fail when the ...
- 08/07/2023, Provably Efficient Learning in Partially Observable Contextual Bandit: In this paper, we investigate transfer learning in partially observable ...
