Adaptively Exploiting d-Separators with Causal Bandits

by Blair Bilodeau, et al.

Multi-armed bandit problems provide a framework to identify the optimal intervention over a sequence of repeated experiments. Without additional assumptions, minimax optimal performance (measured by cumulative regret) is well understood. With access to additional observed variables that d-separate the intervention from the outcome (i.e., they are a d-separator), recent causal bandit algorithms provably incur less regret. However, in practice it is desirable to be agnostic to whether the observed variables are a d-separator. Ideally, an algorithm should be adaptive; that is, it should perform nearly as well as an algorithm with oracle knowledge of the presence or absence of a d-separator. In this work, we formalize and study this notion of adaptivity, and provide a novel algorithm that simultaneously achieves (a) optimal regret when a d-separator is observed, improving on classical minimax algorithms, and (b) significantly smaller regret than recent causal bandit algorithms when the observed variables are not a d-separator. Crucially, our algorithm does not require any oracle knowledge of whether a d-separator is observed. We also generalize this adaptivity to other conditions, such as the front-door criterion.
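
To make "cumulative regret" concrete, the following is a minimal sketch of a classical UCB1 bandit, the kind of baseline that uses no additional observed variables. It is not the adaptive algorithm of the paper. The function name `ucb1_pseudo_regret`, the Bernoulli reward model, and the arm means are assumptions chosen purely for illustration.

```python
import math
import random


def ucb1_pseudo_regret(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms (illustrative assumption, not the paper's method).

    Returns the cumulative pseudo-regret and the number of pulls per arm.
    Assumes horizon >= len(means) so that every arm is pulled at least once.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of times each arm (intervention) was played
    sums = [0.0] * k      # total observed reward per arm
    best = max(means)     # mean reward of the optimal arm
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialization: play each arm once.
            arm = t - 1
        else:
            # Play the arm with the largest upper confidence bound.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Pseudo-regret: expected reward lost relative to the optimal arm.
        regret += best - means[arm]

    return regret, counts
```

Calling `ucb1_pseudo_regret([0.2, 0.5, 0.8], horizon=500)` returns the accumulated gap to the best arm together with the pull counts. The causal bandit algorithms discussed in the abstract aim to reduce this quantity by also using the observed variables, which this sketch ignores entirely.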



Bridging Adversarial and Nonstationary Multi-armed Bandit

In the multi-armed bandit framework, there are two formulations that are...

Causal Bandits on General Graphs

We study the problem of determining the best intervention in a Causal Ba...

Causal Bandits with Unknown Graph Structure

In causal bandit problems, the action set consists of interventions on v...

Hierarchical Causal Bandit

Causal bandit is a nascent learning model where an agent sequentially ex...

Optimal Stochastic Nonconvex Optimization with Bandit Feedback

In this paper, we analyze the continuous armed bandit problems for nonco...

On the Hardness of Inventory Management with Censored Demand Data

We consider a repeated newsvendor problem where the inventory manager ha...

Sparsity-Agnostic Lasso Bandit

We consider a stochastic contextual bandit problem where the dimension d...

Code Repositories


Code for our paper "Adaptively Exploiting d-Separators with Causal Bandits".