Learning Good Interventions in Causal Graphs via Covering
We study the causal bandit problem, which entails identifying a near-optimal intervention from a specified set A of (possibly non-atomic) interventions over a given causal graph. Here, an optimal intervention in A is one that maximizes the expected value of a designated reward variable in the graph, and we use the standard notion of simple regret to quantify near-optimality. For Bernoulli random variables and causal graphs on N vertices with constant in-degree, prior work achieves a worst-case simple regret guarantee of O(N/√T) over T rounds of interventions. The current work utilizes the idea of covering interventions (which are not necessarily contained in A) and establishes a simple regret guarantee of O(√(N/T)). Notably, and in contrast to prior work, our simple regret bound depends only on explicit parameters of the problem instance. We also go beyond prior work and obtain a simple regret guarantee for causal graphs with unobserved variables. Further, we perform experiments that show improvements over baselines in this setting.
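To make the simple-regret objective concrete, here is a minimal toy sketch in Python. It uses an entirely hypothetical two-parent Bernoulli causal model and a plain uniform-exploration baseline (not the covering-based algorithm of the paper): the learner spends its T-round budget sampling interventions from A, returns the one with the highest empirical mean reward, and its simple regret is the gap between the best intervention in A and the returned one.

```python
import numpy as np

# Toy causal bandit (hypothetical model): X1, X2 are Bernoulli parents of the
# reward variable Y. Each intervention in A fixes a subset of {X1, X2}.
rng = np.random.default_rng(0)

def sample_reward(intervention):
    """Sample the reward Y under do(intervention) in the toy model."""
    x1 = intervention.get("X1", rng.binomial(1, 0.5))
    x2 = intervention.get("X2", rng.binomial(1, 0.3))
    p_y = 0.2 + 0.3 * x1 + 0.4 * x2   # mean of Bernoulli reward given (X1, X2)
    return rng.binomial(1, p_y)

# A: the specified set of (possibly non-atomic) interventions.
A = [{}, {"X1": 1}, {"X2": 1}, {"X1": 1, "X2": 1}]

def expected_reward(intervention, n=20_000):
    """Monte Carlo estimate of E[Y | do(intervention)]."""
    return np.mean([sample_reward(intervention) for _ in range(n)])

def uniform_exploration(T):
    """Baseline learner: split the T-round budget evenly over A and
    return the intervention with the highest empirical mean reward."""
    means = [np.mean([sample_reward(a) for a in [a] * (T // len(A))]) for a in A]
    return A[int(np.argmax(means))]

T = 400
a_hat = uniform_exploration(T)
best = max(expected_reward(a) for a in A)
simple_regret = best - expected_reward(a_hat)
print(f"returned intervention: {a_hat}, simple regret ≈ {simple_regret:.3f}")
```

The covering-based algorithm of the paper differs from this baseline in how it allocates the T rounds, which is what yields the O(√(N/T)) guarantee; the sketch only illustrates how simple regret is measured.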