Learning Neural Causal Models with Active Interventions

09/06/2021 · by Nino Scherrer, et al.

Discovering causal structures from data is a challenging inference problem of fundamental importance in all areas of science. The appealing scaling properties of neural networks have recently led to a surge of interest in differentiable neural network-based methods for learning causal structures from data. So far, differentiable causal discovery has focused on static datasets of observational or interventional origin. In this work, we introduce an active intervention-targeting mechanism which enables quick identification of the underlying causal structure of the data-generating process. Our method significantly reduces the required number of interactions compared with random intervention targeting and is applicable to both discrete and continuous optimization formulations of learning the underlying directed acyclic graph (DAG) from data. We examine the proposed method across a wide range of settings and demonstrate superior performance on multiple benchmarks from simulated to real-world data.


1 Introduction

Learning causal structure from data is a challenging but important task that lies at the heart of scientific reasoning and accompanying progress in many disciplines (sachs2005causal; hill2016inferring; lauritzen1988local; korb2010bayesian). While there exists a plethora of methods for the task, computationally and statistically more efficient algorithms are highly desired (heinze2018causal). As a result, there has been a surge of interest in differentiable structure learning and the combination of deep learning and causal inference (scholkopf2021toward). Such methods define a structural causal model with smoothly differentiable parameters that are adjusted to fit observational data (zheng2018dags; yu2019dag; zheng2020learning; bengio2019meta; lorch21dibs; annadani2021variational), although some methods can accept interventional data, thereby significantly improving the identification of the underlying data-generating process (ke2021dependency; brouillard2020differentiable; lippe2021efficient). However, the improvement critically depends on the experiments and interventions available.

Despite advances in high-throughput methods for interventional data in specific fields (dixit2016perturb), the acquisition of interventional samples in general settings tends to be costly, technically impossible, or even unethical for specific interventions. There is, therefore, a need for efficient usage of the available interventional samples and for efficient experimental design to keep the number of interventions to a minimum.

A significant amount of prior work in causal structure learning leverages active learning and experimental design to improve identifiability in a sequential manner. These approaches are either graph theoretical (he2008active; eberhardt2012almost; hyttinen2013experiment; hauser2014two; shanmugam2015learning; kocaoglu2017experimental; kocaoglu2017cost; lindgren2018experimental; ghassami2018budgeted; ghassami2019interventional; greenewald2019sample; squires2020active), Bayesian (murphy2001active; tong2001active; masegosa2013interactive; cho2016reconstructing; ness2017bayesian; agrawal2019abcd; zemplenyi2021bayesian), or rely on Invariant Causal Prediction (gamella2020active). These methods are typically computationally very expensive and do not scale well with respect to the number of variables or the dataset size (heinze2018casual). A promising alternative is the use of active learning in a continuous optimization framework for causal structure learning from joint data. However, since the applicability of existing scores and heuristics for selecting intervention targets is limited for existing frameworks (see §4), current approaches rely on random and independent interventions and do not leverage the evidence acquired from previous experiments.

To this end, we propose a novel method for active selection of intervention targets that can easily be incorporated into many differentiable causal discovery algorithms. Since most differentiable causal discovery algorithms treat the adjacency matrix of a causal graph as a learned soft-adjacency, this matrix is readily available for parametrized sampling of different hypothesis graphs. Our method finds an intervention target that gives maximum disagreement between post-interventional sample distributions under these hypothesis graphs, and we conjecture that interventions on such nodes contain more information about the causal structure and hence enable more efficient learning. To the best of our knowledge, our paper is the first approach to combine a continuous optimization framework with active causal structure learning from observational and interventional data. We summarize our contributions as follows:


  • We propose a novel approach for selecting interventions (both single-target and multi-target) which identifies the underlying graph efficiently and can be used with any differentiable causal discovery method.

  • We introduce a novel scalable, two-phase DAG sampling procedure which efficiently generates hypothesis DAGs based on a soft-adjacency matrix.

  • We examine the proposed intervention-targeting method across a wide range of settings and demonstrate superior performance against established competitive baselines on multiple benchmarks from simulated to real-world data.

  • We provide empirical insights on the distribution of selected intervention targets and its connection to the (causal) topological order of the variables in the underlying system. Further, we show how the underlying causal graph is identified through the interventions.

2 Preliminaries

Structural Causal Model. An SCM (peters2017elements) is defined over a set of random variables $X = \{X_1, \dots, X_d\}$ (or just $X$ for short) and a directed acyclic graph (DAG) $G$ over these variables. The random variables are connected by functions $f_i$ and jointly independent noise variables $N_i$ via

$$X_i = f_i(\mathrm{PA}_i, N_i),$$

where $\mathrm{PA}_i$ are $X_i$'s parents in $G$, and directed edges in the graph represent direct causation. The conditionals $p(X_i \mid \mathrm{PA}_i)$ define the conditional distribution of each $X_i$ given its parents.

Interventions. An intervention on $X_i$ changes the conditional distribution of $X_i$ to a different distribution, hence affecting the outcome of $X_i$. Interventions can be perfect/hard or imperfect/soft. A hard intervention entirely removes the dependency of a variable on its parents, defining the conditional probability distribution of $X_i$ by some $p'(X_i)$ rather than $p(X_i \mid \mathrm{PA}_i)$. A more general form of intervention is the soft intervention, which changes the effect of the parents of $X_i$ on $X_i$ itself by modifying the conditional distribution from $p(X_i \mid \mathrm{PA}_i)$ to an alternative $\tilde{p}(X_i \mid \mathrm{PA}_i)$.
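
To make these definitions concrete, the following sketch draws ancestral samples from a toy SCM and from the distribution induced by a hard intervention. The three-variable chain and its linear-Gaussian mechanisms are illustrative assumptions of ours, not a system studied in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_scm(n, do=None, do_value=0.0):
    """Ancestral sampling from an illustrative chain SCM X0 -> X1 -> X2.

    `do` optionally names a variable index for a hard intervention, which
    replaces that variable's mechanism by the constant `do_value`.
    """
    x = np.zeros((n, 3))
    # X0 := N0 (root node, exogenous noise only)
    x[:, 0] = rng.normal(size=n) if do != 0 else do_value
    # X1 := f1(X0) + N1
    x[:, 1] = 2.0 * x[:, 0] + rng.normal(size=n) if do != 1 else do_value
    # X2 := f2(X1) + N2
    x[:, 2] = -x[:, 1] + rng.normal(size=n) if do != 2 else do_value
    return x

obs = sample_scm(1000)                       # observational samples
intv = sample_scm(1000, do=1, do_value=3.0)  # samples under do(X1 = 3)
# Under do(X1), X0 keeps its observational distribution, while X2 shifts:
print(obs[:, 2].mean(), intv[:, 2].mean())   # approx. 0 vs. approx. -3
```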

2.1 Dependency Structure Discovery from Interventions (DSDI)

We evaluate the proposed method under multiple continuous-optimization causal learning frameworks from fused data (bareinboim2016causal), one of them being DSDI (ke2021dependency). DSDI reformulates the problem of causal discovery from discrete data as a continuous optimization problem using neural networks. The framework learns the causal adjacency matrix as a matrix parameter of a neural network, trained using a three-stage iterative procedure. The first stage samples graphs under the model's current belief in the graph structure and trains the conditionals of the sampled graphs using observational data. The second stage evaluates the sampled graphs under interventional data and scores them accordingly. The final stage updates the learned adjacency matrix with the scores from stage 2. This method performs competitively compared to many other methods. However, all intervention targets in stage 2 of DSDI are chosen randomly and independently, a strategy that scales poorly to larger graphs; active intervention targeting offers a better approach.
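
The three stages can be summarized by the following high-level control-flow sketch; the four callables are caller-supplied placeholders for DSDI internals, not the framework's actual API.

```python
def dsdi_loop(sample_graphs, fit_conditionals, score_graphs, update_adjacency,
              num_rounds):
    """High-level control flow of DSDI's three-stage iterative procedure.

    sample_graphs()          -> draw hypothesis graphs from the soft-adjacency
    fit_conditionals(gs)     -> stage 1: train conditionals on observational data
    score_graphs(gs)         -> stage 2: evaluate graphs on interventional data
                                (the stage where active targeting would act)
    update_adjacency(gs, s)  -> stage 3: push the scores into the adjacency
    """
    for _ in range(num_rounds):
        graphs = sample_graphs()
        fit_conditionals(graphs)            # stage 1
        scores = score_graphs(graphs)       # stage 2
        update_adjacency(graphs, scores)    # stage 3
```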

2.2 Differentiable Causal Discovery from Interventional Data (DCDI)

We also consider DCDI (brouillard2020differentiable), which addresses causal discovery from continuous data as a continuous-constrained optimization problem, using neural networks to model the parameters of Gaussian distributions or normalizing flows (rezende2015variational) which represent the conditional distributions. Unlike DSDI's iterative training of the structural and functional parameters, DCDI optimizes the causal adjacency matrix and functional parameters jointly over the fused (observational and interventional) data space. Like DSDI, however, DCDI uses random and independent interventions.

(a) AIT procedure for an intervention target (b) Discrepancy score landscape
Figure 1: Procedure: Start by sampling a set of hypothesis graphs given the current graph belief, and apply an intervention on the chosen targets to the functional parameters, resulting in partially altered parameters. We continue by sampling sets of hypothetical interventional samples under every graph in the set and compute the corresponding discrepancy score as the ratio of the variance-between-graphs (VBG) and the variance-within-graphs (VWG). The landscape of the discrepancy score is visualized on a logarithmic scale in (b), where red denotes high values and blue low values.

3 Active Intervention Targeting

We present a score-based intervention design strategy, called Active Intervention Targeting (AIT), which is applicable to many discrete and continuous optimization formulations of causal structure learning algorithms. Furthermore, we show how our proposed method can be integrated into recent differentiable causal discovery frameworks for guided exploration using interventional data.

Assumptions. The proposed method assumes access to a belief state over the graph structure (e.g., in the form of a distribution over graphs, a probabilistic adjacency matrix, or a set of hypothesis graphs) and functional parameters characterizing the conditional relationships between variables. The proposed method does not have to assume causal sufficiency per se. However, it inherits the assumptions of the selected base framework, which may include causal sufficiency depending on the base algorithm of choice. If the underlying framework can handle unobserved variables and offers a generative method for interventional samples, our method remains applicable.

3.1 A score for intervention targeting

Given a graph belief state with its corresponding functional parameters, and a set of possible intervention targets (single-node and multi-node), we wish to select the most informative intervention target(s) with respect to identifiability of the underlying structure. Such target(s) presumably yield relatively high discrepancies between samples drawn under different hypothesis graphs, indicating uncertainty about the target's relation to its parents and/or children.

We thus construct an F-test-inspired score to determine the target exhibiting the highest discrepancies between post-interventional sample distributions generated by likely graph structures under fixed functional parameters.

In order to compare sample distributions over different graphs, we distinguish between two sources of variation: variance between graphs (VBG) and variance within graphs (VWG). While VBG characterizes the variance of sample means over multiple graphs, VWG accounts for the sample variance within a specific graph. We mask the contribution of the intervened variables to VBG and VWG, and construct our discrepancy score as the ratio $\mathrm{VBG}/\mathrm{VWG}$.

This discrepancy score attains high values for intervention targets of particular interest (see Fig. 1b for a visualization of the landscape). While VBG itself indicates which intervention targets the model is unsettled about, extending it to the proposed variance ratio enables more control over the region of interest. Given a fixed set of graphs and a fixed interventional sample size across all graphs, suppose multiple intervention targets attain high VBG. Assessing VWG allows us to distinguish between two extreme cases: (a) targets whose sample populations exhibit large VWG, and (b) targets whose sample populations exhibit low VWG. While high VBG in case (a) might be induced by an insufficient sample size due to high variance in the interventional distribution itself, case (b) clearly indicates high discrepancy between graphs and should be studied preferentially for the causal discovery process.

Input: Functional parameters, graph belief state, interventional target space
Output: Intervention target

1: Sample a set of hypothesis graphs from the graph belief state
2: for each intervention target I in the interventional target space do
3:      Perform the intervention corresponding to I on the functional parameters
4:     for each graph in the sampled set do
5:          Draw interventional samples under that graph and set the variables in I to 0
6:     end for
7:     Compute the discrepancy score S(I) = VBG / VWG
8: end for
9: Return the intervention target with the highest discrepancy score
Algorithm 1 Active Intervention Targeting (AIT)
Computational Details.

We begin by sampling a set of $K$ graphs $\{G_1, \dots, G_K\}$ from our graph structure belief state, however parametrized. This set remains fixed for all considered interventions.

Then, we fix an intervention target $I$ and apply the corresponding intervention to the functional parameters, resulting in partially altered parameters in which some conditionals have been changed. Next, we draw $M$ interventional samples under every graph and set the variables in $I$ to zero to mask out their contribution to the variance. Having collected all samples over the considered graphs for the specific intervention target $I$, we compute VBG and VWG as

$$\mathrm{VBG} = \sum_{k=1}^{K} \lVert \bar{x}_k - \bar{x} \rVert_2^2, \qquad \mathrm{VWG} = \sum_{k=1}^{K} \sum_{m=1}^{M} \lVert x_{k,m} - \bar{x}_k \rVert_2^2,$$

where $\bar{x}$ is a vector of the same dimension as any sample and denotes the overall sample mean of the interventional setting, $\bar{x}_k$ the corresponding mean for graph $G_k$, and $x_{k,m}$ the $m$-th sample of the $k$-th graph configuration. Finally, we construct the discrepancy score of $I$ as

$$\mathcal{S}(I) = \frac{\mathrm{VBG}}{\mathrm{VWG}}.$$

In contrast to the original definition of the F-score, we can ignore the normalization constants due to equal group sizes and degrees of freedom. Although the variables are dependent through the connected causal structure, we approximate the variance of the multidimensional samples by the trace of the covariance matrix, i.e., by treating the variables as independent. An outline of the method is provided in Algorithm 1.
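
As a concrete reference, a minimal NumPy sketch of the score is given below, assuming the post-interventional samples have already been drawn and stacked into an array of shape (graphs, samples, variables); function and variable names are ours, not the authors' implementation.

```python
import numpy as np

def discrepancy_score(samples, target, eps=1e-12):
    """AIT discrepancy score S = VBG / VWG for one intervention target.

    samples: array of shape (K, M, d) holding M post-interventional samples
             drawn under each of K hypothesis graphs (fixed across targets).
    target:  index (or list of indices) of the intervened variable(s),
             whose contribution to the variance is masked out.
    """
    samples = samples.copy()
    samples[:, :, target] = 0.0               # mask intervened variables

    graph_means = samples.mean(axis=1)        # (K, d) per-graph sample means
    grand_mean = graph_means.mean(axis=0)     # (d,)  overall sample mean

    # Variance between graphs: spread of the per-graph means around the
    # overall mean, summed over variables (trace of the covariance under
    # the independence approximation); normalization constants are dropped
    # since all groups share the same size.
    vbg = ((graph_means - grand_mean) ** 2).sum()

    # Variance within graphs: sample variance around each graph's own mean.
    vwg = ((samples - graph_means[:, None, :]) ** 2).sum()

    return vbg / (vwg + eps)

# Example: score all single-node targets and pick the highest-scoring one
# (in practice, a fresh sample array is drawn per candidate target).
rng = np.random.default_rng(0)
fake = rng.normal(size=(10, 256, 5))          # 10 graphs, 256 samples, 5 vars
best = int(np.argmax([discrepancy_score(fake, t) for t in range(5)]))
```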

Figure 2: Two-Stage DAG Sampling: Based on a soft-adjacency matrix, we sample a topological node ordering from an iteratively refined score which is recomputed until all nodes of the graph have been processed. We proceed by permuting the soft-adjacency according to the drawn node ordering and constraining the upper triangular part to ensure DAGness. Finally, we take independent Bernoulli draws of the unconstrained edge beliefs and arrive at a sampled DAG.

3.2 Two-Phase DAG sampling

Embedding AIT into recent differentiable causal discovery frameworks requires a graph sampler which generates a set of likely graph configurations under the current graph belief state. However, drawing samples from unconstrained graphs (e.g., partially undirected graphs, cyclic directed graphs) is an expensive multi-pass process, and we thus constrain our graph sampling space to DAGs for the present work. Since most differentiable causal structure learning algorithms learn edge beliefs in the form of a soft-adjacency matrix, we present a scalable, two-stage DAG sampling procedure which exploits structural information of the soft-adjacency beyond independent edge confidences (see Figure 2 for a visual illustration). More precisely, we start by sampling topological node orderings from an iteratively refined score and construct DAGs in the constrained space by independent Bernoulli draws over the possible edges. We can therefore guarantee DAGness by construction and do not have to rely on expensive, non-scalable techniques such as rejection sampling or Gibbs sampling. The overall method is inspired by topological sorting algorithms for DAGs, where we iteratively identify nodes with no incoming edges, remove them from the graph, and repeat until all nodes have been processed.

Soft-Adjacency. Given a learnable graph structure $\gamma$ of a graph over $d$ variables, the soft-adjacency matrix is given as $A = \sigma(\gamma)$, such that $A_{ij}$ encodes the probabilistic belief in random variable $X_j$ being a direct cause of $X_i$, where $\sigma$ denotes the sigmoid function. For ease of notation, we use $A^{(k)}$ to denote the considered soft-adjacency at iteration $k$. Note that the shape of $A^{(k)}$ changes through the iterations.

Sample node orderings. For the iterative root sampling procedure, we start at iteration $k = 0$ with the initial soft-adjacency $A^{(0)} = A$ and apply the following routine for $d$ iterations. We take the row-wise maximum of $A^{(k)}$ and arrive at a vector of independent probabilities $c$, where $c_i$ denotes the maximal probability of variable $X_i$ being a child of any other variable under the current belief state. After taking the complement $r = 1 - c$, we arrive at a vector where $r_i$ denotes the approximated probability of variable $X_i$ being a root node in the current state. In order to arrive at a normalized distribution from which to sample a root node, we apply a temperature-scaled softmax:

$$p_i = \frac{\exp(r_i / \tau)}{\sum_j \exp(r_j / \tau)},$$

where $\tau$ denotes the temperature. Temperature scaling allows us to control the distribution over nodes and to account for the entropy of the structural belief. We proceed by sampling a (root) node from this distribution, delete the corresponding row and column from $A^{(k)}$, and arrive at a shrunken soft-adjacency $A^{(k+1)}$ over the remaining variables. We repeat the procedure until all nodes have been processed, which yields a topological node ordering.

Sample DAGs based on node orderings. Given a node ordering, we permute the soft-adjacency accordingly and constrain the upper triangular part by setting its values to 0, ensuring DAGness by construction (as shown in Figure 2). Finally, we sample a DAG by independent Bernoulli draws over the remaining edge beliefs, as proposed in ke2021dependency.
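
A compact NumPy sketch of the full two-phase procedure follows; it assumes DSDI's convention that entry (i, j) of the soft-adjacency encodes the belief that X_j is a direct cause of X_i, and the function name is ours.

```python
import numpy as np

def sample_dag(soft_adj, tau=1.0, seed=None):
    """Two-phase DAG sampling from a soft-adjacency matrix (Section 3.2).

    Assumed convention: soft_adj[i, j] is the belief that X_j is a direct
    cause (parent) of X_i.
    """
    rng = np.random.default_rng(seed)
    d = soft_adj.shape[0]
    remaining = list(range(d))
    order = []
    # Phase 1: iteratively sample a topological ordering, roots first.
    while remaining:
        sub = soft_adj[np.ix_(remaining, remaining)]
        child_prob = sub.max(axis=1)       # max belief of having any parent
        root_score = 1.0 - child_prob      # approx. prob. of being a root
        p = np.exp(root_score / tau)
        p /= p.sum()                       # temperature-scaled softmax
        idx = rng.choice(len(remaining), p=p)
        order.append(remaining.pop(idx))
    # Phase 2: permute into that ordering, forbid edges from later to
    # earlier nodes (upper triangle), then draw the remaining edges
    # independently as Bernoulli variables.
    perm = np.array(order)
    permuted = soft_adj[np.ix_(perm, perm)]
    constrained = np.tril(permuted, k=-1)          # upper triangle -> 0
    sampled = (rng.random((d, d)) < constrained).astype(int)
    # Undo the permutation so rows/columns match the original node labels.
    inv = np.argsort(perm)
    return sampled[np.ix_(inv, inv)]
```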

Connection to the Plackett-Luce distribution (Luce59; plackett1975analysis):

Our proposed node ordering sampling routine can be regarded as an extension of the Plackett-Luce distribution over node permutations. In contrast to setting the scores a priori, we refine them iteratively, accounting for the previously drawn nodes when estimating the probability of a node being the root in the current iteration.

Figure 3: Adapted workflow of DSDI with Active Intervention Targeting

3.3 Applicability to DSDI

Before integrating our method into the DSDI framework, we must choose/design a graph sampler based on DSDI’s graph belief characterization and define a sampling routine to generate interventional samples under a given state of the structural and functional parameters.

DSDI maintains a learnable graph structure $\gamma$ over the variables such that $\sigma(\gamma)$ encodes the soft-adjacency matrix. This formulation naturally suggests applying the introduced two-phase DAG sampling to generate hypothetical DAGs under the current beliefs. Under these hypothetical, acyclic graph configurations, one may then apply an intervention to DSDI's functional parameters and sample data using ancestral sampling.

DSDI’s architectural choices allow a seamless integration of our proposed active intervention targeting into stage 2, where graphs are evaluated using interventional data. See Figure 3 for an illustrative description of the adapted workflow and §2.1 for a compact description of the base framework.

3.4 Applicability to DCDI

We start again by defining sampling routines for the generation of hypothetical graphs and interventional samples under a given state. DCDI also offers access to a soft-adjacency matrix, which allows the same setup as with DSDI. In order to generate interventional samples under a hypothetical graph configuration, we alter the conditionals of the intervened variables and perform ancestral sampling based on the model's learned conditional densities.

Embedding AIT into DCDI allows us to predict an interventional target space instead of relying on random interventional samples chosen out of the full target space. In contrast to the unconstrained target space of the original formulation, we estimate a target space of constrained size using AIT and reevaluate it after a fixed number of gradient steps (see §A.6.1 for technical details).

4 Related Work

Causal induction can use observational data, interventional data, or both. With purely observational data, the causal graph is identifiable only up to a Markov equivalence class (MEC) (spirtes2000causation); interventions are needed in order to identify the underlying causal graph (eberhardt2007interventions). Our work focuses on causal induction from interventional data.

Causal Structure Learning. There exist several approaches to causal induction from interventional data: score-based, constraint-based, conditional-independence-test-based, and continuous optimization. We refer to (heinze2018causal; vowels2021d) for recent overviews. While most algorithms perform heuristic, guided searches through the discrete space of DAGs, zheng2018dags reformulate the task as a continuous optimization problem constrained to the zero level set of the adjacency matrix exponential. This important result has driven recent work in the field and shown promising results (kalainathan2018sam; yu2019dag; ng2019graph; lachapelle2020gradient-based; zheng2020learning; Zhu2020Causal). Due to the limitations of purely observational data, ke2021dependency and brouillard2020differentiable extend the continuous optimization framework to make use of interventional data. In work concurrent with ours, lippe2021efficient scale the approach of ke2021dependency to higher dimensions by splitting structural edge parameters into separate orientation and likelihood parameters and leveraging them in an adapted gradient formulation with lower variance. In contrast to (brouillard2020differentiable; ke2021dependency) and our work, they require interventional data on every variable.

Active Causal Structure Learning. Interventions are usually hard to perform and in some cases even impossible (peters2017elements). Minimizing the number of interventions performed is desirable. Active causal structure learning addresses this problem, and a number of approaches have been proposed in the literature. These approaches can be divided into those that select intervention targets using graph-theoretic frameworks, and those using Bayesian methods and information gain.

Graph-theoretic frameworks usually proceed from a pre-specified MEC or CPDAG (completed partially directed acyclic graph) and either investigate special graph substructures (he2008active) such as cliques (eberhardt2012almost; squires2020active), trees (greenewald2019sample), or they prune and orient edges until a satisfactory solution is reached (ghassami2018budgeted; ghassami2019interventional; hyttinen2013experiment), perhaps under a cost budget (kocaoglu2017cost; lindgren2018experimental). Their chief limitation is that an incorrect starting CPDAG can prevent reaching the correct graph structure even with an optimal choice of interventions.

The other popular set of techniques involves sampling graphs from the posterior distribution in a Bayesian framework using MCMC and then selecting the interventions which maximize the information gain on discrete (murphy2001active; tong2001active) or Gaussian (cho2016reconstructing) variables. The drawbacks of these techniques are poor scaling and the difficulty of integrating them with non-Bayesian methods, except perhaps by bootstrapping (agrawal2019abcd).

In contrast to existing work, our base frameworks do not start from a pre-specified MEC or CPDAG, and existing graph-theoretical approaches are hence not applicable unless we pre-initialize them with a known skeleton. Moreover, even if we provide access to a predefined structure in the form of an MEC or CPDAG, a previously directed edge is likely to be inverted during the ongoing process, which contradicts the underlying assumptions of existing approaches. Further, we build atop non-Bayesian frameworks and are therefore limited in applying methods based on information gain, which require access to a posterior distribution over graph structures. While bootstrapping would allow us to approximate the posterior distribution over graph structures in our non-Bayesian setting, it is not guaranteed to achieve full support over all graphs, since the support is limited to the graphs estimated in the bootstrap procedure (agrawal2019abcd). Furthermore, the computational burden of bootstrapping would limit us in scaling to graphs of larger size.

5 Experiments

We evaluate the proposed active intervention targeting mechanism on single-target interventions under two different settings: DSDI (ke2021dependency) and DCDI (brouillard2020differentiable). We investigate the impact of AIT under both settings on identifiability, sample complexity, and convergence behaviour, compared to random targeting where the next intervention target is chosen independently of the current evidence. In a further line of experiments, we analyze the targeting dynamics with respect to convergence behaviour and the distribution of target node selections. This section highlights our results on DSDI while also including key results for DCDI (structural discovery and identifiability); further ablation studies and analyses of the DCDI results are deferred to the appendix.

Evaluation Setup.

A huge variety of SCMs and their induced DAGs exist, each of which can stress causal structure discovery algorithms in different ways. We perform a systematic evaluation over a selected set of synthetic and non-synthetic SCMs (and datasets). We distinguish between synthetic structured graphs and random graphs, the latter generated from the Erdős–Rényi (ER) model with varying edge densities (see §A.2 for a detailed description of the setup). For conciseness, in this section we report results only on 15-node graphs for the noise-free synthetic setting of AIT on DSDI, and on 10-node graphs for the noisy setting of AIT on DSDI (discrete data). In addition, we report key results on 10-node graphs for AIT on DCDI (continuous data) in the main text and provide further results and ablation studies in the appendix. We complete the setup with the Sachs flow cytometry dataset (sachs2005causal) and the Asia network (lauritzen1988local) to evaluate the proposed method on well-known real-world datasets for causal structure discovery.¹

¹ The real-world datasets are available through a Creative Commons Attribution-Share Alike License in the bnlearn R package, and most baseline implementations are available for Python in the Causal Discovery Toolbox (kalainathan2019causal) under an MIT license. A-ICP is provided by the authors at https://github.com/juangamella/aicp, but without a license.

Key Findings.

(a) We report strong results for actively targeted structure discovery on both discrete and continuous-valued datasets and outperform random targeting in all experiments. (b) The proposed intervention targeting mechanism significantly reduces sample complexity, with strong benefits for graphs of increasing size and density. (c) The distribution of target selections during graph exploration is strongly connected to the topology of the underlying graph. (d) Undesirable interventions are drastically reduced. (e) When monitoring structural Hamming distance (SHD) throughout the procedure, an "elbow" point appears approximately when the Markov equivalence class (MEC) has been isolated. (f) Active targeting introduces desirable properties such as improved recovery of erroneously converging edges. (g) AIT significantly improves robustness in noise-perturbed environments.

5.1 Structure discovery: Synthetic datasets

We evaluate accuracy in terms of structural Hamming distance (SHD) (acid2003searching) on a diverse set of synthetic non-linear datasets under both DSDI and DCDI, adopting their respective evaluation setups.
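
For reference, the sketch below computes SHD under one common convention (a reversed edge counts as a single error); this is our own helper, not the evaluation code of the cited works.

```python
import numpy as np

def shd(a_true, a_pred):
    """Structural Hamming distance between two binary DAG adjacency matrices.

    Counts missing and extra edges, treating a reversed edge as a single
    error (one common convention; variants count reversals as two).
    """
    diff = np.abs(a_true - a_pred)
    # A reversed edge produces two differing entries, (i, j) and (j, i);
    # symmetrize and count each unordered pair only once.
    sym = diff + diff.T
    return int((np.triu(sym) > 0).sum())

# Example: true chain 0 -> 1 -> 2 vs. a prediction with edge 1 -> 2 reversed.
a_true = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
a_pred = np.array([[0, 1, 0], [0, 0, 0], [0, 1, 0]])
print(shd(a_true, a_pred))  # 1
```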

The results of DSDI with AIT are reported in Table 1. DSDI with active intervention targeting outperforms all baselines and DSDI with random intervention targeting over all presented datasets. It enables almost perfect identifiability on all structured graphs of size 15 except for the full15 graph, and significantly improves structure discovery of random graphs with varying densities. As the size or density of the underlying causal graphs increases, the benefit of the selection policy becomes more apparent (see Figure 4).

We also examine the effectiveness of our proposed method for DCDI (brouillard2020differentiable) on non-linear data from random graphs of size 10. Active intervention targeting improves identification in terms of sample complexity and structural identifiability compared to random exploration (see Figure 6 and §A.6 for further analyses). We observe that the targeting mechanism, which controls the order and frequency of intervention targets presented to the model, has a clear impact. Further experimental results for DCDI can be found in the appendix.

                                        Structured Graphs                        Random Graphs
                                   Chain  Collider  Tree  Bidiag  Jungle  Full    ER-1      ER-2      ER-4
GES (chickering2002optimal)          13      1       12     14      14     69     8.3 ()   17.6 ()   39.4 ()
GIES (hauser2012characterization)    13      6       10     17      23     60    10.9 ()   18.1 ()   39.3 ()
ICP (peters2016causal)               14     14       14     27      26    105    16.2 ()   31.1 ()   60.1 ()
A-ICP (gamella2020active)            14     14       14     27      26    105    16.2 ()   31.1 ()   60.1 ()
NOTEARS (zheng2018dags)              22     21       26     33      35     93    23.7 ()   35.8 ()   59.5 ()
DAG-GNN (yu2019dag)                  11     14       15     27      25     97    16.0 ()   30.6 ()   59.7 ()
DSDI (Random) (ke2021dependency)      0      0        2      3       7     24     1.4 ()    2.1 ()    7.2 ()
DSDI (AIT)                            0      0        0      0       0      7     0.0 ()    0.0 ()    0.0 ()
Table 1: SHD (lower is better) on various 15-variable synthetic datasets. Structured graphs are sorted in ascending order according to their edge density. ER columns denote average SHD over 10 random graphs.
                                   Sachs  Asia
GES (chickering2002optimal)          19     4
GIES (hauser2012characterization)    16    11
ICP (peters2016causal)               17     8
A-ICP (gamella2020active)            17     8
NOTEARS (zheng2018dags)              22    14
DAG-GNN (yu2019dag)                  19    10
DSDI (Random) (ke2021dependency)      6     0
DSDI (AIT)                            6     0
Table 2: SHD (lower is better) on two real-world datasets.

5.2 Structure discovery: flow cytometry and Asia datasets

While the synthetic datasets systematically explore the strengths and weaknesses of causal structure discovery methods, we further evaluate their capabilities on the real-world flow cytometry dataset (also known as the Sachs network) (sachs2005causal) and the Asia network (lauritzen1988local) from the BnLearn repository. DSDI with active intervention targeting outperforms all measured baselines and achieves the same result as random targeting in terms of SHD, but with reduced sample complexity. Although AIT deviates by only 6 undirected edges from the (consensus) ground-truth structure of Sachs et al. (sachs2005causal), there is some concern about the correctness of this graph and the differing assumptions associated with the dataset (mooij2020jointcausal; zemplenyi2021bayesian). Perfect identification may therefore not be achievable by any method in the Sachs setting.

5.3 Effect of intervention targeting on sample complexity

Aside from the significantly improved identification of underlying causal structures, our method allows for a substantial reduction in interventional sample complexity. After reaching the “elbow” point in terms of structural Hamming distance, random intervention targeting requires a fairly long time to converge to a solution within the MEC. In contrast, our proposed technique continues to select informative intervention targets beyond the elbow point and more quickly converges to the correct graph within the MEC. The continued effectiveness of our method directly translates to increased sample-efficiency and convergence speed, and is apparent for all examined datasets (see Figure 4).

(a) ER-1: (b) ER-2: (c) ER-4:
Figure 4: DSDI with active intervention targeting (orange) leads to superior performance over random intervention targeting (blue) on random graphs of size 15. The performance gap becomes more significant with increasing edge density. The plot shows average performance in terms of SHD and the percentage of correctly identified ground-truth edges. Error bands were estimated using 10 random ER graphs per setting.

5.4 Distribution of intervention targets

Careful study of the behaviour of the proposed method on our chosen synthetic graphs enables us to reason about the method's underlying dynamics. Analyzing the dynamics of intervention targeting reveals that the distribution of target node selections is linked to the topology of the underlying graph. More specifically, the number of selections of a given target node strongly correlates with its out-degree and number of descendants in the underlying ground-truth graph structure (see Figure 5). That our method prefers interventions on nodes with greater (downstream) impact on the overall system can be observed most clearly in the distribution of target selections for the chain and jungle synthetic graphs in Figure 7.
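
The kind of analysis behind Figure 5 can be sketched as follows; the helper is illustrative and assumes a binary ground-truth adjacency with edges oriented cause → effect.

```python
import numpy as np

def topology_correlations(adj, selection_counts):
    """Correlate per-node intervention-selection counts with topological
    properties of the ground-truth DAG (illustrative analysis helper).

    adj: binary matrix with adj[i, j] = 1 for an edge i -> j (cause -> effect).
    selection_counts: how often each node was chosen as intervention target.
    """
    d = adj.shape[0]
    out_degree = adj.sum(axis=1)
    # Transitive closure via a simple fixed-point iteration: reach[i, j] = 1
    # iff there is a directed path from i to j (d iterations suffice).
    reach = adj.astype(int)
    for _ in range(d):
        reach = np.clip(reach + reach @ reach, 0, 1)
    n_descendants = reach.sum(axis=1)
    return (np.corrcoef(selection_counts, out_degree)[0, 1],
            np.corrcoef(selection_counts, n_descendants)[0, 1])

# Example on a chain 0 -> 1 -> 2 with mostly-root selections.
adj = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
print(topology_correlations(adj, np.array([8, 3, 1])))
```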

5.5 Reduction of undesirable interventions

An intervention destroys the original causal influence of other variables on the intervened target variable, so its samples cannot be used to determine the causal parents of that variable in the undisturbed system. Therefore, if a variable without children is detected, interventions upon it should be avoided, since they effectively result in redundant observational samples of the remaining variables that are of no benefit for causal structure discovery. Active intervention targeting leads to the desirable property that interventions on such variables are drastically reduced (see Figures 6 and 7).

(a) Random Targeting (b) Active Intervention Targeting
Figure 5: Correlation scores between the number of individual target selections and different topological properties of those targets. AIT shows strong correlations with the measured properties over all graphs, which indicates a controlled discovery of the underlying structure through preferential targeting of nodes with greater (downstream) impact on the overall system.
Figure 6: DCDI guided by AIT (orange) allows a more rapid discovery of the underlying causal structure compared to DCDI relying on random interventions (blue). The distribution of selected intervention targets again shows a strong connection to the topology of the underlying graph (depicted on the left), where nodes of greater impact on the overall system are preferentially studied and nodes without children are rarely chosen.

5.6 Identification of Markov equivalence class

Investigating the evolution of the target distribution over time reveals that the discovery appears to be divided into two phases of exploration: Phase 1 lasts until the elbow point in terms of SHD, and Phase 2 from the elbow point until convergence (see Figure 4). We observed over multiple experiments that Phase 1 tends to quickly discover the underlying skeleton (removing superfluous connections while keeping some edges undirected), until a belief state is reached representing an MEC, or a class of graphs very close to an MEC. Phase 2 predominantly operates on the partially directed skeleton, directing the remaining edges.

(a) Graph: chain15
(b) Graph: jungle15

Figure 7: DSDI: Dynamics and target distribution of AIT for two structured graphs of size 15. Both graphs’ nodes are sorted in topological order, root node first. chain15 is a linear graph, while jungle15 is binary-tree-like with 4 levels. Until the SHD “elbow” point is reached, the selection of nodes in the chain15 graph is almost uniform, while afterwards it is clearly leaning towards the root. For the much denser jungle15, the multi-level structure characteristic of tree-like graphs is readily apparent even before the elbow point. In all cases, nodes with no children are very rarely chosen.

5.7 Recovery of erroneously converging edges

Recovery of incorrectly converging edges critically depends on adapting the order of interventions, which a random intervention policy cannot do. In sharp contrast, active intervention targeting significantly promotes early recovery (see Figure 8). The observed edge dynamics and the corresponding graph belief states indicate that the random policy can lock itself into unfavorable belief states from which recovery is extremely difficult, while AIT provides an escape hatch throughout learning.

Figure 8: DSDI: Edge dynamics of collider15 with active intervention targeting. Green curves denote present ground-truth edges, whose beliefs should converge to 1; red curves denote absent edges in the ground-truth graph, whose beliefs should converge to 0.

5.8 AIT improves robustness in noise-perturbed environments

Considering that noise significantly impairs the performance of causal discovery, we examine the performance of active intervention targeting in noise-perturbed environments with respect to SHD and convergence speed, and compare it with random intervention targeting. We conduct experiments under different noise levels in the setting of binary data generated from structured and random graphs of varying density. A noise level denotes the probability of flipping a variable's value and is applied to all measured variables of both observational and interventional samples.
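
As a sketch of this noise model (under the assumption of binary 0/1 samples), each measurement is flipped independently with the given probability:

```python
import numpy as np

def perturb(samples, noise_level, rng=None):
    """Flip each binary (0/1) measurement independently with probability
    noise_level, applied uniformly to observational and interventional data.
    """
    rng = rng or np.random.default_rng()
    flips = rng.random(samples.shape) < noise_level
    return np.where(flips, 1 - samples, samples)

# Example: 5% noise, i.e. on average every 20th measured value is flipped.
data = np.random.default_rng(0).integers(0, 2, size=(1000, 10))
noisy = perturb(data, 0.05)
```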

Across all examined settings, we observe that active intervention targeting significantly improves identifiability in contrast to random targeting (see Table 3). Active intervention targeting perfectly identifies all structured graphs, except for the collider and full graphs, up to a noise level of 0.05, i.e., where on average every 20th variable is flipped.

The observed performance boost is even more noticeable in the convergence speed, as shown in Fig. 9 for ER-4 graphs spanning 10 variables. While the convergence gap becomes more significant with increasing noise level, random targeting does not converge to the ground-truth graphs beyond a certain noise level, whereas AIT still converges to the correct graph and even shows a tendency to converge at the highest examined noise level. These findings support our observation from different experiments that active intervention targeting leads to a more controlled and robust graph discovery. Further experimental results in noise-perturbed environments can be found in the appendix.

         Chain10  Collider10  Tree10  Bidiag10  Jungle10  Full10    ER-1      ER-2      ER-4
Random      0         0          0        0         0        0      0.0 ()    0.0 ()    0.0 ()
AIT         0         0          0        0         0        0      0.0 ()    0.0 ()    0.0 ()
Random      0         0          0        0         0        3      0.0 ()    0.0 ()    0.6 ()
AIT         0         0          0        0         0        0      0.0 ()    0.0 ()    0.0 ()
Random      0         4          0        0         0       12      0.0 ()    0.0 ()    6.0 ()
AIT         0         0          0        0         0        3      0.0 ()    0.0 ()    0.0 ()
Random      1         9          0        2         1       33      1.3 ()    8.0 ()   27.0 ()
AIT         0         7          0        0         0       23      0.0 ()    1.3 ()   18.7 ()
Random      9         9          9       16        16       45     11.0 ()   20.7 ()   40.0 ()
AIT         7         9          6       16        15       44     10.3 ()   20.0 ()   39.3 ()
Table 3: Performance evaluation (SHD) under different noise levels (increasing from top to bottom; each pair of rows shares a noise level) for structured and random graphs. ER columns denote average SHD over 3 random graphs.
Figure 9: Convergence behaviour in terms of SHD for random ER-4 graphs over 10 variables under four different noise levels, where Active Intervention Targeting (orange) clearly outperforms Random Targeting (blue) at all noise levels. The performance gap grows as the noise level increases. Error bands were estimated using 3 random ER graphs per setting.

6 Conclusion

Promising results have driven a recent surge of interest in continuous optimization methods for Bayesian network structure learning from observational and interventional data. In this work, we proposed an active learning method for choosing interventions which identify the underlying graph efficiently in the setting of differentiable causal discovery. We show that active intervention targeting improves not only sample efficiency but also identification of the underlying causal structures compared to random targeting of interventions.

While our method shows significant improvements with respect to sample efficiency and graph recovery over existing methods across multiple noise-free and noise-perturbed datasets, the number of interventions is not yet optimal (atkinson1975optimal; eberhardt2012number) and can potentially be reduced in future work. Further, the interventional samples were presented to the evaluated frameworks according to a fixed learning scheme (e.g., a fixed number of samples per evaluated intervention in graph scoring). It would be interesting to see whether the information discovered by AIT could be used for a more adaptive learning procedure to further improve sample efficiency.

References

Appendix A Appendix

A.1 Two-Stage DAG Sampling

We present an outline of the proposed two-stage DAG sampling procedure, which exploits structural information of the soft-adjacency beyond independent edge confidences. The routine is based on a graph belief state given by a soft-adjacency characterization. We start by sampling topological node orderings from an iteratively refined score and construct DAGs in the constrained space by independent Bernoulli draws over the possible edges. We can therefore guarantee DAGness by construction.

The temperature parameter $\tau$ of the temperature-scaled softmax can be used to account for the entropy of the graph belief state; in the general setting we suggest a moderate initialization. Note that the limit $\tau \to 0$ results in always picking the maximizing argument, while $\tau \to \infty$ results in a uniform distribution.

Input: Graph belief state in the form of a soft-adjacency matrix A
Output: DAG adjacency matrix D

1: A(0) ← A
2: ▷ Phase 1: Sample node ordering
3: order ← ()
4: for k = 0 to d − 1 do
5:     c ← row-wise maximum of A(k)     ▷ max probability of each node being a child
6:     r ← 1 − c                        ▷ approximate probability of being a root
7:     p ← softmax(r / τ)               ▷ temperature-scaled softmax
8:     sample a node v ∼ p and append v to order
9:     remove v's row and column from A(k)
10:    A(k+1) ← shrunken soft-adjacency over the remaining variables
11: end for
12: ▷ Phase 2: Sample DAG based on node ordering
13: permute A according to order
14: constrain the upper triangular part by setting its values to 0
15: D ← independent Bernoulli draws over the remaining edge beliefs
16: apply the inverse permutation of order to D
Algorithm 2 Two-Stage DAG Sampling

A.2 Experimental Setup

A huge variety of SCMs and their induced DAGs exist, each of which can stress causal structure discovery algorithms in different ways. In this work, we perform a systematic evaluation over a selected set of synthetic and non-synthetic SCMs. We distinguish between discrete-valued (based on DSDI (ke2021dependency)) and continuous-valued (based on DCDI (brouillard2020differentiable)) random variables. Throughout all experiments, we limit ourselves to 1000 samples per intervention.

A.2.1 Synthetic Datasets

Graph Structure. We adopt the structured graphs (see Fig. 10) proposed in the work of DSDI (ke2021dependency), as they compactly represent the topological diversity of possible DAGs. They can be split into a group of graphs whose undirected skeletons are acyclic and a group whose skeletons contain cycles. Extending the setup with random graphs of varying edge densities, generated from the Erdős–Rényi (ER) model, allows us to assess the generalized performance of the proposed method from sparse to dense DAGs.

Discrete Data Generation. We adopt the generative setup of DSDI (ke2021dependency) and model the SCMs using two-layer MLPs with Leaky ReLU activations between layers. For every variable $X_i$, a separate MLP models the conditional relationship $p(X_i \mid \mathrm{PA}_i)$. The MLP parameters are initialized orthogonally and the biases uniformly, each within a fixed range.

Continuous Data Generation. For the evaluation of the adapted DCDI framework, we adopt their generative setup as described in (brouillard2020differentiable) and use the existing non-linear datasets.

Graphs with acyclic skeletons:


(a) Chain (b) Collider (c) Tree

Graphs with cyclic skeletons:


(d) Bidiag (e) Jungle (f) Full
Figure 10: Visualization of Structured Graphs as proposed in ke2021dependency - adapted illustration

A.2.2 Real-World Datasets

Besides the many synthetic graphs, we evaluate our method on real-world datasets provided by the BnLearn data repository. Namely on the Asia (lauritzen1988local) and the Sachs (sachs2005causal) datasets (see Fig. 11 for a visualization of their underlying ground-truth structure). Sachs (sachs2005causal) represents a systems biology dataset which exhibits non-linearity, confounding and complex structure.

(a) Asia (b) Sachs
Figure 11: Ground-truth structure of the evaluated real-world datasets provided by the BnLearn data repository - Illustration from: https://www.bnlearn.com/bnrepository/discrete-small.html

A.3 Availability of Used (Existing) Assets

Base Frameworks.

Baseline Methods.

Datasets.

A.4 Hyper-Parameters

We used a similar set of hyperparameters for our AIT + DSDI and AIT + DCDI models as those used in the original papers (ke2021dependency; brouillard2020differentiable). The specific hyperparameters are stated below.

DSDI.

Number of iterations                           1000
Batch size                                     256
Sparsity regularizer                           0.1
DAG regularizer                                0.5
Functional parameter training iterations       10000
Number of interventions per phase 2            25
Number of data batches for scoring             10
Number of graph configurations for scoring:
  - Graph size 5:  10
  - Graph size 10: 20
  - Graph size 15: 40
AIT:
  - Number of graph configurations                        100
  - Number of interventional samples per graph & target   256
Table 4: Hyperparameters for DSDI including the corresponding AIT parameters.

DCDI.

0
2
0.9
Augmented Lagrangian threshold
Learning rate
Nr. of hidden units                            16
Nr. of hidden layers                           2
AIT:
  - Number of graph configurations                        100
  - Number of interventional samples per graph & target   256
Table 5: Hyperparameters for DCDI including the corresponding AIT parameters.

A.5 Discrete Setting: Additional Experiments and Results

In this section, we show further results and visualizations of experiments on discrete data and single-target interventions in various settings (such as graphs of varying size, noise-free vs. noise-perturbed, limited intervention targets). All experiments are based on the framework DSDI.

A.5.1 Evaluation (SHD) on graphs of varying size and density

                                        Structured Graphs                        Random Graphs
                                   Chain  Collider  Tree  Bidiag  Jungle  Full    ER-1
GES (chickering2002optimal)           3      0        4      6       4      9     4.3 ()
GIES (hauser2012characterization)     3      4        2      6       5     10     4.7 ()
ICP (peters2016causal)                4      4        4      7       6     10     5.4 ()
A-ICP (gamella2020active)             4      4        4      7       6     10     5.4 ()
NOTEARS (zheng2018dags)               5      3        6      5       7      9     6.1 ()
DAG-GNN (yu2019dag)                   4      4        3      4       6      9     5.1 ()
DSDI (Random) (ke2021dependency)      0      0        0      0       0      0     0.0 ()
DSDI (AIT)                            0      0        0      0       0      0     0.0 ()
Table 6: SHD (lower is better) on various 5-variable synthetic datasets. Structured graphs are sorted in ascending order according to their edge density. ER values denote average SHD over 10 random graphs. † ER-2 graphs on 5 nodes coincide with the full5 graph, and ER-4 graphs on 5 nodes do not exist.
                                        Structured Graphs                        Random Graphs
                                   Chain  Collider  Tree  Bidiag  Jungle  Full    ER-1      ER-2      ER-4
GES (chickering2002optimal)           9      2        6      8      10     35     7.0 ()   10.7 ()   26.7 ()
GIES (hauser2012characterization)    12      6       13     16       9     20    12.2 ()   14.1 ()   26.1 ()
ICP (peters2016causal)                9      9        9     17      16     45    10.6 ()   20.7 ()   39.8 ()
A-ICP (gamella2020active)             9      9        9     17      16     45    10.6 ()   20.7 ()   39.8 ()
NOTEARS (zheng2018dags)              13     16       12     21      21     42    16.4 ()   22.9 ()   36.6 ()
DAG-GNN (yu2019dag)                   8      7        6     15      13     38    10.3 ()   20.1 ()   38.4 ()
DSDI (Random) (ke2021dependency)      0      0        0      0       0      0     0.0 ()    0.0 ()    0.0 ()
DSDI (AIT)                            0      0        0      0       0      0     0.0 ()    0.0 ()    0.0 ()
Table 7: SHD (lower is better) on various 10-variable synthetic datasets. Structured graphs are sorted in ascending order according to their edge density. ER values denote average SHD over 10 random graphs.
                                        Structured Graphs                        Random Graphs
                                   Chain  Collider  Tree  Bidiag  Jungle  Full    ER-1      ER-2      ER-4
GES (chickering2002optimal)          13      1       12     14      14     69     8.3 ()   17.6 ()   39.4 ()
GIES (hauser2012characterization)    13      6       10     17      23     60    10.9 ()   18.1 ()   39.3 ()
ICP (peters2016causal)               14     14       14     27      26    105    16.2 ()   31.1 ()   60.1 ()
A-ICP (gamella2020active)            14     14       14     27      26    105    16.2 ()   31.1 ()   60.1 ()
NOTEARS (zheng2018dags)              22     21       26     33      35     93    23.7 ()   35.8 ()   59.5 ()
DAG-GNN (yu2019dag)                  11     14       15     27      25     97    16.0 ()   30.6 ()   59.7 ()
DSDI (Random) (ke2021dependency)      0      0        2      3       7     24     1.4 ()    2.1 ()    7.2 ()
DSDI (AIT)                            0      0        0      0       0      8     0.0 ()    0.0 ()    0.0 ()
Table 8: SHD (lower is better) on various 15-variable synthetic datasets. Structured graphs are sorted in ascending order according to their edge density. ER values denote average SHD over 10 random graphs.

A.5.2 Evaluation of convergence speed on graphs of varying size and density

While we have shown the effectiveness of AIT on random ER graphs of size 15 in §5.3, we observe similar effects on ER graphs of size 10 (see Figure 12). Overall, the results indicate a greater impact of our proposed targeting mechanism on graphs of larger size, where random intervention targeting scales poorly.

(a) ER-1: (b) ER-2: (c) ER-4:
Figure 12: DSDI with AIT (orange) leads to superior performance over random intervention targeting (blue) on random graphs of size 10 of varying edge densities. Error bands were estimated using 10 random ER graphs per setting.
(a) ER-1: (b) ER-2: (c) ER-4:
Figure 13: DSDI with AIT (orange) leads to superior performance over random intervention targeting (blue) on random graphs of size 15 of varying edge densities. Error bands were estimated using 10 random ER graphs per setting.

A.5.3 Target selection analysis for graphs of varying size and density

We evaluate the distribution of target node selections over multiple DAGs of varying size to investigate the behaviour of our proposed method. Over all performed experiments, our method prefers interventions on nodes with greater (downstream) impact on the overall system, i.e. nodes of higher topological rank in the underlying DAG.

(a) Random Targeting (b) Active Intervention Targeting
Figure 14: Correlation scores over graphs of varying size and density between the number of individual target selections and different topological properties of those targets. AIT shows strong correlations with the measured properties over all graphs, which indicates a controlled discovery of the underlying structure through preferential targeting of nodes with greater (downstream) impact on the overall system.

A.5.4 Visualization of target distribution on structured graphs of size 5

Figure 15: Visualization of target selection on structured graphs of size 5; larger node size denotes more frequent selection of that node. While random targeting behaves as expected and selects every node roughly uniformly, AIT prefers targeting nodes with greater (downstream) impact on the overall system, i.e., nodes of higher topological order.

A.5.5 Extended Analysis of Edge Dynamics

We show the edge dynamics of all structured graphs over 15 variables and compare the dynamics of random targeting to active intervention targeting in a noise-free setting with access to all possible single-target interventions.

(a) Graph: Chain (b) Graph: Tree (c) Graph: Collider

Figure 16: Edge Dynamics of the examined structured graphs spanning over 15 variables - Part 1: The upper part shows the dynamics of random targeting and the lower of active intervention targeting.

(d) Graph: Bidiag (e) Graph: Jungle (f) Graph: Full

Figure 17: Edge Dynamics of the examined structured graphs spanning over 15 variables - Part 2: The upper part shows the dynamics of random targeting and the lower of active intervention targeting.

A.5.6 Improved robustness with DSDI+AIT in noise-perturbed environments

While §5.8 highlights our key findings in noise-perturbed systems, in this section we examine the impact of AIT in noise-perturbed environments more thoroughly. To this end, we systematically analyze experiments under different noise levels in the setting of binary data generated from random graphs of varying densities. A noise level denotes the probability of flipping a variable's value and is applied to all measured variables of both observational and interventional samples.

Evaluating convergence on ER graphs of varying densities over 10 variables under different noise levels reveals that the impact of AIT becomes larger as the density of the graph and the noise level increase.

Figure 18: Convergence behaviour in terms of SHD for random ER graphs of various densities (ER-1, ER-2, ER-4) over 10 variables under different noise levels. Overall, Active Intervention Targeting (orange) clearly outperforms Random Targeting (blue) across all densities and noise levels. The performance gap grows as the density of the graph and the noise level increase. Error bands were estimated using 3 random ER graphs per setting.

A.5.7 Limited Intervention Targets

While we allow access to all possible single-target interventions in all other experiments, real-world settings are usually more restrictive: specific interventions might be technically impossible or even unethical, or experimenters might want to avoid interventions upon specific target nodes due to increased experimental costs. In order to test the capability of AIT, we limit the set of possible intervention targets in the following experiments and analyze the resulting behaviour based on DSDI. We examine the speed of convergence and the effect on the target distribution under different scenarios on structured graphs, using DSDI with AIT and single-target interventions.

Scenario 1: We perform experiments on a Chain5 graph where, in each of five experiments, we block interventions on a different node; as a comparison, we additionally run one experiment with access to all targets.

Throughout the experiments, we observe that blocking interventions on nodes of a higher topological level results in greater degradation of the convergence speed than blocking interventions on lower levels (see Figure 19). Furthermore, the distribution of selected targets indicates that our method preferentially chooses neighboring nodes of a blocked target node in the restricted setting.

Figure 19: Limited intervention targets on Chain5: The impact of the restricted target node (red circled node) is clearly observable in the convergence speed (left) and distribution of target selections (middle). The speed of convergence indicates a dependence on the topological characteristic of the restricted intervention target.

Scenario 2: We perform multiple experiments on a Tree5 graph where we restrict access to different subsets of nodes (e.g. root node, set of all sink nodes) for single-target interventions.

Similar to the experiments on Chain5, we observe a clear impact of the available intervention targets on the convergence speed and the identifiability of the underlying structure (see Figure 20). While preventing interventions on all sink nodes (nodes 2, 3, and 4) results in improved convergence towards the underlying structure, restricted access to the set of nodes which act as causes of other nodes (nodes 0 and 1) prevents us from identifying the correct underlying structure.

Figure 20: Limited intervention targets on Tree5: The impact of the restricted target nodes (red circled nodes) is clearly observable in the convergence speed (left) and distribution of target selections (middle).

A.6 Continuous Setting: Technical Details and Further Results

While the original DCDI framework (brouillard2020differentiable) proposes a joint optimization over the observational and interventional sample space with samples selected at random, we adapt the framework to the setting of active causal discovery, where interventional samples are acquired in an adaptive manner. We hypothesize that a controlled selection of informative intervention targets allows a more rapid and controlled discovery of the underlying causal structure.

A.6.1 Integration of AIT into DCDI

Instead of demanding the full interventional target space during the complete optimization as in the original approach, we split the optimization procedure into episodes, where AIT is used to estimate a target space of size $k$ for each episode. This is done by computing the discrepancy scores over all possible intervention targets and selecting the $k$ highest-scoring targets. During an episode, we perform gradient steps using the fixed target space and reevaluate it afterwards for the next episode. The adaptation is visualized in the following high-level outline of the individual methodologies.

1: $I$ ← full target space over all variables
2: Run DCDI on $I$ until convergence
Algorithm 3 DCDI

1: for each episode, until convergence, do
2:     $I$ ← estimate a target space of size $k$ using AIT
3:     Run a fixed number of gradient steps of DCDI on $I$
4: end for
Algorithm 4 DCDI + AIT
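
The adapted procedure can also be written as a runnable control-flow skeleton; the three callables are caller-supplied placeholders for the AIT score of §3.1 and DCDI's optimizer, not real library calls.

```python
def run_dcdi_with_ait(score_fn, step_fn, converged_fn, targets, k,
                      num_steps, max_episodes=100):
    """Episode loop of DCDI + AIT (Algorithm 4) as a high-level sketch.

    score_fn(t)    -> AIT discrepancy score for intervention target t
    step_fn(ts)    -> one DCDI gradient step restricted to target space ts
    converged_fn() -> True once the structural parameters have converged
    (all three are caller-supplied; this sketch fixes only the control flow)
    """
    for _ in range(max_episodes):
        # Estimate a constrained target space: the k highest-scoring targets.
        ranked = sorted(targets, key=score_fn, reverse=True)
        target_space = ranked[:k]
        # Run num_steps gradient steps of DCDI on the fixed target space,
        # then re-score and re-estimate the target space next episode.
        for _ in range(num_steps):
            step_fn(target_space)
        if converged_fn():
            return
```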

A.6.2 Evaluation

We evaluate the effectiveness of AIT in the base framework of DCDI in the setting of non-linear, continuous data generated from random graphs over 10 variables and show the potential of our proposed method.

Structural Identification / Convergence: Although the joint optimization formulation of DCDI is not a priori designed for the setting of experimental design, the AIT-guided version shows superior or competitive performance in terms of structural identification and sample complexity over the original formulation (see Figure 21).

Distribution of Intervention Targets: As in DSDI, we observe strong correlation of the number of target selections with the measured topological properties of the specific nodes. This indicates a controlled discovery of the underlying causal structure through preferential targeting of nodes with greater (downstream) impact on the overall system. In addition, interventions on variables without children are drastically reduced (see also §5.5 for equivalent observations in DSDI).

Effect of Target Space Size: While the original formulation assumes the full target space during the complete optimization procedure and relies on random samples out of it, our adapted AIT-guided version of DCDI constrains the target space to a subset of $k$ targets for each episode. An ablation study on the size of the target space shows that, for all examined choices of $k$, our approach outperforms the original formulation in terms of sample complexity while achieving the same or better performance in terms of SHD.

Figure 21: While vanilla DCDI assumes access to the full interventional target space throughout the complete optimization, the AIT-guided DCDI approach reevaluates its interventional target space of size $k$ after a fixed number of gradient steps. On the evaluated graphs (ground truth on the left), DCDI+AIT demonstrates a more rapid identification of the underlying causal structure while achieving the same or better performance in terms of SHD. The distribution of selected single-node intervention targets again reveals its connection to the topological properties of the corresponding nodes.
Figure 22: For all evaluated target space sizes, DCDI+AIT (orange) outperforms DCDI (blue) in terms of sample complexity while achieving the same or better performance. Error bands were estimated using 10 random ER graphs per setting.