# Almost Optimal Universal Lower Bound for Learning Causal DAGs with Atomic Interventions

A well-studied challenge that arises in the structure learning problem of causal directed acyclic graphs (DAG) is that using observational data, one can only learn the graph up to a "Markov equivalence class" (MEC). The remaining undirected edges have to be oriented using interventions, which can be very expensive to perform in applications. Thus, the problem of minimizing the number of interventions needed to fully orient the MEC has received a lot of recent attention, and is also the focus of this work. We prove two main results. The first is a new universal lower bound on the number of atomic interventions that any algorithm (whether active or passive) would need to perform in order to orient a given MEC. Our second result shows that this bound is, in fact, within a factor of two of the size of the smallest set of atomic interventions that can orient the MEC. Our lower bound is provably better than previously known lower bounds. The proof of our lower bound is based on the new notion of clique-block shared-parents (CBSP) orderings, which are topological orderings of DAGs without v-structures and satisfy certain special properties. Further, using simulations on synthetic graphs and by giving examples of special graph families, we show that our bound is often significantly better.

## 1 Introduction

Causal Bayesian Networks (CBN) provide a very convenient framework for modeling causal relationships between a collection of random variables (Pearl, 2009). A CBN is fully specified by (a) a directed acyclic graph (DAG), whose nodes model random variables of interest, and whose edges depict immediate causal relationships between the nodes, and (b) a conditional probability distribution (CPD) of each variable given its parent variables (in the DAG) such that the joint distribution of all variables factorizes as a product of these conditionals. The generality of the framework has led to CBN becoming a popular tool for the modeling of causal relationships in a variety of fields, with health science (Shen et al., 2020), molecular cell biology (Friedman, 2004), and computational advertising (Bottou et al., 2013) being a few examples.

It is well known that the underlying DAG of a CBN is not uniquely determined by the joint distribution of its nodes. In fact, the joint distribution only determines the DAG up to its Markov Equivalence Class (MEC), which is represented as a partially directed graph with well-defined combinatorial properties (Verma and Pearl, 1990; Chickering, 1995; Meek, 1995; Andersson et al., 1997). Information about which nodes are adjacent is encoded in the MEC, but the direction of several edges remains undetermined. Thus, learning algorithms based only on the observed joint distribution (Glymour et al., 2019) cannot direct these remaining edges. As a result, algorithms which use additional interventional distributions were developed (Squires et al. (2020) and references therein). In addition to the joint distribution, these algorithms also assume access to interventional distributions generated as a result of randomizing some target vertices in the original CBN (a process called intervention) and thereby breaking their dependence on any of their ancestors. A natural and well-motivated (Eberhardt et al., 2005) question, therefore, is to find the minimum number of interventions required to fully resolve the orientations of the undirected edges in the MEC.

Interventions, especially on a large set of nodes, however, can be expensive to perform (Kocaoglu et al., 2017). In this respect, the setting of atomic interventions, where each intervention is on a single node, is already very interesting and finding the smallest number of atomic interventions that can orient the MEC is well studied (Squires et al., 2020). A long line of work, including those cited above, has considered in various settings the problem of designing methods for finding the smallest set of atomic interventions that would fully orient all edges of a given MEC. An important distinction between such methods is whether they are active (He and Geng, 2008), i.e., where the directions obtained via the current intervention are available before one decides which further interventions to perform; or passive, where all the interventions to be performed have to be specified beforehand. Methods can also differ in whether or not randomness is used in selecting the targets of the interventions. An important question, therefore, is to understand how many interventions must be performed by any given method to fully orient an MEC.

#### Universal Lower Bounds

While several works have reported lower bounds (on the minimum number of atomic interventions required to orient an MEC) in different settings, a very satisfying solution concept for such lower bounds, called universal lower bounds, was proposed by Squires et al. (2020). A universal lower bound of $L$ atomic interventions for orienting a given MEC means that if a set $I$ of atomic interventions is of size less than $L$, then for every ground-truth DAG $D$ in the MEC, the set $I$ will fail to fully orient the MEC. Thus, a universal lower bound has two universality properties. First, the value of a universal lower bound depends only upon the MEC, and applies to every DAG in the MEC. Second, the lower bound applies to every set of interventions that would fully orient the MEC, without regard to the method by which the intervention set was produced.

In this work, we address the problem of obtaining tight universal lower bounds. The goal is to find a universal lower bound such that for any DAG $D$ in the MEC, the smallest set of atomic interventions that can orient the MEC into $D$ has size bounded above by a constant factor of the universal lower bound. Similar to Squires et al. (2020), we work in the setting of causally sufficient models, i.e., there are no hidden confounders, selection bias or feedback. To the best of our knowledge, this is the first work that addresses the problem of tight (up to a constant factor) universal lower bounds. We note that the best known universal lower bounds so far (Squires et al., 2020) are not tight, and we provide concrete examples of graph families that illustrate this in Section 3.2.

### 1.1 Our Contributions

We prove a new universal lower bound on the size of any set of atomic interventions that can orient a given MEC, improving upon previous work (Squires et al., 2020). We further prove that our lower bound is optimal within a factor of 2 in the class of universal lower bounds: we show that for any DAG $D$ in the MEC, there is a set of atomic interventions of size at most twice our lower bound that would fully orient the MEC if the unknown ground-truth DAG were $D$.

We also compare our new lower bound with the one obtained previously by Squires et al. (2020). We prove analytically that our lower bound is at least as good as the one given by Squires et al. (2020). We further give examples of graph classes where our bound is significantly better (in fact, it is apparent from our proof that the graphs in which the two lower bounds are close must have very special properties). We then supplement these theoretical findings with simulation results comparing our lower bound with the “true” optimal answer and with the lower bound in previous work.

Our lower bound is based on elementary combinatorial arguments drawing upon the theory of chordal graphs, and centers around a notion of certain special topological orderings of DAGs without v-structures, which we call clique-block shared-parents (CBSP) orderings (Section 3). This is in contrast to the earlier work of Squires et al. (2020), where they had to develop sophisticated notions of directed clique trees and residuals in order to prove their lower bound. We expect that the notion of CBSP orderings may also be of interest in the design of optimal intervention sets.

### 1.2 Related Work

The theoretical underpinning for many works dealing with the use of interventions for orienting an MEC can be said to be the notion of "interventional" Markov equivalence (Hauser and Bühlmann, 2012), which, roughly speaking, says that given a collection $\mathcal{I}$ of sets of targets for interventions, two DAGs $D_1$ and $D_2$ are $\mathcal{I}$-Markov equivalent if and only if for all $I \in \mathcal{I}$, the DAGs obtained by removing from $D_1$ and $D_2$ the incoming edges of all vertices in $I$ are in the same MEC (Hauser and Bühlmann, 2012, Theorem 10). Thus, interventions have the capability of distinguishing between DAGs in the same Markov equivalence class, and in particular, "interventional" Markov equivalence classes can be finer than MECs (Hauser and Bühlmann, 2012, see also Figure 1 below).
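The rough criterion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: DAGs are given as sets of directed pairs `(u, v)` meaning `u -> v`, both DAGs share the same vertex set, the empty intervention is always included, and membership in the same MEC is tested through the classical characterization (same skeleton and same v-structures) of Verma and Pearl (1990). The function names are hypothetical and are not taken from any library.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of a set of directed edges (u, v), read as u -> v."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """All triples (a, c, b) with a -> c <- b and a, b non-adjacent."""
    skel = skeleton(edges)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    found = set()
    for child, pa in parents.items():
        for a, b in combinations(sorted(pa), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found

def same_mec(edges1, edges2):
    """Same skeleton and same v-structures (observational Markov equivalence)."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))

def intervene(edges, targets):
    """Remove all edges pointing into an intervened vertex."""
    return {(u, v) for (u, v) in edges if v not in targets}

def i_markov_equivalent(edges1, edges2, interventions):
    """Check the rough criterion for every target set, plus the empty one."""
    family = [frozenset()] + [frozenset(t) for t in interventions]
    return all(same_mec(intervene(edges1, t), intervene(edges2, t))
               for t in family)

# Two orientations of the path a - b - c that are observationally equivalent.
chain_forward = {("a", "b"), ("b", "c")}
chain_backward = {("c", "b"), ("b", "a")}
```

With no interventions the two chains are indistinguishable, while a single atomic intervention on `b` separates them, which is the sense in which interventional equivalence classes can be finer than MECs.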

As described above, the problem of learning the orientations of a CBN using interventions has been studied in a wide variety of settings. Lower bounds and algorithms for the problem have been obtained in the setting of interventions of arbitrary sizes and with various cost models (Eberhardt, 2008; Shanmugam et al., 2015; Kocaoglu et al., 2017), in the setting when the underlying model is allowed to contain feedback loops (and is therefore not a CBN in the usual sense) (Hyttinen et al., 2013a, b), in settings where hidden variables are present (Addanki et al., 2020, 2021), and in interventional “sample efficiency” settings (Agrawal et al., 2019; Greenewald et al., 2019). The related notion of orienting the maximum possible number of edges given a fixed budget on the number or cost of interventions has also been studied (Hauser and Bühlmann, 2014; Ghassami et al., 2018; AhmadiTeshnizi et al., 2020). However, to the best of our knowledge, the work of Squires et al. (2020) was the first to isolate the notion of a universal lower bound, and prove a lower bound in that setting.

## 2 Preliminaries

#### Graphs

A partially directed graph (or just graph) $G = (V, E)$ consists of a set $V$ of nodes or vertices and a set $E$ of adjacencies. Each adjacency in $E$ is of the form $a - b$ or $a \to b$, where $a, b \in V$ are distinct vertices, with the condition that for any $a, b \in V$, at most one of $a - b$, $a \to b$ and $b \to a$ is present in $E$ (here $a - b$ and $b - a$ are treated as equal). If there is an adjacency in $E$ containing both $a$ and $b$, then we say that $a$ and $b$ are adjacent in $G$, or that there is an edge between $a$ and $b$ in $G$. If $a - b \in E$, then we say that the edge between $a$ and $b$ in $G$ is undirected, while if $a \to b \in E$ then we say that the edge between $a$ and $b$ is directed in $G$. $G$ is said to be undirected if all its adjacencies are undirected, and directed if all its adjacencies are directed. Given a directed graph $G$, and a vertex $v$ in $G$, we denote by $\mathrm{pa}_G(v)$ the set of nodes $u$ in $G$ such that $u \to v$ is present in $G$. A vertex $v$ in $G$ is said to be a child of $u$ if $u \in \mathrm{pa}_G(v)$. An induced subgraph of $G$ is a graph whose vertices are some subset $S$ of $V$, and whose adjacencies are all those adjacencies in $E$ both of whose elements are in $S$. This induced subgraph of $G$ is denoted as $G[S]$. The skeleton of $G$, denoted $\mathrm{skeleton}(G)$, is an undirected graph with nodes $V$ and adjacencies $a - b$ whenever $a$, $b$ are adjacent in $G$.

A cycle in a graph $G$ is a sequence of vertices $v_1, v_2, \dots, v_{k+1} = v_1$ (with $k \ge 3$) such that for each $1 \le i \le k$, either $v_i \to v_{i+1}$ or $v_i - v_{i+1}$ is present in $E$. The length of the cycle is $k$, and the cycle is said to be simple if $v_1, \dots, v_k$ are distinct. The cycle is said to have a chord if two non-consecutive vertices in the cycle are adjacent in $G$, i.e., if there exist $1 \le i < j \le k$ such that $v_i$ and $v_j$ are not consecutive on the cycle and such that $v_i$ and $v_j$ are adjacent in $G$. The cycle is said to be directed if for some $1 \le i \le k$, $v_i \to v_{i+1}$ is present in $G$. A graph $G$ is said to be a chain graph if it has no directed cycles. The chain components of a chain graph $G$ are the connected components left after removing all the directed edges from $G$. A directed acyclic graph or DAG is a directed graph without directed cycles. Note that both DAGs and undirected graphs are chain graphs. An undirected graph $G$ is said to be chordal if any simple cycle in $G$ of length at least 4 has a chord.

A clique $C$ in a graph $G$ is a subset of nodes of $G$ such that any two distinct $u$ and $v$ in $C$ are adjacent in $G$. The clique $C$ is maximal if for all $v \in V \setminus C$, the set $C \cup \{v\}$ is not a clique.

A perfect elimination ordering (PEO) $\sigma = (v_1, \dots, v_n)$ of a graph $G$ is an ordering of the nodes of $G$ such that for all $1 \le i \le n$, $\mathrm{ne}_G(v_i) \cap \{v_1, \dots, v_{i-1}\}$ is a clique in $G$, where $\mathrm{ne}_G(v_i)$ is the set of nodes adjacent to $v_i$. (Our definition of a PEO uses the same ordering convention as Hauser and Bühlmann (2014).) A graph is chordal if and only if it has a perfect elimination ordering (Blair and Peyton, 1993). A topological ordering $\sigma$ of a DAG $D$ is an ordering of the nodes of $D$ such that $\sigma(a) < \sigma(b)$ whenever $a \in \mathrm{pa}_D(b)$, where $\sigma(u)$ denotes the index of $u$ in $\sigma$. We say that $D$ is oriented according to an ordering $\sigma$ to mean that $D$ has $\sigma$ as a topological ordering.

A v-structure in a graph $G$ is an induced subgraph of the form $b \to a \leftarrow c$. It follows easily from the definitions that by orienting the edges of a chordal graph according to a perfect elimination ordering, we get a DAG without v-structures, and that the skeleton of a DAG without v-structures is chordal (see Proposition 1 of Hauser and Bühlmann (2014)). In fact, any topological ordering of a DAG $D$ without v-structures is a perfect elimination ordering of $\mathrm{skeleton}(D)$.
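The link between perfect elimination orderings and DAGs without v-structures can be checked on a small example. The sketch below assumes an undirected graph stored as a dictionary from each node to its set of neighbours, and uses the ordering convention described above (the earlier neighbours of every node must form a clique). The helper names and the example graph are illustrative assumptions only.

```python
from itertools import combinations

def is_peo(adj, order):
    """True if the earlier neighbours of every node form a clique."""
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        earlier = [u for u in adj[v] if pos[u] < pos[v]]
        if any(b not in adj[a] for a, b in combinations(earlier, 2)):
            return False
    return True

def orient(adj, order):
    """Orient every edge from the earlier endpoint to the later one."""
    pos = {v: i for i, v in enumerate(order)}
    return {(u, v) for u in adj for v in adj[u] if pos[u] < pos[v]}

def has_v_structure(edges):
    """True if some node has two non-adjacent parents."""
    adjacent = {frozenset(e) for e in edges}
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    return any(frozenset((a, b)) not in adjacent
               for pa in parents.values()
               for a, b in combinations(pa, 2))

# A 4-cycle a - b - c - d with the chord a - c (a chordal graph).
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"},
       "c": {"a", "b", "d"}, "d": {"a", "c"}}
good_order = ["a", "c", "b", "d"]   # a perfect elimination ordering
bad_order = ["b", "d", "a", "c"]    # b and d precede a but are not adjacent
```

Orienting along `good_order` gives a DAG without v-structures, while orienting along `bad_order` creates the v-structure `b -> a <- d`.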

#### Interventions

An intervention $I$ on a partially directed graph $G$ is specified as a subset of target vertices of $G$. An intervention set $\mathcal{I}$ is a set of interventions. In this paper, we make the standard assumption that the "empty" intervention, in which no vertices are intervened upon, is always included in any intervention set we consider: this corresponds to assuming that information from purely observational data is always available (see, e.g., the discussion surrounding Definition 6 of Hauser and Bühlmann (2012)). The size of an intervention set $\mathcal{I}$ is the number of interventions in $\mathcal{I}$, not counting the empty intervention.

$\mathcal{I}$ is a set of atomic interventions if $|I| = 1$ for all non-empty $I \in \mathcal{I}$. With a slight abuse of notation, we denote a set of atomic interventions $\mathcal{I} = \{\emptyset, \{v_1\}, \dots, \{v_k\}\}$ as just the set $I = \{v_1, \dots, v_k\}$ when it is clear from the context that we are talking about a set of atomic interventions.

Given an intervention set $\mathcal{I}$ and a DAG $D$, we denote, following Hauser and Bühlmann (2012), by $\mathcal{E}_{\mathcal{I}}(D)$ the partially directed graph representing the set of all DAGs that are $\mathcal{I}$-Markov equivalent to $D$. $\mathcal{E}_{\mathcal{I}}(D)$ is also known as the $\mathcal{I}$-essential graph of $D$. For a formal definition of $\mathcal{I}$-Markov equivalence, we refer to Definitions 7 and 9 of Hauser and Bühlmann (2012); we use instead the following equivalent characterization developed in the same paper.

Theorem (Characterization of $\mathcal{I}$-essential graphs, Definition 14 and Theorem 18 of Hauser and Bühlmann (2012)). Let $D$ be a DAG and $\mathcal{I}$ an intervention set. A graph $H$ is an $\mathcal{I}$-essential graph of $D$ if and only if $H$ has the same skeleton as $D$, all directed edges of $H$ are directed in the same direction as in $D$, all v-structures of $D$ are directed in $H$, and

1. $H$ is a chain graph with chordal chain components.

2. For any three vertices $a, b, c$ of $H$, the subgraph of $H$ induced by $a$, $b$ and $c$ is not $a \to b - c$.

3. If $a \to b$ in $D$ (so that $a$, $b$ are adjacent in $H$) and there is an intervention $I \in \mathcal{I}$ such that $|I \cap \{a, b\}| = 1$, then $a \to b$ is directed in $H$.

4. Every directed edge $a \to b$ in $H$ is strongly $\mathcal{I}$-protected. An edge $a \to b$ in $H$ is said to be strongly $\mathcal{I}$-protected if either (a) there is an intervention $I \in \mathcal{I}$ such that $|I \cap \{a, b\}| = 1$, or (b) at least one of the four graphs in Figure 1 appears as an induced subgraph of $H$, and $a \to b$ appears in that induced subgraph in the configuration indicated in the figure.

## 3 Universal Lower Bound

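In this section, we establish our main technical result: every DAG without v-structures has a CBSP ordering. Our new lower bounds, for DAGs without v-structures and for general DAGs, then follow easily from this combinatorial result, without having to resort to the sophisticated machinery of residuals and directed clique trees developed in previous work (Squires et al., 2020).

We begin with a definition that isolates two important properties of certain topological orderings of DAGs without v-structures. Given a DAG $D$ without v-structures, and a maximal clique $C$ of $D$, we denote by $\mathrm{sink}_D(C)$ any vertex $v$ in $C$ such that $C = \mathrm{pa}_D(v) \cup \{v\}$. The fact that $\mathrm{sink}_D(C)$ is uniquely defined, and that $\mathrm{sink}_D(C_1) \neq \mathrm{sink}_D(C_2)$ when $C_1$ and $C_2$ are distinct maximal cliques of $D$, is guaranteed by the following observation. (The standard proof of this is deferred to Section B.1.)

Observation. Let $D$ be a DAG without v-structures. Then, for every maximal clique $C$ of $D$, there is a unique vertex $v$ of $C$, denoted $\mathrm{sink}_D(C)$, such that $C = \mathrm{pa}_D(v) \cup \{v\}$. Further, for any two distinct maximal cliques $C_1$ and $C_2$ in $D$, we have $\mathrm{sink}_D(C_1) \neq \mathrm{sink}_D(C_2)$.

We refer to each vertex of $D$ that is equal to $\mathrm{sink}_D(C)$ for some maximal clique $C$ of $D$ as a maximal-clique-sink vertex of the DAG $D$. Note also that $\mathrm{sink}_D(C)$ is the unique node with out-degree $0$ in the induced subgraph $D[C]$.

Definition (CBSP ordering). Let $\sigma$ be a topological ordering of a DAG $D$ without v-structures. Let $s_1, \dots, s_r$ be the maximal-clique-sink vertices of $D$, indexed so that $\sigma(s_i) < \sigma(s_j)$ when $i < j$. (Here $r$ is the number of maximal cliques in $D$.) Then, $\sigma$ is said to be a clique-block shared-parents (CBSP) ordering of $D$ if it satisfies the following two properties:

1. P1: Clique block property. Define $L_1(\sigma)$ to be the set of nodes which occur before or at the same position as $s_1$ in $\sigma$, i.e., $L_1(\sigma) = \{u : \sigma(u) \le \sigma(s_1)\}$. Similarly, for $2 \le i \le r$, define $L_i(\sigma)$ to be the set of nodes which occur in $\sigma$ before or at the same position as $s_i$, but strictly after $s_{i-1}$ (i.e., $L_i(\sigma) = \{u : \sigma(s_{i-1}) < \sigma(u) \le \sigma(s_i)\}$). Then, for each $1 \le i \le r$, the subgraph induced by $L_i(\sigma)$ in $D$ is a (not necessarily maximal) clique.

2. P2: Shared parents property. If vertices $a$ and $b$ in $D$ are consecutive in $\sigma$ (i.e., $\sigma(b) = \sigma(a) + 1$), and also lie in the same $L_i(\sigma)$ for some $1 \le i \le r$, then all parents of $a$ are also parents of $b$ in $D$.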

We illustrate the definition with an example in Figure 2. In the figure, vertices , , and are the vertices of , and are highlighted with an underbar. The orderings , and in the figure are valid topological orderings of . However, does not satisfy P1 of section 3 (since is not a clique), while satisfies P1 of section 3, but does not satisfy P2, because in are consecutive in , but is a parent only of and not of . Finally, satisfies both P1 and P2 and hence is a ordering.
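The two properties can also be checked mechanically. The following sketch assumes a DAG without v-structures stored as a dictionary from each node to its set of parents, and assumes that the supplied ordering is already a valid topological ordering. It finds the sink of each maximal clique by using the fact that, in such a DAG, every maximal clique has the form "a node together with its parents". The function names and the five-node example are hypothetical and are not the example of Figure 2.

```python
def sink_nodes(parents):
    """Nodes v whose closed parent set pa(v) | {v} is a maximal clique."""
    closed = {v: set(pa) | {v} for v, pa in parents.items()}
    return {v for v in parents
            if not any(w != v and closed[v] <= closed[w] for w in parents)}

def blocks(parents, sigma):
    """Split sigma into blocks, each ending at a maximal-clique sink."""
    sinks = sink_nodes(parents)
    out, current = [], []
    for v in sigma:
        current.append(v)
        if v in sinks:
            out.append(current)
            current = []
    return out

def check_cbsp(parents, sigma):
    """Return (P1 holds, P2 holds) for a topological ordering sigma."""
    bl = blocks(parents, sigma)
    # P1: every block induces a clique.
    p1 = all(u in parents[v] or v in parents[u]
             for blk in bl for i, u in enumerate(blk) for v in blk[i + 1:])
    # P2: consecutive nodes a, b of a block satisfy pa(a) subset of pa(b).
    p2 = all(set(parents[a]) <= set(parents[b])
             for blk in bl for a, b in zip(blk, blk[1:]))
    return p1, p2

# Hypothetical DAG: p -> c, c -> a, a -> b, c -> w, a -> w.
dag = {"p": set(), "c": {"p"}, "a": {"c"}, "b": {"a"}, "w": {"c", "a"}}
```

In this example the ordering `p, c, a, b, w` satisfies P1 but not P2 (the parent `c` of `a` is not a parent of `b`), whereas `p, c, a, w, b` satisfies both properties.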

Our main technical result is that for any DAG $D$ that has no v-structures, there exists a CBSP ordering $\sigma$ of $D$, and the new lower bound is an easy corollary of this result. Further, the proof of this result uses only standard notions from the theory of chordal graphs.

Theorem. If $D$ is a DAG without v-structures, then $D$ has a CBSP ordering.

Towards the proof of this theorem, we note first that the existence of a topological ordering satisfying just P1 can be established using ideas from the analysis of, e.g., the "maximum cardinality search" algorithm for chordal graphs (Tarjan and Yannakakis (1984), see also Corollary 2 of Wienöbst et al. (2021)). We state this here as a lemma, and provide the proof in Section A.

Lemma. If $D$ is a DAG without v-structures, then $D$ has a topological ordering $\sigma$ satisfying property P1 of the definition of CBSP orderings.

We now prove the theorem.

###### Proof of the theorem.

Let be the set of topological orderings of which satisfy P1 of section 3. By section 3, is non-empty. If there is a which also satisfies P2 of section 3, then we are done.

We now proceed to show by contradiction that such a must indeed exist. So, suppose for the sake of contradiction that for each in , P2 is violated. Then, for each , there exist vertices and an index such that , , and there exists a parent of in that is not a parent of . For any given , we choose as above so that is as small as possible. With such a choice of for each , we then define a function by defining . Note that by the assumption that P2 is violated by each in , is defined for each in . But then, since is a finite set, there must be some for which attains its maximum value. We obtain a contradiction by exhibiting another for which is strictly larger than . We first describe the construction of from , and then prove that so constructed is in and has .

Construction. Given , let . Let and suppose that . By the definition of , there is a parent of that is not a parent of . Define to be the set . Since has no v-structures, is a clique in D. Define Let be the set of nodes occurring after in that are not in (note that , and, in general, if and only if and there is an that is not adjacent to ).

We now note the following easy to verify properties of the sets and (the proof is provided in Section B.2).

1. If and is a child of in , then .

2. Suppose that is not a node of . Then there exists such that and such that is a node in . In particular, is non-empty.

3. Suppose that is a node in the induced DAG . Then is also a node in .

The ordering is now defined as follows: the first nodes in are the same as . After this, the nodes of appear according to some topological ordering of the induced DAG that satisfies P1 of section 3 in (such a exists because of section 3 applied to the induced DAG , which also cannot have any v-structures). Finally, the nodes of appear according to their ordering in .

Proof that and . Note first that is a topological ordering of . For, if not, then there must exist and such that the edge is present in , but this cannot happen by item 1 of section 3 above.

To show that (i.e., that satisfies P1), the following notation will be useful. For each vertex in , denote by the unique such that . Similarly, denote by the unique such that . Since , we already know that is a clique for each node of . In order to show that , all we need to show is that is also a clique for each node of .

Let be the set of vertices of present in . By item 2 of section 3, the last vertex in must be an element of . Note also that since and the edge in together imply that by item 1 of section 3. We also observe that , for all . For if there exists then the edge in implies that , contradicting that .

From the construction of , we already have for all vertices that precede in . From the fact that all vertices in precede in the ordering , and from the observations above that (i) the sink node , and (ii) , for all sink nodes , we also get that for any node of such that , . Thus, when is a node of that is not in , we have that is a clique in D, since , and is a clique in D. It remains to show that is a clique when .

Let be the nodes of , arranged in increasing order by . Since satisfies P1 in , each , , is a clique in (and thus also in ). Now consider a node . Since the are nodes of (from item 3 of section 3), it follows that (if for ) or (if ). In the former case, is automatically a clique, since is a clique in . In the latter case also is a clique since , so that is a clique in (since (i) and are cliques, and (ii) by definition of , every node of is adjacent to every node in ).

Thus, we get that also satisfies P1, so that . Consider the node next to in . Since is non-empty, the construction of implies , so that is adjacent to all parents of . Since and agree on the ordering of all vertices up to , we thus have . This gives the desired contradiction to being chosen as a maximum of . Thus, there must exist some ordering in which satisfies P2. ∎

Theorem. Given a DAG $D$ without v-structures with $n$ nodes, at least $\left\lceil \frac{n-r}{2} \right\rceil$ atomic interventions are necessary to learn the directed edges of $D$, starting with $\mathrm{skeleton}(D)$. Here $r$ is the number of distinct maximal cliques in $D$.

###### Proof.

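Consider a CBSP ordering $\sigma$ of the nodes of $D$. Let $a, b$ be two nodes such that $a, b \in L_i(\sigma)$ for some $1 \le i \le r$ and such that $\sigma(b) = \sigma(a) + 1$. Consider a set $I$ of atomic interventions such that $I \cap \{a, b\} = \emptyset$. We show now that the edge $a \to b$ is not directed in the $I$-essential graph $\mathcal{E}_I(D)$ of $D$.

Suppose, for the sake of contradiction, that $a \to b$ is directed in $\mathcal{E}_I(D)$. Then, by item 4 of the characterization of $\mathcal{I}$-essential graphs in Section 2, $a \to b$ must be strongly $I$-protected in $\mathcal{E}_I(D)$. Since $I \cap \{a, b\} = \emptyset$, one of the graphs in Figure 1 must appear as an induced subgraph of $\mathcal{E}_I(D)$.

We now show that none of these subgraphs can appear as an induced subgraph of $\mathcal{E}_I(D)$. First, subgraphs (ii) and (iv) cannot be induced subgraphs of $\mathcal{E}_I(D)$ since they have a v-structure at $b$, while $D$ (and therefore also $\mathcal{E}_I(D)$) has no v-structures. For subgraph (iii) to appear as an induced subgraph, the third vertex of that subgraph must lie between $a$ and $b$ in any topological ordering of $D$, which contradicts the fact that $a$ and $b$ are consecutive in the topological ordering $\sigma$. For subgraph (i) to appear, we must have a parent of $a$ that is not adjacent to $b$. However, since $\sigma$ is a CBSP ordering, it satisfies property P2, so that, since $a, b$ are consecutive in $\sigma$ and belong to the same $L_i(\sigma)$, any parent of $a$ must also be a parent of $b$. We thus conclude that $a \to b$ cannot be strongly $I$-protected in $\mathcal{E}_I(D)$, and hence is not directed in it.

The above argument implies that any set $I$ of atomic interventions that fully orients $D$ starting with $\mathrm{skeleton}(D)$ (i.e., for which $\mathcal{E}_I(D) = D$) must contain at least one node of each pair of consecutive nodes (in $\sigma$) of $L_i(\sigma)$, for each $1 \le i \le r$. Thus, for each $1 \le i \le r$, $I$ must contain at least $\left\lceil \frac{|L_i(\sigma)| - 1}{2} \right\rceil$ nodes of $L_i(\sigma)$. Since the sets $L_i(\sigma)$ partition the $n$ nodes of $D$, we therefore have,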

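$$|I| \;\ge\; \sum_{i=1}^{r} \left\lceil \frac{|L_i(\sigma)| - 1}{2} \right\rceil \;\ge\; \left\lceil \sum_{i=1}^{r} \frac{|L_i(\sigma)| - 1}{2} \right\rceil \;=\; \left\lceil \frac{\sum_{i=1}^{r} |L_i(\sigma)|}{2} - \frac{r}{2} \right\rceil \;=\; \left\lceil \frac{n - r}{2} \right\rceil. \tag{1}$$

∎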

The following corollary for general DAGs (those that may have v-structures) follows from the previous result about DAGs without v-structures in a manner identical to previous work (Squires et al., 2020), using the fact that it is necessary and sufficient to separately orient each chordal chain component of an MEC in order to fully orient an MEC (Hauser and Bühlmann, 2014, Lemma 1). We defer the standard proof to Section B.3.

Corollary. Let $D$ be an arbitrary DAG and let $\mathcal{E}(D)$ be the chain graph with chordal chain components representing the MEC of $D$. Let $CC$ denote the set of chain components of $\mathcal{E}(D)$, and $r(S)$ the number of maximal cliques in the chain component $S \in CC$. Then, any set of atomic interventions which fully orients $\mathcal{E}(D)$ must be of size at least

$$\sum_{S \in CC} \left\lceil \frac{|S| - r(S)}{2} \right\rceil \;\ge\; \left\lceil \frac{n - r}{2} \right\rceil,$$

where $n$ is the number of nodes in $D$, and $r$ is the total number of maximal cliques in the chordal chain components of $\mathcal{E}(D)$ (including chain components consisting of singleton vertices).
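To make the bound concrete, here is a minimal Python sketch that evaluates it. It assumes that each chain component is supplied as some orientation without v-structures of that component (a dictionary from node to parent set); this is only a convenient input format for counting maximal cliques, since the count depends on the skeleton alone. The function names and the three example components are assumptions made for illustration.

```python
def num_maximal_cliques(parents):
    """Count maximal cliques of a DAG without v-structures.

    Every maximal clique has the form pa(v) | {v}, so it suffices to count
    the closed parent sets that are not contained in another one.
    """
    closed = {v: set(pa) | {v} for v, pa in parents.items()}
    return sum(1 for v in parents
               if not any(w != v and closed[v] <= closed[w] for w in parents))

def lower_bound(parents):
    """ceil((n - r) / 2) for a single DAG without v-structures."""
    n, r = len(parents), num_maximal_cliques(parents)
    return (n - r + 1) // 2

def mec_lower_bound(components):
    """Sum of the per-component bounds over all chain components."""
    return sum(lower_bound(c) for c in components)

# Three hypothetical chain components: a path, a 4-clique, a singleton.
path = {"a": set(), "b": {"a"}, "c": {"b"}, "d": {"c"}}
clique = {1: set(), 2: {1}, 3: {1, 2}, 4: {1, 2, 3}}
single = {"x": set()}
components = [path, clique, single]
```

For these components the per-component sum is 1 + 2 + 0 = 3, while pooling all nodes gives $\lceil (9 - 5)/2 \rceil = 2$, in line with the inequality in the corollary.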

### 3.1 Tightness of Universal Lower Bound

We now show that our universal lower bound is tight up to a factor of $2$: for any DAG $D$, there is a set of atomic interventions of size at most twice the lower bound that fully orients the MEC of $D$. In fact, as the proof of the theorem below shows, when $D$ has no v-structures, this intervention set can be taken to be the set of nodes of $D$ that are not maximal-clique-sink nodes of $D$.

Theorem. Let $D$ be a DAG without v-structures with $n$ nodes, and let $r$ be the number of distinct maximal cliques of $D$. Then, there exists a set $I$ of atomic interventions of size at most $n - r \le 2\left\lceil \frac{n-r}{2} \right\rceil$ such that $I$ fully orients $D$ (i.e., $\mathcal{E}_I(D) = D$).

###### Proof.

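Fix any topological ordering $\sigma$ of $D$. Let the maximal cliques of $D$ be $C_1, \dots, C_r$, and let $s_i = \mathrm{sink}_D(C_i)$, for $1 \le i \le r$. The observation at the start of Section 3 implies that the nodes $s_1, \dots, s_r$ are distinct. We re-index these nodes according to the ordering $\sigma$, i.e., $\sigma(s_i) < \sigma(s_j)$ when $i < j$. Consider the set of atomic interventions $I = V \setminus \{s_1, \dots, s_r\}$ (note that $|I| = n - r$). We show that $\mathcal{E}_I(D) = D$. Note that every edge of $D$, except those which have both end-points in $\{s_1, \dots, s_r\}$, has a single end-point in one of the interventions in $I$, and hence is directed in $\mathcal{E}_I(D)$ (by item 3 of the characterization of $\mathcal{I}$-essential graphs in Section 2). We show now that all edges with both end-points in $\{s_1, \dots, s_r\}$ are also oriented in $\mathcal{E}_I(D)$.

Suppose, if possible, that there exist $i, j$ with $1 \le i < j \le r$ such that $s_i$ and $s_j$ are adjacent in $D$, so that the edge $s_i \to s_j$ is present in $D$, but for which $s_i \to s_j$ is not directed in $\mathcal{E}_I(D)$. We derive a contradiction to this supposition. To start, choose a pair $(i, j)$ as above with the smallest possible value of $i$. In particular, this choice implies that every edge of the form $u \to s_i$ in $D$ is directed in $\mathcal{E}_I(D)$.

Note that, by the same observation, $C_i$ and $C_j$ are distinct maximal cliques in $D$. Thus, there must exist an $x \in C_i$ that is not a parent of $s_j$ in $D$ (and $x \neq s_i$, since $s_i$ is a parent of $s_j$). Further, since $\sigma(s_i) < \sigma(s_j)$, all vertices of $C_i$ appear before $s_j$ in $\sigma$. Thus, $x$, which is not a parent of $s_j$ in $D$, is also not adjacent to $s_j$ in $D$. Further, by the choice of $i$, the edge $x \to s_i$ is directed in $\mathcal{E}_I(D)$. Thus, we have the induced subgraph $x \to s_i - s_j$ in $\mathcal{E}_I(D)$. However, according to item 2 of the characterization of $\mathcal{I}$-essential graphs in Section 2, such a graph cannot appear as an induced subgraph of an $\mathcal{I}$-essential graph, and we have therefore reached the desired contradiction. It follows that $\mathcal{E}_I(D)$ has no undirected edges, and is therefore the same as $D$. ∎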
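The intervention set used in this proof is easy to compute. The sketch below assumes a DAG without v-structures stored as a dictionary from node to parent set, and it only builds the set and lists the edges that no atomic intervention touches. It does not compute the essential graph, so it illustrates the construction rather than verifying the theorem. All names and the example DAG are hypothetical.

```python
def sink_nodes(parents):
    """Nodes v whose closed parent set pa(v) | {v} is a maximal clique."""
    closed = {v: set(pa) | {v} for v, pa in parents.items()}
    return {v for v in parents
            if not any(w != v and closed[v] <= closed[w] for w in parents)}

def non_sink_interventions(parents):
    """Atomic intervention targets: every node that is not a clique sink."""
    return set(parents) - sink_nodes(parents)

def untouched_edges(parents, targets):
    """Edges u -> v with neither end-point among the intervention targets."""
    return {(u, v) for v, pa in parents.items() for u in pa
            if u not in targets and v not in targets}

# Hypothetical DAG: p -> c, c -> a, a -> b, c -> w, a -> w.
dag = {"p": set(), "c": {"p"}, "a": {"c"}, "b": {"a"}, "w": {"c", "a"}}
targets = non_sink_interventions(dag)
n, r = len(dag), len(sink_nodes(dag))
```

Here `targets` has $n - r = 2$ elements, which is at most twice the lower bound $\lceil (5 - 3)/2 \rceil = 1$, and the only edge not touched by an intervention joins two sink nodes, which is exactly the case handled by the second half of the proof.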

Using again the fact that it is necessary and sufficient to separately orient each of the chordal chain components of an MEC in order to fully orient an MEC, the following result for general DAGs follows immediately from the preceding theorem, and implies that the lower bound for general DAGs is also tight up to a factor of $2$ (the proof is provided in Section B.4).

Theorem. Let $D$ be an arbitrary DAG on $n$ nodes and let $\mathcal{E}(D)$ and $r$ be as in the lower bound for general DAGs stated earlier in this section. Then, there is a set of atomic interventions of size at most $n - r$ that fully orients $\mathcal{E}(D)$.

Essentially the same argument as that used for proving Section 3.1 gives the following result, which in many cases improves upon the bound in the above two theorems. The proof of this result is provided in Section B.5. Let be an DAG on nodes without v-structures and let be a topological ordering of that satisfies the clique block property P1 of section 3. Suppose that has nodes, and let , be as in P1 of section 3. Then, there is a set of atomic interventions of size at most that fully orients .

### 3.2 Comparison with Known Lower Bounds

To compare our universal lower bound with the universal lower bound of Squires et al. (2020), we start with the following combinatorial lemma, whose proof can be found in Section B.6.

Lemma. Let $G$ be an undirected chordal graph on $n$ nodes in which the size of the largest clique is $\omega$. Then, $|\mathcal{C}| \le n - \omega + 1$, where $\mathcal{C}$ is the set of maximal cliques of $G$.

The lemma implies that in chordal graphs $\left\lceil \frac{n - |\mathcal{C}|}{2} \right\rceil \ge \left\lfloor \frac{\omega}{2} \right\rfloor$, which shows that our universal lower bound is always equal to or better than the one by Squires et al. (2020). The proof of the lemma makes it apparent that the two bounds are close only in very special circumstances. (Split graphs and k-trees are some special families of chordal graphs for which $|\mathcal{C}| = n - \omega + 1$.) We further strengthen this intuition through theoretical analysis of special classes of graphs and via simulations.

#### Examples where our Lower Bound is Significantly Better

We provide two constructions of special classes of chordal graphs in which our universal lower bound is times the lower bound by Squires et al. (2020) for any . Further discussion of such examples can be found in Section C.

Construction 1. First, we provide a construction by Shanmugam et al. (2015) for graphs that require about times more number of interventions than their lower bound, where is size of the maximum independent set of the graph. This construction of a chordal graph starts with a line consisting of vertices such that each node is connected to and . For each , has a clique of size which has exactly two nodes from the line . Maximum clique size of is , number of nodes, , and number of maximal cliques, . Thus, for , we have, which implies for .

Construction 2. has cliques of size , with every pair of cliques intersecting at a unique node . The number of nodes in is , maximum clique size is , and number of maximal cliques is , thus, which implies for .

## 4 Empirical Explorations

In this section, we report the results of two experiments on synthetic data. In Experiment 1, we compare our lower bound with the optimal intervention size for a large number of randomly generated DAGs. The optimal intervention size for a DAG $D$ is defined as the size of the smallest set $I$ of atomic interventions such that $\mathcal{E}_I(D) = D$. Next, in Experiment 2, we compare our universal lower bound with the one in the work of Squires et al. (2020) for randomly generated DAGs with small cliques. These experiments provide empirical evidence that strengthens our result about the tightness of our universal lower bound (Section 3.1) and the constructions presented in Section 3.2. The experiments use the open source causaldag (Squires, 2018) and networkx (Hagberg et al., 2008) packages. Further details about the experimental setup for both experiments are given in Section D.

#### Experiment 1

For this experiment, we generate graphs from the Erdős-Rényi graph model : for each of these graphs, the number of nodes is a random integer in and the connection probability is a random value in . These graphs are then converted to DAGs without v-structures by imposing a random topological ordering and adding extra edges if needed. To compute the optimal intervention size, we check if a subset of nodes, of a DAG is such that , in increasing order of the size of such subsets. Next, we compute the universal lower bound value for each of these DAGs as given in Figure 2. In Figure 3, we plot the optimal intervention size and our lower bound for each of the generated DAGs. Thickness of the points is proportional to the number of points landing at a coordinate. Notice that all points lie between lines and , as implied by our theoretical results. Further, we can see that a large fraction of points are closer to the line compared to the line , suggesting that our lower bound is even tighter for many graphs.
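The conversion step described above can be sketched as follows. This is only an illustration under stated assumptions, not the experimental code of the paper: the function names, the parameters `n`, `p` and `seed`, and the particular rule for adding extra edges (a single backward sweep that joins non-adjacent parents) are all hypothetical choices, and it uses only the Python standard library rather than causaldag or networkx.

```python
import random
from itertools import combinations


def random_dag_without_v_structures(n, p, seed=None):
    """Return (order, parents) for a random DAG on nodes 0..n-1 without v-structures."""
    rng = random.Random(seed)
    # Erdos-Renyi G(n, p): keep each unordered pair independently with probability p.
    edges = [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]
    # Impose a random topological ordering and orient every edge along it.
    order = list(range(n))
    rng.shuffle(order)
    pos = {v: i for i, v in enumerate(order)}
    parents = {v: set() for v in range(n)}
    for u, v in edges:
        if pos[u] > pos[v]:
            u, v = v, u
        parents[v].add(u)
    # Assumed repair rule: sweep from the last node to the first and join any two
    # parents of the current node. New edges only add parents to earlier nodes,
    # which are visited later in the sweep, so one pass suffices.
    for v in reversed(order):
        for a, b in combinations(sorted(parents[v], key=pos.get), 2):
            parents[b].add(a)  # a precedes b in the ordering
    return order, parents


def has_v_structure(parents):
    """True if some node has two parents that are not adjacent."""
    for pa in parents.values():
        for a, b in combinations(pa, 2):
            if a not in parents[b] and b not in parents[a]:
                return True
    return False
```

Every added edge points forward in the chosen ordering, so the result stays acyclic, and after the sweep the parents of every node form a clique.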

#### Experiment 2

For this experiment, we generate random DAGs without v-structures for each size in by fixing a perfect-elimination ordering of the nodes and then adding edges (which are oriented according to the perfect-elimination ordering) to the DAG making sure that there are no v-structures, while trying to keep the size of each clique below . For each DAG, we compute the ratio of the two lower bounds. In Figure 4, we plot each of these ratios in a scatter plot with the -axis representing the number of nodes of the DAG. Thickness of the points is proportional to the number of DAGs having a particular value of the ratio described above. We also plot the average of the ratios for each different value of the number of nodes. We see that our lower bound can sometimes be times the lower bound of Squires et al. (2020). Moreover, the average ratio has an increasing trend, suggesting that our lower bound is much better for this class of randomly generated DAGs.
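One way to generate such DAGs is sketched below. It is an assumption made for illustration and not necessarily the generator used in the experiment: nodes are added in a fixed order, and each new node picks its parents inside the clique formed by an earlier node and that node's parents. The names `random_bounded_clique_dag`, `max_clique` and `seed` are hypothetical.

```python
import random


def random_bounded_clique_dag(n, max_clique, seed=None):
    """Return a parents dict on nodes 0..n-1 (edges go from smaller to larger labels).

    The parents of every node form a clique, so there are no v-structures,
    and no clique has more than `max_clique` nodes.
    """
    if max_clique < 2:
        raise ValueError("max_clique must be at least 2")
    rng = random.Random(seed)
    parents = {0: set()}
    for v in range(1, n):
        # An earlier node together with its parents is always a clique here.
        u = rng.randrange(v)
        clique = sorted(parents[u] | {u})
        # Keep at most max_clique - 1 parents so that v plus its parents stays small.
        k = rng.randint(1, min(len(clique), max_clique - 1))
        parents[v] = set(rng.sample(clique, k))
    return parents
```

Since every clique of such a DAG consists of its last node together with some of that node's parents, bounding the number of parents by `max_clique - 1` bounds the clique size by `max_clique`.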

## 5 Conclusion

We prove a strong universal lower bound on the minimum number of atomic interventions required to fully learn the orientation of a DAG starting from its MEC. For any DAG , by constructing an explicit set of atomic interventions that learns completely (starting with the MEC of ) and has size at most twice our lower bound for the MEC of , we show that our universal lower bound is tight up to a factor of two. We prove that our lower bound is better than the best previously known universal lower bound (Squires et al., 2020) and also construct explicit graph families where it is significantly better. We then provide empirical evidence that our lower bound may be stronger than what we are able to prove about it: by conducting experiments on randomly generated graphs, we demonstrate that our lower bound is often tighter than what we have proved, and also that it is often significantly better than the previous universal lower bound (Squires et al., 2020). An interesting direction for future work is to design intervention sets of sizes close to our universal lower bound. We note that in contrast to the earlier work of Squires et al. (2020), whose lower bound proofs were based on new sophisticated constructions, our proof is based on the simpler notion of a CBSP ordering, which in turn is inspired by elementary ideas in the theory of chordal graphs. We expect that the notion of CBSP orderings may also play an important role in future work on designing optimal intervention sets.

## Acknowledgement

We thank anonymous reviewers for their helpful suggestions to improve our paper.

## Appendix A Proof of Lemma 3 of the Main Paper

Here, we provide the proof of Lemma 3 of the main paper. As stated there, the lemma follows from well-known ideas in the theory of chordal graphs. The following generalization of the definition of the clique block property P1 of section 3 will be useful in the proof.

Definition (-clique block ordering). Let be a DAG and a subset of vertices of . Let be a topological ordering of . Let the elements of be , arranged so that whenever . Define to be the set of nodes which occur before or at the same position as in i.e., . Similarly, for , define to be the set of nodes which occur in before or at the same position as , but strictly after (i.e., ). Then, is said to be an -clique block ordering of if is the set of all vertices of , and for each , is a (not necessarily maximal) clique in D.

The following observation is immediate with this definition. Let be a DAG without v-structures, and let be the set of vertices of . Then, a topological ordering of satisfies property P1 of section 3 if and only if is an -clique block ordering of .

###### Proof.

The “if” direction follows from the definition. For the “only if” direction, we note that since every vertex of must be contained in some maximal clique of D, and since for some vertex , it follows that every vertex of must lie in some if satisfies the clique block property P1. ∎

We also note the following simple property of -clique block orderings. Let be a DAG without v-structures, and let be a topological ordering of . Let and be subsets of vertices of such that . If is an -clique block ordering of , then it is also a -clique block ordering of .

###### Proof.

When , there is nothing to prove. Thus, we can assume that there must exist a . We consider the case when . The general case then follows by straightforward induction on the size of .

For , let be as in the definition of the -clique block orderings. Let be the unique index such that . Now, for , define , and for , define . By construction, for and , the are cliques in D. Further define

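$$L^{B}_{i}(\sigma) := \{\, u \in L^{A}_{i}(\sigma) \mid \sigma(u) \le \sigma(b) \,\}, \quad \text{and} \quad L^{B}_{i+1}(\sigma) := L^{A}_{i}(\sigma) \setminus L^{B}_{i}(\sigma).$$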

Again, and are also cliques in since they are subsets of the clique . Further, by construction, . This shows that is also a -clique block ordering. ∎
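The definition of a clique block ordering, and the fact just proved that enlarging the marked subset preserves the property, can be checked mechanically on small examples. The sketch below is illustrative only: the function name and the representation of the DAG as a dictionary mapping each node to its set of parents are assumptions, and the function takes for granted that `order` is already a topological ordering.

```python
def is_clique_block_ordering(order, parents, subset):
    """Check whether `order` is a `subset`-clique block ordering.

    order   : list of all nodes, assumed to be a topological ordering
    parents : dict mapping each node to the set of its parents
    subset  : the marked vertices that close each block
    """
    pos = {v: i for i, v in enumerate(order)}

    def adjacent(u, v):
        return u in parents[v] or v in parents[u]

    marks = sorted(pos[a] for a in subset)
    # The blocks cover every vertex exactly when the last vertex is marked.
    if not marks or marks[-1] != len(order) - 1:
        return False
    start = 0
    for end in marks:
        block = order[start:end + 1]  # nodes after the previous mark, up to this mark
        for i, u in enumerate(block):
            if any(not adjacent(u, v) for v in block[i + 1:]):
                return False
        start = end + 1
    return True
```

For the path a → b → c with ordering (a, b, c), marking {b, c} gives the blocks {a, b} and {c}, which are both cliques, whereas marking only {c} gives the single block {a, b, c}, which is not a clique.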

We can now state the main technical lemma required for the proof of section 3 of the main paper. Let be a DAG without v-structures. Let be the set of vertices of . Then, there exists a maximal clique of D with the following two properties:

1. If then . That is, if and , then the edge is not present in .

2. Let be the set of nodes of the induced DAG , where is the set of nodes of . Then is a subset of .

###### Proof.

As already alluded to in the main paper, the proof of item 1 uses ideas that are very similar to the “maximal cardinality search” algorithm for chordal graphs (Tarjan and Yannakakis (1984), see also Corollary 2 of Wienöbst et al. (2021)). Fix an arbitrary topological ordering of , and let be the top vertex in . Note that has no parents in , so any vertices adjacent to in are children of . Let be the set of these children of in . If is empty, then is isolated in and we are done with the proof of item 1 after taking . So, assume that is not empty, and let its elements be , arranged so that whenever . Now, define the sets , where as follows. First, . For ,

$$C'_{i} := \begin{cases} C'_{i-1} \cup \{c_i\} & \text{if } C'_{i-1} \subseteq \mathrm{pa}_{D}(c_i), \\ C'_{i-1} & \text{otherwise.} \end{cases} \tag{2}$$

Define . Note that by construction, is a clique. Note also the following property of this construction: for any , if and only if there exists such that and is not adjacent to in .

We now claim that is also a maximal clique. For, if not, let be such that is adjacent to every vertex in . Then, we must have for some (since , and only the children of are adjacent to in ). But then, since , there must exist some , , such that is not adjacent to , which is a contradiction to the assumption of being adjacent to every vertex of .

We now claim that if , then for all , the edge is not present in . Suppose, if possible, that there exist and such that is present in . By the choice of as a top vertex in a topological order, we must have . Thus, must be a child of in . Suppose , for some . Then, must also be a child of , for otherwise would be a v-structure in . Thus, for some . Since , there exists some such that and and are not adjacent. But then is a v-structure in , so we again get a contradiction. This proves item 1 of the lemma for the clique .

Item 2 of the lemma trivially follows if is empty, therefore, we are interested in the case when is non-empty. Now consider the induced DAG . Since has no v-structures, neither does . Thus, by section 3 of the main paper, the nodes of and the maximal cliques of H are in one-to-one correspondence: for each maximal clique of H, there is a unique vertex of such that .

Consider now a vertex of . There exists then a maximal clique of H such that . Also, since is an induced subgraph of , there must exist a maximal clique of D such that . In fact, we must further have , for otherwise would not be a maximal clique of . Let . We will show that . Note first that we cannot have , for then, by item 1, would be contained in , and would not therefore contain . Thus, must be a node in . But then implies that must in fact be in , and must therefore be equal to . We thus see that any vertex of is also a vertex of . Item 2 of the lemma then follows by noting that is the only vertex of not contained in . ∎
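The greedy construction of the clique in Equation (2) used in the proof above can be sketched in a few lines. The function name and the dictionary-of-parents representation are assumptions made for illustration, and the input `order` is assumed to be a topological ordering of a DAG without v-structures.

```python
def source_maximal_clique(order, parents):
    """Build the clique of Equation (2), starting from the top vertex of `order`.

    order   : list of all nodes, assumed to be a topological ordering
    parents : dict mapping each node to the set of its parents
    """
    clique = {order[0]}  # the top vertex, which has no parents
    for c in order[1:]:
        # The subset test can only succeed for children of the top vertex,
        # and it adds c exactly when every current member is a parent of c.
        if clique <= parents[c]:
            clique.add(c)
    return clique
```

On the DAG with edges a → b, a → c, b → c and a → d, the construction started at a adds b and c but rejects d, and no vertex outside the resulting clique is a parent of a vertex inside it.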

We are now ready to prove section 3 of the main paper.

###### Proof of section 3 of the main paper.

We prove this claim by induction on the number of nodes in . The claim of the lemma is trivially true when has only one node. Now, fix , and assume the induction hypothesis that every DAG without v-structures which has at most nodes admits a topological ordering that satisfies the clique block property P1 of section 3. We will complete the induction by showing that if is a DAG without v-structures which has nodes, then also admits a topological ordering that satisfies the clique block property P1 of section 3.

Let the maximal clique of be as guaranteed by appendix A above. If all the nodes of are contained in , then the total ordering on the vertices of the clique in trivially satisfies the clique block property. Therefore, we assume henceforth that is non-empty. Thus, the induced DAG is a DAG on at most nodes. Let be the set of nodes of , and let be the set of nodes of . By the induction hypothesis, has a topological ordering which satisfies the clique block property. Equivalently, by appendix A, is an -clique block ordering of .

Consider now the ordering of obtained by listing first the vertices of the clique in the total order imposed on them by the DAG , followed by the vertices of in the order specified by . By item 1 of appendix A, there is no directed edge in from a vertex in to a vertex in , so we get that is in fact a topological ordering of .

Define . We now observe that is a -clique block ordering of , with and , for . By item 2 of appendix A, we have . Thus, by appendix A, is also an -clique block ordering of , and therefore (by appendix A) satisfies the clique block property P1 of section 3. ∎
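The induction above is constructive: it peels off one clique at a time and recurses on the remaining induced DAG. A minimal iterative sketch of that procedure follows, assuming the input DAG has no v-structures and is given as a dictionary of parent sets together with some topological ordering; the function name and data layout are hypothetical.

```python
def clique_block_ordering(order, parents):
    """Return a list of blocks whose concatenation is a topological ordering.

    order   : list of all nodes, assumed to be a topological ordering
    parents : dict mapping each node to the set of its parents
    Each block is the clique built greedily from the top remaining vertex.
    """
    remaining = list(order)
    blocks = []
    while remaining:
        clique = []
        for v in remaining:
            # The first remaining vertex is always taken; a later vertex joins
            # only if every vertex chosen so far is one of its parents.
            if all(u in parents[v] for u in clique):
                clique.append(v)
        blocks.append(clique)
        chosen = set(clique)
        # Restricting the ordering keeps it topological for the induced DAG.
        remaining = [v for v in remaining if v not in chosen]
    return blocks
```

Within each block the vertices appear in the order imposed by the DAG, and the blocks are listed in the order in which the cliques are removed, mirroring the ordering built in the proof.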

## Appendix B Other Omitted Proofs

### B.1 Proof of section 3

###### Proof of section 3 of main paper.

Let be a maximal clique of . Since the induced subgraph is a DAG, there is at least one node in with out-degree . Thus, for all we have, , which implies that