 # Counting and Uniform Sampling from Markov Equivalent DAGs

We propose an exact solution for the problem of finding the size of a Markov equivalence class (MEC). For the bounded degree graphs, the proposed solution is capable of computing the size of the MEC in polynomial time. Our proposed approach is based on a recursive method for counting the number of the elements of the MEC when a specific vertex is set as the source variable. We will further use the idea to design a sampler, which is capable of sampling from an MEC uniformly in polynomial time.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

A directed acyclic graph (DAG) is a commonly used graphical model to represent causal relationships among a set of variables [Pea09]. In a DAG representation, a directed edge indicates that variable is a direct cause of variable relative to the considered variable set. Such a representation has numerous applications in fields ranging from biology [SPP05] and genetics [ZGB13]

[PJS17, KF09].

The general approach to learning a causal structure is to use statistical data from variables to find a DAG, which is maximally consistent with the conditional independencies estimated from data. This is due to the fact that under Markov property and faithfulness assumption, d-separation of variables in a DAG is equivalent to conditional independencies of the variables in the underlying joint probability distribution

[SGS00, Pea09]. However, a DAG representation of a set of conditional independencies in most cases is not unique. This restricts the learning of the structure to the Markov equivalences of the underlying DAG. The set of Markov equivalent DAGs is referred to as a Markov equivalence class (MEC). A MEC is commonly represented by a mixed graph called essential graph, which can contain both directed and undirected edges [SGS00].

In general, there is no preference amongst the elements of MEC, as they all represent the same set of conditional independencies. Therefore, for a given dataset (or a joint probability distribution), the size of the MEC, i.e., the number of its elements, can be seen as a metric for measuring the causal complexity of the underlying structure–this complexity indicates how many interventional experiments or how much additional information (e.g., knowledge about the causal mechanisms) is needed to further fully learn the DAG structure, which was only partially recovered from mere observation. Another important application of the size of the MEC is the following: As mentioned earlier, one can compare different DAGs to find the most consistent one with a given joint distribution over the variables. Here, the space of DAGs, or the space of MECs can be chosen as the search space. Chickering showed that when the size of the MECs are large, searching over the space of MECs is more efficient

[Chi02]. Hence, knowing the size of the MEC can help us decide the search space.

For the general problem of enumerating MECs, Steinsky proposed recursive enumeration formulas for the number of labelled essential graphs, in which the enumeration parameters are the number of vertices, chain components, and cliques [Ste13]

. This approach is not focused on a certain given MEC. For the problem of finding the size of a given MEC, an existing solution is to use Markov chain methods

[HJY13, BT17]. According to this method, a Markov chain is constructed over the elements of the MEC whose properties ensure that the stationary distribution is uniform over all the elements. The rate of convergence and computational issues hinders practical application of Markov chain methods. Recently, an exact solution for finding the size of a given MEC was proposed [HJY15], in which the main idea was noting that sub-classes of the MEC with a fixed unique root variable partition the class. The authors show that there are five types of MECs whose sizes can be calculated with five formulas, and for any other MEC, it can be partitioned recursively into smaller sub-classes until the sizes of all subclasses can be calculated from the five formulas. An accelerated version of the method in [HJY15] is proposed in [HY16], which is based on the concept of core graphs.

In this paper, we propose a new counting approach, in which the counting is performed on the clique tree representation of the graph. Compared to [HJY15], our method provides us with a more systematic way for finding the orientation of the edges in a rooted sub-class, and enables us to use the memory in an efficient way in the implementation of the algorithm. Also, using clique tree representation enables us to divide the graph into smaller pieces, and perform counting on each piece separately (Theorem 2). We will show that for bounded degree graphs, the proposed solution is capable of computing the size of the MEC in polynomial time. The counting technique can be utilized for two main goals: (a) Uniform sampling, and (b) Applying prior knowledge. As will be explained, for these goals, it is essential in our approach to have the ability of explicitly controlling the performed orientations in the given essential graph. Therefore, neither the aforementioned five formulas presented in (He, Jia, and Yu 2015), nor the accelerated technique in (He and Yu 2016) are suitable for these purposes.

(a) Uniform sampling: In Section 4, we show that our counting technique can be used to uniformly sample from a given MEC. This can be utilized in many scenarios; followings are two examples. 1. Evaluating effect of an action in a causal system. For instance, in a network of users, we may be interested in finding out conveying a news to which user leads to the maximum spread of the news. This question could be answered if the causal structure is known, but we often do not have the exact causal DAG. Instead, we can resort to uniformly sampling from the corresponding MEC and evaluate the effect of the action on the samples. 2. Given a DAG from a MEC, there are simple algorithms for generating the essential graph corresponding to the MEC [Mee95, AMP97, Chi02]. However, for MECs with large size, it is not computationally feasible to form every DAG represented by the essential graph. In [HH09] the authors require to evaluate the score of all DAGs in MEC to find the one with the highest score. Evaluating scores on uniform samples provides an estimate for the maximum score.

(b) Applying prior knowledge: In Section 5, we will further extend our counting and sampling techniques to the case that some prior knowledge regarding the orientation of a subset of the edges in the structure is available. Such prior knowledge may come from imposing a partial ordering of variables before conducting causal structure learning from both observational and interventional data [SSG98, HB12, WSYU17], or from certain model restrictions [HHS12, REB18, ENM17]. We will show that rooted sub-classes of the MEC allow us to understand whether a set of edges with unknown directions can be oriented to agree with the prior, and if so, how many DAGs in the MEC are consistent with such an orientation. We present the following two applications: 1. The authors in [NMR17] proposed a method for estimating the causal effect of joint interventions from observational data. Their method requires extracting possible valid parent sets of the intervention nodes from the essential graph, along with their multiplicity. They proposed the joint-IDA method for this goal, which is exponential in the size of the chain component of the essential graph, and hence, becomes infeasible for large components. In Section 5, we show that counting with prior knowledge can solve this issue of extracting possible parent sets. 2. In Section 6, we provide another specific scenario in which our proposed methodology is useful. This application is concerned with finding the best set of variables to intervene on when we are restricted to a certain budget for the number of interventions [GSKB18].

## 2 Definitions and Problem Description

For the definitions in this section, we mainly follow Andersson [AMP97]. A graph is a pair , where is a finite set of vertices and , the set of edges, is a subset of . An edge whose opposite is called an undirected edge and we write . An edge whose opposite is called a directed edge, and we write . A graph is called a chain graph if it contains no partially directed cycles. After removing all directed edges of a chain graph, the components of the remaining undirected graph are called the chain components of the chain graph. A v-structure of is a triple , with induced subgraph . Under Markov condition and faithfulness assumptions, a directed acyclic graph (DAG) represents the conditional independences of a distribution on variables corresponding to its vertices. Two DAGs are called Markov equivalent if they represent the same set of conditional independence relations. The following result due to [VP90] provides a graphical test for Markov equivalence:

###### Lemma 1.

[VP90] Two DAGs are Markov equivalent if and only if they have the same skeleton and v-structures.

We denote the Markov equivalence class (MEC) containing DAG by . A MEC can be represented by a graph , called essential graph, which is defined as . We denote the MEC corresponding to essential graph by . Essential graphs are also referred to as completed partially directed acyclic graphs (CPDAGs) [Chi02], and maximally oriented graphs [Mee95] in the literature. [AMP97] proposed a graphical criterion for characterizing an essential graph. They showed that an essential graph is a chain graph, in which every chain component is chordal. As a corollary of Lemma 1, no DAG in a MEC can contain a v-structure in the subgraphs corresponding to a chain component.

We refer to the number of elements of a MEC by the size of the MEC, and we denote the size of by . Let denote the chain components of . can be calculated from the size of chain components using the following equation [GP02, HG08]:

 Size(G∗)=c∏i=1Size(Gi). (1)

This result suggests that the counting can be done separately in every chain component. Therefore, without loss of generality, we can focus on the problem of finding the size of a MEC for which the essential graph is a chain component, i.e., an undirected connected chordal graph (UCCG), in which none of the members of the MEC are allowed to contain any v-structures.

In order to solve this problem, the authors of [HJY15] showed that there are five types of MECs whose sizes can be calculated with five formulas, and that for any other MEC, it can be partitioned recursively into smaller subclasses until the sizes of all subclasses can be calculated from the five formulas. We explain the partitioning method as it is relevant to this work as well.

Let be a UCCG and be a DAG in . A vertex is the root in if its in-degree is zero.

###### Definition 1.

Let UCCG be the representer of a MEC. The -rooted sub-class of the MEC is the set of all -rooted DAGs in the MEC. This sub-class can be represented by the -rooted essential graph .

For instance, for UCCG in Figure 1, and are depicted in Figures 1 and 1, respectively.

###### Lemma 2.

[HJY15] Let be a UCCG. For any , the -rooted sub-class is not empty and the set of all -rooted sub-classes partitions the set of all DAGs in the MEC.

From Lemma 2 we have

 Size(G)=∑v∈V(G)Size(G(v)). (2)

Hence, using equations (1) and (2), we have

 Size(G∗)=c∏i=1∑v∈V(Gi)Size(G(v)i). (3)

He et al. showed that is a chain graph with chordal chain components, and introduced an algorithm for generating this essential graph. Therefore, using equation (3), can be obtained recursively. The authors of [HJY15] did not characterize the complexity, but reported that their experiments suggested when the number of vertices is small or the graph is sparse, the proposed approach is efficient.

## 3 Calculating the size of a MEC

In this section, we present our proposed method for calculating the size of the MEC corresponding to a given UCCG. We first introduce some machinery required in our approach for representing chordal graphs via clique trees. The definitions and propositions are mostly sourced from [BP93, VA15].

For a given UCCG, , let denote the set containing the maximal cliques of , and let be a tree on , referred to as a clique tree.

###### Definition 2 (Clique-intersection property).

A clique tree satisfies the clique-intersection property if for every pair of distinct cliques , the set is contained in every clique on the path connecting and in the tree.

###### Proposition 1.

[BP93] A connected graph is chordal if and only if there exists a clique tree for , which satisfies the clique-intersection property.

###### Definition 3 (Induced-subtree property).

A clique tree satisfies the induced-subtree property if for every vertex , the set of cliques containing induces a subtree of , denoted by .

###### Proposition 2.

[BP93] The clique-intersection and induced-subtree properties are equivalent.

Since we work with a UCCG, by Propositions 1 and 2, there exists a clique tree on , which satisfies the clique-intersection and induced-subtree properties. Efficient algorithms for generating such a tree is presented in [BP93, VA15]. In the sequel, whenever we refer to clique tree , we assume it satisfies the clique-intersection and induced-subtree properties. Note that such a tree is not necessarily unique, yet we fix one and perform all operations on it. For the given UCCG, similar to [HJY15], we partition the corresponding MEC by its rooted sub-classes, and calculate the size of each sub-class separately. However, we use the clique tree representation of the UCCG for counting. Our approach enables us to use the memory in the counting process to make the counting more efficient, and provides us with a systematic approach for finding the orientations.

For the -rooted essential graph , we arbitrarily choose one of the cliques containing as the root of the tree , and denote this rooted clique tree by . Setting a vertex as the root in a tree determines the parent of all vertices. For clique in , denote its parent clique by . Following [VA15], we can partition each non-root clique into a separator set , and a residual set . For the root clique, the convention is to define , and . The induced-subtree property implies the following result:

###### Proposition 3.

[VA15] Let be the given UCCG, and let be the rooted clique tree,
(i) The clique residuals partition ,
(ii) For each , denote the clique for which is in its residual set by , and the induced subtree of cliques containing by . is the root of . The other vertices of are the cliques that contain in their separator set.
(iii) The clique separators , where ranges over all non-root cliques, are the minimal vertex separators of .

Note that sets and depend on the choice of root. However, all clique trees have the same vertices (namely, maximal cliques of ) and the same clique separators (namely, the minimal vertex separators of ). Also, since there are at most cliques in a chordal graph with vertices, there are at most edges in a clique tree and, hence, at most minimal vertex separators.

With a small deviation from the standard convention, in our approach, for the root clique of , we define , and . Recall from Proposition 3 that for each , denotes the clique for which is in its residual set, and this clique is unique. We will need the following results for our orientation approach. All the proofs are provided in the appendix.

If , then .

###### Corollary 1.

If , then .

Lemma 3 states that parents of any vertex are elements of . But, not all elements of are necessarily parents of . Our objective is to find a necessary and sufficient condition to determine the parents of a vertex.

###### Lemma 4.

If , then for every vertex , .

Lemma 4 implies that for every clique , any vertex either has directed edges to all elements of in , or has no directed edges to any of the elements of . For clique , we define the emission set, , as the subset of which has directed edges to all elements of in . Lemmas 3 and 4 lead to the following corollary:

###### Corollary 2.

is the set of parents of in .

This means that the necessary and sufficient condition for to be a parent of is . Therefore, for any clique , we need to characterize its emission set. We will show that the is the subset of which satisfies the following emission condition:

###### Definition 4 (Emission condition for a vetex).

We say vertex satisfies the emission condition in clique if , and .

As a convention, we assume that the root variable satisfies the emission condition in all the cliques containing it.

###### Theorem 1.

if and only if satisfies the emission condition in clique in .

Note that for a clique , in order to find using Definition 4, we need to learn the emission set of some cliques on the higher levels of the tree. Hence, the emission sets must be identified on the tree from top to bottom.

After setting vertex as the root, Theorem 1 allows us to find the orientation of directed edges in as follows. First, we form . Then, for each , starting from top to bottom of , we identify all vertices which satisfy the emission condition to obtain . Finally, in each clique , we orient the edges from all the variables in towards all the variables in .

###### Example 1.

Assume the UCCG in Figure 1 is the given essential graph. Setting vertex as the root of (by symmetry, is similar), the corresponding clique tree is shown in Figure 1, where in each clique, the first and the second rows represent the separator and the residual sets, respectively. In this clique tree, we obtain , and . Hence, the directed edges are , , , and . This results in in Figure 1, which is an essential graph with a single chain component , (Figure 1). Setting vertex as the root (by symmetry, is similar), the corresponding clique tree is shown in Figure 1. In this clique tree, and hence, the directed edge is . This results in a directed graph, thus, . Similarly, . Therefore, from (2), we have . Similarly, we have .

Setting vertex as the root of (by symmetry, is similar), the corresponding clique tree is shown in Figure 1. In this clique tree, , and hence, the directed edges are , , and . shown in Figure 1 is the essential graph with a single chain component , shown in Figure 1. Setting vertex as the root, the corresponding clique tree is shown in Figure 1. In this clique tree and . Hence, the directed edges are and . The result is a directed graph, thus, . Similarly, and . Therefore, from (2), we have . Similarly, we have .

Finally, using equation (2), we obtain that .

### 3.1 Algorithm

In this subsection, we present an efficient approach for the counting process. In a rooted clique tree , for any clique , let be the maximal subtree of with as its root, and let . Also, for vertex sets , let be the set of edges with one end point in and the other end point in .

###### Lemma 5.

For any clique , is an edge cut.

We need the following definition in our algorithm:

###### Definition 5 (Emission condition for a clique).

Clique satisfies the emission condition if .

###### Remark 1.

Theorem 1 implies that clique satisfies the emission condition if and only if all elements in satisfy the emission condition in .

In the recursive approach, once the algorithm finds all the directed edges in , it removes all the oriented edges for the next stage and restarts with the undirected components, i.e., the edges that it removes are the directed edges. Therefore, we require an edge cut in which all the edges are directed. This is satisfied by cliques with emission condition:

###### Theorem 2.

If clique satisfies the emission condition, is a directed edge cut.

Therefore, by Theorem 2, if clique satisfies the emission condition, in the clique tree, we can learn the orientations in trees and separately. This property helps us to perform our counting process efficiently by utilizing the memory in the process. More specifically, if rooted clique trees and share a rooted subtree whose root clique satisfies the emission condition, it suffices to perform the counting in this subtree only once. Based on this observation, we propose the counting approach whose pseudo-code is presented in Algorithm 1.

The input to Algorithm 1 is an essential graph , and it returns by computing equation (1), through calling function for each chain component of . In Function , first the clique tree corresponding to the input UCCG is constructed. If this tree has not yet appeared in the memory, for every vertex of the input UCCG, the function forms and calculates the number of -rooted DAGs in the MEC by calling the rooted-size function (lines 10-13). Finally, it saves and returns the sum of the sizes.

Function checks whether each clique of each level of the input rooted tree satisfies the emission condition (lines 7). If so, it removes the rooted subtree from the tree and the corresponding subgraph from (lines 8-10), and checks whether the size of with its current separator set is already in the memory. If it is, the function loads it as (lines 12); else, calls itself on the rooted clique tree to obtain , and then saves in the memory (lines 15 and 16). If the clique does not satisfy the emission condition, it simply orients edges from to in (lines 19). Finally, in the resulting essential graph, it calls the function for each chain component (lines 24).

For bounded degree graphs, the proposed approach runs in polynomial time:

###### Theorem 3.

Let and be the number of vertices and maximum degree of a graph . The computational complexity of MEC size calculator on is in the order of .

###### Remark 2.

From Definition 4, it is clear that if , then satisfies the emission condition in clique . We can use this property for locally orienting edges without finding emission set .

### 3.2 Simulation Results

We generated random UCCGs of size with as the number of edges based on the procedure proposed in [HJY15], where parameter controls the graph density. We compared the proposed algorithm with the counting algorithm in [HJY15] in Table 1. Note that, as we mentioned earlier, since the counting methods are utilized for the purpose of sampling and applying prior knowledge, the five formulas in [HJY15] are not implemented in either of the counting algorithms. The parameters and denote the average run time (in seconds) of the proposed algorithm and the counting algorithm in [HJY15], respectively. For dense graphs, our algorithm is at least 60 times faster.

## 4 Uniform Sampling from a MEC

In this section, we introduce a sampler for generating random DAGs from a MEC. The sampler is based on the counting method presented in Section 3. The main idea is to choose a vertex as the root according to the portion of members of the MEC having that vertex as the root, i.e., in UCCG , vertex should be picked as the root with probability .

The pseudo-code of our uniform sampler is presented in Algorithm 2, which uses functions Size and RS of Section 3.1. The input to the sampler is an essential graph , with chain components . For each chain component , we set as the root with probability , where , and then we orient the edges in as in Algorithm 1. We remove and add the created chain components to , and repeat this procedure until all edges are oriented, i.e., .

###### Example 2.

For the UCCG in Figure 1, as observed in Example 1, , , and . Therefore, we set vertices , , , and as the root with probabilities , , , and , respectively. Suppose is chosen as the root. Then as seen in Example 1, . Therefore, in , we set either of the vertices as the root with equal probability to obtain the final DAG.

###### Theorem 4.

The sampler in Algorithm 2 is uniform.

For bounded degree graphs, the proposed sampler is capable of producing uniform samples in polynomial time.

###### Corollary 3.

The computational complexity of the uniform sampler is in the order of .

## 5 Counting and Sampling with Prior Knowledge

Although in structure learning from observational data the orientation of some edges may remain unresolved, in many applications, an expert may have prior knowledge regarding the direction of some of the unresolved edges. In this section, we extend the counting and sampling methods to the case that such prior knowledge about the orientation of a subset of the edges is available. Specifically, we require that in the counting task, only DAGs which are consistent with the prior knowledge are counted, and in the sampling task, we force all the generated sample DAGs to be consistent with the prior knowledge. Note that the prior knowledge may not be necessarily realizable, that is, there may not exist a DAGs in the corresponding MEC with the required orientations. In this case, the counting should return zero, and the sampling should return an empty set.

### 5.1 Counting with Prior Knowledge

We present the available prior knowledge in the form of a hypothesis graph . Consider an essential graph . For , we call a hypothesis realizable if there is a member of the MEC with directed edges consistent with the hypothesis. In other words, a hypothesis is realizable if the rest of the edges in can be oriented without creating any v-structures or cycles. More formally:

###### Definition 6.

For an essential graph , a hypothesis graph is called realizable if there exists a DAG in , for which .

###### Example 3.

For essential graph , the hypothesis graph is not realizable, as the edge cannot be oriented without forming a v-structure.

For essential graph , let denote the number of the elements of , which are consistent with hypothesis , i.e., . Hypothesis is realizable if .

As mentioned earlier, each chain component of a chain graph contains exactly one root variable. We utilize this property to check the realizability and calculate for a hypothesis graph . Consider essential graph with chain components . Following the same line of reasoning as in equation (1), we have

 SizeH(G∗)=c∏i=1SizeH(Gi). (4)

Also, akin to equation (2), for any ,

 SizeH(G)=∑v∈V(G)SizeH(G(v)). (5)

Therefore, in order to extend the pseudo code to the case of prior knowledge, we modify functions and to get as an extra input. In our proposed pseudo code, the orientation task is performed in lines 2 and 19 of Function . Let be the set of directed edges of form , oriented in either line 2 or 19. In function , after each of lines 2 and 19, we check the following:

if  then
Return: 0
end if

This guarantees that, any DAG considered in the counting will be consistent with the hypothesis .

###### Example 4.

Consider the three hypothesis graphs in Figure 2 for the essential graph in Figure 1. For hypothesis , , , and . Therefore, we have three DAGs consistent with hypothesis , i.e., . For hypothesis , , and , Therefore, four DAGs are consistent with hypothesis , i.e., . Hypothesis is not realizable.

One noteworthy application of checking the realizability of a hypothesis is in the context of estimating the causal effect of interventions from observational data [MKB09, NMR17]. This could be used for instance, to predict the effect of gene knockouts on other genes or some phenotype of interest, based on observational gene expression profiles. The authors of [MKB09, NMR17] proposed a method called (joint-)IDA for estimating the average causal effect, which as a main step requires extracting possible valid parent sets of the intervention nodes from the essential graph, with the multiplicity information of the sets. To this end, a semi-local method was proposed in [NMR17], which is exponential in the size of the chain component of the essential graph. This renders the approach infeasible for large components. Using our proposed method to address this problem, we can fix a configuration for the parents of the intervention target, and count the number of consistent DAGs.

### 5.2 Sampling with Prior Knowledge

Suppose an experimenter is interested in generating sample DAGs from a MEC. However, due to her prior knowledge, she requires the generated samples to be consistent with a given set of orientations for a subset of the edges. In this subsection, we modify our uniform sampler to apply to this scenario. We define the problem statement formally as follows. Given an essential graph and a hypothesis graph for , we are interested in generating samples from the such that each sample is consistent with hypothesis . Additionally, we require the distribution of the samples to be uniform conditioned on being consistent. That is, for each sample DAG ,

 P(D)=⎧⎪⎨⎪⎩1SizeH(G∗),if E(D)⊆E(H),0,otherwise.

Equations (4) and (5) imply that we can use a method similar to the case of the uniform sampler. That is, we choose a vertex as the root according to the ratio of the DAGs which are consistent with and have the chosen vertex as the root, to the total number of consistent DAGs. More precisely, in UCCG , vertex should be picked as the root with probability . In fact, the uniform sampler could be viewed as a special case of sampler with prior knowledge with . Hence, the results related to the uniform sampler extend naturally.

###### Example 5.

Consider hypothesis graph in Figure 2 for the essential graph in Figure 1. As observed in Example 4, we have , , , and . Therefore, we set vertices , , , and as the root with probabilities , , , and , respectively.

## 6 Application to Intervention Design

In this section, we demonstrate that the proposed method for calculating the size of MEC with prior knowledge can be utilized to design an optimal intervention target in experimental causal structure learning. We will use the setup in [GSKB18] which is as follows: Let be the given essential graph, obtained from an initial stage of observational structure learning, and let be our intervention budget, i.e., the number of interventions we are allowed to perform. Each intervention is on only a single variable and the interventions are designed passively, i.e., the result of one intervention is not used for the design of the subsequent interventions. Let denote the intervention target set, which is the set of vertices that we intend to intervene on. Note that since the experiments are designed passively, this is not an ordered set. Intervening on a vertex resolves the orientation of all edges intersecting with [EGS05], and then we can run Meek rules to learn the maximal PDAG [PKM17]. Let be the number of edges that their orientation is resolved had the ground truth underlying DAG been , and let be the average of over the elements of the MEC, that is

 R(I)=1Size(G∗)∑D∈MEC(G∗)R(I,D). (6)

The problem of interest is finding the set with , that maximizes .

In [GSKB18], it was proved that is a sub-modular function and hence, a greedy algorithm recovers an approximation to the optimum solution. Still, calculating for a given remains as a challenge. For an intervention target candidate, in order to calculate , conceptually, we can list all DAGs in the MEC and then calculate the average according to (6). However, for large graphs, listing all DAGs in the MEC is computationally intensive. Note that the initial information provided by an intervention is the orientation of the edges intersecting with the intervention target. Hence, we propose to consider this information as the prior knowledge and apply the method in Section 5.

Let be the set of hypothesis graphs, in which each element has a distinct configuration for the edges intersecting with the intervention target. If the maximum degree of the graph is , cardinality of is at most , and hence, it does not grow with . For a given hypothesis graph , let denote the set of members of the MEC, which are consistent with hypothesis . Using the set , we can break (6) into two sums as follows:

 R(I) =1Size(G∗)∑D∈MEC(G∗)R(I,D) (7) =1Size(G∗)∑H∈H∑D∈G∗HR(I,D) =∑H∈HSizeH(G∗)% Size(G∗)R(I,D).

Therefore, we only need to calculate at most values instead of considering all elements of the MEC, which reduces the complexity from super-exponential to constant in .

### 6.1 Simulation Results

An alternative approach to calculating is to estimate its value by evaluating uniform samples. We generated 100 random UCCGs of size with edges, where . In each graph, we selected two variables randomly to intervene on. We obtained the exact using equation (7). Furthermore, for a given sample size , we estimated

from the aforementioned Monte-Carlo approach using our proposed uniform sampler and obtained empirical standard deviation of error (SDE) over all graphs with the same size, defined as

. Figure 3 depicts SDE versus the number of samples. As can be seen, SDE becomes fairly low for sample sizes greater than .

## 7 Conclusion

We proposed a new technique for calculating the size of a MEC, which is based on the clique tree representation of chordal graphs. We demonstrated that this technique can be utilized for uniform sampling from a MEC, which provides a stochastic way to enumerate DAGs in the class, which can be used for estimating the optimum DAG, most suitable for a certain desired property. We also extended our counting and sampling method to the case where prior knowledge about the structure is available, which can be utilized in applications such as causal intervention design and estimating the causal effect of joint interventions.

## Appendix A Proof of Lemma 3

Let denote the distance between vertices and in . The following result from [BT17] is used in our proof. In the proof always denotes the root vertex.

###### Lemma 6.

[BT17] In an acyclic and v-structure-free orientation of a UCCG the root variable determines the orientation of all edges , for which .

We partition vertices based on their distance from the root variable, and call each part a level. Note that clearly, there is no edge from level to , for , otherwise, the vertex in level should be moved to level . Based on Lemma 6, the direction of edges in between the levels (mid-level edges) will be determined to be away from the root. After determining the mid-level edges, we should check for the direction of in-level edges as well. We will show that the statement of Lemma 3 holds for both mid-level and in-level edges.

• is a mid-level edge:

Proof by induction:

Induction base: We need to show that for any vertex , . Let be the root clique. By definition, . For any vertex , by definition of , is adjacent to the root, and hence, there exists a clique such that . Therefore, by the clique-intersection property, is contained in every clique on the path connecting and . Specifically, should be contained in , as is the root of the subtree , i.e., , otherwise, we will have a cycle in the tree. Noting that by our convention, only appears in separator sets, concludes that . Therefore, the base of the induction is clear.

Induction hypothesis: As the induction hypothesis, we assume that for any variable and , such that , we have .

Induction step: Assume that and , such that . We need to show that .

.

###### Proof.

Variable is non-adjacent to any variable in , . Therefore, for all , . On the other hand, should have a parent in . Therefore, there exists , such that . By induction hypothesis, . Therefore, since , we have . ∎

Since and are adjacent, there exists a clique , such that .

###### Claim 2.

If and , then .

###### Proof.

By the induced-subtree property, since is the root of subtree , the path from to , passes through . Similarly, the path from to , passes through . If , then the two aforementioned paths should be distinct, which results in a cycle in the tree, which is a contradiction. ∎

By Claim 2, . Among all cliques that contain , by Proposition 3, is only in the residual set of , and by Claim 1, ; therefore . Therefore, .

• is an in-level edge: Using the proof of Theorem 6 in [HG08], if , it should have been directed according to one of two possible rules: There exists vertex such that induced on is either (1) , or (2) and . We show that the later case is not possible. This is similar to a claim in the proof of Theorem 8 in [HJY15].
Proof by contradiction: Suppose is the first edge in level oriented by rule (2), that is, the other previously oriented edges are oriented via rule (1). Therefore, for the direction of edge , there should be in or , such that , and not adjacent to . should also be adjacent to , otherwise, we would learn from rule (1), not rule (2). Then to avoid cycle , should be oriented as . But this directed edge will make a v-structure with , which is a contradiction. Therefore, we only need rule (1) for orientations.

In order to orient edges using rule (1), we can apply the mid-level orienting method recursively. That is, we can consider the subgraph induced on a level and any vertex from the previous level as the root, and orient the mid-level edges for the new root.

###### Claim 3.

Using rule (1) recursively is equivalent to applying the mid-level orienting method recursively.

This claim is clear because if there exists induced subgraph in on , since in applying the mid-level orienting method recursively every vertex becomes root once, at the time that becomes root, the edge will fall in mid-levels. Therefore, it will be oriented away from , i.e., it will be oriented as . Also, clearly we are not orienting any extra edges, as Lemma 6, is merely based on rule (1).

Therefore, by the previous part of the proof, the statement of Lemma 3 holds for this oriented edges as well. Applying this reasoning recursively concludes the desired result for all in-level edges.

## Appendix B Proof of Corollary 1

By Proposition 3, for any vertex , is the root of . That is, among the cliques containing , is located in the highest level (in terms of the distance from the root if the tree).

Proof by contradiction. By Lemma 3, in . That is, is in a strictly lower level than in the clique tree. Now if , should be in a strictly higher level than in the clique tree, which is a contradiction.

## Appendix C Proof of Lemma 4

Proof by contradiction. Suppose there exists , such that . As mentioned in Section 2, [HJY15] showed that is a chain graph with chordal chain components. Therefore, by the definition of chain graph, there should not be a partially directed cycle in this graph. Therefore, in order to prevent a partially directed cycle on , we should have . Therefore, by Lemma 3, , and by Proposition 3, is unique. Therefore, this is in contradiction to the assumption of the lemma.

## Appendix D Proof of Corollary 2

By definition of the emissio set, . To prove the opposite direction, suppose there exists vertex , such that , but . Then by Lemma 3, , and by Proposition 3, is unique. Now, since and , by Lemma 4, . This implies that . Therefore, .

## Appendix E Proof of Theorem 1

For the only if side, we need to show that (1) , and (2) . (1) is obtained from Lemma 3. We prove (2) by contradiction: Suppose . Then by Corollary 2, . Since , as seen in the proof of Lemma 3, this edge should have been directed from the induced subgraph , for some vertex . But since , every such vertex is in , and hence, is adjacent to . Therefore, the induced subgraph cannot exist.

For the if side, we first note that since , they are adjacent. By the assumption,

 Em(Ku)⊈Sep(Kv) ⇒ Pa(u)⊈Pa(v) ⇒ ∃w such that{w→u∈G(r),w→v∉G(r).

If , or , in order to avoid a partially directed cycle on , we should have . Therefore, by Corollary 1, , and by Proposition 3, is unique. Therefore, this is in contradiction to the assumption.
Therefore, and are not adjacent, and hence, we have the induced subgraph . This implies that in order to avoid v-structure, should be oriented as , which is the desired result.

## Appendix F Proof of Lemma 5

By Proposition 3, we can partition the vertices into three sets: , , and . By the clique-intersection property, any vertex in which is not in , will not be contained in , and hence, will not be adjacent with any vertices in . We note that vertices in can only appear in the separator set of . Therefore, the set of vertices in which are not in is .

Therefore, vertices in are not adjacent with any vertex in . Let