 # Proof Supplement - Learning Sparse Causal Models is not NP-hard (UAI2013)

This article contains detailed proofs and additional examples related to the UAI-2013 submission 'Learning Sparse Causal Models is not NP-hard'. It describes the FCI+ algorithm: a method for sound and complete causal model discovery in the presence of latent confounders and/or selection bias that has worst-case polynomial complexity of order $N^{2(k+1)}$ in the number of independence tests, for sparse graphs over $N$ nodes with node degree bounded by $k$. The algorithm is an adaptation of the well-known FCI algorithm (Spirtes et al., 2000), which is also sound and complete but has worst-case complexity exponential in $N$.

## 1 Preliminaries

For reference, we recall a few basic graphical model concepts, terms, and definitions. For details, see e.g., Richardson and Spirtes (2002).

### 1.1 Graphical model terminology

A mixed graph $\mathcal{G}$ is a graphical model that can contain three types of edges between pairs of nodes: directed ($\leftarrow$, $\rightarrow$), bidirected ($\leftrightarrow$), and undirected ($-$). In this paper we only consider graphs with at most one edge between each pair of nodes, and with no node with an edge to itself. If there is an edge $X \rightarrow Y$ in $\mathcal{G}$ then $X$ is a parent of its child $Y$, if $X \leftrightarrow Y$ then $X$ and $Y$ are spouses of each other, and if $X - Y$ then they are called neighbours. A path $\pi = \langle V_0, \ldots, V_n \rangle$ is an ordered sequence of distinct nodes where each successive pair along $\pi$ is adjacent (connected by an edge) in $\mathcal{G}$. A directed path is a path of the form $V_0 \rightarrow V_1 \rightarrow \ldots \rightarrow V_n$. A directed cycle is a directed path from $X$ to $Y$ in combination with a directed edge $Y \rightarrow X$. A directed acyclic graph (DAG) is a graph that contains only directed edges, but has no directed cycle. The skeleton $\mathcal{S}$ of a graph $\mathcal{G}$ is the undirected graph corresponding to the structure of $\mathcal{G}$, so that for each edge in $\mathcal{G}$ there is an undirected edge in $\mathcal{S}$. A node $X$ is an ancestor of $Y$ (and $Y$ a descendant of $X$) if there is a directed path from $X$ to $Y$ in $\mathcal{G}$, or $X = Y$. A vertex $Z$ is a collider on a path $\pi = \langle \ldots, X, Z, Y, \ldots \rangle$ if there are arrowheads at $Z$ on both edges from $X$ and $Y$, i.e., if $X \ast\!\!\rightarrow Z \leftarrow\!\!\ast Y$ (where the symbol $\ast$ stands for either an arrowhead mark or a tail mark), otherwise it is a noncollider. A trek is a path without colliders.

In a DAG $\mathcal{G}$, a path $\pi$ is said to be unblocked relative to a set of vertices $\mathbf{Z}$ if and only if:

1. every noncollider on $\pi$ is not in $\mathbf{Z}$, and

2. every collider along $\pi$ is an ancestor of $\mathbf{Z}$;

otherwise the path is blocked. We say that a path $\pi$ is blocked by node $Z \in \mathbf{Z}$ iff $\pi$ is blocked given $\mathbf{Z}$, but unblocked relative to $\mathbf{Z} \setminus Z$. If there exists an unblocked path between $X$ and $Y$ relative to $\mathbf{Z}$ in $\mathcal{G}$ then $X$ and $Y$ are said to be d-connected given $\mathbf{Z}$; if there is no such path then $X$ and $Y$ are d-separated by $\mathbf{Z}$.
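The two conditions for an unblocked path can be checked mechanically. The following is a minimal sketch, assuming the DAG is stored as a dictionary from each node to its set of parents; the function names `ancestors` and `is_unblocked` and this representation are illustrative assumptions, not part of the original article. The sketch does not verify that consecutive nodes on the path are actually adjacent.

```python
from typing import Dict, List, Set


def ancestors(parents: Dict[str, Set[str]], nodes: Set[str]) -> Set[str]:
    """Return all ancestors of `nodes`; every node counts as its own ancestor."""
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        v = stack.pop()
        for p in parents.get(v, set()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def is_unblocked(parents: Dict[str, Set[str]], path: List[str], z: Set[str]) -> bool:
    """Check the two conditions for `path` being unblocked relative to `z`."""
    anc_z = ancestors(parents, z)
    # Slide over every interior node together with its two path neighbours.
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        mid_parents = parents.get(mid, set())
        is_collider = prev in mid_parents and nxt in mid_parents
        if is_collider:
            # Condition 2: a collider must be an ancestor of some node in z.
            if mid not in anc_z:
                return False
        elif mid in z:
            # Condition 1: a noncollider must not be in z.
            return False
    return True
```

For example, in the DAG with edges A → C, B → C and C → D, the path A, C, B is blocked relative to the empty set and becomes unblocked once D (a descendant of the collider C) is conditioned on.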

A mixed graph $\mathcal{G}$ is an ancestral graph (AG) iff an arrowhead at $X$ on an edge to $Y$ implies that there is no directed path from $X$ to $Y$ in $\mathcal{G}$, and there are no arrowheads at nodes with undirected edges. As a result, arrowhead marks can be read as ‘is not an ancestor of’, and all DAGs are ancestral. In an ancestral graph a node $X$ is said to be anterior to a node $Y$ if there is a so-called anterior path from $X$ to $Y$ in $\mathcal{G}$ of the form $X - \ldots - Z \rightarrow \ldots \rightarrow Y$, possibly with $Z = X$ (no undirected part) or with $Z = Y$ (no directed part), or if $X = Y$. Arrowhead marks in an ancestral graph can therefore also be read as ‘is not anterior to’. When applied to an ancestral graph, d-separation is also known as m-separation. (Sometimes m-connected is defined using ‘anterior’ instead of ‘ancestor’ in condition (2) of ‘unblocked’, but as colliders have no undirected edges these two are equivalent.) An ancestral graph is maximal (MAG) if for any two non-adjacent vertices there is a set that separates them. A path $\pi$ between two nodes $X$ and $Y$ in an ancestral graph is inducing with respect to a set of nodes $\mathbf{Z}$ iff every collider on $\pi$ is ancestor of $X$ or $Y$, and every noncollider is in $\mathbf{Z}$. Inducing paths w.r.t. $\mathbf{Z} = \varnothing$ are called primitive.
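The two defining conditions of an ancestral graph can be tested directly on an edge list. The sketch below assumes a hypothetical encoding in which each edge is a triple `(u, v, kind)` with `kind` one of `"->"` (directed from `u` to `v`), `"<->"` (bidirected) or `"--"` (undirected); the function name `is_ancestral` and the encoding are illustrative choices, not taken from the article.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str, str]  # (u, v, kind) with kind in {"->", "<->", "--"}


def is_ancestral(edges: Iterable[Edge]) -> bool:
    """Return True iff the mixed graph satisfies both ancestral-graph conditions."""
    edges = list(edges)
    children: Dict[str, Set[str]] = {}
    has_arrowhead: Set[str] = set()
    has_undirected: Set[str] = set()
    for u, v, kind in edges:
        if kind == "->":
            children.setdefault(u, set()).add(v)
            has_arrowhead.add(v)
        elif kind == "<->":
            has_arrowhead.update((u, v))
        elif kind == "--":
            has_undirected.update((u, v))
        else:
            raise ValueError(f"unknown edge kind: {kind!r}")

    def reaches(src: str, dst: str) -> bool:
        """True iff there is a directed path (length >= 1) from src to dst."""
        stack, seen = [src], {src}
        while stack:
            x = stack.pop()
            for c in children.get(x, ()):
                if c == dst:
                    return True
                if c not in seen:
                    seen.add(c)
                    stack.append(c)
        return False

    # Condition 1: an arrowhead at X on an edge to Y forbids a directed path X ~> Y.
    for u, v, kind in edges:
        if kind == "->" and reaches(v, u):
            return False
        if kind == "<->" and (reaches(u, v) or reaches(v, u)):
            return False
    # Condition 2: no node has both an arrowhead and an undirected edge.
    return not (has_arrowhead & has_undirected)
```

Any DAG passes this check, in line with the remark that all DAGs are ancestral, while a graph with A → B → C and A ↔ C fails because the arrowhead at A contradicts A being an ancestor of C.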

Throughout the rest of this article, , and represent disjoint (subsets of) nodes (vertices, variables) in a graph, with sets denoted in boldface. The set refers to the nodes adjacent to in an AG , represents the ancestors of in , and the nodes anterior to in . Similar for sets, i.e., implies ; idem for and .

Every (maximal) ancestral graph over nodes corresponds to some underlying causal DAG over variables , where the (possibly empty) sets of unobserved latent variables and selection nodes in have been marginalized and conditioned out, see (Richardson and Spirtes, 2002). We denote the ancestors of in as , where the subscript highlights that the ancestorship relation is with respect to the underlying DAG instead of . Pairs of nodes in that share a common ancestor in the subgraph of over are said to be confounded. Nodes in are said to be subject to selection bias. Both confounding and selection bias can give rise to links between nodes in a MAG, where confounding is associated with bidirected edges and selection bias with undirected edges.

The following helpful properties for reading ancestral information from a (M)AG corresponding to an underlying causal DAG are shown in (Richardson and Spirtes, 2002):

,

,

,

. Conversely, a node subject to selection bias in has no arrowhead in , and a node not subject to selection bias is not part of an undirected edge in . Finally, note that for all , .

The following definition is a special case of the definition in section 4.2.1 of (Richardson and Spirtes, 2002). Given an ancestral graph over nodes , the marginal MAG over nodes has the following edges: are adjacent in if there does not exist a set that -separates in , and in that case the edge in has an arrowhead at if and only if , and has an arrowhead at if and only if .

### 1.2 Ancestral graph properties

We rely on the following connection between in/dependences and (non-)ancestorship in an ancestral graph.

Lemma 2. For disjoint (subsets of) nodes in an ancestral graph ,

1. .

2. .

3. ,

where square brackets indicate a minimal set of nodes.

###### Proof.

See e.g., Corollary to Lemma 14 in (Spirtes et al., 1999), and Lemma 2 in (Claassen and Heskes, 2011). ∎

We use the following result on anteriorship for nodes on unblocked paths:

Lemma 3.13 In an ancestral graph , if is a path m-connecting and given , then every vertex on is in .

###### Proof.

See Lemma 3.13 in (Richardson and Spirtes, 2002). ∎

As a result, rule (3) in Lemma 2 not only applies to the nodes in the minimal separating set, but also to all other nodes on the paths in between and that become unblocked given only a subset .

###### Corollary 8.

Let be an ancestral graph, and suppose that . If a path in between and is unblocked given some subset , then all nodes on are in .

###### Proof.

Follows from Lemma 3.13, given that Lemma 2 rule (3) ensures that . ∎

Similarly, we can freely add anterior nodes to any separating set without introducing a dependence:

###### Corollary 9.

In an ancestral graph , if , then .

###### Proof.

Add the nodes in to the separating set one by one. By rule (1) in Lemma 2, any node that creates a dependence cannot be anterior to any node in , contrary to the assumption. So all added nodes leave the original independence intact, and therefore . ∎

## 2 D-separating sets

This part contains the proofs for Section 4.1 of the main article. We start by formalizing some terminology on D-separation:

###### Definition 2.

In a MAG , two nodes and are D-separated by a set of nodes iff:

1. ,

2. .

If D-separates and , then is called a D-sep link, and a node is called a D-sep node for if:

1. ,

2. .

In words: and are D-separated by iff they are d-separated by , and all sets that can separate and contain at least one node . Such a node that cannot be made redundant by nodes adjacent to or is a D-sep node, and the relation between and is called a D-sep link.

To prove Lemma 3 from the main article we first derive a connection between ‘not separable by adjacent nodes’ and non-anteriorship:

###### Lemma 10.

In an ancestral graph , if , but is not independent of given any subset of in , then and is not part of an undirected edge.

###### Proof.

From there is no edge between and in . Let be the set of all nodes in that are adjacent to and have an anterior path to and/or . By the assumption we then have , and so there are one or more unblocked paths of the form relative to in (as there is no direct edge). By Lemma 3.13 we know that implies . From and transitivity of ‘anteriorship’ then follows , which combined with the fact that is adjacent to implies .

But given that path is unblocked relative to , node must be a collider along this path with arrowhead in . This means , which leaves . But then also , otherwise (again by transitivity) would still be anterior to in . From the fact that is collider along we know that it is not part of an undirected edge, and so as descendant of also cannot be part of an undirected edge in . ∎

This also applies directly to D-sep links.

Lemma 3. In a MAG , if two nodes and are D-separated by a minimal set , then

,

,

,

and are not part of an undirected edge.

###### Proof.

(1) From the definition of D-separated nodes and Lemma 10 it follows that is not anterior to ; but also cannot be anterior to any node in , otherwise by (3) and transitivity/acyclicity it would either still be anterior to , or it would be anterior to itself, which would imply a directed cycle (given that (4) implies there cannot be an undirected edge at ); therefore ;

(2) idem for ;

(3) Lemma 2 rule (3), given that is minimal.

(4) follows directly from Lemma 10. ∎

Note that it is possible that one or more nodes in (including D-sep nodes) are part of an undirected edge in .

Next we introduce:

###### Definition 3.

For a set of nodes in an ancestral graph , the set (adjacent anteriors) is defined as .

In the context of D-sep links we usually refer to as the set of adjacent ancestors, as then , by Lemma 3-(4).

With this we can bring D-separation in standard form:

###### Lemma 11.

In a MAG , if two nodes and are d-separated by , then also , with , , , and where all nodes in (possibly empty) are D-sep nodes for .

###### Proof.

We use rules (1)-(3) in Lemma 2 to construct the two sets. First we remove nodes from one-by-one until no more can be removed to obtain a minimal , with . By rule (3), all nodes in are anterior to and/or . By Corollary 9 we obtain , where contains the subset of nodes from that are not adjacent to and/or .

We obtain by eliminating nodes from one by one until no more nodes can be eliminated without destroying the independence, and so then . If is not a D-sep link, then . Finally we can obtain by eliminating superfluous nodes from one by one until no more can be removed without creating a dependence.

By construction the sets and are disjoint. No additional nodes from can be eliminated during/after the process of eliminating nodes from : if can be eliminated only after some node is eliminated, then putting back after is removed should create a dependence, in contradiction with Corollary 9. Therefore, at that point the D-separating set is minimal, i.e., .

All nodes in (if nonempty) satisfy the definition of D-sep node: by construction none of them are adjacent to or , and if there were some subset that could make a node redundant then, by Lemma 2.(2), that subset must be a subset of , and so by Corollary 9 the independence should also be found given with : a contradiction. ∎

Note that neither nor need be uniquely defined for a given D-separated , but may depend on the order in which nodes are removed.
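The one-by-one elimination used in the proof of Lemma 11 can be sketched as a greedy loop over an independence oracle. This is only an illustration under stated assumptions: `indep(x, y, z)` is a hypothetical oracle callback, `minimize_separating_set` is an invented name, and the result is a set from which no single node can be dropped, reached in one particular (sorted) removal order.

```python
from typing import Callable, FrozenSet, Iterable

# Hypothetical oracle signature: indep(x, y, z) is True iff x and y are
# independent given the conditioning set z.
Oracle = Callable[[str, str, FrozenSet[str]], bool]


def minimize_separating_set(x: str, y: str, z: Iterable[str], indep: Oracle) -> FrozenSet[str]:
    """Drop nodes from z one by one for as long as the independence survives."""
    current = sorted(set(z))  # fixed order so the result is reproducible
    if not indep(x, y, frozenset(current)):
        raise ValueError("z does not separate x and y")
    changed = True
    while changed:
        changed = False
        for node in list(current):
            trial = [n for n in current if n != node]
            if indep(x, y, frozenset(trial)):
                current = trial  # node was superfluous: keep it out
                changed = True
    return frozenset(current)
```

Changing the removal order can change the outcome, which mirrors the remark that the resulting sets need not be uniquely defined.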

In the proof of Lemma 4 we rely on the fact that for each D-sep link there is a path blocked by a D-sep node of the form depicted in Figure 1, which imposes six identifiable minimal dependence relations in (4)-(6), below:

###### Lemma 12.

In a MAG , if nodes and are D-separable, then there are nodes such that:

1. and in ,

2. and ,

3. and ,

4. and with :
and ,

5. and with :
and ,

6. in , and :
and .

###### Proof.

By Lemma 11 we have , with , and a (sub)set of D-sep nodes not adjacent to and/or . Let and define . Then, but , and so there must be a path that is (only) blocked by noncollider (relative to the other ).

We now show that we can take this path to be of the form in , where all nodes are colliders along and are adjacent to , but only has a bidirected edge to (similar for at ), and is the first node along starting from that is not adjacent to (possibly ), and similarly for . See also Figure 1.

Firstly, all paths between and blocked by a node must be into both and : given that there are no undirected edges to and/or in (Lemma 3), then by Corollary 8 the first node encountered along any such path must be in . But if this path starts with a tail from then necessarily , so that , which in turn implies , in contradiction with Lemma 3. Idem for . Therefore all paths blocked by node , including , must have .

Secondly, all paths between and blocked by a node must go via at least two other nodes resp. , as is presumed to be not adjacent to and/or . As both these nodes satisfy the criteria for they are part of the conditioning set , and so they must be colliders along (otherwise was not needed to block it). The same holds for all subsequent nodes up to and along that are adjacent to and/or . Therefore the path blocked by must have the general form .

Next, starting from , at some point along the first node must be encountered that is not adjacent to (possibly ). Take as the first node encountered along with a bidirected edge to when starting from in the direction of . Then all other, up to nodes between and are colliders along with a directed edge into (again by Lemma 3 and Corollary 8). Similar for some nodes and for . Therefore there exists a path blocked by of the form in , as indicated in Figure 1, with as noncollider along the path. Note that (and similarly for ): if on this is immediate, and if on , then is not part of an undirected edge, which in combination with Corollary 8 leads to .

With generic path we can prove statements (1)-(6), equating with and with :

(1) By construction, we have and along .

(2) By Corollary 8, all nodes along are in . As both and are colliders along this reduces to . The bidirected edge implies , and so: ; vice versa for .

(3) If , then and transitivity would imply , contrary to the bidirected edge , and so ; idem for and .

(4) For the in/dependence relations on : given that and are not adjacent, they are separated by some minimal set (not to be confused with or ). By construction, all are part of this set: is needed to block the path . Conditioning on unblocks the path so is also needed, etc., all the way up to and including (but not ). As this holds for any (minimal) set that can separate and , it means there are unblocked paths into from both and given , and so then conditioning on will make and dependent, i.e., . As is a descendant of , it also implies .

(5) Idem and .

(6) Finally, and cannot be adjacent in : they cannot be connected by a bidirected edge, for that would make the path unblocked given ; by (3) they cannot be connected by an edge or ; and they cannot be connected by an undirected edge because they are both colliders along . Therefore and are conditionally independent given some minimal set . For any such minimal separating set , no descendant of or (including and ) can be part of it, for that would imply either or was ancestor of the other. Including or in the conditioning set would make them dependent given that both and have unblocked paths to and given . Therefore, we can find both and . ∎

By Lemma 2, rule (1), each node in Lemma 12 that destroys one of the three independences cannot be anterior to any node in that independence, and so leads to identifiable invariant edge-marks (arrowheads).

To make this more precise we first introduce the following definitions:

###### Definition 4.

A minimal independence set is a set of minimal independencies consistent with a MAG . It is called a minimal independence model if it contains at least one separating set for each pair of nonadjacent nodes in the MAG .

The skeleton implied by a minimal independence set corresponds to the undirected graph with no edges between any . Note that a minimal independence model uniquely identifies the Markov equivalence class of .

###### Definition 5.

Let be the skeleton implied by a minimal independence set . Then the Augmented Skeleton is obtained by adding invariant arrowheads at all nodes on edges to in that create a single node minimal dependence , for all .

Augmentation boils down to repeated application of Lemma 2, rule (1).

From now on we assume that represents a minimal independence set as output by the PC algorithm with possible addition of one or more D-separating sets, consistent with a MAG . We also assume that we can query an independence oracle for the subsequent dependencies. This implies that the corresponding skeleton matches the skeleton of , except that it may contain zero, one, or more additional (undirected) edges that all correspond to D-sep links in . For D-sep links in the corresponding augmented skeleton this leads to the following pattern:

Lemma 4. Let be the augmented skeleton obtained from a minimal independence set consistent with a MAG , such that the only additional edges in that do not correspond with an edge in are D-sep links. Let be an edge in corresponding to a D-sep link in the MAG . If there are no (additional) edges in between other D-separable pairs of nodes in , then contains the following pattern:

1. in ,

2. and not adjacent in ,

3. paths and that do not contain arrowheads in the direction of , resp. .

###### Proof.

Follows from Lemma 12.

(1) As and are adjacent in , they are also adjacent in . Similarly for and . Nodes and are also (still) presumed to be adjacent in . The assumption ‘no edges in between other D-sep links in ’ ensures that the three non-adjacencies (4)-(6) in Lemma 12 are present in ; the six subsequent dependences in Lemma 12 are found by the augmentation procedure, each time adding arrowheads to the corresponding edge. Ultimately this means that contains the invariant pattern: .

(2) In particular (6) in Lemma 12 ensures that and are not adjacent in . The assumption ‘no edges in between other D-sep links in ’ ensures that are not adjacent in either.

(3) As there has to be a path from in that can be(come) oriented as a directed path into . This means the augmentation procedure cannot add an invariant arrowhead in the opposite direction; idem for . ∎

The following result generalizes Lemma 5 in the original article as a minimal separating set automatically implies .

Lemma 5. Let and be two possibly overlapping but nonidentical pairs of D-separable nodes in a MAG . If , then .

###### Proof.

Suppose . If then by the given and acyclicity , which by transitivity implies , contrary to Lemma 3 rule (1). Idem for . So either or . Idem for . But if both and , with in a D-sep link, then the two D-separable pairs would be identical. Therefore at least one is not ancestor of , and so . ∎

So two D-separable node pairs cannot both be present in each other's D-separating set. In fact, the ancestor relation induces a partial order over the D-sep links:

###### Lemma 13.

Let be a set of distinct (but not necessarily disjoint) D-sep links in a MAG . Then the relation defines a partial order over .

###### Proof.

For all :

1. Reflexivity: () is trivial.

2. Antisymmetry: (if and then ) follows from Lemma 5.

3. Transitivity: (if and , then ) follows from transitivity of the ancestor relationship of nodes in a MAG.

This implies the relation satisfies the conditions of a partial order over the elements in . ∎

As a result, in every non-empty (sub)set of D-separable node pairs there is at least one pair that does not have both nodes of any of the other pairs in its ancestors:

###### Lemma 14.

If is a non-empty set of distinct (but not necessarily disjoint) D-sep links in a MAG , then there is a such that .

###### Proof.

By Lemma 3 rule (4), D-sep nodes are not part of an undirected edge, so the statement reduces to . In terms of the partial order defined in Lemma 13 this is equivalent to stating that there exists a minimal element with respect to , i.e., an element such that there is no other element (with ) that precedes it, i.e., such that . As any finite partially ordered set has at least one minimal element, this proves the lemma. ∎
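The final step of this proof uses only the fact that a finite partially ordered set has a minimal element. A minimal sketch of that step follows, assuming the order is supplied as a hypothetical callback `precedes(b, a)` that returns True when `b` comes before `a`; both the callback and the function name `minimal_elements` are illustrative and not part of the article.

```python
from typing import Callable, Hashable, Iterable, List


def minimal_elements(items: Iterable[Hashable],
                     precedes: Callable[[Hashable, Hashable], bool]) -> List[Hashable]:
    """Return every element that no *other* element precedes."""
    items = list(items)
    return [a for a in items
            if not any(b != a and precedes(b, a) for b in items)]


# Toy order on three hypothetical D-sep links: L1 precedes L2, which precedes L3.
toy_links = ["L1", "L2", "L3"]
toy_order = {("L1", "L2"), ("L2", "L3"), ("L1", "L3")}
toy_minimal = minimal_elements(toy_links, lambda b, a: (b, a) in toy_order)
```

For a non-empty finite input and a genuine partial order the returned list is never empty, which is the property the lemma relies on.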

This means that if there are still one or more unidentified D-sep links in the augmented skeleton , then at least one of these has no unidentified D-sep links between any two of its ancestors, and so for that D-sep link the bidirected edge pattern of Lemma 4 is guaranteed to appear in . Therefore we can employ the following search strategy to check for D-sep links.

###### Lemma 15.

In a MAG , all D-sep links can be found by repeatedly (and exclusively) checking an augmented skeleton for edges that appear as the middle link of the bidirected triple from Lemma 4, while updating for each D-sep link found.

###### Proof.

Let be the skeleton of , possibly with additional edges in that all correspond to D-sep links in , (e.g. as obtained from the PC-search stage in the FCI algorithm). Let be the augmented skeleton of w.r.t. minimal (in)dependencies implied by . Then, as long as there are one or more edges in that are not in , then by Lemma 14 at least one of these edges will have no unidentified D-sep links (edges in that are not in ) between its ancestors, and so by Lemma 4 this D-sep link will show up in as the middle edge of the bidirected triple. Given a procedure to establish whether or not a candidate edge satisfying the bidirected pattern is a D-sep link (e.g., FCI’s Possible-D-SEP search), then testing all candidate edges, while updating for each D-sep link identified (remove edge and compute arrowheads for new bidirected triples) until no more can be found, is guaranteed to find all D-sep links. This means that at the end the skeleton of matches that of the MAG , and all arrowheads in are also in . ∎
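The search strategy in this proof can be outlined as a loop that alternates between recomputing candidate edges and testing them. The sketch below is schematic: `candidates_fn` stands in for "build the augmented skeleton and return the edges that form the middle link of a bidirected triple", and `is_dsep_link` stands in for a D-sep test such as FCI's Possible-D-SEP search. Both callbacks, and the name `find_dsep_links`, are hypothetical placeholders rather than an implementation of FCI+.

```python
from typing import Callable, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def find_dsep_links(
    skeleton_edges: Iterable[Edge],
    candidates_fn: Callable[[Set[Edge]], Set[Edge]],
    is_dsep_link: Callable[[Edge, Set[Edge]], bool],
) -> Tuple[Set[Edge], List[Edge]]:
    """Remove D-sep links until no candidate edge tests positive."""
    edges = set(skeleton_edges)
    found: List[Edge] = []
    progress = True
    while progress:
        progress = False
        # Candidates are recomputed after every removal, because removing an
        # edge can expose new bidirected triples in the augmented skeleton.
        for edge in sorted(candidates_fn(edges)):
            if edge in edges and is_dsep_link(edge, edges):
                edges.remove(edge)
                found.append(edge)
                progress = True
                break
    return edges, found
```

In a toy run where the edge (C, D) only becomes a candidate after (A, B) has been removed, the loop finds both links in that order and leaves the remaining edges untouched.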

This greatly improves the practical running speed of FCI, as often few or no edges need to be checked once the augmented skeleton has been constructed. In itself, however, it is not sufficient to guarantee a reduction of the overall complexity to polynomial time, as even a single edge may still require searching through all subsets of order nodes. The next section shows how a different search strategy can resolve this problem.

### 2.2 Proofs - Capturing the D-sep nodes

In proving some of the lemmas below we often consider marginal MAGs, i.e., MAGs obtained by marginalizing out one or more nodes from a base MAG in accordance with the rules in (Richardson and Spirtes, 2002) (see also Section 1.1 for a definition).

First some properties of unblocked paths in an ancestral graph relative to the adjacent ancestors of D-sep link , used in the proof of Lemma 6.

All paths ultimately blocked by one or more of the D-sep nodes are unblocked relative to .

###### Lemma 16.

In a MAG , if with , and is a path between and that is unblocked relative to for , then:

1. all colliders on are in ;

2. is unblocked given , .

###### Proof.

(1) A path is unblocked relative to a set iff every noncollider along is not in , and every collider on is ancestor of some node in . If every noncollider along is not present in then they are also not present for a subset . Furthermore, every node that is a descendant of some collider along is in (given and the arrowhead at as descendant of the collider), and so has a directed path to . This directed path goes via penultimate node , and so it follows that , and so by transitivity the colliders along as well, are ancestor of a node in .

(2) Therefore remains unblocked relative to in combination with any subset , including . ∎

Also, paths blocked by D-sep nodes correspond to sequences of bidirected edges in marginal MAGs.

###### Lemma 17.

In a MAG with D-separable and , if with , then a path between and in that is unblocked relative to corresponds to a sequence of three or more bidirected edges connecting and in all marginal MAGs over , with .

###### Proof.

Below we first construct a sequence of unblocked treks in the MAG between nodes in that connects and (steps 1-3). Then we map this sequence to the bidirected edge path in the marginal MAG (steps 4-6). Let be a path in between D-separable that is unblocked relative to .

Step 1: map to sequence of unblocked treks in .
Let be the colliders in along the path blocked by . By Lemma 16-(1), all colliders . Using similar reasoning as in the beginning of the proof of Lemma 12, the path blocked by (which is nonadjacent to and ) must be of the form . Each successive pair of colliders along unblocked must be connected by a trek (possibly a single edge ) that does not contain any node in , and so corresponds to a sequence of treks connecting and in that are unblocked relative to any subset of .

Step 2: map to sequence of unblocked treks between nodes in .
By Lemma 16 the path in is also unblocked given , for any subset , and each collider along is in . For each let be the first descendant of in that is in (possibly ; in particular, and ). If there are two or more such descendants (along different paths) then simply pick one of these at random. As a result, in there are treks between and , and each such trek is again unblocked given any subset of . Note that the concatenation of the three treks , , is not necessarily a trek, as a node may occur more than once. This can be remedied by taking a “shortcut” via that node. Note that this node cannot become a collider, as at least one of the occurrences of that node must be on one of the directed paths or (because is a trek), and that means that at least one of the edges at that node will have a tail. The result is a trek in between and that is unblocked given any subset of . Therefore corresponds to a sequence of unblocked treks in between nodes in .

Step 3: map to sequence of unblocked treks between distinct nodes in in .
It is possible that there are duplicates in , (i.e.  with ), e.g. in case a descendant in is shared by multiple . In that case we can remove all nodes from the sequence while still keeping a contiguous sequence of unblocked treks between and . Assume we repeatedly merge such doublets (removing all intermediate nodes) until we are left with a sequence of distinct nodes , with for , and where each is connected by a trek (with arrows into and ) in that is unblocked given any subset .
Note that the mapping is not necessarily unique, e.g. if , then either or will do.

Step 4: match sequence to path in .
As each pair in the sequence is connected by a trek in that does not contain any noncolliders that are in , it follows that there is an unblocked path between each such pair given any subset of