# Causal inference via algebraic geometry: feasibility tests for functional causal structures with two binary observed variables

We provide a scheme for inferring causal relations from uncontrolled statistical data based on tools from computational algebraic geometry, in particular, the computation of Groebner bases. We focus on causal structures containing just two observed variables, each of which is binary. We consider the consequences of imposing different restrictions on the number and cardinality of latent variables and of assuming different functional dependences of the observed variables on the latent ones (in particular, the noise need not be additive). We provide an inductive scheme for classifying functional causal structures into distinct observational equivalence classes. For each observational equivalence class, we provide a procedure for deriving constraints on the joint distribution that are necessary and sufficient conditions for it to arise from a model in that class. We also demonstrate how this sort of approach provides a means of determining which causal parameters are identifiable and how to solve for these. Prospects for expanding the scope of our scheme, in particular to the problem of quantum causal inference, are also discussed.


## 1 Introduction

Causal relationships, unlike statistical dependences, support inferences about the effects of interventions and the truths of counterfactuals. While randomised controlled experiments can be used to determine causal relationships, they may not be available for various reasons: they could be prohibitively expensive, technologically infeasible, unethical (e.g., assessing the effect of smoking on lung cancer), or indeed physically impossible (e.g., for variables describing properties of distant astronomical bodies). Therefore, inferring causal relationships from uncontrolled statistical data is an important problem, with broad applicability across scientific disciplines. Over the past twenty-five years, there has been much progress in developing methods to solve this problem [1, 2, 3, 4, 5].

As has become standard practice, we formalize the notion of causal structure using directed acyclic graphs (DAGs) with random variables as nodes and arrows representing direct causal influence [1, 2]. A more refined description of causal dependences specifies not only what causes what, but also, for every variable, its functional dependence on its causal parents. We shall use the term functional causal structure to refer to the specification of the set of functions, which includes a specification of the DAG. As is standard, the variables that are not observed are termed latent, and the DAG does not include any latent variables that act as causal mediaries, so that all the latent variables are parentless. We shall use the term causal model to describe the functional causal structure together with a specification, for each latent variable, of a probability distribution over its values. Each causal model associated to a given functional causal structure defines a possible joint probability distribution over the observed variables. We are interested in the set of possible joint distributions over the observed variables for a given functional causal structure, that is, those that can arise from some set of distributions on the latent variables. We will say that two functional causal structures are observationally equivalent if they are characterized by the same set of distributions over the observed variables. (This should not be confused with the notion of observational equivalence as applied to DAGs [1].)

Many causal inference algorithms, such as those of [1] and [2], only make use of conditional independence relations among the observed variables. If two causal structures are such that the same set of conditional independence relations is faithful to them, then they are said to be Markov equivalent. Note that Markov equivalence can be decided purely on the basis of the DAG (i.e., the causal structure), while the notion of observational equivalence of interest here depends on the functional dependences (i.e., the functional causal structure). In the case of just two observed variables, which is the one we consider here, the set of all causal structures is partitioned into just two Markov equivalence classes: those wherein the variables are causally connected, and those wherein they are not. As we show, however, the joint distribution over the observed variables supports many more inferences about the functional causal structure, thereby providing a more fine-grained classification than is provided by Markov equivalence.

In recent years, several methods have been suggested that make use not only of conditional independences, but also of other properties of the joint statistical distribution over the observed variables [3, 4, 5, 6] (see also the works discussed in Secs. 6.2 and 6.3). These newer methods also have limitations, in the sense that they impose restrictions on the number of latent variables allowed in the underlying causal model and on the mechanisms by which these latent variables influence the observed ones.

In the present work, we restrict attention to the causal inference problem where there are just two observed variables, each of which is binary (that is, discrete with just two possible values). We allow any functional causal structure involving latent variables that are discrete (with a finite number of values), and we impose no restriction on the number of latent variables or the mechanisms by which these influence the observed ones.

We provide an inductive scheme for characterizing the observational equivalence classes of functional causal structures. This scheme has a few steps. First we show that, in each observational class, there is a functional causal structure wherein all of the latent variables are binary. Restricting ourselves to the latter sort of functional causal structure, we show that one can inductively build up any functional causal structure from a pair of others having fewer latent variables. Thus, starting with functional causal structures with no latent variables, we can recursively build up all functional causal structures, and therefore all observational equivalence classes of these, by applying our inductive scheme.

Using this scheme, we catalogue all observational equivalence classes generated by functional causal structures with four or fewer binary latent variables. We have evidence, but no proof yet, that our catalogue is complete in the sense that a functional causal structure with any number of binary latent variables—and hence, by the connection described above, any functional causal structure with discrete latent variables—belongs to one of the classes we have identified.

We also describe a procedure for deriving, for each class, the set of necessary and sufficient conditions on the joint distribution over observed variables for it to be possible to generate it from functional causal structures in this class. We call such a set of conditions a feasibility test for the class. The procedure for deriving these is as follows. We start with a particular functional causal structure within the class, express the parameters in the joint probability distribution over the observed variables in terms of the parameters in the probability distributions over the latent variables, then eliminate the latter using techniques from algebraic geometry.

Finally, we consider applications to the problem of identifying causal parameters. For the parameters describing the probability distributions over the latent variables, we note that our technique allows one to find expressions for these in terms of the observational data for each observational equivalence class that we have considered. For the parameters describing the functional relations, we note that the limits to what one can infer about these, which may be different for different points in the space of possible joint distributions over the observed variables, can be inferred from our feasibility tests.

## 2 Setting up the problem

Consider the causal model of Fig. 1(a). From the DAG, it is clear that $B$ is a cause of $A$, and that each observed variable has its own latent noise variable: one is noise local to $A$ and the other is noise local to $B$.

The functional dependences are as follows: $B$ is equal to its local noise variable, and $A$ is the sum modulo 2 of $B$ and the noise local to $A$. A model with this sort of functional dependence is referred to as an additive noise model (ANM) in Refs. [3, 4, 5, 6]. The values of $A$, for different values of $B$ and of the noise local to $A$, are given in the table below.

In Ref. [5], it was shown that one can distinguish between the causal model of Fig. 1(a) and the causal models depicted in Fig. 2(a) and Fig. 2(c), except for special cases of the distributions over the noise variables, such as, for instance, when the two noise variables are uniformly distributed. Thus if we are promised that the causal model is an ANM, then (except for the special cases) we can distinguish between $B$ causing $A$, $A$ causing $B$, and $A$ and $B$ being causally disconnected. To see how this works we will need to determine the correlations generated by this model.

To describe the correlations we adopt the following notational convention.

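$$
\begin{aligned}
P(A) = [x] \quad &\text{means} \quad P(A=x) = 1,\\
P(A,B) = [x][y] = [xy] \quad &\text{means} \quad P(A=x, B=y) = 1,\\
P(A) = q[x] \quad &\text{means} \quad P(A=x) = q.
\end{aligned}
$$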

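Let $q_1$ be the probability that the noise local to $B$ takes the value 0 and $q_2$ be the probability that the noise local to $A$ takes the value 0; then the correlations for the above causal model are

$$
P(A,B) = q_1 q_2 [00] + (1-q_1)(1-q_2)[01] + q_1(1-q_2)[10] + (1-q_1)q_2[11].
$$

This means $P(A=0, B=0) = q_1 q_2$ and so on. From now on, we will use the shorthand $\bar{q} = 1 - q$ to simplify expressions.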
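As a minimal sketch of the expression above, the following Python snippet tabulates the joint distribution for given values of the two noise parameters and checks that it is normalized. It also evaluates the combination $p_{00}p_{01} - p_{10}p_{11}$, which the rest of this section shows to vanish for this model. The function name `anm_joint` and the dictionary layout keyed by the string `"ab"` (value of $A$, then value of $B$) are illustrative assumptions, not notation from the text.

```python
from fractions import Fraction


def anm_joint(q1, q2):
    """Joint distribution P(A, B) of the additive noise model above.

    Keys are strings "ab" meaning A=a, B=b; q1 and q2 are the two
    noise parameters appearing in the displayed expression.
    """
    return {
        "00": q1 * q2,
        "01": (1 - q1) * (1 - q2),
        "10": q1 * (1 - q2),
        "11": (1 - q1) * q2,
    }


# Exact rational arithmetic avoids floating-point noise in the checks.
p = anm_joint(Fraction(1, 3), Fraction(1, 4))
total = sum(p.values())
fan_residual = p["00"] * p["01"] - p["10"] * p["11"]
print(p, total, fan_residual)
```

Exact fractions are used here only so that the residual is exactly zero; with probabilities estimated from finite data, a statistical tolerance would be needed instead.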

Note that if a latent variable were to take one of its values with probability 1, then it would be trivial and could be eliminated from the functional causal structure. We therefore consider only functional causal structures with nontrivial latent variables, that is, latent variables that have some statistical variation in their value, so that the probability of any value is bounded away from 0 and 1. In the present example, therefore, $0 < q_1, q_2 < 1$.

For a general causal model we have $P(A,B) = p_{00}[00] + p_{01}[01] + p_{10}[10] + p_{11}[11]$, where $p_{xy} = P(A=x, B=y)$. We note that $p_{00} + p_{01} + p_{10} + p_{11} = 1$. As we only need three real parameters to specify $P(A,B)$, we can plot it in $\mathbb{R}^3$. It is easy to see that the points $[00]$, $[01]$, $[10]$, $[11]$ form the vertices of a tetrahedron in $\mathbb{R}^3$ and so the plot of $P(A,B)$ must lie within this tetrahedron.

We can rewrite $P(A,B)$ for our current example as

$$
P(A,B) = q_1\left(q_2[00] + \bar{q}_2[10]\right) + \bar{q}_1\left(q_2[11] + \bar{q}_2[01]\right).
$$

So, if we fix the value of $q_2$ in the range $(0,1)$ and vary $q_1$ over the interval $(0,1)$, the plot of $P(A,B)$ consists of the line passing through a point on the edge of the tetrahedron containing the vertices $[00]$ and $[10]$ and a point on the edge containing the vertices $[11]$ and $[01]$ (but excluding these points). The full plot of $P(A,B)$, as $q_1$ and $q_2$ each range over the interval $(0,1)$, is depicted in Fig. 1(b) (where the boundary points are excluded). We refer to this shape as a fan. Fig. 2(b) and Fig. 2(d) depict, respectively, the set of joint distributions for the ANM where $A$ causes $B$ and for the causal structure where $A$ and $B$ are causally disconnected.

Given some joint distribution, $P(A,B)$, how do we determine if it lies on one of the fans of Fig. 1(b), Fig. 2(b) or Fig. 2(d)? Recall that, because the latent variables are unobserved, we do not have access to the $q$'s directly, only the observed $p$'s. Thus, the problem can be posed as follows: what are the defining equations of the fans in terms of the observed $p$'s?

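This problem was solved for the example of Fig. 1 in Ref. [5] using the following technique. First, it was noted that the DAG implies that the noise local to $A$ is marginally independent of $B$, and therefore the distribution of this noise conditioned on $B$ equals its unconditioned distribution. Given that the noise is a binary variable, this is true if and only if the probability that the noise takes the value 1 is the same for $B=0$ as for $B=1$. We wish to eliminate the noise variable from this condition. Recall from the definition of conditional probability that the probability of the noise taking the value 1 given $B=b$ is the joint probability of these two events divided by $P(B=b)$. The functional dependence of $A$ on $B$ and the noise can be used to conclude that this joint probability equals $P(A = b \oplus 1, B = b)$, where $\oplus$ denotes addition modulo 2. Note that this last step is only possible because the noise is additive, so that one can infer the noise from $A$ and $B$. Therefore, reverting to our notational conventions, where $P(B=0) = p_{00} + p_{10}$ and $P(B=1) = p_{01} + p_{11}$, the condition becomes

$$
\frac{p_{10}}{p_{00}+p_{10}} = \frac{p_{01}}{p_{11}+p_{01}},
$$

which can be rewritten as:

$$
p_{00}p_{01} = p_{11}p_{10}.
$$

This equation, together with the open-interval constraints,

$$
0 < p_{00}, p_{01}, p_{10}, p_{11} < 1,
$$

defines the fan in Fig. 1(b). Using similar techniques, one can show that Figs. 2(b) and 2(d) are defined by the equations

$$
p_{00}p_{10} = p_{11}p_{01}
$$

and

$$
p_{00}p_{11} = p_{10}p_{01},
$$

respectively, together with the open-interval constraints.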
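The three feasibility tests just stated can be sketched as a single check. The snippet below is only an illustration under stated assumptions: the function name `fan_tests`, the result keys, and the dictionary layout keyed by `"ab"` (value of $A$, then value of $B$) are hypothetical, and exact rational inputs are assumed so that the polynomial equalities can be tested with `==`.

```python
from fractions import Fraction


def fan_tests(p):
    """Report which of the three fan equations a distribution satisfies.

    p maps "ab" (A=a, B=b) to an exact probability. Each test is the
    polynomial equality from the text plus the open-interval constraints.
    """
    interior = sum(p.values()) == 1 and all(0 < v < 1 for v in p.values())
    return {
        "fig1b": interior and p["00"] * p["01"] == p["11"] * p["10"],
        "fig2b": interior and p["00"] * p["10"] == p["11"] * p["01"],
        "fig2d": interior and p["00"] * p["11"] == p["10"] * p["01"],
    }


# A point on the fan of Fig. 1(b) only (noise parameters 1/3 and 1/4).
on_fig1b = fan_tests({
    "00": Fraction(1, 12), "01": Fraction(1, 2),
    "10": Fraction(1, 4), "11": Fraction(1, 6),
})

# The uniform distribution lies on all three fans, which is one of the
# special cases in which the three structures cannot be told apart.
uniform = fan_tests({k: Fraction(1, 4) for k in ("00", "01", "10", "11")})
print(on_fig1b, uniform)
```

With empirical frequencies the equalities will not hold exactly, so a practical version would replace `==` by a statistical test; that choice is outside what the text specifies.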

The question is: how can one find feasibility tests for generic causal models? In particular, how does one treat models where the noise is not additive? Consider, for instance, the causal model that has the same DAG as in Fig. 1(a), but where the noise is multiplicative, that is, $A$ is the product of $B$ and the noise local to $A$. In this case, the value of the noise cannot be inferred from $A$ and $B$ (given that these could be zero), and consequently one cannot use the approach of Ref. [5]. It is also unclear how one can characterize the possibilities for the joint distribution when the causal model involves an arbitrary number of latent variables. We will show that these questions can be answered using powerful tools from algebraic geometry, which we describe in the next section.

## 3 Deriving the feasibility tests

We begin with an introduction to some of the main concepts of algebraic geometry following the presentation given in [7]. For a more detailed discussion, see appendix A.

Denote the set of all polynomials in variables $x_1,\dots,x_n$ with coefficients in some field $k$ by $k[x_1,\dots,x_n]$. When dealing with polynomials, we are mainly interested in the solution set of systems of polynomial equations. This leads us to the main geometrical objects studied in algebraic geometry, algebraic varieties and semi-algebraic sets.

An algebraic variety (also called an affine variety or an algebraic set) is the solution set of a system of polynomial equations $f_1(x_1,\dots,x_n) = \dots = f_s(x_1,\dots,x_n) = 0$. A basic semi-algebraic set is defined to be the solution set of a system of polynomial equalities and inequalities, that is, the set of points of $\mathbb{R}^n$ at which each of finitely many polynomials over the reals satisfies a prescribed equality or inequality with respect to zero. (One can replace the real field used in this definition with any ordered field.) Note that algebraic varieties are examples of basic semi-algebraic sets. A semi-algebraic set is formed by taking finite combinations of unions, intersections, or complements of basic semi-algebraic sets. For instance, the fan in Fig. 1(b) is the semi-algebraic set that results from the intersection of the algebraic variety defined by the single polynomial equation $p_{00}p_{01} = p_{10}p_{11}$ and the set of inequalities that define the interior of the tetrahedral probability simplex (requiring each probability to be in the interval $(0,1)$).

More generally, for any causal model, the set of possible joint distributions that can be generated by it are represented by a semi-algebraic set. It follows that two causal models are observationally equivalent if and only if they generate the same semi-algebraic set.

We now define ideals, the main algebraic object studied in algebraic geometry. A subset $I \subseteq k[x_1,\dots,x_n]$ is an ideal if it satisfies: (1) $0 \in I$, (2) if $f, g \in I$, then $f + g \in I$, and (3) if $f \in I$ and $h \in k[x_1,\dots,x_n]$, then $hf \in I$.

A natural example of an ideal is the ideal generated by a finite number of polynomials, defined as follows. Let $f_1,\dots,f_s$ be polynomials in $k[x_1,\dots,x_n]$; then the ideal generated by $f_1,\dots,f_s$ is:

$$
\langle f_1,\dots,f_s \rangle = \left\{ \sum_{i=1}^{s} h_i f_i : h_1,\dots,h_s \in k[x_1,\dots,x_n] \right\}.
$$

The polynomials $f_1,\dots,f_s$ are called a basis of the ideal.

Studying the relations between certain ideals and varieties forms one of the main areas of study in algebraic geometry. One can define the algebraic variety $V(I)$ associated with an ideal $I$ by

$$
V(I) = \{(a_1,\dots,a_n) \in k^n : f(a_1,\dots,a_n) = 0,\ \forall f \in I\}.
$$

Interestingly, it can also be shown that if $I = \langle f_1,\dots,f_s \rangle$, then $V(I) = V(f_1,\dots,f_s)$, which is to say that the variety defined by a set of polynomials is the same as the variety defined by the ideal generated by those polynomials. Hence, varieties are determined by ideals.

We can now use the language of algebraic geometry to restate the question asked at the end of the last section. Let $V$ be an algebraic variety given parametrically as

$$
\begin{aligned}
p_1 &= g_1(q_1,\dots,q_m),\\
&\ \ \vdots\\
p_n &= g_n(q_1,\dots,q_m),
\end{aligned}
\tag{3.0.1}
$$

where the $g_i$ are polynomials in $k[q_1,\dots,q_m]$. The conjunction of the above equalities with the inequalities ensuring that the variables $q_1,\dots,q_m$ are in the interval $(0,1)$ (probabilities bounded away from 0 and 1) defines a semi-algebraic set on the variables $(q_1,\dots,q_m,p_1,\dots,p_n)$. We seek to infer which values of $p_1,\dots,p_n$ are possible for some values of the $q_j$ in their allowed intervals. By the Tarski-Seidenberg theorem [8], the solution to this problem is also a semi-algebraic set. We determine the latter as follows. First, we eliminate the variables $q_1,\dots,q_m$ to find a system of polynomial equations in $p_1,\dots,p_n$. These define the smallest algebraic variety on $p_1,\dots,p_n$ that contains the semi-algebraic set that we seek to characterize. This problem is known as implicitization. The second step is to determine which points in this algebraic variety can be extended to a solution of the equalities and inequalities of the original parametric characterization.

For example, consider the algebraic variety that is defined parametrically by the polynomial equations

$$
p_{00} = q_1 q_2, \quad p_{10} = q_1\bar{q}_2, \quad p_{01} = \bar{q}_1\bar{q}_2, \quad p_{11} = \bar{q}_1 q_2.
$$

We would like to characterize the semi-algebraic set that this variety defines on the observed variables alone when one eliminates the parameters $q_1$ and $q_2$ while enforcing that they are probabilities in $(0,1)$. In Sec. 2, it was shown how one can do so, and that the resulting semi-algebraic set is the one depicted in Fig. 1(b). However, the technique was not generalizable to arbitrary functional causal structures. Here, we reconsider this example using techniques that are generally applicable.

The problem can be solved by employing a specific choice of basis for the ideal generated by the system of polynomial equations that define the variety (3.0.1). The basis that achieves this feat is known as the Groebner basis.

Groebner bases simplify many calculations in algebraic geometry and they have many interesting properties [7]. There are efficient algorithms for calculating Groebner bases and many software packages that one can use to implement them.

We have seen that the fan of Fig. 1(b) is in fact the intersection of the algebraic variety defined by the ideal

$$
\langle p_{00} - q_1 q_2,\ p_{10} - q_1\bar{q}_2,\ p_{01} - \bar{q}_1\bar{q}_2,\ p_{11} - \bar{q}_1 q_2 \rangle
$$

with the tetrahedron. The Groebner basis of this ideal (computed with respect to a lex order; see appendix A) is found to be

$$
\begin{aligned}
g_1 &= q_1 + p_{01} + p_{11} - 1,\\
g_2 &= q_2 + p_{01} + p_{10} - 1,\\
g_3 &= p_{00} + p_{01} + p_{10} + p_{11} - 1,\\
g_4 &= p_{01}^2 + p_{01}p_{10} + p_{11}p_{01} - p_{01} + p_{10}p_{11}.
\end{aligned}
$$

Solutions to $g_1 = g_2 = g_3 = g_4 = 0$ provide solutions to

$$
p_{00} = q_1 q_2, \quad p_{10} = q_1\bar{q}_2, \quad p_{01} = \bar{q}_1\bar{q}_2, \quad p_{11} = \bar{q}_1 q_2,
$$

which define our algebraic variety. Looking more closely at the Groebner basis, we note that the variables $q_1$ and $q_2$ have been eliminated from the polynomials $g_3$ and $g_4$. The solution of $g_3 = 0$ is exactly the normalisation condition. The solution of $g_4 = 0$ gives us the following

$$
p_{01}(p_{10} + p_{01} + p_{11} - 1) + p_{10}p_{11} = 0,
$$

which, using the normalisation condition in the form $p_{10} + p_{01} + p_{11} - 1 = -p_{00}$, then gives us

$$
p_{00}p_{01} = p_{10}p_{11}.
$$

On demanding $0 < q_1 < 1$ and $0 < q_2 < 1$ (i.e. on taking the intersection of this algebraic variety with the tetrahedron), we obtain the semi-algebraic set corresponding to the fan of Fig. 1(b), which we derived in Sec. 2. This is a special case of a general result, known as the elimination theorem, which provides us with a way of using Groebner bases to systematically eliminate certain variables from a system of polynomial equations and, thus, to solve the implicitization problem.
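The basis elements quoted above can be sanity-checked numerically without a computer algebra system. The sketch below does not compute a Groebner basis; it only verifies, on a grid of exact rational parameter values, that each of the four quoted polynomials vanishes on the parametrized variety, and that the two basis elements containing $q_1$ and $q_2$ can be solved for these parameters in terms of the observed probabilities. The helper names `parametrize`, `basis_values` and `solve_latents` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def parametrize(q1, q2):
    """Return (p00, p01, p10, p11) for the parametrization in the text."""
    return (q1 * q2, (1 - q1) * (1 - q2), q1 * (1 - q2), (1 - q1) * q2)


def basis_values(q1, q2, p):
    """Evaluate the four quoted basis polynomials g1..g4 at a point."""
    p00, p01, p10, p11 = p
    return (
        q1 + p01 + p11 - 1,
        q2 + p01 + p10 - 1,
        p00 + p01 + p10 + p11 - 1,
        p01**2 + p01 * p10 + p11 * p01 - p01 + p10 * p11,
    )


# Check every grid point with both parameters strictly inside (0, 1).
grid = [Fraction(k, 7) for k in range(1, 7)]
all_vanish = all(
    value == 0
    for q1, q2 in product(grid, repeat=2)
    for value in basis_values(q1, q2, parametrize(q1, q2))
)


def solve_latents(p):
    """Solve g1 = 0 and g2 = 0 for (q1, q2) given the observed p."""
    p00, p01, p10, p11 = p
    return (1 - p01 - p11, 1 - p01 - p10)


recovered = solve_latents(parametrize(Fraction(2, 7), Fraction(5, 7)))
print(all_vanish, recovered)
```

Because the first two basis elements are linear in the latent parameters, both parameters are identifiable from the observed distribution in this example.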

The general procedure for finding the semi-algebraic set is as follows. First, given the system of polynomial equations defining the implicitization problem, as in Eq. (3.0.1), form the ideal generated by these polynomials and compute its Groebner basis with respect to the lexicographic order. The elements of this basis that do not contain the variables $q_1,\dots,q_m$ constitute constraints on the variables $p_1,\dots,p_n$ alone. These constraints are polynomial equalities, and therefore define an algebraic variety on the variables $p_1,\dots,p_n$. Second, we determine which points on this variety correspond to solutions of the original equalities and inequalities on $q_1,\dots,q_m$ and $p_1,\dots,p_n$. This will result in inequality constraints. The equality constraints from the first step and the inequality constraints from the second step together characterize the semi-algebraic set on $p_1,\dots,p_n$ that is compatible with the given functional causal structure. We note that one trivial consequence of the fact that each of the $q_j$ is in the interval $(0,1)$ is that each of the $p_i$ is in the interval $[0,1]$. As such, the semi-algebraic set we seek to characterize is always a subset of the geometric intersection of the algebraic variety we find in the first step and the probability simplex on $p_1,\dots,p_n$. Note, however, that it is generically a strict subset of this intersection.

These inequality constraints manifest themselves in different ways. We present an example of one such manifestation below and leave the remaining examples to appendix B.

Consider the causal model of Fig. 3(a). Defining $q_1$, $q_2$ and $q_3$ to be the probabilities characterizing the distributions of the three binary latent variables, respectively, the joint distribution generated by this model is

$$
P(A,B) = (q_1 + \bar{q}_1 q_2 q_3)[00] + \bar{q}_1 q_2\bar{q}_3[01] + \bar{q}_1\bar{q}_2 q_3[10] + \bar{q}_1\bar{q}_2\bar{q}_3[11]. \tag{3.0.2}
$$

We begin by providing an intuitive account of the semi-algebraic set describing such joint distributions. Note first that $P(A,B)$ can be rewritten as

$$
P(A,B) = q_1[00] + \bar{q}_1\left(q_2 q_3[00] + q_2\bar{q}_3[01] + \bar{q}_2 q_3[10] + \bar{q}_2\bar{q}_3[11]\right),
$$

implying that it is the convex combination, with weight $q_1$, of the point distribution $[00]$, and with weight $\bar{q}_1$, of the distribution arising from the functional causal structure of Fig. 2(c), shown above to be characterized by the equality $p_{00}p_{11} = p_{10}p_{01}$. It follows that the semi-algebraic set defined by $P(A,B)$ contains all interior points on any line extending from the vertex $[00]$ to a point on the fan depicted in Fig. 2(d); this set is depicted in Fig. 3(b).

Reading off the expressions for the $p_{xy}$ from Eq. (3.0.2), we obtain the set of polynomials that define the full algebraic variety. The ideal generated by these is:

$$
\langle p_{00} - q_1 - \bar{q}_1 q_2 q_3,\ p_{01} - \bar{q}_1 q_2\bar{q}_3,\ p_{10} - \bar{q}_1\bar{q}_2 q_3,\ p_{11} - \bar{q}_1\bar{q}_2\bar{q}_3 \rangle.
$$

To implement the first step of the general procedure outlined above, we derive the Groebner basis for this ideal (with respect to a lex order):

$$
\begin{aligned}
g_1 &= q_2 q_1 - q_1 - q_2 - p_{10} - p_{11} + 1,\\
g_2 &= q_3 q_1 - q_1 - q_3 - p_{01} - p_{11} + 1,\\
g_3 &= q_3 p_{10} + q_3 p_{11} - p_{10},\\
g_4 &= q_2 p_{01} + q_2 p_{11} - p_{01},\\
g_5 &= p_{00} + p_{01} + p_{10} + p_{11} - 1,\\
g_6 &= p_{11}^2 + p_{01}p_{10} + p_{11}p_{10} - p_{11} + p_{01}p_{11} + p_{11}q_1.
\end{aligned}
$$

Now $g_5 = 0$ is just the normalisation condition and $g_6 = 0$ gives the following:

$$
p_{11}(p_{10} + p_{01} + p_{11} - 1) + p_{01}p_{10} + p_{11}q_1 = 0,
$$

which, using the normalisation condition, results in

$$
q_1 = \frac{p_{11}p_{00} - p_{10}p_{01}}{p_{11}}. \tag{3.0.3}
$$

To implement the second step of our procedure, we begin by enforcing $0 < q_1 < 1$. This results in the following inequality

$$
p_{11}p_{00} > p_{10}p_{01}
$$

(the upper bound $q_1 < 1$ holds automatically, since it is equivalent to $p_{11}(p_{00} - 1) < p_{10}p_{01}$, whose left-hand side is negative).

None of the remaining constraints $0 < q_i < 1$, for $i = 2, 3$, results in nontrivial relations among the $p$'s, so the latter inequality is the only nontrivial constraint. Together with the open-interval constraints it describes the necessary and sufficient conditions for the distribution on observed variables to be compatible with the functional causal structure of Fig. 3(a). These conditions define the semi-algebraic set depicted in Fig. 3(b).
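The feasibility test for this example, and the recovery of the latent parameters from the observed probabilities, can be sketched as follows. This is an illustration under assumptions rather than the authors' code: the function names are hypothetical, the tuple order `(p00, p01, p10, p11)` is a choice made here, and the expressions for $q_2$ and $q_3$ are obtained directly from Eq. (3.0.2), using $p_{01} + p_{11} = \bar{q}_1\bar{q}_3$ and $p_{10} + p_{11} = \bar{q}_1\bar{q}_2$.

```python
from fractions import Fraction


def fig3a_joint(q1, q2, q3):
    """Return (p00, p01, p10, p11) as given by Eq. (3.0.2)."""
    r1, r2, r3 = 1 - q1, 1 - q2, 1 - q3
    return (q1 + r1 * q2 * q3, r1 * q2 * r3, r1 * r2 * q3, r1 * r2 * r3)


def fig3a_feasible(p):
    """Open-interval constraints plus the inequality p11*p00 > p10*p01."""
    p00, p01, p10, p11 = p
    interior = sum(p) == 1 and all(0 < v < 1 for v in p)
    return interior and p11 * p00 > p10 * p01


def fig3a_latents(p):
    """Recover (q1, q2, q3) from a feasible observed distribution."""
    p00, p01, p10, p11 = p
    q1 = (p11 * p00 - p10 * p01) / p11  # Eq. (3.0.3)
    q2 = p01 / (p01 + p11)              # p01 + p11 = (1-q1)(1-q3)
    q3 = p10 / (p10 + p11)              # p10 + p11 = (1-q1)(1-q2)
    return (q1, q2, q3)


p = fig3a_joint(Fraction(1, 2), Fraction(1, 3), Fraction(1, 4))
feasible = fig3a_feasible(p)
latents = fig3a_latents(p)

# A distribution that violates the inequality and so fails the test.
infeasible = fig3a_feasible(
    (Fraction(1, 10), Fraction(2, 5), Fraction(2, 5), Fraction(1, 10))
)
print(p, feasible, latents, infeasible)
```

The round trip from parameters to probabilities and back illustrates that, for this structure, all three latent parameters are identifiable wherever the test is passed.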

## 4 Characterizing the observational equivalence classes

In this section, we will provide a scheme for inductively characterizing all observational equivalence classes. As noted in the introduction, we consider only causal models where there is a pair of binary observed variables, which we denote by $A$ and $B$.

### 4.1 Sufficiency of considering purely common-cause models

A causal model having no directed causal influences between the observed variables will be termed purely common-cause.

###### Lemma 4.1.1.

Every causal model wherein there is a directed causal influence between $A$ and $B$ (either $A \to B$ or $B \to A$) is observationally equivalent to one that is purely common-cause.

The proof is as follows. Suppose that there is a directed causal influence $A \to B$. Considering the collection of all latent variables together, a general causal model can be specified by giving $A$ as a function of the latent variables and $B$ as a function of $A$ and the latent variables. But this is observationally equivalent to the causal model that is purely common-cause in which $A$ is the same function of the latent variables and $B$ is the function of the latent variables alone obtained by substituting the expression for $A$ into the original dependence of $B$. In characterizing the distinct observational equivalence classes, therefore, it suffices for us to consider the models that are purely common-cause, and therefore we restrict our attention to these henceforth.
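The substitution in this proof can be sketched by brute-force enumeration over the latent bits. Everything named below is a hypothetical illustration: `f` and `g` stand for arbitrary functional dependences chosen for the example (they are not taken from the paper's figures), and `latent_probs[i]` is assumed to be the probability that latent bit `i` equals 0.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def observed_distribution(model, latent_probs):
    """Distribution over (a, b) induced by a model on independent latent bits."""
    dist = Counter()
    for bits in product((0, 1), repeat=len(latent_probs)):
        weight = Fraction(1)
        for bit, q in zip(bits, latent_probs):
            weight *= q if bit == 0 else 1 - q
        dist[model(bits)] += weight
    return dict(dist)


def f(bits):
    """Hypothetical dependence of A on the latents."""
    return bits[0]


def g(a, bits):
    """Hypothetical dependence of B on A and the latents."""
    return a ^ bits[1]


def directed_model(bits):
    a = f(bits)
    return (a, g(a, bits))  # B reads the value of A


def g_prime(bits):
    return g(f(bits), bits)  # substitute the expression for A


def common_cause_model(bits):
    return (f(bits), g_prime(bits))  # B reads only the latents


probs = [Fraction(1, 3), Fraction(1, 4)]
directed = observed_distribution(directed_model, probs)
common = observed_distribution(common_cause_model, probs)
print(directed == common, directed)
```

The two models agree on every latent assignment, so they agree on every choice of latent distribution, which is the content of the lemma.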

An explicit example serves to illustrate this equivalence. The causal model depicted in Fig. 4(a), with functional dependences and , involving a directed causal influence from to , is observationally equivalent to the causal model depicted in Fig. 4(b), with functional dependences and , which is purely common-cause. To see this, note that one can express the functional dependences of the first causal model as and Performing the substitution described in the previous paragraph yields , which on identifying with , results in the second causal model.

### 4.2 Sufficiency of considering models with binary latents

We call a causal model where all the latent variables are binary a causal model with binary latents. If there are $n$ binary latent variables, it is called an $n$-latent-bit causal model.

###### Theorem 4.2.1.

Consider the family of causal models where the latent variables are discrete and finite, but not necessarily binary. Every such model is observationally equivalent to one with binary latents. Equivalently, there is a causal model with binary latents in each observational equivalence class.

The proof is provided in appendix C, but we now present a simple example which illustrates the main idea of the proof.

Consider the causal model of Fig. 5(a), where are binary, but is a three-valued variable, i.e., a trit. Suppose the functional relationships are as follows: and , where means addition modulo . The values of for different values of are given in the table below.

One can see that the distributions over that can be generated by this model correspond to the face of the tetrahedron that contains the vertices .

The trick to simulating this model using a -latent-bit model is to replace the latent three-valued variable with a pair of binary variables and and to imagine that these are causally related in the manner depicted in Fig. 5(b). That is, we imagine a latent bit acting locally on and a latent bit acting as a common cause of and with the functional dependence and . This causal model can generate any distribution over and that has support only on the values , as can be seen by consulting the row containing class from the -page table appearing later in this paper, where and play the role of and .

If we take and to be related to by , so that the values and of map respectively to the values and of , then any distribution over can be emulated by some distribution over the values and of and hence some distribution over and . Finally, we can express and explicitly in terms of and by eliminating and , obtaining the causal model depicted in Fig. 5(c) with dependences and . By construction, we must obtain precisely the same semi-algebraic set for and in the model of Fig. 5(c) as one does in the model of Fig. 5(a). We have therefore defined a -latent-bit model that simulates our latent trit model.

The key ingredient of the above example was that we were able to find a causal model which could (by appropriately varying over the distribution of its latent variables) generate any distribution over a given face of the tetrahedron, and hence any distribution on a trit. In the case of an $n$-valued latent variable, however, one would need to find a latent-bit model which could generate any distribution on an $(n-1)$-simplex. We provide an inductive procedure for constructing such a latent-bit model in appendix C.

Theorem 4.2.1 implies that for the project of determining the observational equivalence classes, it suffices to consider models with binary latents, and so we restrict our attention to these henceforth.

### 4.3 Inductive scheme

Next, we define a scheme for composing pairs of $n$-latent-bit causal models into a single $(n+1)$-latent-bit causal model, such that if we start with all possible pairs of $n$-latent-bit causal models, and apply the composition operation, we generate all possible $(n+1)$-latent-bit causal models.

Denote the latent binary variables by $\lambda_1,\dots,\lambda_n$. A general $n$-latent-bit causal model is then defined by the functional dependences

$$
A = \sum_\alpha a_\alpha \lambda^\alpha \quad \text{and} \quad B = \sum_\alpha b_\alpha \lambda^\alpha, \tag{4.3.1}
$$

where $\lambda^\alpha$ is shorthand for the monomial $\lambda_1^{\alpha_1}\cdots\lambda_n^{\alpha_n}$ for some set of exponents $\alpha_i \in \{0,1\}$, and $a_\alpha, b_\alpha \in \{0,1\}$ are parameters that specify the nature of the functional dependences.

We assume that the first causal model is defined by parameters $a^{(0)}_\alpha$ and $b^{(0)}_\alpha$, and the second is defined by parameters $a^{(1)}_\alpha$ and $b^{(1)}_\alpha$. The additional binary latent variable, which supplements the $n$ binary variables of the original two models, is denoted $\delta$. The $(n+1)$-latent-bit model which is the composition of the two models is defined by the functional dependences

$$
A = \sum_\alpha \left[(\delta \oplus 1)\,a^{(0)}_\alpha + \delta\, a^{(1)}_\alpha\right]\lambda^\alpha, \quad
B = \sum_\alpha \left[(\delta \oplus 1)\,b^{(0)}_\alpha + \delta\, b^{(1)}_\alpha\right]\lambda^\alpha. \tag{4.3.2}
$$

This construction has been chosen such that $\delta$ acts as a switch variable: if we set $\delta = 0$ in the resulting $(n+1)$-latent-bit model, we recover the first $n$-latent-bit model, while if we set $\delta = 1$, we recover the second $n$-latent-bit model.

With these definitions, our composition result can be summarized as follows.

###### Theorem 4.3.1.

Consider the map that takes a pair of $n$-latent-bit causal models defined by the functional dependences of Eq. (4.3.1) with parameters $a^{(0)}_\alpha, b^{(0)}_\alpha$ for the first model, and parameters $a^{(1)}_\alpha, b^{(1)}_\alpha$ for the second model, and returns the $(n+1)$-latent-bit causal model defined by the functional dependences of Eq. (4.3.2). Under this map, the image of the set of all pairs of $n$-latent-bit causal models is the set of all $(n+1)$-latent-bit causal models.

###### Proof.

The functional dependences of Eq. (4.3.2) can equivalently be expressed as polynomials in the $\lambda_i$ and $\delta$ as

$$
A = \sum_\alpha \left(a^{(0)}_\alpha \lambda^\alpha + (a^{(0)}_\alpha \oplus a^{(1)}_\alpha)\lambda^\alpha\delta\right), \quad
B = \sum_\alpha \left(b^{(0)}_\alpha \lambda^\alpha + (b^{(0)}_\alpha \oplus b^{(1)}_\alpha)\lambda^\alpha\delta\right). \tag{4.3.3}
$$

It now suffices to note that as one varies over all possible joint values for the variables in the set $\{a^{(0)}_\alpha, a^{(1)}_\alpha\}_\alpha$ (there are $2^{2^{n+1}}$ possibilities), one necessarily varies over all possible joint values for the variables in the set $\{a^{(0)}_\alpha, a^{(0)}_\alpha \oplus a^{(1)}_\alpha\}_\alpha$, which in turn implies that one is varying over all possible polynomials in the $\lambda_i$ and $\delta$ in the expression for $A$. By a similar argument, as one varies over all possible joint values for the variables in the set $\{b^{(0)}_\alpha, b^{(1)}_\alpha\}_\alpha$, one varies over all possible polynomials in the $\lambda_i$ and $\delta$ in the expression for $B$. It follows that as one varies over all possible joint values for the variables in the set $\{a^{(0)}_\alpha, a^{(1)}_\alpha, b^{(0)}_\alpha, b^{(1)}_\alpha\}_\alpha$, one obtains all possible manners in which $A$ and $B$ might be functionally dependent on the latent variables in the $(n+1)$-latent-bit causal model. Thus as one varies over all possible pairs of $n$-latent-bit causal models in our switch-variable construction, one varies over all possible $(n+1)$-latent-bit causal models. ∎

We can therefore generate all causal models with binary latents by this inductive rule starting from the $0$-latent-bit causal models.
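The switch-variable composition can be sketched with truth tables in place of the monomial coefficients of Eq. (4.3.1). This substitution is an assumption made for brevity: every function from latent bits to a bit has a unique polynomial representation modulo 2, so enumerating truth tables is equivalent to enumerating coefficient vectors. The sketch treats the dependence of a single observed variable; the same argument applies to the other. The names `compose`, `restrict` and `all_tables` are hypothetical.

```python
from itertools import product


def compose(f0, f1):
    """Switch-variable composition of two truth tables on n latent bits.

    A truth table is a tuple of 2**n output bits. The result is a table
    on n+1 bits in which the extra bit delta varies fastest: entry
    2*i + delta equals f0[i] when delta = 0 and f1[i] when delta = 1.
    """
    assert len(f0) == len(f1)
    return tuple(bit for pair in zip(f0, f1) for bit in pair)


def restrict(table, delta):
    """Fix the switch bit and return the resulting n-bit truth table."""
    return table[delta::2]


def all_tables(n):
    """All truth tables of functions from n bits to one bit."""
    return list(product((0, 1), repeat=2**n))


n = 1
images = {compose(f0, f1) for f0 in all_tables(n) for f1 in all_tables(n)}
covers_everything = images == set(all_tables(n + 1))

example = compose((0, 1), (1, 1))
print(covers_everything, len(images), example)
```

For `n = 1` the 16 pairs of one-bit tables map onto the 16 two-bit tables, matching the count $2^{2^{n+1}}$ used in the proof.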

### 4.4 Catalogue of observational equivalence classes

Recall that two causal models are observationally equivalent if they define the same semi-algebraic set. Thus, to characterize the observational equivalence classes, we proceed as follows. For each new causal model that we generate by the inductive scheme, we determine the corresponding semi-algebraic set. Every time one obtains a variety that has not appeared previously, one adds it to the catalogue of observational equivalence classes.

Note that if a causal model has been obtained from two simpler models via our composition scheme, then the semi-algebraic set associated with it necessarily includes as subsets both of the semi-algebraic sets of the simpler models, although it is generally not the convex hull of those two sets. It follows that if the semi-algebraic set of a given causal model is found to be the entire tetrahedron, then composing this model with any other will also yield the tetrahedron. In this case, there are no new observational equivalence classes to be found among the descendants of this causal model in the inductive scheme.

In particular, if it were to occur that at some level of the inductive scheme, every newly generated causal model could be shown either to reduce to a previously generated causal model or to yield a semi-algebraic set that is the entire tetrahedron, then one could conclude that the catalogue of observational equivalence classes was complete, in the sense that any $n$-latent-bit causal model belongs to one of these classes.

We have used our inductive scheme to construct all observational equivalence classes generated by causal models with four or fewer binary latent variables. We have also considered a large number of causal models with five binary latent variables and found no new observational equivalence classes. This suggests that our catalogue may already be complete, although we do not have a proof of this. Above, we noted circumstances in which our inductive scheme would terminate, which provides one strategy for attempting to settle the question. Even in the absence of a proof of completeness, the inductive scheme presented here for classifying observational equivalence classes may be of independent interest to researchers in the field.

The observational equivalence classes of causal models that we have obtained (which cover all causal models with four or fewer binary latent variables) are presented in the table covering the next three pages. For each class, we depict the semi-algebraic set that defines the class, the feasibility test for the class, and a representative causal model from the class. Note that the open-interval constraints are part of every feasibility test unless explicitly stated otherwise. The corresponding constraint on the affine varieties is that those varieties confined to the edges exclude the vertices, those confined to the faces exclude the edges, and those in the bulk exclude the faces.

The task of describing the catalogue is simplified by the fact that many of the observational equivalence classes are related to one another by simple symmetries. We therefore organize the classes into orbits, where an orbit is a set of classes whose elements are related to one another by a set of symmetry transformations. For one of the classes in the orbit (which we term the ‘fiducial’ class), we provide a full description, and below this description, we specify the set of symmetry transformations that must be applied to it to obtain the other elements of the orbit. Formally, this is a set of representatives of the right cosets of the subgroup of symmetries of the semi-algebraic set in the full symmetry group of the tetrahedron.

We express these representatives as compositions of the following set of symmetry transformations, which we define below: . For each of the five, we specify both their action on the causal model, i.e., their action on the functional dependences, from which their action on the DAG can be inferred, and on the elements of the joint distribution , from which their action on the feasibility test can be inferred. Each symmetry transformation also defines an action on the tetrahedron in an obvious manner. Id is the identity transformation, leaving the model and invariant; is the bit flip on , replacing the functional dependence with and mapping ; is the bit flip on , defined analogously to ; is the swap transformation, replacing the functional dependences with , and mapping ; is the “add to ” transformation, replacing the functional dependences with and mapping . We denote a composition of two symmetry transformations by a right-to-left product: for instance, a bit flip on followed by a swap is denoted . The conjunction of a bit flip on and a bit flip on yields the same transformation regardless of the order in which they are implemented and is denoted .
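As a sanity check on the claim that these transformations generate the full symmetry group of the tetrahedron, one can represent each of them as a permutation of the four outcomes $(a,b)$, which label the vertices, and enumerate the group they generate. The sketch below is illustrative only. The names in `GENERATORS` are hypothetical stand-ins for the paper's symbols, and the direction of the "add" transformation (here $B\mapsto A\oplus B$) is an assumption.

```python
from itertools import product

# The four outcomes (a, b) label the vertices of the probability tetrahedron.
OUTCOMES = list(product((0, 1), repeat=2))

# Hypothetical names; the "add" direction B -> A xor B is an assumption.
GENERATORS = {
    "flip_A": lambda a, b: (a ^ 1, b),
    "flip_B": lambda a, b: (a, b ^ 1),
    "swap": lambda a, b: (b, a),
    "add_A_to_B": lambda a, b: (a, a ^ b),
}

def as_permutation(f):
    """Encode a map on outcomes as a tuple: entry i is the image of outcome i."""
    return tuple(OUTCOMES.index(f(a, b)) for a, b in OUTCOMES)

def compose(p, q):
    """Right-to-left product: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def generated_group(perms):
    """Close the identity under left multiplication by the generators."""
    identity = tuple(range(len(OUTCOMES)))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for p in perms:
            h = compose(p, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

PERMS = {name: as_permutation(f) for name, f in GENERATORS.items()}
GROUP = generated_group(list(PERMS.values()))

if __name__ == "__main__":
    print(len(GROUP))
```

With these definitions the generated group contains all 24 permutations of the four vertices, and the two bit flips commute, consistent with the remark that their conjunction does not depend on the order of application.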

Finally, a given observational equivalence class will be distinguished by a label of the form . Here, is the number of binary latent variables in the causal model, is the number of these that act as common causes, is an optional label that is used for distinguishing functional dependences that are consistent with a given but are observationally inequivalent, and labels the symmetry transformation that relates the class to the fiducial class