# The evolution of the global Markov property for multivariate regression graphs: differences and conflicts

Depending on the interpretation of the type of edges, a chain graph can represent different relations between variables and thereby different independence models. Three interpretations, known by the acronyms LWF, MVR, and AMP, are prevalent. Multivariate regression (MVR) chain graphs were introduced by Cox and Wermuth in 1993. We review Markov properties for MVR chain graphs chronologically, and for the first time we show that two different and incompatible interpretations have been proposed in the literature. Differences and inconsistencies between them are discussed through several examples. The older (original) interpretation has no factorization associated with it in the published literature; we derive such a factorization. For the newer (alternative) interpretation we provide an explicit global Markov property, which implies the other Markov properties published in the literature for this interpretation. We provide a summary table comparing different features of LWF, AMP, and the two kinds of MVR chain graph interpretations, which we call MVR and Alternative MVR (AMVR), respectively.


## 1 Introduction

A probabilistic graphical model is a probabilistic model for which a graph represents the conditional dependence structure between random variables. There are several classes of graphical models; Bayesian networks (BNs), Markov networks, chain graphs, and ancestral graphs are commonly used (Lauritzen, 1996; Richardson and Spirtes, 2002). Chain graphs, which admit both directed and undirected edges, are graphs in which there are no partially directed cycles. Chain graphs were introduced by Lauritzen, Wermuth, and Frydenberg (Frydenberg, 1990; Lauritzen and Wermuth, 1989) as a generalization of undirected graphs and directed acyclic graphs (DAGs). Later, Andersson, Madigan, and Perlman introduced an alternative Markov property for chain graphs (Andersson et al., 1996). In 1993, Cox and Wermuth introduced multivariate regression chain graphs (MVR CGs) (Cox and Wermuth, 1993).

Acyclic directed mixed graphs (ADMGs), also known as semi-Markov(ian) models (Pearl, 2009), contain directed (→) and bidirected (↔) edges subject to the restriction that there are no directed cycles (Richardson, 2003; Evans and Richardson, 2014). An ADMG that has no partially directed cycle is called a multivariate regression chain graph. In this paper we focus on the class of multivariate regression chain graphs and discuss their Markov properties.
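As a quick structural check, the defining condition can be tested directly: an ADMG is an MVR chain graph exactly when it contains no partially directed cycle. The sketch below assumes a toy edge-list encoding (directed pairs (u, v) for u → v, unordered pairs for u ↔ v); the function name is illustrative, not from the paper.

```python
# A partially directed cycle is a cycle built from -> and <-> edges that uses
# at least one -> edge. We look for one by asking, for each directed edge
# u -> v, whether u is reachable from v along -> and <-> edges.
def has_partially_directed_cycle(vertices, directed, bidirected):
    step = {v: set() for v in vertices}
    for u, v in directed:
        step[u].add((v, True))          # True: traverses a directed edge
    for u, v in bidirected:
        step[u].add((v, False))
        step[v].add((u, False))

    def reachable(src, dst):
        seen, stack = {src}, [src]
        while stack:
            w = stack.pop()
            if w == dst:
                return True
            for z, _ in step[w]:
                if z not in seen:
                    seen.add(z)
                    stack.append(z)
        return False

    # closing any directed edge u -> v back to u yields a partially directed cycle
    return any(reachable(v, u) for u, v in directed)
```

For example, a → b together with a ↔ b forms a partially directed cycle, while a → b with b ↔ c does not.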

It is worthwhile to mention that, unlike in the other CG interpretations, bidirected edges in an MVR CG have a strong intuitive meaning: a bidirected edge represents one or more hidden common causes between the variables connected by it. In other words, in an MVR CG any bidirected edge x ↔ y can be replaced by x ← H → y to obtain a Bayesian network representing the same independence model over the original variables, i.e., excluding the new variables H. These variables are called hidden, or latent, and have been marginalized away in the CG model (Sonntag, 2014). This causal interpretation of bidirected edges in MVR CGs, along with the discussion preceding Theorem 3, provides strong motivation for the importance of MVR CGs.

In the first decade of the 21st century, several Markov properties (global, pairwise, block-recursive, and so on) were introduced for MVR chain graphs (Richardson and Spirtes, 2002; Wermuth and Cox, 2004; Marchetti and Lupparelli, 2008, 2011; Drton, 2009). Lauritzen, Wermuth, and Sadeghi (Sadeghi and Lauritzen, 2014; Sadeghi and Wermuth, 2016) proved that the global and (four) pairwise Markov properties of an MVR chain graph are equivalent for any independence model that is a compositional graphoid. The major contributions of this paper may be summarized as follows:

• An alternative local Markov property for MVR chain graphs, which is equivalent to the other Markov properties in the literature for compositional semi-graphoids.

• A comparison of the different Markov properties proposed for MVR chain graphs in the literature, and conditions under which they are equivalent.

• An alternative explicit factorization criterion for MVR chain graphs, based on the factorization criterion proposed for acyclic directed mixed graphs in (Evans and Richardson, 2014).

## 2 Definitions and Concepts

A vertex a is said to be an ancestor of a vertex b if either there is a directed path a → ⋯ → b from a to b, or a = b. A vertex a is said to be anterior to a vertex b if there is a path from a to b on which every edge is either of the form c − d, or c → d with d between c and b, or a = b; that is, the path contains no edges c ↔ d and no edges pointing toward a. Such a path is said to be an anterior path from a to b. We apply these definitions disjunctively to sets: an(X) = {a | a is an ancestor of some b ∈ X}, and ant(X) = {a | a is anterior to some b ∈ X}. If necessary we specify the graph by a subscript, as in an_G(X). The usage of the terms ancestor and anterior differs from Lauritzen (Lauritzen, 1996), but follows Frydenberg (Frydenberg, 1990). A mixed graph is a graph containing three types of edges: undirected (−), directed (→), and bidirected (↔). An ancestral graph G is a mixed graph in which the following conditions hold for all vertices a and b in G:

(i) if a and b are joined by an edge with an arrowhead at a, then a is not anterior to b.

(ii) there are no arrowheads present at a vertex which is an endpoint of an undirected edge. A nonendpoint vertex v on a path is a collider on the path if the edges preceding and succeeding v on the path both have an arrowhead at v, that is, → v ←, → v ↔, ↔ v ←, or ↔ v ↔. A nonendpoint vertex on a path which is not a collider is a noncollider on the path. A path between vertices x and y in an ancestral graph G is said to be m-connecting given a set Z (possibly empty), with x, y ∉ Z, if:

(i) every noncollider on the path is not in Z, and

(ii) every collider on the path is in an(Z).

If there is no path m-connecting x and y given Z, then x and y are said to be m-separated given Z. Sets X and Y are m-separated given Z if, for every pair x, y with x ∈ X and y ∈ Y, x and y are m-separated given Z (X, Y, and Z are disjoint sets; X, Y are nonempty). This criterion is referred to as a global Markov property. We denote the independence model resulting from applying the m-separation criterion to G by I_m(G). This is an extension of Pearl's d-separation criterion to mixed graphs, in that in a DAG D a path is d-connecting if and only if it is m-connecting. Let G_A denote the induced subgraph of G on the vertex set A, formed by removing from G all vertices that are not in A, and all edges that do not have both endpoints in A. Two vertices x and y in an MVR chain graph G are said to be collider connected if there is a path from x to y in G on which every non-endpoint vertex is a collider; such a path is called a collider path. (Note that a single edge trivially forms a collider path, so if x and y are adjacent in an MVR chain graph then they are collider connected.) The augmented graph derived from G, denoted (G)^a, is an undirected graph with the same vertex set as G, in which x and y are adjacent if and only if they are collider connected in G.

Disjoint sets X and Y (Z may be empty) are said to be m-separated given Z if X and Y are separated by Z in (G_{ant(X∪Y∪Z)})^a. Otherwise X and Y are said to be m-connected given Z. The resulting independence model is denoted by I_a(G). Richardson and Spirtes (Richardson and Spirtes, 2002, Theorem 3.18) show that for an ancestral graph G, I_m(G) = I_a(G). Note that in the case of ADMGs and MVR CGs, anterior sets in the preceding definitions can be replaced by ancestor sets, because in both cases anterior sets and ancestor sets coincide.
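The augmentation criterion above can be sketched in code. The following is a minimal illustration, assuming a toy edge-list encoding and helper names of my own: it builds the ancestor set, forms the augmented graph by joining collider-connected pairs, and then checks ordinary graph separation.

```python
from collections import deque
from itertools import combinations

def ancestors(directed, nodes):
    """Reflexive ancestor set an(S): S plus every vertex with a directed path into S."""
    anc = set(nodes)
    changed = True
    while changed:
        changed = False
        for u, v in directed:
            if v in anc and u not in anc:
                anc.add(u)
                changed = True
    return anc

def collider_connected(edges, x, y):
    """True if x and y are joined by a path on which every intermediate vertex
    is a collider. `edges` is a list of (u, v, head_at_u, head_at_v)."""
    inc = {}
    for u, v, hu, hv in edges:
        inc.setdefault(u, []).append((v, hu, hv))
        inc.setdefault(v, []).append((u, hv, hu))
    seen, q = set(), deque()
    for w, _, hw in inc.get(x, []):
        if w == y:
            return True          # a single edge is trivially a collider path
        if hw:
            seen.add(w); q.append(w)
    while q:
        w = q.popleft()
        for z, hw, hz in inc.get(w, []):
            if not hw:           # w must receive an arrowhead from both path edges
                continue
            if z == y:
                return True
            if hz and z not in seen:
                seen.add(z); q.append(z)
    return False

def m_separated(directed, bidirected, X, Y, Z):
    """Augmentation criterion: separation of X and Y by Z in the augmented
    graph of the induced subgraph on the ancestor set of X ∪ Y ∪ Z."""
    keep = ancestors(directed, set(X) | set(Y) | set(Z))
    edges = [(u, v, False, True) for u, v in directed if u in keep and v in keep]
    edges += [(u, v, True, True) for u, v in bidirected if u in keep and v in keep]
    adj = {w: set() for w in keep}
    for a, b in combinations(keep, 2):
        if collider_connected(edges, a, b):
            adj[a].add(b); adj[b].add(a)
    q, seen = deque(X), set(X)   # ordinary separation search, avoiding Z
    while q:
        w = q.popleft()
        if w in set(Y):
            return False
        for t in adj[w] - seen - set(Z):
            seen.add(t); q.append(t)
    return True
```

For the collider a → c ← b, this sketch reports a and b separated marginally but connected given c, as the criterion requires.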

An ancestral graph G is said to be maximal if, for every pair of vertices x and y that are not adjacent in G, there is a set Z (x, y ∉ Z) such that x and y are m-separated given Z. Thus a graph is maximal if every missing edge corresponds to at least one independence in the corresponding independence model.

A simple example of a nonmaximal ancestral graph is shown in Figure 1: its two endpoint vertices are not adjacent, but are m-connected given every subset of the remaining vertices, hence no corresponding independence holds.

If G is an undirected graph or a directed acyclic graph, then G is a maximal ancestral graph (Richardson and Spirtes, 2002, Proposition 3.19).

The absence of partially directed cycles in MVR CGs implies that the vertex set of a chain graph can be partitioned into so-called chain components T such that edges within a chain component are bidirected, whereas the edges between two chain components are directed and point in the same direction. Thus, any chain graph yields a directed acyclic graph D of its chain components, having T as its node set and an edge τ₁ → τ₂ whenever there exists in the chain graph at least one edge u → v connecting a node u in τ₁ with a node v in τ₂. In this directed graph, we may define for each component τ the set pa_D(τ) as the union of all the chain components that are parents of τ in D. This concept is distinct from the usual notion of the parents pa(A) of a set of nodes A in the chain graph, that is, the set of all the nodes b outside A such that b → a with a ∈ A (Marchetti and Lupparelli, 2011).

Given a chain graph G with chain components (τ | τ ∈ T), we can always define a strict total order ≺ of the chain components that is consistent with the partial order induced by the chain graph, such that if τ ≺ τ′ then τ ∩ pa_D(τ′) = ∅ (we draw τ′ to the right of τ, as in the example of Figure 2).
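The decomposition just described can be computed directly: the chain components are the connected components of the bidirected part, and a consistent ordering is any topological order of the component DAG. A small illustrative sketch (toy edge-list encoding, my own helper names):

```python
def chain_components(vertices, bidirected):
    """Chain components = connected components of the bidirected subgraph."""
    adj = {v: set() for v in vertices}
    for u, v in bidirected:
        adj[u].add(v); adj[v].add(u)
    comps, seen = [], set()
    for v in vertices:
        if v in seen:
            continue
        stack, block = [v], set()
        while stack:
            w = stack.pop()
            if w in block:
                continue
            block.add(w); seen.add(w)
            stack.extend(adj[w] - block)
        comps.append(frozenset(block))
    return comps

def consistent_order(components, directed):
    """Topological sort of the DAG of chain components (Kahn's algorithm)."""
    of = {v: c for c in components for v in c}
    succ = {c: set() for c in components}
    indeg = {c: 0 for c in components}
    for u, v in directed:
        cu, cv = of[u], of[v]
        if cu != cv and cv not in succ[cu]:
            succ[cu].add(cv); indeg[cv] += 1
    order = []
    ready = [c for c in components if indeg[c] == 0]
    while ready:
        c = ready.pop()
        order.append(c)
        for d in succ[c]:
            indeg[d] -= 1
            if indeg[d] == 0:
                ready.append(d)
    return order
```

For instance, with bidirected edge c ↔ d and directed edges a → c, b → d, the components are {a}, {b}, {c, d}, and any consistent order places {c, d} after both singletons.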

For each τ, the set of all components preceding τ is known, and we may define the cumulative set pst(τ) = ∪_{τ′ ≻ τ} τ′ of nodes contained in the predecessors of component τ, which we sometimes call the past of τ. The set pst(τ) captures the notion of all the potential explanatory variables of the response variables within τ (Marchetti and Lupparelli, 2011). In fact, MVR CGs can model the possible presence of residual associations among the responses using a bidirected graph, and this is consistent with an interpretation of bidirected edges in terms of latent variables (Roverato, 2017; Evans, 2016).

## 3 Markov Properties for MVR Chain Graphs

In this section, we first show formally that MVR chain graphs are a subclass of the maximal ancestral graphs of Richardson and Spirtes (Richardson and Spirtes, 2002) that include only observed and latent variables. Latent variables cause several complications. First, causal inference based on structural learning algorithms such as the PC algorithm (Spirtes et al., 2000) may be incorrect. Second, if a distribution is faithful to a DAG, then the distribution obtained by marginalizing out some of the variables may not be faithful to any DAG on the observed variables, i.e., the space of DAGs is not closed under marginalization (Colombo et al., 2012). Suppose that the DAG in Figure 3(a) is a perfect map of a given distribution and that one of its variables is latent. There is no DAG on the observed variables that encodes exactly the same d-separation relations among them as the original DAG; hence, there does not exist a DAG perfect map of the marginal distribution of the observed variables.

Mixed graphs provide a useful approach to address these problems without explicit modeling of latent variables (e.g., (Richardson and Spirtes, 2002; Pearl, 2009; Wermuth and Sadeghi, 2012)). The nodes of these graphs index the observed variables only. The edges, however, may be of two types, directed and bidirected. This added flexibility allows one to represent the more complicated dependence structures arising from a DAG with latent variables. A straightforward generalization of d-separation determines conditional independencies in mixed graph models (Drton and Maathuis, 2017). For instance, the MVR chain graph in Figure 3(b) is a perfect map for the distribution in Example 3. As a result, one possibility for solving the above-mentioned problems is to exploit MVR chain graphs, which cope with these problems without explicit modeling of latent variables. This motivates the development of studies on MVR CGs, and (Drton and Maathuis, 2017) emphasize that methods that account for the effects of latent variables need to be developed further. If G is an MVR chain graph, then G is an ancestral graph. Obviously, every MVR chain graph is a mixed graph without undirected edges, so it is enough to show that condition (i) in Definition 2 is satisfied. For this purpose, suppose that a and b are joined by an edge with an arrowhead at a in an MVR chain graph G. Two cases are possible. First, if a ↔ b is an edge in G, then by definition of an MVR chain graph both vertices belong to the same chain component. Since all edges on a path between two nodes of a chain component are bidirected, by definition a cannot be anterior to b. Second, if a ← b is an edge in G, then by definition of an MVR chain graph a and b belong to two different components (b is in a chain component to the right of the chain component that contains a). We know that all directed edges in an MVR chain graph are arrows pointing from right to left, so there is no anterior path from a to b in G, i.e., a cannot be anterior to b in this case. We have shown that a cannot be anterior to b in either case, and therefore condition (i) in Definition 2 is satisfied. In other words, every MVR chain graph is an ancestral graph.

The following result is often mentioned in the literature (Wermuth and Sadeghi, 2012; Peña, 2015; Sadeghi and Lauritzen, 2014; Sonntag, 2014), but we know of no published proof: every MVR chain graph has the same independence model as a DAG under marginalization. From Theorem 3, we know that every MVR chain graph is an ancestral graph. The result follows directly from (Richardson and Spirtes, 2002, Theorem 6.3).

If G is an MVR chain graph, then G is a maximal ancestral graph. To characterize maximal ancestral graphs, we need the following notion: a chain x = v₀, v₁, …, vₙ, vₙ₊₁ = y is a primitive inducing chain between x and y if and only if for every i, 1 ≤ i ≤ n:

• vᵢ is a collider on the chain; and

• vᵢ is an ancestor of x or of y.

Based on Corollary 4.4 in (Richardson and Spirtes, 2002), every nonmaximal ancestral graph contains a primitive inducing chain between a pair of nonadjacent vertices. So it is enough to show that an MVR chain graph G does not contain a primitive inducing chain between any pair of its nonadjacent vertices. For this purpose, suppose that x and y are a pair of nonadjacent vertices in an MVR chain graph G such that the chain x = v₀, …, vₙ₊₁ = y is a primitive inducing chain between x and y. Then, for every i, 1 ≤ i ≤ n, vᵢ is a collider on the chain and an ancestor of x or of y, which yields a partially directed cycle in G, a contradiction.

### 3.1 Global and Pairwise Markov Properties

The following properties have been defined for the conditional independences of probability distributions. Let X, Y, W, and Z be disjoint subsets of the node set V, where Z may be the empty set.

1. Symmetry: X ⊥⊥ Y | Z ⟹ Y ⊥⊥ X | Z;

2. Decomposition: X ⊥⊥ (Y ∪ W) | Z ⟹ X ⊥⊥ Y | Z;

3. Weak union: X ⊥⊥ (Y ∪ W) | Z ⟹ X ⊥⊥ Y | (Z ∪ W);

4. Contraction: X ⊥⊥ Y | (Z ∪ W) and X ⊥⊥ W | Z ⟹ X ⊥⊥ (Y ∪ W) | Z;

5. Intersection: X ⊥⊥ Y | (Z ∪ W) and X ⊥⊥ W | (Z ∪ Y) ⟹ X ⊥⊥ (Y ∪ W) | Z;

6. Composition: X ⊥⊥ Y | Z and X ⊥⊥ W | Z ⟹ X ⊥⊥ (Y ∪ W) | Z.

An independence model is a semi-graphoid if it satisfies the first four independence properties listed above. Note that every probability distribution satisfies the semi-graphoid properties (Studený, 1989). If a semi-graphoid further satisfies the intersection property, we say it is a graphoid (Pearl and Paz, 1987; Studený, 1989, 2005). A graphoid that further satisfies the composition property is a compositional graphoid (Sadeghi and Wermuth, 2016). If a semi-graphoid further satisfies the composition property, we say it is a compositional semi-graphoid.

For a node u in the connected component τ, its past, denoted by pst(u), consists of all nodes in components having a higher order than τ. To define pairwise Markov properties for MVR CGs, we use the corresponding notation for the parents, anteriors, and the past of a node pair (i, j). The distribution of X satisfies a pairwise Markov property (Pm), for m = 1, …, 4, with respect to MVR CG G if for every uncoupled pair of nodes i and j (i.e., there is no directed or bidirected edge between i and j):

(P1): , (P2): , (P3): , and (P4): if .

Notice that (P4) admits a simpler conditioning set whenever the two nodes are in the same connected component. Sadeghi and Wermuth (Sadeghi and Wermuth, 2016) proved that all of the above-mentioned pairwise Markov properties are equivalent for compositional graphoids. Also, they show that each of the pairwise Markov properties listed above is equivalent to the global Markov properties given above (Sadeghi and Wermuth, 2016, Corollary 1). The necessity of the intersection and composition properties follows from (Sadeghi and Lauritzen, 2014, Section 6.3).

### 3.2 Block-recursive, Multivariate Regression (MR), and Ordered Local Markov Properties

Given a chain graph G, the set Nb(A) is the union of A itself and the set of nodes that are neighbors of A, that is, coupled by a bidirected edge to some node in A. Moreover, the set of non-descendants nd(τ) of a chain component τ is the union of all components τ′ such that there is no directed path from τ to τ′ in the directed graph of chain components D.

(Multivariate regression (MR) Markov property for MVR CGs (Marchetti and Lupparelli, 2011); a generalization of this property for regression graphs is the ordered regression graph Markov property in (Roverato, 2017).) Let G be a chain graph with chain components (τ | τ ∈ T). A joint distribution P of the random vector X obeys the multivariate regression (MR) Markov property with respect to G if it satisfies the following independences. For all τ ∈ T and for all A ⊆ τ:

(MR1) if A is connected: A ⊥⊥ [pst(τ) ∖ pa(A)] | pa(A).

(MR2) if A is disconnected with connected components A₁, …, Aᵣ: A₁ ⊥⊥ ⋯ ⊥⊥ Aᵣ | pst(τ).

(Marchetti and Lupparelli, 2011, Remark 2) One immediate consequence of Definition 3.2 is that if the probability density p(x) is strictly positive, then it factorizes according to the directed acyclic graph of the chain components: p(x) = ∏_{τ∈T} p(x_τ | x_{pa_D(τ)}).

(Chain graph Markov property of type IV (Drton, 2009)) Let G be a chain graph with chain components (τ | τ ∈ T) and directed acyclic graph D of components. The joint probability distribution of X obeys the block-recursive Markov property of type IV if it satisfies the following independencies:

(IV0): τ ⊥⊥ [nd(τ) ∖ pa_D(τ)] | pa_D(τ), for all τ ∈ T;

(IV1): A ⊥⊥ [pa_D(τ) ∖ pa(A)] | pa(A), for all τ ∈ T and for all A ⊆ τ;

(IV2): A ⊥⊥ [τ ∖ Nb(A)] | pa_D(τ), for all τ ∈ T and for all connected subsets A ⊆ τ.

The following example shows that the independence models resulting from Definitions 3.2 and 3.2 are, in general, different. Consider the MVR chain graph in Figure 4.

For a connected set A in Figure 4, condition (MR1) and condition (IV2) imply different independence statements, and the statement implied by (IV2) is not implied directly by (MR1) and (MR2). Similarly, for a disconnected set, the statements implied by (MR2) and (IV2) differ. Theorem 1 in (Marchetti and Lupparelli, 2011) states that, for a given chain graph G, the multivariate regression Markov property is equivalent to the block-recursive Markov property of type IV. Also, Drton (Drton, 2009, Section 7) claims (without proof) that the block-recursive Markov property of type IV can be shown to be equivalent to the global Markov property proposed in (Richardson and Spirtes, 2002; Richardson, 2003).

Now we introduce a local Markov property for ADMGs proposed by Richardson (Richardson, 2003), which is an extension of the local well-numbering Markov property for DAGs introduced in (Lauritzen et al., 1990). For this purpose, we need the following definitions and notation. For a given acyclic directed mixed graph (ADMG) G, the induced bidirected graph G_↔ is the graph formed by removing all directed edges from G. The district (also known as the c-component) of a vertex x in G is the connected component of x in G_↔, or equivalently

dis_G(x) = {y | y ↔ ⋯ ↔ x in G, or x = y}.

As usual, we apply the definition disjunctively to sets: dis_G(A) = ∪_{x∈A} dis_G(x). A set C is path-connected in G_↔ if every pair of vertices in C is connected via a path in G_↔; equivalently, every vertex in C has the same district in G. In an ADMG, a set A is said to be ancestrally closed if x → y in G with y ∈ A implies that x ∈ A. The set of ancestrally closed sets is defined as follows:

A(G) = {A | an_G(A) = A}.

If A is an ancestrally closed set in an ADMG G, and x is a vertex in A that has no children in A, then we define the Markov blanket of the vertex x with respect to the induced subgraph on A as

mb(x, A) = pa_G(dis_{G_A}(x)) ∪ (dis_{G_A}(x) ∖ {x}),

where dis_{G_A}(x) is the district of x in the induced subgraph G_A. Let G be an acyclic directed mixed graph. Specify a total ordering (<) on the vertices of G such that x < y implies y ∉ an_G(x); such an ordering is said to be consistent with G. Define pre_G(x) = {y | y ≤ x}. [Ordered local Markov property] Let G be an acyclic directed mixed graph. An independence model I over the node set of G satisfies the ordered local Markov property for G, with respect to the ordering <, if for any x, and any ancestrally closed set A such that x ∈ A ⊆ pre_G(x),

{x} ⊥⊥ [A ∖ (mb(x, A) ∪ {x})] | mb(x, A).

Since MVR chain graphs are a subclass of ADMGs, the ordered local Markov property in Definition 3.2 can be used as a local Markov property for MVR chain graphs.
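For concreteness, the district and the Markov blanket mb(x, A) can be computed as follows. This is a sketch with the same toy edge-list encoding as above and helper names of my own, not the paper's code.

```python
def district(vertices, bidirected, x):
    """Connected component of x in the bidirected part: dis(x)."""
    adj = {v: set() for v in vertices}
    for u, v in bidirected:
        adj[u].add(v); adj[v].add(u)
    stack, dis = [x], set()
    while stack:
        w = stack.pop()
        if w not in dis:
            dis.add(w)
            stack.extend(adj[w] - dis)
    return dis

def markov_blanket(directed, bidirected, A, x):
    """mb(x, A): parents of x's district in the induced subgraph G_A,
    together with the district itself, minus x."""
    dA = [(u, v) for u, v in directed if u in A and v in A]
    bA = [(u, v) for u, v in bidirected if u in A and v in A]
    dis = district(A, bA, x)
    parents = {u for u, v in dA if v in dis}
    return (parents | dis) - {x}
```

For example, in the graph a → b ↔ c with A = {a, b, c} and x = b (which has no children in A), the district of b is {b, c} and mb(b, A) = {a, c}.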

Five of the Markov properties introduced in this and the previous subsection are equivalent for all probability distributions, as shown in the following theorem.

Let G be an MVR chain graph. For an independence model I over the node set of G, the following conditions are equivalent:

(i) I satisfies the global Markov property w.r.t. G in Definition 2;

(ii) I satisfies the global Markov property w.r.t. G in Definition 2;

(iii) I satisfies the block-recursive Markov property w.r.t. G in Definition 3.2;

(iv) I satisfies the MR Markov property w.r.t. G in Definition 3.2;

(v) I satisfies the ordered local Markov property w.r.t. G in Definition 3.2.

See Appendix A for the proof of this theorem.

### 3.3 An Alternative Local Markov Property for MVR Chain Graphs

In this subsection we formulate an alternative local Markov property for MVR chain graphs. This property is different from, and much more concise than, the ordered local Markov property proposed in (Richardson, 2003). The new local Markov property can be used to parameterize distributions efficiently when MVR chain graphs are learned from data, as done, for example, in (Javidian and Valtorta, 2019, Lemma 9). While the new local Markov property is not equivalent to the five properties in Theorem 3.2 in general, we show that it is equivalent to the global and ordered local Markov properties of MVR chain graphs for compositional graphoids. If there is a bidirected edge between vertices u and v, then u and v are said to be neighbors. The boundary bd(v) of a vertex v is the set of vertices that are parents or neighbors of v. The descendants of a vertex v are de(v) = {u | v is an ancestor of u}. The non-descendants of v are nd(v) = V ∖ (de(v) ∪ {v}).

The alternative local Markov property for an MVR chain graph G with vertex set V holds if, for every v ∈ V: v ⊥⊥ [nd(v) ∖ bd(v)] | pa(v). In DAGs, bd(v) = pa(v), and the local Markov property given above reduces to the directed local Markov property introduced by Lauritzen et al. in (Lauritzen et al., 1990). Also, in covariance graphs (equivalently, bidirected graphs, as explained in (Richardson, 2003, Section 4.1)), the local Markov property given above reduces to the dual local Markov property introduced by Kauermann in (Kauermann, 1996, Definition 2.1).

Let G be an MVR chain graph. If an independence model I over the node set of G is a compositional semi-graphoid, then I satisfies the alternative local Markov property w.r.t. G in Definition 3.3 if and only if it satisfies the global Markov property w.r.t. G in Definition 2. (Global ⟹ local): Let A = an({v} ∪ nd(v)). So A is an ancestor set, and the conditioning set of the alternative local Markov property separates v from the remaining non-descendants in (G_A)^a; this shows that the global Markov property in Definition 2 implies the local Markov property in Definition 3.3.

(Local ⟹ global): We prove this by deriving the MR Markov property, considering the following two cases:

Case 1): Assume that A ⊆ τ is connected. Using the alternative local Markov property for each node of A, together with the decomposition and weak union properties, and then the composition property, leads to (MR1).

Case 2): Assume that A ⊆ τ is disconnected with connected components A₁, …, Aᵣ. Applying the same argument to the connected components and using the decomposition, weak union, and composition properties leads to (MR2).

The result then follows from Theorem 3.2. The necessity of the composition property in Theorem 3.3 follows from the fact that the local and global Markov properties for bidirected graphs, which are a subclass of MVR CGs, are equivalent only for compositional semi-graphoids (Kauermann, 1996; Banerjee and Richardson, 2003).

## 4 An Alternative Factorization for MVR Chain Graphs

According to the definition of MVR chain graphs, they are clearly a subclass of acyclic directed mixed graphs (ADMGs). In this section, we derive an explicit factorization criterion for MVR chain graphs based on the factorization criterion proposed for acyclic directed mixed graphs in (Evans and Richardson, 2014). For this purpose, we need the following definition and notation:

An ordered pair of sets (H, T) forms the head and tail of a term associated with an ADMG G if and only if all of the following hold:

1. H = barren_G(an_G(H)), where barren_G(A) = {x ∈ A | de_G(x) ∩ A = {x}}.

2. H is contained within a single district of G_{an_G(H)}.

3. T = tail(H) = (dis_{G_{an_G(H)}}(H) ∖ H) ∪ pa_G(dis_{G_{an_G(H)}}(H)).

Evans and Richardson (Evans and Richardson, 2014, Theorem 4.12) prove that a probability distribution P obeys the global Markov property for an ADMG G if and only if, for every A ∈ A(G),

p(X_A) = ∏_{H ∈ [A]_G} p(X_H | X_{tail(H)}),   (1)

where [A]_G denotes a partition of A into head sets (for a graph G, the set of heads is denoted by H(G)), with heads and tails defined as above. The following theorem provides an alternative factorization criterion for MVR chain graphs based on the factorization criterion proposed for acyclic directed mixed graphs in (Evans and Richardson, 2014).

Let G be an MVR chain graph with chain components (τ | τ ∈ T). If a probability distribution P obeys the global Markov property for G, then p(x) = ∏_{τ∈T} p(x_τ | x_{pa(τ)}). According to Theorem 4.12 in (Evans and Richardson, 2014), it is enough to show that, for every τ in T, the pair (τ, pa(τ)) satisfies the three conditions in Definition 4, i.e., that H = τ and tail(τ) = pa(τ).

1. Let x ∈ τ. No proper descendant of x lies in an_G(τ), because a directed path leaving τ cannot return to an_G(τ). Therefore, τ = barren_G(an_G(τ)).

2. Let x, y ∈ τ. From the definitions of an MVR chain graph and of the induced bidirected graph, τ is a single connected component of the bidirected part of G_{an_G(τ)}. So τ is contained within a single district of G_{an_G(τ)}.

3. dis_{G_{an_G(τ)}}(τ) = τ by definition. So tail(τ) = (τ ∖ τ) ∪ pa_G(τ) = pa_G(τ). In other words, H = τ and tail(H) = pa(τ). Consider the MVR chain graph G in Example 4. Based on Theorem 4, we have the factorization p(x) = ∏_{τ∈T} p(x_τ | x_{pa(τ)}). However, the corresponding factorization of G based on the formula in (Drton, 2009; Marchetti and Lupparelli, 2011) is p(x) = ∏_{τ∈T} p(x_τ | x_{pa_D(τ)}).

The advantage of the new factorization is that it requires only graphical parents, rather than parent components in each factor, resulting in smaller variable sets for each factor, and therefore speeding up belief propagation. Moreover, the new factorization is the same as the outer factorization of LWF and AMP CGs, as described in (Lauritzen, 1996; Lauritzen and Richardson, 2002; Cowell et al., 1999; Andersson et al., 1996).
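The difference between the two factorizations can be made concrete on a toy graph (my own encoding and helper names): for a component τ, the new factorization conditions on the graphical parents pa(τ), while the older one conditions on the union of the full parent components pa_D(τ).

```python
def pa_of_set(directed, S):
    """Graphical parents pa(S): nodes outside S with an arrow into S."""
    return {u for u, v in directed if v in S and u not in S}

def pa_components(components, directed, tau):
    """pa_D(τ): union of the chain components containing a parent of τ."""
    of = {v: c for c in components for v in c}
    out = set()
    for u in pa_of_set(directed, tau):
        out |= of[u]
    return out

# toy MVR CG: a <-> b and a -> c, with chain components {a, b} and {c}
components = [frozenset({'a', 'b'}), frozenset({'c'})]
directed = [('a', 'c')]
tau = frozenset({'c'})
# the factor for τ = {c} conditions only on a under the new factorization,
# but on the whole component {a, b} under the parent-component factorization
```

Here the factor for {c} involves two variables under the new factorization versus three under the older one, which is exactly the saving described above.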

## 5 Intervention in MVR Chain Graphs

In the absence of a theory of intervention for chain graphs, a researcher would be unable to answer questions concerning the consequences of intervening in a system with the structure of a chain graph (Richardson, 1998). Fortunately, an intuitive account of the causal interpretation of MVR chain graphs is as follows. We interpret the edge a → b as a being a cause of b. We interpret the edge a ↔ b as a and b having an unobserved common cause H, i.e., a confounder.

Given the above causal interpretation of an MVR CG G, intervening on a set of variables X, so that the variables in X are no longer under the influence of their usual causes, amounts to replacing the right-hand sides of the equations for the random variables in X with expressions that do not involve their usual causes, and normalizing. Graphically, it amounts to modifying G as follows: delete from G all the edges a → b and a ↔ b with b ∈ X (Peña, 2016).
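The graphical surgery just described is straightforward to sketch (toy edge-list encoding assumed): intervening on a set X deletes every directed edge into X and every bidirected edge with an endpoint in X.

```python
def intervene(directed, bidirected, X):
    """Graphical surgery for do(X): drop every edge with an arrowhead into X."""
    X = set(X)
    new_directed = [(u, v) for u, v in directed if v not in X]
    new_bidirected = [(u, v) for u, v in bidirected
                      if u not in X and v not in X]
    return new_directed, new_bidirected
```

For example, intervening on b in a → b ↔ c deletes both edges, leaving b exogenous, while edges not touching b are retained.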

## Conclusion and Summary

Based on the interpretation of the type of edges in a chain graph, there are different conditional independence structures among the random variables in the corresponding probabilistic model. Apart from the pairwise Markov properties, we showed that for MVR chain graphs all Markov properties in the literature are equivalent for semi-graphoids. We proposed an alternative local Markov property for MVR chain graphs and proved that it is equivalent to the other Markov properties for compositional semi-graphoids. We also obtained an alternative formula for the factorization of an MVR chain graph. Table 1 summarizes some of the most important attributes of the common interpretations of chain graphs.

## Acknowledgements

This work has been partially supported by Office of Naval Research grant ONR N00014-17-1-2842. This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), award/contract number 2017-16112300009. The views and conclusions contained therein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes, notwithstanding annotation therein.

An early version of this work was presented at the workshop of the Ninth International Conference on Probabilistic Graphical Models, Prague, September 11-14, 2018. Comments by reviewers and workshop participants are gratefully acknowledged.

## Appendix A. Proof of Theorem 3.2

(i) ⇒ (ii): This has already been proved in (Richardson, 2003, Theorem 1).

(ii) ⇒ (iii): Assume that the independence model I over the node set of MVR CG G satisfies the global Markov property w.r.t. G in Definition 2. We have the following three cases:

Case 1: Let A = an(τ ∪ nd(τ)). So A is an ancestor set, and pa_D(τ) separates τ from nd(τ) ∖ pa_D(τ) in (G_A)^a; this shows that the global Markov property in Definition 2 implies (IV0) in Definition 3.2.

Case 2: Assume that A ⊆ τ. We know that there is no directed edge from elements of pa_D(τ) ∖ pa(A) to A, and also that there is no collider path between nodes of A and pa_D(τ) ∖ pa(A) in the relevant augmented graph. So every connecting path between A and pa_D(τ) ∖ pa(A) intersects pa(A), which means pa(A) separates A from pa_D(τ) ∖ pa(A); this shows that the global Markov property in Definition 2 implies (IV1) in Definition 3.2.

Case 3: Assume that A is a connected subset of τ, and let B = τ ∖ Nb(A). Obviously, A and B are two subsets of τ with no bidirected edge between their elements. Consider the ancestor set containing A, B, and pa_D(τ). Since A and B are disconnected in the corresponding induced subgraph, any connecting path between them in the augmented graph (if it exists) must pass through pa_D(τ); this shows that the global Markov property in Definition 2 implies (IV2) in Definition 3.2.

(iii) ⇒ (iv): Assume that the independence model I over the node set of MVR CG G satisfies the block-recursive Markov property w.r.t. G in Definition 3.2. We show that I satisfies the MR Markov property w.r.t. G in Definition 3.2 by considering the following two cases:

Case 1 (IV0 and IV1 ⇒ MR1): Assume that A is a connected subset of τ. From (IV1) we have:

A ⊥⊥ [pa_D(τ) ∖ pa(A)] | pa(A).   (2)

Also, from (IV0) we have τ ⊥⊥ [nd(τ) ∖ pa_D(τ)] | pa_D(τ); the decomposition property implies that

A ⊥⊥ [pst(τ) ∖ pa_D(τ)] | pa_D(τ).   (3)

Using the contraction property for (2) and (3) gives: A ⊥⊥ [pst(τ) ∖ pa(A)] | pa(A), which is (MR1), because pa(A) ⊆ pa_D(τ) ⊆ pst(τ).

Case 2 (IV0 and IV2 ⇒ MR2): Assume that A is a disconnected subset of τ with connected components A₁, …, Aᵣ. From (IV2) we have: Aᵢ ⊥⊥ [τ ∖ Nb(Aᵢ)] | pa_D(τ) for each i. Using the decomposition property gives:

Aᵢ ⊥⊥ Aⱼ | pa_D(τ) for i ≠ j.   (4)

Also, using decomposition for (IV0) gives: (Aᵢ ∪ Aⱼ) ⊥⊥ [pst(τ) ∖ pa_D(τ)] | pa_D(τ). Applying the weak union property to this independence relation gives: Aᵢ ⊥⊥ [pst(τ) ∖ pa_D(τ)] | pa_D(τ) ∪ Aⱼ. Using the contraction property for this and (4) gives: Aᵢ ⊥⊥ Aⱼ ∪ [pst(τ) ∖ pa_D(τ)] | pa_D(τ). Using the weak union property leads to Aᵢ ⊥⊥ Aⱼ | pst(τ). Similarly, we can prove this for every pair i ≠ j.

(iv)⇒(v): Assume that the independence model over the node set of MVR CG() satisfies the MR Markov property w.r.t. in Definition 3.2, and that is an ordering consistent with . Let . We show that satisfies the ordered local Markov property w.r.t. in Definition 3.2 by considering the following two cases:

Case 1: There is a chain component such that . Consider that is a connected subset of . From (MR1) we have: . Using the weak union property gives: . Since , using the decomposition property leads to: .

Case 2: There is a chain component such that , and is a disconnected subset of with connected components, i.e., . It is clear that there is a such that . We have the following two sub-cases:

Sub-case I): is a connected subset of .

Using the contraction property for (5) gives: . Using the weak union property gives: . Since , using the decomposition property leads to: .

Sub-case II): is a disconnected subset of with connected components, i.e., . From (MR1) we have: . Since , using the decomposition and weak union properties gives: . The symmetry property then implies that .

Using the contraction property for (6) gives: .

Using the contraction property for (7) gives: . Using the decomposition property gives: . Since , using the decomposition property leads to: .

(v)⇒(i): This has already been proved in (Richardson, 2003, Theorem 2).

## References

• Andersson et al. (1996) S. A. Andersson, D. Madigan, and M. D. Perlman. Alternative Markov properties for chain graphs. Uncertainty in Artificial Intelligence, pages 40–48, 1996.
• Banerjee and Richardson (2003) M. Banerjee and T. Richardson. On a dualization of graphical Gaussian models: A correction note. Scandinavian Journal of Statistics, 30(4):817–820, 2003.
• Colombo et al. (2012) D. Colombo, M. H. Maathuis, M. Kalisch, and T. S. Richardson. Learning high-dimensional directed acyclic graphs with latent and selection variables. The Annals of Statistics, 40(1):294–321, 2012.
• Cowell et al. (1999) R. Cowell, A. P. Dawid, S. Lauritzen, and D. J. Spiegelhalter. Probabilistic networks and expert systems. Statistics for Engineering and Information Science. Springer-Verlag, 1999.
• Cox and Wermuth (1993) D. R. Cox and N. Wermuth. Linear dependencies represented by chain graphs. Statistical Science, 8(3):204–218, 1993.
• Cox and Wermuth (1996) D. R. Cox and N. Wermuth. Multivariate Dependencies-Models, Analysis and Interpretation. Chapman and Hall, 1996.
• Drton (2009) M. Drton. Discrete chain graph models. Bernoulli, 15(3):736–753, 2009.
• Drton and Maathuis (2017) M. Drton and M. H. Maathuis. Structure learning in graphical modeling. Annual Review of Statistics and Its Application, 4:365–393, 2017.
• Evans (2016) R. Evans. Graphs for margins of Bayesian networks. Scandinavian Journal of Statistics, 43(3):625–648, 2016.
• Evans and Richardson (2014) R. Evans and T. S. Richardson. Markovian acyclic directed mixed graphs for discrete data. The Annals of Statistics, 42(4):1452–1482, 2014.
• Frydenberg (1990) M. Frydenberg. The chain graph Markov property. Scandinavian Journal of Statistics, 17(4):333–353, 1990.
• Javidian and Valtorta (2019) M. A. Javidian and M. Valtorta. Structural learning of multivariate regression chain graphs via decomposition. https://arxiv.org/abs/1806.00882, 2019.
• Kauermann (1996) G. Kauermann. On a dualization of graphical Gaussian models. Scandinavian Journal of Statistics, 23(1):105–116, 1996.
• Lauritzen (1996) S. Lauritzen. Graphical Models. Oxford Science Publications, 1996.
• Lauritzen and Richardson (2002) S. Lauritzen and T. Richardson. Chain graph models and their causal interpretations. Journal of the Royal Statistical Society. Series B, Statistical Methodology, 64(3):321–348, 2002.
• Lauritzen and Wermuth (1989) S. Lauritzen and N. Wermuth. Graphical models for associations between variables, some of which are qualitative and some quantitative. The Annals of Statistics, 17(1):31–57, 1989.
• Lauritzen et al. (1990) S. Lauritzen, A. P. Dawid, B. N. Larsen, and H.-G. Leimer. Independence properties of directed Markov fields. Networks, 20(5):491–505, 1990.
• Ma et al. (2008) Z. Ma, X. Xie, and Z. Geng. Structural learning of chain graphs via decomposition. Journal of Machine Learning Research, 9:2847–2880, 2008.
• Marchetti and Lupparelli (2008) G. Marchetti and M. Lupparelli. Parameterization and fitting of a class of discrete graphical models. COMPSTAT: Proceedings in Computational Statistics. P. Brito. Heidelberg, Physica-Verlag HD, pages 117–128, 2008.
• Marchetti and Lupparelli (2011) G. Marchetti and M. Lupparelli. Chain graph models of multivariate regression type for categorical data. Bernoulli, 17(3):827–844, 2011.
• Pearl (2009) J. Pearl. Causality. Models, reasoning, and inference. Cambridge University Press, 2009.
• Pearl and Paz (1987) J. Pearl and A. Paz. Graphoids: a graph based logic for reasoning about relevancy relations. Advances in Artificial Intelligence II Boulay, BD, Hogg, D & Steel, L (eds), North Holland, Amsterdam, pages 357–363, 1987.
• Peña (2014) J. M. Peña. Learning marginal AMP chain graphs under faithfulness. European Workshop on Probabilistic Graphical Models PGM: Probabilistic Graphical Models, pages 382–395, 2014.
• Peña (2015) J. M. Peña. Every LWF and AMP chain graph originates from a set of causal models. Symbolic and quantitative approaches to reasoning with uncertainty, Lecture Notes in Comput. Sci., 9161, Lecture Notes in Artificial Intelligence, Springer, Cham, pages 325–334, 2015.
• Peña (2016) J. M. Peña. Learning acyclic directed mixed graphs from observations and interventions. Proceedings of the Eighth International Conference on Probabilistic Graphical Models, PMLR, 52:392–402, 2016.
• Peña (2018) J. M. Peña. Reasoning with alternative acyclic directed mixed graphs. Behaviormetrika, pages 1–34, 2018.
• Peña et al. (2014) J. M. Peña, D. Sonntag, and J. Nielsen. An inclusion optimal algorithm for chain graph structure learning. In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, pages 778–786, 2014.
• Richardson (1998) T. S. Richardson. Chain graphs and symmetric associations. In: Jordan M.I. (eds) Learning in Graphical Models. NATO ASI Series (Series D: Behavioural and Social Sciences), vol 89, pages 229–259, 1998.
• Richardson (2003) T. S. Richardson. Markov properties for acyclic directed mixed graphs. Scandinavian Journal of Statistics, 30(1):145–157, 2003.
• Richardson and Spirtes (2002) T. S. Richardson and P. Spirtes. Ancestral graph Markov models. The Annals of Statistics, 30(4):962–1030, 2002.
• Roverato (2017) A. Roverato. Graphical Models for Categorical Data. Cambridge University Press, 2017.
• Sadeghi and Lauritzen (2014) K. Sadeghi and S. Lauritzen. Markov properties for mixed graphs. Bernoulli, 20(2):676–696, 2014.
• Sadeghi and Wermuth (2016) K. Sadeghi and N. Wermuth. Pairwise Markov properties for regression graphs. Stat, 5:286–294, 2016.
• Sonntag (2014) D. Sonntag. A Study of Chain Graph Interpretations (Licentiate dissertation). Linköping University, 2014. https://doi.org/10.3384/lic.diva-105024.
• Sonntag and Peña (2012) D. Sonntag and J. M. Peña. Learning multivariate regression chain graphs under faithfulness. Proceedings of the 6th European Workshop on Probabilistic Graphical Models, pages 299–306, 2012.
• Spirtes et al. (2000) P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction and Search, second ed. MIT Press, Cambridge, MA., 2000.
• Studený (1989) M. Studený. Multiinformation and the problem of characterization of conditional independence relations. Problems of Control and Information Theory, 18:3–16, 1989.
• Studený (1997) M. Studený. A recovery algorithm for chain graphs. International Journal of Approximate Reasoning, 17:265–293, 1997.
• Studený (2005) M. Studený. Probabilistic Conditional Independence Structures. Springer-Verlag London, 2005.
• Wermuth and Cox (2004) N. Wermuth and D. R. Cox. Joint response graphs and separation induced by triangular systems. Journal of the Royal Statistical Society. Series B, Statistical Methodology, 66(3):687–717, 2004.
• Wermuth and Sadeghi (2012) N. Wermuth and K. Sadeghi. Sequences of regressions and their independences. Test, 21:215–252, 2012.