 # On Functions of Markov Random Fields

We derive two sufficient conditions for a function of a Markov random field (MRF) on a given graph to be a MRF on the same graph. The first condition is information-theoretic and parallels a recent information-theoretic characterization of lumpability of Markov chains. The second condition, which is easier to check, is based on the potential functions of the corresponding Gibbs field. We illustrate our sufficient conditions by means of several examples and discuss implications for practical applications of MRFs. As a side result, we give a partial characterization of functions of MRFs that are information-preserving.


## I Introduction

Since the late 1950s, researchers have actively investigated properties of functions of Markov chains. In particular, considerable effort has been devoted to obtaining sufficient and necessary conditions for lumpability, the rare scenario in which a function of a Markov chain has the Markov property [1, 2, 3].

In this work, we extend the concept of lumpability and its investigation to Markov random fields (MRFs). Specifically, given a MRF on a graph , we determine conditions for a set of functions such that the transformation is a MRF on a graph that is a subgraph of .

Aside from being an interesting problem in its own right, our endeavor is also practically motivated from the perspective of inference. Multidimensional data is often modeled as a hidden MRF, i.e., the data is hidden and can be inferred from some observed random variable , where each is conditionally independent of given .

In some scenarios, however, not is of interest but its transformation . For example, in image processing, in which is a graph on a lattice with a distance-based neighborhood structure and in which and denote the true and observed pixel values, respectively, one may be interested in subsampling the image, clustering regions of the image, or quantizing pixel values for the sake of identifying regions with similar intensities. Transforming to potentially creates additional or breaks existing dependencies, i.e., the graph w.r.t. which is a MRF is generally different from . Rather than inferring from the observed and subsequently computing via the known transformations, in this work, we are interested in scenarios where is directly inferred from . This is computationally tractable if turns out to be a hidden MRF itself. Among other things, this requires determining the graph w.r.t. which is a MRF.

The remainder of this paper can be summarized as follows. Section II introduces notation and basic definitions, and Section III formulates the problem and provides some examples. Section IV places the current work in context with previous results on stochastic transformations of MRFs [4, Sec. IV] and subfields of MRFs [5, 6, 4]. Section V gives two sufficient conditions for to be a MRF on the same graph as , i.e., for . The first condition is based on the characterization of MRFs via clique potentials, while the second is information-theoretic and resembles the information-theoretic characterization of Markov chain lumpability [3, Th. 2]. As a side result, Section VI presents necessary and sufficient conditions for the transformation to have the same information content as . For the sake of readability, proofs are deferred to Section VII.

## II Notation and Preliminaries

Let be an undirected graph with vertices and edges , where is the set of two-element subsets of . We call complete if , chordal if every induced cycle of has length three, a tree if is connected and acyclic, and a path if there is a permutation of the vertices such that . If , then the vertices and are neighbors, and we use to denote the neighbors of , i.e.,

$$N_i := \{j \in V \setminus \{i\} : \{i,j\} \in E\}. \tag{1}$$

A set is called a clique if it is a singleton or if . We use to denote the set of cliques of .

We denote random variables (RVs) by upper case letters, e.g., , alphabets by calligraphic letters, e.g., , and realizations by lower case letters, e.g., . We assume that all our RVs are defined on a common probability space . Specifically, let be a discrete RV with alphabet that is associated with vertex . For a set , we write and . We furthermore use the abbreviations and , and similarly for the alphabets of these RVs. The RV is characterized by its probability mass function (PMF)

$$p_{X_A}(x_A) := \mathbb{P}(\{\omega \in \Omega : X_A(\omega) = x_A\}), \quad \forall x_A \in \mathcal{X}_A. \tag{2}$$

###### Definition 1.

Let be a graph and be a RV with PMF , then is a Markov random field (MRF) on , abbreviated is a -MRF, if

$$\forall i \in V: \quad p_{X_i \mid X_{\not i}} = p_{X_i \mid X_{N_i}}, \tag{3}$$

i.e., if the distribution of depends on the other RVs only via the RVs neighboring . If is unspecified, but known to belong to a family of distributions for which (3) holds for every member, then we say that is a -MRF.

For any , the entropy of is defined as

$$H(X_A) := -\sum_{x_A \in \mathcal{X}_A} p_{X_A}(x_A) \log p_{X_A}(x_A) \tag{4}$$

and the conditional entropy of given as . With this notation, the lemma below follows immediately from Definition 1.

###### Lemma 1.

is a -MRF if and only if (iff), for every ,

Note that if is a -MRF, then it is a MRF on every graph with vertices whose edge set is a superset of . Trivially, every is a MRF on the complete graph. Of particular interest is thus the minimal graph w.r.t. which is a MRF. We will assume throughout this paper that the graph w.r.t. which is a MRF is minimal.

## III Problem Statement and Motivating Examples

In this work, we consider functions of MRFs. Specifically, let (subsequently abbreviated as to simplify notation) be a set of functions indexed by the vertices , and let . For , we define the function as the functions , , applied to coordinate-wise, i.e., , and, as before, use the abbreviation . We call a set of functions non-trivial if at least one function is non-injective. Given a -MRF and a set of functions , we call the tuple the lumping of . We will focus on the following two problems:

###### Problem 1 (Lumpability).

Determine conditions on the lumping so that is a MRF w.r.t. , where in this case we say is lumpable, see Fig. 1. By the remark below Lemma 1, is lumpable whenever it does not introduce new edges, i.e., whenever is a -MRF with and

###### Problem 2 (Information Preservation).

Determine conditions on the lumping so that , where in this case we say is information-preserving.

Throughout this work we assume the set of functions is non-trivial. Otherwise, if all the functions are injective, then and would have the same distribution, aside from a relabeling of its domain, and so the lumping would be trivially lumpable and information preserving. We also assume that is connected. This is w.l.o.g., since the RVs of different components of the graph are independent, and this independence is retained for any set of functions .

To get some intuition on why a function of a MRF may not be a MRF on the same graph, note that and are conditionally independent given only when contains all the information about that is available in . Taking a function of may reduce this information to a point where no longer contains all the information about that is available in , which effectively introduces edges in the minimal graph for that have not been present in . This parallels the fact that a function of a Markov chain rarely results in a Markov chain [1, Th. 31]. (A Markov chain is a -MRF where is the infinite path graph, i.e., with the natural numbers as the set of vertices and as the set of edges.) Regarding information-preservation, a lumping is information preserving iff maps the support of injectively. Thus, both lumpability and information-preservation appear to be the exception rather than the rule. The following examples demonstrate different lumpability and information-preservation scenarios and give some intuition on the corresponding lumpings .

###### Example 1 (Neither Information-Preserving nor Lumpable).

Let be a Markov path, i.e., a -MRF on the path graph , where each RV takes values from . Suppose that , , and . For all other configurations, assume and are positive. Let for every , then one can verify that , while . Thus, and are not conditionally independent given , and so the minimal graph for contains the new edge , i.e., the lumping is not lumpable. (In this example the minimal graph for is the complete graph, see Fig. 1.) Furthermore, since, e.g., and both have positive probabilities, but are mapped to the same , the lumping is not information-preserving.
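The mechanism behind Example 1 can be explored numerically. The following is a minimal sketch, not the PMF of Example 1 (whose values are not reproduced here): it checks the local Markov property (3) by brute force on a hypothetical three-variable chain and on its lumping. The dictionary representation of a PMF and the helper names `marginal`, `is_mrf`, and `lump` are assumptions made for illustration.

```python
from collections import defaultdict


def marginal(pmf, idx):
    """Marginalize a joint PMF (dict: tuple -> probability) onto coordinates idx."""
    out = defaultdict(float)
    for x, p in pmf.items():
        out[tuple(x[i] for i in idx)] += p
    return out


def is_mrf(pmf, neighbors, tol=1e-9):
    """Brute-force check of (3): p(x_i | x_rest) = p(x_i | x_{N_i}) on the support."""
    n = len(neighbors)
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        nb = sorted(neighbors[i])
        p_rest = marginal(pmf, rest)
        p_nb = marginal(pmf, nb)
        p_i_nb = marginal(pmf, [i] + nb)
        for x, p in pmf.items():
            if p <= 0:
                continue  # agreement on the support implies agreement elsewhere
            lhs = p / p_rest[tuple(x[j] for j in rest)]
            rhs = p_i_nb[tuple(x[j] for j in [i] + nb)] / p_nb[tuple(x[j] for j in nb)]
            if abs(lhs - rhs) > tol:
                return False
    return True


def lump(pmf, g):
    """PMF of Y = g(X), where g is a list of per-vertex functions."""
    out = defaultdict(float)
    for x, p in pmf.items():
        out[tuple(gi(xi) for gi, xi in zip(g, x))] += p
    return dict(out)


# Hypothetical chain X0 - X1 - X2: X1 is uniform on {0, 1, 2} and
# X0 = X2 = 1 exactly when X1 = 1 (so the PMF is not strictly positive).
path = {0: [1], 1: [0, 2], 2: [1]}
pmf_x = {(0, 0, 0): 1 / 3, (1, 1, 1): 1 / 3, (0, 2, 0): 1 / 3}

# Lump the middle vertex by merging the symbols 1 and 2.
g = [lambda v: v, lambda v: min(v, 1), lambda v: v]
pmf_y = lump(pmf_x, g)

x_is_mrf = is_mrf(pmf_x, path)  # X satisfies (3) on the path
y_is_mrf = is_mrf(pmf_y, path)  # Y does not: Y0 and Y2 stay coupled given Y1 = 1
```

In this toy instance, merging two symbols at the middle vertex hides exactly the information that made the end vertices conditionally independent, so the lumped field needs the additional edge between vertices 0 and 2.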

###### Example 2 (Information-Preserving but not Lumpable).

Let and , where , , and are mutually independent RVs. It follows that is a Markov path as in the previous example with edges . Assume and are the identity functions and . Since is constant, and are conditionally independent given iff and are independent, which is not true due to the coupling through . (Assuming is strictly positive.) Hence, the lumping is not lumpable since the minimal graph for must contain the edge , which is not in . (Indeed, is a MRF w.r.t. the graph .) Furthermore, one can show that iff and that iff , hence contains the same information as , i.e., the lumping is information-preserving.

###### Example 3 (Lumpable and Information-Preserving).

Let , where , , and are mutually independent. Then, we have the Markov path again, where the PMF satisfies

$$p_X(x_1,(z_1,z_2,z_3),x_3) = \begin{cases} p_{X_1}(x_1)\, p_{Z_2}(z_2)\, p_{X_3}(x_3), & x_1 = z_1,\ x_3 = z_3 \\ 0, & \text{else.} \end{cases} \tag{5}$$

Now suppose that and are the identity mappings and that is such that . Obviously, the thus defined RVs , , and are independent, i.e., is a MRF on the empty graph, and so is lumpable. Furthermore, it is clear that , and so the lumping is information-preserving.

## IV Previous Work on MRFs

Yeung et al. characterized MRFs using the -measure [5, 6]. Specifically, if is a -MRF and , they investigated the minimal graph on which is a MRF. They showed that contains if either or if there is a path between and in of which all intermediate vertices lie in , see [5, Th. 5] or [6, Th. 8]. More generally, Sadeghi  characterized probabilistic graphical models, admitting mixed graphs with directed, doubly-directed, and undirected edges, and presented an algorithm that generates a corresponding graph for a subset of the vertices of , cf. [7, Algorithm 1]. With the restriction to undirected graphs, this algorithm terminates with as discussed in .

Much earlier, Pérez and Heitz investigated this problem from a Gibbs field perspective, i.e., using potential functions. They showed that is a -MRF [4, Th. 2], but that is only minimal if additional conditions are fulfilled [4, Th. 3]. The authors applied this to decimating lattices and to restrictions on tree-like graphs (e.g., they chose corresponding to a hierarchy level of ). Moreover, they investigated coarsening using the renormalization group approach.

Below we clarify some connections between previous works and the current one. Given a MRF w.r.t. a graph and a transformation of to , assume the joint RV is a MRF on a graph . (According to the problem formulation in Problem 1, such a graph is not needed in the current paper and only assumed in this paragraph to facilitate discussions relative to previous works.) The vertex set of this graph is the disjoint union of the vertices of and a set of vertices associated with , and the edge set is obtained from the edges of and the transformation . Determining on which graph the RV is a MRF can then be done by applying [5, Th. 5] or [4, Th. 2 & 3] to for the subset of vertices that are associated with . With this setup, the primary distinctions between previous works and the current one are the following: [5, 6] make no assumptions on ,  assumes is strictly positive, and this work assumes

$$p_{Y \mid X}(y \mid x) = \prod_{i \in V} \mathbb{I}[g_i(x_i) = y_i], \tag{6}$$

where is the indicator function, i.e., factors as the product of degenerate distributions that account for the fact that is a deterministic function of .

Unfortunately, Problem 1 cannot be solved with the framework in  since the conditional distribution (6) is not strictly positive, nor can it be solved using [5, 6] since the framework therein finds a graph that is minimal for any (in fact for any ) and any in the family of distributions specified by . In contrast, here we are given a fixed set of transformations and (often) a fixed distribution . Indeed, if is connected, then [5, Th. 5] leads to being complete. In other words, for any MRF on a connected graph, [5, Th. 5] states that one can find a PMF and a set of functions (more precisely as the theorem does not assume deterministic mappings) such that does not satisfy any conditional independence statements. (See Example 1 for an explicit choice of and in the case of the Markov path .)

Little work has been done regarding information-preserving lumpings of a MRF, see Problem 2. A work in a related direction is , which shows that under certain conditions the entropy can be bounded from above by the entropy of a MRF w.r.t. the subgraph of induced by the vertex subset .

## V Sufficient Conditions for MRF Lumpability

Below we investigate Problem 1, namely, we determine sufficient conditions for the lumping to be lumpable. Note that according to Problem 1, is lumpable if is a -MRF, even if is not minimal for . We further assume within this section that for every , which allows the characterization of a MRF via its connection to Gibbs fields. (Despite this assumption, the joint distribution is not strictly positive, see (6).) Specifically, let be a potential function. We abuse notation and extend the domain of to , i.e., for we write , where . The following lemma gives the characterization required in this section.

###### Lemma 2 (Hammersley-Clifford ).

is a -MRF satisfying for every iff there exists a family of potential functions such that

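$$\forall x \in \mathcal{X}: \quad p_X(x) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(x) \tag{7a}$$

where

$$Z = \sum_{x \in \mathcal{X}} \prod_{C \in \mathcal{C}} \psi_C(x). \tag{7b}$$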

Since the potential functions in the family are defined on cliques, we call a clique potential. Note that the choice of is not unique. Indeed, Lemma 2 may be satisfied with a subset of potential functions being identically one.
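To make (7a) and (7b) concrete, the sketch below builds a Gibbs PMF from clique potentials on a three-vertex path with binary alphabets. The function name `gibbs_pmf`, the potential `agree`, and the encoding of cliques as tuples of vertex indices are hypothetical choices for illustration. The second call adds a singleton potential that is identically one, which leaves the PMF unchanged and so illustrates the non-uniqueness noted above.

```python
from itertools import product


def gibbs_pmf(alphabets, potentials):
    """Return (pmf, Z) with p(x) = (1/Z) * prod_C psi_C(x_C), cf. (7a)-(7b).

    alphabets:  list of per-vertex alphabets
    potentials: dict mapping a clique (tuple of vertex indices) to a function
                of the coordinates belonging to that clique
    """
    weights = {}
    for x in product(*alphabets):
        w = 1.0
        for clique, psi in potentials.items():
            w *= psi(*(x[i] for i in clique))
        weights[x] = w
    Z = sum(weights.values())  # partition function (7b)
    return {x: w / Z for x, w in weights.items()}, Z


def agree(a, b):
    """Edge potential favoring equal neighboring values (hypothetical choice)."""
    return 2.0 if a == b else 1.0


alphabets = [(0, 1)] * 3
potentials = {(0, 1): agree, (1, 2): agree}
pmf, Z = gibbs_pmf(alphabets, potentials)

# Same PMF, different family: add a singleton potential identically one.
potentials_alt = dict(potentials)
potentials_alt[(1,)] = lambda b: 1.0
pmf_alt, Z_alt = gibbs_pmf(alphabets, potentials_alt)
```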

For a non-trivial set of functions , is a -MRF iff we can find a family of potential functions such that, for every

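$$Z \cdot p_Y(y) = Z \cdot \sum_{x \in g^{-1}(y)} p_X(x) = \sum_{x \in g^{-1}(y)} \prod_{C \in \mathcal{C}} \psi_C(x) = \prod_{C \in \mathcal{C}} U_C(y) \tag{8}$$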

where is the partition function from (7b). Such a family can obviously be found if, for all , the family is constant on the preimage . Specifically, if for every and for every we have

$$\psi_C(x) = \psi_C(x'), \quad \forall x, x' \in g^{-1}(y), \tag{9}$$

then we can define as this common value, multiplied with a constant that depends on , and thus obtain a family of potential functions which ensures that is a -MRF. The remainder of this section will give milder conditions than (9) that guarantee lumpability.
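The factorization (8) under condition (9) can be checked numerically on a small instance. In the sketch below, which rests on hypothetical choices throughout, every vertex of a three-vertex path has alphabet {0, 1, 2, 3}, each function maps a symbol `v` to `v // 2`, and each edge potential depends on its arguments only through these images, so it is constant on preimages. The lumped PMF should then coincide with the Gibbs PMF built directly from the potentials on the lumped alphabet, since the preimage sizes only contribute a constant that is absorbed by normalization.

```python
from collections import defaultdict
from itertools import product


def gibbs_pmf(alphabets, potentials):
    """Normalized product of clique potentials, cf. (7a)-(7b)."""
    weights = {}
    for x in product(*alphabets):
        w = 1.0
        for clique, psi in potentials.items():
            w *= psi(*(x[i] for i in clique))
        weights[x] = w
    Z = sum(weights.values())
    return {x: w / Z for x, w in weights.items()}


def lump(pmf, g):
    """PMF of Y = g(X), where g is a list of per-vertex functions."""
    out = defaultdict(float)
    for x, p in pmf.items():
        out[tuple(gi(xi) for gi, xi in zip(g, x))] += p
    return dict(out)


def phi(a, b):
    """Potential on the lumped alphabet {0, 1} (hypothetical choice)."""
    return 2.0 if a == b else 1.0


def g_i(v):
    """Lumping function: {0, 1} -> 0 and {2, 3} -> 1."""
    return v // 2


edges = [(0, 1), (1, 2)]

# Potentials of X that satisfy (9): they see x only through g(x).
psi = {e: (lambda a, b: phi(g_i(a), g_i(b))) for e in edges}
pmf_x = gibbs_pmf([range(4)] * 3, psi)

# Lumped PMF versus the Gibbs PMF defined directly on the lumped alphabet.
pmf_y = lump(pmf_x, [g_i] * 3)
pmf_u = gibbs_pmf([range(2)] * 3, {e: phi for e in edges})
max_gap = max(abs(pmf_y[y] - pmf_u[y]) for y in pmf_u)
```

A negligible `max_gap` indicates that, for this instance, the lumped field is a Gibbs field with potentials on the same cliques.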

For any clique that contains vertex , we say depends on only via if for all and

$$\psi_C(x_{\not i}, x_i) = \psi_C(x_{\not i}, x_i'), \quad \forall x_{\not i} \in \mathcal{X}_{\not i}, \tag{10}$$

otherwise, we say strictly depends on . The following result will assume that for every vertex there is at most one clique potential that is allowed to strictly depend on . For all , let denote such a clique. (If no potential function strictly depends on then is chosen as any clique involving .) We can view this as a mapping that assigns to each vertex the unique clique that may strictly depend on , which in effect partitions into equivalence classes , , such that all the vertices are assigned the same clique . For convenience, the clique , common to all , will be denoted .

###### Proposition 1.

Assume is a -MRF characterized by a family of potential functions such that, for all , there is at most one clique whose potential may strictly depend on , then is a -MRF.

Moreover, with and as above, the -MRF is characterized by the family of potential functions, where

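$$U_{C'(V_\ell)}(g(x)) = \sum_{x'_{V_\ell} \in g_{V_\ell}^{-1}(g_{V_\ell}(x_{V_\ell}))} \psi_{C'(V_\ell)}(x'_{V_\ell}, x_{V \setminus V_\ell}) \tag{11a}$$

for $\ell = 1, \dots, L$, and

$$U_C(g(x)) = \psi_C(x), \quad \forall C \in \mathcal{C} \setminus \bigcup_{j \in V} C'(j). \tag{11b}$$

###### Corollary 1.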

If (9) holds, then Proposition 1 is trivially fulfilled. In this case, is any clique of which is a member and (11a) simplifies to

$$U_{C'(V_\ell)}(g(x)) = \left| g_{V_\ell}^{-1}(g_{V_\ell}(x_{V_\ell})) \right| \cdot \psi_{C'(V_\ell)}(x). \tag{12}$$

Since, even for a fixed joint PMF , the family of potential functions is not unique, is a -MRF if we can find at least one family of potential functions that characterizes and for which Proposition 1 holds.

###### Example 4.

Let be a Markov path and fix a set of functions . Suppose that , for , are arbitrary, and and for some and . Thus, only may strictly depend on , and so Proposition 1 applies. Now, the same PMF can be characterized using the potentials , , , , and . Assuming strictly depends on , then both and strictly depend on , and so the condition in Proposition 1 is violated.

Proposition 1 restricts the number of cliques whose potential functions strictly depend on ; it does not restrict the number of components of on which the potential function of a given clique may strictly depend. The following proposition characterizes the scenario in which every clique potential strictly depends on at most one element of .

###### Proposition 2.

Let be a -MRF as in Proposition 1 with for every pair of distinct vertices , then

$$H(Y_i \mid Y_{N_i}) = H(Y_i \mid X_{N_i}), \quad \forall i \in V. \tag{13}$$

There is some similarity between (13) and an information-theoretic sufficient condition for the lumpability of an irreducible and aperiodic Markov chain (see  for terminology). Suppose that is stationary, i.e., the alphabets of are all the same, for every , and the initial distribution coincides with the unique distribution invariant under the one-step conditional distribution . If further all the functions are identical, i.e., , , for some function , then one can show that the tuple is lumpable if [3, Th. 2]

$$H(Y_i \mid X_{i-1}) = H(Y_i \mid Y_{i-1}). \tag{14}$$

(By stationarity, it suffices that (14) holds for any .) Indeed, the main difference between (13) and (14) is that the latter is conditioned on only a subset of the neighbors, which corresponds to the case in which is directed, i.e., for . The following proposition shows that, for undirected graphs, (13) takes the place of (14) in a sufficient condition for lumpability.

###### Proposition 3.

Let be a -MRF. If, for every ,

$$H(Y_i \mid Y_{N_i}) = H(Y_i \mid X_{N_i}) \tag{15}$$

then is a -MRF.

Equation (15) gives an intuitive interpretation for lumpability of MRFs: If (but not only if, see Example 5 below) the neighbors of are not more informative about the outcome of than the function of these neighbors, then is a -MRF. In other words, is a -MRF if the lumping is such that captures all information in that is relevant to .

###### Example 5.

Let be a Markov path, i.e., and . Trivially, since is the complete graph, is a -MRF for every set of functions . However, one can construct examples for and such that there exists and a pair such that

$$p_{Y_2 \mid X_1}(y_2 \mid x_1) \neq p_{Y_2 \mid X_1}(y_2 \mid x_1'). \tag{16}$$

Thus, the condition of Proposition 3 does not hold, showing that it is only sufficient but not necessary.
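Condition (15) is straightforward to evaluate for small alphabets. The sketch below is illustrative only: the helper names `entropy`, `cond_entropy`, and `satisfies_15`, as well as both PMFs, are assumptions and do not come from the examples in this paper. The first instance is a three-vertex path whose PMF depends on each symbol only through its lumped value, so (15) holds. The second is a two-vertex path in the spirit of Example 5, where the lumped field is trivially a MRF on the complete graph although (15) fails.

```python
from collections import defaultdict
from itertools import product
from math import log2


def entropy(pmf):
    """Shannon entropy in bits of a PMF given as dict: outcome -> probability."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)


def cond_entropy(pmf, g, i, nb, lumped):
    """H(Y_i | Y_nb) if lumped, else H(Y_i | X_nb), where Y_j = g[j](X_j)."""
    joint, cond = defaultdict(float), defaultdict(float)
    for x, p in pmf.items():
        z = tuple(g[j](x[j]) if lumped else x[j] for j in nb)
        joint[(g[i](x[i]), z)] += p
        cond[z] += p
    return entropy(joint) - entropy(cond)  # H(A | B) = H(A, B) - H(B)


def satisfies_15(pmf, g, neighbors, tol=1e-9):
    """Check H(Y_i | Y_{N_i}) = H(Y_i | X_{N_i}) for every vertex i."""
    return all(
        abs(cond_entropy(pmf, g, i, nb, True) - cond_entropy(pmf, g, i, nb, False)) <= tol
        for i, nb in neighbors.items()
    )


# Instance A: path 0 - 1 - 2, alphabet {0, 1, 2, 3}, weights depend on v // 2 only.
def half(v):
    return v // 2


def phi(a, b):
    return 2.0 if a == b else 1.0


weights = {x: phi(half(x[0]), half(x[1])) * phi(half(x[1]), half(x[2]))
           for x in product(range(4), repeat=3)}
total = sum(weights.values())
pmf_a = {x: w / total for x, w in weights.items()}
holds_a = satisfies_15(pmf_a, [half] * 3, {0: [1], 1: [0, 2], 2: [1]})

# Instance B: two-vertex path; X0 uniform on {0, 1, 2}, X1 copies X0 w.p. 0.8.
pmf_b = {(a, b): (1 / 3) * (0.8 if a == b else 0.1)
         for a in range(3) for b in range(3)}
g_b = [lambda v: min(v, 1), lambda v: v]  # merge symbols 1 and 2 at vertex 0
holds_b = satisfies_15(pmf_b, g_b, {0: [1], 1: [0]})
```

In instance B the unlumped neighbor is strictly more informative about the second vertex than its lumped version, which is exactly the situation described by (16).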

## VI Information-Preserving MRF Lumpings

We next briefly talk about information-preserving lumpings of MRFs, see Problem 2. A lumping can only be information-preserving if maps the support of injectively. If the support of coincides with , then only trivial sets of functions , in which every is injective, can be information-preserving. In this section, we therefore drop the assumption that is positive on . However, while it is clear that iff is injective on the support of , this does not imply that any is injective on the support of . In other words, a lumping can be information-preserving even if some or all of the functions are non-injective, i.e., even if for some .
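The distinction between injectivity of the individual functions and injectivity on the support can be illustrated with a small sketch. The PMFs, the functions, and the helper names `entropy`, `lump`, and `injective_on_support` below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import log2


def entropy(pmf):
    """Shannon entropy in bits of a PMF given as dict: outcome -> probability."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)


def lump(pmf, g):
    """PMF of Y = g(X), where g is a list of per-vertex functions."""
    out = defaultdict(float)
    for x, p in pmf.items():
        out[tuple(gi(xi) for gi, xi in zip(g, x))] += p
    return dict(out)


def injective_on_support(pmf, g):
    """True iff no two configurations with positive probability share an image."""
    seen = {}
    for x, p in pmf.items():
        if p <= 0:
            continue  # configurations outside the support do not matter
        y = tuple(gi(xi) for gi, xi in zip(g, x))
        if seen.setdefault(y, x) != x:
            return False
    return True


# Two perfectly coupled bits; the configuration (0, 1) has probability zero.
pmf_x = {(0, 0): 0.5, (1, 1): 0.5, (0, 1): 0.0}

# The second function is constant, hence non-injective, yet nothing is lost.
g_good = [lambda a: a, lambda b: 0]
preserving = injective_on_support(pmf_x, g_good)
h_x = entropy(pmf_x)
h_y = entropy(lump(pmf_x, g_good))

# Making both functions constant collapses the support to a single point.
g_bad = [lambda a: 0, lambda b: 0]
preserving_bad = injective_on_support(pmf_x, g_bad)
h_y_bad = entropy(lump(pmf_x, g_bad))
```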

###### Proposition 4.

Let be a -MRF.

• For all graphs , if the lumping is information-preserving, then

$$\forall i \in V: \quad H(X_i \mid Y_i, X_{N_i}) = 0. \tag{17a}$$

• For chordal graphs , the lumping is information-preserving if there exist a vertex permutation and sets such that

$$\forall i \in V: \quad H(X_{v_i} \mid Y_{v_i}, X_{A_{v_i}}) = 0. \tag{17b}$$

###### Example 6.

Let , i.e., is a MRF on a path, which is a chordal graph. Assume that and that is non-injective on the support of . Thus, .

Since and , we have and , i.e., the necessary condition for information preservation (17a) holds. However, we have that due to the non-injectivity of , and so (17b) does not hold for the permutations . A similar argument holds for . Thus, the sufficient condition for chordal graphs (17b) is violated.

###### Remark 1.

Let be an irreducible, aperiodic, and stationary Markov chain. The graph w.r.t. which is a MRF is an (infinite) path, which is chordal. Since (under the choice for all ) and due to stationarity, the sufficient condition in (17b) simplifies to . We thus recover [3, Prop. 4].

While the condition that maps the support of injectively is an equivalent characterization of information-preservation, the conditions in Proposition 4 (that are only necessary or sufficient) have practical justification. Indeed, for alphabets with fixed cardinality, the support of grows exponentially in the number of vertices. In contrast, (17a) requires checking whether maps the support of injectively for every ; the number of parameters characterizing this conditional PMF is exponential only in the size of the neighborhood of , which is much smaller than for sparse graphs. Thus, rather than checking globally, which is exponential in , it suffices to check a computationally less expensive local condition for each .
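The local test behind (17a) can be sketched as follows: for every vertex, the pair consisting of the lumped value and the neighboring values must determine the original value on the support. The helper name `local_condition_17a` and both test PMFs are hypothetical. Since (17a) is only a necessary condition, a positive result does not by itself certify information preservation.

```python
from itertools import product


def local_condition_17a(pmf, g, neighbors):
    """Check H(X_i | Y_i, X_{N_i}) = 0 for all i by testing determinism on the support."""
    for i, nb in neighbors.items():
        seen = {}
        for x, p in pmf.items():
            if p <= 0:
                continue
            key = (g[i](x[i]), tuple(x[j] for j in nb))
            # x_i must be a function of (y_i, x_{N_i}) wherever p(x) > 0
            if seen.setdefault(key, x[i]) != x[i]:
                return False
    return True


path = {0: [1], 1: [0, 2], 2: [1]}


def identity(v):
    return v


def constant(v):
    return 0


# Instance A: three copies of one fair bit; the middle vertex is lumped to a constant.
pmf_a = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
ok_a = local_condition_17a(pmf_a, [identity, constant, identity], path)

# Instance B: three independent fair bits; the first vertex is lumped to a constant.
pmf_b = {x: 1 / 8 for x in product((0, 1), repeat=3)}
ok_b = local_condition_17a(pmf_b, [constant, identity, identity], path)
```

Each vertex only requires a pass over the support with keys built from its neighborhood, which reflects the locality argument made above.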

We finally remark that Proposition 4 holds regardless of whether is a -MRF or not, i.e., whether is lumpable or not. A better understanding of the interactions between lumpability and information-preservation, i.e., between Problems 1 and 2, seems to be of practical and theoretical interest. Thus, a closer investigation of these interactions shall be the subject of future work.

## VII Proofs

### VII-A Proof of Proposition 1

First, note that if for and the clique potential is constant on the preimages under and , then is also constant on the Cartesian product of these preimages. Indeed, if we have that for all , for all , , and all ,

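$$\psi_C(x_i, x_j, x_{V \setminus \{i,j\}}) = \psi_C(x_i', x_j, x_{V \setminus \{i,j\}}) \tag{18a}$$

$$\psi_C(x_i, x_j, x_{V \setminus \{i,j\}}) = \psi_C(x_i, x_j', x_{V \setminus \{i,j\}}) \tag{18b}$$

then we also have

$$\psi_C(x_i, x_j, x_{V \setminus \{i,j\}}) = \psi_C(x_i', x_j, x_{V \setminus \{i,j\}}) = \psi_C(x_i, x_j', x_{V \setminus \{i,j\}}) = \psi_C(x_i', x_j', x_{V \setminus \{i,j\}}). \tag{18c}$$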

We write

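$$Z \cdot p_Y(y) = \sum_{x \in g^{-1}(y)} \prod_{C'(i):\, i \in V} \psi_{C'(i)}(x) \prod_{C \in \mathcal{C} \setminus \bigcup_{j \in V} C'(j)} \psi_C(x) \tag{19}$$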

where the second product is a product over cliques the potentials of which are constant on the preimages under . Furthermore, for the second product we note that, since the clique potential is constant on the preimages of under , we can define a potential function via setting . Thus, we get

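$$\begin{aligned}
Z \cdot p_Y(y) &= \prod_{C \in \mathcal{C} \setminus \bigcup_{j \in V} C'(j)} U_C(y) \times \sum_{x \in g^{-1}(y)} \prod_{C'(i):\, i \in V} \psi_{C'(i)}(x) \\
&= \prod_{C \in \mathcal{C} \setminus \bigcup_{j \in V} C'(j)} U_C(y) \times \sum_{x_{V_1} \in g_{V_1}^{-1}(y_{V_1})} \cdots \sum_{x_{V_L} \in g_{V_L}^{-1}(y_{V_L})} \prod_{C'(i):\, i \in V} \psi_{C'(i)}(x) \\
&= \prod_{C \in \mathcal{C} \setminus \bigcup_{j \in V} C'(j)} U_C(y) \times \prod_{\ell=1}^{L} \sum_{x_{V_\ell} \in g_{V_\ell}^{-1}(y_{V_\ell})} \psi_{C'(V_\ell)}(x_{V_\ell}, x'_{V \setminus V_\ell})
\end{aligned}$$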

where is such that . The last equality follows from the fact that, by assumption, strictly depends on , but on only via . This allows us to define clique potentials via setting

$$U_{C'(V_\ell)}(g(x)) := \sum_{x'_{V_\ell} \in g_{V_\ell}^{-1}(g_{V_\ell}(x_{V_\ell}))} \psi_{C'(V_\ell)}(x'_{V_\ell}, x_{V \setminus V_\ell}). \tag{20}$$

Thus,

 Z⋅pY(y)=∏C