Correct Convergence of Min-Sum Loopy Belief Propagation in a Block Interpolation Problem

This work proves a new result on the correct convergence of Min-Sum Loopy Belief Propagation (LBP) in an interpolation problem on a square grid graph. The focus is on the notion of local solutions, a numerical quantity attached to each site of the graph that can be used for obtaining MAP estimates. The main result is that over an N × N grid graph with a one-run boundary configuration, the local solutions at each i ∈ B can be calculated using Min-Sum LBP by passing difference messages in 2N iterations, which parallels the well-known convergence time in trees.

1 Introduction

An abbreviated version of this paper has been submitted to ISIT 2017. Authors' addresses: Yutong Wang, Matthew G. Reyes, David L. Neuhoff, EECS Dept., University of Michigan; email: {yutongw, mgreyes, neuhoff}@umich.edu.

This paper demonstrates the correct convergence of Loopy Belief Propagation (LBP) in the MAP interpolation of a block of sites given a configuration on its boundary, in the context of a uniform Ising Markov random field. There has been considerable work in analyzing the performance of LBP in the context of maximization problems, for example [1, 2, 3, 4]. This paper presents both a new problem setting for Belief Propagation (BP) and a new method of analysis.

In the context of Markov models, a very natural setting is that of MAP estimation of a subset of sites conditioned on a configuration on its boundary, since the Markov property itself tells us that a subset of sites is conditionally independent of all other sites given the configuration on the subset's boundary. Markov models are often expressed as products of functions on single nodes and edges of the associated Markov graph. Thus, by taking the negative logarithm of the probability, MAP estimation can be formulated as what is referred to as a min-sum problem, that of finding configurations that minimize a sum of functions defined on single nodes and edges of the graph. Belief Propagation is a recursive distributed algorithm that can be applied to a min-sum problem. (Footnote 1: Belief Propagation in the context of MAP estimation is more often studied as a max-product problem, the variant obtained without taking the negative logarithm.)

The Markov model considered in this paper is a uniform Ising model with positive correlation on a square grid graph with edges connecting horizontally and vertically adjacent nodes and with nodes assigned values $+1$ (black) or $-1$ (white) [5]. This is a single-parameter binary model that favors configurations in which neighboring nodes have the same value. Edges on which the two endpoints have different values are called odd bonds. Our problem is to MAP estimate the configuration $x_B$ on a subset $B$ of sites conditioned on a boundary configuration $x_{\partial B}$. In this context, MAP estimation amounts to finding configurations that minimize the sum

$$O(x_B, x_{\partial B}) = \sum_{\{i,j\}:\, i \in B} I(x_j \neq x_i) \tag{1}$$

of odd bonds over all edges in the graph with at least one endpoint in $B$. This problem arose in the context of an image compression application modeling binary images as instances of such an Ising model [6] and also in the context of grayscale image reconstruction [7]. Analytical solutions for the set of MAP configurations conditioned on boundary configurations containing 2 or 4 odd bonds have been found [8, 6]. Such boundaries are termed, respectively, 1-run and 2-run boundaries. The MAP configurations on a block conditioned on its boundary are referred to as global solutions for the boundary.
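As a concrete reading of equation (1), the following minimal sketch counts odd bonds on a small grid. It assumes, purely for illustration, that the whole grid is stored as a list of rows of +1/-1 values, that the interior block B is everything except the outer ring, and that the outer ring is the boundary; the function name `odd_bonds` is hypothetical.

```python
def odd_bonds(x):
    """Count odd bonds over edges with at least one endpoint in the interior B.

    x is a list of rows of +1/-1 values covering the whole grid. Assumption:
    B is every site not on the outer ring, and the outer ring is the boundary.
    """
    rows, cols = len(x), len(x[0])

    def in_B(r, c):
        return 0 < r < rows - 1 and 0 < c < cols - 1

    count = 0
    for r in range(rows):
        for c in range(cols):
            # Visit each undirected edge once: right neighbor and lower neighbor.
            for dr, dc in ((0, 1), (1, 0)):
                r2, c2 = r + dr, c + dc
                if r2 < rows and c2 < cols and (in_B(r, c) or in_B(r2, c2)):
                    count += int(x[r][c] != x[r2][c2])
    return count


# Example: 4 x 4 grid, left boundary column black (+1), everything else white (-1).
grid = [[1, -1, -1, -1] for _ in range(4)]
print(odd_bonds(grid))
```

Edges joining two boundary sites are deliberately skipped, since equation (1) only sums over edges that touch B.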

Min-Sum LBP is a popular distributed message-passing algorithm for minimizing a sum of functions defined on edges of a graph. As a distributed algorithm, it does not attempt to compute global solutions, but rather, for each site $i$, the minimum numbers of odd bonds in configurations where site $i$ is black ($x_i = 1$) or white ($x_i = -1$),

$$O^*_i(\pm 1, x_{\partial B}) := \min_{x_B :\, x_i = \pm 1} O(x_B, x_{\partial B}). \tag{2}$$

These minimum numbers of odd bonds provide some information regarding the set of global solutions. For example, if $O^*_i(-1, x_{\partial B}) < O^*_i(1, x_{\partial B})$, then we can say that site $i$ has value -1 in all global solutions, whereas if $O^*_i(-1, x_{\partial B}) > O^*_i(1, x_{\partial B})$, site $i$ has value 1 in all global solutions. On the other hand, if $O^*_i(-1, x_{\partial B}) = O^*_i(1, x_{\partial B})$, what we can say is that there exists a global solution in which site $i$ has value -1, and there exists a global solution in which site $i$ has value 1. Moreover, as pointed out in [2], when there are multiple sites $i$ such that $O^*_i(-1, x_{\partial B}) = O^*_i(1, x_{\partial B})$, the values on these sites cannot be chosen independently of each other.

In practice, the messages are normalized to prevent numerical overflow. As a result, the goal of BP becomes computing the difference

$$o^*_i(x_{\partial B}) = O^*_i(-1, x_{\partial B}) - O^*_i(1, x_{\partial B}), \tag{3}$$

which we refer to as the local solution at site $i$ given boundary configuration $x_{\partial B}$. At the $n$-th iteration of message-passing, an estimate $\hat{o}^n_i$ of the local solution at site $i$ is produced. If the graph were a tree, i.e., an acyclic graph, the usual argument for the correct convergence of BP on trees could be adapted here to show that $\hat{o}^n_i$ converges to $o^*_i$. For cyclic graphs such as the grid graphs considered in the present paper, general convergence is unknown except in special cases such as when the graph is a single cycle [1]. However, it was observed empirically in [8] that LBP converged to the correct local solutions for a 1-run boundary.
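For very small blocks, the quantities in equations (2) and (3) can be checked by exhaustive search. The sketch below is only such a brute-force check, not the message-passing algorithm studied in this paper. It assumes the grid is stored as a list of rows with the outer ring as boundary; the names `odd_bonds` and `local_solutions` are hypothetical.

```python
from itertools import product


def odd_bonds(x):
    """Equation (1): odd bonds on edges touching the interior (outer ring = boundary)."""
    rows, cols = len(x), len(x[0])
    inside = lambda r, c: 0 < r < rows - 1 and 0 < c < cols - 1
    total = 0
    for r in range(rows):
        for c in range(cols):
            for r2, c2 in ((r, c + 1), (r + 1, c)):
                if r2 < rows and c2 < cols and (inside(r, c) or inside(r2, c2)):
                    total += int(x[r][c] != x[r2][c2])
    return total


def local_solutions(x):
    """Brute-force o*_i = O*_i(-1) - O*_i(+1) for every interior site i.

    Interior entries of x are ignored; only the outer ring is used.
    Exponential in the number of interior sites, so only for tiny grids.
    """
    rows, cols = len(x), len(x[0])
    sites = [(r, c) for r in range(1, rows - 1) for c in range(1, cols - 1)]
    best = {(s, v): float("inf") for s in sites for v in (1, -1)}
    work = [row[:] for row in x]
    for values in product((1, -1), repeat=len(sites)):
        for (r, c), v in zip(sites, values):
            work[r][c] = v
        cost = odd_bonds(work)
        for s, v in zip(sites, values):
            best[(s, v)] = min(best[(s, v)], cost)
    return {s: best[(s, -1)] - best[(s, 1)] for s in sites}


# 4 x 4 grid: left boundary column black (+1), the rest of the boundary white (-1).
grid = [[1, -1, -1, -1] for _ in range(4)]
print(local_solutions(grid))
```

In this example the sites next to the black column get local solution -2 and the sites one step further away get -4, so every interior site is white in all global solutions.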

It is our belief that LBP can be an effective distributed algorithm for the MAP interpolation problem posed here. While a complete understanding of the correct convergence properties of LBP is currently beyond our means, in this paper we prove in Theorem 6 that it correctly converges in the case where the graph is an $N \times N$ grid graph with horizontal and vertical edges and a one-run boundary configuration. Specifically, we show that the local solution at each $i \in B$ can be calculated using Min-Sum Belief Propagation by passing difference messages in $2N$ iterations. We define the Forward and Backward Convergence Properties, which are crucial for our analysis of the convergence. To verify the correctness of the converged results of LBP, we use Proposition 4, which is proven by leveraging the results in [6]. Thus, the results of this paper demonstrate that, at least in the case of one-run boundaries, LBP converges to the correct local solution in what amounts to a minimal number of iterations. We hope our work here gives some theoretical justification for using LBP and local solutions for interpolation problems beyond this setting.

The remainder of this paper is organized as follows. In Section 2, we introduce background on graphs, the boundary interpolation problem, and Belief Propagation. In Section 3, we introduce our message recursion, local solutions, and state what the correct local solutions are. In Section 4, we introduce and discuss the concepts of forward and backward convergence used to prove our results. In Section 5, we present the proof of our main result, Theorem 6.

2 Background and Problem Formulation

We introduce notation on graphs, configurations, etc. Let and . Edges in an undirected graph are written as . In a directed graph, edges are written as . For an undirected graph and a subset , we let . Abusing notation, we often refer to a subset as if it is a subgraph. For instance, the statement “suppose is connected” means “suppose the -induced subgraph on is connected”. Another abuse of notation is .

2.1 Grid graphs, configurations, and odd bonds

In this subsection, we define the setting that we work in for the majority of the paper. Let be the grid graph with the 4-neighbor topology in which the sites are arranged in a square lattice and the edges consist of horizontally and vertically adjacent sites of . Two sites connected by an edge are referred to as neighbors. The interior is the set of sites having four neighbors. The set is the boundary of , i.e., .

For each site $i$, $x_i \in \{\pm 1\}$ is an assignment to site $i$. An assignment to a set $S$ of sites is called a configuration and denoted $x_S$. For concreteness, $x_i = 1$ (resp. $x_i = -1$) means that site $i$ is colored black (resp. white). An edge $\{i,j\}$ with $x_i \neq x_j$ is called an odd bond. A configuration on $\partial B$ is called a boundary configuration. Finally, we define one-run boundaries:

Definition 1 (One-run boundary).

Let be a boundary configuration. Define and define similarly. We say that is a one-run configuration if is a connected subgraph.
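Section 1 describes 1-run and 2-run boundaries as boundary configurations containing 2 and 4 odd bonds, respectively. The sketch below counts sign changes around the boundary ring under that description. It assumes the boundary is the outer ring of a rectangular grid stored as a list of rows, corners included; the helper names `boundary_ring` and `ring_odd_bonds` are hypothetical and are not taken from Definition 1.

```python
def boundary_ring(rows, cols):
    """Outer-ring sites of a rows x cols grid, listed in cyclic (clockwise) order."""
    top = [(0, c) for c in range(cols)]
    right = [(r, cols - 1) for r in range(1, rows)]
    bottom = [(rows - 1, c) for c in range(cols - 2, -1, -1)]
    left = [(r, 0) for r in range(rows - 2, 0, -1)]
    return top + right + bottom + left


def ring_odd_bonds(x):
    """Number of odd bonds between consecutive sites of the boundary ring."""
    ring = boundary_ring(len(x), len(x[0]))
    shifted = ring[1:] + ring[:1]  # pair each ring site with its successor
    return sum(int(x[r][c] != x[r2][c2]) for (r, c), (r2, c2) in zip(ring, shifted))


# Left column black: one black run and one white run, hence 2 odd bonds on the ring.
one_run = [[1, -1, -1, -1] for _ in range(4)]
# Left and right columns black: two runs of each color, hence 4 odd bonds.
two_run = [[1, -1, -1, 1] for _ in range(4)]
print(ring_odd_bonds(one_run), ring_odd_bonds(two_run))
```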

2.2 MAP Estimation and Global Solutions

Given $x_{\partial B}$ and an interior configuration $x_B$, the quantity $O(x_B, x_{\partial B})$ as defined by equation (1) is the number of odd bonds within $B$ and between $B$ and $\partial B$. The global minimum number of odd bonds between an interior configuration and the given boundary configuration is

$$O^*_B(x_{\partial B}) := \min_{x_B} O(x_B, x_{\partial B}).$$

In [8, 6], all MAP solutions were found for all one-run boundary configurations and at least one MAP solution was found for every two-run boundary configuration. Specifically, for boundaries consisting of one run of white and one run of black, the MAP solutions consisted of configurations generated by a shortest path connecting the endpoints of either run. In this work, we refer to a MAP solution as a global solution.

2.3 Belief Propagation

We first review BP for interpolating from the boundary in the context of a tree. For now, let the graph be an arbitrary tree rather than the grid graph currently under consideration. Let $B$ be a subtree and $x_{\partial B}$ a boundary configuration. Consider any two adjacent nodes $i, j$. Removing edge $\{i,j\}$ from the tree disconnects it into two connected components. Let $T_{ij}$ be the connected component containing $j$. Define $M_{j \to i}(-1)$ to be the minimal number of odd bonds in $T_{ij}$ over all possible configurations on $T_{ij}$ plus the odd bond between $i$ and $j$, if any, when $x_i = -1$. More precisely,

$$M_{j \to i}(-1) = \min_{x_{T_{ij}}} \left\{ I(x_j \neq -1) + O(x_{T_{ij}}, x_{\partial B}) \right\}.$$

Recall $O^*_i(\pm 1, x_{\partial B})$ as defined by equation (2). Below, we suppress the dependency on $x_{\partial B}$ and simply write $O^*_i(-1)$. Likewise, we write $O^*_i(1)$. Since the graph is a tree, it is easy to see that $O^*_i(-1)$ can be expressed as

$$O^*_i(-1) = \sum_{j \in \partial i} M_{j \to i}(-1). \tag{4}$$

We define $M_{j \to i}$ to be the message from site $j$ to site $i$. Using a recursive argument, it is straightforward to show

$$M_{j \to i}(x_i) = \min_{x_j \in \{\pm 1\}} \left\{ I(x_i \neq x_j) + \sum_{k \in \partial j \setminus i} M_{k \to j}(x_j) \right\}$$

The above recursion relation induces a message-passing algorithm:

Definition 2.

Given a boundary configuration , for each edge such that or define

Boundary condition: for all , if ,

Initialization: if ,

Update: If and , then

$$M^n_{j \to i}(x_i) := \min_{x_j \in \{\pm 1\}} \left\{ I(x_i \neq x_j) + \sum_{k \in \partial j \setminus i} M^{n-1}_{k \to j}(x_j) \right\}$$

Since the graph considered in this subsection is acyclic, this algorithm is referred to as Belief Propagation (BP), or more specifically, as Min-Sum BP. After a number of iterations equal to the length of the longest path in the tree, equation (4) permits computation of $O^*_i(\pm 1)$ for each site $i$ in $B$. For cyclic graphs, such as the graph considered in this paper as defined in Section 2.1, the algorithm above can still be used and is referred to as Loopy Belief Propagation.
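The recursion and equation (4) can be illustrated on the simplest tree, a path. The sketch below is a minimal example under stated assumptions: the block is a chain of interior sites with one boundary site at each end, and a boundary site $j$ sends the message $I(x_i \neq x_j)$. The function name `path_min_sum` and this choice of boundary message are illustrative and are not taken from Definition 2.

```python
def path_min_sum(x_left, x_right, length):
    """Min-Sum BP on a path of `length` interior sites between two boundary sites.

    Returns, for each interior site, a dict {value: minimal number of odd bonds
    over configurations in which that site takes that value}, obtained as the
    sum of the two incoming messages as in equation (4).
    """
    values = (1, -1)

    def step(msg):
        # One application of the recursion:
        # M(x_i) = min over x_j of I(x_i != x_j) + (message into j)(x_j).
        return {xi: min(int(xi != xj) + msg[xj] for xj in values) for xi in values}

    forward = [None] * length   # messages arriving from the left
    backward = [None] * length  # messages arriving from the right
    forward[0] = {x: int(x != x_left) for x in values}
    backward[-1] = {x: int(x != x_right) for x in values}
    for k in range(1, length):
        forward[k] = step(forward[k - 1])
        backward[length - 1 - k] = step(backward[length - k])
    return [{x: forward[k][x] + backward[k][x] for x in values} for k in range(length)]


# Black on the left, white on the right: exactly one odd bond is unavoidable,
# and every interior site can take either color at that cost.
print(path_min_sum(1, -1, 3))
```

With both endpoints black, the same function returns cost 0 for black and cost 2 for white at every interior site, so each local solution in the sense of equation (3) equals 2.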

3 Difference messages and local solutions

For the remainder of this paper, we are in the setting of Section 2.1. To avoid numerical overflow, it is standard practice to normalize the messages in LBP. In our case, we pass the difference messages $m^n_{j \to i} := M^n_{j \to i}(-1) - M^n_{j \to i}(1)$, which satisfy the following recursion.

Lemma 3 (Difference Messages).

Definition 2 induces the following message-passing dynamics on the difference messages as follows: Given a boundary configuration , for each edge such that or we have

Boundary condition: for all , if ,

Initialization: if ,

Update: If and , then

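$$m^n_{j \to i} := \operatorname{sign}\left\{ \sum_{k \in \partial j \setminus i} m^{n-1}_{k \to j} \right\}$$

where $\operatorname{sign}(t) := t/|t|$ for $t \neq 0$ and $\operatorname{sign}(0) := 0$.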
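The sign update can be simulated directly on a small grid. The sketch below is a minimal illustration under assumptions that are not spelled out in the text above: each boundary site $j$ sends the fixed difference message $x_j$ (which is what $I(-1 \neq x_j) - I(1 \neq x_j)$ evaluates to), interior messages start at 0, and all messages are updated synchronously. The estimate at a site is taken to be the sum of its four incoming messages, and the function name `lbp_estimates` is hypothetical.

```python
def lbp_estimates(x, iterations):
    """Run the difference-message sign update on a grid whose outer ring is the boundary.

    x: list of rows of +1/-1 values (interior entries are ignored).
    Returns {site: sum of the four incoming difference messages}.
    """
    rows, cols = len(x), len(x[0])

    def in_B(site):
        r, c = site
        return 0 < r < rows - 1 and 0 < c < cols - 1

    def sign(t):
        return (t > 0) - (t < 0)

    def neighbors(site):
        r, c = site
        return [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]

    sites = [(r, c) for r in range(1, rows - 1) for c in range(1, cols - 1)]
    # m[(j, i)] is the message from j to i, kept only for receivers i in B.
    m = {}
    for i in sites:
        for j in neighbors(i):
            m[(j, i)] = 0 if in_B(j) else x[j[0]][j[1]]  # assumed boundary/initial values

    for _ in range(iterations):
        new = dict(m)
        for (j, i) in m:
            if in_B(j):  # boundary messages stay fixed
                new[(j, i)] = sign(sum(m[(k, j)] for k in neighbors(j) if k != i))
        m = new

    return {i: sum(m[(j, i)] for j in neighbors(i)) for i in sites}


# 4 x 4 grid with the left boundary column black; run 2N = 8 iterations.
grid = [[1, -1, -1, -1] for _ in range(4)]
print(lbp_estimates(grid, 8))
```

On this one-run example the estimates settle at -2 next to the black column and -4 one step further in, the same values an exhaustive search over the four interior sites gives for equation (3).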

Proof.

For the boundary condition, if , then .

There is nothing to check for the initialization.

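For the update, fix an iteration $n$ and an edge to which the update rule applies. For $z \in \{\pm 1\}$, let $\Phi(z) := \sum_{k \in \partial j \setminus i} M^{n-1}_{k \to j}(z)$. By definition, we have

$$\begin{aligned} m^n_{j \to i} &= M^n_{j \to i}(-1) - M^n_{j \to i}(1) \\ &= \min_{z \in \{\pm 1\}} \{ I(-1 \neq z) + \Phi(z) \} - \min_{z \in \{\pm 1\}} \{ I(1 \neq z) + \Phi(z) \} \\ &= \min\{ 1 + \Phi(1), \Phi(-1) \} - \min\{ \Phi(1), 1 + \Phi(-1) \}. \end{aligned}$$

From the above and the fact that $\Phi$ maps into the integers, one gets

$$m^n_{j \to i} = \begin{cases} -1 & \Phi(1) > \Phi(-1) \\ 1 & \Phi(1) < \Phi(-1) \\ 0 & \Phi(1) = \Phi(-1) \end{cases} \;=\; \operatorname{sign}\{ \Phi(-1) - \Phi(1) \}.$$

By definition, $\Phi(-1) - \Phi(1) = \sum_{k \in \partial j \setminus i} m^{n-1}_{k \to j}$. Thus, we are done. ∎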
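The key step of the proof, that the difference of the two minima collapses to a sign, can be checked numerically. The sketch below takes the difference message to be $M(-1) - M(1)$, the convention that matches equation (3), and treats $\Phi(1)$ and $\Phi(-1)$ as arbitrary non-negative integers; the function names are hypothetical.

```python
def sign(t):
    return (t > 0) - (t < 0)


def difference_message(phi_plus, phi_minus):
    """m = M(-1) - M(+1), with M(x_i) = min over z of I(x_i != z) + Phi(z).

    phi_plus stands for Phi(1) and phi_minus for Phi(-1).
    """
    m_at_minus = min(1 + phi_plus, phi_minus)  # x_i = -1: choosing z = +1 costs one odd bond
    m_at_plus = min(phi_plus, 1 + phi_minus)   # x_i = +1: choosing z = -1 costs one odd bond
    return m_at_minus - m_at_plus


# Check the identity m = sign(Phi(-1) - Phi(1)) on a small integer range.
ok = all(
    difference_message(p, q) == sign(q - p)
    for p in range(8)
    for q in range(8)
)
print(ok)
```

The identity relies on $\Phi$ being integer valued: with non-integer values the difference could fall strictly between -1 and 1.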

By passing difference messages rather than the original messages, we are unable to compute the quantities $O^*_i(-1)$ and $O^*_i(1)$. Nevertheless, we can use the difference messages to estimate the local solutions as defined by equation (3). In Theorem 6, we show that these estimates indeed converge to the truth. Below, we make the dependence of $o^*_i(x_{\partial B})$ on the boundary condition implicit and simply write $o^*_i$.

Before discussing the convergence of Min-Sum LBP, we first use the global solutions found in [8, 6] to directly compute the local solutions for sites $i \in B$. Using these local solutions, we will show in Section 5 that Min-Sum LBP converges to the correct local solutions for 1-run boundaries.

Let be a one-run boundary. A positive simple path is a subset of such that

1. The subgraph is a path,

2. are the two endpoints of ,

3. for , either or, if , ,

4. is minimal satisfying the above properties.

A negative simple path is defined in the same way by replacing with in item 3 above. Define the positive inner (resp. outer) path (resp. ) be the positive simple path that minimizes (resp. maximizes) over all positive simple paths the number of nodes enclosed by . Similarly, define and . See examples in Figure 1.

Define to be the set together with the set of nodes enclosed by them. Similarly, define , and . We use the calligraphic font to denote intersection of these sets with , e.g. , and so on. We call the positive inner region the positive outer region, and so on. Define and . Note that is not the same as , with which we are not concerned. The following proposition gives the local solutions over regions that we have defined.

Proposition 4.

For a one-run boundary ,

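$$o^*_i(x_{\partial B}) = \begin{cases} \pm 4 & : i \in \mathcal{I}^{\pm} \setminus \delta\mathcal{I}^{\pm} \\ \pm 2 & : i \in \delta\mathcal{I}^{\pm} \\ 0 & : i \in \mathcal{O}^{+} \cap \mathcal{O}^{-} \end{cases}$$

are the local solutions for sites $i \in B$.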

Proof.

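Let $i \in \mathcal{O}^+ \cap \mathcal{O}^-$. Thus $i \in \mathcal{O}^+$, which means that there is a MAP configuration, namely the one generated by the black outer path, in which site $i$ is black. Likewise, $i \in \mathcal{O}^-$, which means that there is a MAP configuration in which site $i$ is white. Therefore, $O^*_i(1) = O^*_i(-1)$, and hence $o^*_i = 0$.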

Now let . Since , site is black in all MAP configurations, therefore . Moreover, since , in all MAP configurations, every neighbor is also black. Therefore, in any MAP configuration, if we flip site to white, then, this will increase the number of odd bonds by 4. If we also flip a neighbor to white, this will further increase the number of odd bonds by at least 2. Therefore .

Let . Since , site is black in all MAP configurations, therefore . By definition of , there is a MAP configuration in which a neighbor is white. We claim that there can only be one such neighbor . This is because if there were two such neighbors, then we could flip to white and keep the number of odd bonds the same, which would imply that there is a MAP configuration in which is white. If there were more than two such neighbors , then flipping to white would strictly decrease the number of odd bonds, which contradicts . Thus, .

The remaining two cases follow from arguments analogous to those in the previous two paragraphs. This completes the proof. ∎

For each $i \in B$, define $\hat{o}^n_i := \sum_{j \in \partial i} m^n_{j \to i}$ to be the estimates of the local solution $o^*_i$. Applying the usual arguments for correct convergence of BP on trees (see [9] for instance) gives the following proposition.

Proposition 5.

Let the graph be a tree, let $B$ be a subtree, and let $x_{\partial B}$ be any boundary configuration. Then the difference messages converge and the estimates $\hat{o}^n_i$ converge to $o^*_i$ in a number of steps equal to the diameter of the tree plus 1.

We now present our main result.

Theorem 6.

Let and be as defined in Section 2.1, and be a one-run boundary configuration. Then for every edge in with , the difference messages converge and the estimates converge to in number of steps equal to .

We note that the diameter of is rather than . Thus, Theorem 6 parallels Proposition 5.

4 Forward and backward convergence

In this section, we present the two technical lemmas, the Forward and Backward Convergence Lemmas, that allow us to prove the convergence of the difference messages in the setting of Section 2.1. The main idea is that convergence of messages on certain rectangular subsets of takes place in two phases, forward from the corners of the graph, and then backward towards the boundary.

Recall that we are in the setting of Section 2.1. We identify with the point set in the usual way, i.e., is the bottom-left corner, is the top-left corner, and so on. We represent nodes in in two ways. The first way uses (with possible subscripts) to represent a vertex in a coordinate-free way, e.g., . The second way uses (again, with possible subscripts) to represent a vertex in coordinates, e.g.  where . At times, we will say “let be a vertex” to simultaneously refer to both representations.

Define the directed graph whose vertex set is that of the grid graph and in which, for each undirected edge $\{i,j\}$, there are exactly two directed edges $(i,j)$ and $(j,i)$. Let $\{N, S, E, W\}$ be the set of directions north, south, east, and west, respectively. Define the corresponding direction vectors $(0,1)$, $(0,-1)$, $(1,0)$, and $(-1,0)$. For each node $i$ and direction $d$, if $j = i + d$, where $+$ is just the usual vector summation, then $j$ is said to be the $d$ neighbor of $i$. For example, $(a+1,b)$ is the eastern neighbor of $(a,b)$. Given a node $i$ and a direction $d$ such that $i + d$ is a node, define $m_d(i)$ as $m_{i \to i+d}$. For example, $m_E(a,b) = m_{(a,b) \to (a+1,b)}$.

For a message $m^n_{j \to i}$ as defined in Lemma 3, we often drop the superscript and simply write $m_{j \to i}$ when the time index is not of concern. With this notation, $m_E(i)$ denotes the message from $i$ to its eastern neighbor, and so on. Now, given a subset $S$ of nodes and a direction $d$, a message received by $S$ from direction $d$ is a message of the form $m_d(i)$ for some $i \notin S$ such that $i + d \in S$. A message sent from $S$ in the direction $d$ is a message of the form $m_d(i)$ for some $i \in S$. Note that according to these definitions, a message sent from $S$ is not considered to be received by $S$ even if $i + d$ is in $S$. See Figure 3, which illustrates the messages sent from and received by $S$, where $S$ is the set of blue nodes.

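The sets for which we are interested in messages received and sent are rectangular subsets of $B$ defined by the boundary runs. Let $i_1 = (\alpha_1, \beta_1)$ and $i_2 = (\alpha_2, \beta_2)$. Define the rectangle with corners at $i_1$ and $i_2$ by

$$R^{i_2}_{i_1} := \{ (a,b) \in B : \min(\alpha_1, \alpha_2) \le a \le \max(\alpha_1, \alpha_2),\ \min(\beta_1, \beta_2) \le b \le \max(\beta_1, \beta_2) \}.$$

For a positive integer $D$, define the cut-rectangle of nodes of distance less than $D$ from the corner $i_1$ as

$$R^{i_2}_{i_1}(D) := \{ (a,b) \in R^{i_2}_{i_1} : |a - \alpha_1| + |b - \beta_1| \le D - 1 \}.$$

Define the L-shaped region with corner at $i_1$ to be

$$L^{i_2}_{i_1} := \{ (\alpha_1, b) \in B : \min(\beta_1, \beta_2) \le b \le \max(\beta_1, \beta_2) \} \cup \{ (a, \beta_1) \in B : \min(\alpha_1, \alpha_2) \le a \le \max(\alpha_1, \alpha_2) \}.$$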

See Figure 2 for examples.
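The three regions above are simple to enumerate. The following sketch builds them as Python sets of coordinate pairs; it assumes the block B is given as a set of integer coordinates, and the function names `rectangle`, `cut_rectangle`, and `l_region` are hypothetical.

```python
def rectangle(i1, i2, B):
    """Nodes of B inside the axis-aligned rectangle with corners i1 and i2."""
    (a1, b1), (a2, b2) = i1, i2
    return {
        (a, b)
        for (a, b) in B
        if min(a1, a2) <= a <= max(a1, a2) and min(b1, b2) <= b <= max(b1, b2)
    }


def cut_rectangle(i1, i2, B, D):
    """Rectangle nodes whose L1 distance from the corner i1 is at most D - 1."""
    a1, b1 = i1
    return {(a, b) for (a, b) in rectangle(i1, i2, B) if abs(a - a1) + abs(b - b1) <= D - 1}


def l_region(i1, i2, B):
    """Rectangle nodes sharing a row or a column with the corner i1."""
    a1, b1 = i1
    return {(a, b) for (a, b) in rectangle(i1, i2, B) if a == a1 or b == b1}


# Example block B = {1,...,4} x {1,...,4} with corners (1, 1) and (3, 2).
B = {(a, b) for a in range(1, 5) for b in range(1, 5)}
print(sorted(rectangle((1, 1), (3, 2), B)))
print(sorted(cut_rectangle((1, 1), (3, 2), B, 2)))
print(sorted(l_region((1, 1), (3, 2), B)))
```

Taking $D$ large enough that the distance constraint is vacuous makes the cut-rectangle coincide with the full rectangle.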

We say the unordered pair of directions is adjacent if and are orthogonal. The set of adjacent pairs of directions is

$$A := \{ (E,N),\ (E,S),\ (W,N),\ (W,S) \}.$$
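The four adjacent pairs can be recovered by testing orthogonality of direction vectors. The sketch below assumes the usual coordinate convention (north is $(0,1)$, east is $(1,0)$, and so on), which is an assumption about the direction vectors rather than something stated explicitly here; the names `DIRECTIONS` and `adjacent_pairs` are hypothetical.

```python
from itertools import combinations

# Assumed direction vectors: x grows to the east, y grows to the north.
DIRECTIONS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}


def adjacent_pairs():
    """Unordered pairs of directions whose direction vectors are orthogonal."""
    pairs = set()
    for d1, d2 in combinations(DIRECTIONS, 2):
        (u1, v1), (u2, v2) = DIRECTIONS[d1], DIRECTIONS[d2]
        if u1 * u2 + v1 * v2 == 0:  # zero dot product means orthogonal
            pairs.add(frozenset((d1, d2)))
    return pairs


print(sorted("".join(sorted(p)) for p in adjacent_pairs()))
```

The opposite pairs (N, S) and (E, W) are excluded because their dot product is -1 rather than 0.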

For two elements , we use the notation to denote an ordered pair of vertices which are not necessarily neighbors.

Definition 7 (Compatible tuples).

Let . We say that the tuple is compatible if and there exist non-negative coefficients such that .

The ordering of is crucial because if , then is compatible implies is not compatible. However, the ordering of and is irrelevant.

Definition 8 (Convergence of messages).

Let and . For a given directed edge , we say that the messages converge in iterations to if for all .

The definition below captures the salient feature of the one-run boundary configuration underlying the proof of our main result.

Definition 9 (Forward Convergence (FC) Property).

The compatible tuple is forward convergent to at time , abbreviated as , if each message received by from directions and converges in iterations to .

We will refer to the following as the FC Lemma.

Lemma 10 (Forward Convergence).

Let and . Suppose that is . Then all messages sent from in the directions and converge in iterations to .

Proof.

Letting in Lemma 11 (proven next), then . Thus, we have the desired result. ∎

Lemma 10 is essentially a corollary of the lemma below.

Lemma 11.

Let and . Suppose is . Then for all , all messages sent from in the directions and converge in iterations to .

Proof.

For , let be the coordinate representation of . Since is embedded in , rotations and reflections on that preserve induce graph automorphisms on that respect the adjacency of directions. Thus, because the message-passing dynamics defined in Lemma 3 are isotropic, we can assume , and and after applying appropriate rotations and reflections. Relabeling using the rule , we may further assume . Relabeling time, we may assume . The dynamics defined in Lemma 3 are invariant under multiplication by , i.e., we may replace every instance of by . Hence, we may assume . Let .

Given the reductions in the preceding paragraph, we now have only to prove that for each such that and , the messages and converge in iterations to . In other words, for all

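$$m^n_E(a,b) = \operatorname{sign}\left\{ m^{n-1}_E(a-1,b) + m^{n-1}_N(a,b-1) + m^{n-1}_S(a,b+1) \right\} = 1, \tag{5}$$

$$m^n_N(a,b) = \operatorname{sign}\left\{ m^{n-1}_E(a-1,b) + m^{n-1}_N(a,b-1) + m^{n-1}_W(a+1,b) \right\} = 1. \tag{6}$$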

In order to prove (5) and (6), it suffices to show

$$m^{n-1}_E(a-1,b) = m^{n-1}_N(a,b-1) = 1 \tag{7}$$

because in this case the value of the third term inside the sign in (5) and (6) above cannot influence the message: that term is at least $-1$, so the sum is at least 1.

The boundary conditions part of the definition of property translates to the fact that

$$m^n_E(0,b) = m^n_N(a,0) = 1, \quad \forall n \ge 0 \tag{8}$$

for all and .

We proceed by induction on . For the base case, and . Let . Observe that (7) follows from (8). Thus, (5) and (6) are proven for and .

Now, let and suppose the conclusion of Lemma 11 holds for . Let and . If , then by the induction hypothesis, we are done. Thus, below, we assume , that is, . Our goal as before is to show (7).

First, consider the case that and . Then and so by the induction hypothesis and so (7) holds.

Next, suppose . Then implies that . From the boundary condition (8), we have . On the other hand, and so , which proves (7).

The case when is analogous to the above argument. This proves the induction step and the lemma. ∎
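The induction above can be watched in a small simulation. The sketch below works in a reduced setting of the kind used in the proof, with assumptions made explicit: nodes are $(a,b)$ with $a, b \ge 1$, the messages $m_E(0,b)$ and $m_N(a,0)$ entering from the west and south are held at 1 for all time as in (8), the initial messages are arbitrary, and the third term in each update is replaced by an arbitrary value in $\{-1, 0, 1\}$ to stand in for whatever the rest of the graph sends. The function name `forward_messages` and the random stand-in are illustrative only.

```python
import random


def forward_messages(width, height, steps, seed=0):
    """Simulate eastward/northward messages on nodes (a, b), 1 <= a <= width, 1 <= b <= height.

    Returns a list `history` where history[n] = (m_E, m_N) holds the messages
    at iteration n as dicts keyed by node.
    """
    rng = random.Random(seed)
    sign = lambda t: (t > 0) - (t < 0)
    nodes = [(a, b) for a in range(1, width + 1) for b in range(1, height + 1)]

    # Arbitrary initial messages inside the rectangle.
    m_E = {v: rng.choice((-1, 0, 1)) for v in nodes}
    m_N = {v: rng.choice((-1, 0, 1)) for v in nodes}
    # Incoming messages from the west and the south are fixed at 1, as in (8).
    for b in range(1, height + 1):
        m_E[(0, b)] = 1
    for a in range(1, width + 1):
        m_N[(a, 0)] = 1

    history = [(dict(m_E), dict(m_N))]
    for _ in range(steps):
        old_E, old_N = history[-1]
        new_E, new_N = dict(old_E), dict(old_N)
        for (a, b) in nodes:
            two_terms = old_E[(a - 1, b)] + old_N[(a, b - 1)]
            # The third term is unconstrained here: any value in {-1, 0, 1}.
            new_E[(a, b)] = sign(two_terms + rng.choice((-1, 0, 1)))
            new_N[(a, b)] = sign(two_terms + rng.choice((-1, 0, 1)))
        history.append((new_E, new_N))
    return history


history = forward_messages(width=3, height=4, steps=7, seed=1)
final_E, final_N = history[-1]
print(all(final_E[(a, b)] == 1 and final_N[(a, b)] == 1
          for a in range(1, 4) for b in range(1, 5)))
```

In this reduced setting, the messages sent from node $(a,b)$ equal 1 from iteration $a + b - 1$ onward, whatever the third term does, so convergence sweeps diagonally away from the corner.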

The following definition builds on the notion of the FC property defined previously.

Definition 12 (Backward Convergence (BC) Property).

Two compatible tuples and are said to be backward convergent to at time , abbreviated as , if both tuples are and there is exactly one direction in common, i.e., . The unique element not in is called the backward convergence (BC) direction.

Lemma 13 (Backward Convergence).

Let and . Suppose and are and let be the BC direction. Then all messages sent from in the direction converge in iterations to .

Proof.

Let for . Without loss of generality, we consider the case when and . Hence, . Furthermore, we assume and . Now, let . Let . Our goal is to show

$$m^n_W(a,b) := \operatorname{sign}\left\{ m^{n-1}_N(a,b-1) + m^{n-1}_S(a,b+1) + m^{n-1}_W(a+1,b) \right\} = 1 \tag{9}$$

We first show . Now, if , then by FC Lemma 10. On the other hand, if , then . Since is compatible with by assumption, we have . This shows . Next, since , is a message received by from the direction . Thus, the defining property of being to obtain .

An analogous argument shows . This proves (9). ∎

See Figure 3 for an illustration of Lemmas 10 and 13.

5 Proof of Theorem 6

Let be a given one-run boundary. Below, we assume the algorithm has run for iterations, i.e., , so that we can use the FC and BC Lemma. The goal is to show that for any where is as in Proposition 4 and

$$\hat{o}^n(a,b) = m^n_E(a-1,b) + m^n_N(a,b-1) + m^n_W(a+1,b) + m^n_S(a,b+1).$$

Without loss of generality, we assume satisfies:

1. Positive run is contracted: Suppose are such that has degree , and are the two neighbors of , and . In such cases, we always assume . This is because does not touch any nodes in , so the value of does not affect the message-updates.

2. Positive run is smaller i.e., .

A node in with two neighbors in is called a corner. For easier visualization, the four corners are given names: , , and . Define to be the set of corners with two positive boundary neighbors, i.e.,

$$C = \{ i \in \{sw, se, ne, nw\} : |\{ j \in \partial B \cap \partial i \mid x_j = +1 \}| = 2 \}$$

Notice that because otherwise . Thus, and the proof is correspondingly divided into the three cases.

Below, for brevity, we will write subsets of using probabilist notation, i.e., for a logical statement , let be a shorthand for . For example, is simply written as .

Case . Let be the two end points of such that . Without loss of generality, we assume all the positive boundary conditions are restricted to the left side. More precisely, if and , then . See Figure 4 for an example.

From the definitions, it is easy to see that

 I+∖δI+=δI+=O+∩O−=