 # Solving Non-parametric Inverse Problem in Continuous Markov Random Field using Loopy Belief Propagation

In this paper, we address the inverse problem, or the statistical machine learning problem, in Markov random fields with a non-parametric pair-wise energy function with continuous variables. The inverse problem is formulated by maximum likelihood estimation. The exact treatment of maximum likelihood estimation is intractable because of two problems: (1) it includes the evaluation of the partition function and (2) it is formulated in the form of functional optimization. We avoid Problem (1) by using Bethe approximation. Bethe approximation is an approximation technique equivalent to the loopy belief propagation. Problem (2) can be solved by using orthonormal function expansion. Orthonormal function expansion can reduce a functional optimization problem to a function optimization problem. Our method can provide an analytic form of the solution of the inverse problem within the framework of Bethe approximation.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## I Introduction

Boltzmann machine learning, which is known as the inverse Ising problem in statistical mechanics, is one of the important problems in the statistical machine learning field and has a long history. Suppose that we have sample points, i.e., data points, stochastically generated from an unknown distribution (referred to as a generative model). The task of statistical machine learning is to specify the unknown distribution using only the sample points. In standard Boltzmann machine learning, we assume that the generative model that generates data points is an Ising model, and prepare an Ising model (referred to as the learning model) with controllable parameters, e.g., external fields and exchange interactions. The Boltzmann machine learning is achieved by optimizing the values of the controllable parameters in the learning model through maximum likelihood estimation.

Unfortunately, we cannot perform Boltzmann machine learning exactly because of the computational cost. Therefore, many approximations for Boltzmann machine learning have been proposed. In particular, approximations based on mean-field methods have been developed in the field of statistical mechanics Roudi et al. (2009): mean-field approximation Kappen and Rodríguez (1998), Bethe approximation Parise and Welling (2009); Yasuda and Horiguchi (2006); Yasuda and Tanaka (2009); Mézard and Mora (2009); Marinari and Kerrebroeck (2010); Ricci-Tersenghi (2012); Nguyen and Berg (2012); Furtlehner (2013), Plefka expansion Tanaka (1998); Sessak and Monasson (2009), and so on. In many of these methods, we can obtain the solution to the maximum likelihood estimation analytically. However, they are applicable to only an Ising-type learning model, that is, the variables in the model are binary and the energy function of the model is a quadratic form of the variables.

We proposed a method for a more general situation that uses Bethe approximation and orthonormal function expansion Yasuda et al. (2012). Using the method, we can solve the inverse problem with general pair-wise Markov random fields and obtain the solution analytically. However, this method cannot be applied to Markov random fields with continuous variables.

In this paper, we propose a method for solving the inverse problem in general pair-wise Markov random fields with continuous variables, which is an extension of our previous method Yasuda et al. (2012). The proposed method can give us the analytical solution of the inverse problem. This is the main contribution of this paper. In this paper, we refer to a pair-wise Markov random field with continuous variables as a continuous Markov random field (CMRF).

The remainder of this paper is organized as follows. In Sec. II, we explain loopy belief propagation (LBP) in a CMRF. LBP is equivalent to Bethe approximation Yedidia et al. (2005); Pelizzola (2005). We formulate the inverse problem in a CMRF in Sec. III, as well as its Bethe approximation. Our method is shown in Sec. IV. In this section, we derive the solution to the inverse problem using the Bethe approximation shown in Sec. III. Since the solution is obtained in the form of infinite series, it cannot be implemented as it is. We describe a means of implementing our method and show the results of numerical experiments in Sec. V. We conclude the paper with some remarks in Sec. VI.

## Ii Formalism of Loopy Belief Propagation in Continuous Markov Random Field

Consider an undirected graph , where is the set of nodes and is the set of undirected links. We denote the link between nodes and by . Because the links have no direction, and indicate the same link. On the undirected graph, we define the non-parametrized pair-wise energy function as

 Ψ(x):=−∑i∈Vθi(xi)−∑{i,j}∈Ew{i,j}(x{i,j}), (1)

where is the energy on node , is the energy on link , and . We regard as the same function as . With the energy function, we define the CMRF as

 P(x):=1Zexp(−Ψ(x)), (2)

where

represents the continuous random variables over the continuous space

, and is the partition function defined as

 Z:=∫\mcalXnexp(−Ψ(x))dx,

where denotes the multiple integration over whole variables, , and denotes the integral over . and are arbitrary functions of the assigned variables.

Given the CMRF, it is difficult to evaluate its marginal distributions because of the existence of intractable multiple integration. LBP is one of the most effective methods for approximately evaluating marginal distributions and is the same as Bethe approximation in statistical mechanics. LBP can be obtained from the minimum condition of the variational Bethe free energy of the CMRF in Eq. (2). We denote the marginal distribution over by and that over and , which are neighboring pair of nodes, by . These marginal distributions are sometimes called beliefs in the context of LBP. We regard as the same belief as . In the context of the cluster variation method Kikuchi (1951); Yedidia et al. (2005), the variational Bethe free energy of the CMRF is expressed as

 \mcalF[b,ξ] :=∫\mcalXnΨ(x)P(x)dx+∑i∈V(1−|∂i|)∫\mcalXbi(xi)lnbi(xi)dxi+∑{i,j}∈E∫\mcalX2ξ{i,j}(x{i,j})lnξ{i,j}(x{i,j})dxidxj\nn =−∑i∈V∫\mcalXθi(xi)bi(xi)dxi−∑{i,j}∈E∫\mcalX2w{i,j}(x{i,j})ξ{i,j}(x{i,j})dxidxj+∑i∈V(1−|∂i|)∫\mcalXbi(xi)lnbi(xi)dxi\nn\aleq+∑{i,j}∈E∫\mcalX2ξ{i,j}(x{i,j})lnξ{i,j}(x{i,j})dxidxj, (3)

where is the set of nodes connected to node . The variational Bethe free energy is regarded as the functional with respect to and . The beliefs, that minimize the variational Bethe free energy, are regarded as the Bethe approximation of the corresponding marginal distributions. From the extremal condition of the variational Bethe free energy under the normalizing constraints,

 ∫\mcalXbi(xi)dxi=∫\mcalX2ξ{i,j}(x{i,j})dxidxj=1, (4)

and the marginalizing constraints,

 ∫\mcalXξ{i,j}(x{i,j})dxi =bj(xj), (5) ∫\mcalXξ{i,j}(x{i,j})dxj =bi(xi), (6)

we obtain the message-passing equation (MPE)

 mi→j(xj)=1Zi→j∫\mcalXπi∖j(xi)ew{i,j}(x{i,j})dxi, (7)

where the constant is frequently set to

 Zi→j:=∫\mcalX2πi∖j(xi)ew{i,j}(x{i,j})dxidxj (8)

to normalize the messages. The distribution is defined as

 πi∖j(xi):=eθi(xi)∏k∈∂i∖{j}mk→i(xi)∫\mcalXeθi(xi)∏k∈∂i∖{j}mk→i(xi)dxi. (9)

The quantity is the normalized message (or the effective field) from node to node , which is non-negative and originates from the Lagrange multipliers appearing in the conditional minimization of the variational Bethe free energy. The two different messages, and , are defined on link . The beliefs (the approximate marginal distributions) are computed from the messages as

 bi(xi) ∝eθi(xi)∏k∈∂imk→i(xi), (10) ξ{i,j}(x{i,j}) ∝ew{i,j}(x{i,j})πi∖j(xi)πj∖i(xj). (11)

In principle, by solving the MPE in Eq. (7), we can compute the one- and two-variable marginal distributions using Eqs. (10) and (11). However, finding the functional forms of the messages is not straightforward, because the messages are continuous functions over , and therefore, the MPE we have to solve is an integral equation. Some methods that are based mainly on a stochastic method have been developed for approximately solving the MPE Sudderth et al. (2003); Ihler and McAllester (2009); Noorshams and Wainwright (2013).

## Iii Inverse Problem in Continuous Markov Random Field

In this section, we consider the inverse problem, in other words, the machine learning problem, for the CMRF in Eq. (2). The inverse problem for the CMRF can be solved by maximum likelihood estimation. Given data points , we define the log-likelihood functional as

 l[θ,w]:=1NN∑μ=1lnP(x(μ)), (12)

where and are the set of functions and respectively in the exponent in Eq. (2). The goal of the maximum likelihood estimation is to find the functions and that maximize the log-likelihood functional. Eq. (12) can be rewritten as

 l[θ,w] =−1NN∑μ=1Ψ(x(μ))−lnZ. (13)

However, the maximization problem of the log-likelihood functional is intractable because of the existence of the partition function.

To avoid evaluating the intractable partition function, we approximate the log-likelihood functional using LBP, i.e., Bethe approximation. The Bethe approximation of the log-likelihood functional in Eq. (13) can be expressed by using the variational Bethe free energy shown in Eq. (3) as

 l\mrmBethe[θ,w]:=−1NN∑μ=1Ψ(x(μ))+minb,ξ\mcalF[b,ξ]. (14)

We refer to this as the Bethe log-likelihood functional. The main purpose of this study was to maximize the Bethe log-likelihood functional with respect to the functions and . The solution obtained by maximizing Eq. (14), of course coincides to that obtained by the true maximum likelihood estimation when the CMRF has a tree structure, because Bethe approximation is exact in tree systems. However, the maximization of the Bethe log-likelihood functional is not straightforward for the following reasons. The variations of the functional with respect to and are

 δl\mrmBethe[θ,w]δθi(xi) =1NN∑μ=1δ(xi−\mrmx(μ)i)−bi(xi), δl\mrmBethe[θ,w]δw{i,j}(x{i,j}) =1NN∑μ=1δ(xi−\mrmx(μ)i)δ(xj−\mrmx(μ)j)\nn\aleq−ξ{i,j}(x{i,j}),

where and are the beliefs minimizing the variational Bethe free energy, in other words, the solution to the LBP presented in the previous section. This variation means that we have to find and that satisfy the relations

 1NN∑μ=1f(\mrmx(μ)i)=∫\mcalXf(xi)bi(xi)dxi (15)

and

 1NN∑μ=1g(\mrmx(μ)i,\mrmx(μ)j)=∫\mcalXg(xi,xj)ξ{i,j}(x{i,j})dxidxj (16)

for any test functions and . Thus, if we could obtain the solution of the LBP, by using a method that has already proposed Sudderth et al. (2003); Ihler and McAllester (2009); Noorshams and Wainwright (2013), the solution to the maximization of the Bethe log-likelihood functional is not immediately obtained.

## Iv Proposed Method

In this section, we propose a method to solve the maximization problem of the Bethe log-likelihood function in Eq. (14) in terms of orthonormal function expansion. Via orthonormal function expansion, we can reduce the functional maximization problem in the previous section to a tractable function maximization problem. The basic idea of our method is similar to that presented in our previous paper Yasuda et al. (2012).

### iv.1 Orthonormal Function System

Before deriving our method, we introduce an orthonormal function system over satisfying

 ∫\mcalXϕs(x)ϕt(x)dx=δs,t, (17)

where is the Kronecker delta function. By using the orthonormal function system, function over is expanded as

 f(x)=∞∑s=0αsϕs(x), (18)

where the expanding coefficients are given by

 αs=∫\mcalXf(x)ϕs(x)dx. (19)

The orthonormal function expansion in Eq. (18) plays an important role in our method.

In the following, we assume that is the finite space, , and that is constant over , i.e.,

 ϕ0(x)=1√χ,χ:=β−α. (20)

From Eqs. (17) and (20), we have

 ∫\mcalXϕs(x)dx={√χs=00s>0. (21)

Examples of this orthonormal function are described in Appendix A. We use Eqs. (17), (20), and (21) frequently throughout the paper.

The orthonormal function expansion introduced in this section plays a central role in our proposed method described in the following. However, a similar idea can be useful for solving the LBP in Sec. II. Indeed, a method for solving the LBP was proposed by using orthonormal function expansion Noorshams and Wainwright (2013).

### iv.2 Variational Bethe Free Energy with Orthonormal Function Expansion

First, we rewrite the CMRF in Eq. (2) by expanding and . By using the orthonormal function expansion in Eq. (18), the functions and can be expanded as follows.

 θi(xi) =∞∑s=0h(s)iϕs(xi)=∞∑s=1h(s)iϕs(xi)+\mrmconstant (22)

and

 w{i,j}(x{i,j}) =∞∑s,t=0J(s,t){i,j}ϕs(xi)ϕt(xj)\nn =1√χ∞∑s=1(J(s,0){i,j}ϕs(xi)+J(0,s){i,j}ϕs(xj))\nn\aleq+∞∑s,t=1J(s,t){i,j}ϕs(xi)ϕt(xj)+\mrmconstant, (23)

where, from Eq. (19), the expanding coefficients are

 h(s)i :=∫\mcalXθi(xi)ϕs(xi)dxi, (24) J(s,t){i,j} :=∫\mcalX2w{i,j}(x{i,j})ϕs(xi)ϕt(xj)dxidxj. (25)

It is noteworthy that, from the symmetric property of , is satisfied. In Eqs. (22) and (23), Eq. (20) is used. Using Eqs. (22) and (23), we can rewrite the energy function in Eq. (1) as

 Ψ(x)=Ψ†(x;H,J)+C0, (26)

where

 Ψ†(x;H,J) :=−∑i∈V∞∑s=1H(s)iϕs(xi)\nn\aleq−∑{i,j}∈E∞∑s,t=1J(s,t){i,j}ϕs(xi)ϕt(xj) (27)

and

 H(s)i:=h(s)i+1√χ∑j∈∂iJ(s,0){i,j}.

The constant in Eq. (26) originates from the constants in Eqs. (22) and (23). Therefore, using the new energy function, the CMRF in Eq. (2) can be rewritten as

 P(x)=P(x∣H,J)∝exp(−Ψ†(x;H,J)). (28)

This rewriting makes the CMRF the parametric model, parameterized by

and . In Eq. (28), the constant in Eq. (26) is neglected, because it is irrelevant to the distribution.

Now, we introduce the orthonormal function expansions of the beliefs in the variational Bethe free energy, as follows.

 bi(xi) =∞∑s=0c(s)iϕs(xi), (29) ^ξ{i,j}(x{i,j}) =∞∑s,t=0d(s,t){i,j}ϕs(xi)ϕt(xj). (30)

From Eq. (19), the expanding coefficients are

 c(s)i :=∫\mcalXbi(xi)ϕs(xi)dxi, (31) d(s,t){i,j} :=∫\mcalX2ξ{i,j}(x{i,j})ϕs(xi)ϕt(xj)dxidxj. (32)

The beliefs must satisfy the normalizing constraints in Eq. (4) and the marginalizing constraints in Eqs. (5) and (6). From Eqs. (4), (20), (31), and (32), we have

 c(0)i=1√χ,d(0,0){i,j}=1χ.

From Eqs. (5), (6), (20), (31), and (32), we obtain

 d(s,0){i,j}=c(s)i√χ,d(0,s){i,j}=c(s)j√χ.

From the above equations, the beliefs in Eqs. (29) and (30) can be expressed as

 bi(xi) =1χ+∞∑s=1c(s)iϕs(xi)=:bi(xi∣c), (33) ξ{i,j}(x{i,j}) =1χ2+1χ∞∑s=1(c(s)iϕs(xi)\nn\aleq+c(s)jϕs(xj))+∞∑s,t=1d(s,t){i,j}ϕs(xi)ϕt(xj)\nn =:ξ{i,j}(x{i,j}∣c,d), (34)

where and . By using Eq. (21), one can confirm that the beliefs in Eqs. (33) and (34) satisfy the normalization constraints and the marginal constraints for any and .

From Eqs. (27), (33), and (34), in the same way as in Eq. (3), we formulate the variational Bethe free energy for the CMRF in Eq. (28) as

 \mcalF(c,d) =−∑i∈V∞∑s=1H(s)ic(s)i−∑{i,j}∈E∞∑s,t=1J(s,t){i,j}d(s,t){i,j}\nn +∑i∈V(1−|∂i|)∫\mcalXbi(xi∣c)lnbi(xi∣c)dxi\nn +∑{i,j}∈E∫\mcalX2ξ{i,j}(x{i,j}∣c,d)\nn ×lnξ{i,j}(x{i,j}∣c,d)dxidxj. (35)

For specific and , this variational Bethe free energy is not the functional, but the function of and . The variational Bethe free energy in Eq. (35) coincides with that in Eq. (3), except for the irrelevant constant neglected in Eq. (28), i.e.,

 \mcalF[b,ξ]=\mcalF(c,d)+C0. (36)

As mentioned above, the beliefs in Eqs. (33) and (34) satisfy the normalization constraints and the marginal constraints for any and , so that we can minimize with no constraint. At the minimum point of , and satisfy

 H(s)i =(1−|∂i|)∫\mcalXϕs(xi)lnbi(xi∣c)dxi\nn +1χ∑j∈∂i∫\mcalX2ϕs(xi)lnξ{i,j}(x{i,j}∣c,d)dxidxj, (37) J(s,t){i,j} =∫\mcalX2ϕs(xi)ϕt(xj)lnξ{i,j}(x{i,j}∣c,d)dxidxj. (38)

Eqs. (37) and (38) are derived from the extremal condition of Eq. (35) with respect to and , respectively. In the derivation of these equations, we used Eq. (21).

### iv.3 Maximization of the Bethe Log-likelihood Function

By using the new energy function in Eqs. (27) and the variational Bethe free energy in Eq. (35), the Bethe log-likelihood functional in Eq. (14) is represented as

 l\mrmBethe(H,J)=−1NN∑μ=1Ψ†(x(μ);H,J)+minc,d\mcalF(c,d). (39)

This is the function with respect to and and we refer to this function as the Bethe log-likelihood function. Thus, the functional optimization problem of the maximum likelihood estimation is reduced to the function optimization problem. The Bethe log-likelihood function is equivalent to the Bethe log-likelihood functional in Eq. (14), because, from Eqs. (26) and (36),

 l\mrmBethe[θ,w] =−1NN∑μ=1(Ψ†(x(μ);H,J)+C0)\nn\aleq+minc,d(\mcalF(c,d)+C0)\nn =l\mrmBethe(H,J).

Therefore, the maximization of the the Bethe log-likelihood function with respect to and is equivalent to the maximization of the Bethe log-likelihood functional with respect to and . At the maximum point of the Bethe log-likelihood function, we have equations for the expanding coefficients in Eqs. (33) and (34) as

 ^c(s)i =\aveϕs(xi)\mcalD, (40) ^d(s,t){i,j} =\aveϕs(xi)ϕt(xj)\mcalD, (41)

where is the sample average over data points . Coefficients and are the solutions to the minimization of the variational Bethe free energy in Eq. (35), that is, the solutions to Eqs. (37) and (38). In the following, we denote the beliefs, the coefficients of which are fixed by Eqs. (40) and (41), by and , i.e., and .

By substituting Eqs. (40) and (41) into Eqs. (37) and (38), we can obtain the solution, and , to the maximization of the Bethe log-likelihood function in Eq. (39), and then, identify the energy function . It should be noted that the solution obtained by our method satisfies Eqs. (15) and (16), which is easily confirmed as follows. A test function is expanded as in Eq. (18). Therefore, the left side of Eq. (15) is

 1NN∑μ=1f(xi)=α0χ+∞∑s=1αs\aveϕs(xi)\mcalD.

On the other side, the right hand side of Eq. (15) is

 ∫\mcalXf(xi)^bi(xi)dxi =α0χ+∞∑s=1αs∫\mcalXϕs(xi)^bi(xi)\nn =α0χ+∞∑s=1αs^c(s)i,

where we use Eq. (33). From these equations and Eq. (40), the solution obtained by our method satisfying Eq. (15) is confirmed. Similarly, we can verify the equality in Eq. (16).

By using the method described above, within the framework of Bethe approximation we can identify the functional form of the energy function through the use of the given data points, and then obtain the resulting CMRF as

 ^P(x)∝exp(−Ψ†(x;^H,^J)). (42)

Unfortunately, one cannot computationally treat the infinite series in Eqs. (37) and (38). Thus, in practice, we truncate the infinite series and approximate them by a finite series obtained by the truncation. The details of this approximation are described in Sec. V.1.

The proposed method includes the integration procedures (cf. Eqs. (37) and (38)). The following rewriting allows us to identify the functional form of without the integration procedures. We now consider the energy function defined by

 Ψ‡(x):=−∑i∈V(1−|∂i|)ln^bi(xi)−∑{i,j}∈Eln^ξ{i,j}(x{i,j}). (43)

This energy function satisfies the relation

 Ψ‡(x)=Ψ†(x∣^H,^J)+C1, (44)

where is the constant unrelated to (see Appendix B). From the relation and Eq. (42), we obtain the CMRF determined by the Bethe approximation of the MLE as

 ^P(x)∝exp(−Ψ‡(x)), (45)

and obtain the energy function in the form of Eq. (43).

## V Implementation

### v.1 Approximation for Implementation

The beliefs, and , are expressed by an infinite series, as shown in Eqs. (33) and (34). Because an infinite series is not implementable, we approximate them by the truncation up to a finite order:

 ^b(K)i(xi):=1χ+K∑s=1^c(s)iϕs(xi) (46)

and

 ^ξ(K){i,j}(x{i,j}) :=1χ2+1χK∑s=1(^c(s)iϕs(xi)\nn\aleq+^c(s)jϕs(xj))+K∑s,t=1^d(s,t){i,j}ϕs(xi)ϕt(xj), (47)

where the positive integer controls the order of the approximation. In the limit of , and coincide with and , respectively. The approximate beliefs in Eqs. (46) and (47) are normalized for any .

Because of the above truncating approximation, the non-negativity of the beliefs may not be retained. Thus, to preserve the positivity of the beliefs, we have to make a further approximation to them. For a small positive value , we define distributions

 ~b(K)i(xi):=max(ε,^b(K)i(xi))∫\mcalXmax(ε,^b(K)i(xi))dxi (48)

and regard the cut-off distribution as the approximation of . If over , . In a similar manner, we approximate by

 ~ξ(K){i,j}(x{i,j}):=max(ε,^ξ(K)i,j(xi,xj))∫\mcalX2max(ε,^ξ(K)i,j(xi,xj))dxidxj. (49)

By using Eqs. (48) and (49) instead of and , the CMRF in Eq. (45) is approximated by

 ^P(x)≈~PK(x)∝exp(−~Ψ‡K(x)), (50)

where

 ~Ψ‡K(x)\nn :=−∑i∈V(1−|∂i|)ln~b(K)i(xi)−∑{i,j}∈Eln~ξ(K){i,j}(x{i,j})\nn =−∑i∈V(1−|∂i|)ln(max(ε,^b(K)i(xi)))\nn\aleq−∑{i,j}∈Eln(max(ε,^ξ(K){i,j}(x{i,j})))+C2 (51)

is the approximation of Eq. (43). Constant , which originates from the denominators of Eqs. (48) and (49), is negligible in Eq. (50).

The procedure of our method is summarized as follows. First, given we compute and in Eqs. (40) and (41). Then, using and , we compute and in Eqs. (48) and (49), and then in Eq. (51) for certain and . Finally, we obtain the CMRF determined in our method by Eq. (50), and regard the CMRF as the solution to the inverse problem.

If one wants to obtain coefficients and , one can approximately obtain them by using Eqs. (37) and (38) together with and instead of and