Boltzmann machine learning, known as the inverse Ising problem in statistical mechanics, is one of the important problems in the statistical machine learning field and has a long history. Suppose that we have sample points, i.e., data points, stochastically generated from an unknown distribution (referred to as a generative model). The task of statistical machine learning is to specify the unknown distribution using only the sample points. In standard Boltzmann machine learning, we assume that the generative model that generates the data points is an Ising model, and prepare an Ising model (referred to as the learning model) with controllable parameters, e.g., external fields and exchange interactions. Boltzmann machine learning is then achieved by optimizing the values of the controllable parameters in the learning model through maximum likelihood estimation.
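As a concrete illustration of this maximum likelihood estimation (not part of the original formulation, and feasible only for models small enough to enumerate), the following Python sketch computes the exact moments of a small Ising model by summing over all states; the likelihood gradient with respect to the fields and interactions is then the difference between the data moments and these model moments:

```python
import itertools
import numpy as np

def ising_moments(J, h):
    """Exact model moments <s_i> and <s_i s_j> of a small Ising model
    p(s) ~ exp(sum_i h_i s_i + sum_{i<j} J_ij s_i s_j), by enumeration."""
    n = len(h)
    states = np.array(list(itertools.product([-1, 1], repeat=n)))
    # Energy of every configuration (only the upper triangle of J is used).
    energies = states @ h + np.einsum('ki,ij,kj->k', states, np.triu(J, 1), states)
    w = np.exp(energies)
    p = w / w.sum()
    m1 = p @ states                                    # <s_i>
    m2 = np.einsum('k,ki,kj->ij', p, states, states)   # <s_i s_j>
    return m1, m2

# Maximum-likelihood gradient = data moments minus model moments:
#   dL/dh_i  = <s_i>_data    - <s_i>_model
#   dL/dJ_ij = <s_i s_j>_data - <s_i s_j>_model
```

For larger models these moments are intractable, which is precisely why the approximations discussed below are needed.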
Unfortunately, we cannot perform Boltzmann machine learning exactly because of the computational cost. Therefore, many approximations for Boltzmann machine learning have been proposed. In particular, approximations based on mean-field methods have been developed in the field of statistical mechanics Roudi et al. (2009): the mean-field approximation Kappen and Rodríguez (1998), the Bethe approximation Parise and Welling (2009); Yasuda and Horiguchi (2006); Yasuda and Tanaka (2009); Mézard and Mora (2009); Marinari and Kerrebroeck (2010); Ricci-Tersenghi (2012); Nguyen and Berg (2012); Furtlehner (2013), the Plefka expansion Tanaka (1998); Sessak and Monasson (2009), and so on. In many of these methods, the solution to the maximum likelihood estimation can be obtained analytically. However, they are applicable only to Ising-type learning models, in which the variables are binary and the energy function is a quadratic form of the variables.
We previously proposed a method for a more general situation that uses the Bethe approximation and orthonormal function expansion Yasuda et al. (2012). Using this method, one can solve the inverse problem for general pair-wise Markov random fields and obtain the solution analytically. However, that method cannot be applied to Markov random fields with continuous variables.
In this paper, we propose a method for solving the inverse problem in general pair-wise Markov random fields with continuous variables, which is an extension of our previous method Yasuda et al. (2012). The proposed method gives the analytical solution of the inverse problem; this is the main contribution of this paper. In the following, we refer to a pair-wise Markov random field with continuous variables as a continuous Markov random field (CMRF).
The remainder of this paper is organized as follows. In Sec. II, we explain loopy belief propagation (LBP) in a CMRF. LBP is equivalent to Bethe approximation Yedidia et al. (2005); Pelizzola (2005). We formulate the inverse problem in a CMRF in Sec. III, as well as its Bethe approximation. Our method is shown in Sec. IV. In this section, we derive the solution to the inverse problem using the Bethe approximation shown in Sec. III. Since the solution is obtained in the form of infinite series, it cannot be implemented as it is. We describe a means of implementing our method and show the results of numerical experiments in Sec. V. We conclude the paper with some remarks in Sec. VI.
II Formalism of Loopy Belief Propagation in Continuous Markov Random Field
Consider an undirected graph $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of undirected links. We denote the link between nodes $i$ and $j$ by $\{i,j\}$. Because the links have no direction, $\{i,j\}$ and $\{j,i\}$ indicate the same link. On the undirected graph, we define the non-parametrized pair-wise energy function as
$H(\boldsymbol{x}) = \sum_{i \in V} \phi_i(x_i) + \sum_{\{i,j\} \in E} \psi_{ij}(x_i, x_j)$, (1)
where $\phi_i(x_i)$ is the energy on node $i$, $\psi_{ij}(x_i, x_j)$ is the energy on link $\{i,j\}$, and $\boldsymbol{x} = \{x_i \mid i \in V\}$. We regard $\psi_{ij}(x_i, x_j)$ as the same function as $\psi_{ji}(x_j, x_i)$. With the energy function, we define the CMRF as
$P(\boldsymbol{x}) = \frac{1}{Z} \exp\big(-H(\boldsymbol{x})\big)$. (2)
$\boldsymbol{x}$ represents the continuous random variables over the continuous space $\mathcal{X}^{|V|}$, and $Z$ is the partition function defined as
$Z = \int \exp\big(-H(\boldsymbol{x})\big)\, d\boldsymbol{x}$,
where $\int d\boldsymbol{x} = \prod_{i \in V} \int dx_i$ denotes the multiple integration over the whole variables and $\int dx_i$ denotes the integral over $\mathcal{X}$. $\phi_i$ and $\psi_{ij}$ are arbitrary functions of the assigned variables.
Given the CMRF, it is difficult to evaluate its marginal distributions because of the intractable multiple integration. LBP is one of the most effective methods for approximately evaluating marginal distributions and is the same as the Bethe approximation in statistical mechanics. LBP can be obtained from the minimum condition of the variational Bethe free energy of the CMRF in Eq. (2). We denote the marginal distribution over $x_i$ by $b_i(x_i)$ and that over $x_i$ and $x_j$, which are a neighboring pair of nodes, by $b_{ij}(x_i, x_j)$. These marginal distributions are sometimes called beliefs in the context of LBP. We regard $b_{ij}(x_i, x_j)$ as the same belief as $b_{ji}(x_j, x_i)$. In the context of the cluster variation method Kikuchi (1951); Yedidia et al. (2005), the variational Bethe free energy of the CMRF is expressed as
$F_{\mathrm{Bethe}}[\{b_i\}, \{b_{ij}\}] = \sum_{i \in V} \int b_i(x_i) \phi_i(x_i)\, dx_i + \sum_{\{i,j\} \in E} \iint b_{ij}(x_i, x_j) \psi_{ij}(x_i, x_j)\, dx_i\, dx_j + \sum_{i \in V} \big(1 - |\partial i|\big) \int b_i(x_i) \ln b_i(x_i)\, dx_i + \sum_{\{i,j\} \in E} \iint b_{ij}(x_i, x_j) \ln b_{ij}(x_i, x_j)\, dx_i\, dx_j$, (3)
where $\partial i$ is the set of nodes connected to node $i$. The variational Bethe free energy is regarded as a functional with respect to $\{b_i\}$ and $\{b_{ij}\}$. The beliefs that minimize the variational Bethe free energy are regarded as the Bethe approximation of the corresponding marginal distributions. From the extremal condition of the variational Bethe free energy under the normalizing constraints,
$\int b_i(x_i)\, dx_i = 1$, (4)
$\iint b_{ij}(x_i, x_j)\, dx_i\, dx_j = 1$, (5)
and the marginalizing constraints,
$\int b_{ij}(x_i, x_j)\, dx_j = b_i(x_i)$, (6)
we obtain the message-passing equation (MPE)
$m_{j \to i}(x_i) = \frac{1}{c_{j \to i}} \int \exp\big(-\phi_j(x_j) - \psi_{ij}(x_i, x_j)\big) \prod_{k \in \partial j \setminus \{i\}} m_{k \to j}(x_j)\, dx_j$, (7)
where the constant $c_{j \to i}$ is frequently set to
$c_{j \to i} = \int \left[ \int \exp\big(-\phi_j(x_j) - \psi_{ij}(x_i, x_j)\big) \prod_{k \in \partial j \setminus \{i\}} m_{k \to j}(x_j)\, dx_j \right] dx_i$ (8)
to normalize the messages. The distribution is defined as
The quantity $m_{j \to i}(x_i)$ is the normalized message (or the effective field) from node $j$ to node $i$, which is non-negative and originates from the Lagrange multipliers appearing in the constrained minimization of the variational Bethe free energy. Two different messages, $m_{i \to j}(x_j)$ and $m_{j \to i}(x_i)$, are defined on link $\{i,j\}$. The beliefs (the approximate marginal distributions) are computed from the messages as
$b_i(x_i) \propto \exp\big(-\phi_i(x_i)\big) \prod_{j \in \partial i} m_{j \to i}(x_i)$, (10)
$b_{ij}(x_i, x_j) \propto \exp\big(-\phi_i(x_i) - \phi_j(x_j) - \psi_{ij}(x_i, x_j)\big) \prod_{k \in \partial i \setminus \{j\}} m_{k \to i}(x_i) \prod_{l \in \partial j \setminus \{i\}} m_{l \to j}(x_j)$. (11)
In principle, by solving the MPE in Eq. (7), we can compute the one- and two-variable marginal distributions using Eqs. (10) and (11). However, finding the functional forms of the messages is not straightforward, because the messages are continuous functions over $\mathcal{X}$, and therefore the MPE we have to solve is an integral equation. Several methods, based mainly on stochastic techniques, have been developed for approximately solving the MPE Sudderth et al. (2003); Ihler and McAllester (2009); Noorshams and Wainwright (2013).
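One simple workaround, distinct from the stochastic methods cited above and offered here only as an illustration, is to discretize $\mathcal{X}$ on a common grid, so that each continuous message becomes a finite vector and the MPE becomes a finite fixed-point iteration. A minimal Python sketch with hypothetical dictionary-based inputs:

```python
import numpy as np

def run_lbp_grid(phi, psi, edges, n_iter=50):
    """Loopy belief propagation for a pairwise MRF with continuous
    variables, discretized on a common grid.  phi[i] is the node energy
    sampled at the grid points; psi[(i, j)] is the pair energy sampled
    on the grid x grid (rows = x_i, columns = x_j).  Returns the
    one-variable beliefs as normalized vectors over the grid."""
    nodes = sorted(phi)
    nbrs = {i: [] for i in nodes}
    for (i, j) in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    G = len(next(iter(phi.values())))
    # One message per direction on each link, initialized uniform.
    msgs = {(i, j): np.full(G, 1.0 / G)
            for (a, b) in edges for (i, j) in [(a, b), (b, a)]}
    for _ in range(n_iter):
        new = {}
        for (j, i) in msgs:
            # m_{j->i}(x_i) ~ sum_{x_j} exp(-phi_j - psi_ij) prod_{k in dj\i} m_{k->j}
            f = np.exp(-phi[j])
            for k in nbrs[j]:
                if k != i:
                    f = f * msgs[(k, j)]
            pair = (np.exp(-psi[(i, j)]) if (i, j) in psi
                    else np.exp(-psi[(j, i)]).T)
            m = pair @ f
            new[(j, i)] = m / m.sum()
        msgs = new
    beliefs = {}
    for i in nodes:
        b = np.exp(-phi[i])
        for k in nbrs[i]:
            b = b * msgs[(k, i)]
        beliefs[i] = b / b.sum()
    return beliefs
```

The grid resolution trades accuracy for cost; this sketch is only meant to make the structure of Eqs. (7) and (10) concrete.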
III Inverse Problem in Continuous Markov Random Field
In this section, we consider the inverse problem, in other words, the machine learning problem, for the CMRF in Eq. (2). The inverse problem for the CMRF can be solved by maximum likelihood estimation. Given $N$ data points $\mathcal{D} = \{\boldsymbol{x}^{(n)} \mid n = 1, 2, \ldots, N\}$, we define the log-likelihood functional as
$L[\Phi, \Psi] = \frac{1}{N} \sum_{n=1}^{N} \ln P\big(\boldsymbol{x}^{(n)}\big)$, (12)
where $\Phi$ and $\Psi$ are the sets of functions $\{\phi_i \mid i \in V\}$ and $\{\psi_{ij} \mid \{i,j\} \in E\}$, respectively, in the exponent in Eq. (2). The goal of the maximum likelihood estimation is to find the functions $\phi_i$ and $\psi_{ij}$ that maximize the log-likelihood functional. Eq. (12) can be rewritten as
$L[\Phi, \Psi] = -\frac{1}{N} \sum_{n=1}^{N} H\big(\boldsymbol{x}^{(n)}\big) - \ln Z$. (13)
However, the maximization problem of the log-likelihood functional is intractable because of the existence of the partition function.
To avoid evaluating the intractable partition function, we approximate the log-likelihood functional using LBP, i.e., the Bethe approximation. Since $\ln Z$ is approximated by the negative of the minimized variational Bethe free energy, the Bethe approximation of the log-likelihood functional in Eq. (13) can be expressed by using the variational Bethe free energy shown in Eq. (3) as
$L_{\mathrm{Bethe}}[\Phi, \Psi] = -\frac{1}{N} \sum_{n=1}^{N} H\big(\boldsymbol{x}^{(n)}\big) + \min_{\{b_i\}, \{b_{ij}\}} F_{\mathrm{Bethe}}[\{b_i\}, \{b_{ij}\}]$. (14)
We refer to this as the Bethe log-likelihood functional. The main purpose of this study is to maximize the Bethe log-likelihood functional with respect to the functions $\phi_i$ and $\psi_{ij}$. The solution obtained by maximizing Eq. (14) of course coincides with that obtained by the true maximum likelihood estimation when the CMRF has a tree structure, because the Bethe approximation is exact in tree systems. However, the maximization of the Bethe log-likelihood functional is not straightforward for the following reasons. The variations of the functional with respect to $\phi_i$ and $\psi_{ij}$ are
$\frac{\delta L_{\mathrm{Bethe}}}{\delta \phi_i(x_i)} = \hat{b}_i(x_i) - \frac{1}{N}\sum_{n=1}^{N}\delta\big(x_i - x_i^{(n)}\big), \qquad \frac{\delta L_{\mathrm{Bethe}}}{\delta \psi_{ij}(x_i, x_j)} = \hat{b}_{ij}(x_i, x_j) - \frac{1}{N}\sum_{n=1}^{N}\delta\big(x_i - x_i^{(n)}\big)\,\delta\big(x_j - x_j^{(n)}\big),$
where $\hat{b}_i(x_i)$ and $\hat{b}_{ij}(x_i, x_j)$ are the beliefs minimizing the variational Bethe free energy, in other words, the solution to the LBP presented in the previous section. This variation means that we have to find $\hat{b}_i$ and $\hat{b}_{ij}$ that satisfy the relations
$\int t_i(x_i)\, \hat{b}_i(x_i)\, dx_i = \frac{1}{N} \sum_{n=1}^{N} t_i\big(x_i^{(n)}\big)$, (15)
$\iint t_{ij}(x_i, x_j)\, \hat{b}_{ij}(x_i, x_j)\, dx_i\, dx_j = \frac{1}{N} \sum_{n=1}^{N} t_{ij}\big(x_i^{(n)}, x_j^{(n)}\big)$, (16)
for any test functions $t_i(x_i)$ and $t_{ij}(x_i, x_j)$. Thus, even if we could obtain the solution of the LBP by using a method that has already been proposed Sudderth et al. (2003); Ihler and McAllester (2009); Noorshams and Wainwright (2013), the solution to the maximization of the Bethe log-likelihood functional would not be immediately obtained.
IV Proposed Method
In this section, we propose a method to solve the maximization problem of the Bethe log-likelihood functional in Eq. (14) in terms of orthonormal function expansion. Via orthonormal function expansion, we can reduce the functional maximization problem in the previous section to a tractable function maximization problem. The basic idea of our method is similar to that presented in our previous paper Yasuda et al. (2012).
IV.1 Orthonormal Function System
Before deriving our method, we introduce an orthonormal function system $\{u_k(x) \mid k = 0, 1, 2, \ldots\}$ over $\mathcal{X}$ with weight $w(x)$ satisfying
$\int w(x)\, u_k(x)\, u_l(x)\, dx = \delta_{k,l}$, (17)
where $\delta_{k,l}$ is the Kronecker delta function. By using the orthonormal function system, a function $f(x)$ over $\mathcal{X}$ is expanded as
$f(x) = \sum_{k} a_k u_k(x)$, (18)
where the expanding coefficients are given by
$a_k = \int w(x)\, u_k(x)\, f(x)\, dx$. (19)
The orthonormal function expansion in Eq. (18) plays an important role in our method.
In the following, we assume that $\mathcal{X}$ is a finite space and that the weight $w(x)$ is constant over $\mathcal{X}$.
The orthonormal function expansion introduced in this section plays a central role in our proposed method described in the following. A similar idea is also useful for solving the LBP in Sec. II; indeed, a method for solving the LBP by using orthonormal function expansion was proposed by Noorshams and Wainwright (2013).
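As a concrete instance of Eqs. (17)-(19) (an illustrative assumption, since the formulation only requires a finite space and a constant weight), one may take $\mathcal{X} = [-1, 1]$ with the normalized Legendre polynomials $u_k(x) = \sqrt{(2k+1)/2}\, P_k(x)$. The coefficients can then be computed by Gauss-Legendre quadrature:

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_coeffs(f, K, n_quad=64):
    """Expanding coefficients a_k = int u_k(x) f(x) dx on X = [-1, 1],
    with u_k(x) = sqrt((2k+1)/2) P_k(x) the orthonormal Legendre
    polynomials, computed by Gauss-Legendre quadrature."""
    x, w = legendre.leggauss(n_quad)
    return np.array([
        np.sqrt((2 * k + 1) / 2) * np.sum(w * legendre.Legendre.basis(k)(x) * f(x))
        for k in range(K)
    ])

def legendre_eval(coeffs, x):
    """Reconstruct f(x) ~ sum_k a_k u_k(x) from the coefficients."""
    return sum(a * np.sqrt((2 * k + 1) / 2) * legendre.Legendre.basis(k)(x)
               for k, a in enumerate(coeffs))
```

For example, expanding $f(x) = x^2$ with $K \geq 3$ terms reproduces it exactly, since it is a polynomial of degree two.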
IV.2 Variational Bethe Free Energy with Orthonormal Function Expansion
where, from Eq. (19), the expanding coefficients are
This rewriting makes the CMRF a parametric model, parameterized by the expanding coefficients of $\phi_i$ and $\psi_{ij}$. In Eq. (28), the constant in Eq. (26) is neglected, because it is irrelevant to the distribution.
Now, we introduce the orthonormal function expansions of the beliefs in the variational Bethe free energy, as follows.
From Eq. (19), the expanding coefficients are
With these expansions, the variational Bethe free energy is no longer a functional but a function of the expanding coefficients of the beliefs. The variational Bethe free energy in Eq. (35) coincides with that in Eq. (3), except for the irrelevant constant neglected in Eq. (28), i.e.,
As mentioned above, the beliefs in Eqs. (33) and (34) satisfy the normalization constraints and the marginalization constraints for any values of the expanding coefficients, so that we can minimize the variational Bethe free energy with no constraint. At the minimum point, the expanding coefficients satisfy
IV.3 Maximization of the Bethe Log-likelihood Function
This is a function with respect to the expanding coefficients of $\phi_i$ and $\psi_{ij}$, and we refer to it as the Bethe log-likelihood function. Thus, the functional optimization problem of the maximum likelihood estimation is reduced to a function optimization problem. The Bethe log-likelihood function is equivalent to the Bethe log-likelihood functional in Eq. (14) because, from Eqs. (26) and (36),
Therefore, the maximization of the Bethe log-likelihood function with respect to the expanding coefficients is equivalent to the maximization of the Bethe log-likelihood functional with respect to $\phi_i$ and $\psi_{ij}$. At the maximum point of the Bethe log-likelihood function, we obtain equations for the expanding coefficients in Eqs. (33) and (34) as
where $\langle \cdots \rangle_{\mathcal{D}}$ denotes the sample average over the $N$ data points. The remaining coefficients are the solutions to the minimization of the variational Bethe free energy in Eq. (35), that is, the solutions to Eqs. (37) and (38). In the following, we denote the beliefs whose coefficients are fixed by Eqs. (40) and (41) by $\hat{b}_i(x_i)$ and $\hat{b}_{ij}(x_i, x_j)$.
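The essence of Eqs. (40) and (41) is that the expanding coefficients of the data-fixed beliefs are simply sample averages of the basis functions. A minimal sketch of the one-variable case, again assuming the normalized Legendre system on $[-1, 1]$ for illustration:

```python
import numpy as np
from numpy.polynomial import legendre

def u(k, x):
    """Orthonormal Legendre basis on [-1, 1] (illustrative choice)."""
    return np.sqrt((2 * k + 1) / 2) * legendre.Legendre.basis(k)(x)

def empirical_coeffs(samples, K):
    """Expanding coefficients of a one-variable belief fixed by the data:
    a_k = <u_k(x)>_D, the sample average of the basis functions over
    the observed values of that variable."""
    return np.array([np.mean(u(k, samples)) for k in range(K)])
```

Note that $a_0$ is fixed by normalization (since $u_0$ is constant), while the higher coefficients carry the information in the data.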
By substituting Eqs. (40) and (41) into Eqs. (37) and (38), we can obtain the solution, $\phi_i$ and $\psi_{ij}$, to the maximization of the Bethe log-likelihood function in Eq. (39), and then identify the energy function. It should be noted that the solution obtained by our method satisfies Eqs. (15) and (16), which is easily confirmed as follows. A test function $t_i(x_i)$ is expanded as in Eq. (18). Therefore, the left side of Eq. (15) is
On the other hand, the right-hand side of Eq. (15) is
By using the method described above, we can identify, within the framework of the Bethe approximation, the functional form of the energy function from the given data points, and then obtain the resulting CMRF as
Unfortunately, the infinite series in Eqs. (37) and (38) cannot be treated computationally. Thus, in practice, we truncate the infinite series and approximate them by the resulting finite series. The details of this approximation are described in Sec. V.1.
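For smooth functions, the error incurred by such a truncation typically decays rapidly with the number of retained terms. The following illustrative sketch (assuming, as before, the normalized Legendre system on $[-1, 1]$) measures the maximum truncation error of a $K$-term expansion:

```python
import numpy as np
from numpy.polynomial import legendre

def truncation_error(f, K, n_grid=201):
    """Max |f - f_K| on [-1, 1], where f_K is the K-term orthonormal
    Legendre truncation of f (illustrating the finite-series
    approximation of an infinite expansion)."""
    x, w = legendre.leggauss(64)
    coeffs = [np.sqrt((2 * k + 1) / 2)
              * np.sum(w * legendre.Legendre.basis(k)(x) * f(x))
              for k in range(K)]
    g = np.linspace(-1, 1, n_grid)
    fK = sum(a * np.sqrt((2 * k + 1) / 2) * legendre.Legendre.basis(k)(g)
             for k, a in enumerate(coeffs))
    return np.max(np.abs(f(g) - fK))
```

For an analytic function such as $e^x$, adding a few terms reduces the error by orders of magnitude, which is why a modest truncation can suffice in practice.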
The proposed method includes integration procedures (cf. Eqs. (37) and (38)). The following rewriting allows us to identify the functional form of the energy function without these integrations. We now consider the energy function defined by
This energy function satisfies the relation
and we obtain the energy function in the form of Eq. (43).
V.1 Approximation for Implementation
Because of the above truncation, the non-negativity of the beliefs may not be retained. Thus, to preserve the positivity of the beliefs, we make a further approximation. For a small positive value $\epsilon$, we define the distributions
and regard the cut-off distribution as the approximation of $\hat{b}_i(x_i)$. If $\hat{b}_i(x_i) \geq \epsilon$ over $\mathcal{X}$, the cut-off distribution coincides with $\hat{b}_i(x_i)$. In a similar manner, we approximate $\hat{b}_{ij}(x_i, x_j)$ by
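For a belief represented by its values on a grid (an illustrative discretization, not part of the original formulation), the cut-off can be sketched as clipping at $\epsilon$ followed by renormalization:

```python
import numpy as np

def cutoff_normalize(b, eps=1e-3):
    """Clip a (possibly slightly negative) truncated belief at eps > 0
    and renormalize, preserving positivity.  Where b >= eps, the shape
    of b is unchanged up to the overall normalization."""
    b = np.maximum(b, eps)
    return b / b.sum()
```

The choice of $\epsilon$ trades a small bias against protection from the negative overshoots introduced by truncation.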
The procedure of our method is summarized as follows. First, given the data points, we compute the expanding coefficients in Eqs. (40) and (41). Then, using these coefficients, we compute the quantities in Eqs. (48) and (49), and then the energy function in Eq. (51), for a chosen truncation order and cut-off $\epsilon$. Finally, we obtain the CMRF determined by Eq. (50), and regard the CMRF as the solution to the inverse problem.
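A drastically simplified, single-node illustration of this pipeline can be sketched as follows (an assumption for illustration only: the actual method couples all nodes and links; here a single belief is estimated with the normalized Legendre system on $[-1, 1]$, and the node energy is read off as the negative log of the cut-off belief, up to a constant):

```python
import numpy as np
from numpy.polynomial import legendre

def estimate_node_energy(samples, K=6, eps=1e-3, n_grid=201):
    """Single-node illustration of the overall pipeline:
    (i)  expanding coefficients of the belief from sample averages,
    (ii) truncated reconstruction of the belief on a grid,
    (iii) eps cut-off to enforce positivity,
    (iv) energy as the negative log of the belief (up to a constant)."""
    g = np.linspace(-1, 1, n_grid)
    b = np.zeros_like(g)
    for k in range(K):
        u_k = np.sqrt((2 * k + 1) / 2) * legendre.Legendre.basis(k)
        b += np.mean(u_k(samples)) * u_k(g)   # sample-average coefficients
    b = np.maximum(b, eps)                    # cut-off distribution
    return g, -np.log(b)                      # estimated node energy on the grid
```

As expected, data concentrated near the center of the interval yield an estimated energy that is lower there than near the edges.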