I Introduction
Variational techniques have been used for decades in quantum and statistical physics, where they are referred to as the mean field (MF) approximation [2]. Later, they found their way into the area of machine learning and statistical inference; see, e.g., [3, 4, 5, 6]. The basic idea of variational inference is to derive the statistics of “hidden” random variables of a certain probability density function (pdf) given the knowledge of “visible” random variables. In the MF approximation, this pdf is approximated by a “simpler,” e.g., (fully) factorized pdf, and the Kullback-Leibler divergence between the approximating and the true pdf is minimized. This minimization can be carried out iteratively, i.e., in a message passing fashion. Apart from being fully factorized, the approximating pdf typically fulfills additional constraints that lead to messages with a simple structure, which can be updated in a simple way. For example, additional exponential conjugacy constraints result in messages propagating along the edges of the underlying Bayesian network that are described by a small number of parameters
[5]. Variational inference methods were recently applied in [7] to the channel state estimation/interference cancellation part of a class of MIMO-OFDM receivers that iterate between detection, channel estimation, and decoding. An approach different from the MF approximation is belief propagation (BP) [8]. Roughly speaking, with BP one tries to find local approximations, which are—exactly or approximately—the marginals of a certain pdf. (Following the convention used in [9], we use the name BP also for loopy BP.) This can also be done in an iterative way, where messages are passed along the edges of a factor graph [10]. A typical application of BP is the decoding of turbo or low-density parity-check (LDPC) codes. Based on the excellent performance of BP, many variations have been derived in order to improve the performance of this algorithm even further. For example, minimizing an upper bound on the log partition function of a pdf leads to the powerful tree-reweighted BP algorithm [11]. An offspring of this idea is the recently developed uniformly tree-reweighted BP algorithm [12]. Another example is [13], where methods from information geometry are used to compute correction terms for the beliefs obtained by loopy BP. An alternative approach for turbo decoding that uses projections (dual in the sense of [14, Ch. 3] to the ones used in [13]) on constraint subsets can be found in [15]. A combination of the approaches used in [13] and in [15] can be found in [16].
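To make the notion of marginals computed by message passing concrete, here is a minimal numerical sketch (our own toy example, not taken from the above references): on a cycle-free chain of three binary variables, sum-product messages reproduce the exact marginal obtained by brute-force summation. All factor values are invented for illustration.

```python
import numpy as np

# Invented pairwise factors on a chain x1 - x2 - x3 (cycle-free graph):
# p(x1, x2, x3) proportional to f12(x1, x2) * f23(x2, x3), all variables binary.
f12 = np.array([[1.0, 0.5],
                [0.5, 2.0]])
f23 = np.array([[0.7, 1.3],
                [1.0, 0.2]])

# Sum-product messages toward x2: the leaf variables x1 and x3 send
# uniform messages, and each factor sums out its other argument.
m_f12_to_x2 = f12.sum(axis=0)          # sum over x1
m_f23_to_x2 = f23.sum(axis=1)          # sum over x3
belief_x2 = m_f12_to_x2 * m_f23_to_x2
belief_x2 /= belief_x2.sum()

# Brute-force marginal of x2 for comparison.
joint = f12[:, :, None] * f23[None, :, :]
marg_x2 = joint.sum(axis=(0, 2))
marg_x2 /= marg_x2.sum()
```

On a tree such as this chain the belief equals the true marginal exactly; on graphs with cycles it is only an approximation, as discussed above.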
Both methods, BP and the MF approximation, have their own virtues and disadvantages. For example, the MF approximation

+ always admits a convergent implementation;
+ has simple message passing update rules, in particular for conjugate-exponential models;
– is not compatible with hard constraints;

and BP

+ yields a good approximation of the marginal distributions if the factor graph has no short cycles;
+ is compatible with hard constraints like, e.g., code constraints;
– may have a high complexity, especially when applied to probabilistic models involving both discrete and continuous random variables.
Hence, it is of great benefit to apply BP and the MF approximation on the same factor graph in such a combination that their respective virtues can be exploited while circumventing their drawbacks. To this end, a unified message passing algorithm is needed that allows for combining both approaches.
The fixed-point equations of both BP and the MF approximation can be obtained by minimizing an approximation of the Kullback-Leibler divergence, called the region-based free energy approximation. This approach differs from other methods, see, e.g., [17] (an information geometric interpretation of the different objective functions used in [17] can be found in [14, Ch. 2]), because the starting point for the derivation of the corresponding message passing fixed-point equations is the same objective function for both BP and the MF approximation. The main technical result of our work is Theorem 2, where we show that the message passing fixed-point equations for such a combination of BP and the MF approximation correspond to stationary points of one single constrained region-based free energy approximation, and where we provide a clear rule stating how to couple the messages propagating in the BP and MF parts. In fact, based on the factor graph corresponding to a factorization of a probability mass function (pmf) and a choice for a separation of this factorization into BP and MF factors, Theorem 2 gives the message passing fixed-point equations for the factor graph representing the whole factorization of the pmf. One example of an application of Theorem 2
is joint channel estimation, interference cancellation, and decoding. Typically, these tasks are considered separately and the coupling between them is described in a heuristic way. As an example of this problem, there has been a debate in the research community on whether a posteriori probability (APP) or extrinsic values should be fed back from the decoder to the rest of the receiver components; several authors agree in proposing the use of extrinsic values for MIMO detection
[18, 19, 20] while using APP values for channel estimation [19, 20], but no thorough justification for this choice is given apart from the superior performance shown by simulation results. Apart from providing a clear rule to update the messages for the whole factor graph representing a factorization of a pmf, an additional advantage is the fact that solutions of the fixed-point equations for the messages are related to the stationary points of the corresponding constrained region-based free energy approximation. This correspondence is important because it yields an interpretation of the computed beliefs for arbitrary factor graphs, similar to the case of BP alone, where solutions of the message passing fixed-point equations do not in general correspond to the true marginals if the factor graph has cycles but always correspond to stationary points of the constrained Bethe free energy [9]. Moreover, this observation allows us to present a systematic way of updating the messages, namely, Algorithm 1, that is guaranteed to converge provided that the factor graph representing the factorization of the pmf fulfills certain technical conditions. The paper is organized as follows. In the remainder of this section we fix our notation. Section II is devoted to the introduction of the region-based free energy approximations proposed by [9] and recalls how BP, the MF approximation, and the EM algorithm [21] can be obtained by this method. Since the MF approximation is typically used for parameter estimation, we briefly show how to extend it to the case of continuous random variables using an approach already presented in [22, pp. 36–38] that avoids complicated methods from variational calculus. Section III is the main part of this work.
There we state our main result, namely, Theorem 2, and show how the message passing fixed-point equations of a combination of BP and the MF approximation can be related to the stationary points of the corresponding constrained region-based free energy approximation. We then (i) prove Lemma 2, which generalizes Theorem 2 to the case where the factors of the pmf in the BP part are no longer restricted to be strictly positive real-valued functions, and (ii) present Algorithm 1, which is a convergent implementation of the message passing update equations presented in Theorem 2, provided that the factor graph representing the factorization of the pmf fulfills certain technical conditions. As a byproduct, (i) gives insights into BP alone (which is a special case of the combination of BP and the MF approximation) with hard constraints, for which only conjectures are formulated in [9]. In Section IV we apply Algorithm 1 to joint channel estimation and decoding in an OFDM system. More advanced receiver architectures together with numerical simulations and a comparison with other state-of-the-art receivers can be found in [23], and an additional application of the algorithm in a cooperative communications scenario is presented in [24]. Finally, we conclude in Section V and present an outlook on further research directions.
I-A Notation
Capital calligraphic letters denote finite sets. The cardinality of a set is denoted by . If we write for . We use the convention that , where denotes the empty set. For any finite set , denotes the indicator function on , i.e., if and else. We denote by capital letters discrete random variables with a finite number of realizations and pmf . For a random variable , we use the convention that is a representative for all possible realizations of , i.e., serves as a running variable, and denote a particular realization by . For example, runs through all possible realizations of and for two functions and depending on all realizations of , means that for each particular realization of . If is a functional of a pmf of a random variable and is a function depending on all realizations of X, then means that is well defined and holds for each particular realization of . We write
for the realizations of the vector of random variables
. If , then runs through all possible realizations of but . For any nonnegative real-valued function with argument and , denotes with fixed argument . If a function is identically zero, we write and means that it is not identically zero. For two real-valued functions and with the same domain and argument , we write if for some real positive constant . We use the convention that , if , and [25, p. 31]. For , if and zero else. Matrices are denoted by capital boldface Greek letters. The superscripts and stand for transposition and Hermitian transposition, respectively. For a matrix , the entry in the th row and th column is denoted by . For two vectors and , denotes the Hadamard product of and . Finally, stands for the pdf of a jointly proper complex Gaussian random vector with mean and covariance matrix .

II Known results
II-A Region-based free energy approximations [9]
Let be a certain positive pmf of a vector of random variables that factorizes as
(1) 
where and with for all . Without loss of generality we assume that , which can always be achieved by renaming indices. (For example, we can write .)
A region consists of subsets of indices and with the restriction that implies that . To each region we associate a counting number . A set of regions and associated counting numbers is called valid if
for all .
For a positive function approximating , we define the variational free energy [9] (if is not normalized to one, the definition of the variational free energy contains an additional normalization constant, called the Helmholtz free energy [9, pp. 4–5])
(2) 
In (2), denotes the entropy [25, p. 5] of and is called the average energy of . Note that is the Kullback-Leibler divergence [25, p. 19] between and , i.e., . For a set of regions and associated counting numbers, the region-based free energy approximation is defined as [9] with
Here, each is defined locally on a region . Instead of minimizing with respect to , we minimize with respect to all , where the have to fulfill certain constraints. The quantities are called beliefs. We give two examples of valid sets of regions and associated counting numbers.
Example II.1
The trivial example . It leads to the MF fixed-point equations, as will be shown in Subsection II-C.
Example II.2
We define two types of regions:
- large regions: , with for all ;
- small regions: , with for all .
Note that this definition is well defined due to our assumption that . The region-based free energy approximation corresponding to the valid set of regions and associated counting numbers
is called the Bethe free energy [26, 9]. It leads to the BP fixed-point equations, as will be shown in Subsection II-B. The Bethe free energy is equal to the variational free energy when the factor graph has no cycles [9].
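As a quick numerical sanity check (our own sketch, with an invented factor graph; the dictionary `N` maps each factor to its neighboring variables), one can verify that the Bethe choice of regions and counting numbers is valid in the sense above: the counting numbers of all regions containing a given factor, or a given variable, sum to one.

```python
# Regions are represented as (set of factors, set of variables, counting number).
# Invented factor graph: factor -> set of neighboring variables.
N = {"a1": {"x1", "x2"}, "a2": {"x2", "x3"}, "a3": {"x3"}}
variables = set().union(*N.values())

regions = []
# Large regions: one per factor a, containing a and its neighbors, counting number 1.
for a, nbrs in N.items():
    regions.append(({a}, set(nbrs), 1))
# Small regions: one per variable i, counting number 1 - (number of factors
# connected to i), which compensates for the large regions containing i.
for i in variables:
    degree = sum(i in nbrs for nbrs in N.values())
    regions.append((set(), {i}, 1 - degree))

# Validity: for every factor and for every variable, the counting numbers
# of the regions containing it must sum to one.
ok_factors = all(sum(c for A, I, c in regions if a in A) == 1 for a in N)
ok_vars = all(sum(c for A, I, c in regions if i in I) == 1 for i in variables)
```

Both checks succeed for any factor graph, since each variable appears in exactly one small region and in one large region per neighboring factor.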
II-B BP fixed-point equations
The fixed-point equations for BP can be obtained from the Bethe free energy by imposing additional marginalization and normalization constraints and computing the stationary points of the corresponding Lagrangian function [27, 9]. The Bethe free energy reads
(3) 
with for all , for all , and . The normalization constraints for the beliefs and the marginalization constraints for the beliefs and can be included in the Lagrangian [28, Sec. 3.1.3]
(4) 
The stationary points of the Lagrangian in (4) are then related to the BP fixed-point equations by the following theorem.
Theorem 1
Often, the following alternative system of fixed-point equations is solved instead of (6).
(7) 
for all , where are arbitrary positive constants. The reason for this is that, for a fixed scheduling, the messages computed in (6) differ from the messages computed in (7) only by positive constants, which drop out when the beliefs are normalized. See also [9, Eq. (68) and Eq. (69)], where the symbol is used in the update equations, indicating that the normalization constants are irrelevant. A solution of (7) can be obtained, e.g., by updating corresponding likelihood ratios of the messages in (6), or by updating the messages according to (6) but ignoring the normalization constants . The algorithm converges if the normalized beliefs no longer change. Therefore, a rescaling of the messages is irrelevant and a solution of (7) is obtained. However, we note that a rescaled solution of (7) is not necessarily a solution of (6). Hence, the beliefs obtained by solving (7) need not be stationary points of the Lagrangian in (4). To the best of our knowledge, this elementary insight has not yet been published in the literature, and in the following lemma we state a necessary and sufficient condition for when a solution of (7) can be rescaled to a solution of (6).
Lemma 1
See Appendix A.
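The invariance of the normalized beliefs under message rescaling can be illustrated with a small sketch (invented factors, binary variables, our own example): scaling every message by an arbitrary positive constant, as permitted in (7), leaves the normalized belief unchanged.

```python
import numpy as np

# Invented chain factors, p(x1, x2, x3) proportional to f12 * f23.
f12 = np.array([[1.0, 0.5], [0.5, 2.0]])
f23 = np.array([[0.7, 1.3], [1.0, 0.2]])

def belief_x2(scale):
    """Belief of x2 when every factor-to-variable message is rescaled
    by the same arbitrary positive constant `scale`."""
    m1 = scale * f12.sum(axis=0)   # message from f12 to x2
    m2 = scale * f23.sum(axis=1)   # message from f23 to x2
    b = m1 * m2
    return b / b.sum()             # normalization removes the constants
```

Calling `belief_x2` with any positive scale returns the same normalized belief, which is exactly why implementations may ignore the normalization constants during the iterations.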
II-C Fixed-point equations for the MF approximation
A message passing interpretation of the MF approximation was derived in [5, 29]. In this section, we briefly show how the corresponding fixed-point equations can be obtained by the free energy approach. To this end, we use from Example II.1 together with the factorization constraint (for binary random variables with pmf in an exponential family, it was shown in [30] that this gives a good approximation whenever the truncation of the Plefka expansion does not introduce a significant error)
(10) 
Plugging (10) into the expression for the region-based free energy approximation corresponding to the trivial approximation , we get
(11) 
with . Assuming that all the beliefs have to fulfill a normalization constraint, the stationary points of the corresponding Lagrangian for the MF approximation can easily be evaluated to be
(12) 
for all , where the positive constants are such that is normalized to one for all . (The Lagrange multiplier [28, p. 283] for each belief corresponding to the normalization constraint can be absorbed into the positive constant .)
For the MF approximation there always exists a convergent algorithm that computes beliefs solving (12) by simply using (12) as an iterative update equation for the beliefs. Since for all
and the set of all beliefs satisfying the normalization constraint is a convex set, the objective function in (11) cannot increase and the algorithm is guaranteed to converge. Note that in order to derive a particular update we need all previous updates with
By setting for all , the fixed-point equations in (12) are transformed into the message passing fixed-point equations
(13) 
for all . The MF approximation can be extended to the case where is a pdf, as shown in Appendix B. Formally, each sum over () in (12) and (13) has to be replaced by a Lebesgue integral whenever the corresponding random variable is continuous.
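The convergence argument above can be illustrated numerically. The following sketch (our own toy example with an invented two-variable pmf; the objective is written directly as the Kullback-Leibler divergence between the fully factorized belief and the true pmf) applies the MF update coordinate-wise and records the objective after every single update; since each update exactly minimizes the objective over one belief while the feasible set is convex, the recorded values never increase.

```python
import numpy as np

# Invented positive joint pmf p(x1, x2) of two binary variables.
p = np.array([[0.30, 0.10],
              [0.15, 0.45]])

def kl(b1, b2):
    """KL divergence between the factorized belief b1(x1) * b2(x2) and p."""
    b = np.outer(b1, b2)
    return float(np.sum(b * np.log(b / p)))

# Start from the uniform factorized belief and iterate the MF updates
# b_i(x_i) proportional to exp( E_{b_j, j != i}[ ln p(x1, x2) ] ).
b1 = np.array([0.5, 0.5])
b2 = np.array([0.5, 0.5])
history = [kl(b1, b2)]
for _ in range(20):
    b1 = np.exp(np.log(p) @ b2); b1 /= b1.sum()   # update belief of x1
    history.append(kl(b1, b2))
    b2 = np.exp(b1 @ np.log(p)); b2 /= b2.sum()   # update belief of x2
    history.append(kl(b1, b2))
```

The list `history` is non-increasing and bounded below, so the iteration converges, mirroring the argument given for (11) and (12).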
II-D Expectation maximization (EM)
Message passing interpretations for EM [21] were derived in [31, 32]. It can be shown that EM is a special instance of the MF approximation [33, Sec. 2.3.1], which can be summarized as follows. Suppose that we apply the MF approximation to in (1) as described before. In addition, we assume that for all the beliefs fulfill the constraints that . Using the fact that , we can rewrite in (11) as
(14) 
For all the stationary points of in (14) have the same analytical expression as the one obtained in (12). For , minimizing in (14) with respect to yields
Setting for all , we get the message passing update equations defined in (13) except that we have to replace the messages for all and by
with
for all .
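As a concrete instance (our own toy example, not from the paper), consider EM for a two-component Gaussian mixture with known unit variances and equal weights, where the hidden component labels play the role of the variables treated by the MF approximation and the component means are the parameters estimated by a point value. Because the M-step maximizes the expected complete-data log-likelihood, the data log-likelihood never decreases.

```python
import numpy as np

rng = np.random.default_rng(0)
# Invented data: two Gaussian components with means -2 and 3, unit variances.
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

def loglik(mu):
    """Data log-likelihood for component means mu (equal weights, unit variances)."""
    comp = np.exp(-0.5 * (x[:, None] - mu) ** 2) / np.sqrt(2 * np.pi)
    return float(np.sum(np.log(0.5 * comp.sum(axis=1))))

mu = np.array([-1.0, 1.0])
lls = [loglik(mu)]
for _ in range(30):
    # E-step: posterior responsibilities of the hidden component labels
    # (constants cancel because weights and variances are equal).
    comp = np.exp(-0.5 * (x[:, None] - mu) ** 2)
    r = comp / comp.sum(axis=1, keepdims=True)
    # M-step: point estimate of the means maximizing the expected
    # complete-data log-likelihood.
    mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
    lls.append(loglik(mu))
```

The monotone increase of `lls` is the EM counterpart of the monotone decrease of the MF free energy discussed in Subsection II-C.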
III Combined BP/MF approximation fixed-point equations
Let
(15) 
be a partially factorized pmf with and . As before, we have , , with for all , and for all . We refer to the factor graph representing the factorization in (15) as “BP part” and to the factor graph representing the factorization in (15) as “MF part”. Furthermore, we set
and
Next, we define the following regions and counting numbers:
1) one MF region , with ;
2) small regions , with for all ;
3) large regions , with for all .
This yields the valid set of regions and associated counting numbers
(16) 
The additional terms in the counting numbers of the small regions defined in 2) compared to the counting numbers of the small regions for the Bethe approximation (see Example II.2) guarantee that is indeed a valid set of regions and associated counting numbers.
The valid set of regions and associated counting numbers in (16) gives the region-based free energy approximation
(17) 
with . In (17), we have already plugged in the factorization constraint
with and . The beliefs and have to fulfill the normalization constraints
(18) 
and the marginalization constraints
(19) 
Remark III.1
Note that there is no need to introduce normalization constraints for the beliefs . If , then it follows from the normalization constraint for the belief and marginalization constraint for the beliefs and that
We will show in Lemma 2 that the region-based free energy approximation in (17), with the constraints (18) and (19) fulfilled, is a finite quantity, i.e., that .
The constraints (18) and (19) can be included in the Lagrangian [28, Sec. 3.1.3]
(20) 
The stationary points of the Lagrangian in (20) are then obtained by setting the derivatives of with respect to the beliefs and the Lagrange multipliers equal to zero. The following theorem relates the stationary points of the Lagrangian to solutions of fixed-point equations for the beliefs.
Theorem 2
Stationary points of the Lagrangian in (20) in the combined BP–MF approach must be fixed points with positive beliefs fulfilling
(21) 
with
(22) 
and vice versa. Here, and are positive constants that ensure that the beliefs and are normalized to one with for all .
See Appendix C.
Remark III.2
Note that for each Theorem 2 can be generalized to the case where is a continuous random variable following the derivation presented in Appendix B. Formally, each sum over with in the third identity in (22) has to be replaced by a Lebesgue integral whenever the corresponding random variable is continuous.
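To give a feeling for how the coupled messages interact, here is a self-contained toy sketch of our own (the model is invented, not the paper's example): one BP factor `f` links two binary variables, and one MF factor `g` links the second variable to a binary parameter. The MF factor sends an exponentiated expected log-factor message, the BP factor sends an ordinary sum-product message, and the two are multiplied in the belief, in the spirit of (21) and (22).

```python
import numpy as np

# BP part: factor f(x1, x2); MF part: factor g(x2, theta). All binary.
f = np.array([[1.0, 0.3], [0.3, 1.0]])   # invented BP factor
g = np.array([[0.9, 0.2], [0.1, 0.8]])   # invented MF factor, rows x2, cols theta

b_theta = np.array([0.5, 0.5])           # initial belief on the parameter
for _ in range(100):
    # MF message from g to x2: exp of the expected log-factor under b_theta.
    n_x2 = np.exp(np.log(g) @ b_theta)
    # BP message from f to x2 (x1 is a leaf, so its message is uniform).
    m_x2 = f.sum(axis=0)
    # Belief of x2 multiplies the BP and MF messages.
    b2 = m_x2 * n_x2
    b2 /= b2.sum()
    # MF update of the parameter belief from the current belief of x2.
    b_theta_new = np.exp(b2 @ np.log(g))
    b_theta_new /= b_theta_new.sum()
    if np.max(np.abs(b_theta_new - b_theta)) < 1e-10:
        b_theta = b_theta_new
        break
    b_theta = b_theta_new
```

At a fixed point, recomputing the parameter update from `b2` reproduces `b_theta`, in analogy with the fixed-point characterization in Theorem 2.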
Remark III.3
III-A Hard constraints for BP
Some suggestions on how to generalize Theorem 1 ([9, Th. 2]) to hard constraints, i.e., to the case where the factors of the pmf are not restricted to be strictly positive real-valued functions, can be found in [9, Sec. VI.D]. Examples of hard constraints are deterministic functions such as code constraints. However, the statements formulated there are only conjectures and are based on the assumption that we can always compute the derivative of the Lagrangian function with respect to the beliefs. This is not always possible because
with from (4). In the sequel, we show how to generalize Theorem 2 to the case where for all , based on the simple observation that we are interested in solutions where the region-based free energy approximation is not plus infinity (recall that we want to minimize this quantity). As a byproduct, this also yields an extension of Theorem 1 ([9, Th. 2]) to hard constraints by simply setting .
Lemma 2
See Appendix D.
Remark III.4
At first sight, the fact that (25) holds while all the beliefs () are strictly positive functions seems to contradict the marginalization constraints (19). To illustrate that no contradiction arises, let , , and fix one realization of . Since , we also have . This implies that for at least one realization with and, therefore, . The marginalization constraints (19), together with the fact that the belief must be a nonnegative function, then imply that indeed .
III-B Convergence and main algorithm
If the BP part has no cycle and
(26) 
then there exists a convergent implementation of the combined message passing equations in (22). In fact, we can iterate between updating the beliefs with and the forward-backward algorithm in the BP part, as outlined in the following algorithm.
Algorithm 1
If the BP part has no cycle and (26) is fulfilled, the following implementation of the fixedpoint equations in (22) is guaranteed to converge.

1) Initialize for all and send the corresponding messages to all factor nodes .
2) For each and , the message is now available and can be used for further updates in the MF part.
3) For each , successively recompute the message and send it to all . Note that for all indices