# A Unified Bayesian Inference Framework for Generalized Linear Models

In this letter, we present a unified Bayesian inference framework for generalized linear models (GLM) which iteratively reduces the GLM problem to a sequence of standard linear model (SLM) problems. This framework provides new perspectives on some established GLM algorithms derived from SLM ones and also suggests novel extensions for some other SLM algorithms. Specific instances elucidated under such framework are the GLM versions of approximate message passing (AMP), vector AMP (VAMP), and sparse Bayesian learning (SBL). It is proved that the resultant GLM version of AMP is equivalent to the well-known generalized approximate message passing (GAMP). Numerical results for 1-bit quantized compressed sensing (CS) demonstrate the effectiveness of this unified framework.

## I Introduction

The approximate message passing (AMP) algorithm [1], first proposed by Donoho et al. in the field of compressed sensing (CS) [2], is a state-of-the-art algorithm for Bayesian inference over the standard linear model (SLM) [3, 4, 5, 6, 7, 8, 9, 10, 11]. Since its first publication, much effort has been devoted to extending AMP [12, 13, 14, 15, 16, 17, 18, 19]. In [13], vector AMP (VAMP) was proposed by extending the measurement matrix from i.i.d. sub-Gaussian to right-rotationally invariant matrices. For partial discrete Fourier transform (DFT) matrices, a variant of AMP was proposed in [12]. For unitarily invariant matrices, the authors in [17] proposed orthogonal AMP (OAMP) using a de-correlated linear estimator and a divergence-free nonlinear estimator.

However, in some applications such as quantized CS [20], linear classification [21], phase retrieval [22], etc., the measurements are nonlinear transforms of the input signal, to which the SLM no longer applies. To this end, Rangan extended AMP to generalized approximate message passing (GAMP) [23, 24] for generalized linear models (GLM) so that it can efficiently recover the signal of interest from nonlinear measurements. Recently, both VAMP [13] and turbo CS [12] have been extended to handle GLM problems [14, 18, 19].

In this letter, a unified Bayesian inference framework for GLM inference is presented whereby the original GLM problem is iteratively reduced to a sequence of SLM problems. Under such framework, a variety of inference methods for SLM can be easily extended to GLM. As specific instances, we extend AMP, VAMP, and sparse Bayesian learning (SBL) [25, 26] to GLM under this framework and obtain three GLM inference algorithms termed as Gr-AMP, Gr-VAMP, and Gr-SBL, respectively. It is proved that Gr-AMP is equivalent to GAMP [23]. Note that although both AMP and VAMP have been extended to GLM in [23] and [14], each of them was derived from a different perspective tailored to one specific SLM algorithm only. Under the proposed framework, however, we unveil that they actually share a common rule and more importantly, this rule also suggests novel extensions for some other SLM algorithms, e.g., SBL. Thus the main contribution of this letter is providing a simple and unified Bayesian framework for the understanding and extension of SLM algorithms to GLM. Numerical results for 1-bit quantized CS demonstrate the effectiveness of this framework.

## II Problem Description

Consider the system model of GLM shown in Fig. 1. The signal $\mathbf{x} \in \mathbb{R}^N$ is generated following a prior distribution $p(\mathbf{x})$. Then, $\mathbf{x}$ passes through a linear transform $\mathbf{z} = \mathbf{A}\mathbf{x}$, where $\mathbf{A} \in \mathbb{R}^{M \times N}$ is a known matrix and the measurement ratio is $\beta = M/N$. The observation vector $\mathbf{y}$ is obtained through a component-wise random mapping, which is described by a factorized conditional distribution

$$p(\mathbf{y}\,|\,\mathbf{z}) = \prod_{a=1}^{M} p(y_a\,|\,z_a) = \prod_{a=1}^{M} p\Big(y_a \,\Big|\, z_a = \sum_{i=1}^{N} A_{ai}x_i\Big). \tag{1}$$

The goal of GLM inference is to compute the minimum mean square error (MMSE) estimate of $\mathbf{x}$, i.e., $\hat{\mathbf{x}} = \mathrm{E}(\mathbf{x}\,|\,\mathbf{y})$, where the expectation is taken with respect to (w.r.t.) the posterior distribution $p(\mathbf{x}\,|\,\mathbf{y}) \propto p(\mathbf{y}\,|\,\mathbf{z})\,p(\mathbf{x})$, where $\propto$ denotes identity up to a normalization constant. In this letter, $p(\mathbf{x})$ and $p(\mathbf{y}\,|\,\mathbf{z})$ are assumed known. Extensions to scenarios with unknown $p(\mathbf{x})$ or $p(\mathbf{y}\,|\,\mathbf{z})$ are possible [6, 4, 27]. Note that the well-known SLM is a special case of GLM which reads

$$\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{w}, \tag{2}$$

where $\mathbf{w}$ is a Gaussian noise vector with mean $\mathbf{0}$ and covariance matrix $\sigma^2\mathbf{I}_M$, i.e., $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I}_M)$, where $\mathbf{I}_M$ denotes the identity matrix of dimension $M$.

## III Unified Inference Framework

In this section we present a unified Bayesian inference framework, shown in Fig. 2, for the GLM problem. It consists of two modules: an SLM inference module A and a component-wise MMSE module B. The messages $z^{ext}_{a,A}$ and $v^{ext}_{a,A}$ (defined in (11), (10)) form the outputs of module A and the inputs to module B, while the messages $\tilde{y}_a$ and $\tilde{\sigma}^2_a$ (defined in (7), (6)) form the outputs of module B and the inputs to module A. To perform GLM inference, we alternate between the two modules in a turbo manner [28]. Treating the prior distribution of $z_a$ as Gaussian with mean $z^{ext}_{a,A}$ and variance $v^{ext}_{a,A}$, module B performs a component-wise MMSE estimate and outputs $\tilde{y}_a$ and $\tilde{\sigma}^2_a$. Module A then performs SLM inference over a pseudo-SLM and updates the outputs $z^{ext}_{a,A}$ and $v^{ext}_{a,A}$ to module B. This process continues iteratively until convergence or some predefined stopping criterion is satisfied, where one iteration refers to one pass of messages from B to A and then back to B.

Specifically, in the $t$-th iteration, the inputs $z^{ext}_{a,A}(t-1)$ and $v^{ext}_{a,A}(t-1)$ to module B are treated as the prior mean and variance of $z_a$, respectively. Assuming the prior is Gaussian, the posterior mean and variance of $z_a$ can be computed in module B by a component-wise MMSE estimate, i.e.,

$$z^{post}_{a,B}(t) = \mathrm{E}\big(z_a \,\big|\, z^{ext}_{a,A}(t-1),\, v^{ext}_{a,A}(t-1)\big), \tag{3}$$

$$v^{post}_{a,B}(t) = \mathrm{Var}\big(z_a \,\big|\, z^{ext}_{a,A}(t-1),\, v^{ext}_{a,A}(t-1)\big), \tag{4}$$

where the expectation is taken w.r.t. the posterior distribution

$$q^{t-1}_z(z_a) \propto p(y_a\,|\,z_a)\,\mathcal{N}\big(z_a;\, z^{ext}_{a,A}(t-1),\, v^{ext}_{a,A}(t-1)\big). \tag{5}$$
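As a concrete example of the component-wise MMSE in (3)-(5), consider noiseless 1-bit measurements with $p(y_a\,|\,z_a) = \mathbb{1}\{y_a = \mathrm{sign}(z_a)\}$: the posterior (5) is then a truncated Gaussian with closed-form moments. The sketch below is our own illustration (function names and test values are ours, not from the letter) and cross-checks the closed form against a brute-force evaluation of (5) on a grid:

```python
import numpy as np
from math import erf, sqrt, pi, exp

def phi(x):   # standard normal pdf
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def Phi(x):   # standard normal cdf
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def mmse_1bit(y, m, v):
    """Posterior mean/variance of z ~ N(m, v) observed through y = sign(z)."""
    s = y * m / sqrt(v)
    ratio = phi(s) / Phi(s)                      # inverse Mills ratio
    z_post = m + y * sqrt(v) * ratio             # truncated-Gaussian mean
    v_post = v * (1.0 - s * ratio - ratio ** 2)  # truncated-Gaussian variance
    return z_post, v_post

# brute-force check of (5) on a fine grid (y = -1 truncates z to z < 0)
m, v, y = 0.3, 0.8, -1
z = np.linspace(-12.0, 12.0, 400001)
w = np.where(np.sign(z) == y, np.exp(-(z - m) ** 2 / (2.0 * v)), 0.0)
w /= w.sum()
zn = (z * w).sum()
vn = ((z - zn) ** 2 * w).sum()
zc, vc = mmse_1bit(y, m, v)
```

Note that the posterior variance is always smaller than the prior variance, which is what makes the extrinsic computation below well defined.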

According to the turbo principle [28], the posterior mean and variance cannot be directly used as the output message since they incorporate information of the input messages. Instead, the so-called extrinsic mean and variance of $z_a$ are calculated by excluding the contribution of the input messages $z^{ext}_{a,A}(t-1)$ and $v^{ext}_{a,A}(t-1)$, which are given as [29]

$$\tilde{\sigma}^2_a(t) = \left(\frac{1}{v^{post}_{a,B}(t)} - \frac{1}{v^{ext}_{a,A}(t-1)}\right)^{-1}, \tag{6}$$

$$\tilde{y}_a(t) = \tilde{\sigma}^2_a(t)\left(\frac{z^{post}_{a,B}(t)}{v^{post}_{a,B}(t)} - \frac{z^{ext}_{a,A}(t-1)}{v^{ext}_{a,A}(t-1)}\right). \tag{7}$$
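In code, the extrinsic update (6)-(7) is a simple Gaussian "division". The minimal sketch below (our notation, not the letter's code) also verifies the defining property of extrinsic information: recombining the extrinsic message with the input message recovers the posterior exactly:

```python
def extrinsic(z_post, v_post, z_pri, v_pri):
    """Extrinsic update as in (6)-(7) and (10)-(11): remove the Gaussian
    input message (z_pri, v_pri) from the posterior (z_post, v_post)."""
    v_ext = 1.0 / (1.0 / v_post - 1.0 / v_pri)
    z_ext = v_ext * (z_post / v_post - z_pri / v_pri)
    return z_ext, v_ext

# recombining the extrinsic message with the input message must give
# back the posterior -- the defining property of extrinsic information
z_pri, v_pri = 0.5, 2.0
z_post, v_post = 0.9, 0.7          # any posterior with v_post < v_pri
z_ext, v_ext = extrinsic(z_post, v_post, z_pri, v_pri)
v_back = 1.0 / (1.0 / v_ext + 1.0 / v_pri)
z_back = v_back * (z_ext / v_ext + z_pri / v_pri)
```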

For more details of the turbo principle and extrinsic information, please refer to [28, 30]. Then in module A, similar to [30], $\tilde{y}_a(t)$ can be viewed as the pseudo-observation of an equivalent additive Gaussian noise channel with $z_a$ as input, i.e.,

$$\tilde{y}_a(t) = z_a + \tilde{w}_a(t), \tag{8}$$

where $\tilde{w}_a(t) \sim \mathcal{N}(0, \tilde{\sigma}^2_a(t))$. In matrix form, (8) can be rewritten in an SLM form

$$\tilde{\mathbf{y}}(t) = \mathbf{A}\mathbf{x} + \tilde{\mathbf{w}}(t), \tag{9}$$

where $\tilde{\mathbf{w}}(t) \sim \mathcal{N}\big(\mathbf{0}, \mathrm{diag}(\tilde{\sigma}^2_1(t),\ldots,\tilde{\sigma}^2_M(t))\big)$. The underlying theme in (8) is to approximate the estimation error $\tilde{y}_a(t) - z_a$ as Gaussian noise whose mean is zero and whose variance is the extrinsic variance $\tilde{\sigma}^2_a(t)$. Such an idea was first used (to our knowledge) in [30] (formula (47) in [30]). Rigorous theoretical analysis of this approximation is left for future work.

As a result, the original GLM problem reduces to an SLM problem, so that one can estimate $\mathbf{x}$ by performing SLM inference over (9) to obtain the posterior estimates. Specially, if the GLM itself reduces to the SLM (2), then from (6) and (7) we obtain $\tilde{\sigma}^2_a(t) = \sigma^2$ and $\tilde{y}_a(t) = y_a$, which is exactly the original SLM problem. Given the posterior mean and variance estimates of $\mathbf{x}$, the posterior mean $z^{post}_{a,A}(t)$ and variance $v^{post}_{a,A}(t)$ of $z_a$ can be obtained using the constraint $\mathbf{z} = \mathbf{A}\mathbf{x}$. Note that the specific method to calculate $z^{post}_{a,A}(t)$ and $v^{post}_{a,A}(t)$ may vary for different SLM inference methods, as shown in Section IV. Given $z^{post}_{a,A}(t)$ and $v^{post}_{a,A}(t)$, the extrinsic mean and variance of $z_a$ can be computed as [29]

$$v^{ext}_{a,A}(t) = \left(\frac{1}{v^{post}_{a,A}(t)} - \frac{1}{\tilde{\sigma}^2_a(t)}\right)^{-1}, \tag{10}$$

$$z^{ext}_{a,A}(t) = v^{ext}_{a,A}(t)\left(\frac{z^{post}_{a,A}(t)}{v^{post}_{a,A}(t)} - \frac{\tilde{y}_a(t)}{\tilde{\sigma}^2_a(t)}\right), \tag{11}$$

which then form the input messages to module B in the next iteration. Note that (10)-(11) and (6)-(7) apply the same extrinsic rule but are distinct operations performed in different modules.

This process continues until convergence or some predefined stopping criterion is satisfied. The corresponding algorithm framework is shown in Algorithm 1. Note that in step 3), one or more inner iterations can be performed for iterative SLM methods. Under such a framework, a variety of inference methods for SLM can be easily extended to GLM. As specific instances, we show in Section IV how AMP, VAMP, and SBL can be extended to GLM within this framework.
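To make the message schedule of Algorithm 1 concrete, here is a toy instantiation under assumptions of our own: a Gaussian likelihood $p(y_a|z_a) = \mathcal{N}(y_a; z_a, \sigma^2)$ (so module B is closed-form and the GLM degenerates to an SLM) and a plain LMMSE solver with prior $\mathcal{N}(\mathbf{0}, \mathbf{I})$ standing in for module A. It illustrates the turbo loop only, not any specific algorithm of the letter:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, sigma2 = 60, 40, 0.01
A = rng.standard_normal((M, N)) / np.sqrt(N)
x_true = rng.standard_normal(N)
y = A @ x_true + np.sqrt(sigma2) * rng.standard_normal(M)

# messages from module A to module B, initialized as uninformative
z_ext, v_ext = np.zeros(M), np.full(M, 1e4)

for t in range(20):
    # --- module B: component-wise MMSE under the prior N(z_ext, v_ext) ---
    # (for a Gaussian likelihood the posterior is a product of two Gaussians)
    v_post_B = 1.0 / (1.0 / v_ext + 1.0 / sigma2)
    z_post_B = v_post_B * (z_ext / v_ext + y / sigma2)
    # extrinsic message (6)-(7): pseudo-observation y_t with variance s2
    s2 = 1.0 / (1.0 / v_post_B - 1.0 / v_ext)
    y_t = s2 * (z_post_B / v_post_B - z_ext / v_ext)

    # --- module A: SLM inference on y_t = A x + noise(0, diag(s2)) ---
    C = np.linalg.inv(A.T @ (A / s2[:, None]) + np.eye(N))   # prior N(0, I)
    x_hat = C @ (A.T @ (y_t / s2))
    Cz = A @ C @ A.T
    z_post_A, v_post_A = A @ x_hat, np.diag(Cz)
    # extrinsic message (10)-(11) back to module B
    v_ext = 1.0 / (1.0 / v_post_A - 1.0 / s2)
    z_ext = v_ext * (z_post_A / v_post_A - y_t / s2)

nmse = np.sum((x_hat - x_true) ** 2) / np.sum(x_true ** 2)
```

Consistent with the text, in this degenerate SLM case the pseudo-observations satisfy $\tilde{y}_a(t) \approx y_a$ and $\tilde{\sigma}^2_a(t) \approx \sigma^2$ at every iteration.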

## IV Extending AMP, VAMP, and SBL to GLM

### IV-A From AMP to Gr-AMP and GAMP

We first review AMP for the SLM. Many variants of AMP exist; in this letter we adopt the Bayesian version in [23] and [4]. Assuming a factorized prior distribution $p(\mathbf{x}) = \prod_i p(x_i)$, the factor graph for the SLM (2) is shown in Fig. 3 (a), where $\mathcal{N}(\cdot;\cdot,\cdot)$ denotes the Gaussian distribution. Denote by $\hat{x}^t_{i\to a}$ and $\tau^t_{i\to a}$ the mean and variance of the message from variable node $x_i$ to factor node $a$ in the $t$-th iteration, and by $\hat{x}^t_{a\to i}$ and $\tau^t_{a\to i}$ the mean and variance of the message from factor node $a$ to variable node $x_i$, respectively. Define

$$\Sigma_i(t) \triangleq \Big(\sum_a \frac{1}{\tau^t_{a\to i}}\Big)^{-1}, \qquad r_i(t) \triangleq \Sigma_i(t)\sum_a \frac{\hat{x}^t_{a\to i}}{\tau^t_{a\to i}}, \tag{12}$$

$$Z_a(t) \triangleq \sum_i A_{ai}\,\hat{x}^t_{i\to a}, \qquad V_a(t) \triangleq \sum_i A^2_{ai}\,\tau^t_{i\to a}. \tag{13}$$

By the central limit theorem and neglecting high-order terms, the resultant AMP is shown in Algorithm 2. Note that the approximation in AMP is asymptotically exact in the large system limit, i.e., $M, N \to \infty$ with $M/N$ fixed [23, 4]. The expectation operations in the posterior mean and variance in lines (D3) and (D4) are w.r.t. the posterior distribution $\propto p(x_i)\,\mathcal{N}(x_i; r_i(t), \Sigma_i(t))$. For more details, refer to [23, 4].
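The posterior mean and variance in lines (D3)-(D4) depend on the prior $p(x_i)$. For the Bernoulli-Gaussian prior used later in the 1-bit CS experiments, the denoiser has a closed form; the sketch below is our own (assuming a spike-and-slab prior $(1-\rho)\,\delta(x) + \rho\,\mathcal{N}(x;0,1)$) and cross-checks it by importance sampling:

```python
import numpy as np
from math import sqrt, pi, exp

def bg_denoiser(r, S, rho):
    """E(x|r,S) and Var(x|r,S) for the prior (1-rho)*delta(x) + rho*N(x;0,1)
    and pseudo-observation r = x + noise with noise ~ N(0, S)."""
    gauss = lambda u, var: exp(-u * u / (2.0 * var)) / sqrt(2.0 * pi * var)
    p_slab = rho * gauss(r, 1.0 + S)          # evidence of the slab component
    p_spike = (1.0 - rho) * gauss(r, S)       # evidence of the spike at zero
    pi_post = p_slab / (p_slab + p_spike)     # posterior slab probability
    m1, v1 = r / (1.0 + S), S / (1.0 + S)     # slab-conditional moments
    mean = pi_post * m1
    var = pi_post * (v1 + m1 * m1) - mean * mean
    return mean, var

# importance-sampling cross-check: weight prior samples by the likelihood
r, S, rho = 0.8, 0.2, 0.4
rng = np.random.default_rng(1)
K = 2_000_000
xs = np.where(rng.random(K) < rho, rng.standard_normal(K), 0.0)
w = np.exp(-(r - xs) ** 2 / (2.0 * S))
mean_mc = (w * xs).sum() / w.sum()
var_mc = (w * (xs - mean_mc) ** 2).sum() / w.sum()
mean_cf, var_cf = bg_denoiser(r, S, rho)
```

With $\rho = 1$ the spike vanishes and the denoiser reduces to plain Gaussian shrinkage $r/(1+S)$, which is the denoiser used in the SLM demos below.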

Then we show how to extend AMP to GLM under the unified framework of Section III. As shown in Algorithm 1, the key lies in calculating the extrinsic mean $z^{ext}_{a,A}(t)$ and extrinsic variance $v^{ext}_{a,A}(t)$. For AMP, fortunately, these two values have already been calculated, as shown in Lemma 1.

###### Lemma 1.

After performing AMP in module A, the output extrinsic mean and variance to module B can be computed as

$$z^{ext}_{a,A}(t) = Z_a(t), \qquad v^{ext}_{a,A}(t) = V_a(t), \tag{14}$$

where $Z_a(t)$, $V_a(t)$ are the results of AMP in lines (D6) and (D5) of Algorithm 2, respectively.

###### Proof:

From (10) and (11), the posterior mean and variance of $z_a$ are needed to calculate $z^{ext}_{a,A}(t)$ and $v^{ext}_{a,A}(t)$. However, in the original AMP they are not explicitly given. To this end, we present an alternative factor graph for the SLM in Fig. 3 (b), where $\delta(\cdot)$ denotes the Dirac function and $\mathcal{N}(\cdot;\cdot,\cdot)$ denotes the Gaussian distribution. The two factor graph representations in Fig. 3 are equivalent, so that the message from $x_i$ to factor node $a$ is precisely the same as in the original graph. Recall the definitions of $Z_a(t)$ and $V_a(t)$ in (13). As in AMP, according to the central limit theorem, $z_a = \sum_i A_{ai}x_i$ can be approximated as a Gaussian random variable whose mean and variance are $Z_a(t)$ and $V_a(t)$, i.e., $z_a \sim \mathcal{N}(Z_a(t), V_a(t))$. Since the message from the observation side to $z_a$ is $\mathcal{N}(z_a; \tilde{y}_a(t), \tilde{\sigma}^2_a(t))$, the posterior distribution of $z_a$ can be calculated as the product of these two Gaussians, where

$$v^{post}_{a,A}(t) = \left(\frac{1}{V_a(t)} + \frac{1}{\tilde{\sigma}^2_a(t)}\right)^{-1}, \tag{15}$$

$$z^{post}_{a,A}(t) = v^{post}_{a,A}(t)\left(\frac{Z_a(t)}{V_a(t)} + \frac{\tilde{y}_a(t)}{\tilde{\sigma}^2_a(t)}\right). \tag{16}$$

Substituting (15), (16) into (10), (11), we obtain $z^{ext}_{a,A}(t) = Z_a(t)$ and $v^{ext}_{a,A}(t) = V_a(t)$. In the derivation of AMP [23, 4], $V_a(t)$ and $Z_a(t)$ are computed as (D5) and (D6) of Algorithm 2 in the large system limit, which completes the proof. ∎
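The algebra at the end of the proof is an exact identity: plugging the Gaussian-product posterior (15)-(16) into the extrinsic update (10)-(11) cancels the pseudo-observation contribution and returns $(Z_a(t), V_a(t))$. A quick numeric check (arbitrary values, our notation):

```python
# arbitrary AMP outputs Z_a(t), V_a(t) and pseudo-observation (y~, sigma~^2)
Z, V = 1.7, 0.9
y_t, s2 = -0.4, 2.5

v_post = 1.0 / (1.0 / V + 1.0 / s2)             # (15)
z_post = v_post * (Z / V + y_t / s2)            # (16)
v_ext = 1.0 / (1.0 / v_post - 1.0 / s2)         # (10)
z_ext = v_ext * (z_post / v_post - y_t / s2)    # (11)
```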

According to Lemma 1 and the framework in Algorithm 1, we readily obtain a generalized version of AMP called Gr-AMP, as shown in Algorithm 3.

Note that the proposed Gr-AMP in Algorithm 3 can be viewed as a general form of the well-known GAMP [23]. In particular, if a single inner AMP iteration is performed per outer iteration, Gr-AMP becomes equivalent to the original GAMP. To prove this, in step 2) of Algorithm 3, we first obtain the equivalent noise variance $\tilde{\sigma}^2_a(t)$ and observation $\tilde{y}_a(t)$ from (6) and (7) with $z^{ext}_{a,A}(t-1) = Z_a(t-1)$ and $v^{ext}_{a,A}(t-1) = V_a(t-1)$. Then, in step 3) of Algorithm 3, one iteration of AMP is performed by simply replacing $\sigma^2$, $y_a$ in Algorithm 2 with $\tilde{\sigma}^2_a(t)$, $\tilde{y}_a(t)$. After some algebra, and with the definitions of the two intermediate variables $\hat{s}_a$ and $\tau^s_a$ in (17) and (18), the $t$-th iteration of the proposed Gr-AMP in Algorithm 3 with a single inner iteration becomes

$$\hat{s}_a(t-1) = \frac{z^{post}_{a,B}(t) - Z_a(t-1)}{V_a(t-1)}, \tag{17}$$

$$\tau^s_a(t-1) = \frac{V_a(t-1) - v^{post}_{a,B}(t)}{V^2_a(t-1)}, \tag{18}$$

$$\Sigma_i(t-1) = \Big[\sum_a A^2_{ai}\,\tau^s_a(t-1)\Big]^{-1}, \tag{19}$$

$$r_i(t-1) = \hat{x}_i(t-1) + \Sigma_i(t-1)\sum_a A_{ai}\,\hat{s}_a(t-1), \tag{20}$$

$$\hat{x}_i(t) = \mathrm{E}\big(x_i\,\big|\,r_i(t-1), \Sigma_i(t-1)\big), \tag{21}$$

$$\tau_i(t) = \mathrm{Var}\big(x_i\,\big|\,r_i(t-1), \Sigma_i(t-1)\big), \tag{22}$$

$$V_a(t) = \sum_i A^2_{ai}\,\tau_i(t), \tag{23}$$

$$Z_a(t) = \sum_i A_{ai}\,\hat{x}_i(t) - V_a(t)\,\hat{s}_a(t-1). \tag{24}$$
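Steps (17)-(24) can be written as one compact sweep. The sketch below is our own minimal rendering (the Gaussian denoiser and the Gaussian-likelihood module B used in the demo are assumptions chosen to give a self-contained SLM test, not the letter's code):

```python
import numpy as np

def gr_amp_sweep(A, z_post_B, v_post_B, Z_prev, V_prev, x_prev, denoiser):
    """One sweep of (17)-(24); denoiser(r, Sigma) returns the
    component-wise posterior mean and variance of x."""
    s_hat = (z_post_B - Z_prev) / V_prev           # (17)
    tau_s = (V_prev - v_post_B) / V_prev ** 2      # (18)
    Sigma = 1.0 / ((A ** 2).T @ tau_s)             # (19)
    r = x_prev + Sigma * (A.T @ s_hat)             # (20)
    x_hat, tau_x = denoiser(r, Sigma)              # (21)-(22)
    V = (A ** 2) @ tau_x                           # (23)
    Z = A @ x_hat - V * s_hat                      # (24)
    return x_hat, Z, V

# demo on the SLM special case with a Gaussian prior N(0, 1) on x
rng = np.random.default_rng(2)
M, N, sigma2 = 50, 30, 0.1
A = rng.standard_normal((M, N)) / np.sqrt(N)
x_true = rng.standard_normal(N)
y = A @ x_true + np.sqrt(sigma2) * rng.standard_normal(M)
gauss_denoiser = lambda r, S: (r / (1.0 + S), S / (1.0 + S))
x_hat, Z, V = np.zeros(N), np.zeros(M), (A ** 2) @ np.ones(N)
for t in range(15):
    # module B for a Gaussian likelihood N(y; z, sigma2): product of Gaussians
    v_post_B = 1.0 / (1.0 / V + 1.0 / sigma2)
    z_post_B = v_post_B * (Z / V + y / sigma2)
    x_hat, Z, V = gr_amp_sweep(A, z_post_B, v_post_B, Z, V, x_hat,
                               gauss_denoiser)
nmse = np.sum((x_hat - x_true) ** 2) / np.sum(x_true ** 2)
```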

Recalling the definitions (3) and (4), it can be seen that Gr-AMP with a single inner AMP iteration is equivalent to the original GAMP [23, 24]. As a result, the proposed framework provides a new perspective on GAMP and leads to a concise derivation.

### IV-B From VAMP and SBL to Gr-VAMP and Gr-SBL

For VAMP and SBL, the values of $z^{ext}_{a,A}(t)$ and $v^{ext}_{a,A}(t)$ are not already calculated as in AMP. However, the posterior mean and covariance matrix of $\mathbf{x}$, denoted as $\hat{\mathbf{x}}^{post}$ and $\mathbf{C}^{post}$, respectively, can be obtained in the linear MMSE (LMMSE) operation of VAMP and SBL (refer to [13], [25], and [26]). Thus, the posterior mean and variance of $\mathbf{z}$ are calculated as $\mathbf{z}^{post}_A = \mathbf{A}\hat{\mathbf{x}}^{post}$ and $v^{post}_A = \frac{1}{M}\mathrm{trace}(\mathbf{A}\mathbf{C}^{post}\mathbf{A}^T)$, where $\mathrm{trace}(\cdot)$ denotes the trace of a matrix. Then, we obtain the extrinsic mean $z^{ext}_{a,A}(t)$ and variance $v^{ext}_{a,A}(t)$ from (10) and (11). The resulting generalized VAMP (Gr-VAMP) and generalized SBL (Gr-SBL) are shown in Algorithm 4.

### IV-C Applications to 1-bit Compressed Sensing

In this subsection, we evaluate the performance of Gr-AMP, Gr-VAMP, and Gr-SBL for 1-bit CS. Assume that the sparse signal $\mathbf{x}$ follows an i.i.d. Bernoulli-Gaussian distribution, i.e., $p(x_i) = (1-\rho)\,\delta(x_i) + \rho\,\mathcal{N}(x_i; 0, 1)$ with sparsity ratio $\rho$. The 1-bit quantized measurements are obtained by $\mathbf{y} = \mathrm{sign}(\mathbf{A}\mathbf{x} + \mathbf{w})$, where $\mathrm{sign}(\cdot)$ takes the component-wise sign of each element. The signal-to-noise ratio (SNR) is set to 50 dB. To test the performance for general matrices $\mathbf{A}$, ill-conditioned matrices with various condition numbers are considered following [13]. Specifically, $\mathbf{A}$ is constructed from the singular value decomposition (SVD) $\mathbf{A} = \mathbf{U}\mathbf{S}\mathbf{V}^T$, where the orthogonal matrices $\mathbf{U}$ and $\mathbf{V}$ were drawn uniformly w.r.t. the Haar measure. The singular values were chosen to achieve a desired condition number $\kappa = \sigma_{\max}/\sigma_{\min}$, where $\sigma_{\max}$ and $\sigma_{\min}$ are the maximum and minimum singular values of $\mathbf{A}$, respectively. As in [14], the recovery performance is assessed using the debiased normalized mean square error (dNMSE) in decibels. In all simulations, the results are averaged over 100 realizations. The results of GAMP and the algorithm in [14], denoted as GVAMP, are also given for comparison.
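The experimental setup described above can be sketched as follows (the dimensions, sparsity ratio, and target condition number below are our own placeholder choices, and the QR-based Haar draw is the usual construction up to column-sign conventions):

```python
import numpy as np

def conditioned_matrix(M, N, cond, rng):
    """M x N matrix with a prescribed condition number via an SVD construction;
    the QR of a Gaussian matrix gives a Haar draw up to column signs."""
    U, _ = np.linalg.qr(rng.standard_normal((M, M)))
    V, _ = np.linalg.qr(rng.standard_normal((N, N)))
    k = min(M, N)
    s = cond ** (-np.arange(k) / (k - 1))      # geometric spacing, 1 .. 1/cond
    return (U[:, :k] * s) @ V[:, :k].T

rng = np.random.default_rng(3)
M, N, rho, snr_db = 512, 256, 0.1, 50.0              # placeholder parameters
A = conditioned_matrix(M, N, 100.0, rng)
x = rng.standard_normal(N) * (rng.random(N) < rho)   # Bernoulli-Gaussian signal
z = A @ x
sigma2 = np.mean(z ** 2) * 10.0 ** (-snr_db / 10.0)  # noise level from the SNR
y = np.sign(z + np.sqrt(sigma2) * rng.standard_normal(M))  # 1-bit measurements
```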

Fig. 4 (a) and (b) show the dNMSE versus the number of algorithm iterations for two condition numbers, respectively. For the smaller condition number, all algorithms converge to the same dNMSE value except Gr-SBL, which has slightly worse performance. For the larger condition number, however, both Gr-AMP and GAMP fail while the other algorithms still work well. In Fig. 4 (c), the dNMSE is evaluated for condition numbers ranging from 1 (i.e., a row-orthogonal matrix) up to highly ill-conditioned matrices. It is seen that as the condition number increases, the recovery performance degrades smoothly for Gr-VAMP, GVAMP, and Gr-SBL, while both Gr-AMP and GAMP diverge for even a mild condition number.

## V Conclusion

In this letter a unified Bayesian inference framework for GLM is presented whereby a variety of SLM algorithms can be easily extended to handle the GLM problem. Specific instances elucidated under this framework are the extensions of AMP, VAMP, and SBL, which unveils intimate connections between some established algorithms and leads to novel extensions for some other SLM algorithms. Numerical results for 1-bit CS demonstrate the effectiveness of this framework.

## Acknowledgments

The authors sincerely thank P. Schniter for valuable discussions and the anonymous reviewers for valuable comments.

### -A Details of Gr-VAMP

The vector approximate message passing (VAMP) algorithm [13] is a robust version of the well-known AMP for the standard linear model (SLM). Define the LMMSE estimator and the component-wise denoiser function as

$$g_2(\mathbf{r}_{2k}, \gamma_{2k}) = \left(\frac{\mathbf{A}^T\mathbf{A}}{\sigma^2} + \gamma_{2k}\mathbf{I}_N\right)^{-1}\left(\frac{\mathbf{A}^T\mathbf{y}}{\sigma^2} + \gamma_{2k}\mathbf{r}_{2k}\right), \tag{25}$$

$$g_1(r_{1k}, \gamma_{1k}) = \mathrm{E}(x_i\,|\,r_{1k}, \gamma_{1k}), \tag{26}$$
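The LMMSE estimator (25) is the solution of a ridge-regularized least-squares problem. A minimal sketch (our variable names), verified through the first-order optimality condition:

```python
import numpy as np

def g2(A, y, sigma2, r2, gamma2):
    """LMMSE estimator (25): posterior mean of x for the likelihood
    y = A x + N(0, sigma2 I) and the prior N(r2, (1/gamma2) I)."""
    N = A.shape[1]
    H = A.T @ A / sigma2 + gamma2 * np.eye(N)
    return np.linalg.solve(H, A.T @ y / sigma2 + gamma2 * r2)

# (25) solves min_x ||y - A x||^2 / sigma2 + gamma2 * ||x - r2||^2,
# so the gradient of that objective must vanish at g2's output
rng = np.random.default_rng(4)
M, N, sigma2, gamma2 = 20, 12, 1.5, 0.7
A = rng.standard_normal((M, N))
y, r2 = rng.standard_normal(M), rng.standard_normal(N)
x_hat = g2(A, y, sigma2, r2, gamma2)
grad = A.T @ (A @ x_hat - y) / sigma2 + gamma2 * (x_hat - r2)
```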

where the expectation in calculating the posterior mean is with respect to (w.r.t.) the distribution $\propto p(x_i)\,\mathcal{N}(x_i;\, r_{1k,i},\, \gamma_{1k}^{-1})$. Then, the VAMP algorithm is shown in Algorithm 5. Note that here we only show the MMSE version of VAMP; the derivation also applies to the SVD version of VAMP. For more details, please refer to [13]. It should also be noted that the denoising step and the LMMSE step in Algorithm 5 can be exchanged in practical implementations.

Note that line 12 of Algorithm 5 represents

$$\frac{1}{\eta_{2k}} = \frac{1}{N}\,\mathrm{trace}\left(\left(\frac{\mathbf{A}^T\mathbf{A}}{\sigma^2} + \gamma_{2k}\mathbf{I}_N\right)^{-1}\right), \tag{27}$$

where $\mathrm{trace}(\cdot)$ denotes the trace of a matrix. Define

$$\mathbf{C}_{2k} = \left(\frac{\mathbf{A}^T\mathbf{A}}{\sigma^2} + \gamma_{2k}\mathbf{I}_N\right)^{-1}, \tag{28}$$

then $1/\eta_{2k} = \mathrm{trace}(\mathbf{C}_{2k})/N$.

As shown in [13], lines 11 and 12 of Algorithm 5 can be recognized as the MMSE estimate of $\mathbf{x}$ under the likelihood $\mathcal{N}(\mathbf{y};\, \mathbf{A}\mathbf{x},\, \sigma^2\mathbf{I}_M)$ and the prior $\mathcal{N}(\mathbf{x};\, \mathbf{r}_{2k},\, \gamma_{2k}^{-1}\mathbf{I}_N)$. Specifically, the posterior mean and covariance matrix are $\hat{\mathbf{x}}_{2k}$ and $\mathbf{C}_{2k}$, respectively.

As shown in the unified framework in Algorithm 1, the key lies in calculating the extrinsic values $z^{ext}_{a,A}$ and $v^{ext}_{a,A}$. To this end, from (10) and (11), we should first calculate the posterior mean vector and variance vector of $\mathbf{z}$ from the results of VAMP. As shown in Algorithm 5, the posterior mean $\hat{\mathbf{x}}_{2k}$ and covariance matrix $\mathbf{C}_{2k}$ of $\mathbf{x}$ can be obtained via the LMMSE step of VAMP. Since $\mathbf{z} = \mathbf{A}\mathbf{x}$, the posterior mean and covariance matrix of $\mathbf{z}$ are

$$\mathbf{z}^{post}_{A} = \mathbf{A}\hat{\mathbf{x}}_{2k}, \tag{29}$$

$$\mathbf{C}_z = \mathbf{A}\mathbf{C}_{2k}\mathbf{A}^T. \tag{30}$$

As a result, the variance vector of $\mathbf{z}$ can be calculated as the diagonal vector of the covariance matrix $\mathbf{C}_z$, i.e.,

$$\mathbf{v}^{post}_{A} = \mathrm{diag}(\mathbf{A}\mathbf{C}_{2k}\mathbf{A}^T). \tag{31}$$

Note that for AMP, the covariance matrix of $\mathbf{x}$ is unavailable, so that we could not directly calculate $\mathbf{z}^{post}_A$ and $\mathbf{v}^{post}_A$ as in (29) and (31), respectively. In fact, only the diagonal elements of the covariance matrix of $\mathbf{x}$ are available in AMP.

Moreover, in VAMP [13], the variance values are averaged before being further processed. To be consistent with the implementation of VAMP, the variance of $\mathbf{z}$ is also set to the average value of the diagonal elements of the covariance matrix $\mathbf{C}_z$, i.e.,

$$v^{post}_{A} = \frac{1}{M}\,\mathrm{trace}(\mathbf{A}\mathbf{C}_{2k}\mathbf{A}^T). \tag{32}$$
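Note that the scalar in (32) is exactly the mean of the per-component variances in (31), so the averaging only discards the spread across components. A one-line check:

```python
import numpy as np

rng = np.random.default_rng(6)
M, N = 8, 5
A = rng.standard_normal((M, N))
C2k = np.linalg.inv(A.T @ A + np.eye(N))   # any posterior covariance matrix
Cz = A @ C2k @ A.T
v_diag = np.diag(Cz)            # per-component variances, as in (31)
v_avg = np.trace(Cz) / M        # single averaged variance, as in (32)
```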

Given $\mathbf{z}^{post}_A$ and $v^{post}_A$, we can then obtain the extrinsic mean $z^{ext}_{a,A}$ and variance $v^{ext}_{a,A}$ from (10) and (11). Finally, from the unified algorithm framework in Algorithm 1, we obtain the generalized VAMP (Gr-VAMP) as shown in Algorithm 4.

### -B Details of Gr-SBL

The sparse Bayesian learning (SBL) algorithm [25, 26] is a well-known Bayesian inference method for compressed sensing over the SLM (2). In SBL, the prior distribution of the signal $\mathbf{x}$ is assumed to be i.i.d. Gaussian, i.e.,

$$p(\mathbf{x}\,|\,\boldsymbol{\alpha}) = \prod_i \mathcal{N}(x_i;\, 0,\, \alpha_i^{-1}), \tag{33}$$

where the $\alpha_i$ are non-negative hyper-parameters controlling the sparsity of the signal, with a Gamma hyper-prior

$$p_0(\boldsymbol{\alpha}) = \prod_i \mathrm{Gamma}(\alpha_i\,|\,a, b) = \prod_i \Gamma(a)^{-1}\, b^a\, \alpha_i^{a-1}\, e^{-b\alpha_i}, \tag{34}$$

where $\Gamma(\cdot)$ is the Gamma function.

Then, using the expectation maximization (EM) method, the conventional SBL algorithm is shown in Algorithm 6. Here we assume knowledge of the noise variance; in fact, SBL can also handle the case of unknown noise variance. For more details, please refer to [25, 26].

It is seen that in SBL, lines 4 and 5 of Algorithm 6 can be recognized as the LMMSE estimate of $\mathbf{x}$ under the likelihood $\mathcal{N}(\mathbf{y};\, \mathbf{A}\mathbf{x},\, \sigma^2\mathbf{I}_M)$ and the Gaussian prior (33) given $\boldsymbol{\alpha}$. Thus, the posterior mean and covariance matrix of $\mathbf{x}$ can be calculated as $\hat{\mathbf{x}}_{2k}$ and $\mathbf{C}_{2k}$ in Algorithm 6, respectively.
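For reference, one EM iteration of conventional SBL can be sketched as follows (our own minimal version with the common simplification $a = b = 0$ for the hyper-prior, which yields the update $\alpha_i = 1/(\hat{x}_i^2 + C_{ii})$; Algorithm 6 itself is not reproduced here):

```python
import numpy as np

def sbl_em_step(A, y, sigma2, alpha):
    """One EM iteration of SBL: the E-step is the LMMSE/posterior of x
    under the prior N(0, diag(1/alpha)); the M-step re-estimates alpha."""
    C = np.linalg.inv(A.T @ A / sigma2 + np.diag(alpha))   # posterior covariance
    x_hat = C @ (A.T @ y) / sigma2                         # posterior mean
    alpha_new = 1.0 / (x_hat ** 2 + np.diag(C))            # EM update (a = b = 0)
    return x_hat, C, alpha_new

rng = np.random.default_rng(5)
M, N, sigma2 = 40, 20, 1e-3
A = rng.standard_normal((M, N)) / np.sqrt(N)
x_true = np.zeros(N)
x_true[:3] = [1.0, -2.0, 1.5]                              # sparse signal
y = A @ x_true + np.sqrt(sigma2) * rng.standard_normal(M)
alpha = np.ones(N)
for _ in range(50):
    x_hat, C, alpha = sbl_em_step(A, y, sigma2, alpha)
nmse = np.sum((x_hat - x_true) ** 2) / np.sum(x_true ** 2)
```

The hyper-parameters $\alpha_i$ of the zero components grow across iterations, which progressively prunes those components from the model.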

Under the unified framework in Algorithm 1, and similar to the derivation of Gr-VAMP, we first obtain the posterior mean and covariance matrix of as

$$\mathbf{z}^{post}_{A} = \mathbf{A}\hat{\mathbf{x}}_{2k}, \tag{35}$$

$$\mathbf{C}_z = \mathbf{A}\mathbf{C}_{2k}\mathbf{A}^T. \tag{36}$$

Then, the variance of $\mathbf{z}$ can be calculated as the average of the diagonal elements of the covariance matrix $\mathbf{C}_z$, i.e.,

$$v^{post}_{A} = \frac{1}{M}\,\mathrm{trace}(\mathbf{A}\mathbf{C}_{2k}\mathbf{A}^T). \tag{37}$$

Note that the averaging in (37) is not essential; we can also simply choose the diagonal elements of $\mathbf{C}_z$, i.e.,

$$\mathbf{v}^{post}_{A} = \mathrm{diag}(\mathbf{A}\mathbf{C}_{2k}\mathbf{A}^T). \tag{38}$$

Given and