# An Analysis of State Evolution for Approximate Message Passing with Side Information

A common goal in many research areas is to reconstruct an unknown signal x from noisy linear measurements. Approximate message passing (AMP) is a class of low-complexity algorithms for efficiently solving such high-dimensional regression tasks. Often, it is the case that side information (SI) is available during reconstruction. For this reason a novel algorithmic framework that incorporates SI into AMP, referred to as approximate message passing with side information (AMP-SI), has been recently introduced. An attractive feature of AMP is that when the elements of the signal are exchangeable, the entries of the measurement matrix are independent and identically distributed (i.i.d.) Gaussian, and the denoiser applies the same non-linearity at each entry, the performance of AMP can be predicted accurately by a scalar iteration referred to as state evolution (SE). However, the AMP-SI framework uses different entry-wise scalar denoisers, based on the entry-wise level of the SI, and therefore is not supported by the standard AMP theory. In this work, we provide rigorous performance guarantees for AMP-SI when the input signal and SI are drawn i.i.d. according to some joint distribution subject to finite moment constraints. Moreover, we provide numerical examples to support the theory which demonstrate empirically that the SE can predict the AMP-SI mean square error accurately.


## I Introduction

High-dimensional linear regression is a well-studied model that has been used in many applications including compressed sensing [1], imaging [2], and machine learning and statistics [3]. The unknown signal is viewed through the linear model:

$$y = Ax + w, \tag{1}$$

where $y \in \mathbb{R}^m$ are the measurements, $A \in \mathbb{R}^{m \times n}$ is a known measurement matrix, and $w \in \mathbb{R}^m$ is measurement noise. The goal is to estimate the unknown signal $x \in \mathbb{R}^n$ having knowledge only of the noisy measurements $y$ and the measurement matrix $A$. When the problem is under-determined (i.e., $m < n$), in order for reconstruction to be successful, it is necessary to exploit structural or probabilistic characteristics of the input signal $x$. Often a prior distribution on the input signal is assumed, and in this case approximate message passing (AMP) algorithms [1] can be used for the reconstruction task.

AMP [1, 4] is a class of low-complexity algorithms for efficiently solving high-dimensional regression tasks (1). AMP works by iteratively generating estimates of the unknown input vector, $x$, using a possibly non-linear denoiser function tailored to any prior knowledge about $x$. One favorable feature of AMP is that under some technical conditions on the measurement matrix $A$ and $x$, the observations at each iteration of the algorithm are almost surely equal in distribution to $x$ plus independent and identically distributed (i.i.d.) Gaussian noise in the large system limit.

AMP with Side Information (AMP-SI): In information theory [5], when different communication systems share side information (SI), overall communication can become more efficient. Recently [6, 7], a novel algorithmic framework, referred to as AMP-SI, has been introduced for incorporating SI into AMP for high-dimensional regression tasks (1). AMP-SI has been empirically demonstrated to have good reconstruction quality and is easy to use. For example, we have proposed to use AMP-SI for channel estimation in emerging millimeter wave communication systems [8], where the time dynamics of the channel structure allow previous channel estimates to be used as SI when estimating the current channel structure [7].

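We model each entry of the observed SI, denoted by $\tilde{x} \in \mathbb{R}^n$, as depending statistically on the corresponding entry of the unknown signal $x$ through some joint probability density function (pdf), $f(X, \tilde{X})$. AMP-SI uses a conditional denoiser, $\eta_t: \mathbb{R}^2 \to \mathbb{R}$, to incorporate SI,

$$\eta_t(a, b) = \mathbb{E}[X \mid X + \lambda_t Z = a, \tilde{X} = b]. \tag{2}$$

The AMP-SI algorithm iteratively updates estimates of the input signal $x$: let $x^0 = 0$, the all-zeros vector, then

$$r^t = y - A x^t + \frac{r^{t-1}}{\delta} \langle \eta'_{t-1}(x^{t-1} + A^T r^{t-1}, \tilde{x}) \rangle, \tag{3}$$

$$x^{t+1} = \eta_t(x^t + A^T r^t, \tilde{x}), \tag{4}$$

where $x^t \in \mathbb{R}^n$ is the estimate of the input signal at iteration $t$, $\delta = m/n$ is the measurement rate, and the denoiser in (2) is applied entry-wise to vector inputs. The derivative $\eta'_t$ is with respect to the first input, and $\langle u \rangle$ is the empirical average of a vector $u \in \mathbb{R}^n$, i.e., $\langle u \rangle = \frac{1}{n} \sum_{i=1}^n u_i$. Using the denoiser in (2), the AMP-SI algorithm (3)-(4) provides the minimum mean squared error (MMSE) estimate of the signal when SI is available [6].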
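As an illustration, the recursion (3)-(4) can be sketched in a few lines of dependency-free Python. This is a minimal sketch, not the authors' implementation: the function name `amp_si` and the callables `eta(t, a, b)` and `eta_prime(t, a, b)` are hypothetical, and the caller is assumed to supply the scalar denoiser of (2) and its derivative with respect to the first argument for each iteration `t`.

```python
def amp_si(y, A, x_tilde, eta, eta_prime, num_iters):
    """Run the AMP-SI recursion (3)-(4) and return the final estimate.

    eta(t, a, b) and eta_prime(t, a, b) are caller-supplied (hypothetical)
    callables: the scalar denoiser of (2) at iteration t and its derivative
    with respect to the first argument a.
    """
    m, n = len(A), len(A[0])
    delta = m / n                       # measurement rate m/n
    x = [0.0] * n                       # x^0 = 0, the all-zeros vector
    r_prev = [0.0] * m                  # no correction term at t = 0
    avg_deriv = 0.0                     # <eta'_{t-1}(...)>, zero at t = 0
    for t in range(num_iters):
        # (3): r^t = y - A x^t + (r^{t-1} / delta) <eta'_{t-1}(...)>
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        r = [y[i] - Ax[i] + r_prev[i] * avg_deriv / delta for i in range(m)]
        # Denoiser input v = x^t + A^T r^t
        v = [x[j] + sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # (4): x^{t+1} = eta_t(v, x_tilde), applied entry-wise
        x = [eta(t, v[j], x_tilde[j]) for j in range(n)]
        # Empirical average of the derivative, used in the next residual
        avg_deriv = sum(eta_prime(t, v[j], x_tilde[j]) for j in range(n)) / n
        r_prev = r
    return x
```

Passing the iteration index `t` to the denoiser lets the caller choose how $\lambda_t$ is obtained, for instance from the SE recursion; that choice is left open here on purpose.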

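State Evolution (SE): It has been proven that the performance of AMP, as measured, for example, by the normalized squared $\ell_2$-error $\frac{1}{n}\|x^t - x\|^2$ between the estimate $x^t$ and true signal $x$, can be accurately predicted by a scalar recursion referred to as SE [9, 10] when the measurement matrix $A$ is i.i.d. Gaussian under various assumptions on the elements of the signal. The SE equation for AMP-SI is as follows. Assume the entries of the noise $w$ are i.i.d. copies of $W$ with $\sigma_w^2 = \mathbb{E}[W^2]$, and let $\lambda_0^2 = \sigma_w^2 + \mathbb{E}[X^2]/\delta$. Then for $t \geq 1$,

$$\lambda_t^2 = \sigma_w^2 + \frac{1}{\delta} \mathbb{E}\left[\left(\eta_{t-1}(X + \lambda_{t-1} Z, \tilde{X}) - X\right)^2\right], \tag{5}$$

where $(X, \tilde{X}) \sim f(X, \tilde{X})$ are independent of $Z \sim \mathcal{N}(0, 1)$, and where we use $\mathcal{N}(\mu, \sigma^2)$ to denote a Gaussian distribution with mean $\mu$ and variance $\sigma^2$.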
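When the expectation in (5) has no closed form, it can be approximated by Monte Carlo. The sketch below is illustrative only: `state_evolution_mc`, `eta(lam, a, b)`, and `sample_pair(rng)` are hypothetical names, the sampler is assumed to draw one $(X, \tilde{X})$ pair from the joint pdf, and the initialization $\lambda_0^2 = \sigma_w^2 + \mathbb{E}[X^2]/\delta$ is the usual choice for an all-zeros initial estimate, taken here as an assumption.

```python
import math
import random


def state_evolution_mc(eta, sample_pair, sigma_w2, delta, num_iters,
                       num_samples=20000, seed=0):
    """Monte Carlo sketch of the SE recursion (5).

    eta(lam, a, b): hypothetical callable for E[X | X + lam*Z = a, X~ = b].
    sample_pair(rng): draws one (X, X~) pair from the joint pdf.
    Returns the list [lambda_0^2, ..., lambda_{num_iters}^2].
    """
    rng = random.Random(seed)
    pairs = [sample_pair(rng) for _ in range(num_samples)]
    ex2 = sum(x * x for x, _ in pairs) / num_samples
    lam2 = [sigma_w2 + ex2 / delta]          # assumed initialization
    for _ in range(num_iters):
        lam = math.sqrt(lam2[-1])
        sq_err = 0.0
        for x, x_tilde in pairs:
            z = rng.gauss(0.0, 1.0)          # Z ~ N(0, 1), independent of (X, X~)
            sq_err += (eta(lam, x + lam * z, x_tilde) - x) ** 2
        lam2.append(sigma_w2 + sq_err / (num_samples * delta))
    return lam2
```

The accuracy of each $\lambda_t^2$ is limited by the sample size, so the output should be read as an estimate of the SE trajectory rather than its exact value.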

Considering AMP-SI (3)-(4), however, we cannot directly apply the existing AMP theoretical results [9, 10], as the conditional denoiser (2) depends on the index $i$ through the SI, meaning that different scalar denoisers will be used at different indices within the AMP-SI iterations. Recent results [11], however, extend the asymptotic SE analysis to a larger class of possible denoisers, allowing, for example, each element of the input to use a different non-linear denoiser, as is the case in AMP-SI. We employ these results to rigorously relate the SE presented in (5) to the AMP-SI algorithm in (3)-(4).

Related Work: While integrating SI into reconstruction algorithms is not new, AMP-SI introduces a unified framework within AMP supporting arbitrary signal and SI dependencies. Prior work using SI has been either heuristic, limited to specific applications, or outside the AMP framework.

For example, Wang and Liang [12] integrate SI into AMP for a specific signal prior density, but the method is difficult to apply to other signal models. Ziniel and Schniter [13] develop an AMP-based reconstruction algorithm for a time-varying signal model based on Markov processes for the support and amplitude. This signal model is easily incorporated into the AMP-SI framework as discussed in the analysis of the birth-death-drift model of [6, 7]. Manoel et al. implement an AMP-based algorithm in which the input signal is repeatedly reconstructed in a streaming fashion, and information from past reconstruction attempts is aggregated into a prior, thus improving ongoing reconstruction results [14]. This reconstruction scheme resembles that of AMP-SI, in particular when the Bernoulli-Gaussian model is used (see Section II-B).

Contribution and Outline: Ma et al. use numerical experiments to show that SE (5) accurately tracks the performance of AMP-SI (3)-(4) [7], as was shown rigorously for standard AMP. Ma et al. conjecture that rigorous theoretical guarantees can be given for AMP-SI as well [7]. In this work, we analyze AMP-SI performance when the input signal and SI are drawn i.i.d. according to a general pdf $f(X, \tilde{X})$ obeying some finite moment conditions, the AMP-SI denoiser (2) is Lipschitz, and the measurement matrix is i.i.d. Gaussian.

In Section II, we give the main results, examples for various signal and SI models, and numerical experiments comparing the empirical performance of AMP-SI and the SE predictions. The proof of our main theorem is provided in Section III.

## II Main Results

### II-A Main Theorem

Our main result provides AMP-SI performance guarantees when considering pseudo-Lipschitz loss functions, which we define in the following.

###### Definition II.1.

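Pseudo-Lipschitz functions [9]: For $k \geq 1$ and any $n \geq 1$, a function $\phi: \mathbb{R}^n \to \mathbb{R}$ is pseudo-Lipschitz of order $k$ if there exists a constant $L$, referred to as the pseudo-Lipschitz constant of $\phi$, such that for any $x, y \in \mathbb{R}^n$,

$$|\phi(x) - \phi(y)| \leq L \left(1 + \left(\frac{\|x\|}{\sqrt{n}}\right)^{k-1} + \left(\frac{\|y\|}{\sqrt{n}}\right)^{k-1}\right) \frac{\|x - y\|}{\sqrt{n}}. \tag{6}$$

For $k = 1$, this definition coincides with the standard definition of a Lipschitz function. Throughout this work, $\|\cdot\|$ denotes the Euclidean norm.

We are now ready to state our main result. Throughout the paper we let $\overset{p}{=}$ denote convergence in probability.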

###### Theorem II.1.

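For any order $2$ pseudo-Lipschitz functions $\phi: \mathbb{R}^2 \to \mathbb{R}$ and $\psi: \mathbb{R}^3 \to \mathbb{R}$, assume the following.

• (A1) The measurement matrix $A$ has i.i.d. Gaussian entries with mean $0$ and variance $1/m$.

• (A2) The noise $w$ is i.i.d. with finite $\mathbb{E}[W^2]$.

• (A3) The signal and SI $(X, \tilde{X})$ are sampled i.i.d. from $f(X, \tilde{X})$ with finite $\mathbb{E}[X^2]$, finite $\mathbb{E}[\tilde{X}^2]$, and finite $\mathbb{E}[|X \tilde{X}|]$.

• (A4) For $t \geq 0$, the denoisers $\eta_t: \mathbb{R}^2 \to \mathbb{R}$ defined in (2) are Lipschitz continuous, meaning for scalars $a_1, a_2, b_1, b_2$, and constant $L > 0$,

$$|\eta_t(a_1, b_1) - \eta_t(a_2, b_2)| \leq L \|(a_1, b_1) - (a_2, b_2)\|.$$

Then,

$$\lim_{m} \frac{1}{m} \sum_{i=1}^m \phi(r^t_i, w_i) \overset{p}{=} \mathbb{E}\left[\phi\left(W + \sqrt{\lambda_t^2 - \sigma_w^2}\, Z_1, W\right)\right], \tag{7}$$

$$\lim_{n} \frac{1}{n} \sum_{i=1}^n \psi(x^t_i + [A^T r^t]_i, x_i, \tilde{x}_i) \overset{p}{=} \mathbb{E}\left[\psi(X + \lambda_t Z_2, X, \tilde{X})\right],$$

where $Z_1, Z_2$ are standard Gaussians, independent of $W$ and $(X, \tilde{X})$. In the above, $r^t$ and $x^t$ are defined in the AMP-SI recursion (3)-(4), and $\lambda_t$ in the SE (5).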

Section III contains the proof of Theorem II.1. The proof follows from Berthier et al. [11, Theorem 14] and the strong law of large numbers. The main technical step is showing that our assumptions (A1)-(A4) are enough to satisfy the assumptions needed for [11, Theorem 14].

As a concrete example of how Theorem II.1 provides performance guarantees for AMP-SI, let us consider a few interesting pseudo-Lipschitz loss functions.

###### Corollary II.1.1.

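Under assumptions (A1)-(A4), letting $\psi_1: \mathbb{R}^3 \to \mathbb{R}$ be the loss $\psi_1(a, b, c) = (a - b)^2$, then by Theorem II.1,

$$\lim_{n \to \infty} \frac{1}{n} \|x^t + A^T r^t - x\|^2 \overset{p}{=} \lambda_t^2,$$

where $\lambda_t$ is defined in (5). Similarly, if $\psi_2: \mathbb{R}^3 \to \mathbb{R}$ is defined as $\psi_2(a, b, c) = (\eta_t(a, c) - b)^2$, then by Theorem II.1 and the SE (5),

$$\lim_{n \to \infty} \frac{1}{n} \|x^{t+1} - x\|^2 \overset{p}{=} \mathbb{E}\left[\left(\eta_t(X + \lambda_t Z, \tilde{X}) - X\right)^2\right] = \delta(\lambda_{t+1}^2 - \sigma_w^2).$$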

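### II-B Examples

Next, we consider a few signal and SI models to show how one can derive the denoiser in (2), use this to construct the AMP-SI algorithm and the SE, and apply Theorem II.1. Before we get to the examples, we state a lemma showing that functions with bounded partial derivatives are Lipschitz.

###### Lemma II.2.

A function $\phi: \mathbb{R}^2 \to \mathbb{R}$ having bounded derivatives, $\left|\frac{\partial}{\partial x}\phi(x, y)\right| \leq D_1$ and $\left|\frac{\partial}{\partial y}\phi(x, y)\right| \leq D_2$ for constants $0 < D_1, D_2 < \infty$, is Lipschitz continuous with Lipschitz constant $\sqrt{D_1^2 + D_2^2}$.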

###### Proof.

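The result follows using the triangle inequality and the Cauchy-Schwarz inequality,

$$\begin{aligned}
|\phi(x_1, y_1) - \phi(x_2, y_2)| &= |\phi(x_1, y_1) - \phi(x_1, y_2) + \phi(x_1, y_2) - \phi(x_2, y_2)| \\
&\leq |\phi(x_1, y_1) - \phi(x_1, y_2)| + |\phi(x_1, y_2) - \phi(x_2, y_2)| \\
&\leq D_2 |y_1 - y_2| + D_1 |x_1 - x_2| \\
&\leq \sqrt{D_2^2 + D_1^2} \sqrt{(y_1 - y_2)^2 + (x_1 - x_2)^2} \\
&= \sqrt{D_2^2 + D_1^2}\, \|(x_1, y_1) - (x_2, y_2)\|.
\end{aligned} \tag{8}$$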

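#### II-B1 Gaussian-Gaussian Signal and SI

In this model, referred to as the GG model henceforth, the signal has i.i.d. Gaussian entries with zero mean and finite variance $\sigma_x^2$, and we have access to SI in the form of the signal with additive white Gaussian noise (AWGN). The signal, $X$, and SI, $\tilde{X}$, are related by

$$\tilde{X} = X + \mathcal{N}(0, \sigma^2 I). \tag{9}$$

In this case, the AMP-SI denoiser (2) equals [7]

$$\eta_t(a, b) = \mathbb{E}[X \mid X + \lambda_t Z = a, \tilde{X} = b] = \frac{\sigma_x^2 \sigma^2 a + \sigma_x^2 \lambda_t^2 b}{\sigma_x^2(\sigma^2 + \lambda_t^2) + \sigma^2 \lambda_t^2}. \tag{10}$$

Since the denoiser (10) is the conditional mean of a Gaussian signal given two independent Gaussian observations, its mean squared error is the corresponding posterior variance, and the SE (5) can be computed as

$$\lambda_t^2 = \sigma_w^2 + \frac{1}{\delta} \cdot \frac{\sigma_x^2 \sigma^2 \lambda_{t-1}^2}{\sigma_x^2(\sigma^2 + \lambda_{t-1}^2) + \sigma^2 \lambda_{t-1}^2}. \tag{11}$$

We note that the denoiser in (10) is Lipschitz continuous as a result of Lemma II.2 because

$$\left|\frac{\partial}{\partial a} \eta_t(a, b)\right| = \left|\frac{\sigma_x^2 \sigma^2}{\sigma_x^2(\sigma^2 + \lambda_t^2) + \sigma^2 \lambda_t^2}\right| \leq 1,$$

and

$$\left|\frac{\partial}{\partial b} \eta_t(a, b)\right| = \left|\frac{\sigma_x^2 \lambda_t^2}{\sigma_x^2(\sigma^2 + \lambda_t^2) + \sigma^2 \lambda_t^2}\right| \leq 1,$$

and therefore the assumptions (A1)-(A4) are satisfied in the GG case and we can apply Theorem II.1.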
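The GG quantities above are simple enough to evaluate directly. The following is a small sketch under stated assumptions: the function names are hypothetical, the SE uses the posterior variance of a Gaussian given two independent Gaussian observations, and the initialization $\lambda_0^2 = \sigma_w^2 + \sigma_x^2/\delta$ is assumed (the usual choice for an all-zeros initial estimate).

```python
def gg_denoiser(a, b, lam2, sx2, s2):
    """Conditional mean (10) for the GG model; lam2 stands for lambda_t^2."""
    denom = sx2 * (s2 + lam2) + s2 * lam2
    return (sx2 * s2 * a + sx2 * lam2 * b) / denom


def gg_partials(lam2, sx2, s2):
    """Partial derivatives of (10) w.r.t. a and b (constant in a and b)."""
    denom = sx2 * (s2 + lam2) + s2 * lam2
    return sx2 * s2 / denom, sx2 * lam2 / denom


def gg_state_evolution(sx2, s2, sw2, delta, num_iters):
    """SE recursion (5) for the GG model using the Gaussian posterior variance.

    Returns [lambda_0^2, ..., lambda_{num_iters}^2].
    """
    lam2 = [sw2 + sx2 / delta]               # assumed initialization
    for _ in range(num_iters):
        l = lam2[-1]
        mmse = sx2 * s2 * l / (sx2 * (s2 + l) + s2 * l)
        lam2.append(sw2 + mmse / delta)
    return lam2
```

Because both partial derivatives are ratios of one nonnegative term to a sum containing it, `gg_partials` always returns values in $[0, 1]$, which is the bound used with Lemma II.2.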

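#### II-B2 Bernoulli-Gaussian Signal and SI

The Bernoulli-Gaussian (BG) model reflects a scenario in which one wishes to recover a sparse signal and has access to SI in the form of the signal with AWGN as in (9). In this model, each entry of the signal is independently generated according to $x_i \sim \epsilon\, \mathcal{N}(0, 1) + (1 - \epsilon)\delta_0$, where $\delta_0$ is the Dirac delta function at $0$. In words, the entries of the signal independently take the value $0$ with probability $1 - \epsilon$ and are $\mathcal{N}(0, 1)$ with probability $\epsilon$. In this case, the AMP-SI denoiser (2) equals [7]

$$\begin{aligned}
\eta_t(a, b) &= \mathbb{E}[X \mid X + \lambda_t Z = a, \tilde{X} = b] \\
&= \Pr(X \neq 0 \mid a, b)\, \mathbb{E}[X \mid a, b, X \neq 0] \\
&= \Pr(X \neq 0 \mid a, b)\, \frac{\sigma^2 a + \lambda_t^2 b}{\sigma^2 + \lambda_t^2 + \sigma^2 \lambda_t^2},
\end{aligned} \tag{12}$$

where, letting $\rho_{\tau^2}(x)$ be the zero-mean Gaussian density with variance $\tau^2$ evaluated at $x$, and defining $\nu_t := \sigma^2 \lambda_t^2 (\sigma^2 + \lambda_t^2 + \sigma^2 \lambda_t^2)$,

$$\Pr(X \neq 0 \mid a, b) = (1 + T_{a,b})^{-1}, \tag{13}$$

where we denote

$$\begin{aligned}
T_{a,b} &:= \frac{(1 - \epsilon)\, \rho_{\lambda_t^2}(a)\, \rho_{\sigma^2}(b)}{\epsilon\, \rho_{1 + \sigma^2}(b)\, \rho_{\frac{\sigma^2}{1 + \sigma^2} + \lambda_t^2}\left(\frac{b}{1 + \sigma^2} - a\right)} \\
&= \left(\frac{1 - \epsilon}{\epsilon}\right) \sqrt{\frac{\sigma^2 + \lambda_t^2 + \sigma^2 \lambda_t^2}{\lambda_t^2 \sigma^2}} \exp\left\{-\frac{(\sigma^2 a + \lambda_t^2 b)^2}{2 \sigma^2 \lambda_t^2 (\sigma^2 + \lambda_t^2 + \sigma^2 \lambda_t^2)}\right\} \\
&= \left(\frac{1 - \epsilon}{\epsilon}\right) \frac{\nu_t \sqrt{2\pi}}{\lambda_t^2 \sigma^2}\, \rho_{\nu_t}(\sigma^2 a + \lambda_t^2 b).
\end{aligned} \tag{14}$$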

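Then the SE (5) can be computed by substituting the denoiser (12)-(14), evaluated at $\lambda_{t-1}$, into the expectation:

$$\lambda_t^2 = \sigma_w^2 + \frac{1}{\delta} \mathbb{E}\left[\left(\eta_{t-1}(X + \lambda_{t-1} Z, \tilde{X}) - X\right)^2\right]. \tag{15}$$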

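We again use Lemma II.2 to show that the denoiser defined in (12) and (13) is Lipschitz continuous so that the assumptions (A1)-(A4) are satisfied in the BG case and we can apply Theorem II.1. We study the partial derivatives. Denote

$$f_{a,b} := \frac{\sigma^2 a + \lambda_t^2 b}{\sigma^2 + \lambda_t^2 + \sigma^2 \lambda_t^2}. \tag{16}$$

Combining (12), (13), and (16),

$$\eta_t(a, b) = (1 + T_{a,b})^{-1} f_{a,b}.$$

Then,

$$\left|\frac{\partial}{\partial a} \eta_t(a, b)\right| \leq \frac{(1 + 2T_{a,b})}{(1 + T_{a,b})^2} \left|\frac{\partial f_{a,b}}{\partial a}\right| + \frac{1}{(1 + T_{a,b})^2} \left|T_{a,b} \left[\frac{\partial}{\partial a} f_{a,b}\right] + \left[\frac{\partial}{\partial a} T_{a,b}\right] f_{a,b}\right|. \tag{17}$$

Now we show upper bounds for the two terms of (17) separately. For the first term, we see that $1 + 2T_{a,b} \leq (1 + T_{a,b})^2$ and $\left|\frac{\partial f_{a,b}}{\partial a}\right| = \frac{\sigma^2}{\sigma^2 + \lambda_t^2 + \sigma^2 \lambda_t^2} \leq 1$, so

$$\frac{(1 + 2T_{a,b})}{(1 + T_{a,b})^2} \left|\frac{\partial f_{a,b}}{\partial a}\right| \leq 1.$$

Consider the second term of (17). First we note that, since $(1 + T_{a,b})^2 \geq 1$,

$$\frac{1}{(1 + T_{a,b})^2} \left|T_{a,b} \left[\frac{\partial}{\partial a} f_{a,b}\right] + \left[\frac{\partial}{\partial a} T_{a,b}\right] f_{a,b}\right| \leq \left|\frac{\partial}{\partial a} [T_{a,b} f_{a,b}]\right|.$$

Then from (14) and (16),

$$T_{a,b} f_{a,b} = \left(\frac{1 - \epsilon}{\epsilon}\right) \sqrt{2\pi}\, (\sigma^2 a + \lambda_t^2 b)\, \rho_{\nu_t}(\sigma^2 a + \lambda_t^2 b),$$

then using that $\frac{\partial}{\partial x} \rho_{\tau^2}(x) = -\frac{x}{\tau^2} \rho_{\tau^2}(x)$, we have

$$\left|\frac{\partial}{\partial a} [T_{a,b} f_{a,b}]\right| = \left(\frac{1 - \epsilon}{\epsilon}\right) \sqrt{2\pi}\, \frac{\sigma^2}{\nu_t}\, \rho_{\nu_t}(\sigma^2 a + \lambda_t^2 b) \left|\nu_t - (\sigma^2 a + \lambda_t^2 b)^2\right|. \tag{18}$$

To upper bound the above, we use $e^{-x} \leq \frac{1}{1 + x}$ when $x \geq 0$, and so

$$\rho_{\tau^2}(x) = \frac{1}{\sqrt{2\pi\tau^2}} \exp\left\{-\frac{x^2}{2\tau^2}\right\} \leq \sqrt{\frac{2}{\pi}} \left(\frac{\sqrt{\tau^2}}{2\tau^2 + x^2}\right).$$

Using this in (18), we find

$$\left|\frac{\partial}{\partial a} [T_{a,b} f_{a,b}]\right| \leq \frac{2\sigma^2}{\sqrt{\nu_t}} \left(\frac{1 - \epsilon}{\epsilon}\right) \frac{\left|\nu_t - (\sigma^2 a + \lambda_t^2 b)^2\right|}{2\nu_t + (\sigma^2 a + \lambda_t^2 b)^2} \leq \frac{2\sigma^2}{\sqrt{\nu_t}} \left(\frac{1 - \epsilon}{\epsilon}\right) \leq \frac{2(1 - \epsilon)}{\sigma_w \epsilon}, \tag{19}$$

where in the final inequality we use $\lambda_t \geq \sigma_w$ by (15), and

$$\frac{\sigma^2}{\sqrt{\nu_t}} = \frac{\sigma}{\lambda_t \sqrt{\sigma^2 + \lambda_t^2 + \sigma^2 \lambda_t^2}} = \frac{1}{\lambda_t \sqrt{1 + \frac{\lambda_t^2}{\sigma^2} + \lambda_t^2}} \leq \frac{1}{\lambda_t}. \tag{20}$$

Using the above in (17), we have

$$\left|\frac{\partial}{\partial a} \eta_t(a, b)\right| \leq 1 + \frac{2(1 - \epsilon)}{\sigma_w \epsilon}.$$

As in (17) we can show

$$\left|\frac{\partial}{\partial b} \eta_t(a, b)\right| \leq \frac{(1 + 2T_{a,b})}{(1 + T_{a,b})^2} \left|\frac{\partial}{\partial b} f_{a,b}\right| + \frac{1}{(1 + T_{a,b})^2} \left|T_{a,b} \left[\frac{\partial}{\partial b} f_{a,b}\right] + \left[\frac{\partial}{\partial b} T_{a,b}\right] f_{a,b}\right|.$$

Then,

$$\frac{(1 + 2T_{a,b})}{(1 + T_{a,b})^2} \left|\frac{\partial}{\partial b} f_{a,b}\right| \leq 1,$$

and a bound as in (18)-(19) gives

$$\frac{1}{(1 + T_{a,b})^2} \left|T_{a,b} \left[\frac{\partial}{\partial b} f_{a,b}\right] + \left[\frac{\partial}{\partial b} T_{a,b}\right] f_{a,b}\right| \leq \left|\frac{\partial}{\partial b} [T_{a,b} f_{a,b}]\right| \leq \frac{2\lambda_t^2}{\sqrt{\nu_t}} \left(\frac{1 - \epsilon}{\epsilon}\right) \frac{\left|\nu_t - (\sigma^2 a + \lambda_t^2 b)^2\right|}{2\nu_t + (\sigma^2 a + \lambda_t^2 b)^2} \leq \frac{2\lambda_t^2}{\sqrt{\nu_t}} \left(\frac{1 - \epsilon}{\epsilon}\right) \leq \frac{2(1 - \epsilon)}{\sigma \epsilon}.$$

Both partial derivatives are therefore bounded, and Lemma II.2 gives that the BG denoiser is Lipschitz continuous.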
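The BG denoiser and the derivative bound can be checked numerically. The sketch below is illustrative and uses hypothetical function names: `bg_denoiser` evaluates (12) through the closed form of $T_{a,b}$ in (14), `bg_posterior_nonzero` recomputes $\Pr(X \neq 0 \mid a, b)$ directly from the two mixture likelihoods as a cross-check, and `max_abs_partial_a` estimates the largest partial derivative in $a$ over a grid by central differences.

```python
import math


def gauss_pdf(x, var):
    """Zero-mean Gaussian density with variance var, evaluated at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def bg_denoiser(a, b, lam2, s2, eps):
    """BG conditional mean (12): f_{a,b} / (1 + T_{a,b})."""
    s_sum = s2 + lam2 + s2 * lam2
    nu = s2 * lam2 * s_sum                     # nu_t
    u = s2 * a + lam2 * b
    t_ab = ((1.0 - eps) / eps) * nu * math.sqrt(2.0 * math.pi) \
        / (lam2 * s2) * gauss_pdf(u, nu)       # last line of (14)
    f_ab = u / s_sum                           # (16)
    return f_ab / (1.0 + t_ab)


def bg_posterior_nonzero(a, b, lam2, s2, eps):
    """Pr(X != 0 | a, b) computed directly from the mixture likelihoods."""
    p_zero = gauss_pdf(a, lam2) * gauss_pdf(b, s2)
    p_nonzero = gauss_pdf(b, 1.0 + s2) * gauss_pdf(
        a - b / (1.0 + s2), s2 / (1.0 + s2) + lam2)
    return eps * p_nonzero / (eps * p_nonzero + (1.0 - eps) * p_zero)


def max_abs_partial_a(lam2, s2, eps, grid, h=1e-5):
    """Largest central-difference estimate of |d eta / d a| over grid x grid."""
    worst = 0.0
    for a in grid:
        for b in grid:
            diff = bg_denoiser(a + h, b, lam2, s2, eps) \
                - bg_denoiser(a - h, b, lam2, s2, eps)
            worst = max(worst, abs(diff) / (2.0 * h))
    return worst
```

By (19)-(20), the derivative in $a$ is at most $1 + 2(1 - \epsilon)/(\lambda_t \epsilon)$, which is no larger than the stated bound because $\lambda_t \geq \sigma_w$; comparing `max_abs_partial_a` against that value on a moderate grid gives a quick sanity check, not a proof.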

### II-C Numerical Examples

Finally, we provide numerical results to compare the empirical mean square error (MSE) performance of AMP-SI and the performance predicted by SE. Fig. 1 shows the MSE achieved by AMP-SI in the GG scenario and the SE prediction of its performance. In this example, the signal variance , the measurement noise variance , the variance of AWGN in SI . We averaged over 10 trials of a GG recovery problem for the empirical results of AMP-SI. Fig. 1(a), Fig. 1(b), and Fig. 1(c) give the comparison for three different signal lengths. For smaller $n$ there is some gap between the empirical MSE and the SE prediction, as shown in Fig. 1 for , but the gap shrinks as $n$ is increased. The results show that the empirical MSE tracks the SE prediction closely.

Fig. 2 shows the MSE achieved by AMP-SI in the BG scenario, and the SE prediction of its performance. We again averaged over 10 trials of a BG recovery problem for empirical results of AMP-SI. The signal length , , the measurement noise variance , and , where of the entries in the signal are nonzero. We vary the variance of AWGN in SI from , , and . The results show that SE can predict the MSE achieved by AMP-SI at every iteration.

## III Proof of Theorem II.1

The proof of Theorem II.1 contains two steps. In the first step we use Berthier et al. [11, Theorem 14], and in the second step we appeal to the strong law of large numbers (SLLN). We remind the reader of the strong law:

###### Definition III.1.

Strong Law of Large Numbers [15]: Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables with finite mean $\mu$. Then

$$\Pr\left(\lim_{n \to \infty} \frac{1}{n}(X_1 + X_2 + \ldots + X_n) = \mu\right) = 1. \tag{21}$$

In words, the partial averages converge almost surely to $\mu$.

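### III-A Step 1

We will make use of Berthier et al. [11, Theorem 14], restated here for convenience. Before proceeding we include a definition of uniformly pseudo-Lipschitz functions, which generalizes the idea of pseudo-Lipschitz functions given in Definition II.1.

###### Definition III.2.

Uniformly pseudo-Lipschitz functions [11]: A sequence (in $n$) of pseudo-Lipschitz functions $\{\phi_n\}_{n \in \mathbb{N}_{>0}}$ is called uniformly pseudo-Lipschitz of order $k$ if, denoting by $L_n$ the pseudo-Lipschitz constant of order $k$ of $\phi_n$, we have $L_n < \infty$ for each $n$ and $\limsup_{n \to \infty} L_n < \infty$.

Berthier et al. [11, Theorem 14] requires the following assumptions:

• (C1) The measurement matrix $A$ has i.i.d. Gaussian entries with mean $0$ and variance $1/m$.

• (C2) Define a sequence of denoisers $\tilde{\eta}^t_n: \mathbb{R}^n \to \mathbb{R}^n$ to be those that apply the denoiser $\eta_t$ defined in (2) element-wise on vector input. For each $t$, $\tilde{\eta}^t_n$ are uniformly Lipschitz. A function is uniformly Lipschitz in $n$ if the Lipschitz constant does not depend on $n$.

• (C3) $\|x\|/\sqrt{n}$ converges to a constant as $n \to \infty$.

• (C4) The limit $\sigma_w = \lim_{m \to \infty} \|w\|/\sqrt{m}$ is finite.

• (C5)-(C6) For any iterations $s, t \in \mathbb{N}$ and for any $2 \times 2$ covariance matrix $\Sigma$, the following limits exist, the first being (C5) and the second (C6):

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n \mathbb{E}_Z[x_i \eta_t(x_i + Z_i, \tilde{x}_i)] < \infty,$$

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n \mathbb{E}_{Z, Z'}[\eta_t(x_i + Z_i, \tilde{x}_i)\, \eta_s(x_i + Z'_i, \tilde{x}_i)] < \infty,$$

where $(Z, Z') \sim \mathcal{N}(0, \Sigma \otimes I_n)$, with $\otimes$ denoting the tensor product and $I_n$ the $n \times n$ identity matrix.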

###### Theorem III.1.

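Under the assumptions (C1)-(C6), for any sequences of uniformly pseudo-Lipschitz functions $\phi_m: \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}$ and $\psi_n: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$,

$$\lim_{m} \left(\phi_m(r^t, w) - \mathbb{E}_{Z_1}\left[\phi_m\left(w + \sqrt{\lambda_t^2 - \sigma_w^2}\, Z_1, w\right)\right]\right) \overset{p}{=} 0,$$

$$\lim_{n} \left(\psi_n(x^t + A^T r^t, x) - \mathbb{E}_{Z_2}\left[\psi_n(x + \lambda_t Z_2, x)\right]\right) \overset{p}{=} 0,$$

where $Z_1 \sim \mathcal{N}(0, I_m)$, $Z_2 \sim \mathcal{N}(0, I_n)$, $r^t$ and $x^t$ are defined in the AMP-SI recursion (3)-(4), and $\lambda_t$ in the SE (5).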

Now we demonstrate that our assumptions stated in Section II are enough to satisfy the assumptions needed to apply Theorem III.1.

Assumptions (A1) and (C1) are identical. We will show that (C2) follows from (A4), (C4) follows from (A2), and (C3) follows from (A3). Finally we show (C5) and (C6) follow from (A3) and (A4).

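First consider assumption (C2). The non-separable denoiser $\tilde{\eta}^t_n$ applies the AMP-SI denoiser $\eta_t$ defined in (2) entrywise to its vector inputs. From (A4), $\eta_t$ are Lipschitz continuous. Thus, for length-$n$ vectors $x_1, x_2$, and fixed SI $\tilde{x}$,

$$\|\tilde{\eta}^t_n(x_1) - \tilde{\eta}^t_n(x_2)\|^2 = \sum_{i=1}^n \left(\eta_t([x_1]_i, \tilde{x}_i) - \eta_t([x_2]_i, \tilde{x}_i)\right)^2 \leq \sum_{i=1}^n L^2 ([x_1]_i - [x_2]_i)^2 = L^2 \|x_1 - x_2\|^2,$$

and so

$$\|\tilde{\eta}^t_n(x_1) - \tilde{\eta}^t_n(x_2)\| \leq L \|x_1 - x_2\|.$$

The Lipschitz constant $L$ does not depend on $n$, so $\tilde{\eta}^t_n$ is uniformly Lipschitz.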

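Now consider assumption (C4). From (A2), the measurement noise $w$ in (1) has i.i.d. entries with zero mean and finite $\mathbb{E}[W^2]$. Then applying Definition III.1,

$$\lim_{m \to \infty} \frac{\|w\|^2}{m} = \lim_{m \to \infty} \frac{1}{m} \sum_{i=1}^m w_i^2 = \sigma_w^2 < \infty,$$

where the SLLN applies because the $w_i^2$ are i.i.d. with finite mean $\mathbb{E}[W^2] = \sigma_w^2$. The proof of (C3) similarly follows using the SLLN and the finiteness of $\mathbb{E}[X^2]$ given in assumption (A3).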

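We now show that (C5) is met. Recall $(Z, Z') \sim \mathcal{N}(0, \Sigma \otimes I_n)$, so that the entries $Z_i$ are i.i.d. $\mathcal{N}(0, \Sigma_{11})$. Define $u_i := x_i \mathbb{E}_Z[\eta_t(x_i + Z_i, \tilde{x}_i)]$ for $i = 1, \ldots, n$. By assumption (A3), the signal and side information $(X, \tilde{X})$ are sampled i.i.d. from the joint density $f(X, \tilde{X})$. It follows that $u_1, \ldots, u_n$ are also i.i.d., so by Definition III.1 if $\mathbb{E}|X \eta_t(X + Z, \tilde{X})| < \infty$ where $Z \sim \mathcal{N}(0, \Sigma_{11})$ is independent of $(X, \tilde{X})$, then

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n x_i \mathbb{E}_Z[\eta_t(x_i + Z_i, \tilde{x}_i)] = \mathbb{E}[X \eta_t(X + Z, \tilde{X})].$$

We will now show that $\mathbb{E}|X \eta_t(X + Z, \tilde{X})| < \infty$.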

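First note that (A4) assumes $\eta_t$ is Lipschitz, meaning for scalars $a_1, a_2, b_1, b_2$ and some constant $L > 0$,

$$|\eta_t(a_1, b_1) - \eta_t(a_2, b_2)| \leq L \|(a_1, b_1) - (a_2, b_2)\| \leq L |a_1 - a_2| + L |b_1 - b_2|.$$

Therefore letting $a_2 = b_2 = 0$ we have

$$|\eta_t(a_1, b_1)| - |\eta_t(0, 0)| \leq |\eta_t(a_1, b_1) - \eta_t(0, 0)| \leq L |a_1| + L |b_1|,$$

giving the following upper bound for the constant $L' = \max\{L, |\eta_t(0, 0)|\}$,

$$|\eta_t(a_1, b_1)| \leq L' (1 + |a_1| + |b_1|). \tag{22}$$

Now using (22) and the triangle inequality,

$$\begin{aligned}
\mathbb{E}|X \eta_t(X + Z, \tilde{X})| &\leq L' \mathbb{E}\left[|X| (1 + |X + Z| + |\tilde{X}|)\right] \\
&\leq L' \left(\mathbb{E}|X| + \mathbb{E}[X^2] + \mathbb{E}|X|\, \mathbb{E}|Z| + \mathbb{E}|X \tilde{X}|\right).
\end{aligned} \tag{23}$$

Finally, by assumption (A3) we have that $\mathbb{E}[X^2]$, $\mathbb{E}[\tilde{X}^2]$, and $\mathbb{E}|X \tilde{X}|$ are all finite. Then noting that for any random variable $Y$, $\mathbb{E}|Y| \leq \sqrt{\mathbb{E}[Y^2]}$ by Jensen's inequality, we have $\mathbb{E}|X| < \infty$ and $\mathbb{E}|Z| < \infty$, meaning the boundedness of $\mathbb{E}|X \eta_t(X + Z, \tilde{X})|$ follows from (23) with assumption (A3).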

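The proof of (C6) follows similarly to the proof of (C5). Recall $(Z, Z') \sim \mathcal{N}(0, \Sigma \otimes I_n)$. Define $v_i := \mathbb{E}_{Z, Z'}[\eta_t(x_i + Z_i, \tilde{x}_i)\, \eta_s(x_i + Z'_i, \tilde{x}_i)]$ for $i = 1, \ldots, n$. By assumption (A3), the signal and side information $(X, \tilde{X})$ are sampled i.i.d. from the joint density $f(X, \tilde{X})$. It follows that $v_1, \ldots, v_n$ are also i.i.d., so by Definition III.1 if $\mathbb{E}|\eta_t(X + Z, \tilde{X})\, \eta_s(X + Z', \tilde{X})| < \infty$ where $Z \sim \mathcal{N}(0, \Sigma_{11})$ and $Z' \sim \mathcal{N}(0, \Sigma_{22})$, independent of $(X, \tilde{X})$, then

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n \mathbb{E}_{Z, Z'}[\eta_t(x_i + Z_i, \tilde{x}_i)\, \eta_s(x_i + Z'_i, \tilde{x}_i)] = \mathbb{E}[\eta_t(X + Z, \tilde{X})\, \eta_s(X + Z', \tilde{X})].$$

We will now show that $\mathbb{E}|\eta_t(X + Z, \tilde{X})\, \eta_s(X + Z', \tilde{X})| < \infty$. Using the bound (22),

$$\begin{aligned}
\mathbb{E}[\eta_t(X + Z, \tilde{X})\, \eta_s(X + Z', \tilde{X})] &\leq \mathbb{E}\left[|\eta_t(X + Z, \tilde{X})|\, |\eta_s(X + Z', \tilde{X})|\right] \\
&\leq L'^2\, \mathbb{E}\left[(1 + |X + Z| + |\tilde{X}|)(1 + |X + Z'| + |\tilde{X}|)\right].
\end{aligned}$$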

Then using the triangle inequality,

 E[(1+|X+Z|+|˜X|)(1+|X+Z′|+|˜X|)] (24) ≤E[(1+|X|+|Z|+|˜X|)(1+|X|+|Z′|+|˜X|)] =1+2E[|X|]+2E[|˜X|]+2E[|X||˜X|]+E[X2]+E[˜X2] +E[|X||Z′|]+E[|X||Z|]+E[|˜X||Z′|]+E[|˜X||Z|] +E|Z|+E[|Z′