I Introduction
High-dimensional linear regression is a well-studied model that has been used in many applications, including compressed sensing [1], imaging [2], and machine learning and statistics [3]. The unknown signal $x \in \mathbb{R}^n$ is viewed through the linear model
$$y = Ax + w, \tag{1}$$
where $y \in \mathbb{R}^m$ are the measurements, $A \in \mathbb{R}^{m \times n}$ is a known measurement matrix, and $w \in \mathbb{R}^m$ is measurement noise. The goal is to estimate the unknown signal $x$ having knowledge only of the noisy measurements $y$ and the measurement matrix $A$. When the problem is underdetermined (i.e., $m < n$), successful reconstruction requires exploiting structural or probabilistic characteristics of the input signal $x$. Often a prior distribution on the input signal is assumed, and in this case approximate message passing (AMP) algorithms [1] can be used for the reconstruction task.

AMP [1, 4] is a class of low-complexity algorithms for efficiently solving high-dimensional regression tasks (1). AMP works by iteratively generating estimates of the unknown input vector, $x$, using a possibly nonlinear denoiser function tailored to any prior knowledge about $x$. One favorable feature of AMP is that under some technical conditions on the measurement matrix $A$ and $x$, the observations at each iteration of the algorithm are almost surely equal in distribution to $x$ plus independent and identically distributed (i.i.d.) Gaussian noise in the large system limit.

AMP with Side Information (AMP-SI): In information theory [5], when different communication systems share side information (SI), overall communication can become more efficient. Recently [6, 7], a novel algorithmic framework, referred to as AMP-SI, has been introduced for incorporating SI into AMP for high-dimensional regression tasks (1). AMP-SI has been empirically demonstrated to have good reconstruction quality and is easy to use. For example, we have proposed to use AMP-SI for channel estimation in emerging millimeter wave communication systems [8], where the time dynamics of the channel structure allow previous channel estimates to be used as SI when estimating the current channel structure [7].
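We model each entry of the observed SI, denoted by $\tilde{x} \in \mathbb{R}^n$, as depending statistically on the corresponding entry of the unknown signal $x$ through some joint probability density function (pdf), $f(X, \tilde{X})$. AMP-SI uses a conditional denoiser, $\eta_t$, to incorporate SI,
$$\eta_t(a, b) = \mathbb{E}\big[X \mid X + \lambda_t Z = a,\ \tilde{X} = b\big], \tag{2}$$
where $Z$ is a standard Gaussian independent of $(X, \tilde{X})$ and $\lambda_t$ is given by the SE recursion (5).
The AMP-SI algorithm iteratively updates estimates of the input signal $x$: let $x^0 = 0$, the all-zeros vector, then for $t \geq 0$,
$$r^t = y - A x^t + \frac{r^{t-1}}{\delta} \big\langle \eta'_{t-1}(x^{t-1} + A^T r^{t-1}, \tilde{x}) \big\rangle, \tag{3}$$
$$x^{t+1} = \eta_t(x^t + A^T r^t, \tilde{x}), \tag{4}$$
where $x^t \in \mathbb{R}^n$ is the estimate of the input signal at iteration $t$, $\delta = m/n$ is the measurement rate, and the denoiser in (2) is applied entrywise to vector inputs. The derivative $\eta'_t$ is with respect to the first input, and $\langle u \rangle$ is the empirical average of a vector $u \in \mathbb{R}^n$, i.e., $\langle u \rangle = \frac{1}{n} \sum_{i=1}^n u_i$. Using the denoiser in (2), the AMP-SI algorithm (3)-(4) provides the minimum mean squared error (MMSE) estimate of the signal when SI is available [6].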
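As a rough illustration of how iterations (3)-(4) might be implemented, the sketch below runs AMP-SI in NumPy for a Gaussian signal whose SI is the signal plus Gaussian noise. Several pieces are assumptions rather than part of the text above: the closed-form posterior-mean denoiser for this Gaussian case, the common practice of estimating $\lambda_t^2$ by $\|r^t\|^2/m$, the measurement rate $\delta = m/n$ in the correction term, and all function names and parameter values.

```python
import numpy as np

def gg_denoiser(a, b, lam2, sx2, st2):
    """Posterior mean of X ~ N(0, sx2) given a = X + N(0, lam2) and b = X + N(0, st2)."""
    denom = sx2 * (st2 + lam2) + st2 * lam2
    return (sx2 * st2 * a + sx2 * lam2 * b) / denom

def gg_denoiser_slope(lam2, sx2, st2):
    """Derivative of gg_denoiser with respect to its first input (a constant here)."""
    return sx2 * st2 / (sx2 * (st2 + lam2) + st2 * lam2)

def amp_si(y, A, x_si, sx2, st2, n_iter=15):
    """Run AMP-SI and return the list of estimates x^1, ..., x^{n_iter}."""
    m, n = A.shape
    delta = m / n
    x = np.zeros(n)      # x^0 = 0
    r = np.zeros(m)      # placeholder for the residual before the first iteration
    onsager = 0.0        # no correction term in the first residual
    estimates = []
    for _ in range(n_iter):
        r = y - A @ x + onsager * r                           # residual step
        lam2 = r @ r / m                                      # plug-in estimate of lambda_t^2 (assumption)
        x = gg_denoiser(x + A.T @ r, x_si, lam2, sx2, st2)    # denoising step
        onsager = gg_denoiser_slope(lam2, sx2, st2) / delta   # averaged slope / delta, used next round
        estimates.append(x)
    return estimates

# Hypothetical problem sizes and variances, chosen only for illustration.
rng = np.random.default_rng(0)
n, m = 2000, 1000
sx2, st2, sw2 = 1.0, 0.1, 0.01
x_true = rng.normal(0.0, np.sqrt(sx2), n)
x_si = x_true + rng.normal(0.0, np.sqrt(st2), n)
A = rng.normal(0.0, np.sqrt(1.0 / m), (m, n))
y = A @ x_true + rng.normal(0.0, np.sqrt(sw2), m)

estimates = amp_si(y, A, x_si, sx2, st2)
mse = [np.mean((x_hat - x_true) ** 2) for x_hat in estimates]
print("MSE per iteration:", np.round(mse, 4))
```

Because the Gaussian denoiser is linear, its derivative is the same at every entry, so the empirical average in the correction term reduces to a single scalar. A nonlinear denoiser would instead average the entrywise derivatives.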
State Evolution (SE): It has been proven that the performance of AMP, as measured, for example, by the normalized squared error $\frac{1}{n}\|x^t - x\|^2$ between the estimate $x^t$ and true signal $x$, can be accurately predicted by a scalar recursion referred to as SE [9, 10] when the measurement matrix $A$ is i.i.d. Gaussian under various assumptions on the elements of the signal. The SE equation for AMP-SI is as follows. Assume the entries of the noise $w$ are i.i.d. with $\mathbb{E}[W^2] = \sigma_w^2$, and let $\lambda_0^2 = \sigma_w^2 + \mathbb{E}[X^2]/\delta$ with $\delta = m/n$. Then for $t \geq 1$,
$$\lambda_t^2 = \sigma_w^2 + \frac{1}{\delta}\, \mathbb{E}\Big[\big(\eta_{t-1}(X + \lambda_{t-1} Z, \tilde{X}) - X\big)^2\Big], \tag{5}$$
where $(X, \tilde{X}) \sim f(X, \tilde{X})$ are independent of $Z \sim \mathcal{N}(0, 1)$, and where we use $\mathcal{N}(\mu, \sigma^2)$ to denote a Gaussian distribution with mean $\mu$ and variance $\sigma^2$.

Considering AMP-SI (3)-(4), however, we cannot directly apply the existing AMP theoretical results [9, 10], as the conditional denoiser (2) depends on the index $i$ through the SI, meaning that different scalar denoisers will be used at different indices within the AMP-SI iterations. Recent results [11], however, extend the asymptotic SE analysis to a larger class of possible denoisers, allowing, for example, each element of the input to use a different nonlinear denoiser, as is the case in AMP-SI. We employ these results to rigorously relate the SE presented in (5) to the AMP-SI algorithm in (3)-(4).
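The SE recursion can be approximated numerically by replacing the expectation with a sample average, as in the sketch below. It assumes the usual AMP form of the recursion (noise variance plus the denoiser's mean squared error divided by $\delta = m/n$, started from $\sigma_w^2 + \mathbb{E}[X^2]/\delta$). The function names, the Gaussian example denoiser, and the parameter values are all hypothetical.

```python
import numpy as np

def se_monte_carlo(denoiser, x, x_si, sw2, delta, n_iter, rng):
    """Approximate lambda_0^2, ..., lambda_{n_iter}^2 using sample averages over (x, x_si)."""
    lam2 = sw2 + np.mean(x ** 2) / delta               # lambda_0^2
    path = [lam2]
    for _ in range(n_iter):
        z = rng.standard_normal(x.size)                # fresh standard Gaussian noise
        x_hat = denoiser(x + np.sqrt(lam2) * z, x_si, lam2)
        lam2 = sw2 + np.mean((x_hat - x) ** 2) / delta
        path.append(lam2)
    return np.array(path)

# Example: Gaussian signal with Gaussian-noise SI (illustrative values only).
sx2, st2, sw2, delta = 1.0, 0.1, 0.01, 0.5

def gaussian_denoiser(a, b, lam2):
    """Posterior mean of X ~ N(0, sx2) given a = X + N(0, lam2) and b = X + N(0, st2)."""
    return (sx2 * st2 * a + sx2 * lam2 * b) / (sx2 * (st2 + lam2) + st2 * lam2)

rng = np.random.default_rng(0)
x = rng.normal(0.0, np.sqrt(sx2), 400_000)
x_si = x + rng.normal(0.0, np.sqrt(st2), x.size)
lam2_path = se_monte_carlo(gaussian_denoiser, x, x_si, sw2, delta, 10, rng)

# Predicted MSE at each step is delta * (lambda_t^2 - sigma_w^2).
print("predicted MSE:", np.round(delta * (lam2_path[1:] - sw2), 4))
```

The denoiser is passed in as a function, so the same routine can be reused with any conditional denoiser that accepts the pseudo-data, the SI, and the current $\lambda_t^2$.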
Related Work:
While integrating SI into reconstruction algorithms is not new, AMP-SI introduces a unified framework within AMP supporting arbitrary signal and SI dependencies. Prior work using SI has been either heuristic, limited to specific applications, or outside the AMP framework.
For example, Wang and Liang [12] integrate SI into AMP for a specific signal prior density, but the method is difficult to apply to other signal models. Ziniel and Schniter [13] develop an AMP-based reconstruction algorithm for a time-varying signal model based on Markov processes for the support and amplitude. This signal model is easily incorporated into the AMP-SI framework, as discussed in the analysis of the birth-death-drift model of [6, 7]. Manoel et al. implement an AMP-based algorithm in which the input signal is repeatedly reconstructed in a streaming fashion, and information from past reconstruction attempts is aggregated into a prior, thus improving ongoing reconstruction results [14]. This reconstruction scheme resembles that of AMP-SI, in particular when the Bernoulli-Gaussian model is used (see Section II-B).
Contribution and Outline: Ma et al. use numerical experiments to show that SE (5) accurately tracks the performance of AMP-SI (3)-(4) [7], as was shown rigorously for standard AMP. Ma et al. conjecture that rigorous theoretical guarantees can be given for AMP-SI as well [7]. In this work, we analyze AMP-SI performance when the input signal and SI are drawn i.i.d. according to a general pdf obeying some finite moment conditions, the AMP-SI denoiser (2) is Lipschitz, and the measurement matrix is i.i.d. Gaussian.
II Main Results
II-A Main Theorem
Our main result provides AMP-SI performance guarantees when considering pseudo-Lipschitz loss functions, which we define in the following.
Definition II.1.
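Pseudo-Lipschitz functions [9]: For $k \in \mathbb{N}_{>0}$ and any $n, m \in \mathbb{N}_{>0}$, a function $\phi: \mathbb{R}^n \to \mathbb{R}^m$ is pseudo-Lipschitz of order $k$ if there exists a constant $L$, referred to as the pseudo-Lipschitz constant of $\phi$, such that for any $x, y \in \mathbb{R}^n$,
$$\|\phi(x) - \phi(y)\| \leq L \big(1 + \|x\|^{k-1} + \|y\|^{k-1}\big) \|x - y\|. \tag{6}$$
For $k = 1$, this definition coincides with the standard definition of a Lipschitz function. Throughout this work, $\|\cdot\|$ denotes the Euclidean norm.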
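As a brief worked example (not taken from the source, and using the unnormalized inequality $|\phi(u) - \phi(v)| \leq L(1 + \|u\|^{k-1} + \|v\|^{k-1})\|u - v\|$ as the working form of the definition), the squared-error loss $\phi(a, b) = (a - b)^2$ on $\mathbb{R}^2$ is pseudo-Lipschitz of order 2 with constant $L = 2$. Writing $u = (a_1, b_1)$ and $v = (a_2, b_2)$:

```latex
\begin{align*}
|\phi(u) - \phi(v)|
  &= \big|(a_1 - b_1) - (a_2 - b_2)\big| \cdot \big|(a_1 - b_1) + (a_2 - b_2)\big| \\
  &\le \sqrt{2}\,\|u - v\| \cdot \sqrt{2}\,\big(\|u\| + \|v\|\big) \\
  &\le 2\,\big(1 + \|u\| + \|v\|\big)\,\|u - v\|.
\end{align*}
```

Both factors are bounded using $|p| + |q| \leq \sqrt{2}\,\|(p, q)\|$. The same loss is not Lipschitz in the ordinary sense, since its slope grows with $\|u\|$, which is why the weaker pseudo-Lipschitz condition is the natural one for squared-error performance measures.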
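We are now ready to state our main result. Throughout the paper we let $\overset{p}{=}$ denote convergence in probability.
Theorem II.1.
For any order-$2$ pseudo-Lipschitz functions $\phi: \mathbb{R}^2 \to \mathbb{R}$ and $\psi: \mathbb{R}^3 \to \mathbb{R}$, assume the following.

(A1) The measurement matrix $A$ has i.i.d. Gaussian entries with mean $0$ and variance $1/m$.

(A2) The noise $w$ is i.i.d. with finite $\mathbb{E}[W^2]$.

(A3) The signal and SI $(X, \tilde{X})$ are sampled i.i.d. from $f(X, \tilde{X})$ with finite $\mathbb{E}[X^2]$, finite $\mathbb{E}[\tilde{X}^2]$, and finite $\mathbb{E}[X \tilde{X}]$.

(A4) For $t \geq 0$, the denoisers $\eta_t$ defined in (2) are Lipschitz continuous, meaning for scalars $a_1, a_2, b_1, b_2$ and constant $L > 0$,
$$|\eta_t(a_1, b_1) - \eta_t(a_2, b_2)| \leq L\, \|(a_1, b_1) - (a_2, b_2)\|.$$
Then,
$$\lim_{m \to \infty} \frac{1}{m} \sum_{i=1}^{m} \phi(r^t_i, w_i) \overset{p}{=} \mathbb{E}\Big[\phi\big(W + \sqrt{\lambda_t^2 - \sigma_w^2}\, Z_1,\ W\big)\Big], \qquad \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \psi\big(x^t_i + [A^T r^t]_i,\ x_i,\ \tilde{x}_i\big) \overset{p}{=} \mathbb{E}\big[\psi(X + \lambda_t Z_2,\ X,\ \tilde{X})\big], \tag{7}$$
where $Z_1, Z_2$ are standard Gaussians, independent of $W$ and $(X, \tilde{X})$. In the above, $r^t$ and $x^t$ are defined in the AMP-SI recursion (3)-(4), and $\lambda_t$ in the SE (5).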
Section III contains the proof of Theorem II.1. The proof follows from Berthier et al. [11, Theorem 14] and the strong law of large numbers. The main technical work is showing that our assumptions are enough to satisfy the assumptions needed for [11, Theorem 14].

As a concrete example of how Theorem II.1 provides performance guarantees for AMP-SI, let us consider a few interesting pseudo-Lipschitz loss functions.
II-B Examples
Next, we consider a few signal and SI models to show how one can derive the denoiser in (2), use this to construct the AMP-SI algorithm and the SE, and apply Theorem II.1. Before we get to the examples, we state a lemma showing that functions with bounded derivatives are Lipschitz.
Lemma II.2.
A function having bounded derivatives,
is Lipschitz continuous with Lipschitz constant .
Proof.
The result follows using the triangle inequality and Cauchy-Schwarz,
(8) 
∎
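One way the argument might be written out is sketched below, assuming a function $\phi: \mathbb{R}^2 \to \mathbb{R}$ whose two partial derivatives are bounded in absolute value by constants $B_1$ and $B_2$ (names chosen here for illustration). The resulting constant $\sqrt{B_1^2 + B_2^2}$ is what this particular route yields and need not match the constant in the lemma statement.

```latex
\begin{align*}
|\phi(a_1, b_1) - \phi(a_2, b_2)|
  &\le |\phi(a_1, b_1) - \phi(a_2, b_1)| + |\phi(a_2, b_1) - \phi(a_2, b_2)|
     && \text{(triangle inequality)} \\
  &\le B_1\, |a_1 - a_2| + B_2\, |b_1 - b_2|
     && \text{(mean value theorem in each coordinate)} \\
  &\le \sqrt{B_1^2 + B_2^2}\; \big\|(a_1, b_1) - (a_2, b_2)\big\|
     && \text{(Cauchy-Schwarz)}.
\end{align*}
```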
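II-B1 Gaussian-Gaussian Signal and SI
In this model, referred to as the GG model henceforth, the signal has i.i.d. Gaussian entries with zero mean and finite variance $\sigma_x^2$, and we have access to SI in the form of the signal with additive white Gaussian noise (AWGN). The signal, $X$, and SI, $\tilde{X}$, are related by
$$\tilde{X} = X + \mathcal{N}(0, \tilde{\sigma}^2). \tag{9}$$
In this case, the AMP-SI denoiser (2) equals [7]
$$\eta_t(a, b) = \mathbb{E}\big[X \mid X + \lambda_t Z = a,\ \tilde{X} = b\big] = \frac{\sigma_x^2 \tilde{\sigma}^2\, a + \sigma_x^2 \lambda_t^2\, b}{\sigma_x^2 (\tilde{\sigma}^2 + \lambda_t^2) + \tilde{\sigma}^2 \lambda_t^2}. \tag{10}$$
Then the SE (5) can be computed as
$$\lambda_t^2 = \sigma_w^2 + \frac{1}{\delta} \cdot \frac{\sigma_x^2 \tilde{\sigma}^2 \lambda_{t-1}^2}{\sigma_x^2 (\tilde{\sigma}^2 + \lambda_{t-1}^2) + \tilde{\sigma}^2 \lambda_{t-1}^2}. \tag{11}$$
We note that the denoiser in (10) is Lipschitz continuous as a result of Lemma II.2 because
$$0 \leq \frac{\partial \eta_t(a, b)}{\partial a} = \frac{\sigma_x^2 \tilde{\sigma}^2}{\sigma_x^2 (\tilde{\sigma}^2 + \lambda_t^2) + \tilde{\sigma}^2 \lambda_t^2} \leq 1$$
and
$$0 \leq \frac{\partial \eta_t(a, b)}{\partial b} = \frac{\sigma_x^2 \lambda_t^2}{\sigma_x^2 (\tilde{\sigma}^2 + \lambda_t^2) + \tilde{\sigma}^2 \lambda_t^2} \leq 1,$$
and therefore the assumptions are satisfied in the GG case and we can apply Theorem II.1.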
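The closed-form GG denoiser can be sanity-checked numerically. For jointly Gaussian variables the conditional mean coincides with the best linear predictor, so a least-squares fit of the signal on the pseudo-data and the SI should recover the denoiser's two weights, which are also its two partial derivatives. The sketch below does this under assumed notation (`sx2` for the signal variance, `st2` for the SI noise variance, `lam2` for $\lambda_t^2$) and with illustrative values; the weights are derived from standard Gaussian conditioning rather than copied from (10).

```python
import numpy as np

def gg_weights(lam2, sx2, st2):
    """Weights on (a, b) of the linear posterior-mean denoiser, i.e. its partial derivatives."""
    denom = sx2 * (st2 + lam2) + st2 * lam2
    return sx2 * st2 / denom, sx2 * lam2 / denom

def gg_se_step(lam2_prev, sx2, st2, sw2, delta):
    """One closed-form SE update: noise variance plus posterior variance over delta."""
    post_var = sx2 * st2 * lam2_prev / (sx2 * (st2 + lam2_prev) + st2 * lam2_prev)
    return sw2 + post_var / delta

rng = np.random.default_rng(1)
sx2, st2, lam2 = 1.0, 0.3, 0.5                 # hypothetical variances
n = 400_000
x = rng.normal(0.0, np.sqrt(sx2), n)
a = x + rng.normal(0.0, np.sqrt(lam2), n)      # pseudo-data: signal plus effective noise
b = x + rng.normal(0.0, np.sqrt(st2), n)       # side information: signal plus AWGN

# Least-squares regression of x on (a, b) without intercept (all variables are zero mean).
fitted, *_ = np.linalg.lstsq(np.column_stack([a, b]), x, rcond=None)
w_a, w_b = gg_weights(lam2, sx2, st2)
empirical_mse = np.mean((w_a * a + w_b * b - x) ** 2)

print("closed-form weights:", w_a, w_b)
print("fitted weights:     ", fitted)
print("empirical MSE:", empirical_mse, "posterior variance:", gg_se_step(lam2, sx2, st2, 0.0, 1.0))
```

Both weights lie between 0 and 1 for any positive variances, which is the bounded-derivative property that Lemma II.2 needs.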
II-B2 Bernoulli-Gaussian Signal and SI
The Bernoulli-Gaussian (BG) model reflects a scenario in which one wishes to recover a sparse signal and has access to SI in the form of the signal with AWGN as in (9). In this model, each entry of the signal is independently generated according to , where is the Dirac delta function at . In words, the entries of the signal independently take the value with probability and are with probability . In this case, the AMP-SI denoiser (2) equals [7]
(12)  
where, letting be the zeromean Gaussian density with variance evaluated at , and defining ,
(13) 
where we denote
(14) 
Then the SE (5) can be computed as
(15) 
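For readers who want to experiment with the BG case, the sketch below computes a conditional-mean denoiser directly from Bayes' rule. It is derived here under assumed notation and is not a transcription of (12)-(14): `eps` is the probability that an entry is nonzero, nonzero entries are taken to be zero-mean Gaussian with variance `sx2`, `st2` is the SI noise variance, and `lam2` stands for $\lambda_t^2$. The grid at the end gives a numerical view of the denoiser's slopes, which is the quantity Lemma II.2 asks to be bounded.

```python
import numpy as np

def bg_denoiser(a, b, lam2, st2, eps, sx2=1.0):
    """E[X | X + N(0, lam2) = a, X + N(0, st2) = b] for X ~ eps*N(0, sx2) + (1 - eps)*delta_0."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    det = sx2 * st2 + sx2 * lam2 + lam2 * st2
    # Log-likelihood of (a, b) if X = 0: two independent Gaussians (shared constants dropped).
    log_f0 = -0.5 * (a ** 2 / lam2 + b ** 2 / st2) - 0.5 * np.log(lam2 * st2)
    # Log-likelihood if X ~ N(0, sx2): (a, b) is bivariate Gaussian with
    # covariance [[sx2 + lam2, sx2], [sx2, sx2 + st2]], whose determinant is det.
    quad = ((sx2 + st2) * a ** 2 - 2.0 * sx2 * a * b + (sx2 + lam2) * b ** 2) / det
    log_f1 = -0.5 * quad - 0.5 * np.log(det)
    # Posterior probability that the entry is nonzero, via a clipped logistic function.
    log_odds = np.log(eps) - np.log1p(-eps) + log_f1 - log_f0
    p_nonzero = 1.0 / (1.0 + np.exp(-np.clip(log_odds, -700.0, 700.0)))
    # Posterior mean given that the entry is nonzero (the Gaussian-Gaussian formula).
    mean_nonzero = (sx2 * st2 * a + sx2 * lam2 * b) / det
    return p_nonzero * mean_nonzero

# Finite-difference slopes on a grid of (a, b) values, with illustrative parameters.
grid = np.linspace(-6.0, 6.0, 241)
aa, bb = np.meshgrid(grid, grid, indexing="ij")
values = bg_denoiser(aa, bb, lam2=0.4, st2=0.3, eps=0.2)
slope_a, slope_b = np.gradient(values, grid, grid)
print("largest |slope| in a:", np.abs(slope_a).max())
print("largest |slope| in b:", np.abs(slope_b).max())
```

The denoiser factors into the posterior probability that the entry is nonzero times the posterior mean given that it is nonzero. Working with log-likelihoods keeps the probability computation stable when the inputs are far from zero.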
We again use Lemma II.2 to show that the denoiser defined in (12) and (13) is Lipschitz continuous, so that the assumptions are satisfied in the BG case and we can apply Theorem II.1. We study the partial derivatives. Denote
(16) 
Combining (13), (14), and (16),
Then,
(17) 
Now we show upper bounds for the two terms of (17) separately. For the first term, we see that , so
Now consider the second term of (17). First we note that
then using that , we have
(18) 
To upper bound the above, we use when , and so
Using this in (18), we find
(19) 
where in the final inequality we use by (15), and
(20)  
Using the above in (17), we have
As in (17) we can show
Then,
and a bound as in (18)-(19) gives
II-C Numerical Examples
Finally, we provide numerical results to compare the empirical mean squared error (MSE) performance of AMP-SI and the performance predicted by SE. Fig. 1 shows the MSE achieved by AMP-SI in the GG scenario and the SE prediction of its performance. In this example, the signal variance , the measurement noise variance , the variance of AWGN in SI . We averaged over 10 trials of a GG recovery problem for the empirical results of AMP-SI. Fig. 1(a), Fig. 1(b), and Fig. 1(c) give the comparison for three different signal lengths. For smaller signal lengths there is some gap between the empirical MSE and the SE prediction, as shown in Fig. 1 for , but the gap shrinks as the signal length is increased. The results show that the empirical MSE tracks the SE prediction closely.
Fig. 2 shows the MSE achieved by AMP-SI in the BG scenario, and the SE prediction of its performance. We again averaged over 10 trials of a BG recovery problem for the empirical results of AMP-SI. The signal length , , the measurement noise variance , and , where of the entries in the signal are nonzero. We vary the variance of AWGN in SI from , , and . The results show that SE can predict the MSE achieved by AMP-SI at every iteration.
III Proof of Theorem II.1
The proof of Theorem II.1 contains two steps. In the first step we use Berthier et al. [11, Theorem 14], and in the second step we appeal to the strong law of large numbers (SLLN). We remind the reader of the strong law:
Definition III.1.
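Strong Law of Large Numbers [15]: Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables with finite mean $\mu$. Then
$$\Pr\Big(\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i = \mu\Big) = 1. \tag{21}$$
In words, the partial averages $\frac{1}{n} \sum_{i=1}^{n} X_i$ converge almost surely to $\mu$.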
III-A Step 1
We will make use of Berthier et al. [11, Theorem 14], restated here for convenience. Before proceeding, we include a definition of uniformly pseudo-Lipschitz functions, which generalizes the idea of pseudo-Lipschitz functions given in Definition II.1.
Definition III.2.
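Uniformly pseudo-Lipschitz functions [11]: A sequence (in $n$) of pseudo-Lipschitz functions $\{\phi_n\}_{n \in \mathbb{N}_{>0}}$ is called uniformly pseudo-Lipschitz of order $k$ if, denoting by $L_n$ the pseudo-Lipschitz constant of order $k$ of $\phi_n$, we have $L_n < \infty$ for each $n$ and $\limsup_{n \to \infty} L_n < \infty$.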
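Berthier et al. [11, Theorem 14] requires the following assumptions:

(C1) The measurement matrix $A$ has i.i.d. Gaussian entries with mean $0$ and variance $1/m$.

(C2) Define a sequence of denoisers $\eta_t^n: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ to be those that apply the denoiser $\eta_t$ defined in (2) elementwise on vector input. For each $t$, $\eta_t^n(\cdot, \tilde{x})$ are uniformly Lipschitz. A function is uniformly Lipschitz in $n$ if the Lipschitz constant does not depend on $n$.

(C3) $\|x\|/\sqrt{n}$ converges to a constant as $n \to \infty$.

(C4) The limit $\sigma_w = \lim_{m \to \infty} \|w\|/\sqrt{m}$ is finite.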

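(C5)-(C6) For any iterations $s, t$ and for any $2 \times 2$ covariance matrix $\Sigma$, the following limits exist.
$$\text{(C5)} \quad \lim_{n \to \infty} \frac{1}{n}\, \mathbb{E}_{Z}\big[x^T \eta_t^n(x + Z, \tilde{x})\big], \qquad \text{(C6)} \quad \lim_{n \to \infty} \frac{1}{n}\, \mathbb{E}_{Z, Z'}\big[\eta_s^n(x + Z, \tilde{x})^T \eta_t^n(x + Z', \tilde{x})\big],$$
where $(Z, Z') \sim \mathcal{N}(0, \Sigma \otimes I_n)$, with $\otimes$ denoting the tensor product and $I_n$ the $n \times n$ identity matrix.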
Theorem III.1.
Now we demonstrate that our assumptions stated in Section II are enough to satisfy the assumptions needed to apply Theorem III.1.
Assumptions (A1) and (C1) are identical. We will show that (C2) follows from (A4), (C4) follows from (A2), and (C3) follows from (A3). Finally we show (C5) and (C6) follow from (A3) and (A4).
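First consider assumption (C2). The non-separable denoiser $\eta_t^n$ applies the AMP-SI denoiser $\eta_t$ defined in (2) entrywise to its vector inputs. From (A4), $\eta_t$ are Lipschitz continuous with some constant $L$. Thus, for length-$n$ vectors $u, v$ and fixed SI $\tilde{x}$,
$$\|\eta_t^n(u, \tilde{x}) - \eta_t^n(v, \tilde{x})\|^2 = \sum_{i=1}^{n} \big(\eta_t(u_i, \tilde{x}_i) - \eta_t(v_i, \tilde{x}_i)\big)^2 \leq L^2 \sum_{i=1}^{n} (u_i - v_i)^2 = L^2 \|u - v\|^2,$$
and so
$$\frac{\|\eta_t^n(u, \tilde{x}) - \eta_t^n(v, \tilde{x})\|}{\sqrt{n}} \leq L\, \frac{\|u - v\|}{\sqrt{n}}.$$
The Lipschitz constant $L$ does not depend on $n$, so $\eta_t^n(\cdot, \tilde{x})$ is uniformly Lipschitz.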
Now consider assumption (C4). From (A2), the measurement noise in (1) has i.i.d. entries with zero mean and finite . Then applying Definition III.1,
where we have used that follows from for . The proof of (C3) similarly follows using the SLLN and the finiteness of given in assumption (A3).
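The SLLN step behind (C4) can be illustrated numerically: for i.i.d. noise entries, the average of the squared entries settles at the second moment, so the normalized norm of the noise vector settles at its square root. The sketch below uses Gaussian noise and a variance of 0.09 purely as illustrative assumptions; the argument itself only needs a finite second moment.

```python
import numpy as np

rng = np.random.default_rng(0)
sw2 = 0.09                                        # hypothetical second moment of the noise
w = rng.normal(0.0, np.sqrt(sw2), 1_000_000)      # i.i.d. zero-mean noise entries

# Running value of ||w||^2 / m as the number of measurements m grows.
m_values = np.arange(1, w.size + 1)
running_avg = np.cumsum(w ** 2) / m_values

for m in (100, 10_000, 1_000_000):
    avg = running_avg[m - 1]
    print(f"m = {m:>9,d}   ||w||^2/m = {avg:.5f}   ||w||/sqrt(m) = {np.sqrt(avg):.5f}")
```

The second printed column is the partial average that the SLLN controls, and the third is its square root, which is the normalized norm appearing in (C4).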
We now show that (C5) is met. Recall . Define for . By assumption (A3), the signal and side information are sampled i.i.d. from the joint density . It follows that are also i.i.d., so by Definition III.1 if where independent of , then
We will now show that .
First note that (A4) assumes is Lipschitz, meaning for scalars and some constant ,
Therefore letting we have
giving the following upper bound for constant ,
(22) 
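The kind of linear-growth bound used here can be sketched as follows, assuming only that $\eta_t$ is Lipschitz with constant $L$ as in (A4). The reference point $(0, 0)$ and the constant $C$ are illustrative choices and may differ from those in (22).

```latex
\begin{align*}
|\eta_t(a, b)|
  &\le |\eta_t(0, 0)| + |\eta_t(a, b) - \eta_t(0, 0)| \\
  &\le |\eta_t(0, 0)| + L\, \|(a, b)\| \\
  &\le C\, \big(1 + |a| + |b|\big),
  \qquad C = \max\{|\eta_t(0, 0)|,\ L\}.
\end{align*}
```

The last step uses $\|(a, b)\| \leq |a| + |b|$. Together with the Cauchy-Schwarz inequality, for instance $\mathbb{E}|X \tilde{X}| \leq \sqrt{\mathbb{E}[X^2]\, \mathbb{E}[\tilde{X}^2]}$, a bound of this form makes products of the signal with the denoiser output integrable under finite second moments.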
Now using (22) and the triangle inequality,
(23)  
Finally, by assumption (A3) we have that and are all finite. Then noting that for any random variable, , we have for , meaning the boundedness of follows from (23) with assumption (A3).
The proof of (C6) follows similarly to the proof of (C5). Recall . Define for . By assumption (A3), the signal and side information are sampled i.i.d. from the joint density . It follows that are also i.i.d., so by Definition III.1 if where and , independent of , then
We will now show that . Using the bound (22),
Then using the triangle inequality,
(24)  