I Introduction
High-dimensional linear regression is a well-studied model that has been used in many applications including compressed sensing
[1], imaging [2], and machine learning and statistics
[3]. The unknown signal $x \in \mathbb{R}^N$ is viewed through the linear model

$$y = Ax + w, \qquad (1)$$

where $y \in \mathbb{R}^n$ are the measurements, $A \in \mathbb{R}^{n \times N}$ is a known measurement matrix, and $w \in \mathbb{R}^n$ is measurement noise. The goal is to estimate the unknown signal $x$ having knowledge only of the noisy measurements $y$ and the measurement matrix $A$. When the problem is under-determined (i.e., $n < N$), in order for reconstruction to be successful, it is necessary to exploit structural or probabilistic characteristics of the input signal $x$. Often a prior distribution on the input signal is assumed, and in this case approximate message passing (AMP) algorithms [1] can be used for the reconstruction task.

AMP [1, 4] is a class of low-complexity algorithms for efficiently solving high-dimensional regression tasks (1). AMP works by iteratively generating estimates $x^t$ of the unknown input vector, using a possibly non-linear denoiser function tailored to any prior knowledge about $x$. One favorable feature of AMP is that, under some technical conditions on the measurement matrix $A$ and the denoiser, the observations at each iteration of the algorithm are almost surely equal in distribution to the true signal $x$ plus independent and identically distributed (i.i.d.) Gaussian noise in the large system limit.

AMP with Side Information (AMP-SI): In information theory [5], when different communication systems share side information (SI), overall communication can become more efficient. Recently [6, 7], a novel algorithmic framework, referred to as AMP-SI, has been introduced for incorporating SI into AMP for high-dimensional regression tasks (1). AMP-SI has been empirically demonstrated to have good reconstruction quality and is easy to use. For example, we have proposed AMP-SI for channel estimation in emerging millimeter wave communication systems [8], where the time dynamics of the channel structure allow previous channel estimates to be used as SI when estimating the current channel structure [7].
We model each entry of the observed SI, denoted by $\tilde{x} \in \mathbb{R}^N$, as depending statistically on the corresponding entry of the unknown signal $x$ through some joint probability density function (pdf), $p_{X, \tilde{X}}$. AMP-SI uses a conditional denoiser, $\eta_t : \mathbb{R}^2 \to \mathbb{R}$, to incorporate SI:

$$\eta_t(v, u) = \mathbb{E}\big[X \,\big|\, X + \tau_t Z = v,\ \tilde{X} = u\big], \qquad (2)$$

where $Z \sim \mathcal{N}(0, 1)$ is independent of $(X, \tilde{X})$ and $\tau_t^2$ is given by the state evolution recursion (5) below.
The AMP-SI algorithm iteratively updates estimates of the input signal $x$: let $x^0 = 0$, the all-zeros vector, then

$$x^{t+1} = \eta_t\big(x^t + A^\top z^t,\ \tilde{x}\big), \qquad (3)$$

$$z^t = y - A x^t + \frac{z^{t-1}}{\delta}\, \big\langle \eta_{t-1}'\big(x^{t-1} + A^\top z^{t-1},\ \tilde{x}\big) \big\rangle, \qquad (4)$$

where $x^t$ is the estimate of the input signal at iteration $t$, $\delta = n/N$ is the sampling ratio, and the denoiser in (2) is applied entry-wise to vector inputs. The derivative $\eta_t'$ is with respect to the first input, and $\langle a \rangle$ is the empirical average of a vector $a \in \mathbb{R}^N$, i.e., $\langle a \rangle = \frac{1}{N} \sum_{i=1}^{N} a_i$. Using the denoiser in (2), the AMP-SI algorithm (3)-(4) provides the minimum mean squared error (MMSE) estimate of the signal when SI is available [6].
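To make the recursion concrete, the following sketch implements (3)-(4) for a generic entry-wise denoiser. Here `eta` and `eta_prime` stand for the conditional denoiser (2) and its derivative in the first argument, and estimating $\tau_t^2$ by $\|z^t\|^2 / n$ is a common practical choice rather than part of the algorithm's definition.

```python
import numpy as np

def amp_si(y, A, x_tilde, eta, eta_prime, num_iters=30):
    """Sketch of the AMP-SI recursion (3)-(4).

    eta(v, u, tau2) and eta_prime(v, u, tau2) apply the conditional
    denoiser (2) and its derivative in the first argument entry-wise;
    both are model-dependent (see the examples in Section II-B).
    """
    n, N = A.shape
    delta = n / N                       # sampling ratio
    x_hat = np.zeros(N)                 # x^0 = 0
    z = y.copy()                        # z^0 = y - A x^0
    for _ in range(num_iters):
        tau2 = np.sum(z ** 2) / n       # practical estimate of tau_t^2
        v = x_hat + A.T @ z             # pseudo-data x^t + A^T z^t
        x_hat = eta(v, x_tilde, tau2)   # signal update (3)
        # residual update (4) with the Onsager correction term
        z = y - A @ x_hat + (z / delta) * np.mean(eta_prime(v, x_tilde, tau2))
    return x_hat
```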
State Evolution (SE): It has been proven that the performance of AMP, as measured, for example, by the normalized squared $\ell_2$-error between the estimate $x^t$ and the true signal $x$, can be accurately predicted by a scalar recursion referred to as SE [9, 10] when the measurement matrix $A$ is i.i.d. Gaussian, under various assumptions on the elements of the signal. The SE equation for AMP-SI is as follows. Assume the entries of the noise $w$ are i.i.d. with $\mathbb{E}[W^2] = \sigma_w^2$, and let $\tau_0^2 = \sigma_w^2 + \mathbb{E}[X^2]/\delta$. Then for $t \geq 0$,

$$\tau_{t+1}^2 = \sigma_w^2 + \frac{1}{\delta}\, \mathbb{E}\Big[\big(\eta_t(X + \tau_t Z, \tilde{X}) - X\big)^2\Big], \qquad (5)$$

where $(X, \tilde{X}) \sim p_{X, \tilde{X}}$ are independent of $Z \sim \mathcal{N}(0, 1)$, where we use $\mathcal{N}(\mu, \sigma^2)$ to denote a Gaussian distribution with mean $\mu$ and variance $\sigma^2$.

Considering AMP-SI (3)-(4), however, we cannot directly apply the existing AMP theoretical results [9, 10], as the conditional denoiser (2) depends on the index $i$ through the SI, meaning that different scalar denoisers will be used at different indices within the AMP-SI iterations. Recent results [11], however, extend the asymptotic SE analysis to a larger class of possible denoisers, allowing, for example, each element of the input to use a different non-linear denoiser, as is the case in AMP-SI. We employ these results to rigorously relate the SE presented in (5) to the AMP-SI algorithm in (3)-(4).
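Before turning to related work, we note that the SE recursion (5) is easy to evaluate numerically even when the expectation has no closed form. The sketch below approximates it by Monte Carlo; `sample_signal_si` is a hypothetical sampler for i.i.d. draws from $p_{X, \tilde{X}}$, and `eta` is again the model-dependent denoiser (2).

```python
import numpy as np

def state_evolution(eta, sample_signal_si, sigma_w2, delta,
                    num_iters=30, num_mc=100_000, rng=None):
    """Monte Carlo approximation of the SE recursion (5)."""
    rng = np.random.default_rng() if rng is None else rng
    x, x_tilde = sample_signal_si(num_mc, rng)   # i.i.d. draws of (X, X~)
    tau2 = sigma_w2 + np.mean(x ** 2) / delta    # tau_0^2
    history = [tau2]
    for _ in range(num_iters):
        z = rng.standard_normal(num_mc)          # Z ~ N(0, 1), indep. of (X, X~)
        mse = np.mean((eta(x + np.sqrt(tau2) * z, x_tilde, tau2) - x) ** 2)
        tau2 = sigma_w2 + mse / delta            # SE update (5)
        history.append(tau2)
    return history
```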
Related Work:
While integrating SI into reconstruction algorithms is not new, AMP-SI introduces a unified framework within AMP supporting arbitrary signal and SI dependencies. Prior work using SI has been either heuristic, limited to specific applications, or outside the AMP framework.
For example, Wang and Liang [12] integrate SI into AMP for a specific signal prior density, but the method is difficult to apply to other signal models. Ziniel and Schniter [13] develop an AMP-based reconstruction algorithm for a time-varying signal model based on Markov processes for the support and amplitude. This signal model is easily incorporated into the AMP-SI framework as discussed in the analysis of the birth-death-drift model of [6, 7]. Manoel et al. implement an AMP-based algorithm in which the input signal is repeatedly reconstructed in a streaming fashion, and information from past reconstruction attempts is aggregated into a prior, thus improving ongoing reconstruction results [14]. This reconstruction scheme resembles that of AMP-SI, in particular when the Bernoulli-Gaussian model is used (see Section II-B).
Contribution and Outline: Ma et al. use numerical experiments to show that SE (5) accurately tracks the performance of AMP-SI (3)-(4) [7], as was shown rigorously for standard AMP. Ma et al. conjecture that rigorous theoretical guarantees can be given for AMP-SI as well [7]. In this work, we analyze AMP-SI performance when the input signal and SI are drawn i.i.d. according to a general pdf obeying some finite moment conditions, the AMP-SI denoiser (2) is Lipschitz, and the measurement matrix is i.i.d. Gaussian.
II Main Results
II-A Main Theorem
Our main result provides AMP-SI performance guarantees when considering pseudo-Lipschitz loss functions, which we define in the following.
Definition II.1.
Pseudo-Lipschitz functions [9]: For $k \in \{1, 2\}$ and any $n \in \mathbb{N}$, a function $\phi : \mathbb{R}^n \to \mathbb{R}$ is pseudo-Lipschitz of order $k$ if there exists a constant $L > 0$, referred to as the pseudo-Lipschitz constant of $\phi$, such that for any $x, y \in \mathbb{R}^n$,

$$|\phi(x) - \phi(y)| \leq L \big(1 + \|x\|^{k-1} + \|y\|^{k-1}\big)\, \|x - y\|. \qquad (6)$$
For $k = 1$, this definition coincides with the standard definition of a Lipschitz function. Throughout this work, $\|\cdot\|$ denotes the Euclidean norm.
We are now ready to state our main result. Throughout the paper we let $\overset{P}{\longrightarrow}$ denote convergence in probability.
Theorem II.1.
For any order-$2$ pseudo-Lipschitz functions $\phi : \mathbb{R}^2 \to \mathbb{R}$ and $\psi : \mathbb{R}^2 \to \mathbb{R}$, assume the following.

- (A1) The measurement matrix $A$ has i.i.d. Gaussian entries with mean $0$ and variance $1/n$.

- (A2) The entries of the noise $w$ are i.i.d. with zero mean and finite $\mathbb{E}[W^2] = \sigma_w^2$.

- (A3) The signal and SI pairs $(x_i, \tilde{x}_i)$ are sampled i.i.d. from $p_{X, \tilde{X}}$ with finite $\mathbb{E}[X^2]$, finite $\mathbb{E}[\tilde{X}^2]$, and finite $\mathbb{E}[X \tilde{X}]$.

- (A4) For $t \geq 0$, the denoisers $\eta_t(\cdot\,, \cdot)$ defined in (2) are Lipschitz continuous, meaning for scalars $v_1, v_2, u_1, u_2$, and constant $L > 0$,

$$|\eta_t(v_1, u_1) - \eta_t(v_2, u_2)| \leq L \big(|v_1 - v_2| + |u_1 - u_2|\big).$$

Then,

$$\frac{1}{N} \sum_{i=1}^{N} \phi\big(x_i^{t+1}, x_i\big) \overset{P}{\longrightarrow} \mathbb{E}\Big[\phi\big(\eta_t(X + \tau_t Z, \tilde{X}),\ X\big)\Big], \qquad (7)$$

$$\frac{1}{n} \sum_{i=1}^{n} \psi\big(z_i^{t}, w_i\big) \overset{P}{\longrightarrow} \mathbb{E}\Big[\psi\big(W + \sqrt{\tau_t^2 - \sigma_w^2}\; Z',\ W\big)\Big],$$

where $Z, Z'$ are standard Gaussians, with $Z$ independent of $(X, \tilde{X})$ and $Z'$ independent of $W$. In the above, $x^{t+1}$ and $z^t$ are defined in the AMP-SI recursion (3)-(4), and $\tau_t$ in the SE (5).
Section III contains the proof of Theorem II.1. The proof follows from Berthier et al. [11, Theorem 14] and the strong law of large numbers. The main technical details involve showing that our assumptions are enough to satisfy the assumptions needed for [11, Theorem 14].

As a concrete example of how Theorem II.1 provides performance guarantees for AMP-SI, let us consider a few interesting pseudo-Lipschitz loss functions.
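For instance, the squared loss $\phi(a, b) = (a - b)^2$ is pseudo-Lipschitz of order $2$, and applying the first limit in (7) with this choice shows that the per-iteration MSE of AMP-SI is tracked by the SE recursion:

$$\frac{1}{N} \|x^{t+1} - x\|^2 \overset{P}{\longrightarrow} \mathbb{E}\Big[\big(\eta_t(X + \tau_t Z, \tilde{X}) - X\big)^2\Big] = \delta \big(\tau_{t+1}^2 - \sigma_w^2\big),$$

where the final equality is a rearrangement of (5). This is the guarantee illustrated numerically in Section II-C.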
II-B Examples
Next, we consider a few signal and SI models to show how one can derive the denoiser in (2), use it to construct the AMP-SI algorithm and the SE, and apply Theorem II.1. Before we get to the examples, we state a lemma showing that functions with bounded partial derivatives are Lipschitz.
Lemma II.2.
A function $f : \mathbb{R}^2 \to \mathbb{R}$ having bounded partial derivatives, i.e., $\big|\frac{\partial}{\partial v} f(v, u)\big| \leq D_1$ and $\big|\frac{\partial}{\partial u} f(v, u)\big| \leq D_2$ for all $(v, u) \in \mathbb{R}^2$, is Lipschitz continuous with Lipschitz constant $\sqrt{D_1^2 + D_2^2}$.
Proof.
The result follows using the Triangle Inequality and Cauchy-Schwarz: for any $(v_1, u_1), (v_2, u_2) \in \mathbb{R}^2$, bounding each coordinate increment by the mean value theorem,

$$|f(v_1, u_1) - f(v_2, u_2)| \leq |f(v_1, u_1) - f(v_2, u_1)| + |f(v_2, u_1) - f(v_2, u_2)| \leq D_1 |v_1 - v_2| + D_2 |u_1 - u_2| \leq \sqrt{D_1^2 + D_2^2}\; \big\|(v_1, u_1) - (v_2, u_2)\big\|. \qquad (8)$$
∎
II-B1 Gaussian-Gaussian Signal and SI
In this model, referred to as the GG model henceforth, the signal $x$ has i.i.d. Gaussian entries with zero mean and finite variance $\sigma_x^2$, and we have access to SI in the form of the signal with additive white Gaussian noise (AWGN). The signal, $x$, and SI, $\tilde{x}$, are related by

$$\tilde{x} = x + e, \qquad e \sim \mathcal{N}(0, \sigma_n^2\, \mathbb{I}_N) \text{ independent of } x. \qquad (9)$$
In this case, the AMP-SI denoiser (2) equals [7]

$$\eta_t(v, u) = \frac{\sigma_x^2 \sigma_n^2\, v + \sigma_x^2 \tau_t^2\, u}{\sigma_x^2 \sigma_n^2 + \sigma_x^2 \tau_t^2 + \sigma_n^2 \tau_t^2}. \qquad (10)$$

Then the SE (5) can be computed as

$$\tau_{t+1}^2 = \sigma_w^2 + \frac{1}{\delta} \cdot \frac{\sigma_x^2 \sigma_n^2 \tau_t^2}{\sigma_x^2 \sigma_n^2 + \sigma_x^2 \tau_t^2 + \sigma_n^2 \tau_t^2}. \qquad (11)$$
We note that the denoiser in (10) is Lipschitz continuous as a result of Lemma II.2, because

$$\Big|\frac{\partial}{\partial v} \eta_t(v, u)\Big| = \frac{\sigma_x^2 \sigma_n^2}{\sigma_x^2 \sigma_n^2 + \sigma_x^2 \tau_t^2 + \sigma_n^2 \tau_t^2} \leq 1 \quad \text{and} \quad \Big|\frac{\partial}{\partial u} \eta_t(v, u)\Big| = \frac{\sigma_x^2 \tau_t^2}{\sigma_x^2 \sigma_n^2 + \sigma_x^2 \tau_t^2 + \sigma_n^2 \tau_t^2} \leq 1,$$

and therefore the assumptions are satisfied in the GG case and we can apply Theorem II.1.
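A minimal sketch of the GG denoiser (10) and the closed-form SE recursion (11), with the model parameters passed in explicitly, is as follows.

```python
import numpy as np

def eta_gg(v, u, tau2, sigma_x2, sigma_n2):
    """GG conditional denoiser (10): the posterior mean of X given
    the pseudo-data v = X + tau Z and the SI u = X + AWGN."""
    c = sigma_x2 * sigma_n2 + sigma_x2 * tau2 + sigma_n2 * tau2
    return sigma_x2 * (sigma_n2 * v + tau2 * u) / c

def se_gg(sigma_x2, sigma_n2, sigma_w2, delta, num_iters=30):
    """SE recursion (11) for the GG model."""
    tau2 = sigma_w2 + sigma_x2 / delta    # tau_0^2, since E[X^2] = sigma_x2
    for _ in range(num_iters):
        c = sigma_x2 * sigma_n2 + sigma_x2 * tau2 + sigma_n2 * tau2
        tau2 = sigma_w2 + (sigma_x2 * sigma_n2 * tau2 / c) / delta
    return tau2
```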
II-B2 Bernoulli-Gaussian Signal and SI
The Bernoulli-Gaussian (BG) model reflects a scenario in which one wishes to recover a sparse signal and has access to SI in the form of the signal with AWGN as in (9). In this model, each entry of the signal is independently generated according to $X \sim \varepsilon\, \mathcal{N}(0, \sigma_x^2) + (1 - \varepsilon)\, \delta_0$, where $\delta_0$ is the Dirac delta function at $0$. In words, the entries of the signal independently take the value $0$ with probability $1 - \varepsilon$ and are $\mathcal{N}(0, \sigma_x^2)$ with probability $\varepsilon$.
In this case, the AMP-SI denoiser (2) equals [7]

$$\eta_t(v, u) = \frac{m_t(v, u)}{1 + \Lambda_t(v, u)}, \qquad (12)$$

where, letting $\phi_{\sigma^2}(x)$ be the zero-mean Gaussian density with variance $\sigma^2$ evaluated at $x$, and defining $c_t := \sigma_x^2 \sigma_n^2 + \sigma_x^2 \tau_t^2 + \sigma_n^2 \tau_t^2$,

$$m_t(v, u) := \frac{\sigma_x^2 \big(\sigma_n^2\, v + \tau_t^2\, u\big)}{c_t}, \qquad \Lambda_t(v, u) := \frac{1 - \varepsilon}{\varepsilon} \cdot \frac{\phi_{\tau_t^2}(v)\, \phi_{\sigma_n^2}(u)}{f_t(v, u)}, \qquad (13)$$

where we denote by $f_t$ the joint density of $(X + \tau_t Z, X + \sigma_n Z')$ for $X \sim \mathcal{N}(0, \sigma_x^2)$ and independent standard Gaussians $Z, Z'$, namely

$$f_t(v, u) = \frac{1}{2\pi \sqrt{c_t}} \exp\Big(-\frac{(\sigma_x^2 + \sigma_n^2) v^2 - 2 \sigma_x^2 v u + (\sigma_x^2 + \tau_t^2) u^2}{2 c_t}\Big). \qquad (14)$$

Here $m_t$ is exactly the GG denoiser (10), and $1/(1 + \Lambda_t(v, u))$ is the posterior probability that the entry is nonzero.
Then the SE (5) can be computed as

$$\tau_{t+1}^2 = \sigma_w^2 + \frac{1}{\delta}\, \mathbb{E}\Big[\big(\eta_t(X + \tau_t Z, \tilde{X}) - X\big)^2\Big], \qquad (15)$$

where now the expectation is taken with $(X, \tilde{X})$ drawn from the BG model; unlike (11), it does not simplify to a closed form and is evaluated numerically.
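A sketch of the BG denoiser (12)-(14) in the same style, working with log-densities for numerical stability:

```python
import numpy as np

def eta_bg(v, u, tau2, eps, sigma_x2, sigma_n2):
    """BG conditional denoiser (12)-(14): the GG posterior mean m_t
    shrunk by the posterior probability that the entry is nonzero."""
    c = sigma_x2 * sigma_n2 + sigma_x2 * tau2 + sigma_n2 * tau2
    m = sigma_x2 * (sigma_n2 * v + tau2 * u) / c        # m_t in (13)
    # log of phi_{tau2}(v) phi_{sigma_n2}(u) and of f_t in (14),
    # dropping the common -log(2 pi) normalization
    log_f0 = (-0.5 * v ** 2 / tau2 - 0.5 * u ** 2 / sigma_n2
              - 0.5 * np.log(tau2 * sigma_n2))
    quad = ((sigma_x2 + sigma_n2) * v ** 2 - 2 * sigma_x2 * v * u
            + (sigma_x2 + tau2) * u ** 2) / c
    log_f1 = -0.5 * quad - 0.5 * np.log(c)
    lam = (1 - eps) / eps * np.exp(log_f0 - log_f1)     # Lambda_t in (13)
    return m / (1 + lam)                                # (12)
```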
We again use Lemma II.2 to show that the denoiser defined in (12)-(14) is Lipschitz continuous, so that the assumptions are satisfied in the BG case and we can apply Theorem II.1. We study the partial derivatives. Denote

$$\pi_t(v, u) := \frac{1}{1 + \Lambda_t(v, u)}, \qquad (16)$$

so that $\eta_t(v, u) = \pi_t(v, u)\, m_t(v, u)$. Combining (13), (14), and (16), by the product rule,

$$\frac{\partial}{\partial v} \eta_t(v, u) = \pi_t(v, u)\, \frac{\partial}{\partial v} m_t(v, u) + m_t(v, u)\, \frac{\partial}{\partial v} \pi_t(v, u). \qquad (17)$$

Now we show upper bounds for the two terms of (17) separately. For the first term, we see that $0 \leq \pi_t(v, u) \leq 1$ and $\frac{\partial}{\partial v} m_t(v, u) = \sigma_x^2 \sigma_n^2 / c_t$, so

$$\Big|\pi_t(v, u)\, \frac{\partial}{\partial v} m_t(v, u)\Big| \leq \frac{\sigma_x^2 \sigma_n^2}{c_t} \leq 1.$$

Consider the second term of (17). First we note that

$$\frac{\partial}{\partial v} \pi_t(v, u) = \pi_t(v, u) \big(1 - \pi_t(v, u)\big)\, \frac{\partial}{\partial v} \log \frac{f_t(v, u)}{\phi_{\tau_t^2}(v)\, \phi_{\sigma_n^2}(u)},$$

then, using that $\phi_{\tau_t^2}$, $\phi_{\sigma_n^2}$, and $f_t$ are Gaussian densities, we have

$$\frac{\partial}{\partial v} \log \frac{f_t(v, u)}{\phi_{\tau_t^2}(v)\, \phi_{\sigma_n^2}(u)} = \frac{v}{\tau_t^2} - \frac{(\sigma_x^2 + \sigma_n^2) v - \sigma_x^2 u}{c_t} = \frac{m_t(v, u)}{\tau_t^2}. \qquad (18)$$

To upper bound the above, we use $x e^{-x} \leq 1/e$ when $x \geq 0$. Writing $s := v/\tau_t^2 + u/\sigma_n^2$ and $\kappa_t := \sigma_x^2 \tau_t^2 \sigma_n^2 / c_t$, a direct computation from (13) and (14) shows that $m_t(v, u) = \kappa_t s$ and $\Lambda_t(v, u) = \frac{(1 - \varepsilon) \sqrt{c_t}}{\varepsilon\, \tau_t \sigma_n}\, e^{-\kappa_t s^2 / 2}$, and so

$$s^2 e^{-\kappa_t s^2 / 2} = \frac{2}{\kappa_t} \cdot \frac{\kappa_t s^2}{2}\, e^{-\kappa_t s^2 / 2} \leq \frac{2}{e\, \kappa_t}.$$

Using this in (18), together with $\pi_t (1 - \pi_t) \leq \Lambda_t / (1 + \Lambda_t)^2 \leq \Lambda_t$, we find

$$\Big|m_t(v, u)\, \frac{\partial}{\partial v} \pi_t(v, u)\Big| = \pi_t (1 - \pi_t)\, \frac{m_t(v, u)^2}{\tau_t^2} \leq \frac{(1 - \varepsilon) \sqrt{c_t}}{\varepsilon\, \tau_t \sigma_n} \cdot \frac{\kappa_t^2 s^2}{\tau_t^2}\, e^{-\kappa_t s^2 / 2} \leq \frac{2 (1 - \varepsilon)\, \sigma_x^2 \sigma_n}{e\, \varepsilon\, \tau_t \sqrt{c_t}}, \qquad (19)$$

where in the final inequality we use $s^2 e^{-\kappa_t s^2 / 2} \leq 2/(e\, \kappa_t)$, and the bound is finite since $\tau_t^2 \geq \sigma_w^2$ by (15), and we define

$$D_1 := 1 + \frac{2 (1 - \varepsilon)\, \sigma_x^2 \sigma_n}{e\, \varepsilon\, \tau_t \sqrt{c_t}}. \qquad (20)$$

Using the above in (17), we have $\big|\frac{\partial}{\partial v} \eta_t(v, u)\big| \leq D_1$. As in (17) we can show

$$\frac{\partial}{\partial u} \eta_t(v, u) = \pi_t(v, u)\, \frac{\sigma_x^2 \tau_t^2}{c_t} + m_t(v, u)\, \pi_t(v, u) \big(1 - \pi_t(v, u)\big)\, \frac{m_t(v, u)}{\sigma_n^2}.$$

Then, $\big|\pi_t(v, u)\, \sigma_x^2 \tau_t^2 / c_t\big| \leq 1$, and a bound as in (18)-(19) gives

$$\Big|\frac{\partial}{\partial u} \eta_t(v, u)\Big| \leq D_2 := 1 + \frac{2 (1 - \varepsilon)\, \sigma_x^2 \tau_t}{e\, \varepsilon\, \sigma_n \sqrt{c_t}}.$$

By Lemma II.2, $\eta_t$ is therefore Lipschitz continuous, as required by (A4).
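As a sanity check on these bounds, the partial derivatives of the BG denoiser can be probed numerically. The sketch below (reusing the `eta_bg` function from the previous listing) estimates $\sup |\partial \eta_t / \partial v|$ and $\sup |\partial \eta_t / \partial u|$ by finite differences on a coarse grid; the values it returns can be compared against $D_1$ and $D_2$.

```python
import numpy as np

def max_partial_derivatives(eta, tau2, eps, sigma_x2, sigma_n2, h=1e-5):
    """Finite-difference estimates of the largest partial derivatives
    of an entry-wise denoiser eta(v, u, ...) over a coarse grid."""
    grid = np.linspace(-20.0, 20.0, 801)
    V, U = np.meshgrid(grid, grid)
    dv = (eta(V + h, U, tau2, eps, sigma_x2, sigma_n2)
          - eta(V - h, U, tau2, eps, sigma_x2, sigma_n2)) / (2 * h)
    du = (eta(V, U + h, tau2, eps, sigma_x2, sigma_n2)
          - eta(V, U - h, tau2, eps, sigma_x2, sigma_n2)) / (2 * h)
    return np.abs(dv).max(), np.abs(du).max()

# Example: max_partial_derivatives(eta_bg, 0.5, 0.1, 1.0, 0.25)
```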
II-C Numerical Examples
Finally, we provide numerical results to compare the empirical mean squared error (MSE) of AMP-SI with the performance predicted by SE. Fig. 1 shows the MSE achieved by AMP-SI in the GG scenario and the SE prediction of its performance. In this example, the signal variance $\sigma_x^2$, the measurement noise variance $\sigma_w^2$, and the variance $\sigma_n^2$ of the AWGN in the SI are fixed, and the empirical AMP-SI results are averaged over 10 trials of a GG recovery problem. Figs. 1(a), 1(b), and 1(c) show the comparison for three different signal lengths $N$. For smaller $N$ there is some gap between the empirical MSE and the SE prediction, but the gap shrinks as $N$ is increased. The results show that the empirical MSE tracks the SE prediction closely.
Fig. 2 shows the MSE achieved by AMP-SI in the BG scenario and the SE prediction of its performance. We again averaged over 10 trials of a BG recovery problem for the empirical AMP-SI results. The signal length $N$, the sampling ratio $\delta$, the measurement noise variance $\sigma_w^2$, and the sparsity level $\varepsilon$ are fixed, so that on average $\varepsilon N$ of the entries in the signal are nonzero, and we vary the variance $\sigma_n^2$ of the AWGN in the SI over three values. The results show that SE can predict the MSE achieved by AMP-SI at every iteration.
[Fig. 1: Empirical MSE of AMP-SI versus the SE prediction for the GG model; panels (a)-(c) correspond to three different signal lengths $N$.]

[Fig. 2: Empirical MSE of AMP-SI versus the SE prediction for the BG model, for three values of the SI noise variance $\sigma_n^2$.]
III Proof of Theorem II.1
The proof of Theorem II.1 contains two steps. In the first step we use Berthier et al. [11, Theorem 14], and in the second step we make an appeal to the strong law of large numbers (SLLN). We remind the reader of the strong law:
Definition III.1.
Strong Law of Large Numbers [15]: Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables with finite mean $\mathbb{E}[X_1] = \mu$. Then

$$\frac{1}{n} \sum_{i=1}^{n} X_i \overset{a.s.}{\longrightarrow} \mu \quad \text{as } n \to \infty. \qquad (21)$$

In words, the partial averages converge almost surely to $\mu$.
III-A Step 1
We will make use of Berthier et al. [11, Theorem 14], restated here for convenience. Before proceeding we include a definition of uniformly pseudo-Lipschitz functions, which generalizes the idea of pseudo-Lipschitz functions given in Definition II.1.
Definition III.2.
Uniformly pseudo-Lipschitz functions [11]: A sequence (in $N$) of pseudo-Lipschitz functions $\phi_N : \mathbb{R}^N \to \mathbb{R}$ is called uniformly pseudo-Lipschitz of order $k$ if, denoting by $L_N$ the pseudo-Lipschitz constant of order $k$ of $\phi_N$, we have $L_N < \infty$ for each $N$ and $\limsup_{N \to \infty} L_N < \infty$.
Berthier et al. [11, Theorem 14] requires the following assumptions:

- (C1) The measurement matrix $A$ has i.i.d. Gaussian entries with mean $0$ and variance $1/n$.

- (C2) Define a sequence of denoisers $\eta_t : \mathbb{R}^N \times \mathbb{R}^N \to \mathbb{R}^N$ to be those that apply the denoiser defined in (2) element-wise on vector input. For each $t$, the $\eta_t$ are uniformly Lipschitz. A function is uniformly Lipschitz in $N$ if the Lipschitz constant does not depend on $N$.

- (C3) $\frac{1}{N} \|x\|^2$ converges to a constant as $N \to \infty$.

- (C4) The limit $\lim_{n \to \infty} \frac{1}{n} \|w\|^2$ is finite.

- (C5)-(C6) For any iterations $s, t$ and for any covariance matrix $S \in \mathbb{R}^{2 \times 2}$, the following limits exist:

$$\lim_{N \to \infty} \frac{1}{N}\, \mathbb{E}\big[x^\top \eta_t(x + Z_t, \tilde{x})\big], \qquad \text{(C5)}$$

$$\lim_{N \to \infty} \frac{1}{N}\, \mathbb{E}\big[\eta_s(x + Z_s, \tilde{x})^\top \eta_t(x + Z_t, \tilde{x})\big], \qquad \text{(C6)}$$

where $(Z_s, Z_t) \sim \mathcal{N}(0, S \otimes \mathbb{I}_N)$, with $\otimes$ denoting the tensor product and $\mathbb{I}_N$ the $N \times N$ identity matrix.
Theorem III.1 ([11, Theorem 14], restated in our setting). Under assumptions (C1)-(C6), for any sequence of uniformly pseudo-Lipschitz functions $\phi_N : \mathbb{R}^N \times \mathbb{R}^N \to \mathbb{R}$ of order $2$, the iterates of the AMP recursion with the entry-wise denoisers of (C2) satisfy

$$\phi_N\big(x^t + A^\top z^t,\ x\big) - \mathbb{E}\big[\phi_N\big(x + \tau_t Z,\ x\big)\big] \overset{P}{\longrightarrow} 0,$$

where $Z \sim \mathcal{N}(0, \mathbb{I}_N)$ is independent of $x$ and $\tau_t$ is given by the SE (5).
Now we demonstrate that our assumptions stated in Section II are enough to satisfy the assumptions needed to apply Theorem III.1.
Assumptions (A1) and (C1) are identical. We will show that (C2) follows from (A4), (C4) follows from (A2), and (C3) follows from (A3). Finally we show (C5) and (C6) follow from (A3) and (A4).
First consider assumption (C2). The non-separable denoiser $\eta_t : \mathbb{R}^N \times \mathbb{R}^N \to \mathbb{R}^N$ applies the AMP-SI denoiser defined in (2) entrywise to its vector inputs. From (A4), the scalar denoisers $\eta_t$ are Lipschitz continuous. Thus, for length-$N$ vectors $v, v'$, and fixed SI $\tilde{x}$,

$$\|\eta_t(v, \tilde{x}) - \eta_t(v', \tilde{x})\|^2 = \sum_{i=1}^{N} \big|\eta_t(v_i, \tilde{x}_i) - \eta_t(v'_i, \tilde{x}_i)\big|^2 \leq L^2 \sum_{i=1}^{N} |v_i - v'_i|^2,$$

and so

$$\|\eta_t(v, \tilde{x}) - \eta_t(v', \tilde{x})\| \leq L\, \|v - v'\|.$$

The Lipschitz constant $L$ does not depend on $N$, so $\eta_t$ is uniformly Lipschitz.
Now consider assumption (C4). From (A2), the measurement noise $w$ in (1) has i.i.d. entries with zero mean and finite $\mathbb{E}[W^2] = \sigma_w^2$. Then applying Definition III.1,

$$\lim_{n \to \infty} \frac{1}{n} \|w\|^2 = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} w_i^2 \overset{a.s.}{=} \sigma_w^2 < \infty,$$

where we have used that $\mathbb{E}|W^2| < \infty$ follows from $\mathbb{E}[W^2] = \sigma_w^2 < \infty$. The proof of (C3) similarly follows using the SLLN and the finiteness of $\mathbb{E}[X^2]$ given in assumption (A3).
We now show that (C5) is met. Recall that (C5) requires the limit $\lim_{N \to \infty} \frac{1}{N}\, \mathbb{E}\big[x^\top \eta_t(x + Z_t, \tilde{x})\big]$ to exist, where the expectation is over $Z_t$. Define $\varphi_i := \mathbb{E}\big[x_i\, \eta_t(x_i + Z_{t,i}, \tilde{x}_i) \,\big|\, x_i, \tilde{x}_i\big]$ for $1 \leq i \leq N$. By assumption (A3), the signal and side information pairs $(x_i, \tilde{x}_i)$ are sampled i.i.d. from the joint density $p_{X, \tilde{X}}$. It follows that $\varphi_1, \ldots, \varphi_N$ are also i.i.d., so by Definition III.1, if $\mathbb{E}\big|X\, \eta_t(X + Z, \tilde{X})\big| < \infty$, where $Z \sim \mathcal{N}(0, \sigma^2)$, for the appropriate variance $\sigma^2$, is independent of $(X, \tilde{X})$, then

$$\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \varphi_i \overset{a.s.}{=} \mathbb{E}\big[X\, \eta_t(X + Z, \tilde{X})\big].$$

We will now show that $\mathbb{E}\big|X\, \eta_t(X + Z, \tilde{X})\big| < \infty$.
First note that (A4) assumes $\eta_t$ is Lipschitz, meaning for scalars $v_1, v_2, u_1, u_2$ and some constant $L > 0$,

$$|\eta_t(v_1, u_1) - \eta_t(v_2, u_2)| \leq L\big(|v_1 - v_2| + |u_1 - u_2|\big).$$

Therefore, letting $v_2 = u_2 = 0$, we have $|\eta_t(v, u)| \leq |\eta_t(0, 0)| + L(|v| + |u|)$, giving the following upper bound for constant $C := \max\{|\eta_t(0, 0)|, L\}$:

$$|\eta_t(v, u)| \leq C\big(1 + |v| + |u|\big). \qquad (22)$$

Now using (22) and the triangle inequality,

$$\mathbb{E}\big|X\, \eta_t(X + Z, \tilde{X})\big| \leq C\, \mathbb{E}\big[|X|\big(1 + |X + Z| + |\tilde{X}|\big)\big] \leq C\big(\mathbb{E}|X| + \mathbb{E}[X^2] + \mathbb{E}|X|\, \mathbb{E}|Z| + \mathbb{E}|X \tilde{X}|\big). \qquad (23)$$

Finally, by assumption (A3) we have that $\mathbb{E}[X^2]$, $\mathbb{E}[\tilde{X}^2]$, and $\mathbb{E}[X \tilde{X}]$ are all finite. Then noting that for any random variable $U$ we have $\mathbb{E}|U| \leq \sqrt{\mathbb{E}[U^2]}$, so that $\mathbb{E}|X| \leq \sqrt{\mathbb{E}[X^2]}$ and $\mathbb{E}|X \tilde{X}| \leq \sqrt{\mathbb{E}[X^2]\, \mathbb{E}[\tilde{X}^2]}$ by Cauchy-Schwarz, the boundedness of $\mathbb{E}\big|X\, \eta_t(X + Z, \tilde{X})\big|$ follows from (23) with assumption (A3).
The proof of (C6) follows similarly to the proof of (C5). Recall that (C6) requires the limit $\lim_{N \to \infty} \frac{1}{N}\, \mathbb{E}\big[\eta_s(x + Z_s, \tilde{x})^\top \eta_t(x + Z_t, \tilde{x})\big]$ to exist. Define $\varphi_i := \mathbb{E}\big[\eta_s(x_i + Z_{s,i}, \tilde{x}_i)\, \eta_t(x_i + Z_{t,i}, \tilde{x}_i) \,\big|\, x_i, \tilde{x}_i\big]$ for $1 \leq i \leq N$. By assumption (A3), the signal and side information pairs $(x_i, \tilde{x}_i)$ are sampled i.i.d. from the joint density $p_{X, \tilde{X}}$. It follows that $\varphi_1, \ldots, \varphi_N$ are also i.i.d., so by Definition III.1, if $\mathbb{E}\big|\eta_s(X + Z_s, \tilde{X})\, \eta_t(X + Z_t, \tilde{X})\big| < \infty$, where $Z_s \sim \mathcal{N}(0, \sigma_s^2)$ and $Z_t \sim \mathcal{N}(0, \sigma_t^2)$ have covariance determined by $S$ and are independent of $(X, \tilde{X})$, then the limit exists almost surely.

We will now show that $\mathbb{E}\big|\eta_s(X + Z_s, \tilde{X})\, \eta_t(X + Z_t, \tilde{X})\big| < \infty$. Using the bound (22),

$$\big|\eta_s(X + Z_s, \tilde{X})\, \eta_t(X + Z_t, \tilde{X})\big| \leq C^2 \big(1 + |X + Z_s| + |\tilde{X}|\big)\big(1 + |X + Z_t| + |\tilde{X}|\big).$$

Then using the triangle inequality,

$$\mathbb{E}\big|\eta_s(X + Z_s, \tilde{X})\, \eta_t(X + Z_t, \tilde{X})\big| \leq C^2\, \mathbb{E}\Big[\big(1 + |X| + |Z_s| + |\tilde{X}|\big)\big(1 + |X| + |Z_t| + |\tilde{X}|\big)\Big], \qquad (24)$$

which expands into a finite sum of terms, each finite by assumption (A3) and the finiteness of the Gaussian moments.