Consider the recovery of an unknown $N$-dimensional signal vector $\boldsymbol{x}$ from an $M$-dimensional linear measurement vector $\boldsymbol{y}$, given by
$\boldsymbol{y} = \boldsymbol{A}\boldsymbol{x} + \boldsymbol{w}. \quad (1)$
In (1), the sensing matrix $\boldsymbol{A} \in \mathbb{R}^{M \times N}$ is known, while the noise vector $\boldsymbol{w}$ is unknown. The purpose of this paper is to present a unified framework for analyzing the asymptotic performance of signal recovery via message-passing (MP).
An important example of MP is approximate message-passing (AMP) . Bayes-optimal AMP can be regarded as an exact large-system approximation of belief propagation : both $M$ and $N$ tend to infinity while the compression rate $\delta = M/N$ is kept constant. Bayati et al. [3, 4] analyzed the rigorous dynamics of AMP in the large-system limit via state evolution (SE) when the sensing matrix has independent and identically distributed (i.i.d.), zero-mean, and sub-Gaussian elements. Their result implies that, in spite of its low complexity, AMP can achieve the Bayes-optimal performance in a range of the compression rate $\delta$. However, AMP fails to converge when the sensing matrix is non-zero mean  or ill-conditioned .
Another important example of MP is orthogonal AMP (OAMP) . OAMP is also called vector AMP (VAMP)  and was originally proposed by Opper and Winther [9, Appendix D]. Bayes-optimal OAMP can be regarded as a large-system approximation of expectation propagation (EP) [10, 11]. The rigorous SE of OAMP was presented in the same conference when the sensing matrix is orthogonally invariant on the real field  or unitarily invariant on the complex field . These rigorous results imply that OAMP converges for a wider class of sensing matrices than AMP, because the class of orthogonally invariant matrices contains matrices with dependent elements. One disadvantage of OAMP is its high complexity, due to the requirement of one matrix inversion per iteration. (The singular-value decomposition (SVD) of $\boldsymbol{A}$ allows us to circumvent this requirement . However, the SVD itself has high complexity unless the sensing matrix has some special structure.) See  for a complexity reduction of OAMP.
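To make the complexity remark concrete, the following minimal sketch shows one OAMP-style iteration built around an LMMSE linear filter. The function name, its signature, and the trace normalization are illustrative assumptions rather than the exact OAMP update; the point is the $M \times M$ matrix inversion performed in every iteration, which an SVD-based variant can avoid.

```python
import numpy as np

def oamp_iteration(y, A, r, v, sigma2):
    """One OAMP-style iteration (illustrative sketch): an LMMSE linear
    filter followed by a trace normalization. The M x M matrix inversion
    below is the per-iteration cost discussed in the text."""
    M, N = A.shape
    # LMMSE filter: requires one M x M matrix inversion per iteration.
    W = v * A.T @ np.linalg.inv(v * (A @ A.T) + sigma2 * np.eye(M))
    # Trace normalization of the filter (illustrative de-biasing step).
    C = (N / np.trace(W @ A)) * W
    # Update the estimate from the residual of the current estimate r.
    return r + C @ (y - A @ r)
```

With a precomputed SVD of $\boldsymbol{A}$, the inverse reduces to a diagonal scaling, which is the complexity reduction mentioned above.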
This paper proposes an SE framework for understanding both AMP and OAMP from a unified point of view. The proposed framework is based on a general recursive model of errors that contains the error models of both AMP and OAMP. The main point of the model is that the current errors may depend on the whole history of errors in the preceding iterations, whereas the current errors in OAMP are determined only by the errors in the latest iteration. Under the assumption of orthogonally invariant sensing matrices, we present a rigorous SE analysis of the general error model in the large-system limit.
The main contributions of this paper are twofold: one is the rigorous SE of the general error model, which contains those of both AMP and OAMP. The result provides a framework for designing new MP algorithms that have the advantages of both AMP and OAMP: low complexity and convergence for orthogonally invariant sensing matrices.
The other contribution is a detailed convergence analysis of AMP. AMP with a given maximum number of iterations is proved to converge for orthogonally invariant sensing matrices if the moment sequence of the asymptotic eigenvalue (EV) distribution of $\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A}$ coincides with that of the Marčenko-Pastur distribution  up to an order determined by the maximum number of iterations. When $\boldsymbol{A}$ has i.i.d. zero-mean elements, the asymptotic EV distribution coincides with the Marčenko-Pastur distribution perfectly. Thus, the i.i.d. assumption on $\boldsymbol{A}$ is stronger than necessary for guaranteeing the convergence of AMP, as long as a finite number of iterations is assumed.
II-A General Error Model
Consider the singular-value decomposition (SVD) $\boldsymbol{A} = \boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^{\mathrm{T}}$ of the sensing matrix, in which $\boldsymbol{U}$ and $\boldsymbol{V}$ are $M \times M$ and $N \times N$ orthogonal matrices, respectively. We consider the following general error model in iteration $t$:
with the corresponding initial conditions.
In the general error model, the notation $\langle \boldsymbol{v} \rangle = N^{-1}\sum_{n=1}^{N} v_{n}$ denotes the arithmetic mean of the elements of $\boldsymbol{v}$. The functions are element-wise mappings of their input vectors, i.e., each output element is given by a common scalar function applied to the corresponding elements of the inputs. Finally, the derivative notations represent $N$-dimensional vectors whose $n$th elements are given by the partial derivatives of the corresponding scalar functions with respect to the $n$th variable.
The functions may depend on the singular values of the sensing matrix. Since the support of the asymptotic singular-value distribution of $\boldsymbol{A}$ is assumed to be compact in this paper, we do not write these dependencies explicitly.
The general error model is composed of two systems. We refer to the former and the latter as module A and module B, respectively.
Suppose that the functions depend only on the latest variables. Then, the general error model reduces to that of OAMP . The functions characterize the types of the linear filter and the thresholding function used in OAMP.
Furthermore, the normalized squared norm corresponds to the mean-square error (MSE) for the OAMP estimation of $\boldsymbol{x}$ in iteration $t$.
We formulate an AMP error model similar to the general error model. Let denote the AMP estimator of in iteration . The update rules of AMP  are given by
with and . In (8), the thresholding function satisfies the separation condition for with a common scalar function .
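As an illustration of update rules of this form, the following is a textbook soft-thresholding AMP recursion with an Onsager correction term. The fixed threshold, the denoiser, and all names are illustrative choices, not necessarily the thresholding function assumed in (8).

```python
import numpy as np

def soft(u, tau):
    """Element-wise soft-thresholding denoiser (a separable function,
    in the spirit of the separation condition in the text)."""
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

def amp(y, A, tau, T):
    """Textbook AMP recursion with the Onsager correction term.
    The fixed threshold tau and the denoiser are illustrative."""
    M, N = A.shape
    x = np.zeros(N)
    z = y.copy()
    for _ in range(T):
        u = x + A.T @ z                 # effective observation
        x_new = soft(u, tau)            # element-wise thresholding
        # Onsager term: previous residual scaled by the empirical
        # divergence of the denoiser (fraction of active entries).
        z = y - A @ x_new + (z / M) * np.count_nonzero(x_new)
        x = x_new
    return x
```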
Let and denote the estimation errors before and after thresholding, respectively. From the definition (5), we find
Then, the extrinsic vector in (2) for is given by
To define the function in (3), we let
Substituting the definition of yields
with and for . The right-hand side (RHS) of (15) defines the function recursively. Note that it depends on all vectors computed in the preceding iterations.
We follow  in postulating Lipschitz-continuous functions in the general error model.
and are Lipschitz-continuous. Furthermore, and are not a linear combination of the first vectors plus some function of the last vector.
We assume the following moment conditions on and to guarantee the existence of the second moments of the variables in the general error model.
The signal vector has independent elements with bounded th moments for some .
The noise vector has bounded th moments for some and satisfies as .
The sensing matrix is orthogonally invariant.
More precisely, the orthogonal matrices $\boldsymbol{U}$ and $\boldsymbol{V}$ in the SVD are independent of the other random variables and Haar-distributed. The empirical EV distribution of $\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A}$ converges almost surely (a.s.) to an asymptotic distribution with a compact support in the large-system limit.
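This assumption can be made concrete numerically. The sketch below (function names are illustrative) samples an orthogonally invariant matrix by drawing Haar-distributed orthogonal factors via a QR decomposition with the standard sign correction, and combining them with freely chosen singular values.

```python
import numpy as np

def haar_orthogonal(n, rng):
    """Sample an n x n Haar-distributed orthogonal matrix: QR of a
    Gaussian matrix, with sign correction on the diagonal of R."""
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    return Q * np.sign(np.diag(R))

def orthogonally_invariant(M, N, singular_values, rng):
    """Build A = U diag(s) V^T with independent Haar factors, so the EV
    distribution of A^T A is set by the chosen singular values alone."""
    U = haar_orthogonal(M, rng)
    V = haar_orthogonal(N, rng)
    S = np.zeros((M, N))
    k = len(singular_values)
    S[:k, :k] = np.diag(singular_values)
    return U @ S @ V.T
```

Such a matrix has dependent elements in general, while its singular values can be chosen to match any target spectrum.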
II-D Marčenko-Pastur Distribution
We review the Marčenko-Pastur distribution. Assume that the sensing matrix has independent zero-mean Gaussian elements with identical variance. The moments of the empirical EV distribution of $\boldsymbol{A}\boldsymbol{A}^{\mathrm{T}}$ converge a.s. to those of the Marčenko-Pastur distribution in the large-system limit. Instead of presenting the Marčenko-Pastur distribution explicitly, we characterize it via the $\eta$-transform , defined as
As shown in [14, Eq. (2.120)], the $\eta$-transform of the Marčenko-Pastur distribution is the positive solution to
The $\eta$-transform defines the Marčenko-Pastur distribution uniquely, because the distribution is uniquely determined by the Stieltjes transform, which is given via analytic continuation of the $\eta$-transform .
We need the asymptotic EV distribution of $\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A}$, rather than that of $\boldsymbol{A}\boldsymbol{A}^{\mathrm{T}}$. Define the $\eta$-transform of $\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A}$ as
Since $\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A}$ and $\boldsymbol{A}\boldsymbol{A}^{\mathrm{T}}$ have identical positive eigenvalues, we find the relationship
Substituting this into (17) yields
It is possible to calculate the moment sequence of the asymptotic EV distribution of $\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A}$ via the $\eta$-transform . Since the $\eta$-transform is uniformly bounded, we use the eigen-decomposition of $\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A}$ and the definition (18) to obtain
This implies that the th moment of the asymptotic EV distribution of $\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A}$ is given via the th derivative of the $\eta$-transform at the origin. Direct calculation of the derivatives based on (20) yields the first three moments.
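These low-order moments can be checked by simulation. The sketch below assumes the normalization $\mathrm{Var}[A_{mn}] = 1/M$ (the variance stated in the text was lost in extraction), under which the standard first three moments of the asymptotic EV distribution of $\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A}$ are $1$, $1 + \beta$, and $1 + 3\beta + \beta^{2}$ with $\beta = N/M$, and compares them with the empirical moments of a sampled Gaussian matrix.

```python
import numpy as np

def mp_moments(beta):
    """First three moments of the Marchenko-Pastur law for the EV
    distribution of A^T A, where A is M x N with i.i.d. N(0, 1/M)
    entries (an assumed normalization) and beta = N / M."""
    return np.array([1.0, 1.0 + beta, 1.0 + 3.0 * beta + beta ** 2])

def empirical_moments(M, N, rng):
    """Empirical moments of the eigenvalues of the Gram matrix A^T A."""
    A = rng.standard_normal((M, N)) / np.sqrt(M)
    lam = np.linalg.eigvalsh(A.T @ A)
    return np.array([np.mean(lam ** k) for k in (1, 2, 3)])

rng = np.random.default_rng(0)
M, N = 1000, 500          # compression rate M / N = 2, so beta = 0.5
emp = empirical_moments(M, N, rng)
theo = mp_moments(N / M)
```

For $\beta = 1$ these moments reduce to the Catalan numbers $1, 2, 5$, a standard sanity check.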
III Main Results
III-A State Evolution
We analyze the dynamics of the general error model in the large-system limit. Let
Define the set . The set contains the whole history of the estimation errors just before evaluating (2) in iteration , as well as all random variables with the only exception of , while includes the whole history just before evaluating (4). We use the conditioning technique by Bolthausen  to obtain the following theorem:
Let , and , with . For , the vector conditioned on is statistically equivalent to
In (27), the notation denotes a finite-dimensional vector of which all elements are . For a matrix , the notation represents the matrix that is composed of all left-singular vectors of associated with zero singular values. is independent of the other random variables, orthogonally invariant, and has bounded th moments for some satisfying .
Suppose that satisfies the separation condition like (6), and that each function is Lipschitz-continuous. Then,
There is some such that the minimum eigenvalue of is a.s. larger than .
For module B, on the other hand, the following properties hold in the large-system limit:
Let and . Then, the vector conditioned on is statistically equivalent to
for , otherwise
In (30), is an independent and orthogonally invariant vector, and has bounded th moments for some satisfying and for .
Suppose that satisfies the separation condition like (7), and that each function is Lipschitz-continuous. Then,
There is some such that the minimum eigenvalue of is a.s. larger than .
See Appendix A.
The properties above imply the orthogonality between and and between and in the general error model. Thus, we refer to MP algorithms as long-memory OAMP (LM-OAMP) if their error models are contained in the general error model.
If corresponds to the estimation error of an MP algorithm in iteration , we need to evaluate the MSE in the large-system limit. While Theorem 1 allows us to analyze the MSE, we do not pursue further analysis of the general error model in this paper; the MSE should be evaluated for each concrete MP algorithm.
Because of space limitations, we have focused on performance measures, such as the MSE, that require the existence of the second moments of the variables in the general error model. As considered in , it is straightforward to extend Theorem 1 to general performance measures in terms of pseudo-Lipschitz functions.
We next prove that the general error model contains the AMP error model under an assumption on the asymptotic EV distribution of $\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A}$.
See Section IV.
The only difference between the general and AMP error models is in (4) and (12). Thus, Theorem 2 implies that the general error model contains the AMP error model in the large-system limit. As long as the number of iterations is finite, it should be possible to construct orthogonally invariant sensing matrices satisfying two conditions: one is that the sensing matrices have dependent elements; the other is that the moment sequence of the asymptotic EV distribution of $\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A}$ is equal to that of the Marčenko-Pastur distribution up to the required order. Thus, we conclude that Theorems 1 and 2 are the first rigorous results on the asymptotic dynamics of AMP for sensing matrices with dependent elements.
Instead of evaluating directly, we present a sufficient condition guaranteeing that the MSE coincides with that for zero-mean i.i.d. Gaussian sensing matrices . From (15), depends on the asymptotic moments up to order . Thus, the MSE coincides with that in  for all in the large-system limit if the moment sequence of the asymptotic EV distribution of $\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A}$ is equal to that of the Marčenko-Pastur distribution up to the corresponding order. Analyzing what occurs between the two orders is left for future work.
IV Proof of Theorem 2
Let with defined as the RHS of (15). The goal is to prove for all and in the large-system limit.
The proof is by induction with respect to . For , we use (15) to obtain
in the large-system limit, where the th moment is defined in (22). In particular, for and we use to find in the large-system limit.
for , where the second equality follows from the identity obtained from (34). Thus, we find in the large-system limit.
Assume that there is some such that holds for all and . We prove for all . The induction hypothesis allows us to use Property 1 for all , so that, for all , converges a.s. to a constant independent of in the large-system limit. This observation implies that (35) holds for all . Furthermore, we use (15) to obtain
for all and .
Let for , which satisfies the recursive system (37), (38), and (39) with replaced by . It is sufficient to prove in the large-system limit. By definition, is independent of the higher-order moments for all . As long as is assumed, the sequence is determined by the moments up to order . Without loss of generality, we can assume that the asymptotic EV distribution of $\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A}$ coincides with the Marčenko-Pastur distribution perfectly.
To prove , we define the generating function of as
where satisfies the recursive system (37), (38), and (39) with replaced by . Note that we have extended the definition of with respect to from to all non-negative integers. From the induction hypothesis , it is sufficient to prove .
for all , where we have used . From (44), we have
We next prove that the numerator is divisible by the denominator for . It is sufficient to prove that holds for the zero of , given by
Since the $\eta$-transform satisfies (20), we have
The positivity of the $\eta$-transform implies that the correct solution is . Thus, we arrive at .
The author was in part supported by the Grant-in-Aid for Scientific Research (B) (JSPS KAKENHI Grant Number 18H01441), Japan.
-  D. L. Donoho, A. Maleki, and A. Montanari, “Message-passing algorithms for compressed sensing,” Proc. Nat. Acad. Sci., vol. 106, no. 45, pp. 18 914–18 919, Nov. 2009.
-  Y. Kabashima, “A CDMA multiuser detection algorithm on the basis of belief propagation,” J. Phys. A: Math. Gen., vol. 36, no. 43, pp. 11 111–11 121, Oct. 2003.
-  M. Bayati and A. Montanari, “The dynamics of message passing on dense graphs, with applications to compressed sensing,” IEEE Trans. Inf. Theory, vol. 57, no. 2, pp. 764–785, Feb. 2011.
-  M. Bayati, M. Lelarge, and A. Montanari, “Universality in polytope phase transitions and message passing algorithms,” Ann. Appl. Probab., vol. 25, no. 2, pp. 753–822, Apr. 2015.
-  F. Caltagirone, L. Zdeborová, and F. Krzakala, “On convergence of approximate message passing,” in Proc. 2014 IEEE Int. Symp. Inf. Theory, Honolulu, HI, USA, Jul. 2014, pp. 1812–1816.
-  S. Rangan, P. Schniter, and A. Fletcher, “On the convergence of approximate message passing with arbitrary matrices,” in Proc. 2014 IEEE Int. Symp. Inf. Theory, Honolulu, HI, USA, Jul. 2014, pp. 236–240.
-  J. Ma and L. Ping, “Orthogonal AMP,” IEEE Access, vol. 5, pp. 2020–2033, Jan. 2017.
-  S. Rangan, P. Schniter, and A. K. Fletcher, “Vector approximate message passing,” in Proc. 2017 IEEE Int. Symp. Inf. Theory, Aachen, Germany, Jun. 2017, pp. 1588–1592.
-  M. Opper and O. Winther, “Expectation consistent approximate inference,” J. Mach. Learn. Res., vol. 6, pp. 2177–2204, Dec. 2005.
-  J. Céspedes, P. M. Olmos, M. Sánchez-Fernández, and F. Perez-Cruz, “Expectation propagation detection for high-order high-dimensional MIMO systems,” IEEE Trans. Commun., vol. 62, no. 8, pp. 2840–2849, Aug. 2014.
-  K. Takeuchi, “Rigorous dynamics of expectation-propagation-based signal recovery from unitarily invariant measurements,” in Proc. 2017 IEEE Int. Symp. Inf. Theory, Aachen, Germany, Jun. 2017, pp. 501–505.
-  K. Takeuchi and C.-K. Wen, “Rigorous dynamics of expectation-propagation signal detection via the conjugate gradient method,” in Proc. 18th IEEE Int. Workshop Sig. Process. Advances Wirel. Commun., Sapporo, Japan, Jul. 2017, pp. 88–92.
-  B. Çakmak, M. Opper, O. Winther, and B. H. Fleury, “Dynamical functional theory for compressed sensing,” in Proc. 2017 IEEE Int. Symp. Inf. Theory, Aachen, Germany, Jun. 2017, pp. 2143–2147.
-  A. M. Tulino and S. Verdú, Random Matrix Theory and Wireless Communications. Hanover, MA, USA: Now Publishers Inc., 2004.
-  E. Bolthausen, “An iterative construction of solutions of the TAP equations for the Sherrington-Kirkpatrick model,” Commun. Math. Phys., vol. 325, no. 1, pp. 333–366, Jan. 2014.
-  R. Lyons, “Strong laws of large numbers for weakly correlated random variables,” Michigan Math. J., vol. 35, no. 3, pp. 353–359, 1988.
Appendix A Proof of Theorem 1
A-a Properties of Pseudo-Lipschitz Functions
We present the definition and basic properties of pseudo-Lipschitz functions .
A function $f$ is called pseudo-Lipschitz of order $k$ if there is some constant $L > 0$ such that, for all $\boldsymbol{x}$ and $\boldsymbol{y}$,
$\|f(\boldsymbol{x}) - f(\boldsymbol{y})\| \leq L\left(1 + \|\boldsymbol{x}\|^{k-1} + \|\boldsymbol{y}\|^{k-1}\right)\|\boldsymbol{x} - \boldsymbol{y}\|.$
In proving the following propositions, we use the equivalence between norms on for finite , i.e. for some constants . Note that is abbreviated as .
Let denote any pseudo-Lipschitz function of order . Then, there is some constant such that for all .
Since is pseudo-Lipschitz of order , there is some constant such that holds for all . For , we have . Otherwise, . Thus, there is some constant such that holds.
Proposition 1 implies that any pseudo-Lipschitz function of order is as , while holds for any Lipschitz-continuous function .
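As a numerical illustration of the definition, the following sketch checks the order-$2$ pseudo-Lipschitz bound for $f(u) = u^{2}$ in the scalar case, using the common Bayati-Montanari form of the bound (an assumption, since the constants in the definition above were elided in extraction); indeed $|x^{2} - y^{2}| = |x + y|\,|x - y| \le (1 + |x| + |y|)\,|x - y|$, so $L = 1$ suffices.

```python
import numpy as np

def plip_bound_holds(f, L, k, x, y):
    """Check the scalar order-k pseudo-Lipschitz bound (assumed form):
    |f(x) - f(y)| <= L (1 + |x|^(k-1) + |y|^(k-1)) |x - y|."""
    bound = L * (1.0 + abs(x) ** (k - 1) + abs(y) ** (k - 1)) * abs(x - y)
    return abs(f(x) - f(y)) <= bound

# f(u) = u^2 is pseudo-Lipschitz of order 2 with L = 1, since
# |x^2 - y^2| = |x + y| |x - y| <= (1 + |x| + |y|) |x - y|.
rng = np.random.default_rng(0)
xs = rng.uniform(-10.0, 10.0, size=1000)
ys = rng.uniform(-10.0, 10.0, size=1000)
ok = all(plip_bound_holds(lambda u: u * u, 1.0, 2, x, y)
         for x, y in zip(xs, ys))
```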
Let denote a random vector with bounded th absolute moments for some . Suppose that a function is pseudo-Lipschitz of order and almost everywhere (a.e.) differentiable. Then, we have and .
Using Proposition 1, we obtain
where the boundedness follows from that of the th absolute moments of .
Suppose that and are pseudo-Lipschitz of orders and , respectively. Then, is pseudo-Lipschitz of order .
From the pseudo-Lipschitz properties, there are some constants such that