A Unified Framework of State Evolution for Message-Passing Algorithms

01/10/2019 · Keigo Takeuchi, et al. (TUT)

This paper presents a unified framework for understanding the dynamics of message-passing algorithms in compressed sensing. State evolution is rigorously analyzed for a general error model that contains the error models of both approximate message-passing (AMP) and orthogonal AMP. As a by-product, AMP is proved to converge asymptotically if the sensing matrix is orthogonally invariant and if the moment sequence of its asymptotic singular-value distribution coincides with that of the Marchenko–Pastur distribution up to an order that is at most twice the maximum number of iterations.


I. Introduction

Consider the recovery of an unknown N-dimensional signal vector x from an M-dimensional linear measurement vector y, given by

y = Ax + w.  (1)

In (1), the sensing matrix A is known, while the noise vector w is unknown. The purpose of this paper is to present a unified framework for analyzing the asymptotic performance of signal recovery via message-passing (MP).
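To make the setup concrete, the following NumPy sketch instantiates the measurement model (1). The dimensions, the Bernoulli–Gaussian signal prior, the noise level, and the i.i.d. Gaussian sensing matrix with variance 1/M are illustrative assumptions for this demo, not prescriptions of the paper.

```python
import numpy as np

# Minimal sketch of the measurement model (1): y = A x + w.
rng = np.random.default_rng(0)
N, M = 2000, 1000              # signal and measurement dimensions (assumed)
delta = M / N                  # compression rate

# Bernoulli-Gaussian signal with sparsity rho (an assumed prior).
rho = 0.1
x = rng.standard_normal(N) * (rng.random(N) < rho)

# i.i.d. Gaussian sensing matrix with variance 1/M, a common AMP
# normalization under which the columns have unit norm on average.
A = rng.standard_normal((M, N)) / np.sqrt(M)

sigma2 = 1e-3                  # noise variance (assumed)
w = np.sqrt(sigma2) * rng.standard_normal(M)
y = A @ x + w                  # linear measurement vector
```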

An important example of MP is approximate message-passing (AMP) [1]. Bayes-optimal AMP can be regarded as an asymptotically exact approximation of belief propagation [2] in the large-system limit—both M and N tend to infinity while the compression rate δ = M/N is kept constant. Bayati et al. [3, 4] analyzed the rigorous dynamics of AMP in the large-system limit via state evolution (SE) when the sensing matrix has independent and identically distributed (i.i.d.), zero-mean, and sub-Gaussian elements. Their result implies that, in spite of its low complexity, AMP can achieve the Bayes-optimal performance in a range of the compression rate δ. However, AMP fails to converge when the sensing matrix has non-zero mean [5] or is ill-conditioned [6].

Another important example of MP is orthogonal AMP (OAMP) [7]. OAMP is also called vector AMP (VAMP) [8] and was originally proposed by Opper and Winther [9, Appendix D]. Bayes-optimal OAMP can be regarded as a large-system approximation of expectation propagation (EP) [10, 11]. Rigorous SE analyses of OAMP were presented at the same conference for sensing matrices that are orthogonally invariant on the real field [8] and unitarily invariant on the complex field [11]. These rigorous results imply that OAMP converges for a wider class of sensing matrices than AMP, because the class of orthogonally invariant matrices contains matrices with dependent elements. One disadvantage of OAMP is its high complexity, due to the requirement of one matrix inversion per iteration. (The singular-value decomposition (SVD) of A allows us to circumvent this requirement [8]; however, the SVD itself is computationally expensive unless the sensing matrix has some special structure.) See [12] for a complexity reduction of OAMP.

This paper proposes an SE framework for understanding both AMP and OAMP from a unified point of view. The proposed framework is based on a general recursive model of errors that contains the error models of both AMP and OAMP. The main point of the model is that the current errors depend on the whole history of errors in the preceding iterations, while the current errors in OAMP are determined only by the errors in the latest iteration. Under the assumption of orthogonally invariant sensing matrices, we present a rigorous SE analysis of the general error model in the large-system limit.

The main contributions of this paper are twofold. One is the rigorous SE of the general error model, which contains the error models of both AMP and OAMP. The result provides a framework for designing new MP algorithms that combine the advantages of both AMP and OAMP [13]: low complexity and convergence for orthogonally invariant sensing matrices.

The other contribution is a detailed convergence analysis of AMP. AMP with a maximum number T of iterations is proved to converge for orthogonally invariant sensing matrices if the moment sequence of the asymptotic eigenvalue (EV) distribution of A^T A coincides with that of the Marchenko–Pastur distribution [14] up to order 2T at most. When A has i.i.d. zero-mean elements, the asymptotic EV distribution coincides with the Marchenko–Pastur distribution perfectly. Thus, the i.i.d. assumption on A is stronger than necessary for guaranteeing the convergence of AMP, as long as a finite number of iterations is considered.

II. Preliminaries

II-A. General Error Model

Consider the singular-value decomposition (SVD) A = UΣV^T of the sensing matrix, in which U and V are M×M and N×N orthogonal matrices, respectively. We consider the following general error model in iteration t:

(2)
(3)
(4)
(5)

with and the initial conditions .

In the general error model, the notation ⟨·⟩ denotes the arithmetic mean of the elements of a vector. The functions and are element-wise mappings of input vectors, i.e.

(6)
(7)

for some functions and . Finally, the notations and represent -dimensional vectors of which the th elements and are given by the partial derivatives of and with respect to the th variable, respectively.

The functions and may depend on the singular values of the sensing matrix. Since the support of the asymptotic singular-value distribution of A is assumed to be compact in this paper, we do not write these dependencies explicitly.

The general error model is composed of two systems with respect to and , respectively. We refer to the former and latter systems as modules A and B, respectively.

Remark 1

Suppose that the functions and depend only on the latest variables, i.e. and . Then, the general error model reduces to that of OAMP [11]. The functions and characterize the types of the linear filter and the thresholding function used in OAMP. Furthermore, the normalized squared norm corresponds to the mean-square error (MSE) for the OAMP estimation of x in iteration t.

II-B. AMP

We formulate an AMP error model analogous to the general error model. Let x_t denote the AMP estimator of x in iteration t. The update rules of AMP [1] are given by

(8)
(9)

with and . In (8), the thresholding function satisfies the separation condition for with a common scalar function .
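For reference, the sketch below implements the textbook AMP recursion (8)–(9) of [1] with a soft-thresholding function; it reuses x, y, and A from the snippet in Section I. The threshold rule alpha * tau_t, with tau_t estimated from the residual, is a common heuristic and an assumption here rather than the paper's choice.

```python
import numpy as np

def soft(u, theta):
    """Soft-thresholding function and its derivative (separable)."""
    out = np.sign(u) * np.maximum(np.abs(u) - theta, 0.0)
    deriv = (np.abs(u) > theta).astype(float)
    return out, deriv

def amp(y, A, alpha=1.5, T=30):
    """Sketch of the AMP recursion (8)-(9) of [1] with soft thresholding."""
    M, N = A.shape
    delta = M / N
    x_t = np.zeros(N)
    z_t = y.copy()
    onsager = 0.0                  # <eta'> from the previous iteration
    for _ in range(T):
        # Residual including the Onsager correction term.
        z_t = y - A @ x_t + (1.0 / delta) * z_t * onsager
        tau_t = np.linalg.norm(z_t) / np.sqrt(M)   # effective noise level
        # Element-wise thresholding of the matched-filter output.
        x_t, deriv = soft(x_t + A.T @ z_t, alpha * tau_t)
        onsager = deriv.mean()
    return x_t

x_hat = amp(y, A)                  # x, y, A from the snippet in Section I
print("empirical MSE:", np.mean((x_hat - x) ** 2))
```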

Let and denote the estimation errors before and after thresholding, respectively. From the definition (5), we find

(10)

Then, the extrinsic vector in (2) for is given by

(11)

To define the function in (3), we let

(12)

Substituting the definition of yields

(13)

with and , where the second equality follows from (11) and (12). Left-multiplying (9) by and using (1), we obtain

(14)

with . Applying (11), (12) and (13) to (14), we arrive at

(15)

with and for . The right-hand side (RHS) of (15) defines the function recursively. Note that depends on all vectors .

The only difference between the general and AMP error models is between (4) and (12): instead of , the vector is used to define in AMP. We will prove in the second main theorem that this difference vanishes in the large-system limit.

II-C. Assumptions

We follow [3] in postulating Lipschitz-continuous functions and in the general error model.

Assumption 1

and are Lipschitz-continuous. Furthermore, and are not given by a linear combination of the first vectors plus some function of the last vector.

The latter assumption implies that and in (2) and (4) depend on and , respectively.

We assume the following moment conditions on and to guarantee the existence of the second moments of the variables in the general error model.

Assumption 2

The signal vector has independent elements with bounded th moments for some .

Assumption 3

The noise vector has bounded th moments for some and satisfies as .

We follow [8, 11] to postulate orthogonally invariant sensing matrices.

Assumption 4

The sensing matrix A is orthogonally invariant. More precisely, the orthogonal matrices U and V in the SVD A = UΣV^T are independent of the other random variables and Haar-distributed [14]. The empirical EV distribution of A^T A converges almost surely (a.s.) to an asymptotic distribution with a compact support in the large-system limit.

II-D. Marchenko–Pastur Distribution

We review the Marchenko–Pastur distribution. Assume that the sensing matrix A has independent zero-mean Gaussian elements with a common variance. The k-th moment of the empirical EV distribution of AA^T converges a.s. to that of the Marchenko–Pastur distribution in the large-system limit. Instead of presenting the Marchenko–Pastur distribution explicitly, we characterize it via the η-transform, defined as

η_{AA^T}(x) = E[1/(1 + xλ)],  (16)

where λ follows the asymptotic EV distribution of AA^T. As shown in [14, Eq. (2.120)], the η-transform of the Marchenko–Pastur distribution is the positive solution to

(17)

The η-transform defines the Marchenko–Pastur distribution uniquely, because the distribution is determined by its Stieltjes transform, which is given via analytic continuation of the η-transform [14].

We need the asymptotic EV distribution of A^T A, rather than that of AA^T. Define the η-transform of A^T A as

η_{A^TA}(x) = E[1/(1 + xλ)],  (18)

with λ following the asymptotic EV distribution of A^T A. Since A^T A and AA^T have identical positive eigenvalues, we find the relationship

η_{A^TA}(x) = 1 − δ + δ η_{AA^T}(x).  (19)

Substituting this into (17) yields

(20)

It is possible to calculate the moment sequence of the asymptotic EV distribution of A^T A via the η-transform η_{A^TA}(x). Since the asymptotic EV distribution has a compact support, for all sufficiently small x we can use the eigen-decomposition of A^T A and the definition (18) to obtain the power series

η_{A^TA}(x) = Σ_{k=0}^{∞} (−x)^k μ_k,  (21)

μ_k = lim_{N→∞} (1/N) Tr[(A^T A)^k].  (22)

This implies that the k-th moment μ_k of the asymptotic EV distribution of A^T A is given via the k-th derivative of the η-transform at the origin. Direct calculation of the derivatives based on (20) yields explicit expressions for the first few moments μ_1, μ_2, and μ_3.
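The moment sequence can be checked numerically. The sketch below assumes i.i.d. N(0, 1/M) elements, for which the Marchenko–Pastur moments of A^T A are given by Narayana numbers, μ_k = Σ_{r=0}^{k−1} C(k,r)C(k−1,r) y^r/(r+1) with y = N/M = 1/δ, so that μ_1 = 1; the paper's own normalization may differ.

```python
import numpy as np
from math import comb

# Empirical moments of the EV distribution of A^T A versus the
# Marchenko-Pastur moments; the N(0, 1/M) scaling is an assumption.
rng = np.random.default_rng(2)
M, N = 1000, 2000
y_ratio = N / M                          # equals 1/delta
A = rng.standard_normal((M, N)) / np.sqrt(M)
lam = np.linalg.eigvalsh(A.T @ A)        # eigenvalues of A^T A

def mp_moment(k, y):
    """k-th Marchenko-Pastur moment via Narayana numbers."""
    return sum(comb(k, r) * comb(k - 1, r) * y ** r / (r + 1)
               for r in range(k))

for k in (1, 2, 3):
    print(k, np.mean(lam ** k), mp_moment(k, y_ratio))
```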

III. Main Results

III-A. State Evolution

We analyze the dynamics of the general error model in the large-system limit. Let

(23)
(24)
(25)
(26)

Define the set . The set contains the whole history of the estimation errors just before evaluating (2) in iteration , as well as all random variables with the only exception of , while includes the whole history just before evaluating (4). We use the conditioning technique by Bolthausen [15] to obtain the following theorem:

Theorem 1

Postulate Assumptions 1–4. For all and , the following properties hold for module A in the large-system limit.

  (A-1) Let , and , with . For , the vector conditioned on is statistically equivalent to

    (27)

    In (27), the notation denotes a finite-dimensional vector of which all elements are . For a matrix , the notation represents the matrix that is composed of all left-singular vectors of associated with zero singular values. is independent of the other random variables, orthogonally invariant, and has bounded th moments for some satisfying .

  (A-2) (28)
  (A-3) Suppose that satisfies the separation condition like (6), and that each function is Lipschitz-continuous. Then,

    (29)
  (A-4) There is some constant ε > 0 such that the minimum eigenvalue of is a.s. larger than ε.

For module B, on the other hand, the following properties hold in the large-system limit:

  (B-1) Let and . Then, the vector conditioned on is statistically equivalent to

    (30)

    for , otherwise

    (31)

    In (30), is an independent and orthogonally invariant vector, and has bounded th moments for some satisfying and for .

  (B-2) (32)
  (B-3) Suppose that satisfies the separation condition like (7), and that each function is Lipschitz-continuous. Then,

    (33)
  (B-4) There is some constant ε > 0 such that the minimum eigenvalue of is a.s. larger than ε.

Proof:

See Appendix A.

Theorem 1 was proved in [8, 11] for the case of functions that depend only on the variables in the latest iteration. Theorem 1 generalizes [8, 11] to the case of the general functions (6) and (7).

Properties (A-2) and (B-2) imply the asymptotic orthogonality between the input and output estimation errors of each module in the general error model. Thus, we refer to MP algorithms whose error models are contained in the general error model as long-memory OAMP (LM-OAMP).

If corresponds to the estimation error of an MP algorithm in iteration t, we need to evaluate the MSE in the large-system limit. While Theorem 1 allows us to analyze the MSE, this paper does not develop the analysis of the general error model any further; the MSE should be evaluated for each concrete MP algorithm.

Because of space limitations, we have focused on performance measures, such as the MSE, that require the existence of the second moments of the variables in the general error model. As considered in [3], it is straightforward to extend Theorem 1 to general performance measures in terms of pseudo-Lipschitz functions.

III-B. AMP

We next prove that the general error model contains the AMP error model, under an assumption on the asymptotic EV distribution of A^T A.

Theorem 2

Consider the AMP error model, postulate Assumptions 1–4, and suppose that the moment sequence of the asymptotic EV distribution of A^T A coincides with that of the Marchenko–Pastur distribution up to order 2T, where T denotes the maximum number of iterations. Then, the AMP error model coincides with the general error model for all t in the large-system limit.

Proof:

See Section IV.

The only difference between the general and AMP error models is between (4) and (12). Thus, Theorem 2 implies that the general error model contains the AMP error model in the large-system limit. As long as the number of iterations is finite, it should be possible to construct orthogonally invariant sensing matrices that satisfy two conditions: the matrices have dependent elements, and the moment sequence of the asymptotic EV distribution of A^T A is equal to that of the Marchenko–Pastur distribution up to the required order; a sketch of such a construction is given below. Thus, we conclude that Theorems 1 and 2 are the first rigorous results on the asymptotic dynamics of AMP for sensing matrices with dependent elements.
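A minimal sketch of such a construction, under assumed normalizations: draw Haar-distributed orthogonal factors and combine them with a fixed set of singular values whose empirical distribution follows the Marchenko–Pastur law. Because the singular values are fixed, the Frobenius norm of A is deterministic, so its elements are dependent, while A remains orthogonally invariant by construction.

```python
import numpy as np

def haar_orthogonal(n, rng):
    """Haar-distributed orthogonal matrix via QR with sign correction."""
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    return Q * np.sign(np.diag(R))

rng = np.random.default_rng(3)
M, N = 500, 1000

# Singular values with a Marchenko-Pastur profile, computed once from a
# reference Gaussian matrix and then treated as fixed (deterministic).
s = np.linalg.svd(rng.standard_normal((M, N)) / np.sqrt(M),
                  compute_uv=False)

U = haar_orthogonal(M, rng)
V = haar_orthogonal(N, rng)
Sigma = np.zeros((M, N))
Sigma[:, :M] = np.diag(s)

# Orthogonally invariant with dependent (non-i.i.d.) elements.
A = U @ Sigma @ V.T
```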

Remark 2

Instead of evaluating the convergence directly, we present a sufficient condition for guaranteeing that the MSE coincides with that for the case of zero-mean i.i.d. Gaussian sensing matrices [3]. From (15), the MSE depends on the asymptotic moments up to a certain order. Thus, the MSE coincides with that in [3] for all iterations in the large-system limit if the moment sequence of the asymptotic EV distribution of A^T A is equal to that of the Marchenko–Pastur distribution up to that order. A future work is to analyze what occurs between this order and the order 2T in Theorem 2.
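For zero-mean i.i.d. Gaussian sensing matrices, the MSE referred to in Remark 2 is tracked by the scalar state evolution of [3]. The sketch below estimates that recursion, τ²_{t+1} = σ² + MSE_t/δ, by Monte Carlo for the soft-thresholding function, reusing soft and the prior assumptions from the earlier snippets; its output can be compared with the empirical MSE printed by the AMP sketch in Section II-B.

```python
import numpy as np

def state_evolution(delta, sigma2, rho, alpha=1.5, T=30, n_mc=200_000):
    """Monte-Carlo sketch of the scalar state evolution of [3]."""
    rng = np.random.default_rng(1)
    x0 = rng.standard_normal(n_mc) * (rng.random(n_mc) < rho)  # X ~ assumed prior
    z = rng.standard_normal(n_mc)                              # Z ~ N(0, 1)
    tau2 = sigma2 + np.mean(x0 ** 2) / delta                   # initialization
    mse = np.mean(x0 ** 2)
    for _ in range(T):
        # One SE step: denoise X + tau * Z, then update the effective noise.
        xhat, _ = soft(x0 + np.sqrt(tau2) * z, alpha * np.sqrt(tau2))
        mse = np.mean((xhat - x0) ** 2)
        tau2 = sigma2 + mse / delta
    return mse

print("SE-predicted MSE:", state_evolution(delta=0.5, sigma2=1e-3, rho=0.1))
```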

IV. Proof of Theorem 2

Let with defined as the RHS of (15). The goal is to prove for all and in the large-system limit.

The proof is by induction with respect to . For , we use (15) to obtain

(34)

in the large-system limit, where the k-th moment μ_k is defined in (22). In particular, for and we use to find in the large-system limit.

Let . Since we have proved , we can use the first property in Theorem 1 for . Thus, converges a.s. to a constant independent of in the large-system limit. Using (15) yields

(35)

for , where the second equality follows from the identity obtained from (34). Thus, we find in the large-system limit.

Assume that there is some such that holds for all and . We prove for all . The induction hypothesis allows us to use the first property in Theorem 1 for all , so that, for all , converges a.s. to a constant independent of in the large-system limit. This observation implies that (35) holds for all . Furthermore, we use (15) to obtain

(36)

for all and .

We simplify the recursive system (34), (35), and (36). Let , with and for all . Applying these definitions to (34), (35), and (36), we have

(37)
(38)
(39)

The simplified system (37)–(39) implies that the quantity of interest is stationary with respect to the two iteration indices s and t. In other words, it depends on s and t only through the difference t − s.

Let for , which satisfies the recursive system (37), (38), and (39) with replaced by . It is sufficient to prove in the large-system limit. By definition, is independent of the higher-order moments for all . As long as is assumed, the sequence is determined by the moments up to order . Without loss of generality, we can assume that the asymptotic EV distribution of A^T A coincides with the Marchenko–Pastur distribution perfectly.

To prove , we define the generating function of as

(40)

with

(41)

where satisfies the recursive system (37), (38), and (39) with replaced by . Note that we have extended the definition of with respect to from to all non-negative integers. From the induction hypothesis , it is sufficient to prove .

We first derive an explicit formula for the generating function. From (37), (38), and (39), we utilize the power-series representation (21) to obtain

(42)
(43)
(44)

for all , where we have used . From (44), we have

(45)

Solving this equation with (42) and (43), we arrive at

(46)

with

(47)
(48)

We next prove that the numerator is divisible by the denominator for . It is sufficient to prove that holds for the zero of , given by

(49)

Calculating yields

(50)

Since the -transform satisfies (20), we have

(51)

The positivity of the -transform implies that the correct solution is . Thus, we arrive at .

Finally, we prove . For , we use to find . Since we have proved that is a polynomial for all , from (40) we can conclude for all . In particular, we use (41) and the induction hypothesis to arrive at . Thus, Theorem 2 holds.

Acknowledgment

The author was supported in part by the Grant-in-Aid for Scientific Research (B) (JSPS KAKENHI Grant Number 18H01441), Japan.

References

  • [1] D. L. Donoho, A. Maleki, and A. Montanari, “Message-passing algorithms for compressed sensing,” Proc. Nat. Acad. Sci., vol. 106, no. 45, pp. 18 914–18 919, Nov. 2009.
  • [2] Y. Kabashima, “A CDMA multiuser detection algorithm on the basis of belief propagation,” J. Phys. A: Math. Gen., vol. 36, no. 43, pp. 11 111–11 121, Oct. 2003.
  • [3] M. Bayati and A. Montanari, “The dynamics of message passing on dense graphs, with applications to compressed sensing,” IEEE Trans. Inf. Theory, vol. 57, no. 2, pp. 764–785, Feb. 2011.
  • [4] M. Bayati, M. Lelarge, and A. Montanari, “Universality in polytope phase transitions and message passing algorithms,” Ann. Appl. Probab., vol. 25, no. 2, pp. 753–822, Apr. 2015.
  • [5] F. Caltagirone, L. Zdeborová, and F. Krzakala, “On convergence of approximate message passing,” in Proc. 2014 IEEE Int. Symp. Inf. Theory, Honolulu, HI, USA, Jul. 2014, pp. 1812–1816.
  • [6] S. Rangan, P. Schniter, and A. Fletcher, “On the convergence of approximate message passing with arbitrary matrices,” in Proc. 2014 IEEE Int. Symp. Inf. Theory, Honolulu, HI, USA, Jul. 2014, pp. 236–240.
  • [7] J. Ma and L. Ping, “Orthogonal AMP,” IEEE Access, vol. 5, pp. 2020–2033, Jan. 2017.
  • [8] S. Rangan, P. Schniter, and A. K. Fletcher, “Vector approximate message passing,” in Proc. 2017 IEEE Int. Symp. Inf. Theory, Aachen, Germany, Jun. 2017, pp. 1588–1592.
  • [9] M. Opper and O. Winther, “Expectation consistent approximate inference,” J. Mach. Learn. Res., vol. 6, pp. 2177–2204, Dec. 2005.
  • [10] J. Céspedes, P. M. Olmos, M. Sánchez-Fernández, and F. Perez-Cruz, “Expectation propagation detection for high-order high-dimensional MIMO systems,” IEEE Trans. Commun., vol. 62, no. 8, pp. 2840–2849, Aug. 2014.
  • [11] K. Takeuchi, “Rigorous dynamics of expectation-propagation-based signal recovery from unitarily invariant measurements,” in Proc. 2017 IEEE Int. Symp. Inf. Theory, Aachen, Germany, Jun. 2017, pp. 501–505.
  • [12] K. Takeuchi and C.-K. Wen, “Rigorous dynamics of expectation-propagation signal detection via the conjugate gradient method,” in Proc. 18th IEEE Int. Workshop Sig. Process. Advances Wirel. Commun., Sapporo, Japan, Jul. 2017, pp. 88–92.
  • [13] B. Çakmak, M. Opper, O. Winther, and B. H. Fleury, “Dynamical functional theory for compressed sensing,” in Proc. 2017 IEEE Int. Symp. Inf. Theory, Aachen, Germany, Jun. 2017, pp. 2143–2147.
  • [14] A. M. Tulino and S. Verdú, Random Matrix Theory and Wireless Communications.   Hanover, MA, USA: Now Publishers Inc., 2004.
  • [15] E. Bolthausen, “An iterative construction of solutions of the TAP equations for the Sherrington-Kirkpatrick model,” Commun. Math. Phys., vol. 325, no. 1, pp. 333–366, Jan. 2014.
  • [16] R. Lyons, “Strong laws of large numbers for weakly correlated random variables,” Michigan Math. J., vol. 35, no. 3, pp. 353–359, 1988.

Appendix A: Proof of Theorem 1

A-A. Properties of Pseudo-Lipschitz Functions

We present the definition and basic properties of pseudo-Lipschitz functions [3].

Definition 1

A function f: R^n → R is called pseudo-Lipschitz of order k if there is some constant L > 0 such that, for all x and y,

|f(x) − f(y)| ≤ L(1 + ‖x‖^{k−1} + ‖y‖^{k−1})‖x − y‖.  (52)

In proving the following propositions, we use the equivalence between norms on R^n for finite n. Note that the Euclidean norm ‖·‖_2 is abbreviated as ‖·‖.

Proposition 1

Let f denote any pseudo-Lipschitz function of order k. Then, there is some constant C > 0 such that |f(x)| ≤ C(1 + ‖x‖^k) holds for all x.

Proof:

Since f is pseudo-Lipschitz of order k, (52) with y = 0 implies that |f(x)| ≤ |f(0)| + L(1 + ‖x‖^{k−1})‖x‖ holds for all x. For ‖x‖ ≤ 1, we have (1 + ‖x‖^{k−1})‖x‖ ≤ 2. Otherwise, (1 + ‖x‖^{k−1})‖x‖ ≤ 2‖x‖^k. Thus, there is some constant C > 0 such that |f(x)| ≤ C(1 + ‖x‖^k) holds.

Proposition 1 implies that any pseudo-Lipschitz function of order k is O(‖x‖^k) as ‖x‖ → ∞, while O(‖x‖) holds for any Lipschitz-continuous function.
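As a quick numerical illustration (not part of the paper), f(u) = u² is pseudo-Lipschitz of order 2 with L = 1, since |u² − v²| = |u + v||u − v| ≤ (1 + |u| + |v|)|u − v|; the snippet below verifies the bound (52) on random pairs.

```python
import numpy as np

rng = np.random.default_rng(4)
u = rng.normal(scale=10.0, size=10_000)
v = rng.normal(scale=10.0, size=10_000)

# Bound (52) for f(u) = u**2 with order k = 2 and L = 1.
lhs = np.abs(u ** 2 - v ** 2)
rhs = (1.0 + np.abs(u) + np.abs(v)) * np.abs(u - v)
assert np.all(lhs <= rhs)
print("pseudo-Lipschitz bound holds on all sampled pairs")
```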

Proposition 2

Let denote a random vector with bounded th absolute moments for some . Suppose that a function is pseudo-Lipschitz of order  and almost everywhere (a.e.) differentiable. Then, we have and .

Proof:

Using Proposition 1, we obtain

(53)

where the boundedness follows from that of the th absolute moments of .

The boundedness is also obtained by repeating the same argument, since (52) implies

(54)

where denotes the th column of . Thus, Proposition 2 holds.

Proposition 3

Suppose that and are pseudo-Lipschitz of orders  and , respectively. Then, is pseudo-Lipschitz of order .

Proof:

From the pseudo-Lipschitz properties, there are some constants such that

(55)