Support Recovery in the Phase Retrieval Model: Information-Theoretic Fundamental Limits

01/30/2019 ∙ by Lan V. Truong, et al. ∙ National University of Singapore

The support recovery problem consists of determining a sparse subset of variables that is relevant in generating a set of observations. In this paper, we study the support recovery problem in the phase retrieval model consisting of noisy phaseless measurements, which arises in a diverse range of settings such as optical detection, X-ray crystallography, electron microscopy, and coherent diffractive imaging. Our focus is on information-theoretic fundamental limits under an approximate recovery criterion, considering both discrete and Gaussian models for the sparse non-zero entries. In both cases, our bounds provide sharp thresholds with near-matching constant factors in several scaling regimes on the sparsity and signal-to-noise ratio. As a key step towards obtaining these results, we develop new concentration bounds for the conditional information content of log-concave random variables, which may be of independent interest.


I Introduction

Recently, there has been growing interest in recovering an unknown signal from phaseless quadratic observations, in which each observation consists of the squared magnitude of a linear measurement of the signal (formed via a measurement matrix), corrupted by additive noise. Since only the magnitude is measured, and not the phase (or the sign, in the real case), this problem is referred to as phase retrieval. The phase retrieval problem has many applications, including optical detection, X-ray crystallography, electron microscopy, and coherent diffractive imaging [1].

Similarly to the basic linear model, various works have shown that the number of measurements can be reduced significantly if the signal is sparse, i.e., it has only a small number of non-zero entries. It is shown in [1] that stable phase retrieval is achievable with a reduced number of measurements in the noiseless setting, and also in the noisy setting under some conditions on the noise process. Recently, Iwen et al. [2] provided a simple two-stage sparse phase retrieval strategy that can stably reconstruct the signal up to a global phase shift using relatively few complex measurements under some bounded noise assumptions. Some other existing works focus on finding practical algorithms to approach the fundamental limits for special cases of the phase retrieval problem [3, 4, 5]. For example, Jaganathan et al. [4] showed that in the noiseless case, when the measurement matrix is the Discrete Fourier Transform (DFT) matrix and the sparsity is suitably bounded, almost all signals with aperiodic support can be uniquely identified by their Fourier transform magnitude (up to time-shift, conjugate-flip, and global phase).

A distinct goal that has received less attention in phase retrieval, but considerable attention in other models, is the support recovery problem [6, 7, 8], where one wishes to exactly or approximately determine the support of the sparse signal given a collection of observations and the corresponding measurement matrix. This problem is of direct interest when the goal is to determine which variables influence the output (rather than their associated weights), and it may also be used as a first step towards estimating the non-zero values themselves (e.g., see [9]).

Under general linear and non-linear models, Scarlett and Cevher [10] provided achievability and converse bounds characterizing the trade-off between the error probability and the number of measurements. They applied their general bounds to the linear, 1-bit, and group testing models to obtain exact thresholds on the number of measurements required to achieve vanishing decoding error probability in the high-dimensional limit. Numerous other related works also exist, with the focus being mainly on linear models [11, 12, 13, 14, 15]; see [10] for a more detailed overview. In particular, approximate recovery criteria were studied by Reeves and Gastpar [16, 17] in the linear sparsity regime k = Θ(p), and by Scarlett and Cevher [10] in the sublinear regime k = o(p); we focus on the latter setting.

Although the initial bounds in [10] are very general, applying these bounds to new models can still be very challenging, due to the need to establish concentration bounds and mutual information bounds on a case-by-case basis. In this paper, we use this approach to establish fundamental limits for approximate support recovery in the phase retrieval model, under a log-concavity assumption on the noise process. To achieve this goal, we need to overcome at least two key challenges: establishing concentration bounds for information quantities in the phase retrieval model, and upper and lower bounding key conditional mutual information terms that have no closed form expressions. For each of these challenges, we develop novel auxiliary results, some of which may be of independent interest. The following subsection lists our specific contributions in more detail.

I-A Contributions

Our main contributions in this paper are as follows:

  • We extend the concentration bounds for the unconditional information content of log-concave densities by Fradelizi et al. [18, Theorem 3.1] to conditional versions (cf. Corollary 9) in which joint log-concavity does not hold. Thanks to this extension, we can establish concentration bounds for the conditional information density of n-dimensional random variables (cf. Theorem 11) and apply these bounds to the phase retrieval model. Because of their generality, our extended concentration bounds may be of independent interest.

  • Under i.i.d. complex Gaussian measurement matrices, we establish tight upper and lower bounds on the number of measurements required to achieve approximate support recovery (i.e., recovering a given proportion of the support) under both discrete (cf. Lemma 13) and Gaussian (cf. Theorem 2) modeling assumptions on the non-zero signal entries. In both cases, the upper and lower bounds coincide up to an explicit constant factor in certain sparsity regimes, and this constant factor is often very close to one (e.g., when the signal-to-noise ratio is sufficiently high).

I-B Notation

We use similar notation to [10]. We use upper-case letters for random variables and lower-case letters for their realizations. A non-bold character may be a scalar or a vector, whereas a bold character refers to a collection of n scalars or vectors, where n is the number of measurements. For a vector and an index set, a subscript denotes the subvector containing the entries indexed by that set, and similarly for the submatrix of a matrix containing the corresponding columns; complements of index sets are taken with respect to {1, …, p}.

The symbol ∼ means “distributed as”. For a given joint probability density function, the corresponding marginal and conditional densities are denoted in the usual way, and a superscript n denotes the corresponding n-fold i.i.d. distribution in which each term follows the corresponding single-letter distribution. We write P[·] for probabilities, E[·] for expectations, and Var[·] for variances.

We use the usual notation for differential entropy (e.g., h(Y)) and mutual information (e.g., I(X;Y)), and their conditional counterparts (e.g., I(X;Y|Z)). We use the notation N(μ, σ²) for real Gaussian random variables, CN(μ, σ²) for complex Gaussians (with variance σ²/2 in each of the real and imaginary parts), and χ²_ν for the central chi-squared distribution with ν degrees of freedom.

We make use of the standard asymptotic notations O(·), o(·), Θ(·), Ω(·), and ω(·). We write the floor and ceiling functions as ⌊·⌋ and ⌈·⌉, respectively. The function log has base e, and all information quantities are measured in nats.

Throughout the paper, we frequently make use of integrals written with respect to a suitable measure, which can typically be taken to be the Lebesgue measure. For q ≥ 1, we say that a function on a given domain is in L^q if the q-th power of its absolute value is integrable.

I-C Structure of the Paper

In Section II, we formally introduce the problem setup and overview our main results. In Section III, we provide the main auxiliary results on log-concavity, concentration of measure, and mutual information bounds. Sections IV and V provide the proofs of our main support recovery results. Conclusions are drawn in Section VI.

II Problem Setup and Main Results

II-A Model and Assumptions

Let p denote the ambient dimension, k the sparsity level, and n the number of measurements, and consider the collection of subsets of {1, …, p} having cardinality k. The key random variables in the support recovery problem are the support set S, the unknown signal β ∈ ℂ^p, the measurement matrix X ∈ ℂ^{n×p}, and the observation vector Y ∈ ℝ^n.

The support set S is assumed to be equiprobable over the subsets of {1, …, p} of cardinality k. Given S = s, the entries of β outside s are deterministically set to zero, and the remaining entries β_s are generated according to some distribution on ℂ^k. (We allow for both discrete and continuous distributions on β_s, meaning that in some cases this distribution is represented by a probability mass function rather than a density function.) We assume that the non-zero entries follow the same distribution for all possible realizations of S, and that this distribution is permutation-invariant.

We consider the setting of (complex) Gaussian measurements, in which the entries of the measurement matrix take i.i.d. values on CN(0, 1); the corresponding single-entry density and its i.i.d. extensions to vectors and matrices are denoted in the natural way. Given the support, the signal, and the measurement matrix, each entry of the observation vector Y is generated in a conditionally independent manner according to the following model:

(1)   Y^{(i)} = |⟨X^{(i)}, β⟩|² + Z^{(i)},   i = 1, …, n,

where X^{(i)} denotes the i-th row of the measurement matrix, and the Z^{(i)} are i.i.d. noise variables with density P_Z, an arbitrary log-concave density function. This log-concavity assumption is made for mathematical convenience, but it also captures a wide range of noise distributions, including Gaussian. We note that the permutation-invariance of the signal distribution and the i.i.d. structure of the measurement matrix allow us to condition on a fixed support set S = s (e.g., s = {1, …, k}) throughout the analysis without loss of generality; such conditioning should henceforth be assumed unless explicitly stated otherwise.
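To make the data-generation process concrete, the following is a minimal simulation sketch of the setup above. It follows the observation model as reconstructed in (1), takes the noise to be Gaussian (one admissible log-concave choice), and uses toy sizes and parameter values that are illustrative assumptions rather than quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, n = 64, 4, 200   # ambient dimension, sparsity, number of measurements (toy values)
sigma = 0.1            # noise standard deviation (assumed; any log-concave noise density is allowed)

# Support equiprobable over size-k subsets; non-zero entries i.i.d. CN(0, 1) here
# (the paper also allows a discrete, permutation-invariant distribution on the non-zero entries).
S = rng.choice(p, size=k, replace=False)
beta = np.zeros(p, dtype=complex)
beta[S] = (rng.standard_normal(k) + 1j * rng.standard_normal(k)) / np.sqrt(2)

# i.i.d. CN(0, 1) measurement matrix and noisy phaseless observations, cf. (1).
X = (rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))) / np.sqrt(2)
Y = np.abs(X @ beta) ** 2 + sigma * rng.standard_normal(n)

print("support:", sorted(S.tolist()), " first observations:", np.round(Y[:3], 3))
```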

The relation (1) induces the following conditional joint distribution of the signal entries on the support and a single observation (given the corresponding columns of the measurement matrix):

(2)   P_{β_s, Y | X_s}(b_s, y | x_s) = P_{β_s}(b_s) P_{Y | X_s, β_s}(y | x_s, b_s),
(3)   P_{Y | X_s, β_s}(y | x_s, b_s) = P_Z( y − |⟨x_s, b_s⟩|² ),

and its multiple-observation counterpart

(4)   P^n_{Y | X_s, β_s}(y | x_s, b_s) = ∏_{i=1}^{n} P_{Y | X_s, β_s}( y^{(i)} | x_s^{(i)}, b_s ),

where P^n_{Y | X_s, β_s} is the n-fold product of P_{Y | X_s, β_s}. The remaining entries of the measurement matrix (i.e., the columns indexed by the complement of s) are distributed as i.i.d. CN(0, 1), independently of everything else.

Given X and Y, a decoder forms an estimate Ŝ of S. Like previous works studying the information-theoretic limits of support recovery (e.g., [10, 11]), we assume that the decoder knows the system model, including the signal and noise distributions. We focus on the approximate recovery criterion, only requiring that at least (1 − α)k entries of S are successfully identified (approximate recovery) for some α ∈ (0, 1). Following [16, 10], the error probability is given by

(5)   P_e(α) = P[ |S \ Ŝ| > αk  ∪  |Ŝ \ S| > αk ].

Note that if both S and Ŝ have cardinality k with probability one, then the two events in the union are identical, and hence either of the two can be removed. A more stringent performance criterion also considered in the literature is the exact support recovery problem, where the error probability is given by P[Ŝ ≠ S], but our techniques currently appear to be less suited to that setting.
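As a quick illustration of this criterion, the following hedged helper (with hypothetical names, not taken from the paper) checks whether an estimated support meets the approximate recovery requirement as reconstructed in (5).

```python
def approx_recovery_success(S, S_hat, alpha):
    """Approximate recovery check in the spirit of (5): the estimate S_hat is declared
    correct if at most alpha*|S| true support indices are missed and at most alpha*|S|
    spurious indices are included.  (Illustrative helper; names are not from the paper.)"""
    S, S_hat = set(S), set(S_hat)
    k = len(S)
    return len(S - S_hat) <= alpha * k and len(S_hat - S) <= alpha * k

# Example: k = 4 and alpha = 0.25 tolerate one missed and one spurious index.
print(approx_recovery_success({1, 2, 3, 4}, {1, 2, 3, 7}, alpha=0.25))  # True
print(approx_recovery_success({1, 2, 3, 4}, {1, 2, 7, 8}, alpha=0.25))  # False
```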

Our main goal is to derive necessary and sufficient conditions on the number of measurements n (as a function of p and k) such that P_e(α) vanishes as p → ∞. Moreover, when considering converse results, we will not only be interested in conditions under which P_e(α) does not vanish, but also conditions under which the stronger statement P_e(α) → 1 holds.

II-B Overview of Main Results

Here we state and discuss the two main results of this paper. Both theorems concern the information-theoretic limits of support recovery in the phase retrieval model described above, but with two different models of interest for the non-zero entries of the signal.

Discrete setting. The first result concerns a discrete distribution on the non-zero entries; namely, β_s is a uniformly random permutation of a fixed complex vector b ∈ ℂ^k. We also work with the version of b whose entries are sorted according to their magnitudes, and define the following mutual information quantities:

(6)
(7)
Theorem 1.

Consider the phase retrieval setup in Section II, with being a uniformly random permutation of a fixed complex vector . Let and , and assume that , and that with as . In addition, assume that there are distinct elements in .

We have P_e(α) → 0 as p → ∞ provided that

(8)

holds for an arbitrarily small constant, and that one of two additional conditions (i)–(ii) on the scaling regime is satisfied.

Conversely, under the general scaling regime, we have P_e(α) → 1 as p → ∞ whenever

(9)

holds for an arbitrarily small constant.

Proof:

See Section IV. ∎

We observe that the upper and lower bounds are nearly in closed form, other than the optimization over a single scalar. Moreover, the two have a very similar form, with the main difference being the specific quantities appearing in the numerator and in the denominator. The bounds hold for an arbitrary log-concave noise distribution.

Since the noise variance is fixed and the measurement matrix has normalized entries, the assumption corresponds to the case that the signal-to-noise ratio (SNR) is constant. We observe that under this assumption, the upper and lower bounds provide matching behavior. Perhaps more significantly, in the high-SNR limit (i.e., ), we obtain nearly identical constant factors. To see this, it suffices to crudely lower bound by , and upper bound by . For any bounded away from zero, since , these both behave as as (or equivalently ), which implies that the maxima in (8) and (9) are attained by in this limit, and the upper and lower bounds coincide up to a factor of .

We believe that the additional assumptions in the achievability part are an artifact of our analysis, and note that similar assumptions were made for the linear model in [10]. The conditions in Theorem 1 are less restrictive than those in [10], since we are considering approximate recovery instead of exact recovery.

Gaussian setting. We now turn to a (complex) Gaussian model on the non-zero entries, in which they are i.i.d. circularly-symmetric complex Gaussian with a common variance. This is analogous to a model considered for the linear setting in [16, 10]. Our result is stated in terms of the mutual information quantities

(10)
(11)

where the auxiliary quantity appearing in (10)–(11) is defined as

(12)

in terms of the cumulative distribution function of a chi-squared random variable.

Theorem 2.

Consider the phase retrieval setup in Section II with the Gaussian signal model described above and Gaussian noise, with the signal and noise variances held constant. Under a suitable scaling of the sparsity, we have P_e(α) → 0 as p → ∞ provided that

(13)

holds for an arbitrarily small constant.

Conversely, under a broader scaling regime on the sparsity, we have P_e(α) → 1 as p → ∞ whenever

(14)

holds for an arbitrarily small constant.

The assumption in the achievability part (which holds, for example, when the sparsity grows sufficiently slowly with the dimension) is rather restrictive compared to the general scaling permitted in the converse part. The former arises from a significant technical challenge (see Proposition 14 below), and we expect that the requirement is merely an artifact of our analysis. (In fact, extending our analysis to a broader scaling regime leads to the correct scaling of the number of measurements, but unfortunately the resulting constant factors are quite loose compared to Theorem 2.) In addition, we note that while we allowed an arbitrary log-concave noise distribution in the discrete setting, here we have focused on Gaussian noise to simplify the analysis. Despite this restriction, we believe that Gaussian noise still captures the essential features of the phase retrieval problem.

Once again, the scaling amounts to a fixed SNR. As mentioned in [16], exact recovery is not possible for Gaussian non-zero entries when the SNR is constant, and can require a huge number of measurements even when the SNR increases with the dimension. This motivates the consideration of approximate recovery in this setting.

The differences between the upper and lower bounds are similar to the discrete case. In particular, although the constants differ, the bounds are similar, and always have the same scaling laws. In the limit , we have and ; in this case, the maxima in (13)-(14) are both achieved with , and hence, the two bounds coincide to within a multiplicative factor of .

Comparison to the linear model. In Figures 1 and 2, we plot the upper and lower bounds of Theorems 1 and 2 under various signal-to-noise ratios (SNRs), along with the counterparts for the linear model in [10]. (The approximate recovery result for the discrete case was not explicitly stated in [10], but it is easily inferred from the analysis, and amounts to a much simpler version of the analysis of the present paper.) For the discrete model, we focus on the simple case in which all non-zero entries share a common value, i.e.,

(15)

for some fixed value, corresponding to a single distinct element in Theorem 1. In Appendix A, we describe how we equate the SNR in the linear and phase retrieval models, and also how to evaluate the bounds of Theorem 1 in this case.

As predicted by the discussion following Theorems 1 and 2, the upper and lower bounds are close (though still with a constant gap) when the SNR is sufficiently high. In addition, in this regime the information-theoretic limits of the phase retrieval model and the linear model are very similar, especially in the Gaussian case.

However, at lower SNR, the gap for the phase retrieval model can widen significantly more than that of the linear model. This appears to be because the key mutual information quantities arising in the analysis can be expressed in closed form only in the linear model, whereas the phase retrieval model requires possibly-loose bounds. That said, all that is needed to close this gap (at least partially) is improved mutual information bounds for the phase retrieval setting (cf. Section III-D).

Fig. 1: Asymptotic thresholds on the number of measurements required for approximate support recovery, for the linear model [10] and the phase retrieval model in Section IV, with Gaussian noise, a fixed distortion level, and discrete non-zero entries; the asymptotic number of measurements is shown after a suitable normalization.
Fig. 2: Asymptotic thresholds on the number of measurements required for approximate support recovery, for the linear model [10] and the phase retrieval model in Section V, with a fixed distortion level and Gaussian non-zero entries; the asymptotic number of measurements is shown after a suitable normalization.

III Auxiliary Results

In this section, we introduce the main auxiliary results needed to prove Theorems 1 and 2. We first introduce some notation and recall the initial bounds for general observation models from [10], and then present the relevant log-concavity properties, mutual information bounds, and concentration bounds.

III-A Information-Theoretic Definitions

We first outline some information-theoretic definitions from [10], recalling that we are conditioning on a fixed support set throughout. We consider partitions of the support set s into two disjoint sets s_eq and s_dif, where s_eq will typically correspond to the overlap between s and some other candidate support set (the “equal” part), and s_dif will correspond to the indices in one set but not in the other (the “differing” part).

For fixed and a corresponding pair , we introduce the notation

(16)
(17)

where is the marginal distribution of (4). While the left-hand sides of (16) and (17) represent the same quantities for any pair , it will still prove convenient to work with these in place of the right-hand sides. In particular, this allows us to introduce the marginal distributions

(18)
(19)

where . Using the preceding definitions, we introduce two information densities (in the terminology of the information theory literature, e.g., [19]). The first contains probabilities averaged over ,

(20)

whereas the second conditions on :

(21)

where is the -th measurement, and the single-letter information density is

(22)

Averaging (22) with respect to the distribution in (17) conditioned on yields a conditional mutual information quantity, which is denoted by

(23)
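To make these definitions a little more tangible, the following Monte Carlo sketch estimates a single-letter information density and the corresponding conditional mutual information in a toy instance of the phase retrieval model. Everything here is an illustrative assumption: one “equal” entry with a fixed value, one “differing” entry drawn uniformly from a small candidate set, Gaussian noise, and hypothetical variable names; it mirrors the spirit of (22)–(23) rather than reproducing the paper’s exact definitions.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 1.0                              # assumed noise standard deviation
beta_eq = 1.0 + 0.5j                     # fixed "equal" entry (toy value)
b_dif_set = np.array([0.5, 1.0j, -0.8])  # candidate values of the single "differing" entry

N = 200_000
x = (rng.standard_normal((N, 2)) + 1j * rng.standard_normal((N, 2))) / np.sqrt(2)
b_dif = rng.choice(b_dif_set, size=N)      # differing entry, uniform over the candidate set
mean = np.abs(x[:, 0] * beta_eq + x[:, 1] * b_dif) ** 2
y = mean + sigma * rng.standard_normal(N)  # Y = |<X, beta>|^2 + Z

def gauss(y_vals, m):
    # Conditional density of Y given the measurement row and the full signal.
    return np.exp(-(y_vals - m) ** 2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

num = gauss(y, mean)  # density conditioned on the realized differing entry
den = np.mean([gauss(y, np.abs(x[:, 0] * beta_eq + x[:, 1] * b) ** 2) for b in b_dif_set], axis=0)

info_density = np.log(num / den)           # single-letter information density, in the spirit of (22)
print("Monte Carlo estimate of the conditional mutual information (nats):", info_density.mean())
```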

III-B General Achievability and Converse Bounds

For the general support recovery problem with probabilistic models, the following achievability and converse bounds are given in [10]. While these are stated for the real-valued setting in [10], the proofs apply verbatim to the complex-valued setting.

Theorem 3.

[10, Theorem 5] Fix any constants , and , and functions such that the following holds:

(24)
(25)

for all with and for all in some (typical) set . Then we have

(26)

where

(27)
Theorem 4.

[10, Theorem 6] Fix any constants , and functions such that the following holds:

(28)
(29)

for all with , and for all in some (typical) set . Then we have

(30)

The steps for applying and simplifying these bounds are as follows:

  1. Establish an explicit characterization of each mutual information term (e.g., upper and lower bounds);

  2. Use concentration of measure to find expressions for each function and in Theorems 3 and 4, i.e., functions satisfying (24) and (28);

  3. According to the specific model on the non-zero entries under consideration, choose a suitable typical set , and also a value of , so that both and can be proved to be vanishing as ;

  4. Combine and simplify the preceding steps to deduce the final sample complexity bound.

These steps turn out to be highly non-trivial in the phase retrieval setting. In the following subsections, we provide general-purpose tools for Steps 1 and 2; we defer Steps 3 and 4 to Section IV for the discrete signal model, and to Section V for the Gaussian signal model.
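For orientation, the sample complexity bounds produced by this recipe in the support recovery literature typically take the following schematic form, written here in generic notation purely as an illustration; the precise numerators, the range of the maximization, and the constants are those given in Theorems 1–4.

```latex
% Schematic form only; the precise statements are Theorems 1-4.
\[ n \;\gtrsim\; \max_{\ell}\,
     \frac{\log\binom{p-k}{\ell}}{I_\ell}\,\bigl(1+o(1)\bigr),
   \qquad \ell = |s_{\mathrm{dif}}| \ \text{ranging over the allowed sizes of the differing set}. \]
```

Here I_ℓ denotes a per-measurement conditional mutual information associated with a differing set of size ℓ; both the achievability and converse conditions are of this ratio form, differing in their constants and in the exact mutual information quantities involved.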

III-C Log-Concavity Properties

Both our mutual information bounds and concentration bounds will crucially rely on the log-concavity properties stated in the following lemma.

Lemma 5.

Under the phase retrieval setup in Section II, we have the following:

  1. Given the support set and the values of the non-zero entries, the conditional marginal density of a single observation is log-concave;

  2. Given the support set, the values of the non-zero entries, and additionally the measurement entries indexed by some subset of the support, the conditional marginal density of a single observation is log-concave.

Proof:

Recall that the noise density P_Z is log-concave by assumption, and that the measurement entries are i.i.d. CN(0, 1). Hence, conditioned on the support and the non-zero entries, each observation is the sum of the noise and the squared magnitude of a complex Gaussian random variable. The density of the latter is log-concave, since the chi-squared distribution with two degrees of freedom is log-concave [20], and the convolution of two log-concave functions is log-concave [21].

In addition, once we further condition on part of the measurement vector, the observation is the sum of the noise and the squared magnitude of a complex Gaussian random variable with a non-zero mean. The resulting density is also log-concave by a similar argument, together with the fact that the non-central chi-squared distribution with two degrees of freedom is log-concave [20]. ∎
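As a small numerical sanity check of the first claim, under illustrative assumptions (unit signal energy, so the noiseless part is Exp(1)-distributed, and Gaussian noise with an assumed standard deviation), one can verify that the resulting observation density has a concave logarithm:

```python
import numpy as np
from scipy.stats import norm

# Density of Y = |W|^2 + Z with W ~ CN(0, 1) (so |W|^2 ~ Exp(1)) and Z ~ N(0, sigma^2).
# Completing the square in the convolution integral gives the closed form
#   f(y) = exp(sigma^2/2 - y) * Phi((y - sigma^2) / sigma),
# where Phi is the standard normal CDF.
sigma = 0.5  # assumed noise standard deviation (illustrative)

def log_density(y):
    return sigma**2 / 2 - y + norm.logcdf((y - sigma**2) / sigma)

y = np.linspace(-3.0, 10.0, 2001)
second_diff = np.diff(log_density(y), 2)  # discrete second derivative of log f
print("max second difference of log f:", second_diff.max())
# A non-positive maximum (up to numerical error) is consistent with the log-concavity
# asserted in Lemma 5 (here with unit signal energy and Gaussian noise).
```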

III-D Mutual Information Bounds

While an exact expression for the mutual information does not appear to be possible, the following theorem states closed-form upper and lower bounds. Although there is a gap between the two in general, their asymptotic behavior is similar when the SNR grows large; this fact ultimately leads to tight sample complexity bounds in the high-SNR setting.

Theorem 6.

For the phase retrieval setup in Section II, the following holds for the conditional mutual information defined in (23):

(31)

where and .

Proof:

The upper bound is based on the entropy power inequality and the maximum entropy property of the Gaussian distribution, and the lower bound is based on (known) results that give nearly-matching lower bounds for log-concave random variables. The details are given in Appendix B. ∎
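For reference, the two standard facts named in this proof sketch are the following, stated here in generic form for real-valued random variables; how they are combined with the log-concave entropy bounds is the content of Appendix B and is not reproduced here.

```latex
% Gaussian maximum entropy: for a real-valued random variable Y with finite variance,
\[ h(Y) \;\le\; \tfrac{1}{2}\log\!\bigl(2\pi e\,\mathrm{Var}(Y)\bigr). \]
% Entropy power inequality: for independent real-valued U and Z,
\[ h(U+Z) \;\ge\; \tfrac{1}{2}\log\!\bigl(e^{2h(U)} + e^{2h(Z)}\bigr). \]
```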

III-E Concentration Bounds

Perhaps the most technically challenging part of our analysis is to establish concentration bounds amounting to explicit expressions for the two functions appearing in Theorems 3 and 4.

Before stating the final concentration bounds, we provide a general result that may be of independent interest, giving a concentration bound on conditional information random variables of the form −log f(Y | X) (in generic notation) under certain log-concavity assumptions. Such a result is provided as a corollary of the following proposition, which considers generic random variables that need not be associated with the phase retrieval problem at this point.

Proposition 7.

Suppose that with joint density function . For each , define

(32)

and assume that

(33)

for all . Moreover, for an arbitrary positive number (to be chosen later), define

(34)

and assume that

(35)

Then, the following holds:

(36)

where

(37)
(38)
(39)
Proof:

We follow the general approach of [18], which considers the unconditional information content; however, many of the details differ significantly. The reader is referred to Appendix C. ∎

From this, we immediately deduce a similar result for i.i.d. product distributions.

Corollary 8.

Let . Suppose that with distribution (i.e., i.i.d. on ), where satisfies (33) and (35). Then, the following holds:

(40)

where

(41)
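Although Corollary 8 itself cannot be illustrated here without its constants, the phenomenon it formalizes is easy to visualize numerically. The sketch below is an illustration under assumed parameters, using the phase-retrieval-like conditional density of Y = |X|² + Z with Gaussian noise: the normalized conditional information content concentrates around the conditional differential entropy as the number of i.i.d. terms grows.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0       # assumed noise standard deviation
n_trials = 2000   # independent experiments per value of n

def normalized_info_content(n):
    # Draw n i.i.d. pairs (X_i, Y_i) with Y_i = |X_i|^2 + Z_i, X_i ~ CN(0, 1),
    # Z_i ~ N(0, sigma^2), and return (1/n) * sum_i -log f(Y_i | X_i).
    x = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    z = sigma * rng.standard_normal(n)
    y = np.abs(x) ** 2 + z
    log_f = -0.5 * np.log(2 * np.pi * sigma**2) - (y - np.abs(x) ** 2) ** 2 / (2 * sigma**2)
    return -log_f.mean()

for n in (10, 100, 1000):
    samples = np.array([normalized_info_content(n) for _ in range(n_trials)])
    print(f"n={n:5d}  mean={samples.mean():.4f}  std={samples.std():.4f}")
# The mean stays near the conditional differential entropy h(Z) = 0.5*log(2*pi*e*sigma^2)
# (about 1.419 nats for sigma = 1), while the fluctuations shrink roughly as 1/sqrt(n),
# illustrating the concentration phenomenon that Corollary 8 quantifies.
```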