Phase transition in the spiked random tensor with Rademacher prior

12/05/2017
by   Wei-Kuo Chen, et al.
University of Minnesota

We consider the problem of detecting a deformation from a symmetric Gaussian random p-tensor (p ≥ 3) with a rank-one spike sampled from the Rademacher prior. Recently in Lesieur et al. (2017), it was proved that there exists a critical threshold β_p so that when the signal-to-noise ratio exceeds β_p, one can distinguish the spiked and unspiked tensors and weakly recover the prior via the minimal mean-square-error method. On the other hand, Perry, Wein, and Bandeira (2017) proved that there exists a β_p' < β_p such that no statistical hypothesis test can distinguish these two tensors, in the sense that their total variation distance asymptotically vanishes, when the signal-to-noise ratio is less than β_p'. In this work, we show that β_p is indeed the critical threshold that strictly separates the distinguishability and indistinguishability between the two tensors under the total variation distance. Our approach is based on a subtle analysis of the high temperature behavior of the pure p-spin model with Ising spin, arising initially from the field of spin glasses. In particular, we identify the signal-to-noise criticality β_p as the critical temperature, distinguishing the high and low temperature behavior, of the Ising pure p-spin mean-field spin glass model.


1 Introduction

The problem of detecting a deformation in a symmetric Gaussian random tensor asks whether there exists a statistical hypothesis test that can reliably distinguish the deformation from the noise. In the matrix case, where the noise is a Gaussian Wigner ensemble and the signal is a unit vector, the goal is to distinguish the unspiked matrix (pure noise) from the spiked matrix (noise plus a rank-one perturbation built from the signal) for a given signal-to-noise ratio β. It is well known that in this case the top eigenvalue of the spiked matrix exhibits the so-called BBP (Baik, Ben Arous, and Péché) transition [5, 15, 17, 43]. Namely, the top eigenvalue successfully detects the signal if β exceeds a critical threshold, while it fails to provide indicative information if β is below that threshold. It was further proved in [34, 37, 45] that for any β below the threshold, no statistical hypothesis test can distinguish the spiked and unspiked matrices.

In recent years, the above phenomena were also studied in the spiked symmetric Gaussian random p-tensor model, in which a rank-one spike is added to a symmetric Gaussian random p-tensor. Earlier results were obtained by Montanari and Richard [35] and Montanari, Reichman, and Zeitouni [34] in the setting of the spherical prior, where they showed that there exist two thresholds, one strictly smaller than the other, such that if the signal-to-noise ratio exceeds the larger threshold, it is possible to distinguish the spiked and unspiked tensors and weakly recover the signal, but these are impossible if the signal-to-noise ratio is less than the smaller one. More recently, Lesieur et al. [30] studied this detection problem for any prior by means of the minimal mean-square-error (MMSE). They discovered that there exists a critical threshold β_p (depending on the prior) such that when the signal-to-noise ratio β exceeds β_p, one can distinguish the two tensors by the MMSE and weakly recover the signal by the MMSE estimator. In contrast, when β < β_p, the MMSE fails to distinguish the two tensors and weak recovery of the signal is not possible. On the other hand, Perry, Wein, and Bandeira [44] studied the detection problem with spherical, Rademacher, and sparse Rademacher priors and improved the bounds in [34, 35]. Moreover, their results showed that in each of these three cases, there exists a β_p' < β_p such that for any β < β_p', it is impossible to distinguish the two tensors in the sense that the total variation distance between the spiked and unspiked tensors asymptotically vanishes. As a consequence, every statistical hypothesis test fails to distinguish the two tensors. The paper [44] left open the conjecture that indistinguishability between the two tensors should be valid up to the critical threshold β_p.

The aim of this paper is to study the symmetric Gaussian random p-tensor (p ≥ 3) with Rademacher prior as in the setting of [44]. We show that the threshold β_p, suggested by the MMSE method in [30], is indeed the critical value that strictly separates the distinguishability and indistinguishability between the spiked and unspiked tensors under the total variation distance. More precisely, it is established that when the signal-to-noise ratio is less than the critical value β_p, the total variation distance between the spiked and unspiked tensors converges to zero. This establishes the aforementioned prediction in [44]. In particular, we identify the critical value β_p as the critical temperature, distinguishing the high and low temperature behavior, of the Ising pure p-spin mean-field spin glass model.

Our approach is based on methodologies originating from the study of mean-field spin glasses, especially those for the Sherrington-Kirkpatrick model and the mixed p-spin models, see [39, 49, 50]. Roughly speaking, spin glass models are disordered spin systems initially invented by theoretical physicists in order to explain the strange magnetic behavior of certain alloys, such as CuMn. Mathematically, they are usually formulated as stochastic processes with high complexity and present several crucial features, e.g., quenched disorder and frustration, that are shared by many real-world problems involving combinatorial random optimization. Over the past decades, the study of spin glasses has received a lot of attention in both the physics and mathematics communities, see [32] for a physics overview and [39, 49, 50] for mathematical developments.

One way to investigate the detection problem in the symmetric Gaussian random tensor is through the total variation distance between the spiked and unspiked tensors. While β represents the signal-to-noise ratio in the detection problem, we regard β as an (inverse) temperature parameter in the pure p-spin model. Notably, under this setting, the ratio of the densities of the two tensors can be computed as the partition function of the pure p-spin model with temperature β. In [34, 44], the authors controlled the total variation distance by the second moment of the partition function. In contrast to their approach, we relate this distance to the free energy of the pure p-spin model with Ising spin, see Lemma 2. This relation allows us to show that the critical threshold β_p can be determined by the critical temperature of the pure p-spin model. In bounding the total variation distance, the most critical ingredient is a sharp upper bound on the fluctuation of the free energy up to the critical temperature for all p ≥ 3. In the case p = 2, the pure p-spin model is famously known as the Sherrington-Kirkpatrick model and its free energy was shown to satisfy a Gaussian central limit theorem in the weak limit up to the critical temperature by Aizenman, Lebowitz, and Ruelle [1]. As for even p, Bovier, Kurkova, and Löwe [16] showed that the same result also holds (with a different scaling than that in the Sherrington-Kirkpatrick model), but not up to the critical temperature. Our main contribution is that we obtain a sharp upper bound for the fluctuation of the free energy, which is comparable to the one in [16] and, more importantly, is valid up to the critical temperature for all p ≥ 3, including odd p. This allows us to extract a sharp upper bound for the total variation distance and deduce the desired result.
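As a toy illustration of the statement that a density ratio can be written as a partition function, the following sketch checks the identity in a small Gaussian vector model rather than in the tensor setting of the paper. The dimension d, the scaling β/√d of the spike, and all function names are assumptions chosen for illustration. For a standard Gaussian vector z and a Rademacher spike u, the density of z + (β/√d) u divided by the density of z equals the average over u of exp(β⟨z,u⟩/√d − β²/2).

```python
import itertools
import math


def gaussian_density(z, mean):
    """Density of N(mean, I_d) evaluated at the point z."""
    d = len(z)
    sq = sum((zi - mi) ** 2 for zi, mi in zip(z, mean))
    return math.exp(-0.5 * sq) / (2.0 * math.pi) ** (d / 2.0)


def density_ratio_direct(z, beta):
    """Spiked density divided by unspiked density, straight from the definitions."""
    d = len(z)
    scale = beta / math.sqrt(d)  # assumed scaling of the spike
    signs = list(itertools.product((-1, 1), repeat=d))
    spiked = sum(gaussian_density(z, [scale * s for s in u]) for u in signs) / len(signs)
    return spiked / gaussian_density(z, [0.0] * d)


def partition_function(z, beta):
    """Average over Rademacher u of exp(beta * <z, u> / sqrt(d) - beta^2 / 2)."""
    d = len(z)
    signs = list(itertools.product((-1, 1), repeat=d))
    total = 0.0
    for u in signs:
        field = sum(zi * ui for zi, ui in zip(z, u)) / math.sqrt(d)
        total += math.exp(beta * field - 0.5 * beta ** 2)
    return total / len(signs)


z = [0.3, -1.2, 0.7, 0.1]
print(density_ratio_direct(z, 0.8), partition_function(z, 0.8))
```

The two printed numbers should agree up to floating-point error, since the Gaussian normalizing constants cancel in the ratio and only the exponential tilt survives.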

Besides the detection problem, we also present some new results and arguments for the pure p-spin models that are of independent interest in spin glasses. First, we show that if the temperature β is below the critical value β_p, the model exhibits the high temperature, or replica symmetric, solution in the sense that any two independently sampled spin configurations from the Gibbs measure are essentially orthogonal to each other; we establish this by providing exponential tail probability and moment controls. While these results can also be established at very high temperature by some well-known techniques in spin glasses, such as the cavity method, the second moment method, and Latala’s argument (see [49, Chapter 1]), it is a more challenging task to obtain the same behavior throughout the entire high temperature regime. We show that this is achievable in the pure p-spin model (see Theorems 3 and 4) and, indeed, our method can also be applied to a more general situation, the mixed p-spin models (see Remark 3). Next, on the technical side, our argument for the above result is based on the Guerra-Talagrand replica symmetry breaking bound for the coupled free energy of two systems. This bound has played a critical role in the study of the mixed even p-spin models, see Talagrand [50]. Its validity for models involving odd p mixtures is, however, generally unknown, as it is unclear whether the error term along Guerra’s replica symmetry breaking interpolation has a nonnegative sign. To overcome this obstacle, we adopt the synchronization property, introduced by Panchenko [41, 42], that the overlap matrix is asymptotically symmetric and positive semi-definite under the Gibbs average, whose proof relies heavily on the fact that the Ghirlanda-Guerra identities imply ultrametricity of the overlaps [38]. This allows us to show that the error term has a nonnegative sign and ultimately leads to the validity of the Guerra-Talagrand bound in the pure odd p-spin model if one restricts the functional order parameters to be of one-step replica symmetry breaking. Whether this bound is valid for more general functional order parameters remains open.

For other related works on the detection problem of spiked matrices and tensors, we refer the reader to a variety of low rank matrix estimation problems, including explicit characterizations of mutual information in [7, 28, 29, 31, 33] and the performance of approximate message passing (AMP) for the MMSE method and compressed sensing in [7, 13, 23, 24, 27]. More recently, the fluctuation of the likelihood ratio in the spiked Wigner model was studied in [25]. In the case of spiked tensors, phase transitions of the mutual information and the MMSE estimator were recently studied for any p and any given prior, see [9, 30] for the symmetric case and [11] for the non-symmetric case. The performance of AMP in the spiked tensor model was also investigated in [30]. See also [6, 8, 10, 21, 22, 36, 46]. For the study of the Gaussian random p-tensor in terms of complexity, see [14].

This paper is organized as follows. In Section 2, we state our main results on the detection problem. In Section 3, their proofs are presented; they are essentially self-contained for those who wish to learn only the role of spin glass results in the detection problem. The high temperature results on the pure p-spin model are gathered and established in full detail in Section 4.

2 Main results

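We begin by setting some standard notation. Let p be the order of the tensors. For any N ≥ 1, denote by Ω_N the space of all real-valued tensors Y = (Y_{i_1,…,i_p}) with indices 1 ≤ i_1, …, i_p ≤ N. For any two tensors Y and Y′, their outer product and inner product are defined respectively by

\[ (Y \otimes Y')_{i_1,\ldots,i_p,j_1,\ldots,j_p} = Y_{i_1,\ldots,i_p}\, Y'_{j_1,\ldots,j_p} \]

and

\[ \langle Y, Y' \rangle = \sum_{1 \le i_1,\ldots,i_p \le N} Y_{i_1,\ldots,i_p}\, Y'_{i_1,\ldots,i_p}. \]

For a vector u, we define u^{⊗p} as the p-th order tensor power of u. For any permutation π of {1, …, p} and any tensor Y, the tensor Y^π is defined by (Y^π)_{i_1,…,i_p} = Y_{i_{π(1)},…,i_{π(p)}}. We say that a tensor Y is symmetric if Y^π = Y for all π. Let Ω_N^s be the collection of all symmetric tensors. For any measurable set A in a Euclidean space, we use B(A) to stand for the Borel σ-field on A.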

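The symmetric Gaussian random tensor is defined as follows. Denote by Y a random tensor with i.i.d. Gaussian entries, and define a symmetric random tensor W by symmetrizing Y over the permutations of its indices. For example, if p = 2, W is the Gaussian orthogonal ensemble, i.e., its entries on and above the diagonal are independent centered Gaussian random variables. Assume that u is sampled uniformly at random from {−1, 1}^N (the Rademacher prior) and is independent of W. The spiked random tensor W_β, for β ≥ 0, is obtained by adding to W a rank-one spike proportional to u^{⊗p} whose strength is governed by β. Let d_TV(W, W_β) be the total variation distance between W and W_β. We now make the distinguishability and indistinguishability between W and W_β precise.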
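The construction of the symmetric noise tensor and of the spiked tensor can be sketched in code for small N. The sketch assumes i.i.d. standard Gaussian entries, symmetrization by averaging over all p! index permutations, and a spike scaled by β N^{-(p-1)/2}; these normalizations and all function names are illustrative assumptions and may differ from the ones used in the paper.

```python
import itertools
import math
import random


def random_tensor(n, p, rng):
    """Tensor with i.i.d. standard Gaussian entries, stored as a dict keyed by index tuples."""
    return {idx: rng.gauss(0.0, 1.0) for idx in itertools.product(range(n), repeat=p)}


def symmetrize(y, p):
    """Average of y over all permutations of its p indices (assumed symmetrization)."""
    perms = list(itertools.permutations(range(p)))
    return {
        idx: sum(y[tuple(idx[k] for k in perm)] for perm in perms) / len(perms)
        for idx in y
    }


def rank_one_spike(u, p):
    """Tensor power of u: the entry at (i_1, ..., i_p) is u[i_1] * ... * u[i_p]."""
    n = len(u)
    return {idx: math.prod(u[i] for i in idx) for idx in itertools.product(range(n), repeat=p)}


def spiked_tensor(w, u, p, beta):
    """W plus beta * N^{-(p-1)/2} times the tensor power of u (assumed normalization)."""
    n = len(u)
    scale = beta / n ** ((p - 1) / 2.0)
    spike = rank_one_spike(u, p)
    return {idx: w[idx] + scale * spike[idx] for idx in w}


def is_symmetric(t, p, tol=1e-12):
    """Check invariance of t under every permutation of its indices."""
    perms = list(itertools.permutations(range(p)))
    return all(abs(t[idx] - t[tuple(idx[k] for k in perm)]) <= tol for idx in t for perm in perms)


rng = random.Random(0)
n, p = 4, 3
w = symmetrize(random_tensor(n, p, rng), p)
u = [rng.choice((-1, 1)) for _ in range(n)]  # Rademacher prior
w_beta = spiked_tensor(w, u, p, beta=1.5)
print(is_symmetric(w, p), is_symmetric(w_beta, p))
```

Storing the tensor as a dictionary keeps the index permutations explicit; for larger N an array library would be the natural choice.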

Definition 1.

We say that

  • W and W_β are indistinguishable if d_TV(W, W_β) → 0 as N → ∞.

  • W and W_β are distinguishable if d_TV(W, W_β) → 1 as N → ∞.

Let u be a realization of the prior. For a given tensor, consider the detection problem in which the tensor is distributed as W under the null hypothesis and as W_β under the alternative hypothesis. The first item essentially says that no statistical test can reliably distinguish these two hypotheses. The second item means that there exists a sequence of events that distinguishes these two tensors. Next we define the notion of weak recovery for u.
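The link between total variation distance and hypothesis testing used in Definition 1 can be illustrated on a finite sample space: for any test, the sum of the type I and type II errors is at least 1 minus the total variation distance, with equality for the likelihood-ratio test. The distributions and function names below are hypothetical and serve only as a minimal sketch of this standard fact.

```python
def total_variation(p, q):
    """Total variation distance between two probability vectors on the same finite set."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def error_sum(p, q, reject):
    """Type I plus type II error of the test rejecting the null p on the index set `reject`."""
    type_one = sum(p[i] for i in reject)
    type_two = sum(q[i] for i in range(len(q)) if i not in reject)
    return type_one + type_two


def best_test(p, q):
    """Likelihood-ratio test: reject the null exactly where q puts more mass than p."""
    return {i for i in range(len(p)) if q[i] > p[i]}


null = [0.5, 0.3, 0.2]  # hypothetical null distribution
alt = [0.2, 0.3, 0.5]   # hypothetical alternative distribution
reject = best_test(null, alt)
print(total_variation(null, alt), error_sum(null, alt, reject))
```

When the total variation distance tends to 0, the best achievable error sum tends to 1, which is what a test ignoring the data achieves; when it tends to 1, the error sum of the best test tends to 0.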

Definition 2.

For we say that weak recovery of is possible if there exists a sequence of random probability measures on and a constant such that

(1)

and that weak recovery of is not possible if for any random probability measure on and constant ,

(2)

Here is a random probability measure on means that is a mapping from to such that is -measurable for each and is a probability measure on for each

A few comments are in order. Consider a given realization of the signal u and the tensor W_β. Equation (1) ensures that some vector produced through the random measure has a nontrivial overlap with u. To understand (2), let any measurable function be given. If we consider the random probability measure defined by

then from (2), for any

In other words, any vector generated by such a function is uncorrelated with the signal and thus does not provide indicative information about u. We emphasize that Definitions 1 and 2 are not directly related to each other. Nevertheless, we will show in our main result below that both of them hold up to a critical threshold.

Now we introduce the pure -spin model. For each set . The Hamiltonian of the pure -spin model is defined by

for . Its covariance can be computed as

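where R(σ^1, σ^2) is the overlap between two spin configurations σ^1 and σ^2,

\[ R(\sigma^1, \sigma^2) = \frac{1}{N} \sum_{i=1}^{N} \sigma_i^1 \sigma_i^2. \tag{3} \]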
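The way the overlap enters the covariance of the Hamiltonian can be checked by direct enumeration. The sketch below assumes the normalization X_N(σ) = N^{-(p-1)/2} Σ Y_{i_1,…,i_p} σ_{i_1}⋯σ_{i_p} with i.i.d. standard Gaussian Y, which is the usual convention for the pure p-spin model and under which the covariance equals N R(σ^1, σ^2)^p. This normalization and the function names are assumptions made for illustration.

```python
import itertools
import math


def overlap(s1, s2):
    """Normalized inner product of two spin configurations."""
    return sum(a * b for a, b in zip(s1, s2)) / len(s1)


def covariance_by_enumeration(s1, s2, p):
    """E[X_N(s1) X_N(s2)] under the assumed normalization.

    Only the diagonal terms E[Y_idx^2] = 1 survive, so the expectation reduces
    to a finite sum over index tuples.
    """
    n = len(s1)
    total = 0
    for idx in itertools.product(range(n), repeat=p):
        total += math.prod(s1[i] * s2[i] for i in idx)
    return total / n ** (p - 1)


s1 = [1, -1, 1, 1, -1, 1]
s2 = [1, 1, 1, 1, -1, -1]
p = 3
print(covariance_by_enumeration(s1, s2, p), len(s1) * overlap(s1, s2) ** p)
```

The sum over index tuples factorizes into the p-th power of the sum of σ^1_i σ^2_i, which is why the covariance depends on the two configurations only through their overlap.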

In the terminology of the detection problem, β is understood as the signal-to-noise ratio. In the pure p-spin model, we regard β as an (inverse) temperature parameter. For a given temperature β, define the free energy by

(4)

If p = 2, this model is known as the Sherrington-Kirkpatrick (SK) model and it has been intensively studied over the past decades. We refer the reader to the books [39, 49, 50] for recent mathematical advances on the SK model as well as on more general models involving a mixture of pure p-spin interactions. In particular, it is already known that the free energy F_N(β) converges in the thermodynamic limit (N → ∞) to a nonrandom quantity that can be expressed by the famous Parisi formula, see, e.g., [40, 48]. Denote this limit by F(β). A direct application of Jensen’s inequality to (4) implies that for all β,

Define the high temperature regime for the pure p-spin model as

Set the critical temperature β_p by

Our main result shows that β_p is the critical threshold in the detection problem.
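For very small N the free energy can be computed by brute-force enumeration, which gives a feel for the Jensen bound mentioned above. The sketch below assumes the normalization F_N(β) = (1/N) log(2^{-N} Σ_σ exp(β X_N(σ))) with X_N(σ) = N^{-(p-1)/2} Σ Y_{i_1,…,i_p} σ_{i_1}⋯σ_{i_p} and i.i.d. standard Gaussian Y. Under that assumption Jensen's inequality gives E F_N(β) ≤ β²/2. The paper's definition (4) may include a different centering, so the formulas and names here are illustrative only.

```python
import itertools
import math
import random


def hamiltonian(y, sigma, p):
    """X_N(sigma) under the assumed normalization N^{-(p-1)/2}."""
    n = len(sigma)
    total = sum(val * math.prod(sigma[i] for i in idx) for idx, val in y.items())
    return total / n ** ((p - 1) / 2.0)


def free_energy(y, n, p, beta):
    """(1/N) log of the uniform average of exp(beta * X_N(sigma)), by enumeration."""
    energies = [beta * hamiltonian(y, sigma, p) for sigma in itertools.product((-1, 1), repeat=n)]
    top = max(energies)  # log-sum-exp trick for numerical stability
    log_z = top + math.log(sum(math.exp(e - top) for e in energies)) - n * math.log(2.0)
    return log_z / n


rng = random.Random(1)
n, p, beta = 6, 3, 0.7
samples = []
for _ in range(20):
    # Fresh disorder for each sample
    y = {idx: rng.gauss(0.0, 1.0) for idx in itertools.product(range(n), repeat=p)}
    samples.append(free_energy(y, n, p, beta))

mean = sum(samples) / len(samples)
print(mean, beta ** 2 / 2.0)  # sample mean of F_N versus the annealed bound
```

The enumeration costs 2^N N^p operations per sample, so this is only a sanity check for tiny systems and says nothing about the location of the critical temperature.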

Theorem 1.

Let p ≥ 3. The following statements hold:

  • If β < β_p, then W and W_β are indistinguishable and weak recovery of u is impossible.

  • If β > β_p, then W and W_β are distinguishable and weak recovery of u is possible.

Our main contribution in Theorem 1 is the part on the indistinguishability of W and W_β in the first statement. Previous results along this line were established in [44], where the authors showed that there exists some β_p' < β_p so that W and W_β are indistinguishable for any β < β_p'. Theorem 1 here proves that this behavior is indeed valid up to the critical value β_p. As one will see from Theorem 2 below, we give a characterization of the high temperature regime and provide one way to compute β_p numerically in terms of an auxiliary function deduced from the optimality of the Parisi formula for the free energy at high temperature. Numerically, it is obtained that

This agrees with the prediction in [44, Figure 1]. We comment that for the indistinguishability part of Theorem 1, a polynomial rate of convergence for the total variation distance can also be obtained, see Remark 2 below. In comparison, we add that Theorem 1 is quite different from the BBP transition for p = 2, see [5, 34, 37, 45]. In this case, it is known that above the critical threshold one can distinguish W and W_β in the sense of Definition 1 by using the top eigenvalue. Below the threshold, the model exhibits only a weaker sense of distinguishability, see Remark 1.

As mentioned before, the work [30] investigated the present detection problem for any given prior. Their results state that one can distinguish W and W_β by the MMSE method and weakly recover the signal through the MMSE estimator when β > β_p; if β < β_p, they concluded that weak recovery of the signal is not possible. In other words, their results imply the weak recovery part of the first item as well as the statement of the second item. Nevertheless, we emphasize that their approach and the way the critical value β_p was discovered are fundamentally different from the argument we present here. As one will see, while the second statement of Theorem 1 follows directly from a relation (see Lemma 2) between the total variation distance and the free energies, the delicate part is the first statement of Theorem 1, which is the major component of this paper.

Acknowledgements. The author thanks G. Ben Arous for introducing this project to him and A. Auffinger and A. Jagannath for some fruitful discussions. He is indebted to D. Panchenko for asking good questions that led to a major improvement of the result on weak recovery of u and for several suggestions regarding the presentation of the paper. He also thanks J. Barbier, A. Montanari, and L. Zdeborová for suggesting several references and for valuable comments. Research partly supported by NSF DMS-1642207 and Hong Kong Research Grants Council GRF-14302515.

3 Proof of Theorem 1

3.1 Total variation distance

In this subsection, we prepare some lemmas for Theorem 1. Recall that Y is a random tensor with i.i.d. Gaussian entries and W is the symmetric random tensor generated by Y. Throughout the remainder of the paper, we use 1_A to stand for the indicator function of a set A. We first establish an elementary expression for the total variation distance.

Lemma 1.

Let U and V be two k-dimensional random vectors with densities f_U and f_V satisfying f_U > 0 and f_V > 0 a.e. on R^k. Then

Proof.

Note that

Using Fubini’s theorem and this equation, the first equality follows by

To obtain the second equality, one simply exchanges the roles of U and V.
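One standard route to an identity of this type can be sketched as follows. The notation is assumed here: U and V have strictly positive densities f_U and f_V, L = f_V / f_U is the likelihood ratio, and (a)_+ = max(a, 0). The sketch follows the two steps named in the proof (an integral representation, then Fubini's theorem), but the exact arrangement of the original displays may differ.

```latex
\begin{align*}
d_{TV}(U,V) &= \int \big(f_U(x) - f_V(x)\big)_+ \, dx
             = \mathbb{E}\,\big(1 - L(U)\big)_+ , \qquad L = f_V / f_U , \\
(1 - a)_+ &= \int_0^1 \mathbf{1}\{a < x\} \, dx \quad \text{for all } a \ge 0 , \\
d_{TV}(U,V) &= \mathbb{E} \int_0^1 \mathbf{1}\{L(U) < x\} \, dx
             = \int_0^1 \mathbb{P}\big(L(U) < x\big) \, dx .
\end{align*}
```

Exchanging the roles of U and V in the same computation yields the analogous expression with the ratio f_U / f_V evaluated at V.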

Recall the free energy F_N(β) and the Rademacher prior u from Section 2. Define an auxiliary free energy of the pure p-spin model with a Curie-Weiss type interaction as

(5)

In the Appendix, it will be established that this auxiliary free energy converges a.s. to a nonrandom quantity for any β. The following lemma relates the total variation distance between W and W_β to the free energy and the auxiliary free energy.

Lemma 2.

For any we have that

(6)
(7)
Proof.

Note that has density on for some normalizing constant For any , a change of variables gives

where the inner expectation is taken with respect to u only and the integration is with respect to the Lebesgue measure on the space of symmetric tensors. This implies that the density of W_β is given by

Now since

we obtain

(8)

Here, since for any and , we see that

and

where is the expectation of , an independent copy of and independent of , and the second equality of both displays used the assumption that and Our proof is then completed by applying Lemma 1.

Remark 1.

Aizenman, Lebowitz, and Ruelle [1] showed that, for p = 2 and at high temperature, the suitably normalized free energy satisfies a Gaussian central limit theorem. From (6), one immediately sees that

3.2 Proof of Theorem 1: indistinguishability and impossibility of weak recovery

The central ingredient of our proof is the high temperature behavior of the pure p-spin model stated in Section 4 below, namely, a tight upper bound for the fluctuation of the free energy in Proposition 1 and a good moment control for the concentration of the overlap around zero under the Gibbs measure in Theorem 4. The former will be used directly to show that the total variation distance vanishes via the exact expression (6), while the latter is vital for establishing the impossibility of weak recovery of u.

Proof of Theorem 1: Indistinguishability.

Let . Assume that . For any , writing in (6) gives

To complete the proof, we use a key property about the fluctuation of the free energy stated in Proposition 1 below, which says that there exists a constant such that

for all From this,

(9)

Since sending and then implies that and are indistinguishable.

Remark 2.

Take for and use in (9). We obtain the rate of convergence,

Next we continue to show that weak recovery of is impossible. For , define a random probability measure on by

(10)

for any and , where is the expectation with respect to , an independent copy of . The following lemma relates the expectation of to by a change of measure in terms of

Lemma 3.

Let be a measurable function from to If and are indistinguishable, then

Proof.

Recall the densities and of and from Lemma 2. Let be the probability mass function for Since is independent of , the joint density of is given by

This implies that where

Note that since . For any define

where for and Observe that for From this and the triangle inequality,

Since converges to zero, each term in the above sum must vanish in the limit and thus, letting and then yields

(11)

Now, write

Note that for some normalizing constant. Since from (8) and ,

it follows that

From this and (11), the announced result follows.

Proof of Theorem 1: Impossibility of weak recovery.

Let and . Let be a random probability measure on (see Definition 2) and . Our goal is to show that

(12)

Set

for Note that and is measurable. From Lemma 3,

(13)

We claim that the second expectation converges to zero. For notational convenience, we simply denote and . Note that

The second term in the above equation can be controlled by

where the last inequality used the Cauchy-Schwarz inequality. Using the Cauchy-Schwarz inequality again, the last expression is bounded above by

Here the second bracket is bounded above by . As for the first one, we observe that is in distribution equal to the Gibbs measure defined in (17) and if we write and , then in distribution, are independent samplings from and is the overlap between and . As a result,

where is the Gibbs average with respect to the product measure . Now, since we can apply Theorem 4 to control the right-hand side by the bound

for some constant independent of Hence, from the above inequalities,

From (13),

which gives the desired limit (12) by using Markov’s and Jensen’s inequalities.

3.3 Proof of Theorem 1: distinguishability and weak recovery