The problem of detecting a deformation in a symmetric Gaussian random tensor asks whether there exists a statistical hypothesis test that can reliably distinguish the deformation from the noise. In the matrix case, if is a Gaussian Wigner ensemble and is a unit vector, the goal is to distinguish the unspiked matrix and the spiked matrix for a given signal-to-noise ratio.
It is well known that in this case the top eigenvalue of the spiked matrix exhibits the so-called BBP (Baik, Ben Arous, and Péché) transition [5, 15, 17, 43]. Namely, the top eigenvalue successfully detects the signal if the signal strength exceeds a critical threshold, while it fails to provide indicative information if the strength is below this threshold. It was further proved in [34, 37, 45] that below the threshold no statistical hypothesis test can distinguish the spiked and unspiked matrices.
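The transition described above can be observed numerically. The following is a minimal sketch rather than anything taken from the cited works. It assumes one common normalization (off-diagonal entries of variance 1/n, so that the bulk spectrum fills [-2, 2]), under which the threshold equals 1 and the top eigenvalue is close to beta + 1/beta above it. The names `top_eigenvalue`, `beta`, and `n` are illustrative.

```python
import numpy as np

def top_eigenvalue(n, beta, rng):
    """Largest eigenvalue of W + beta * u u^T under an ASSUMED Wigner normalization."""
    g = rng.standard_normal((n, n))
    w = (g + g.T) / np.sqrt(2.0 * n)  # symmetric, off-diagonal variance 1/n
    u = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)  # unit vector with +-1/sqrt(n) entries
    spiked = w + beta * np.outer(u, u)
    return np.linalg.eigvalsh(spiked)[-1]  # eigvalsh returns eigenvalues in ascending order

rng = np.random.default_rng(0)
n = 400
below = top_eigenvalue(n, 0.5, rng)  # weak spike: stays near the bulk edge 2
above = top_eigenvalue(n, 2.0, rng)  # strong spike: separates, near 2 + 1/2
print(f"beta=0.5: {below:.3f}   beta=2.0: {above:.3f}")
```

Below the threshold the top eigenvalue is indistinguishable from the edge of the bulk, which is why a test based on it carries no information there.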
In recent years, these phenomena have also been studied in the spiked symmetric Gaussian random p-tensor model. Earlier results were obtained by Montanari and Richard and by Montanari, Reichman, and Zeitouni in the setting of a spherical prior, where they showed that there exist two thresholds such that if the signal-to-noise ratio exceeds the larger one, it is possible to distinguish the spiked and unspiked tensors and weakly recover the signal, whereas both tasks are impossible if the signal-to-noise ratio is below the smaller one. More recently, Lesieur et al. studied this detection problem for any prior by means of the minimal mean-square error (MMSE). They discovered that there exists a critical threshold (depending on the prior) such that when the signal-to-noise ratio exceeds this critical value, one can distinguish the two tensors by the MMSE and weakly recover the signal by the MMSE estimator. In contrast, when the signal-to-noise ratio is below the critical value, the MMSE fails to distinguish the two tensors and weak recovery of the signal is not possible. On the other hand, Perry, Wein, and Bandeira studied the detection problem with spherical, Rademacher, and sparse Rademacher priors and improved the bounds in [34, 35]. Moreover, their results showed that in each of these three cases, there exists a threshold below which it is impossible to distinguish the two tensors, in the sense that the total variation distance between the spiked and unspiked tensors asymptotically vanishes. As a consequence, every statistical hypothesis test fails to distinguish the two tensors. That paper then left open the conjecture that indistinguishability between the two tensors should be valid up to the critical threshold.
The aim of this paper is to study the symmetric Gaussian random p-tensor (p ≥ 3) with Rademacher prior as in the setting of . We show that the threshold, suggested by the MMSE method in , is indeed the critical value that strictly separates distinguishability from indistinguishability of the spiked and unspiked tensors under the total variation distance. More precisely, it is established that when the signal-to-noise ratio is less than the critical value, the total variation distance between the spiked and unspiked tensors converges to zero. This establishes the aforementioned prediction in . In particular, we identify the critical value as the critical temperature, distinguishing the high and low temperature behavior, of the Ising pure p-spin mean-field spin glass model.
Our approach is based on methods originating in the study of mean-field spin glasses, especially those for the Sherrington-Kirkpatrick model and the mixed p-spin models, see [39, 49, 50]. Roughly speaking, spin glass models are disordered spin systems initially invented by theoretical physicists in order to explain the strange magnetic behavior of certain alloys, such as CuMn. Mathematically, they are usually formulated as stochastic processes with high complexity and present several crucial features, e.g., quenched disorder and frustration, that are shared by many real-world problems involving random combinatorial optimization. Over the past decades, the study of spin glasses has received a lot of attention in both the physics and mathematics communities, see  for a physics overview and [39, 49, 50] for mathematical developments.
One way to investigate the detection problem in the symmetric Gaussian random tensor is through the total variation distance between the spiked and unspiked tensors. While the parameter represents the signal-to-noise ratio in the detection problem, we regard it as an (inverse) temperature parameter in the pure p-spin model. Notably, under this setting, the ratio of the densities of the two tensors can be computed as the partition function of the pure p-spin model at this temperature. In [34, 44], the authors controlled the total variation distance by the second moment of the partition function. In contrast to their approach, we relate this distance to the free energy of the pure p-spin model with Ising spins, see Lemma 2. This relation allows us to show that the critical threshold can be determined by the critical temperature of the pure p-spin model. In bounding the total variation distance, the key ingredient is a sharp upper bound on the fluctuation of the free energy up to the critical temperature for all p ≥ 3. In the case p = 2, the pure p-spin model is famously known as the Sherrington-Kirkpatrick model and its free energy was shown to satisfy a Gaussian central limit theorem in the weak limit up to the critical temperature by Aizenman, Lebowitz, and Ruelle . As for even p, Bovier, Kurkova, and Löwe  showed that the same result also holds (with a scaling different from that in the Sherrington-Kirkpatrick model), but not up to the critical temperature. Our main contribution is that we obtain a sharp upper bound for the fluctuation of the free energy, which is comparable to the one in  and, more importantly, is valid up to the critical temperature for all p ≥ 3, including odd p. This allows us to extract a sharp upper bound for the total variation distance and deduce the desired result.
Besides the detection problem, we also present some new results and arguments for the pure p-spin models that are of independent interest in spin glasses. First, we show that if the temperature parameter is below the critical value, the model exhibits the high temperature, or replica symmetric, solution in the sense that any two independently sampled spin configurations from the Gibbs measure are essentially orthogonal to each other; we establish this by providing exponential tail probability and moment controls. While these results can also be established at very high temperature by some well-known techniques in spin glasses, such as the cavity method, the second moment method, and Latala’s argument (see [49, Chapter 1]), it is considerably more challenging to obtain the same behavior throughout the entire high temperature regime. We show that this is achievable in the pure p-spin model (see Theorems 3 and 4) and indeed, our method can also be applied to more general situations, namely the mixed p-spin models (see Remark 3). Next, on the technical side, our argument for the above result is based on the Guerra-Talagrand replica symmetry breaking bound for the coupled free energy of two systems. This bound has played a critical role in the study of the mixed even p-spin models, see Talagrand . Its validity for models involving odd p mixtures is however generally unknown, as it is unclear whether the error term along Guerra’s replica symmetry breaking interpolation is nonnegative. To tackle this obstacle, we adopt the synchronization property, introduced by Panchenko [41, 42], that the overlap matrix is asymptotically symmetric and positive semi-definite under the Gibbs average, which was established relying heavily on the fact that the Ghirlanda-Guerra identities imply ultrametricity of the overlaps . This allows us to show that the error term is nonnegative and ultimately leads to the validity of the Guerra-Talagrand bound in the pure odd p-spin model if one restricts the functional order parameters to be of one-step replica symmetry breaking. Whether this bound is valid for more general functional order parameters remains open.
For other related works on the detection problem of spiked matrices and tensors, we refer the reader to a variety of low rank matrix estimation problems, including explicit characterizations of mutual information in [7, 28, 29, 31, 33] and the performance of approximate message passing (AMP) for the MMSE method and compressed sensing in [7, 13, 23, 24, 27]. More recently, the fluctuation of the likelihood ratio in the spiked Wigner model was studied in . In the case of spiked tensors, phase transitions of the mutual information and the MMSE estimator were recently studied for any p and given prior, see [9, 30] for the symmetric case and  for the non-symmetric case. The performance of AMP in the spiked tensor was also investigated in . See also [6, 8, 10, 21, 22, 36, 46]. For the study of the Gaussian random p-tensor in terms of complexity, see .
This paper is organized as follows. In Section 2, we state our main results on the detection problem. In Section 3, their proofs are presented and are essentially self-contained for those who wish to learn only the role of spin glass results in the detection problem. The high temperature results on the pure p-spin model are gathered and established in full detail in Section 4.
2 Main results
We begin by setting some standard notation. Let For any denote by the space of all real-valued tensors . For any two tensors and , their outer product and inner product are defined respectively by
For we define as the -th order power of For any permutation and any tensor , is defined as . We say that a tensor is symmetric if for all Let be the collection of all symmetric tensors. For any measurable for some , we use to stand for the Borel σ-field on
The symmetric Gaussian random tensor is defined as follows. Denote by a random tensor with i.i.d. entries Define a symmetric random tensor by
For example, if , is the Gaussian orthogonal ensemble, i.e., are independent of each other with for and Let . Assume that is sampled uniformly at random from and is independent of Set the spiked random tensor as
be the total variation distance between and . We now make the distinguishability and indistinguishability between and precise.
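The construction of the symmetric and spiked tensors above can be sketched in code. This is a minimal illustration under stated assumptions, not the paper's exact definition: the entries of the underlying tensor are taken to be standard Gaussian, the symmetrization is the average over all axis permutations, and the spike is scaled by beta / n^((p-1)/2). The scaling and the names `symmetrize`, `rank_one`, and `spiked_tensor` are hypothetical choices made for the example.

```python
import itertools
import math
import numpy as np

def symmetrize(y):
    """Average a p-tensor over all permutations of its axes."""
    p = y.ndim
    perms = itertools.permutations(range(p))
    return sum(np.transpose(y, perm) for perm in perms) / math.factorial(p)

def rank_one(u, p):
    """p-th order outer power of the vector u."""
    t = u
    for _ in range(p - 1):
        t = np.multiply.outer(t, u)
    return t

def spiked_tensor(n, p, beta, rng):
    """Return (unspiked, spiked, signal) for a Rademacher signal."""
    y = rng.standard_normal((n,) * p)   # ASSUMED: i.i.d. standard Gaussian entries
    w = symmetrize(y)
    u = rng.choice([-1.0, 1.0], size=n)  # Rademacher prior: uniform on {-1, 1}^n
    scale = beta / n ** ((p - 1) / 2)    # ASSUMED normalization of the spike
    return w, w + scale * rank_one(u, p), u

w, t, u = spiked_tensor(n=5, p=3, beta=1.0, rng=np.random.default_rng(0))
```

Both `w` and `t` are invariant under any permutation of their three axes, since the rank-one spike is itself symmetric.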
We say that
and are indistinguishable if .
and are distinguishable if
Let be a realization of the prior. For a given tensor , consider the detection problem in which the tensor is unspiked under the null hypothesis and spiked under the alternative hypothesis. The first item essentially says that no statistical test can reliably distinguish these two hypotheses. The second item means that there exists a sequence of events that distinguishes these two tensors. Next we define the notion of weak recovery for .
For we say that weak recovery of is possible if there exists a sequence of random probability measures on and a constant such that
and that weak recovery of is not possible if for any random probability measure on and constant ,
Here is a random probability measure on means that is a mapping from to such that is -measurable for each and is a probability measure on for each
A few comments are in order. Consider a given realization of signal and tensor . Equation (1) ensures that there exists some produced through the measure such that and have a nontrivial overlap. To understand (2), let be any measurable function. If we consider the random probability measure defined by
then from (2), for any
In other words, any vector generated by is uncorrelated with the signal and thus, it does not provide indicative information about We emphasize that Definitions 1 and 2 are not directly related to each other. Nevertheless, we will show that both of them hold up to a critical threshold in our main result below.
Now we introduce the pure p-spin model. For each set . The Hamiltonian of the pure p-spin model is defined by
for . Its covariance can be computed as
where is the overlap between ,
In the terminology of the detection problem, is understood as the signal-to-noise ratio. In the pure p-spin model, we regard as an (inverse) temperature parameter. For a given temperature define the free energy by
If p = 2, this model is known as the Sherrington-Kirkpatrick (SK) model and it has been intensively studied over the past decades. We refer the reader to the books [39, 49, 50] for recent mathematical advances on the SK model as well as on more general models involving a mixture of pure p-spin interactions. In particular, it is already known that the free energy converges in the thermodynamic limit to a nonrandom quantity that can be expressed by the famous Parisi formula, see, e.g., [40, 48]. Denote this limit by . A direct application of Jensen’s inequality to (4) implies that for all
Define the high temperature regime for the pure -spin model as
Set the critical temperature by
Our main result shows that is the critical threshold in the detection problem.
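For very small systems, the objects just introduced can be computed by brute force. The sketch below is an illustration under assumptions, not the paper's definition: it takes p = 3, a Hamiltonian normalized so that its covariance is n times the cube of the overlap, and a free energy of the form (1/n) log of the uniform average of exp(beta * X) over spin configurations. With this normalization, Jensen's inequality bounds the expected free energy by beta^2 / 2. The function names are hypothetical.

```python
import itertools
import numpy as np

def all_spins(n):
    """All 2^n Ising configurations as rows of an array."""
    return np.array(list(itertools.product([-1.0, 1.0], repeat=n)))

def overlap(s1, s2):
    """Normalized inner product of two spin configurations."""
    return float(np.dot(s1, s2)) / len(s1)

def hamiltonian(y, spins):
    """ASSUMED p = 3 normalization: X(s) = n^{-1} * sum_{ijk} y_ijk s_i s_j s_k."""
    n = spins.shape[1]
    return np.einsum("ijk,ai,aj,ak->a", y, spins, spins, spins) / n

def free_energy(y, spins, beta):
    """(1/n) log of the uniform average of exp(beta * X), computed stably."""
    x = beta * hamiltonian(y, spins)
    m = x.max()
    return (m + np.log(np.mean(np.exp(x - m)))) / spins.shape[1]

rng = np.random.default_rng(1)
n, beta = 8, 0.5
spins = all_spins(n)
samples = [free_energy(rng.standard_normal((n, n, n)), spins, beta) for _ in range(20)]
mean_free_energy = float(np.mean(samples))
print(f"mean free energy {mean_free_energy:.4f}, Jensen bound {beta**2 / 2:.4f}")
```

In the high temperature regime the Jensen bound is attained in the limit, so for small beta the sample mean should sit just below beta^2 / 2, up to finite-size effects and sampling noise.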
Let The following statements hold:
If then and are indistinguishable and weak recovery of is impossible.
If , then and are distinguishable and weak recovery of is possible.
Our main contribution in Theorem 1 is the indistinguishability part of the first statement. Previous results along this line were established in , where the authors showed that there exists some threshold below which the spiked and unspiked tensors are indistinguishable. Theorem 1 here proves that this behavior is indeed valid up to the critical value. As one will see from Theorem 2 below, we give a characterization of the high temperature regime and provide one way to simulate the critical value in terms of an auxiliary function deduced from the optimality of the Parisi formula for the free energy at high temperature. Numerically, it is obtained that
This agrees with the prediction in [44, Figure 1]. We remark that for Theorem 1, a polynomial rate of convergence for the total variation distance can also be obtained, see Remark 2 below. In comparison, we add that Theorem 1 is quite different from the BBP transition for p = 2, see [5, 34, 37, 45]. In this case, and it is known that for , one can distinguish and in the sense of Definition 1 by using the top eigenvalue. For it presents a weaker sense of distinguishability, , see Remark 1.
As mentioned before, the work of Lesieur et al. investigated the present detection problem for any given prior. Their results state that one can distinguish the spiked and unspiked tensors by the MMSE method and weakly recover the signal through the MMSE estimator when the signal-to-noise ratio exceeds the critical value; below the critical value, they concluded that weak recovery of the signal is not possible. In other words, their results imply the weak recovery part of the first item as well as the statement of the second item. Nevertheless, we emphasize that their approach and the way the critical value was discovered are fundamentally different from the argument we present here. As one will see, while the distinguishability part of Theorem 1 follows directly from a relation (see Lemma 2) between the total variation distance and the free energies, the delicate part is the indistinguishability statement, which is the major component of this paper.
Acknowledgements. The author thanks G. Ben Arous for introducing this project to him and A. Auffinger and A. Jagannath for some fruitful discussions. He is indebted to D. Panchenko for asking good questions that led to a major improvement on the result of weak recovery of and for several suggestions regarding the presentation of the paper. He also thanks J. Barbier, A. Montanari, and L. Zdeborová for suggesting several references and valuable comments. Research partly supported by NSF DMS-1642207 and Hong Kong Research Grants Council GRF-14302515.
3 Proof of Theorem 1
3.1 Total variation distance
In this subsection, we prepare some lemmas for Theorem 1. Recall that is a random tensor with i.i.d. and is the symmetric random tensor generated by . Throughout the remainder of the paper, we use to stand for an indicator function on a set . We first establish an elementary expression of the total variation distance.
Let be two -dimensional random vectors with densities and satisfying and a.e. on Then
Using Fubini’s theorem and this equation, the first equality follows by
To obtain the second equality, one simply exchanges the roles of .
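The Fubini step in this argument can be sanity-checked numerically on a finite sample space. The sketch below verifies three standard expressions for the total variation distance between strictly positive probability vectors `p` and `q`: half the L1 distance, the expectation under `p` of the positive part of 1 - q/p, and the integral over x in [0, 1] of the probability under `p` that the ratio q/p is less than x. That these coincide with the lemma's display term by term is an assumption; the function names are illustrative.

```python
import numpy as np

def tv_half_l1(p, q):
    """Total variation distance as half the L1 distance."""
    return 0.5 * np.abs(p - q).sum()

def tv_positive_part(p, q):
    """E_p[(1 - q/p)_+], valid when p > 0 everywhere."""
    return np.sum(p * np.maximum(1.0 - q / p, 0.0))

def tv_layer_cake(p, q, grid=20000):
    """Midpoint-rule approximation of int_0^1 P_p(q/p < x) dx."""
    ratio = q / p
    xs = (np.arange(grid) + 0.5) / grid
    # For each x, add up the p-mass of the points whose ratio lies below x.
    probs = ((ratio[None, :] < xs[:, None]) * p[None, :]).sum(axis=1)
    return probs.mean()

p = np.array([0.1, 0.2, 0.3, 0.4])
q = np.array([0.4, 0.3, 0.2, 0.1])
print(tv_half_l1(p, q), tv_positive_part(p, q), tv_layer_cake(p, q))
```

The third expression follows from the second by writing the positive part of 1 - r as the integral over [0, 1] of the indicator that r < x and exchanging the order of integration.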
Recall the free energy and the Rademacher prior from Section 2. Define an auxiliary free energy of the pure p-spin model with a Curie-Weiss type interaction as
where In the Appendix, it will be established that the limit of converges a.s. to a nonrandom quantity for any . Denote this limit by The following lemma relates the total variation distance between and to the free energy and the auxiliary free energy .
For any we have that
Note that has density on for some normalizing constant For any , a change of variables gives
where is the expectation with respect to only and is the Lebesgue measure on This implies that the density of is given by Now since
Here, since for any and , we see that
where is the expectation of , an independent copy of and independent of , and the second equality of both displays used the assumption that and Our proof is then completed by applying Lemma 1.
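The mechanism behind this proof, namely that a density ratio under a Gaussian shift averaged over the prior is a partition function, can be illustrated in a toy vector setting. The sketch below is a simplification and not the tensor model of the paper: it assumes an observation of the form w + beta * u with w a standard Gaussian vector and u uniform on {-1, 1}^n. In that case the ratio is the average over u of exp(beta <w, u> - beta^2 n / 2), which factorizes into a product of hyperbolic cosines. All names are hypothetical.

```python
import itertools
import numpy as np

def ratio_bruteforce(w, beta):
    """Average over all u in {-1, 1}^n of the Gaussian shift density ratio."""
    n = len(w)
    us = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
    # f(w - beta*u) / f(w) = exp(beta * <w, u> - beta^2 * |u|^2 / 2), with |u|^2 = n
    return np.mean(np.exp(beta * (us @ w) - 0.5 * beta**2 * n))

def ratio_closed_form(w, beta):
    """Same quantity, using E_u exp(beta * <w, u>) = prod_i cosh(beta * w_i)."""
    n = w.shape[-1]
    return np.exp(-0.5 * beta**2 * n) * np.prod(np.cosh(beta * w), axis=-1)

rng = np.random.default_rng(2)
n, beta = 6, 0.3
w = rng.standard_normal(n)
print(ratio_bruteforce(w, beta), ratio_closed_form(w, beta))

# Under the null (unspiked) law, a density ratio integrates to one.
null_draws = rng.standard_normal((20000, n))
mean_ratio = float(np.mean(ratio_closed_form(null_draws, beta)))
print(mean_ratio)
```

The logarithm of this ratio, divided by n, plays the role of a free energy, which is how the total variation distance becomes tied to free energy fluctuations.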
3.2 Proof of Theorem 1
The central ingredient of our proof is the high temperature behavior of the pure p-spin model stated in Section 4 below, namely, a tight upper bound for the fluctuation of the free energy in Proposition 1 and a good moment control for the concentration of the overlap around zero under the Gibbs measure in Theorem 4. The former will be directly used to show that the total variation distance vanishes via the exact expression (6), while the latter is vital in order to establish the impossibility of weak recovery of the signal.
Proof of Theorem 1: Indistinguishability.
Let . Assume that . For any , writing in (6) gives
To complete the proof, we use a key property about the fluctuation of the free energy stated in Proposition 1 below, which says that there exists a constant such that
for all From this,
Since sending and then implies that and are indistinguishable.
Take for and use in (9). We obtain the rate of convergence,
Next we continue to show that weak recovery of is impossible. For , define a random probability measure on by
for any and , where is the expectation with respect to , an independent copy of . The following lemma relates the expectation of to by a change of measure in terms of
Let be a measurable function from to If and are indistinguishable, then
Recall the densities and of and from Lemma 2. Let be the probability mass function for Since is independent of , the joint density of is given by
This implies that where
Note that since . For any define
where for and Observe that for From this and the triangle inequality,
Since converges to zero, each term in the above sum must vanish in the limit and thus, letting and then yields
Note that for some normalizing constant. Since from (8) and ,
it follows that
From this and (11), the announced result follows.
Proof of Theorem 1: Impossibility of weak recovery.
Let and . Let be a random probability measure on (see Definition 2) and . Our goal is to show that
for Note that and is measurable. From Lemma 3,
We claim that the second expectation converges to zero. For notational convenience, we simply denote and . Note that
The second term in the above equation can be controlled by
where the last inequality used the Cauchy-Schwarz inequality. Using the Cauchy-Schwarz inequality again, the last expression is bounded above by
Here the second bracket is bounded above by . As for the first one, we observe that is in distribution equal to the Gibbs measure defined in (17) and if we write and , then in distribution, are independent samples from and is the overlap between and . As a result,
where is the Gibbs average with respect to the product measure . Now, since we can apply Theorem 4 to control the right-hand side by the bound
for some constant independent of Hence, from the above inequalities,
which gives the desired limit (12) by using Markov’s and Jensen’s inequalities.