# Phase transition in random tensors with multiple spikes

Consider a spiked random tensor obtained as a mixture of two components: noise in the form of a symmetric Gaussian $p$-tensor and signal in the form of a symmetric low-rank random tensor. The latter low-rank tensor is formed as a linear combination of $k$ independent symmetric rank-one random tensors, referred to as spikes, with weights referred to as signal-to-noise ratios (SNRs). The entries of the vectors that determine the spikes are sampled i.i.d. from general probability distributions supported on bounded subsets of $\mathbb{R}$. This work focuses on the problem of detecting the presence of these spikes, and establishes the phase transition of this detection problem for any fixed $k \ge 1$. In particular, it shows that for a set of relatively low SNRs it is impossible to distinguish between the spiked and non-spiked Gaussian tensors. Furthermore, in the interior of the complement of this set, where at least one of the $k$ SNRs is relatively high, these two tensors are distinguishable by the likelihood ratio test. In addition, when the total number of low-rank components, $k$, grows in the order $o(N^{(p-2)/4})$, the problem exhibits an analogous phase transition. This theory for spike detection implies that recovery of the spikes by the minimum mean square error exhibits the same phase transition. The main methods used in this work arise from the study of mean field spin glass models. In particular, the thresholds for phase transitions are identified as the critical inverse temperatures distinguishing the high and low-temperature regimes of the free energies in the pure $p$-spin model.

## 1 Introduction

This work studies the detection and recovery of a low-rank component in a particular random tensor and characterizes the corresponding phase transitions of the possibility of detection and recovery. In order to motivate this problem, we first discuss a simpler and widely-studied question: When can principal component analysis (PCA) detect and recover low-rank linear structures in noisy data? While detection only requires determining the presence or absence of low-rank structure, the task of recovery aims to reveal the concealed low-rank structure. The use of PCA is equivalent to applying the eigendecomposition of the sample covariance matrix.

One common setting for addressing this question assumes data points $x_1,\dots,x_n$ drawn independently from the multivariate normal distribution $N(0, I_N + \beta uu^{T})$, where $I_N$ is the $N$-dimensional identity matrix, which generates spherically symmetric Gaussian noise, $u$ is a unit column vector in $\mathbb{R}^N$, which generates a rank-one signal, and $\beta$ is the signal-to-noise ratio (SNR). Under this model, the observations $x_i$, $1\le i\le n$, take the form $x_i = \sqrt{\beta}\,g_i u + z_i$ with $g_i$ a standard Gaussian scalar, where $\sqrt{\beta}\,g_i u$ is proportional to the signal $u$ with signal-to-noise ratio $\beta$, and $z_i$ is the Gaussian noise. The question is then whether or not it is possible to apply PCA to detect the presence of the signal when given the data points $x_1,\dots,x_n$ with different choices of the SNR parameter $\beta$.

Assume that $N/n \to \gamma \in (0,\infty)$ as $N \to \infty$. When $\beta = 0$, the Marchenko-Pastur distribution [30] describes the limiting distribution of the eigenvalues of the sample covariance matrix. The well-known BBP phase transition [3, 4] states that when $\beta \le \sqrt{\gamma}$, the eigenvalues of this matrix still follow the Marchenko-Pastur distribution and thus detection of the low-rank signal is impossible by PCA. In contrast, when $\beta > \sqrt{\gamma}$, the largest eigenvalue of this matrix stays away from the typical location of the Marchenko-Pastur distribution and PCA can detect the presence of the signal. This phase transition of spike detection is extended in [43] to spike recovery by PCA. More precisely, [43] shows that when $\beta > \sqrt{\gamma}$, there is a non-trivial asymptotic correlation between the top eigenvector of the sample covariance and $u$, and thus one can approximately recover $u$ by PCA. Moreover, when $\beta \le \sqrt{\gamma}$ this asymptotic correlation is zero and PCA cannot recover $u$. Extensions of these detection and recovery results are also established in [43].

Another common setting for studying the detection-using-PCA problem assumes a random matrix of the form $T = W + \frac{\beta}{\sqrt{N}}uu^{T}$, where $W$ is an $N\times N$ Gaussian Wigner matrix (a symmetric matrix whose entries $W_{ij}$, $i\le j$, are independent centered Gaussians) and $u$ is an $N$-dimensional random vector with i.i.d. entries sampled from a bounded distribution on $\mathbb{R}$. The parameter $\beta$ is the SNR. We refer to the rank-one component, $uu^{T}$, as a spike and to $T$ as a spiked random matrix. The problem is to detect the presence of the spike in $T$, or equivalently, to distinguish between $T$ and $W$. This detection problem exhibits a phase transition similar to that of the previous setting. When the SNR is below a certain critical threshold, the eigenvalue distribution of $T$ follows Wigner's semi-circle law and it is thus impossible to distinguish between $T$ and $W$ by PCA. Once the value of $\beta$ exceeds this critical threshold, the largest eigenvalue jumps away from the typical location of the Wigner semi-circle law and the top eigenvector nontrivially correlates with the signal [15, 16]. Consequently, in this case, one can detect and approximately recover the signal by PCA. Recent studies of phase transitions in detection and recovery of low-rank signals in random matrices include [28, 33, 34, 35, 39, 45].

The latter setting of low-rank detection in spiked random matrices has a natural higher-order generalization to spiked random tensors. This generalization considers the spiked symmetric random $p$-tensor

$$T_k = W + \frac{1}{N^{(p-1)/2}}\sum_{r=1}^{k}\beta_r\,u(r)^{\otimes p}.$$

The first component, $W$, is the symmetric Gaussian $p$-tensor of size $N$, formally defined in Section 2.1. The second component is the signal, which is a linear combination of the spikes $u(r)^{\otimes p}$, $1\le r\le k$. Here, $u(1),\dots,u(k)$ are $N$-dimensional vectors whose entries are sampled i.i.d. from probability measures supported on bounded subsets of the real line. We refer to $\bar\beta = (\beta_1,\dots,\beta_k)$ as the vector of SNRs. The detection problem under this setting asks whether identification of the low-rank signal in the tensor $T_k$ is possible for a given vector $\bar\beta$. The recovery problem seeks to recover, if possible, the low-rank signal for given values of $\bar\beta$. Answering these questions of whether the spike is detectable or recoverable requires characterization of the phase transitions in $\bar\beta$ of the detection and recovery problems.

We remark that the generalized tensor setting is significantly more challenging than the above setting of detecting and recovering rank-one structure in matrices. The matrix setting involved the best rank-one approximation by PCA. However, for tensors, basic relevant notions, such as rank and best low-rank approximation, are not obvious [27]. Furthermore, computing many of these and related notions is NP-hard [24]. In this work, we study low-rank tensor detection and recovery by common theoretical tests and estimators, which are hard to compute. We leave the analysis of tractable procedures to future work. Following [18, 22, 35, 36, 44], we say that spike detection is impossible if the total variation distance between $T_k$ and $W$ vanishes when $N$ tends to infinity. In other words, any statistical test fails to distinguish $T_k$ and $W$ (see Section 2.1). On the other hand, we say that detection is possible if this distance is one in the limit. This means that asymptotically one can find a statistical test, in particular, the likelihood ratio test, that distinguishes between $T_k$ and $W$ (see Section 2.1). For recovery, we follow [29] and use the minimum mean square error (MMSE) and its corresponding estimator.

Many recent works, which are reviewed in Section 2.5, have studied detection and recovery under the spiked random tensor model. Nevertheless, the optimal phase transition for low-rank detection in spiked random tensors has not yet been established. This paper aims to close this gap. Our main result states that there exist critical thresholds $\beta_{1,c},\dots,\beta_{k,c}$ and a set of the form $\mathcal{R} = (0,\beta_{1,c}]\times\cdots\times(0,\beta_{k,c}]$ such that detection is impossible if $\bar\beta$ lies strictly in the interior of the set $\mathcal{R}$. Furthermore, it is possible to detect the spike via the likelihood ratio test when $\bar\beta \notin \mathcal{R}$. In other words, detection is possible only when at least one of $\beta_1,\dots,\beta_k$ exceeds its critical threshold; whereas, if $\beta_1,\dots,\beta_k$ are all smaller than their critical thresholds, one cannot detect the spike. Our result also allows the total number of spikes to grow with $N$. In particular, if $k\to\infty$ and $k = o(N^{(p-2)/4})$, then similar statements hold. A byproduct of these developments is a new proof for a recent result on the recovery problem by [29] when assuming the same setting of the present paper. In essence, their result states that $\beta_{1,c},\dots,\beta_{k,c}$ are the critical thresholds for the MMSE recovery problem.

Our approach is based on methodologies from the study of mean-field spin glass models. Roughly speaking, spin glasses are spin systems that exhibit both quenched disorder and frustration. That is, the interactions between sites are disordered and spin constraints cannot be simultaneously satisfied. These two features are commonly shared by many problems that involve randomized combinatorial optimization [31, 38]. Mézard et al. [32] review the area of spin glasses from the point of view of physicists, whereas mathematical treatments of the subject appear in [41, 50, 51].

Mean-field spin glasses are related to the detection problem by the following key observation: The total variation distance between $T_k$ and $W$ can be represented as an integral of the distribution function of the so-called free energy of the normalized pure $p$-spin model with vector-valued spin configurations (see Lemma 2 below). From this observation, the detection problem is reduced to obtaining a tight bound on the fluctuation of the free energy for all values of the SNR vector $\bar\beta$. Our results reveal that this fluctuation is of vanishing order when $\bar\beta$ lies in the interior of $\mathcal{R}$ and is of order 1 when $\bar\beta$ lies in the complement of $\mathcal{R}$. These implications allow us to completely characterize the phase transition of the detection problem. In the terminology of spin glasses, we identify $\mathcal{R}$ as the high-temperature regime of the pure $p$-spin model. Its complement is the low-temperature regime.

Notably, the integral representation of the total variation distance mentioned above was previously observed by Chen [18] under the setting of a single spike sampled from the so-called Rademacher prior, i.e., when $k=1$ and $u_1$ is a Bernoulli random variable on $\{-1,1\}$ with equal probability. In this case, the model reduces to the pure $p$-spin model with Ising spin configuration, and Chen [18] characterized the phase transition of the detection problem in this special case. Theorems 1 and 2 below extend his results to more general distributions for a single spike. Theorems 3 and 4 further extend the results of [18] and Theorems 1 and 2 to the case of multiple spikes. While we follow ideas of Chen [18], the vector-valued spin glass model used here raises nontrivial challenges. Indeed, this is the first full characterization of the phase transition for the tensor detection problem with multiple spikes.

Acknowledgement: The research of W.-K. Chen is partly supported by NSF grants DMS-16-42207 and DMS-17-52184, and Hong Kong Research Grants Council GRF-14302515. He thanks the National Center for Theoretical Sciences and Academia Sinica in Taipei for the hospitality during his visit in June and July 2018, where part of the results and writings were completed. In addition, he is grateful to Lenka Zdeborová for many illuminating discussions. The research of G. Lerman is partially supported by NSF grants DMS-14-18386 and DMS-18-21266.

## 2 Main Results

This section states the main results of this paper and provides the necessary mathematical background. Additionally, it reviews prior results and describes the structure of the rest of the paper, particularly the structure of the proofs of Theorems 1 - 4. Section 2.1 defines the necessary terminology, especially, the distinguishability of two random tensors. Section 2.2 describes our main results for the detection problem in the case of a single spike. In particular, it introduces an auxiliary function that characterizes the high-temperature regime and allows one to simulate the critical SNR. Using this function, we demonstrate numerical simulations of the critical SNR for the sparse Rademacher prior. Section 2.3 states our main results for the detection problem in the case of multiple spikes. Section 2.4 mentions a result for recovery by MMSE that is later obtained from our results for spike detection. Section 2.5 surveys recent related results. Finally, Section 2.6 describes the organization of the proofs of the main results.

### 2.1 Settings and Definitions

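Let $p \ge 2$ be an integer. For any integer $N\ge 1$, denote by $\Omega_N$ the set of all real-valued $p$-tensors $Y = (Y_{i_1,\dots,i_p})_{1\le i_1,\dots,i_p\le N}$ equipped with the Borel $\sigma$-field. The inner product of two $p$-tensors $Y, Y' \in \Omega_N$ is

$$\langle Y,Y'\rangle=\sum_{1\le i_1,\dots,i_p\le N}Y_{i_1,\dots,i_p}Y'_{i_1,\dots,i_p}.$$

Given a vector $u = (u_1,\dots,u_N)\in\mathbb{R}^N$, we form a rank-one $p$-tensor $u^{\otimes p}$ using the outer product as follows:

$$(u^{\otimes p})_{i_1,\dots,i_p}=u_{i_1}\cdots u_{i_p},\quad \forall\, 1\le i_1,\dots,i_p\le N.$$

Given $Y\in\Omega_N$ and a permutation $\pi$ of the set $\{1,\dots,p\}$, define $Y^{\pi}$ by

$$Y^{\pi}_{i_1,\dots,i_p}=Y_{i_{\pi(1)},\dots,i_{\pi(p)}}.$$

A $p$-tensor $Y$ is said to be symmetric if $Y^{\pi}_{i_1,\dots,i_p} = Y_{i_1,\dots,i_p}$ for all corresponding indices and permutations. Throughout the rest of the paper, we assume that $Y\in\Omega_N$ is a random $p$-tensor and all entries in $Y$ are i.i.d. standard Gaussian. The symmetric Gaussian $p$-tensor $W$ of size $N$ is obtained by averaging over all permutations in the symmetric group of $p$ letters:

$$W=\frac{1}{p!}\sum_{\pi}Y^{\pi}.$$

In the case $p=2$, $W$ is the Gaussian Wigner matrix.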
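The constructions above can be sketched numerically. The following is a minimal illustration, assuming NumPy and small sizes; the helper names `rank_one_tensor`, `symmetrize` and `inner` are hypothetical and not part of the paper. It builds $u^{\otimes p}$, averages a Gaussian tensor over all permutations of its $p$ axes, and evaluates the inner product.

```python
import itertools
import math
import numpy as np

def rank_one_tensor(u, p):
    """Return u^{(x)p}, whose (i1,...,ip) entry is u[i1] * ... * u[ip]."""
    u = np.asarray(u, dtype=float)
    out = u
    for _ in range(p - 1):
        out = np.multiply.outer(out, u)
    return out

def symmetrize(Y):
    """Average Y over all permutations of its p axes (W = 1/p! * sum of Y^pi)."""
    p = Y.ndim
    perms = itertools.permutations(range(p))
    return sum(np.transpose(Y, perm) for perm in perms) / math.factorial(p)

def inner(A, B):
    """Entrywise inner product <A, B> of two tensors of the same shape."""
    return float(np.sum(A * B))

rng = np.random.default_rng(0)
N, p = 5, 3                       # illustrative sizes only
Y = rng.standard_normal((N,) * p) # i.i.d. standard Gaussian entries
W = symmetrize(Y)                 # symmetric Gaussian p-tensor
u = rng.standard_normal(N)
```

Because $u^{\otimes p}$ is itself symmetric, `inner(W, rank_one_tensor(u, p))` agrees with `inner(Y, rank_one_tensor(u, p))` up to floating-point error, which is the same symmetry that is used for the Hamiltonian in Section 3.1.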

Next, we define the notion of distinguishability and indistinguishability between any two random $p$-tensors in terms of the total variation distance. For any two random $p$-tensors $U, V$ of size $N$, denote by $d_{TV}(U,V)$ the total variation distance between $U$ and $V$, that is,

$$d_{TV}(U,V)=\sup_{A}\,|P(U\in A)-P(V\in A)|,$$

where the supremum is taken over all sets $A$ in the Borel $\sigma$-algebra generated by symmetric $p$-tensors.

###### Definition 1.

Let $U_N, V_N$ be two random $p$-tensors of size $N$. We say that they are distinguishable if

$$\lim_{N\to\infty}d_{TV}(U_N,V_N)=1$$

and are indistinguishable if

$$\lim_{N\to\infty}d_{TV}(U_N,V_N)=0.$$

Distinguishability of $U_N$ and $V_N$ means that there exists a sequence of measurable subsets $A_N$ of $\Omega_N$ such that $P(U_N\in A_N)\to 1$ and $P(V_N\in A_N)\to 0$. From this, if we consider a statistical test $S_N$ defined by $S_N(w)=0$ for $w\in A_N$ and $S_N(w)=1$ for $w\notin A_N$, then the sum of the type one and type two errors satisfies

$$\lim_{N\to\infty}\bigl(P(S_N(U_N)=1)+P(S_N(V_N)=0)\bigr)=\lim_{N\to\infty}\bigl(P(U_N\notin A_N)+P(V_N\in A_N)\bigr)=0.$$

This means that one can statistically distinguish $U_N$ and $V_N$ by the test $S_N$. If we assume that $U_N$ and $V_N$ have nonvanishing densities $f_{U_N}$ and $f_{V_N}$, the well-known formula

$$d_{TV}(U_N,V_N)=\int_{f_{U_N}\ge f_{V_N}}\bigl(f_{U_N}(w)-f_{V_N}(w)\bigr)\,dw$$

implies that

$$d_{TV}(U_N,V_N)=P(U_N\in A_N)-P(V_N\in A_N)$$

for

$$A_N:=\Bigl\{w\in\Omega_N \;\Big|\; \frac{f_{U_N}(w)}{f_{V_N}(w)}\ge 1\Bigr\}.$$

Therefore, one can naturally use the likelihood ratio test for distinguishing between $U_N$ and $V_N$. In contrast, when $U_N$ and $V_N$ are indistinguishable, any statistical test is powerless, as in this case the total error approaches one as $N$ tends to infinity.
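The identity between the integral formula and the likelihood ratio set can be checked in a toy case. The sketch below is only an illustration under simplifying assumptions that are not from the paper: it replaces the tensors by two scalar Gaussians, $U\sim N(m,1)$ and $V\sim N(0,1)$ with $m>0$, for which the likelihood ratio set is $\{w \ge m/2\}$. The function names are hypothetical.

```python
import math
import numpy as np

def gaussian_pdf(w, mean):
    """Density of N(mean, 1) evaluated at w."""
    return np.exp(-0.5 * (w - mean) ** 2) / math.sqrt(2 * math.pi)

def tv_by_integration(m, lo=-12.0, hi=12.0, n=200001):
    """Riemann-sum approximation of the integral of (f_U - f_V) over {f_U >= f_V}."""
    w = np.linspace(lo, hi, n)
    f_u, f_v = gaussian_pdf(w, m), gaussian_pdf(w, 0.0)
    integrand = np.where(f_u >= f_v, f_u - f_v, 0.0)
    return float(np.sum(integrand) * (w[1] - w[0]))

def tv_by_likelihood_ratio_set(m):
    """P(U in A) - P(V in A) for A = {w : f_U(w)/f_V(w) >= 1} = {w >= m/2}, m > 0."""
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    p_u = 1.0 - Phi(m / 2 - m)   # P(U >= m/2) with U ~ N(m, 1)
    p_v = 1.0 - Phi(m / 2)       # P(V >= m/2) with V ~ N(0, 1)
    return p_u - p_v

tv_small = tv_by_integration(1.0)
tv_large = tv_by_likelihood_ratio_set(8.0)
```

In this toy model both routes give $2\Phi(m/2)-1$, which tends to one as the separation $m$ grows and to zero as $m\to 0$, mirroring the two regimes of Definition 1.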

### 2.2 Main Results for Detection of a Single Spike

Let $\Lambda$ be a bounded subset of $\mathbb{R}$ and $\mu$ be a probability measure on the Borel $\sigma$-field of $\Lambda$. Assume that $u_1,\dots,u_N$ are i.i.d. samples from $\mu$ that are also independent of $W$. Denote $u=(u_1,\dots,u_N)$. We refer to the random variable $u_1$ as the prior. Consider the spiked random $p$-tensor defined by

$$T=W+\frac{\beta}{N^{(p-1)/2}}\,u^{\otimes p}. \tag{1}$$

We say that detection of the spike in $T$ is possible if $T$ and $W$ are distinguishable and detection is impossible if they are indistinguishable in the sense of Definition 1. Note that if $\int a\,\mu(da)\ne 0$, one can immediately detect the spike by noting that the entries $Y_{i_1,\dots,i_p}$ are i.i.d. standard Gaussian and using the strong law of large numbers. Indeed,

$$\frac{1}{N^{(p+1)/2}}\sum_{i_1,\dots,i_p=1}^{N}W_{i_1,\dots,i_p}=\frac{1}{N^{(p+1)/2}}\sum_{i_1,\dots,i_p=1}^{N}Y_{i_1,\dots,i_p}\to 0,$$

while

$$\frac{1}{N^{(p+1)/2}}\sum_{i_1,\dots,i_p=1}^{N}T_{i_1,\dots,i_p}=\frac{1}{N^{(p+1)/2}}\sum_{i_1,\dots,i_p=1}^{N}Y_{i_1,\dots,i_p}+\beta\Bigl(\frac{\sum_{i=1}^{N}u_i}{N}\Bigr)^{p}\to\beta\Bigl(\int a\,\mu(da)\Bigr)^{p}.$$

We can thus restrict our discussion of single spike detection to the case when $\mu$ is centered, that is, when $\int a\,\mu(da)=0$. Our first result on spike detection is formulated as follows.

###### Theorem 1.

Assume that $\mu$ is centered. For any $p\ge 3$ there exists a constant $\beta_c>0$ such that

1. if $0<\beta<\beta_c$, then detection is impossible;

2. if $\beta>\beta_c$, then detection is possible.

In other words, $\beta_c$ is the critical threshold that describes the phase transition of the detection problem. As we explained in Section 2.1, when detection is possible, one can use the likelihood ratio test, which uses the ratio of densities $f_T/f_W$, to distinguish between $T$ and $W$. In Lemma 2 below, we relate this ratio to the free energy of the pure $p$-spin mean field spin glass model.
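The earlier remark that a non-centered prior makes detection trivial can be illustrated with a small simulation. This is a minimal sketch under assumptions chosen only for illustration: $p=3$, $N=60$, $\beta=2$, and a prior uniform on $\{0.9, 1.1\}$ (mean one). Since the entry sums of $W$ and $Y$ coincide, the code adds the spike to the raw tensor $Y$ rather than to $W$; the function names are hypothetical.

```python
import numpy as np

def spiked_from_raw_noise(Y, u, beta):
    """Return Y + beta / N^{(p-1)/2} * u^{(x)p} (same entry sum as the tensor T in (1))."""
    N, p = Y.shape[0], Y.ndim
    spike = u
    for _ in range(p - 1):
        spike = np.multiply.outer(spike, u)
    return Y + beta / N ** ((p - 1) / 2) * spike

def sum_statistic(T):
    """Normalized entry sum N^{-(p+1)/2} * sum of all entries of T."""
    N, p = T.shape[0], T.ndim
    return float(T.sum()) / N ** ((p + 1) / 2)

rng = np.random.default_rng(1)
N, p, beta = 60, 3, 2.0
Y = rng.standard_normal((N,) * p)
u = rng.choice([0.9, 1.1], size=N)   # non-centered prior, assumed for illustration

stat_noise = sum_statistic(Y)                                     # fluctuates around 0
stat_spiked = sum_statistic(spiked_from_raw_noise(Y, u, beta))    # close to beta * mean(u)^p
```

The gap between the two statistics equals $\beta(\sum_i u_i/N)^p$ exactly, so thresholding the normalized entry sum separates the two cases whenever the prior mean is nonzero. For a centered prior this gap vanishes, which is why the remaining analysis is needed.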

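The precise value of $\beta_c$ can be determined as follows. Let

$$\xi(s)=\frac{s^{p}}{2}$$

and

$$v_*=\int a^{2}\,\mu(da). \tag{2}$$

For $a\in\mathbb{R}$ and $t\ge 0$, consider the geometric Brownian motion

$$Z(a,t)=\exp\Bigl(aB_t-\frac{a^{2}t}{2}\Bigr),$$

where $(B_t)_{t\ge 0}$ is a standard Brownian motion. For $b\ge 0$, define an auxiliary function $\Gamma_b$ by

$$\Gamma_b(v)=\int_0^{v}\xi''(s)\bigl(\gamma_b(s)-s\bigr)\,ds, \tag{3}$$

where

$$\gamma_b(s):=E\left[\frac{\bigl(\int a\,Z(a,b\xi'(s))\,\mu(da)\bigr)^{2}}{\int Z(a,b\xi'(s))\,\mu(da)}\right]. \tag{4}$$

Given these notations, the critical value $\beta_c$ in Theorem 1 can be calculated as follows.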

###### Theorem 2.

If and is centered, then is the largest such that .

As an example of the utility of Theorem 2, we demonstrate numerical simulations for estimating the critical threshold $\beta_c$ for the sparse Rademacher prior. In the sparse Rademacher prior, the entries in $u$ are sampled i.i.d. from the probability distribution

$$\frac{\rho}{2}\,\delta_{-\frac{1}{\sqrt{\rho}}}+(1-\rho)\,\delta_{0}+\frac{\rho}{2}\,\delta_{\frac{1}{\sqrt{\rho}}},$$

with parameter $\rho\in(0,1]$ that controls the sparsity of the prior. The case $\rho=1$ corresponds to the regular Rademacher prior, where $u_1,\dots,u_N$ are sampled i.i.d. from a balanced Bernoulli random variable. If $\rho<1$, the sparse Rademacher prior can be regarded as first uniformly sampling approximately $\rho N$ of the coordinates and then, for these coordinates, sampling Bernoulli random variables with equal probability. The remaining approximately $(1-\rho)N$ coordinates are set to zero. From this construction, the second moment of $u_1$ is of order 1. To simulate $\beta_c$ according to the value established in Theorem 2, we numerically evaluate $\Gamma_b(v)$ for test values of $v$ at fixed increments in the interval between 0 and $v_*$. For this purpose, we have used the numerical integrator of Mathematica. The critical value is the largest value of $\beta$ such that the condition of Theorem 2 holds for all test values of $v$, where discrete positive values of $\beta$ at fixed increments were tested. Figure 1 summarizes the numerical results for $p=3$, 4, 5, 10 and a range of values of $\rho$ up to 1.
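The quantities in (3) and (4) can also be evaluated outside Mathematica. The following is a minimal sketch, assuming NumPy, Gauss-Hermite quadrature for the expectation over $B_t$, and a trapezoidal rule for the integral in $v$; these numerical choices and the function names are assumptions of this sketch, not the procedure used for Figure 1. The code evaluates $\Gamma_b$ for a given $b$ only and does not reproduce how $b$ is tied to $\beta$ in Theorem 2.

```python
import numpy as np

def sparse_rademacher(rho):
    """Atoms and weights of the sparse Rademacher prior with parameter rho in (0, 1]."""
    atoms = np.array([-1.0 / np.sqrt(rho), 0.0, 1.0 / np.sqrt(rho)])
    weights = np.array([rho / 2.0, 1.0 - rho, rho / 2.0])
    return atoms, weights

def gamma_b(b, s, p, atoms, weights, n_nodes=80):
    """gamma_b(s) from (4) for a discrete prior, with xi(s) = s^p / 2."""
    t = b * 0.5 * p * s ** (p - 1)                    # time b * xi'(s)
    x, w = np.polynomial.hermite_e.hermegauss(n_nodes)
    w = w / np.sqrt(2.0 * np.pi)                      # weights of a standard Gaussian
    B = np.sqrt(t) * x                                # B_t ~ N(0, t)
    Z = np.exp(np.outer(B, atoms) - 0.5 * atoms ** 2 * t)
    num = (Z * atoms * weights).sum(axis=1) ** 2      # (int a Z mu(da))^2
    den = (Z * weights).sum(axis=1)                   # int Z mu(da)
    return float(np.sum(w * num / den))

def Gamma_b(b, v, p, atoms, weights, n_grid=200):
    """Gamma_b(v) from (3) by the trapezoidal rule on [0, v]."""
    s = np.linspace(0.0, v, n_grid + 1)
    xi2 = 0.5 * p * (p - 1) * s ** (p - 2)            # xi''(s)
    g = np.array([gamma_b(b, si, p, atoms, weights) for si in s])
    f = xi2 * (g - s)
    return float(np.sum((f[1:] + f[:-1]) * np.diff(s)) / 2.0)

atoms, weights = sparse_rademacher(0.3)
profile = [Gamma_b(1.0, v, 3, atoms, weights) for v in np.linspace(0.0, 1.0, 11)]
```

Scanning such a profile over a grid of $b$ values and checking its sign on $(0, v_*]$ is one way to mimic the grid search described above. Very large values of $b$ may overflow the exponential in this simple implementation.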

The behavior of $\beta_c$ is influenced by the portion of zeros and the magnitude of the nonzero jumps. As can be seen, in each of the four figures there exists a threshold $\rho^*$ (depending on $p$) such that $\beta_c$ is increasing on $(0,\rho^*)$ and decreasing on $(\rho^*,1]$. Heuristically, in the interval $(0,\rho^*)$, the large fraction of the zeros dominates the small portion of far jumps, whose magnitude is large. Therefore, in order to detect the spike, $\beta$ needs to increase as $\rho$ increases, or equivalently, as the magnitude of jumps decreases. On the other hand, in the interval $(\rho^*,1]$, the far jumps overpower the small fraction of zeros and their magnitude has relatively low variation with $\rho$. In this case, $\beta_c$ decreases as $\rho$ increases, or equivalently, as the fraction of far jumps increases. In each subfigure of Figure 1, we indicate by a solid curve the following upper bound for $\beta_c$, which was pointed out in [44],

$$H(\rho):=2\sqrt{-\rho\log\rho-(1-\rho)\log(1-\rho)+\rho\log 2}.$$

We note that as $p$ increases the estimated values of $\beta_c$ are closer to the ones of the upper bound $H(\rho)$. For $p=3$, 4, 5, we see that if $\rho$ is sufficiently small, then $H(\rho)$ is still a good approximation for $\beta_c$.
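For reference, the upper bound $H(\rho)$ is straightforward to tabulate. The snippet below is a small sketch using only the Python standard library; the helper `xlogx` is a hypothetical name that encodes the convention $0\log 0 = 0$ needed at $\rho=1$.

```python
import math

def xlogx(x):
    """x * log(x) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0.0 else x * math.log(x)

def H(rho):
    """Upper bound H(rho) for the critical threshold of the sparse Rademacher prior."""
    inside = -xlogx(rho) - xlogx(1.0 - rho) + rho * math.log(2.0)
    return 2.0 * math.sqrt(inside)

table = {rho: H(rho) for rho in (0.1, 0.25, 0.5, 0.75, 1.0)}
```

At $\rho=1$ the expression reduces to $2\sqrt{\log 2}$, the value for the regular Rademacher prior.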

### 2.3 Main Results for Detection of Multiple Spikes

In this subsection, we study the case of more than one spike and denote the number of spikes by $k$. Let $\Lambda_1,\dots,\Lambda_k$ be bounded subsets of $\mathbb{R}$ and $\mu_1,\dots,\mu_k$ be centered probability measures on the Borel $\sigma$-fields of $\Lambda_1,\dots,\Lambda_k$, respectively. For any $1\le r\le k$, let $u_1(r),\dots,u_N(r)$ be i.i.d. samples from $\mu_r$, which are also independent of $W$. Denote

$$u(r)=(u_1(r),\dots,u_N(r)).$$

We refer to the random variables $u_1(r)$, $1\le r\le k$, as priors. For $\bar\beta=(\beta_1,\dots,\beta_k)$ with $\beta_1,\dots,\beta_k>0$, the spiked tensor is defined by

$$T_k=W+\frac{1}{N^{(p-1)/2}}\sum_{r=1}^{k}\beta_r\,u(r)^{\otimes p}. \tag{5}$$

This spiked tensor extends the one in (1) to multiple spikes. In a manner similar to the previous subsection, we say that detection is possible if $T_k$ and $W$ are distinguishable and is impossible if they are indistinguishable. For $1\le r\le k$, denote by $\beta_{r,c}$ the critical threshold obtained by plugging $\mu_r$ into Theorem 2. We extend Theorem 1 to the case of multiple spikes as follows.

###### Theorem 3.

Assume that $\mu_1,\dots,\mu_k$ are centered. For $p\ge 3$, the following statements hold.

• If $\beta_r<\beta_{r,c}$ for all $1\le r\le k$, then detection is impossible;

• If $\beta_r>\beta_{r,c}$ for some $1\le r\le k$, then detection is possible.

Theorem 3 implies that in order to detect the spikes, at least one of the $\beta_r$'s has to exceed its own marginal critical threshold $\beta_{r,c}$. In particular, if all probability measures are the same, that is, $\mu_1=\cdots=\mu_k$, then the above result implies that $T_k$ and $W$ are indistinguishable if $\max_{1\le r\le k}\beta_r<\beta_c$ and are distinguishable if $\max_{1\le r\le k}\beta_r>\beta_c$, where $\beta_c$ is the common threshold for all components. It is natural to ask whether this critical threshold would change if one allows $k$ to grow with $N$. We show that this is not the case if the growth of $k$ is of certain polynomial order, which is sufficiently slow in comparison to the size of the $p$-tensor, $N$.

To state our result, let $\mu$ be the probability measure considered in Section 2.2 and let $\beta_c$ be the corresponding critical value provided by Theorem 2. Assume that $\mu_r=\mu$ for all $r\ge 1$ and that $(\beta_r)_{r\ge 1}$ is a sequence of SNRs. Let $T_k$ be the random tensor in (5) with $k=k(N)$ spikes and SNRs $\beta_1,\dots,\beta_{k(N)}$.

###### Theorem 4.

Assume that $p\ge 3$ and that $k=k(N)$ grows with $N$ while satisfying

$$\limsup_{N\to\infty}\frac{k(N)}{N^{\frac{p-2}{4}}}=0.$$

Then the following statements hold.

1. If $\sup_{r\ge 1}\beta_r<\beta_c$, detection is impossible.

2. If $\sup_{r\ge 1}\beta_r>\beta_c$ and $p$ is even, detection is possible.

As the number of independent spikes grows in $N$, it seems reasonable to believe that the critical threshold should become smaller, since now we have more spikes and it should be relatively easier to detect them in comparison to the case of a fixed finite number of spikes. However, Theorem 4 presents a counterintuitive result: if the total number of spikes is of smaller order than $N^{(p-2)/4}$, then the critical threshold remains unchanged. It would be of great interest to investigate the sharpness of the order $N^{(p-2)/4}$. We comment that the assumption of $p$ being even in the second statement is used later in (17) to control the system with $k$ spikes by the sum of individual single-spike systems. It is a difficult open problem to rigorously determine if the same result is possible when $p$ is odd.

### 2.4 Byproduct: Result for Recovery by MMSE

The proof techniques for the theory of spike detection described above can be applied to establish spike recovery by the minimum mean square error (MMSE). In the present section we state our result that the phase transition and critical thresholds for recovery by the MMSE estimator are the same as the phase transition and critical threshold of the detection problem. We defer the proof of this result to Section 5.

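Recall the setting of Section 2.3 where $\mu_r$, $1\le r\le k$, are centered probability measures, $T_k$ is the spiked tensor and $\beta_{r,c}$ are the critical thresholds. Let $\hat\theta=(\hat\theta_{i_1,\dots,i_p})_{1\le i_1,\dots,i_p\le N}$ be an $\Omega_N$-valued bounded random variable generated by the $\sigma$-field $\sigma(T_k)$. The random variable $\hat\theta$ is allowed to depend on other randomness independent of the $u(r)$'s and $W$. The minimum mean square error (MMSE) is defined by

$$\mathrm{MMSE}_N(\bar\beta):=\min_{\hat\theta}\frac{1}{N^{p}}\sum_{1\le i_1,\dots,i_p\le N}E\Bigl(\sum_{r=1}^{k}\beta_r u_{i_1}(r)\cdots u_{i_p}(r)-\hat\theta_{i_1,\dots,i_p}\Bigr)^{2},$$

where the minimum is taken over all such $\hat\theta$. The minimizer to this problem is attained by the minimum mean square estimator,

$$\hat\theta^{\mathrm{MMSE}}_{i_1,\dots,i_p}=\sum_{r=1}^{k}\beta_r E\bigl[u_{i_1}(r)\cdots u_{i_p}(r)\,\big|\,T_k\bigr].$$

By restricting the minimum in the definition of $\mathrm{MMSE}_N(\bar\beta)$ to the so-called dummy estimators [28], i.e., estimators where $\hat\theta$ is independent of any $u_i(r)$ for all $1\le i\le N$ and $1\le r\le k$, one obtains the following upper bound for $\mathrm{MMSE}_N(\bar\beta)$:

$$\mathrm{MMSE}_N(\bar\beta)\le\frac{1}{N^{p}}\sum_{1\le i_1,\dots,i_p\le N}\Bigl(E\Bigl(\sum_{r=1}^{k}\beta_r u_{i_1}(r)\cdots u_{i_p}(r)\Bigr)^{2}-\Bigl(E\Bigl[\sum_{r=1}^{k}\beta_r u_{i_1}(r)\cdots u_{i_p}(r)\Bigr]\Bigr)^{2}\Bigr).$$

Denote $v_{r,*}=\int a^{2}\,\mu_r(da)$, for $1\le r\le k$. Applying the strong law of large numbers and the fact that $\mu_r$, $1\le r\le k$, are centered, taking $N$ to infinity in the above bound yields the asymptotic bound

$$\limsup_{N\to\infty}\mathrm{MMSE}_N(\bar\beta)\le\mathrm{DMSE}(\bar\beta):=\sum_{r=1}^{k}\beta_r^{2}\,v_{r,*}^{p}.$$

Using this terminology, our main result for spike recovery by MMSE is formulated as follows.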

Using this terminology, our main result for spike recovery by MMSE is formulated as follows.

###### Theorem 5.

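For $p\ge 3$, the following statements hold.

• If $\beta_r<\beta_{r,c}$ for all $1\le r\le k$, then $\lim_{N\to\infty}\mathrm{MMSE}_N(\bar\beta)=\mathrm{DMSE}(\bar\beta)$.

• If $\beta_r>\beta_{r,c}$ for some $1\le r\le k$, then $\limsup_{N\to\infty}\mathrm{MMSE}_N(\bar\beta)<\mathrm{DMSE}(\bar\beta)$.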

This theorem asserts that if the SNRs of all marginal spikes are less than their critical thresholds, then the minimum mean square estimator is no better than a dummy estimator. In contrast, if at least one of the SNRs of the marginal spikes is larger than its critical threshold, the minimum mean square estimator performs better than all dummy estimators.
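The dummy benchmark $\mathrm{DMSE}(\bar\beta)=\sum_r\beta_r^2 v_{r,*}^p$ is the variance of a signal entry with $p$ distinct indices, which can be checked by simulation. The sketch below is only an illustration under assumptions not taken from the paper: two independent sparse Rademacher priors (each with second moment one), $p=3$, and hypothetical function names.

```python
import numpy as np

def dmse(betas, second_moments, p):
    """DMSE = sum_r beta_r^2 * v_{r,*}^p."""
    betas = np.asarray(betas, dtype=float)
    v = np.asarray(second_moments, dtype=float)
    return float(np.sum(betas ** 2 * v ** p))

def sample_sparse_rademacher(rng, rho, size):
    """Draw entries equal to +-1/sqrt(rho) with probability rho/2 each, else 0."""
    signs = rng.choice([-1.0, 1.0], size=size)
    mask = rng.random(size) < rho
    return signs * mask / np.sqrt(rho)

def entry_variance_mc(betas, rhos, p, n_samples, rng):
    """Monte Carlo variance of sum_r beta_r * u_{i1}(r)...u_{ip}(r) for distinct indices."""
    total = np.zeros(n_samples)
    for beta, rho in zip(betas, rhos):
        u = sample_sparse_rademacher(rng, rho, (n_samples, p))  # p independent coordinates
        total += beta * u.prod(axis=1)
    return float(total.var())

rng = np.random.default_rng(3)
betas, rhos, p = [1.0, 0.5], [1.0, 0.5], 3
benchmark = dmse(betas, [1.0, 1.0], p)
estimate = entry_variance_mc(betas, rhos, p, 400_000, rng)
```

The Monte Carlo estimate should be close to the benchmark because index tuples with repeated entries form a vanishing fraction of all $N^p$ tuples as $N$ grows.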

As mentioned before, the MMSE recovery problem for the spiked random tensor for more general priors was studied earlier by Lesieur et al. [29]. They computed the limiting mutual information between the observed tensor and the signal and used it to establish a result equivalent to Theorem 5. Our proof of Theorem 5 relies heavily on our main results for the detection problem and is thus a completely different approach from the one taken in [29].

### 2.5 Previous Results

Understanding phase transitions of spike detection and recovery problems in spiked random matrices and tensors has received a lot of attention in the past several years. We summarize here some recent works.

Matrix Case: $p=2$. Barbier et al. [6] studied the MMSE recovery problem in the spiked random matrix in (1) (see the setting in Section 2.4 with $p=2$ and $k=1$) by deriving a replica symmetric Parisi-type formula for the mutual information between $T$ and $u$. An analogous study for the case of multiple spikes (5) was carried out by Lelarge et al. [28], where $u(1),\dots,u(k)$ are assumed to have finite second moments and are allowed to be correlated. A similar result for the non-symmetric case was pursued by Miolane [33].

As for the detection problem, under the same setting as (1), Alaoui et al. [22] obtained the same critical value $\beta_c$ specified in Equation (7) and Proposition 1 below. It was deduced that above $\beta_c$, detection is possible and below $\beta_c$, a weak form of detection remains possible in the sense that the limiting total error (the sum of type one and type two errors) of the likelihood ratio test between $T$ and $W$ is strictly less than one. Incidentally, we mention that when the results of [6, 28] apply to the case (1), $\beta_c$ is also the critical threshold for recovery.

Tensor Case: $p\ge 3$. Earlier results trace back to the works of Montanari and Richard [36] and Montanari et al. [35], where the authors considered (5) with $k=1$ and a spherical prior. By adoption of the second moment method, they showed that there exist a lower and an upper bound such that detection is impossible for $\beta$ below the lower bound and is possible for $\beta$ above the upper bound.

Lesieur et al. [29] considered (5) with a general setting in which the vectors $(u_i(1),\dots,u_i(k))$ for $1\le i\le N$ are sampled i.i.d. from a joint distribution with finite second moments. For centered priors, they proved that there exists a vector of critical thresholds such that for any $\bar\beta$ above these thresholds, the MMSE estimator obtains a better error than any dummy estimator. Consequently, one can also detect the spike in that case. In addition, when all entries of $\bar\beta$ are below their thresholds, the MMSE estimator is statistically irrelevant for recovering the spike. They did not provide results for the detection problem in this case. Notably, in the case that $u(1),\dots,u(k)$ are chosen as in Section 2.3, our critical thresholds agree with theirs and as a consequence, their result in this case is the same as Theorem 5. Barbier and Macris [8] provided a different proof for the results of [29] by using stochastic interpolation. Analogous results to [29] were developed for non-symmetric settings by Barbier et al. [10].

In another work, Perry et al. [44] focused on $k=1$ and three priors: the spherical prior, the Rademacher prior, and the sparse Rademacher prior. In these three settings, it was proved that there exist lower and upper bounds such that detection is not possible when $\beta$ is below the lower bound and is possible when $\beta$ is above the upper bound. In particular, their result in the spherical case improved the existing bounds in [35, 36] mentioned above. For the Rademacher prior, Chen [18] closed the gap between these two bounds by showing that $\beta_c$ in Theorem 2 is indeed the critical threshold for detection. The present work extends the results of [18, 44] to a broader class of priors and also to $k\ge 2$.

The proofs of [18] rely on the Parisi formula for the free energy as well as a cavity argument for controlling the moments of the overlap between two spin configurations sampled independently from the Gibbs measure. In particular, the critical temperature which distinguishes the high and low-temperature regimes is determined by an algebraic equation deduced from the so-called replica symmetric ansatz in Parisi's variational formula for the free energy. Our results extend the ones of [18] to the more general setting, where multiple spikes are allowed and the entries of the vectors determining the spikes can be sampled from arbitrary probability measures on bounded subsets of $\mathbb{R}$. There are non-trivial difficulties in obtaining this extension. For example, note that the main results in [18] were based on a strict monotonicity property of an auxiliary function. The proof of this proposition heavily relied on the symmetry of the Rademacher prior and it does not apply to more general priors. In order to prove our main results, an analogous, though more general proposition, is established in Lemma 5 below, which requires the development of additional arguments. Furthermore, there are various subtleties and non-trivial challenges that one needs to handle when addressing the vector-valued model, see Section 9 below. In fact, not much about the characterization of the high-temperature regime in vector-valued spin glass models is known.

Other Related Works. Since the likelihood ratio test and the MMSE estimator are often intractable to compute, it is natural to ask about the performance of tractable algorithms for detection and recovery of low-rank signals. Both [28] and [33] studied the performance of the approximate message passing (AMP) algorithm in recovering the spike. See [12, 21, 26, 46] for the performance of AMP for MMSE and compressed sensing. See also [5, 7, 9, 19, 20, 37] for the performance of AMP and [13] for the performance of the Langevin dynamics in the spiked tensor model. The complexity of energy landscapes in spiked tensor models was studied in [14, 48].

### 2.6 Structure of the Rest of the Paper

The key ingredient of this paper is an observation that the total variation distance between $T_k$ and $W$ can be expressed as an integral related to the free energy of the so-called pure $p$-spin models with scalar- and vector-valued spin configurations (Lemma 2). Section 3 defines these models, characterizes their high-temperature regimes and presents results on the fluctuation of the free energy and concentration of the overlap of the models. Using this background material, we establish Theorems 1-4 in Section 4. In Section 5, we present the proof of Theorem 5, which we prove using our results on the detection problem as well as the so-called Nishimori identity.

The rest of the sections are devoted to establishing the main results in Section 3. In Sections 6 and 9, we prove the asserted structures of the high-temperature regimes. These proofs are the most crucial components in this paper. Sections 7 and 8 establish the high-temperature behavior of the overlap and the free energy when $k=1$. Finally, Section 10 extends the theory established in Sections 7 and 8 to the case where $k\ge 2$. Since the arguments are similar to those of the case where $k=1$, we only sketch them while emphasizing the difference between the two cases.

## 3 Pure p-spin Models

In this section, we introduce the pure -spin mean field spin glass models with scalar-valued and vector-valued spin configurations and formulate some crucial results regarding their high-temperature behavior. Their proofs are deferred to later sections.

### 3.1 Scalar-valued Model

Recall the random tensor from Section 2.1 and the probability space from Section 2.2. For any , the Hamiltonian of the pure -spin model is defined as

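$$X_N(\sigma)=\frac{1}{N^{(p-1)/2}}\langle Y,\sigma^{\otimes p}\rangle=\frac{1}{N^{(p-1)/2}}\sum_{1\le i_1,\ldots,i_p\le N}Y_{i_1,\ldots,i_p}\,\sigma_{i_1}\cdots\sigma_{i_p},$$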

where the ’s are i.i.d. standard Gaussian random variables. Note that by the symmetry of , we also have the identity For any two spin configurations and , the covariance of can be computed as

$$\mathbb{E}\bigl(X_N(\sigma^1)X_N(\sigma^2)\bigr)=N\bigl(R(\sigma^1,\sigma^2)\bigr)^p,$$

where is the overlap between and defined by

$$R(\sigma^1,\sigma^2)=\frac{1}{N}\sum_{i=1}^{N}\sigma^1_i\sigma^2_i.$$
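
The definitions above can be sketched numerically. The following is a minimal illustration, not part of the paper: it assumes the tensor entries are drawn i.i.d. standard Gaussian (ignoring the symmetrization from Section 2.1) and uses $\pm 1$ spins as an example prior. The function names `overlap` and `hamiltonian` are hypothetical. The demo compares a Monte Carlo estimate of the covariance with $N\,R(\sigma^1,\sigma^2)^p$.

```python
import numpy as np

def overlap(sigma1, sigma2):
    """R(sigma1, sigma2) = (1/N) * sum_i sigma1_i * sigma2_i."""
    sigma1 = np.asarray(sigma1, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    return float(sigma1 @ sigma2) / sigma1.size

def hamiltonian(Y, sigma):
    """X_N(sigma) = N^{-(p-1)/2} * <Y, sigma^{(x)p}> for an order-p tensor Y."""
    sigma = np.asarray(sigma, dtype=float)
    N, p = sigma.size, Y.ndim
    t = Y
    # Contract one tensor index at a time against sigma.
    for _ in range(p):
        t = np.tensordot(t, sigma, axes=([0], [0]))
    return float(t) / N ** ((p - 1) / 2)

if __name__ == "__main__":
    # Assumption: i.i.d. N(0, 1) entries and Rademacher spins, small N.
    rng = np.random.default_rng(0)
    N, p, trials = 6, 3, 5000
    s1 = rng.choice([-1.0, 1.0], size=N)
    s2 = rng.choice([-1.0, 1.0], size=N)
    products = []
    for _ in range(trials):
        Y = rng.standard_normal((N,) * p)
        products.append(hamiltonian(Y, s1) * hamiltonian(Y, s2))
    print("empirical covariance:", np.mean(products))
    print("N * R^p             :", N * overlap(s1, s2) ** p)
```

The two printed values should agree up to Monte Carlo error, which shrinks as `trials` grows.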

Define the normalized Hamiltonian by

$$H_{N,\beta}(\sigma)=\beta X_N(\sigma)-\frac{\beta^2N}{2}R(\sigma,\sigma)^p.$$

By normalization, we mean that

$$\mathbb{E}\,e^{H_{N,\beta}(\sigma)}=1.$$
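
For completeness, here is a short verification of this identity, using only the covariance formula above. For fixed $\sigma$, the variable $X_N(\sigma)$ is a centered Gaussian with variance $N R(\sigma,\sigma)^p$, and a centered Gaussian $g$ satisfies $\mathbb{E}\,e^{g}=e^{\operatorname{Var}(g)/2}$. Hence

$$\mathbb{E}\,e^{H_{N,\beta}(\sigma)}=\exp\Bigl(\frac{\beta^2N}{2}R(\sigma,\sigma)^p\Bigr)\exp\Bigl(-\frac{\beta^2N}{2}R(\sigma,\sigma)^p\Bigr)=1.$$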

Associated to this Hamiltonian, define the free energy and Gibbs measure respectively by

$$F_N(\beta)=\frac{1}{N}\log\int e^{H_{N,\beta}(\sigma)}\,\mu^{\otimes N}(d\sigma) \tag{6}$$

and

$$G_{N,\beta}(d\sigma)=\frac{e^{H_{N,\beta}(\sigma)}\,\mu^{\otimes N}(d\sigma)}{Z_{N,\beta}},$$

where is the normalizing constant so that is a probability measure on It can be shown (see Proposition 3 below) that for all , the limiting free energy exists and is equal to a nonrandom quantity. Denote this limit by From the normalization of and Jensen’s inequality, we readily see that for all Define the high-temperature regime as

$$\mathcal{R}=\{\beta>0:F(\beta)=0\},$$

the low-temperature regime as and the critical threshold as

$$\beta_c=\sup\mathcal{R}. \tag{7}$$
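
To make the finite-$N$ free energy concrete, the sketch below evaluates $F_N(\beta)$ from (6) by brute-force enumeration. It is an illustration under stated assumptions, not the paper's method: $\mu$ is taken to be uniform on $\{-1,+1\}$, the tensor has i.i.d. standard Gaussian entries, and $N$ is tiny, so it says nothing rigorous about the limit $F(\beta)$ or about $\beta_c$. The function name `free_energy_rademacher` is hypothetical.

```python
import itertools
import numpy as np

def free_energy_rademacher(Y, beta):
    """F_N(beta) for mu uniform on {-1, +1}, summing over all 2^N spin vectors."""
    N, p = Y.shape[0], Y.ndim
    log_weights = []
    for spins in itertools.product([-1.0, 1.0], repeat=N):
        sigma = np.array(spins)
        t = Y
        for _ in range(p):
            t = np.tensordot(t, sigma, axes=([0], [0]))
        x = float(t) / N ** ((p - 1) / 2)
        # For +-1 spins R(sigma, sigma) = 1, so the correction is beta^2 * N / 2.
        log_weights.append(beta * x - beta ** 2 * N / 2)
    log_weights = np.array(log_weights)
    m = log_weights.max()
    # Stable log of the average of exp(H) under the uniform product measure.
    log_mean = m + np.log(np.mean(np.exp(log_weights - m)))
    return log_mean / N

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    N, p = 8, 3
    Y = rng.standard_normal((N,) * p)
    for beta in (0.5, 1.0, 2.0):
        print(beta, free_energy_rademacher(Y, beta))
```

The cost grows like $2^N N^p$, so this is only usable for very small systems.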

In spin glasses, the parameter is understood as the (inverse) temperature parameter, while in the detection problem of (5), it is interpreted as the signal strength or SNR. These equivalent meanings of are justified below in Lemma 2 via an integral representation for the total variation distance between and .

The following proposition shows that the high-temperature regime is an interval and its right-end boundary is . It also characterizes this regime in terms of the constant and the auxiliary function defined in (2) and (3), respectively.

###### Proposition 1.

If , then Furthermore, for if and only if

$$\sup_{v\in(0,v_*]}\Gamma_\beta(v)\le 0.$$

This proposition implies that is the critical temperature distinguishing the high and low-temperature regimes of the pure -spin model. It also implies the formula for provided in Theorem 2. That is, is the largest such that

$$\sup_{v\in(0,v_*]}\Gamma_\beta(v)=0.$$

Indeed, assume on the contrary that , then since is continuous in , it is possible to find such that . Application of Proposition 1 then yields that , which contradicts the maximality of and thus verifies the above formula.

Next, denote by the Gibbs expectation with respect to the i.i.d. samples from the Gibbs measure We show that in the interior of the high-temperature regime, the overlap between two i.i.d. samples and is concentrated around zero.

###### Theorem 6.

For , , and , there exists a constant , depending only on , , and , such that

 (8)

Furthermore, we control the fluctuation of the free energy as follows.

###### Proposition 2.

For and , there exists a constant , depending only on and , such that

$$\mathbb{P}\bigl(|F_N(\beta)|\ge l\bigr)\le\frac{K}{l^2N^{p/2+1}},\quad\forall\, l>0,\ N\ge 1.$$

In the case that is a uniform probability measure on , the behavior of the overlap and the fluctuation of the free energy at high temperature is well understood and we briefly summarize it here. The case corresponds to the famous Sherrington-Kirkpatrick (SK) model. In this case, Aizenman et al. [1] proved that converges to a Gaussian random variable when and Talagrand [51, Chapters 11 and 13] obtained the moment control of Theorem 6. For Bardina et al. [11] established (8) for . For even Bovier et al. [17] showed that has a Gaussian fluctuation up to some temperature strictly less than More recently, Chen [18] obtained the same statements as Theorem 6 and Proposition 2 for this choice of Our main contribution here is to establish concentration of the overlap and the fluctuation of the free energy up to the critical temperature for any spin configurations sampled from a probability measure on a bounded subset of the real line.

###### Remark 1.

From Proposition 2, it is tempting to conjecture that follows a Gaussian law in the weak limit throughout the entire high-temperature regime for all Based on Theorem 6 and Proposition 2, we anticipate that this can be proved by adapting a previous argument for the SK model [51, Section 11.4].

###### Remark 2.

Although we only consider the pure -spin model here, the mixed -spin model is studied more often in the spin glass community. (The normalized Hamiltonian for the mixed -spin model is , where is a Gaussian process on with zero mean and covariance structure for some with and .) In this general setting, we can define its free energy, Gibbs measure, and high-temperature regime in a similar fashion as above and check that Proposition 1 holds for any mixture. In addition, the statements of Theorem 6 and Proposition 2 are also valid if the following assumption holds: There exists some such that and for all .

### 3.2 Vector-valued Model

Next we consider the pure -spin model with -dimensional vector-valued spin configurations, where Recall the probability spaces from Section 2.3. Set the product space and measure by

$$\bar{\Lambda}=\Lambda_1\times\cdots\times\Lambda_k,\quad 1\le i\le N,\qquad \bar{\mu}=\mu_1\otimes\cdots\otimes\mu_k.$$

For , where denote

$$\bar{\sigma}_i=(\sigma_i(1),\ldots,\sigma_i(k))^{T}\in\bar{\Lambda}$$

and

$$\bar{\sigma}=(\bar{\sigma}_1,\ldots,\bar{\sigma}_N)\in\bar{\Lambda}^N.$$

In other words, the spin configuration is a matrix: the rows are and the columns are . Given with , the pure -spin Hamiltonian with vector-valued spin configurations is defined for any as

$$H_{N,\bar{\beta}}(\bar{\sigma})=\sum_{r=1}^{k}\beta_rX_N(\sigma(r))-\sum_{r,r'=1}^{k}\frac{\beta_r\beta_{r'}N}{2}R(\sigma(r),\sigma(r'))^p.$$
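
As an illustration of this definition, the following sketch evaluates the vector-valued Hamiltonian for a spin configuration stored as an $N\times k$ array whose column $r$ is $\sigma(r)$. It assumes a tensor with i.i.d. standard Gaussian entries in any real use; the helper names `x_n` and `vector_hamiltonian` are hypothetical and not from the paper.

```python
import numpy as np

def x_n(Y, sigma):
    """X_N(sigma) = N^{-(p-1)/2} * <Y, sigma^{(x)p}>."""
    N, p = sigma.size, Y.ndim
    t = Y
    for _ in range(p):
        t = np.tensordot(t, sigma, axes=([0], [0]))
    return float(t) / N ** ((p - 1) / 2)

def vector_hamiltonian(Y, sigma_bar, betas):
    """H_{N, beta_bar}(sigma_bar) for sigma_bar of shape (N, k) and k inverse temperatures."""
    sigma_bar = np.asarray(sigma_bar, dtype=float)
    betas = np.asarray(betas, dtype=float)
    N, k = sigma_bar.shape
    p = Y.ndim
    # Linear part: sum_r beta_r * X_N(sigma(r)).
    linear = sum(betas[r] * x_n(Y, sigma_bar[:, r]) for r in range(k))
    # k x k matrix with entries R(sigma(r), sigma(r')).
    overlaps = sigma_bar.T @ sigma_bar / N
    # Correction: sum_{r, r'} beta_r * beta_r' * N / 2 * R(sigma(r), sigma(r'))^p.
    correction = 0.5 * N * float(betas @ (overlaps ** p) @ betas)
    return linear - correction
```

With $k=1$ the function reduces to the scalar normalized Hamiltonian $H_{N,\beta}(\sigma)$ of Section 3.1, which gives a quick sanity check.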

Similar to the scalar-valued model, the free energy and the Gibbs measure are defined as

$$F_N(\bar{\beta})=\frac{1}{N}\log\int e^{H_{N,\bar{\beta}}(\bar{\sigma})}\,\bar{\mu}^{\otimes N}(d\bar{\sigma}) \tag{9}$$

and

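$$G_{N,\bar{\beta}}(d\bar{\sigma})=\frac{e^{H_{N,\bar{\beta}}(\bar{\sigma})}\,\bar{\mu}^{\otimes N}(d\bar{\sigma})}{Z_{N,\bar{\beta}}},$$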

where is the normalizing constant. Define

$$F(\bar{\beta})=\limsup_{N\to\infty}F_N(\bar{\beta}).$$

There is a technical subtlety here that is not present in the model of the previous subsection. In the case of even Panchenko [42] proved that if one drops the overlap term in , then the limiting free energy with overlap constraint exists. Consequently, one can show that (see the proof of Proposition 3 below). When is odd, this limit is preserved if , as explained in the previous subsection, but whether it is still true for remains an open question.

An application of Jensen’s inequality ensures that . The high-temperature regime is defined as

$$\bar{\mathcal{R}}=\{\bar{\beta}=(\beta_1,\ldots,\beta_k)\mid\beta_r>0\ \text{for all}\ 1\le r\le k\ \text{and}\ F(\bar{\beta})=0\}.$$
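
The Jensen step can be spelled out as follows; this is a sketch of the standard argument and gives the bound in expectation at finite $N$. By the covariance formula of Section 3.1, the Gaussian variable $\sum_{r}\beta_rX_N(\sigma(r))$ has variance

$$\operatorname{Var}\Bigl(\sum_{r=1}^{k}\beta_rX_N(\sigma(r))\Bigr)=\sum_{r,r'=1}^{k}\beta_r\beta_{r'}N\,R(\sigma(r),\sigma(r'))^p,$$

so the second term of the Hamiltonian is exactly half this variance and $\mathbb{E}\,e^{H_{N,\bar{\beta}}(\bar{\sigma})}=1$ for every $\bar{\sigma}$. Jensen's inequality and Fubini's theorem then give

$$\mathbb{E}F_N(\bar{\beta})\le\frac{1}{N}\log\int\mathbb{E}\,e^{H_{N,\bar{\beta}}(\bar{\sigma})}\,\bar{\mu}^{\otimes N}(d\bar{\sigma})=0.$$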

Again, while is understood as the vector of SNRs in the detection problem, we read the entries of this vector as the temperature parameters in the setting of spin glass models. Let be the critical temperature obtained from Section 3.1 by taking The following theorem states that the high-temperature regime of the vector-valued -spin model is equal to the product of the high-temperature regimes of the marginal systems.

For