1 Introduction
In estimation problems, the goal is to recover a structured object from an observed input which partially obfuscates it. Formally, an estimation problem is specified by a family of distributions over parametrized by . The input consists of a sample drawn from for some , and the goal is to recover the value of the parameter . We refer to as the hidden variable or the parameter, and to the sample as the measurement or the instance.
Often, it is information-theoretically impossible to recover hidden variables in that their value is not completely determined by the measurements. Further, even when recovery is information-theoretically possible, in many high-dimensional settings it is computationally intractable to recover .
approximately by minimizing the expected loss for an appropriate loss function. For example, if
denotes the estimate for given the measurement , a natural goal would be to minimize the expected mean-square loss given by .

In many cases, we can formulate such a minimization problem as a feasibility problem for a system of polynomial equations. By classical NP-completeness results, general polynomial systems in many variables are computationally intractable in the worst case. In our context, an estimation problem gives rise to a distribution over polynomial systems that encode it, and we wish to study a typical system drawn from this distribution. If the underlying distributions are sufficiently well-behaved, polynomial systems yield an avenue for designing algorithms for high-dimensional estimation problems.
In this survey, our tool for studying such polynomial systems will be sum-of-squares (SoS) proofs. Sum-of-squares proofs yield a complete proof system for reasoning about polynomial systems [Kri64, Ste74]. More importantly, SoS proofs are constructive: the problem of finding a sum-of-squares proof can be formulated as a semidefinite program, and thus algorithms for convex optimization can be used to find a sum-of-squares proof whenever one exists. Low-degree SoS proofs can be found efficiently; the computational complexity of the algorithm grows exponentially with the degree of the polynomials involved in the proof.
The study of low-degree SoS proofs in the context of estimation problems suggests a rich family of questions. For natural estimation problems, if a polynomial system drawn from the corresponding distribution is feasible, can one harness sum-of-squares proofs towards solving the polynomial system? (Surprisingly, the answer is often yes!) If a system from this distribution is typically infeasible, what is the smallest degree of a sum-of-squares refutation? Are there structural characterizations of the degree of SoS refutations in terms of the properties of the distribution? Is there a connection between the existence of low-degree SoS proofs and the spectra of random matrices associated with the distribution (yielding efficient spectral algorithms)? Over the past few years, significant strides have been made on all these fronts, exposing the contours of a rich theory that remains largely hidden. This survey is devoted to expounding some of the major developments in this context.
1.1 Estimation problems
We will start by describing a few estimation problems that will be recurring examples in our survey.
Example 1.1 (clique).
Fix a positive integer . In the clique problem, a clique of size is planted within a random graph drawn from the Erdős–Rényi distribution, denoted . The goal is to recover the clique.
Formally, the structured family is parametrized by subsets . For a subset , the distribution over measurements is specified by the following sampling procedure:

Sample a graph from the Erdős–Rényi distribution and set , where denotes the clique on the vertices in .
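A minimal Python sketch of this sampling procedure (the function name and the concrete parameters, n = 200 and clique size 20, are our own illustrative choices, not from the survey):

```python
import numpy as np

def sample_planted_clique(n, omega, rng):
    """Sample G(n, 1/2) and plant a clique on a random omega-subset S.

    Illustrative sketch; the survey leaves n and omega symbolic.
    Returns the 0/1 adjacency matrix and the planted vertex set S.
    """
    A = np.triu(rng.integers(0, 2, size=(n, n)), 1)
    A = A + A.T                       # symmetric adjacency, no self-loops
    S = rng.choice(n, size=omega, replace=False)
    A[np.ix_(S, S)] = 1               # connect every pair inside S
    np.fill_diagonal(A, 0)
    return A, S

rng = np.random.default_rng(0)
A, S = sample_planted_clique(200, 20, rng)
print(A.shape, len(S))
```

The recovery task is then to locate S given only A.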
An application of the second moment method
[GM75] shows that for all , the clique can be exactly recovered with high probability given the graph . However, for any , there is no known polynomial-time algorithm for the problem, with the best algorithm being a brute-force search running in time . Improving upon this runtime is an open problem dating back to Karp in 1976 [Kar76], but save for the spectral algorithm of Alon et al. for [AKS98a], the only progress has been in proving lower bounds against broad classes of algorithms (e.g. [Jer92, FK03, FGR17, BHK16]).

We will now see how to encode the problem as a polynomial system. For pairs , let denote the natural encoding of the graph , namely, for all . Set . We will refer to the variables as instance variables, as they specify the input to the problem. The variables will be referred to as the hidden variables. We encode each constraint as a polynomial equality or inequality:
are Boolean  
if then are not both in clique  
at least vertices in clique 
Note that when we are solving the estimation problem, the instance variables are given, and the hidden variables are the unknowns in the polynomial system. It is easy to check that the only feasible solutions
for this system of polynomial equations are Boolean vectors
which are supported on cliques of size at least in .

Refutation and distinguishing.
For every estimation problem that we will encounter in this survey, we can associate two related computational problems termed refutation and distinguishing. In estimation problems, we typically think of instances as having structure: we sample from a structured distribution , and we wish to recover the hidden variables that give structure to . But there may also be instances which do not have structure. The goal of refutation is to certify that there is no hidden structure, when there is none.
A null distribution is a probability distribution over instances for which there is no hidden structure . For example, in the clique problem, the corresponding null distribution is the Erdős–Rényi random graph (without a planted clique). With high probability, a graph has no clique with significantly more than vertices. Therefore, for a fixed , given a graph , the goal of a refutation algorithm is to certify that has no clique of size . Equivalently, the goal of a refutation algorithm is to certify the infeasibility of the associated polynomial system.
The most rudimentary computational task associated with estimation and refutation is that of distinguishing. The setup of the distinguishing problem is as follows. Fix a prior distribution on the hidden variables , which in turn induces a distribution on , obtained by first sampling and then sampling . The input consists of a sample which is with equal probability drawn from the structured distribution or the null distribution . The computational task is to identify which distribution the sample is drawn from, with a probability of success for some constant . For example, the structured distribution for clique is obtained by setting the prior distribution of to be uniform on subsets of of size . In the distinguishing problem, the input is a graph drawn from either or the null distribution , and the algorithm is required to identify the distribution. For every problem included in this survey, the distinguishing task is formally no harder than estimation or refutation, i.e., the existence of algorithms for estimation or refutation immediately implies a distinguishing algorithm.
Example 1.2.
(tensor PCA) The family of structured distributions is parametrized by unit vectors . A sample from consists of a
where is a symmetric tensor whose entries are i.i.d. Gaussian random variables sampled from
. The goal is to recover a vector that is as close as possible to .

A canonical strategy to recover given is to maximize the degree polynomial associated with the symmetric tensor . Specifically, if we set
then one can show that with high probability over . If then . Furthermore, when it can be shown that is close to the unique maximizer of the function . So the problem of recovering can be encoded as the following polynomial system:
is in the unit sphere  
has large value for 
In the distinguishing and refutation versions of this problem, we will take the null distribution to be the distribution over tensors with independent Gaussian entries sampled from (equivalent to the distribution of the noise from ). For a tensor , the maximum of over the unit ball is referred to as the injective tensor norm of , and is denoted by . If then with high probability over the choice of [ABAČ]. Thus when , the refutation version of the tensor PCA problem reduces to certifying an upper bound on . If we could compute exactly, then we could certify that for as large as .
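A quick numerical illustration of the signal-versus-noise behavior of the degree-3 polynomial associated with a spiked tensor (the dimension and signal strength are our own illustrative choices, and we skip symmetrizing the noise tensor for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
lam = 5 * np.sqrt(n)                  # signal strength (illustrative choice)

v = rng.standard_normal(n)
v /= np.linalg.norm(v)                # hidden unit vector
G = rng.standard_normal((n, n, n))    # Gaussian noise (not symmetrized here)
T = lam * np.einsum('i,j,k->ijk', v, v, v) + G

def f(T, x):
    # the degree-3 polynomial <T, x (x) x (x) x>
    return np.einsum('ijk,i,j,k->', T, x, x, x)

u = rng.standard_normal(n)
u /= np.linalg.norm(u)                # an unrelated unit vector
# at the planted vector the signal term contributes ~ lam, while the
# noise contributes O(1); at an unrelated vector both terms are small
print(f(T, v), f(T, u))
```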
The injective tensor norm is known to be computationally intractable in the worst case [Gur03, Gha10, BBH12]. Understanding the function for random
is a deep topic in probability theory and statistical physics (e.g.
[ABAČ]). As an estimation problem, tensor PCA was first considered by [MR14], and has inspired multiple follow-up works on spectral and SoS algorithms (e.g. [HSS15, HSSS16, RRS17, BGL17]).

Example 1.3.
(Matrix & Tensor Completion) In matrix completion, the hidden parameter is a rank matrix . For a parameter , the measurement consists of a partial matrix revealing a subset of entries of , namely for a subset with . The probability distribution over measurements is obtained by picking the set to be a uniformly random subset of entries.
To formulate a polynomial system for recovering a rank matrix consistent with the measurement , we will use a matrix of variables , and write the following system of constraints on it:
is consistent with measurement 
Tensor completion is the analogous problem in which is a higher-order tensor, namely for some fixed . The corresponding polynomial system is again over a matrix of variables with columns, with the following system of constraints,
is consistent with measurement 
1.2 Sum-of-squares proofs
The sum-of-squares (SoS) proof system is a restricted class of proofs for reasoning about polynomial systems. Fix a set of polynomial inequalities in variables . We will refer to these inequalities as the axioms. Starting with the axioms , a sum-of-squares proof of is given by an identity of the form,
where are real polynomials. Any identity of the above form manifestly certifies that the polynomial , whenever each for real . The degree of the sum-of-squares proof is the maximum degree of all the summands, i.e., .
Sum-of-squares proofs extend naturally to polynomial systems that involve a set of equalities along with a set of inequalities . We can extend the definition syntactically by replacing each equality by the pair of inequalities and .
We will use the notation to denote the assertion that there exists a degree sum-of-squares proof of from the set of axioms . The superscript in the notation indicates that the sum-of-squares proof is an identity of polynomials in which is the formal variable. We will drop the subscript or superscript when it is clear from context, and just write . Sum-of-squares proofs can also be used to certify the infeasibility of, or refute, a polynomial system. In particular, a degree sum-of-squares refutation of a polynomial system is an identity of the form,
(1.1) 
where is at most .
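As a toy illustration of a refutation of the form (1.1), consider the infeasible axioms x − 2 ≥ 0 and 1 − x ≥ 0. The identity −1 = 1·(x − 2) + 1·(1 − x) is a degree-1 sum-of-squares refutation (the multipliers are the constant square 1²), which we can verify numerically:

```python
import numpy as np

# Infeasible axioms: g1(x) = x - 2 >= 0 and g2(x) = 1 - x >= 0.
g1 = lambda x: x - 2.0
g2 = lambda x: 1.0 - x

# Candidate refutation: -1 = 1*g1(x) + 1*g2(x) as a polynomial identity,
# with the constant square 1^2 as both SoS multipliers.  Any x satisfying
# the axioms would make the right-hand side nonnegative -- a contradiction.
xs = np.linspace(-5.0, 5.0, 11)
residual = np.max(np.abs(1.0 * g1(xs) + 1.0 * g2(xs) + 1.0))
print(residual)   # 0: the identity holds, so the system is infeasible
```

Checking the identity at more sample points than its degree certifies it as a polynomial identity, not just a pointwise coincidence.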
The sum-of-squares proof system has been an object of study starting with the work of Hilbert and Minkowski more than a century ago (see [Rez00] for a survey). With no restriction on degree, Stengle’s Positivstellensatz implies that sum-of-squares proofs form a complete proof system, i.e., if the axioms imply , then there is an SoS proof of this fact.
The algorithmic implications of the sum-of-squares proof system were first realized in the works of Parrilo [Par00] and Lasserre [Las01], who independently arrived at families of algorithms for polynomial optimization using semidefinite programming (SDP). Specifically, these works observed that semidefinite programming can be used to find a degree SoS proof in time , if one exists. This family of algorithms (called a hierarchy, as we have an algorithm for each even integer degree) is referred to as the sum-of-squares SDP hierarchy. We say that the SoS algorithm is low-degree if does not grow with .
The SoS hierarchy has since emerged as a powerful tool for algorithm design. On the one hand, the first few levels of the SoS hierarchy systematically capture a vast majority of the algorithms in combinatorial optimization and approximation algorithms developed over several decades. Furthermore, the low-degree SoS SDP hierarchy holds the promise of yielding improved approximations to NP-hard combinatorial optimization problems, approximations that would beat the longstanding and universal barrier posed by the notorious unique games conjecture
[Tre12, BS14].

More recently, the low-degree SoS SDP hierarchy has proved to be a very useful tool in designing algorithms for high-dimensional estimation problems, wherein the inputs are drawn from a natural probability distribution. For this survey, we organize the recent work on this topic into three lines of work.

When the polynomial system for an estimation problem is feasible, can sum-of-squares proofs be harnessed to retrieve the solution? The answer is yes for many estimation problems, including tensor decomposition, matrix and tensor completion, and clustering problems. Furthermore, there is a simple and unifying principle that underlies all of these applications. Specifically, the principle asserts that if there is a low-degree SoS proof that all solutions to the system are close to the hidden variable , then a low-degree SoS SDP can be used to actually retrieve . We will discuss this broad principle and several of its implications in Section 2.

When the polynomial system is infeasible, what is the smallest degree at which it admits a sum-of-squares proof of infeasibility? The degree of the sum-of-squares refutation is critical for the runtime of the SoS SDP-based algorithm. Recent work by Barak et al. [BHK16] introduces a technique referred to as “pseudo-calibration” for proving lower bounds on the degree of SoS refutations, developed in the context of the work on the clique problem. Section 3
is devoted to the heuristic technique of pseudo-calibration, and the mystery surrounding its effectiveness.

Can the existence of sum-of-squares refutations of a given degree be characterized in terms of (spectral) properties of the underlying distribution? In Section 4, we will discuss a result that shows a connection between the existence of low-degree sum-of-squares refutations and the spectra of certain low-degree matrices associated with the distribution. This connection implies that under fairly mild conditions, SoS SDP-based algorithms are no more powerful than a much simpler and more lightweight class of algorithms referred to as spectral algorithms. Roughly speaking, a spectral algorithm proceeds by constructing a matrix out of the input instance, and then using the eigenvalues and eigenvectors of that matrix to recover the desired outcome.
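To make the notion of a spectral algorithm concrete, here is a minimal Python sketch for distinguishing a planted clique from a null graph (the parameters n = 400 and clique size 60 are our own illustrative choices, and the statistic, the top eigenvalue of the centered adjacency matrix, is a toy test, not the algorithm of [AKS98a]):

```python
import numpy as np

def top_eigenvalue_centered(A):
    """lambda_max of A - (1/2)J, a simple spectral test statistic."""
    n = A.shape[0]
    return np.linalg.eigvalsh(A - 0.5 * np.ones((n, n)))[-1]

rng = np.random.default_rng(2)
n, omega = 400, 60

# null instance: G(n, 1/2)
B = np.triu(rng.integers(0, 2, size=(n, n)), 1)
B = B + B.T

# structured instance: the same graph with a planted omega-clique
A = B.copy()
S = rng.choice(n, size=omega, replace=False)
A[np.ix_(S, S)] = 1
np.fill_diagonal(A, 0)

# under the null the statistic concentrates near sqrt(n) (about 20 here),
# while the planted clique pushes it up to roughly omega/2 (about 30)
lam_null = top_eigenvalue_centered(B)
lam_planted = top_eigenvalue_centered(A)
print(lam_null, lam_planted)
```

Thresholding the statistic between the two typical values gives the distinguisher.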
Notation.
For a positive integer , we use to denote the set . We sometimes use to denote the set of all subsets of of size , and to denote the set of all multi-subsets of cardinality at most .
If and is a multiset, then we will use the shorthand to denote the monomial . We will also use to denote the vector containing all monomials in of degree at most (including the constant monomial ), where . Let denote the space of polynomials of degree at most in variables .
For a function , we will say if for some universal constant . We say that if .
If is a distribution over the probability space , then we use the notation for sampled according to . For an event , we will use as the indicator that occurs. We use to denote the Erdős–Rényi distribution with parameter , i.e., the distribution over graphs where each edge is included independently with probability .
If is an matrix, we use to denote ’s largest eigenvalue. When , then denotes ’s trace. If is an matrix as well, then we use to denote the matrix inner product. We use to denote the Frobenius norm of , . For a subset , we will use to denote the indicator vector of in . We will also use to denote the all-ones vector.
For two matrices we use to denote both the Kronecker product of and , and the order tensor given by taking and reshaping it with modes for the rows and columns of and of . We also use to denote the th Kronecker power of . For an order tensor and for a permutation of , we denote by the matrix reshaping given by ordering the modes of so that index the rows and index the columns.
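The reshaping conventions above can be checked concretely in numpy; the shapes and index arithmetic below are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((4, 5))

K = np.kron(A, B)                  # Kronecker product, shape (8, 15)
# the same data viewed as an order-4 tensor whose modes index
# (rows of A, rows of B, columns of A, columns of B):
T = K.reshape(2, 4, 3, 5)

# a matrix reshaping: group modes (0, 2) as rows and (1, 3) as columns
M = T.transpose(0, 2, 1, 3).reshape(2 * 3, 4 * 5)
print(K.shape, T.shape, M.shape)
```

Here T[i, k, j, l] equals A[i, j]·B[k, l], and the reshaping M is the outer product of the row-major vectorizations of A and B.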
Pseudo-expectations.
For a polynomial system in variables consisting of inequalities , we can write an SDP of size which finds a degree sum-of-squares refutation, if one exists (see [Rot13] for more discussion).
If there is no degree refutation, the dual semidefinite program computes in time a linear functional over degree polynomials which we term a pseudo-expectation. Formally, a degree pseudo-expectation is a linear functional over polynomials of degree at most with the properties that , for all and polynomials such that , and whenever .
Claim 1.4.
If there exists a degree pseudo-expectation for the polynomial system , then does not admit a degree refutation.
Proof.
Suppose admits a degree refutation. Applying the pseudo-expectation operator to the left-hand side of Eq. 1.1, we have . Applying to the right-hand side of Eq. 1.1, the first summand must be nonnegative by the definition of , since it is a sum of squares, and the second summand is nonnegative, since we assumed that satisfies the constraints of . This yields a contradiction. ∎
The properties above imply that low-degree sum-of-squares proofs are sound for pseudo-expectations: if there is a low-degree proof from the axioms that , then every degree pseudo-expectation for the polynomial system defined by satisfies as well. This implies that satisfies several useful inequalities; for example, the Cauchy–Schwarz inequality.
Claim 1.5.
If is a degree pseudo-expectation and if are polynomials of degree at most , then .
Proof.
We have the following polynomial equality of degree at most :
Applying to both sides, using that , we have our conclusion. ∎
Other versions of the Cauchy–Schwarz inequality can be shown to hold for pseudo-expectations as well; see e.g. [BBH12] for details.
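A degree-2 pseudo-expectation in a single variable can be represented concretely by a positive semidefinite moment matrix over the monomial basis (1, x); the Cauchy–Schwarz property of Claim 1.5 then follows from PSD-ness alone, as this sketch (with an arbitrarily chosen moment matrix) illustrates:

```python
import numpy as np

# A degree-2 pseudo-expectation in one variable x, given by its moment
# matrix over the monomial basis (1, x).  PSD-ness of M encodes the
# constraint Etilde[s(x)^2] >= 0 for every degree-1 polynomial s.
M = np.array([[1.0, 0.3],
              [0.3, 2.0]])   # Etilde[1] = 1, Etilde[x] = 0.3, Etilde[x^2] = 2

def E(p, q):
    """Etilde[p(x) q(x)] for degree-1 p, q given by coefficient vectors."""
    return p @ M @ q

assert np.all(np.linalg.eigvalsh(M) >= 0)   # M is PSD

rng = np.random.default_rng(4)
for _ in range(100):
    p, q = rng.standard_normal(2), rng.standard_normal(2)
    # Cauchy-Schwarz for pseudo-expectations (Claim 1.5)
    assert E(p, q) ** 2 <= E(p, p) * E(q, q) + 1e-9
print("Cauchy-Schwarz holds for this pseudo-expectation")
```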
2 Algorithms for high-dimensional estimation
In this section, we prove an algorithmic meta-theorem for high-dimensional estimation that provides a unified perspective on the best known algorithms for a wide range of estimation problems. This unifying perspective allows us to obtain algorithms with significantly better guarantees than those known to be achievable with other methods. We illustrate the power of this meta-theorem by applying it to matrix and tensor completion, tensor decomposition, and clustering.
2.1 Algorithmic meta-theorem for estimation
We consider the following general class of estimation problems, which will turn out to capture a plethora of interesting problems in a useful way. In this class, an estimation problem is specified by a set of pairs , where is called the parameter and is called the measurement. (In contrast to the discussion of estimation problems in Section 1, for every parameter we have a set of possible measurements, as opposed to a distribution over measurements. We can model distributions over measurements in this way by considering a set of “typical measurements”. The viewpoint in terms of sets of possible measurements will correspond more closely to the kind of algorithms we consider.) Nature chooses a pair , we are given the measurement , and our goal is to (approximately) recover the parameter .
For example, we can encode compressed sensing with measurement matrix and sparsity bound by the following set of pairs,
Similarly, we can encode matrix completion with observed entries and rank bound by the set of pairs,
For both examples, the measurement was a simple (linear) function of the parameter. This is not always the case; consider for example the following clustering problem. There are two distinct centers , and we observe samples such that each sample is closer to either or . Then we can encode the problem of finding and as follows,
Identifiability.
In general, an estimation problem may be ill-posed in the sense that, even ignoring computational efficiency, it may not be possible to (approximately) recover the parameter for a measurement , because we have for two far-apart parameters and .
For a pair , we say that identifies exactly if for all . Similarly, we say that identifies up to error if for all . We say that is identifiable (up to error ) if every satisfies that identifies (up to error ).
For example, for compressed sensing , it is not difficult to see that every sparse vector is identifiable if every subset of at most columns of is linearly independent. For tensor decomposition, a sufficient condition under which the observation is enough to identify (up to a permutation of its columns) is that the columns of are linearly independent.
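For compressed sensing, the identifiability condition on the columns can be checked directly for small instances. The following sketch (with illustrative dimensions m = 6, n = 8, k = 2 of our own choosing) verifies the condition and its consequence that distinct sparse vectors yield distinct measurements:

```python
import itertools
import numpy as np

rng = np.random.default_rng(5)
m, n, k = 6, 8, 2
A = rng.standard_normal((m, n))   # measurement matrix

# identifiability condition: every 2k columns of A are linearly independent
ok = all(np.linalg.matrix_rank(A[:, list(S)]) == 2 * k
         for S in itertools.combinations(range(n), 2 * k))

# consequence: two distinct k-sparse vectors cannot collide, since
# A x1 == A x2 would force A (x1 - x2) == 0 with x1 - x2 supported
# on at most 2k coordinates.
x1 = np.zeros(n); x1[[0, 3]] = [1.0, -2.0]
x2 = np.zeros(n); x2[[1, 6]] = [0.5, 4.0]
print(ok, np.linalg.norm(A @ x1 - A @ x2))
```

A random Gaussian matrix satisfies the condition almost surely, which is what the check confirms here.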
From identifiability proofs to efficient algorithms.
By itself, identifiability typically only implies that there exists an inefficient algorithm to recover a vector close to the parameter from the observation (e.g. by brute-force search over the set of all ). But perhaps surprisingly, the notion of identifiability in a broader sense can also help us understand whether there exists an efficient algorithm for this task. Concretely, if the proof of identifiability is captured by the sum-of-squares proof system at low degree, then there exists an efficient algorithm to (approximately) recover from .
In order to formalize this phenomenon, let the set be described by polynomial equations
where is a vector-valued polynomial and are auxiliary variables. (We allow auxiliary variables here because they might make it easier to describe the set . The algorithms we consider depend on the algebraic description of we choose, and different descriptions can lead to different algorithmic guarantees. In general, it is not clear which description is best; typically, however, the more auxiliary variables the better.) In other words, is a projection of the variety given by the polynomials . The following theorem shows that there is an efficient algorithm to (approximately) recover given if there exists a low-degree proof of the fact that the equation implies that is (close to) .
Theorem 2.1 (Meta-theorem for efficient estimation).
Let be a vector-valued polynomial and let the triples satisfy . Suppose , where . Then, every degree pseudo-distribution consistent with the constraints satisfies
Furthermore, for every , there exists a polynomial-time algorithm (with running time ; to state running times in a simple way, we assume that the total bit-complexity of and of the vector-valued polynomial , in the monomial basis, is bounded by a fixed polynomial in ) that, given a vector-valued polynomial and a vector , outputs a vector with the following guarantee: if with a proof of bit-complexity at most , then .
Despite not being explicitly stated, the above theorem is the basis for many recent advances in algorithms for estimation problems through the sumofsquares method [BKS15, BKS14, HSS15, MSS16, BM16, PS17, KSS18, HL18].
Proof.
Let be a degree pseudo-distribution with . Since degree sum-of-squares proofs are sound for degree pseudo-distributions, we have . In particular, . By Cauchy–Schwarz for pseudo-distributions (Claim 1.5), every vector satisfies
By choosing , we obtain the desired conclusion about .
Given a measurement , the algorithm computes a degree pseudo-distribution that satisfies up to error and outputs . We are guaranteed that such a pseudo-distribution exists, e.g. the distribution that places all its probability mass on the vector . If the proof has bit-complexity , it follows that satisfies up to error . In particular, . By the same argument as before, it follows that . ∎
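The rounding step in the proof uses only that the squared distance of the mean is at most the mean squared distance (Jensen / Cauchy–Schwarz). The following toy sketch illustrates this with a genuine distribution over solutions, which is in particular a pseudo-distribution; the dimension and noise scale are arbitrary choices of ours:

```python
import numpy as np

rng = np.random.default_rng(6)
theta = rng.standard_normal(10)          # hidden parameter

# a genuine distribution over solutions x with small E||x - theta||^2;
# every genuine distribution is in particular a pseudo-distribution
xs = theta + 0.1 * rng.standard_normal((5000, 10))

mse = np.mean(np.sum((xs - theta) ** 2, axis=1))   # E ||x - theta||^2
xbar = xs.mean(axis=0)                             # the rounded estimate E[x]

# Jensen / Cauchy-Schwarz: ||E[x] - theta||^2 <= E ||x - theta||^2
print(np.sum((xbar - theta) ** 2), mse)
```

The mean of the (pseudo-)distribution is thus a good estimate whenever the average squared distance to the hidden parameter is small.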
2.2 Matrix and tensor completion
In matrix completion, we observe a few entries of a low-rank matrix and the goal is to fill in the missing entries. This problem has been studied extensively from both practical and theoretical perspectives. One of its practical applications is in recommender systems, which was the basis of the famous Netflix Prize competition. Here, we may observe a few movie ratings for each user, and the goal is to infer a user’s preferences for movies that the user hasn’t rated yet.
In terms of provable guarantees, the best known polynomial-time algorithm for matrix completion is based on a semidefinite programming relaxation. Let be a rank matrix such that its left and right singular vectors are incoherent, i.e., they satisfy and for all and . (Random unit vectors satisfy this notion of incoherence for . In this sense, incoherent vectors behave similarly to random vectors.) The algorithm observes the partial matrix that contains a random cardinality subset of the entries of . If , then with high probability over the choice of the algorithm recovers exactly [CR09, Gro11, Rec11, Che15]. This bound on is nearly optimal in that appears to be necessary, because an by rank matrix has degrees of freedom (the entries of its singular vectors).
In this section, we will show how the above algorithm is captured by sum-of-squares and, in particular, Theorem 2.1. We remark that this fact follows directly by inspecting the analysis of the original algorithm [CR09, Gro11, Rec11, Che15]. The advantage of sum-of-squares here is twofold: first, it provides a unified perspective on algorithms for matrix completion and other estimation problems. Second, the sum-of-squares approach for matrix completion extends in a natural way to tensor completion (in a way that the original approach for matrix completion does not).
Identifiability proof for matrix completion.
For the sake of clarity, we consider a simplified setup where the matrix is assumed to be a rank projector so that for
incoherent orthonormal vectors
. The following theorem shows that, with high probability over the choice of , the matrix is identified by the partial matrix . Furthermore, the proof of this fact is captured by sum-of-squares. Together with Theorem 2.1, the following theorem implies that there exists a polynomial-time algorithm to recover from .

Theorem 2.2 (implicit in [CR09, Gro11, Rec11, Che15]).
Let be an dimensional projector and orthonormal with incoherence . Let be a random symmetric subset of size . Consider the system of polynomial equations in the by matrix variable ,
Suppose . Then, with high probability over the choice of ,
Proof.
The analyses of the aforementioned algorithms for matrix completion [CR09, Gro11, Rec11, Che15] show the following: let be the complement of in . Then if satisfies our incoherence assumptions, with high probability over the choice of , there exists a symmetric matrix with and . (Current proofs of the existence of this matrix proceed by an ingenious iterative construction, alternately projecting onto two affine subspaces; the analysis of this construction is based on matrix concentration bounds. We refer to prior literature for the details [Gro11, Rec11, Che15].) As we will see, this matrix also implies that the above proof of identifiability exists.
Since and , we have
Since and contains the equation , we have . At the same time, we have
where the first step uses and the second step uses because and contains the equation . Combining the lower and upper bound on , we obtain
Together with the facts and , we obtain as desired. ∎
Identifiability proof for tensor completion.
Tensor completion is the analog of matrix completion for tensors. We observe a few of the entries of an unknown low-rank tensor and the goal is to fill in the missing entries. In terms of provable guarantees, the best known polynomial-time algorithms are based on sum-of-squares, both for exact recovery [PS17] (of tensors with orthogonal low-rank decompositions) and approximate recovery [BM16] (of tensors with general low-rank decompositions).
Unlike for matrix completion, there appears to be a big gap between the number of observed entries required by efficient and inefficient algorithms. For 3-tensors, all known efficient algorithms require observed entries (ignoring the dependence on incoherence), whereas information-theoretically observed entries are enough. The gap for higher-order tensors becomes even larger. It is an interesting open question to close this gap or to give formal evidence that the gap is inherent.
As for matrix completion, we consider the simplified setup in which the unknown tensor has the form for incoherent, orthonormal vectors . The following theorem shows that with high probability, is identifiable from random entries of , and that this fact has a low-degree sum-of-squares proof.
Theorem 2.3 ([PS17]).
Let be orthonormal vectors with incoherence and let be their 3tensor. Let be a random symmetric subset of size . Consider the system of polynomial equations in the by matrix variable with columns ,
Suppose . Then, with high probability over the choice of ,
Proof.
Let be the matrix with columns . Analogous to the proof for matrix completion, the heart of the proof is the existence of a 3-tensor that satisfies the following properties: , , and
(2.1) 
These properties imply that are the unique global maximizers of the cubic polynomial over the unit sphere. (We remark that for matrix completion, the spectral properties of the matrix imply that the unique global optimizers of the quadratic polynomial are the unit vectors in the span of .)
The proof that this tensor exists follows the same approach as the proof of existence of the matrix for matrix completion in Theorem 2.2 and proceeds by an iterative construction [Rec11, Gro11]. The main difference is due to the fact that for we only need to ensure spectral properties, whereas for we need to ensure the existence of (higher-degree) sum-of-squares proofs (Eq. 2.1). We refer to previous literature for the details of the proof that such exists with high probability over the choice of [PS17].
Similar to the proof for matrix completion, we have by the properties of that and . By Eq. 2.1 and linearity,
Because includes the equations and because the final term is a sum of squares, we conclude that for all and for all with . We also have the following claim:
Claim 2.4.
When are orthogonal and and , then
We give the (easy) proof of Claim 2.4 in Appendix A. Thus, from the orthonormality of the ,
Together with the facts and , we obtain as desired. ∎
2.3 Overcomplete tensor decomposition
Tensor decomposition refers to the following general class of estimation problems: given (a noisy version of) a tensor of the form , the goal is to (approximately) recover one, most, or all of the component vectors . It turns out that under mild conditions on the components , the noise, and the tensor order , this estimation task is possible information-theoretically. For example, generic components with are identified by their 3-tensor [CO12] (up to a permutation of the components). Our concern will be what conditions on the components, the noise, and the tensor order allow us to efficiently recover the components.
Besides being significant in its own right, tensor decomposition is a surprisingly versatile and useful primitive for solving other estimation problems. Concrete examples of problems that can be reduced to tensor decomposition are latent Dirichlet allocation models, mixtures of Gaussians, independent component analysis, noisy-or Bayes nets, and phylogenetic tree reconstruction
[LCC07, MR05, AFH12, HK13, BCMV14, BKS15, MSS16, AGMR17]. Through these reductions, better algorithms for tensor decomposition can lead to better algorithms for a large number of other estimation problems.

Toward better understanding the capabilities of efficient algorithms for tensor decomposition, we focus in this section on the following more concrete version of the problem.
Problem 2.5 (Tensor decomposition, single component recovery, constant error).
Given an order tensor with component vectors , find a vector that is close to one of the component vectors in the sense that . (This notion of closeness ignores the sign of the components. If the tensor order is odd, the sign can often be recovered as part of some post-processing; if the tensor order is even, the sign of the components is not identified.)

Algorithms for Problem 2.5 can often be used to solve a priori more difficult versions of tensor decomposition that ask to recover most or all of the components, or that require the error to be arbitrarily small.
A classical spectral algorithm attributed to Jennrich [Har70, LRA93] can solve Problem 2.5 for up to generic components if the tensor order is at least . (Concretely, the algorithm works for 3-tensors with linearly independent components.) Essentially the same algorithm works for up to generic components if the tensor order is at least (here, the vectors are assumed to be linearly independent). A more sophisticated algorithm [LCC07] solves Problem 2.5 for up to generic components if the tensor order is at least (concretely, the vectors are assumed to be linearly independent). However, these algorithms and their analyses break down if the tensor order is only 3 and the number of components is , even if the components are random vectors.
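A minimal numpy rendition of Jennrich's algorithm for an exact, noiseless 3-tensor with orthonormal components (the function name, dimensions, and the use of a pseudoinverse are our own choices; real implementations must handle noise and non-orthogonal components more carefully):

```python
import numpy as np

def jennrich(T, r, rng):
    """Jennrich's simultaneous-diagonalization algorithm (sketch).

    Recovers the components of T = sum_i x_i (x) x_i (x) x_i, assuming
    linearly independent components, up to sign and permutation.
    """
    n = T.shape[0]
    g, h = rng.standard_normal(n), rng.standard_normal(n)
    # random contractions along the third mode:
    # M_g = sum_i <x_i, g> x_i x_i^T, and similarly M_h
    Mg = np.einsum('ijk,k->ij', T, g)
    Mh = np.einsum('ijk,k->ij', T, h)
    # eigenvectors of M_g M_h^+ are the components (the nonzero
    # eigenvalues <x_i, g>/<x_i, h> are generically distinct)
    vals, vecs = np.linalg.eig(Mg @ np.linalg.pinv(Mh))
    idx = np.argsort(-np.abs(vals))[:r]
    X = np.real(vecs[:, idx])
    return X / np.linalg.norm(X, axis=0)

rng = np.random.default_rng(7)
n, r = 10, 4
X = np.linalg.qr(rng.standard_normal((n, r)))[0]   # orthonormal components
T = np.einsum('ia,ja,ka->ijk', X, X, X)            # exact, noiseless 3-tensor
Xhat = jennrich(T, r, rng)
print(Xhat.shape)
```

Each recovered column should match a true component up to sign; the breakdown discussed above occurs once the number of components exceeds the dimension, so the contracted matrices no longer have distinct, simple eigenstructure.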
In this and the subsequent section, we will discuss a polynomial-time algorithm based on sum-of-squares that goes beyond these limitations of previous approaches.
Theorem 2.6 ([MSS16] building on [BKS15, GM15, HSSS16]).
There exists a polynomialtime algorithm to solve Problem 2.5 for tensor order 3 and components drawn uniformly at random from the unit sphere.
The strategy for this algorithm consists of two steps:

use sum-of-squares in order to lift the given order-3 tensor to a noisy version of the order-6 tensor with the same components,

apply Jennrich’s classical algorithm to decompose this order-6 tensor.
While Problem 2.5 falls outside the scope of Theorem 2.1 (the meta-theorem for efficient estimation) because the components are only identified up to permutation, the problem of lifting a 3-tensor to a 6-tensor with the same components is captured by Theorem 2.1. Concretely, we can formalize this lifting problem as the following set of parameter–measurement pairs,