1 Introduction
Decomposing a tensor into its components, and determining their number (the rank), is a multidimensional generalization of the singular value decomposition and the matrix rank, and a recurring task in the applied sciences, appearing many times under different names: first discovered by Hitchcock [9] and later rediscovered under names such as PARAFAC [8] and CANDECOMP [4], it has been applied in many fields such as chemometrics, psychometrics, and signal processing [3, 16, 14]. Extensive surveys of applications can be found in [15, 6].
Recently, motivated by real-world applications, orthogonality constraints on the decomposition have been studied in the literature: the orthogonal rank decomposition and the combinatorial orthogonal rank decomposition, which can be traced back to [7, 12], and the orthogonal decomposition in [13] and [10], the latter of which occurs, for example, in the identification of latent variable models from empirical moments and in several other statistical estimation tasks; see [2] for a survey. The orthogonality constraints imposed in these two branches of the literature are not the same: [7, 12] impose summand-wise orthogonality, while in [13, 10, 2], factor-wise orthogonality can be deduced from the model constraints. In [13], a Jacobi-like heuristic algorithm is described that obtains an approximate orthogonal decomposition for general tensors via Jacobi angle optimization; in
[2], the authors describe a second-order fixed point method for obtaining the decomposition. In [11, 17], hierarchical tensor decomposition models are discussed in the context of latent tree graphical models, and algorithms for the identification of this decomposition are described. While this is not done explicitly in the language of orthogonal tensor decompositions, the idea of using flattenings is similar to the one presented here and, in the specific context of tree models, constitutes a specific instance of orthogonal tensor decomposition as described in [2].
In this paper, we study the orthogonal decomposition model as it occurs in [10, 2], namely with factor-wise orthogonality constraints. We show that this kind of decomposition can be directly transformed into a set of singular value decompositions, both theoretically and practically. We give identifiability results for this kind of orthogonal decomposition, showing that it is unique (up to natural symmetries) in case of existence, and we provide algorithms to obtain the orthogonal decomposition by reducing it to a sequence of singular value decompositions. We apply these algorithms to a latent variable identification problem which was discussed in [10, 2], reducing it to a series of eigenvalue problems. In particular, by performing the reduction to singular value decomposition, we show that all existing theory on the singular value decomposition, concerning theoretical as well as numerical and algorithmic issues, can be readily applied to the orthogonal decomposition problem.
2 Theoretical Background
2.1 Tensors
2.1.1 Definition of a Tensor
While tensors are common objects, their notation diverges throughout the literature. For ease of reading, we provide the basic definitions.
Definition 2.1 —
A real tensor of size and of degree is an element of the set
If we also write
2.1.2 Linear Transformation
Let us introduce a useful shorthand notation for linearly transforming tensors.
Definition 2.2 —
Let be a matrix. For a tensor , we denote by the application of to along all tensor dimensions, that is, the tensor defined as
Remark 2.3 —
For and , note that
2.1.3 Flattening
A flattening of a tensor is the tensor obtained by regarding several indices as one index.
Definition 2.4 —
Denote by . A surjective map is called a to-flattening map.
Definition 2.5 —
Let be a tensor, and let be a to flattening map. Then, the flattening of is the degree tensor with defined as
Conversely, if , then we write and call the unflattening of .
Note that the indices of are, as defined, tuples of indices of ; however, this does not contradict the definition of a tensor, since can be bijectively mapped onto . It is convenient to choose the lexicographical ordering for this bijection, but it is mathematically not necessary to fix any such bijection.
For unflattening, if only and are given, it is not clear what should be without further specification, since the same unflattening can arise from different tensors even if is fixed. Therefore, we will use it only in contexts where a given flattening is being reversed, or partially reversed, making the unflattening well-defined.
Example 2.6 —
Let be a tensor, let . The flattening of is a matrix . The columns of are all the subtensors of where the second and third indices are fixed. The columns of are indexed by the pairs , or, alternatively, by bijection, by the lexicographical index number . Taking any submatrix of , we can unflatten to obtain a tensor .
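The correspondence between flattening and lexicographical reindexing can be sketched in NumPy, whose row-major `reshape` is exactly a lexicographical flattening of trailing indices (sizes and names below are illustrative, not taken from the paper):

```python
import numpy as np

# Hypothetical 2 x 3 x 4 tensor; reshaping with C (row-major) order groups
# the last two indices lexicographically, giving the {1}-to-{2,3} flattening.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3, 4))

# Flatten indices 2 and 3 into one column index: the result is a 2 x 12 matrix.
F = A.reshape(2, 3 * 4)

# Column (j, k) of the flattening is the fiber A[:, j, k].
j, k = 1, 2
col = F[:, j * 4 + k]          # lexicographic position of the pair (j, k)
assert np.array_equal(col, A[:, j, k])

# Unflattening is the inverse reshape.
assert np.array_equal(F.reshape(2, 3, 4), A)
```
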
2.1.4 Outer Product
Furthermore, we introduce notation for creating tensors of higher order out of tensors of lower order:
Definition 2.7 —
Let . The outer product of the is the tensor defined by
In case that , we also write
Similarly, if and are tensors, the outer product of and is the tensor defined as
Outer products of several tensors are defined by induction on , namely:
A useful calculation rule for linear transformation is the following:
Lemma 2.8.
Let and let . Then,
Similarly, if , then
Outer products are also compatible with flattenings:
Lemma 2.9.
Let and . Let be a to-flattening, let be the restriction of to , and let be the to-flattening defined by . Then,
2.2 Orthogonality and Duality
We briefly review the notion of scalar product and some related results, which can also be found in [12] in a slightly different formulation and in slightly less generality.
Definition 2.10 —
A scalar product is defined on by
As usual, are called orthogonal to each other if , and is called normal if . A set is called orthonormal if , where is the Kronecker delta.
By identification of with , where , the scalar product on tensors inherits all properties of the real scalar product.
Remark 2.11 —
It is seen by checking definitions that the scalar product on matrices is identical to the trace product, i.e., for .
An important property of the scalar product is compatibility with flattenings:
Lemma 2.12.
Let , and let be a to-flattening map. Then,
In particular, and are orthogonal to each other if and only if and are.
Proof.
A flattening is a bijection on the set of entries, therefore the result of the entrywise scalar product is not changed by flattening. ∎
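The invariance asserted by Lemma 2.12, together with the trace-product identity of Remark 2.11, can be checked numerically; the following NumPy sketch uses illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4, 5))
B = rng.standard_normal((3, 4, 5))

# Entrywise scalar product of two degree-3 tensors.
ip_tensor = np.sum(A * B)

# Scalar product of any flattening, e.g. the {1}-to-{2,3} one: identical,
# since a flattening only relabels the entries (Lemma 2.12).
ip_flat = np.sum(A.reshape(3, 20) * B.reshape(3, 20))
assert np.isclose(ip_tensor, ip_flat)

# For matrices, the scalar product is the trace product <M, N> = tr(M^T N)
# (Remark 2.11).
M, N = rng.standard_normal((4, 6)), rng.standard_normal((4, 6))
assert np.isclose(np.sum(M * N), np.trace(M.T @ N))
```
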
Proposition 2.13.
Let , for . Then,
In particular, if there exists such that are orthogonal to each other, then the outer products and are orthogonal to each other.
Proof.
By induction on , it suffices to prove the statement for : Let and . Then,
We proceed to prove this statement. Let be the to-flattening, and let be the to-flattening. Let , and for . By Lemma 2.12, it holds that
Let be the to-flattening defined by . Let . By Lemma 2.12, it holds that
By Lemma 2.9, it holds that
Using that the scalar product on matrices is the trace product (see Remark 2.11), we obtain
The cyclic property of the trace product for matrices yields
All equalities put together yield the claim. ∎
Corollary 2.14.
Let , and , such that . Then,
Definition 2.15 —
Let , let be a partition. A decomposition
with , and , such that the set of with fixed is orthonormal, is called a rank orthogonal atomic decomposition of , with signature . If and , then the decomposition is called an orthogonal CP-decomposition.
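A concrete instance of such a decomposition can be constructed numerically: the sketch below (all sizes, ranks, and coefficients are illustrative choices, not taken from the paper) builds a degree-3 tensor with factor-wise orthonormal factors and verifies that, as a consequence of the orthogonality of the summands, its squared norm is the sum of the squared coefficients.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 5, 3

# Three factor sets, each orthonormal across summands (factor-wise
# orthogonality), obtained from QR decompositions of random matrices.
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
W, _ = np.linalg.qr(rng.standard_normal((n, r)))
lam = np.array([3.0, 2.0, 1.0])   # illustrative coefficients

# T = sum_i lam_i * u_i (outer) v_i (outer) w_i
T = np.einsum('i,ai,bi,ci->abc', lam, U, V, W)

# The summands are mutually orthogonal as tensors, so the squared norm of T
# is the sum of the squared coefficients.
assert np.isclose(np.sum(T**2), np.sum(lam**2))
```
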
An orthogonal atomic decomposition need not exist. However, if it does, it is compatible with flattenings, as Proposition 2.17 will show. We first introduce notation for a more concise statement of the compatibility:
Definition 2.16 —
Let be a partition of . We say a to-flattening is compatible with the partition if for some implies . We say that is strictly compatible with the partition if for some holds if and only if .
Proposition 2.17.
Let . Let
be an orthogonal atomic decomposition with signature , and let be compatible with the signature. Then,
is an orthogonal atomic decomposition of . In particular, if is strictly compatible with the signature, then the decomposition is also an orthogonal CP-decomposition.
Proof.
This is a direct consequence of Lemma 2.12, checking compatibility of scalar product and orthogonality with the flattening at each of the sets of indices . ∎
2.3 Identifiability of the Orthogonal Atomic Decomposition
The orthogonal decomposition, as given in Definition 2.15, need not exist for a given tensor, nor need it be unique. We will show that, due to the compatibility with flattenings, if it exists, it is unique, provided the rank is chosen minimal.
The main ingredient, besides flattenings, is the uniqueness of the singular value decomposition [18], a classical result, which we state in a convenient form:
Theorem 1.
Let , let . Then, there is a singular value decomposition (= orthogonal CP-decomposition)
such that the are orthonormal, and the are orthonormal. In particular, there is no singular value decomposition of rank strictly smaller than . Moreover, the singular value decomposition of is unique, up to:
(a) the sequence of summation, i.e., up to arbitrary permutation of the indices ;
(b) the choice of sign of , i.e., up to changing the sign in any two of for fixed ;
(c) unitary transformations of the span of or such that .
Condition (c) includes (b) as a special case, and (c) can be removed as a condition if no two distinct have the same absolute value.
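The uniqueness statement can be illustrated numerically: for a generic matrix (distinct singular values), two independently computed SVDs agree up to the sign ambiguity of condition (b). The sketch below (illustrative sizes) compares the SVD of a matrix with the one derived from the SVD of its transpose:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 6))   # generic, so singular values are distinct

U1, s1, V1h = np.linalg.svd(A, full_matrices=False)
U2, s2, V2h = np.linalg.svd(A.T, full_matrices=False)

# The singular values of A and A^T coincide, ...
assert np.allclose(s1, s2)

# ... and the left singular vectors of A are the right singular vectors of
# A^T, up to one sign per vector (condition (b) of Theorem 1).
assert np.allclose(np.abs(U1), np.abs(V2h.T))
```
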
Theorem 2.
Let , and assume that has an orthogonal atomic decomposition
of signature , such that for all . Then:
(i) Denote for . Then, for all .
(ii) There is no orthogonal atomic decomposition of with signature and of rank strictly smaller than .
(iii) The orthogonal atomic decomposition of of rank is unique, up to:
(a) the sequence of summation, i.e., up to arbitrary permutation of the indices ;
(b) the choice of sign of , i.e., up to changing the sign in any two of and the for fixed and arbitrary ;
(c) transformations of factors and their respective tensor products, such that , which induce unitary transformations in all flattenings compatible with the signature .
Condition (c) includes (b) as a special case, and (c) can be removed as a condition if no two distinct have the same absolute value.
Proof.
Fix some arbitrary . Consider the to-flattening for ; note that is compatible with the signature. Let , and . Note that is a matrix. Let
be the orthogonal atomic decomposition of , and let , and for all . Note that is an
vector, and
is an vector. By Proposition 2.17, is a singular value decomposition of .
(i) In particular, the are a system of orthonormal vectors in . Therefore, . Since was arbitrary, statement (i) follows.
(ii) Since the are nonzero, it holds that . If there were an orthogonal atomic decomposition of with signature of rank strictly smaller than , there would be a singular value decomposition of of rank strictly smaller than , contradicting Proposition 2.17.
(iii) Observe that the flattening by induces a bijection between the orthogonal atomic decompositions of of rank and the singular value decompositions of of rank . The statement in (iii) then follows directly from the uniqueness of the singular value decomposition of asserted in Theorem 1.
∎
Again, we would like to stress that the present orthogonal decomposition model is different from the one in [12]: ours is factor-wise orthogonal between different summands, while the orthogonal rank decomposition in [12] is summand-wise orthogonal, and the combinatorial orthogonal rank decomposition enforces orthogonality of factors within the same summand. Therefore, Theorem 2 does not contradict Lemma 3.5 in [12].
Another result which seems to be folklore, but is not available in the literature, is that assuming a tensor has an orthogonal decomposition is a strong restriction. Since it is almost implied by the identifiability result in Theorem 2, we state a quantitative version:
Proposition 2.18.
The set of tensors , with , and for all , for which has an orthogonal CP-decomposition, is a Lebesgue zero set.
Proof.
The CP-decomposition can be viewed as an algebraic map
Since the left-hand side is an irreducible variety, the image of the map is as well. The orthogonal CP-decompositions form an algebraic subset of the left-hand side. Therefore, the statement follows from the fact that
is not surjective. This follows from a degrees-of-freedom, resp. dimension, count. One has
Theorem 2 (i) implies
An explicit computation shows that , which proves the statement.
The proof above can be rephrased in terms of the CP-rank (see [5] for an introduction): one observes that the generic CP-rank of the tensors in question must be strictly larger than , then proceeds again by arguing that the algebraic set of tensors with orthogonal CP-decompositions must be a proper subset of all tensors of that format, thus a Lebesgue zero set. ∎
Proposition 2.18 can be extended to orthogonal atomic decompositions with signature , by considering suitable unflattenings.
2.4 Tensors and Moments
We briefly show how tensors relate to moments of multivariate real random variables:
Definition 2.19 —
Let be a real dimensional random variable. Then, define:
where is a formal vector of variables. The th moment (or moment tensor) of , and the th cumulant (or cumulant tensor) of , are defined (in case of convergence) as the coefficients in the multivariate Taylor expansions
In the following, we will always assume that the moments and cumulants in question exist.
The moments and cumulants of a linearly transformed random variable are the multilinearly transformed moments and cumulants.
Proposition 2.20.
Let be a real dimensional random variable, and let . Then,
Proof.
We prove the statement for cumulants; the proof for moments is completely analogous. For the cumulant generating functions of and of , it holds that
The last equality follows from the definition of . But by definition, it also holds that
therefore the statement follows from comparing coefficient tensors. ∎
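For the second moment, Proposition 2.20 specializes to the familiar identity that the moment of a linearly transformed variable is the moment transformed along both tensor dimensions. The NumPy sketch below checks this for empirical moments (the identity holds exactly, sample by sample; sizes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

X = rng.standard_normal((1000, 4))   # N samples of a 4-dimensional variable
A = rng.standard_normal((3, 4))      # a linear map from R^4 to R^3

# Empirical second moment E[X X^T] and its counterpart for A X.
M2_X = X.T @ X / len(X)
AX = X @ A.T
M2_AX = AX.T @ AX / len(AX)

# A applied along both dimensions of the degree-2 moment tensor.
assert np.allclose(M2_AX, A @ M2_X @ A.T)
```
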
3 Relation to Mixture Models
3.1 The Estimation Problem
Throughout the paper, we will consider the following independent rank mixture model:
Generative Model: are independent, valued random variables, with
, and probability/mass density functions
. Let be arbitrary such that , and let be the corresponding mixture of the . Assume that there are with , and random variables , such that . Assume that the are linearly independent, and for .

Estimation Task: Given , or estimators thereof, determine/estimate and for .
3.2 Algebraic Formulation via Moments
The estimation problem presented above can be reformulated as a purely algebraic problem; see [2]. Namely, the are explicitly computable in terms of the expectations of the and . Then, Proposition 2.20 implies that for all , and therefore for all , yielding the following algebraic version of the estimation problem.
Algebraic Problem: Let , let be linearly independent and arbitrary such that . Given (exact or noisy estimators for)
determine the and .
4 Algorithms
4.1 Orthogonal Decomposition of Tensors
A special case of orthogonal decomposition is the singular value decomposition (SVD). There is a huge number of well-studied methods for obtaining the singular value decomposition, which we will not discuss. We will, however, make extensive use of the SVD algorithm, described in Algorithm 1, as a black box.
First, for completeness, we treat the trivial case in Algorithm 2.
We now explicitly describe how to compute the orthogonal decomposition when each summand has two tensor factors. Algorithm 3 computes the decomposition by properly reformatting the entries, computing a singular value decomposition, and then reformatting again.
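The flatten-SVD-unflatten strategy can be sketched for a degree-3 tensor that admits an orthogonal CP-decomposition: the first SVD of a flattening recovers one factor set together with rank-one matrices, and a second SVD splits each rank-one matrix into the remaining two factors, mirroring the recursion of the general algorithm. All sizes, ranks, and coefficients below are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 5, 3
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
W, _ = np.linalg.qr(rng.standard_normal((n, r)))
lam = np.array([3.0, 2.0, 1.0])   # distinct coefficients, for uniqueness
T = np.einsum('i,ai,bi,ci->abc', lam, U, V, W)

# Step 1: flatten with respect to the split {1} | {2, 3} and take an SVD.
F = T.reshape(n, n * n)
Uh, s, Vh = np.linalg.svd(F, full_matrices=False)
Uh, s, Vh = Uh[:, :r], s[:r], Vh[:r]
assert np.allclose(s, lam)         # the coefficients are the singular values

# Step 2: each right singular vector unflattens to a rank-one matrix
# v_i w_i^T, which a second SVD splits into its two factors.
Vrec = np.zeros((n, r))
Wrec = np.zeros((n, r))
for i in range(r):
    M = Vh[i].reshape(n, n)
    u2, s2, v2h = np.linalg.svd(M)
    Vrec[:, i], Wrec[:, i] = u2[:, 0], v2h[0]

# Reassembling the recovered factors reproduces the tensor exactly.
Trec = np.einsum('i,ai,bi,ci->abc', s, Uh, Vrec, Wrec)
assert np.allclose(Trec, T)
```
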
The algorithm for the general case, Algorithm 4, likewise consists of repeated applications of reindexing and singular value decomposition. Variants of singular value decomposition exist with adjustable noise tolerance or singular value thresholding, and they can therefore be employed to obtain thresholding and numerically stable variants of Algorithm 4. Furthermore, step 1 allows for an arbitrary choice of to-flattening in each recursion. Since, in the presence of noise, the results might differ when taking a different sequence of flattenings, numerical stability can be improved by clustering the results of all possible choices and then averaging.
Termination of Algorithm 4 follows from the observation that in each recursion, the partition of is made strictly finer. Since has finite cardinality, there is only a finite number of recursions. The fact that the decompositions in steps 6 and 7 have rank one, with coefficients , follows from the uniqueness of the orthogonal decomposition guaranteed in Theorem 2. Correctness of Algorithm 4 follows from repeated application of Proposition 2.17 and the uniqueness of the singular value decomposition.
4.2 An Estimator for the Mixture Model
For illustrative purposes, we write out Algorithm 4 for the problem introduced in Section 3, which has also been studied extensively in [2]:
Example: Let , let be linearly independent and arbitrary such that . Given (exact or noisy estimators for)
determine the and .
Algorithm 5 solves the problem by reducing it to a series of eigenvalue problems.
Theorem 4.3 in [2] implies that the tensor obtained in step 4 has an orthogonal CP-decomposition, and it implies the correctness of steps 8 and 12. The fact that in step 8 has rank one, with coefficients , follows from the uniqueness of the decomposition guaranteed in Theorem 2.
Note that the explicit presentation of the algorithm could be substantially abbreviated by applying ODT directly to in step 4, with signature , and then performing the analogues of steps 8 and 12. Furthermore, the accuracy of the estimator in step 11 can be improved by repeating the procedure for the three possible signatures and , then averaging, or weighted averaging, over the nine estimates for each , making use of the symmetry of the problem.
Also, similarly to Algorithm 4, the presented Algorithm 5, while already numerically stable, can be modified to cope better with noise by, e.g., introducing thresholding in the singular value decomposition and rank computations. The numerical stability with respect to noise is governed by the numerical stability of the SVDs performed and of the pseudo-inversion of in step 12.
5 Conclusion
We have demonstrated that computing the orthogonal decomposition of an arbitrary-degree tensor, symmetric or not, can be reduced to a series of singular value decompositions, and we have described efficient algorithms to do so. This makes orthogonal tensor decomposition approachable by the wealth of theoretical results and existing methods for eigenvalue problems and the singular value decomposition. Moreover, we have exemplified our method on the problem of identifying the components of a low-rank mixture model.
Acknowledgments
I thank Arthur Gretton, Zoltán Szabó, and Andreas Ziehe for interesting discussions. I thank the Mathematisches Forschungsinstitut Oberwolfach for support.
References
Anandkumar et al. [2012a] Anima Anandkumar, Dean P. Foster, Daniel Hsu, Sham M. Kakade, and Yi-Kai Liu. A spectral algorithm for latent Dirichlet allocation. arXiv e-print, 2012a.
Anandkumar et al. [2012b] Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade, and Matus Telgarsky. Tensor decompositions for learning latent variable models. arXiv e-print, 2012b.
Bro [1997] Rasmus Bro. PARAFAC. Tutorial and applications. Chemometrics and Intelligent Laboratory Systems, 38(2):149–171, 1997.
Carroll and Chang [1970] J. Douglas Carroll and Jih-Jie Chang. Analysis of individual differences in multidimensional scaling via an n-way generalization of "Eckart–Young" decomposition. Psychometrika, 35:283–319, 1970.
Catalisano et al. [2002] Maria V. Catalisano, Anthony V. Geramita, and Alessandro Gimigliano. Ranks of tensors, secant varieties of Segre varieties and fat points. Linear Algebra and its Applications, pages 263–285, 2002.
De Lathauwer et al. [2008] Lieven De Lathauwer, Pierre Comon, and Nicola Mastronardi. Special issue on tensor decompositions and applications. SIAM Journal on Matrix Analysis and Applications, 30(3), September 2008. ISSN 0895-4798.
Denis and Dhorne [1989] J. B. Denis and T. Dhorne. Orthogonal tensor decomposition of 3-way tables. In R. Coppi and S. Bolasco, editors, Multiway Data Analysis, pages 31–37. Elsevier, Amsterdam, 1989.
Harshman [1970] Richard A. Harshman. Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-modal factor analysis. UCLA Working Papers in Phonetics, 16:1–84, 1970.
 Hitchcock [1927] Frank L. Hitchcock. The expression of a tensor or a polyadic as a sum of products. Journal of Mathematics and Physics, 6:164–189, 1927.
Hsu and Kakade [2012] Daniel Hsu and Sham M. Kakade. Learning mixtures of spherical Gaussians: Moment methods and spectral decompositions. arXiv e-print, 2012.
Ishteva et al. [2013] Maria Ishteva, Haesun Park, and Le Song. Unfolding latent tree structures using 4th order tensors. JMLR Workshop and Conference Proceedings, ICML 2013, pages 316–324, 2013.
 Kolda [2001] Tamara G. Kolda. Orthogonal tensor decompositions. SIAM Journal on Matrix Analysis and Applications, 23(1):243–255, 2001.
Martin and Van Loan [2006] Carla D. Moravitz Martin and Charles F. Van Loan. A Jacobi-type method for computing orthogonal tensor decompositions. SIAM Journal on Matrix Analysis and Applications, pages 1219–1232, 2006.
Nion and Sidiropoulos [2009] Dimitri Nion and Nikos D. Sidiropoulos. A PARAFAC-based technique for detection and localization of multiple targets in a MIMO radar system. In Proc. ICASSP '09, pages 2077–2080, 2009.
Sidiropoulos [2004] Nikos D. Sidiropoulos. Low-rank decomposition of multi-way arrays: a signal processing perspective. In Sensor Array and Multichannel Signal Processing Workshop Proceedings, pages 52–58, 2004.
 Sidiropoulos et al. [2000] Nikos D. Sidiropoulos, Rasmus Bro, and Georgios B. Giannakis. Parallel factor analysis in sensor array processing. IEEE Transactions on Signal Processing, 48:2377–2388, August 2000.
Song et al. [2013] Le Song, Maria Ishteva, Ankur Parikh, Eric Xing, and Haesun Park. Hierarchical tensor decomposition of latent tree graphical models. JMLR Workshop and Conference Proceedings, ICML 2013, pages 334–342, 2013.
 Young and Eckart [1936] G. Young and C. Eckart. The approximation of one matrix by another of lower rank. Psychometrika, 1:211–218, 1936.