1 Introduction
Tensor completion of three-way arrays (the semantics of the term "tensor" differ between research communities, as elucidated in Section 2 of [dSL08]; here we take "tensor" to be equivalent to "n-way array") has been used to model three-way interactions in many experimental fields, starting in the 1920s with the chemometrics and psychometrics communities. Kolda and Bader provide an extensive review of the tensor factorization literature up to 2009 [KB09]. A shorter but more current review is given by Grasedyck et al. in [GKT13].
This work considers three-way interactions in a "Collaborative Filtering" (CF) context. In the classical CF problem, some quantity of interest (deterministic or stochastic) depends on two variables of large cardinality, say $i \in \{1,\dots,n_1\}$ and $j \in \{1,\dots,n_2\}$, and is naturally represented as a matrix. The matrix of known values is typically sparse, and the problem is to estimate the missing values, seeking the best approximation in the Frobenius norm. In the three-way case the quantity of interest depends on three variables and is represented as a cuboid tensor $T \in \mathbb{R}^{n_1 \times n_2 \times n_3}$. See Figure 1 for an illustration and Section 2 for a more concrete example.
The two main tensor decompositions in use are the CANDECOMP/PARAFAC (CP) model proposed by Hitchcock in 1927 [Hit27b, Hit27a], and the Tucker decomposition proposed by Tucker in 1963 [Tuc66, Tuc63, Tuc64]. In the CP model, a three-way array is approximated by a finite sum of rank-one tensors
(1) $T_{ijk} \approx \sum_{r=1}^{R} A_{ir} B_{jr} C_{kr},$
where $A \in \mathbb{R}^{n_1 \times R}$, $B \in \mathbb{R}^{n_2 \times R}$, $C \in \mathbb{R}^{n_3 \times R}$ are called "latent factor matrices", and $R$ is the number of rank-one terms.
For readers unfamiliar with machine learning terminology, we note that the name “latent factor” stems from an assumption that the data is generated from a fixed distribution governed by variables which are hidden (latent).
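As a quick illustration (a sketch, not the paper's code), the CP reconstruction of eq. (1) takes a few lines of NumPy; the shapes and random factors here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, n3, R = 4, 5, 6, 3
A = rng.normal(size=(n1, R))
B = rng.normal(size=(n2, R))
C = rng.normal(size=(n3, R))

# T_ijk = sum_r A[i, r] * B[j, r] * C[k, r] -- a sum of R rank-one tensors.
T = np.einsum('ir,jr,kr->ijk', A, B, C)

# Entry-wise check against the definition for one (i, j, k).
i, j, k = 1, 2, 3
assert np.isclose(T[i, j, k], sum(A[i, r] * B[j, r] * C[k, r] for r in range(R)))
```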
In the more general Tucker model the latent factor rows are multiplied by a "core tensor" $G$ of dimensions $R_1 \times R_2 \times R_3$ as
(2) $T_{ijk} \approx \sum_{p=1}^{R_1} \sum_{q=1}^{R_2} \sum_{r=1}^{R_3} G_{pqr} A_{ip} B_{jq} C_{kr}.$
The Tucker model is more expressive than the CP model, but its core tensor is typically dense, requiring $R_1 R_2 R_3$ parameters. It is also harder to interpret.
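A sketch of the Tucker reconstruction of eq. (2), also illustrating that CP is the special case of a superdiagonal core (the dimensions below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, n3 = 4, 5, 6
R1, R2, R3 = 2, 3, 2
A = rng.normal(size=(n1, R1))
B = rng.normal(size=(n2, R2))
C = rng.normal(size=(n3, R3))
G = rng.normal(size=(R1, R2, R3))  # dense core: R1*R2*R3 free parameters

# T_ijk = sum_{p,q,r} G[p,q,r] * A[i,p] * B[j,q] * C[k,r]
T = np.einsum('pqr,ip,jq,kr->ijk', G, A, B, C)
assert T.shape == (n1, n2, n3)

# CP is the special case of a superdiagonal core with R1 = R2 = R3 = R.
R = 2
Gd = np.zeros((R, R, R))
Gd[np.arange(R), np.arange(R), np.arange(R)] = 1.0
Tcp = np.einsum('pqr,ip,jq,kr->ijk', Gd, A[:, :R], B[:, :R], C[:, :R])
assert np.allclose(Tcp, np.einsum('ir,jr,kr->ijk', A[:, :R], B[:, :R], C[:, :R]))
```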
We note that the CP model has the property that its basic building block, the real triple product $A_{ir} B_{jr} C_{kr}$, does not distinguish between cases wherein the numerical values of the latent factors are permuted, for example between $(a, b, c)$ and $(b, a, c)$ (and similarly for other permutations). In other words, for the three-way interactions modeled by CF, a commutative building block is inherently less expressive than a noncommutative one. Thus, we speculate that three-way relations are better distinguished by a product of noncommuting latent factors than by the (commutative) real multiplication of the CP model. This intuition is expanded in Section 3.
Following this speculation, we propose a hybrid of the CP and Tucker3 models which is pseudo-diagonal (like the CP), but is built from the ground up from trilinear operations on Non-Commuting Latent Factors (NCLF). The general form of the NCLF model is
(3) $T_{ijk} \approx \sum_{\mathrm{sym}} \sum_{r} t_{\mathrm{sym}}\left(a^{\mathrm{sym}}_{ir}, b^{\mathrm{sym}}_{jr}, c^{\mathrm{sym}}_{kr}\right),$
where the subscript "sym" runs over different permutation symmetries of the latent factors, $t_{\mathrm{sym}} : V^3 \to \mathbb{R}$ is a real trilinear mapping satisfying this symmetry mode, and $V$ is a real linear space to be determined.
A well-known problem of unregularized CP models is that approximations of a certain rank may not exist, a situation commonly called "degeneracy"; see Section 3.3 of [KB09] and also [CLdA09]. De Silva and Lim show that such degeneracy can be generic, i.e., occurring on a set of inputs of non-zero measure [dSL08]. They also prove that degeneracy always co-occurs with the formation of collinear columns of the latent factor matrices, meaning that the set of columns $A_{:,r}$, where the colon sign denotes a running index, becomes linearly dependent, or almost so. This dependency manifests in very large columns which almost cancel each other. They also note that, while regularization removes non-existence, the proximity of the well-posed regularized problem to the ill-posed unregularized problem may still result in catastrophic ill-conditioning.
Much of the effort in lower-dimension tensor factorization has been directed at extending the Singular Value Decomposition (SVD), for example by applying orthogonality constraints on the columns of the latent factor matrices or of the core matrix of the Tucker decomposition; see the review in [Kol01]. Orthogonality of matrix slices of the Tucker core tensor has been considered by L. de Lathauwer et al., who show that this model retains many properties of the original matrix SVD, therefore naming it the Higher Order SVD (HOSVD) [dLdMV00]. The core tensor, however, is still dense, requiring $R_1 R_2 R_3$ parameters. When the dimension of the factors is small, orthogonality and collinearity of the latent matrix columns are mutually exclusive, and orthogonality removes degeneracy even for the CP model. For typical "big data" CF problems, however, the dimensionality of each factor may be extremely large (for example, each Yahoo user may receive her own latent row vector, and the number of such users is in the hundreds of millions), and so virtually all vector pairs are near-orthogonal. Near-orthogonality is therefore not useful in avoiding collinearity. We note that a standard CP expansion of a finite-rank NCLF model will always have collinear parallel factors. Hence, some degenerate modes may be alleviated by the NCLF model. We leave open the question of how much degeneracy is alleviated. (Some examples wherein the CP model becomes degenerate are associated with differential operators, see [dSL08]. The NCLF model directly models CP-degenerate modes associated with first-order finite-difference operators. Therefore, we speculate it removes degeneracy associated with first-order differential operators, but not all the higher-order ones.)
In the completely different setting of particle physics, modeling three-way interactions (in three-quark models) has been shown to be intrinsically related to noncommutativity of the underlying algebras. Kerner proposed using one such algebra in three-color quark models [Ker10], and we shall use such ideas for the algebraic representation of our model. (For the reader unfamiliar with physics we note that the CF problems we consider are entirely different from quantum chromodynamics, so we can propose much simpler models.)
For the reader familiar with Geometric Algebra we add two notes, which other readers may safely ignore. First, we will use the two-dimensional real representation of the Clifford algebra, which in physics is known as one of the flavors of a Majorana spinor. Second, some recent tensor factorization works use Grassmann algebras to represent the completely antisymmetric components of the input [KB09, KSV]. In the third-order case the standard triple product in $\mathbb{R}^3$, which is the approach we use for this component, is a Grassmann algebra.
The remainder of the paper is as follows. In Section 2 we formulate the specific CF problem we are interested in. In Section 3 we give the motivating intuitions of this work. Specifically, we conjecture that in order to distinguish between three-way relations by a single term, an algebraic representation must be noncommutative. Moreover, it must model, either implicitly or explicitly, different permutation symmetries of the latent factors. Following these intuitions, in Section 4 we build the NCLF model in several steps:
- In Section 4.1 we recall the decomposition of a generic cubical tensor into its symmetry-preserving components. This decomposition is done via six linear operators.
- In Section 4.2 we look for and find a noncommutative trilinear mapping on a two-dimensional linear subspace of the real $2 \times 2$ matrices, which is the simplest such mapping we could devise. This mapping is the key component of our method, and will be used to construct five of the six symmetry-preserving components of the NCLF model. We denote this space by $\mathbb{C}^{\perp}$ because it is the orthogonal complement of the representation of the complex field in the real $2 \times 2$ matrices. The mapping is purely ternary, meaning that the space is closed under the trilinear operation, but not under the corresponding bilinear one. In other words, $\mathbb{C}^{\perp}$ is a ternary algebra, not a standard (binary) algebra.
- In Section 4.3 we approximate each of these components by its own trilinear mapping: the completely antisymmetric component is modeled by the standard triple product in $\mathbb{R}^3$, and approximations of the other components are constructed by applying the symmetrizing operations to the mapping. We provide explicit expressions for each of the components.
- Finally, in Section 4.4 we assemble the full approximation, and apply it to the general cuboid case.
In Section 5, we provide the results of numerical experiments on two publicly available datasets, the MovieLens movie rating dataset and the Fannie Mae Single-Family Loan Performance dataset. In both cases, the noncommutative models outperform the standard CP model. We conclude and discuss future directions in Section 6.
2 A specific three-way CF problem
The specific problem motivating this paper is that of predicting a binary response via three-way CF in supervised learning. In this learning problem, the dependent variable is a Boolean event, like a purchase event, which we denote by $y$, and the independent variables belong to three classes of large cardinality, for example users, purchasable items and shopping venues; see Figure 1 on the right. The learning problem is therefore to estimate the probability of a purchase event for an (unseen) triplet of user $u$, item $i$ and venue $v$. The value of $y$ for most of the triplets is unknown, making this a tensor completion problem.
We will use a Logistic Regression model, thereby estimating the log-odds $s$ of this probability $p$
(4a) $s = \log \frac{p}{1-p},$
or, equivalently,
(4b) $p = \frac{1}{1 + e^{-s}}.$
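The two mappings (4a) and (4b) are mutual inverses, as a two-line check confirms (a sketch using the standard definitions):

```python
import math

def logit(p):
    # Log-odds of a probability, eq. (4a).
    return math.log(p / (1.0 - p))

def sigmoid(s):
    # Inverse mapping from log-odds back to probability, eq. (4b).
    return 1.0 / (1.0 + math.exp(-s))

for p in (0.05, 0.5, 0.8):
    assert abs(sigmoid(logit(p)) - p) < 1e-12
```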
We will be using $\ell_2$ (Tikhonov) regularized models and the logistic loss function, so that given a functional form for the log-odds (like CP, or Tucker3) and data $y_{ijk}$ (known over a subset $\Omega$ of the triplets $(i,j,k)$), training consists of solving the minimization problem
(5) $\min_{A,B,C} \sum_{(i,j,k) \in \Omega} \log\left(1 + e^{-y_{ijk} s_{ijk}}\right) + \lambda \left( \|A\|_F^2 + \|B\|_F^2 + \|C\|_F^2 \right),$
with labels $y_{ijk} \in \{-1, +1\}$, where the last three terms are the regularization terms, and $\lambda$ is the regularization parameter, to be chosen empirically via cross-validation.
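A minimal sketch of an objective of the form (5), assuming labels in $\{-1, +1\}$ and a single shared regularization parameter (both assumptions, not the paper's exact setup):

```python
import numpy as np

def objective(scores, labels, factors, lam):
    # Logistic loss over the observed triplets (labels in {-1, +1})
    # plus Tikhonov (squared Frobenius) regularization of the factors.
    loss = np.sum(np.logaddexp(0.0, -labels * scores))  # log(1 + exp(-y*s))
    reg = lam * sum(np.sum(F * F) for F in factors)
    return loss + reg

# With a perfectly confident correct score and zero factors,
# the objective is near zero.
val = objective(np.array([100.0]), np.array([1.0]), [np.zeros((2, 2))], lam=0.1)
assert val < 1e-6
```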
These four simplifying assumptions, of a supervised-learning, binary-response problem modeled by logistic regression with $\ell_2$ regularization, are applied in order to demonstrate the NCLF model on a concrete problem. A priori, they only affect the numerical experiments in Section 5. We see no reason why the NCLF model should not apply to other three-way multilinear subspace learning problems.
3 The intuitive motivation
Let us look for the simplest extension of the trilinear CP model which would still be diagonal, but would provide a more expressive algebraic representation of a three-way relation between entities, for example between users, purchasable items and venues. Such a representation approximates how a three-way relation affects some measured quantity, for example the odds of a purchase event, which we take for simplicity to be real. Since we are estimating a real quantity, we consider real trilinear mappings.
Following intuitions from physics [Ker10], we speculate that noncommutative parallel factors might be more expressive than commutative ones, i.e., that in reality a "green user, blue item, red shop" combination is different from a "blue user, green item, red shop" combination, and will lead to a different propensity to purchase. Since the "colors" are arbitrary regions of the latent factor space corresponding to different co-clusters, there is no reason, a priori, to assume that a function representing the relation between parameter regions for users, items and venues be commutative in the latent factors.
Hence, this article raises the following conjecture:
Conjecture 1
A trilinear tensor completion model which is built upon noncommutative parallel factors, i.e., that differentiates between different permutations of the same numerical values of its arguments, would in some way be "more realistic", and hence perform better than the standard CP model.
Conjecture 1 leads to two immediate outcomes. Firstly, the standard CP model is suboptimal, since its building block is the multiplication of real arguments and is inherently commutative. If a trilinear building block is to be used, the arguments must be of dimension two at least. Likewise, the next simplest extension, the multiplication of complex arguments, cannot be used (at least naively), as it is commutative. Secondly, in order to differentiate between all different "color" permutations of three objects, there must be at least three "colors". In other words, a single parallel factor must differentiate at least three co-clusters of each class. Noncommutative three-way relations between co-clusters must therefore involve, at the very least, a color assignment: a mapping from ordered color triplets to the reals.
In the next section we construct such a real trilinear approximation of cubical three-way arrays. We shall later use this construction for a general tensor completion problem.
4 The Non-Commuting Latent Factors (NCLF) method
4.1 Approximating a real array
Table 1: Permutation-symmetry properties of the six components.
We recall that, given a three-dimensional cubical array of real numbers $T \in \mathbb{R}^{n \times n \times n}$, it may be decomposed into six components according to their permutation symmetry properties. There are several options for doing this; the decomposition we choose applies six linear operators, each a fixed linear combination of the index permutations $T_{ijk}, T_{jki}, T_{kij}, T_{jik}, T_{ikj}, T_{kji}$. In particular, the totally symmetric and totally antisymmetric components are
(6) $T^{S}_{ijk} = \frac{1}{6} \sum_{\sigma \in S_3} T_{\sigma(ijk)}, \qquad T^{A}_{ijk} = \frac{1}{6} \sum_{\sigma \in S_3} \mathrm{sgn}(\sigma)\, T_{\sigma(ijk)},$
and the remaining four components are analogous partial symmetrizations. We note that the linear mapping (6) is invertible and well-conditioned.
The symmetry properties of the six components are given in Table 1. The first two components, $T^{S}$ and $T^{A}$, are eigenvectors of all the permutation symmetries: the first is symmetric under all permutations, while the second is symmetric under cyclic (even) permutations and antisymmetric under acyclic (odd) ones. The next four components are eigenvectors of only a single permutation symmetry each, but all satisfy a Jacobi-like identity:
(7) $T_{ijk} + T_{jki} + T_{kij} = 0,$ i.e., the cyclic sum over the indices vanishes.
We use the images of these operators to define three linear subspaces of $\mathbb{R}^{n \times n \times n}$. The first two are the images of the totally symmetric and totally antisymmetric operators. The third subspace is the sum of the images of the last four operators, which is also equal to the kernel of the Jacobi identity (7). Direct calculation gives that, taken as subspaces of $\mathbb{R}^{n \times n \times n}$ with the Euclidean inner product associated with the Frobenius norm, the three spaces are pairwise orthogonal and span the full space.
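These claims are easy to verify numerically for a random cubical array; a sketch (the symmetrization operators below are the standard full and signed averages over index permutations):

```python
import numpy as np
from itertools import permutations

def parity(p):
    # Parity (+1/-1) of a permutation of (0, 1, 2).
    s, p = 1, list(p)
    for i in range(3):
        for j in range(i + 1, 3):
            if p[i] > p[j]:
                s = -s
    return s

rng = np.random.default_rng(2)
T = rng.normal(size=(3, 3, 3))
perms = list(permutations(range(3)))

S = sum(np.transpose(T, p) for p in perms) / 6.0              # totally symmetric
A = sum(parity(p) * np.transpose(T, p) for p in perms) / 6.0  # totally antisymmetric
J = T - S - A                                                 # Jacobi remainder

# The remainder satisfies the Jacobi-like identity: its cyclic sum vanishes.
cyc = J + np.transpose(J, (1, 2, 0)) + np.transpose(J, (2, 0, 1))
assert np.allclose(cyc, 0.0)

# The three components are pairwise orthogonal in the Frobenius inner product.
for X, Y in [(S, A), (S, J), (A, J)]:
    assert abs(np.sum(X * Y)) < 1e-10
```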
Next, we construct diagonal trilinear approximations for each of these six components, which satisfy the relevant symmetries. The second component $T^{A}$ is approximated using the standard totally antisymmetric form, or standard triple product in $\mathbb{R}^3$, which is equal to $a \cdot (b \times c) = \det[a\; b\; c]$, with three-dimensional latent factors $a, b, c$. In the next two subsections, we approximate the other five components using a two-step process:
- In Section 4.2 we define a trilinear noncommutative mapping, which we shall denote by $t$, over a two-dimensional subspace of the real $2 \times 2$ matrices. As it is two-dimensional, it is hard to think of a simpler such mapping.
- In Section 4.3 we apply the symmetrizing operators of (6) to this mapping, obtaining a separate approximation for each of the five remaining components.
In Section 5 we provide numerical indications that each of these two steps improves the overall approximation of the chosen datasets.
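The standard triple product in $\mathbb{R}^3$ used for the totally antisymmetric component equals the determinant of the matrix whose columns are the three factors, and flips sign under any transposition; a quick check:

```python
import numpy as np

rng = np.random.default_rng(5)
a, b, c = rng.normal(size=(3, 3))

# The standard triple product a . (b x c) equals det[a b c] (columns a, b, c)...
tp = float(np.dot(a, np.cross(b, c)))
assert np.isclose(tp, np.linalg.det(np.column_stack([a, b, c])))

# ...and is totally antisymmetric: swapping any two arguments flips the sign.
assert np.isclose(float(np.dot(b, np.cross(a, c))), -tp)
```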
4.2 The space $\mathbb{C}^{\perp}$ and the operation $t$
Let us look for the simplest "atom" for the Jacobi components, that is, the simplest possible space supporting a noncommutative trilinear product. This space is the key component of our mathematical model. We note that the complex version of this space has been used in the computational physics of three-color quark models [Ker10].
A trilinear operation with one-dimensional real arguments must be commutative, and so such a space must have at least two-dimensional arguments. Noncommutativity and trilinearity lead us towards matrix multiplication as a representation.
Before we continue, let us recall two basic facts about the space of real $2 \times 2$ matrices $M_2(\mathbb{R})$. First, it is spanned by the identity matrix and the three (real representatives of the) Pauli spin matrices
$\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \epsilon = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},$
which are mutually orthogonal in the inner product associated with the Frobenius norm. In other words, together with the identity they form an orthogonal basis of $M_2(\mathbb{R})$. Second, the space of complex numbers $\mathbb{C}$ is isomorphic, via the Cayley-Dickson construction, to the space of real matrices of the form
$a I + b \epsilon = \begin{pmatrix} a & -b \\ b & a \end{pmatrix},$
with matrix multiplication corresponding to the product of complex numbers. In this subspace of $M_2(\mathbb{R})$, matrix multiplication is commutative.
With these facts in mind, we therefore turn to the orthogonal complement $\mathbb{C}^{\perp}$ of this representation of $\mathbb{C}$ to look for noncommutative trilinear operations. From the fact that $\{I, \sigma_1, \epsilon, \sigma_3\}$ is an orthogonal basis it immediately follows that $\mathbb{C}^{\perp}$ is the span of $\sigma_1$ and $\sigma_3$:
(8) $\mathbb{C}^{\perp} = \mathrm{span}\{\sigma_1, \sigma_3\} = \left\{ \begin{pmatrix} x & y \\ y & -x \end{pmatrix} : x, y \in \mathbb{R} \right\}.$
It is also the space of traceless symmetric real $2 \times 2$ matrices.
Additionally, for each ordered triplet $(a, b, c)$ of elements of $\mathbb{C}^{\perp}$, setting
(9) $a = a_1 \sigma_1 + a_3 \sigma_3,$
and similarly for $b$ and $c$, direct calculation shows that $\mathbb{C}^{\perp}$ is closed under the triple matrix product: $abc \in \mathbb{C}^{\perp}$. Hence, the mapping
(10) $t(a, b, c) = abc$
is a well-defined real trilinear operation. Considering commutativity, the product is symmetric with respect to exchange of the first and third arguments, but not under a permutation which changes the second argument (indeed, the algebra is defined as the two-dimensional Clifford algebra having one symmetric and one antisymmetric index):
(11) $t(a, b, c) = t(c, b, a).$
We note that $\mathbb{C}^{\perp}$ is not closed under the standard (binary) matrix multiplication: for $a, b \in \mathbb{C}^{\perp}$ we have $ab \in \mathbb{C}$, not $ab \in \mathbb{C}^{\perp}$. Therefore, $\mathbb{C}^{\perp}$ is not a group under matrix multiplication, and is hence not an algebra, but rather a ternary algebra. Similarly to the standard triple product in $\mathbb{R}^3$, the pair $(\mathbb{C}^{\perp}, t)$ is a purely third-order construct.
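The algebraic facts above (ternary closure, the partial symmetry (11), and the lack of binary closure) can be checked directly with the real matrices $\sigma_1$ and $\sigma_3$; a sketch:

```python
import numpy as np

# The two symmetric, traceless real Pauli matrices spanning C-perp.
s1 = np.array([[0.0, 1.0], [1.0, 0.0]])
s3 = np.array([[1.0, 0.0], [0.0, -1.0]])

rng = np.random.default_rng(3)

def elem():
    # A generic element x * s1 + z * s3: symmetric and traceless.
    x, z = rng.normal(size=2)
    return x * s1 + z * s3

a, b, c = elem(), elem(), elem()

# Ternary closure: the triple matrix product abc is again symmetric, traceless.
p = a @ b @ c
assert np.isclose(np.trace(p), 0.0) and np.allclose(p, p.T)

# Partial symmetry (11): t(a, b, c) = t(c, b, a).
assert np.allclose(a @ b @ c, c @ b @ a)

# No binary closure: s1 @ s3 is antisymmetric, i.e., it lies in the
# representation of C rather than in C-perp.
q = s1 @ s3
assert np.allclose(q, -q.T)
```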
4.3 Approximating the five components
Here, we approximate the symmetric and Jacobi components of $T$ using linear combinations of the form $t(a, b, c)$ on $\mathbb{C}^{\perp}$. Specifically, if the latent factor corresponding to item $i$ is $a_i = a_{i,1} \sigma_1 + a_{i,3} \sigma_3$, and similarly for $b_j$ and $c_k$, we apply the symmetrizing operators of eq. (6) to $t$ to obtain these components as explicit cubic polynomials of the coefficients. For example, the totally symmetric component is the full symmetrization
(12a) $t^{S}(a, b, c) = \frac{1}{6} \sum_{\sigma \in S_3} t(\sigma(a, b, c)) = \frac{1}{3} \left( t(a, b, c) + t(b, c, a) + t(c, a, b) \right),$
where the second equality uses the partial symmetry (11), and the four Jacobi components (12b)-(12e) are obtained by applying the corresponding partial symmetrizations of (6) to $t$ in the same way.
Importantly, the symmetry (11) of $t$ implies that the completely antisymmetric combination vanishes:
$\frac{1}{6} \sum_{\sigma \in S_3} \mathrm{sgn}(\sigma)\, t(\sigma(a, b, c)) = 0.$
This is reassuring, as the Jacobi and symmetric components are orthogonal to the antisymmetric component.
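The vanishing of the completely antisymmetric combination follows from (11), since the six permutations cancel in pairs related by exchanging the first and third arguments; a numerical sketch:

```python
import numpy as np
from itertools import permutations

s1 = np.array([[0.0, 1.0], [1.0, 0.0]])
s3 = np.array([[1.0, 0.0], [0.0, -1.0]])
rng = np.random.default_rng(4)

def elem():
    x, z = rng.normal(size=2)
    return x * s1 + z * s3

def t(a, b, c):
    # The ternary product on C-perp: a plain triple matrix product.
    return a @ b @ c

def sign(p):
    # Parity of a permutation of (0, 1, 2).
    s, p = 1, list(p)
    for i in range(3):
        for j in range(i + 1, 3):
            if p[i] > p[j]:
                s = -s
    return s

args = (elem(), elem(), elem())
antisym = sum(sign(p) * t(*(args[i] for i in p)) for p in permutations(range(3)))
assert np.allclose(antisym, 0.0)
```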
4.4 The general cuboid case
The previous subsections dealt with a cubical array. We shall reuse the same model in the general cuboid case as is, without any formal justification. The intuition behind this is that the previous derivation applies to modeling the relations of co-clusters (aka "colors"), which can be cubical even if the approximated tensor is a cuboid. The ultimate judge is, of course, empirical evidence.
Therefore, combining the results of this section, given a three-dimensional (cuboid) array of real numbers $T \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, we approximate it as
(13) $\hat{T}_{ijk} = \mu + \beta_i + \beta_j + \beta_k + \lambda_{A} \det\left[ a^{A}_{i}\; b^{A}_{j}\; c^{A}_{k} \right] + \sum_{\mathrm{sym}} \lambda_{\mathrm{sym}}\, t^{\mathrm{sym}}\!\left( a^{\mathrm{sym}}_{i}, b^{\mathrm{sym}}_{j}, c^{\mathrm{sym}}_{k} \right),$
where $\mu, \beta_i, \beta_j, \beta_k$ are the corresponding bias terms, the operators $t^{\mathrm{sym}}$ are as defined in (12), $\det[\cdot]$ is the standard triple product in $\mathbb{R}^3$, and the quantities $\lambda$, which generalize singular values, imply summation over the $\sigma_1$ and $\sigma_3$ components.
Equation (13) is the concrete, explicit model of the general form (3), and is the key result of this paper. Note that this approximation is as close to diagonal as possible while still being noncommutative, i.e., while differentiating between different permutations of the latent factors, as required by Conjecture 1.
5 Numerical Experiments
Here we present the results of numerical experiments on two public datasets: the MovieLens dataset [Gro14] and the Fannie Mae Single-Family Loan Performance dataset [Mae14]. The goal of the experiments was a comparison of the expansion (13) with the standard CP model, rather than obtaining the optimal model for each dataset. In both cases we used a binary response variable and a logistic-regression model, so that the probability of a positive event is modeled by (4) and training consists of solving the minimization problem (5); see Section 2.
5.1 Benchmark Approximations
Five benchmark approximations of the log-odds were compared:
- A bias-only method, which is equivalent to a Naive Bayes approximation. The total log-odds bias and the relative biases for each entity $e$ of each factor were estimated as empirical log-odds
(14) $b_0 = \log(N^{+}/N^{-}), \qquad b_e = \log(n^{+}_e/n^{-}_e) - b_0,$
where $N^{+}, N^{-}$ are the total counts of positive and negative events in the training set and $n^{+}_e, n^{-}_e$ are the same counts for each entity $e$.
- The standard CP approximation (1) with a latent dimension equal to that of the NCLF method.
- The standard CP with the best latent dimension for each of the MovieLens and Fannie Mae datasets. The best dimensions were chosen via nine-fold cross-validation.
- In order to test the utility of the derivation of Section 4.3, i.e., of using the separate approximations (12) for each of the five components, we also benchmark a "primitive" NCLF approximation given by
(15) $\hat{T}_{ijk} = \mu + \beta_i + \beta_j + \beta_k + \lambda_{A} \det\left[ a^{A}_{i}\; b^{A}_{j}\; c^{A}_{k} \right] + \lambda\, t\!\left( a_{i}, b_{j}, c_{k} \right).$
This approximation explicitly models only the totally antisymmetric component, while using the primitive operation $t$ instead of modeling each of the five remaining components separately. We recall that $t$ has the partial symmetry (11). This implies that the partially antisymmetric components are not approximated by (15), while the rest of the components are.
- The proposed NCLF method, wherein the approximation is given by (13), and each of the components has a single latent factor.
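A minimal sketch of the empirical log-odds biases of the bias-only benchmark, with hypothetical counts (the exact form of (14) is an assumption here):

```python
import math

# Hypothetical counts for illustration: N_pos/N_neg are the total positive and
# negative training events; n_pos_e/n_neg_e are the counts for one entity e.
N_pos, N_neg = 300, 700
n_pos_e, n_neg_e = 30, 20

b0 = math.log(N_pos / N_neg)              # total log-odds bias
b_e = math.log(n_pos_e / n_neg_e) - b0    # relative bias of entity e

assert b0 < 0   # negatives dominate globally
assert b_e > 0  # entity e is positively biased relative to the total
```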
Models were trained using the momentum variant of the Stochastic Gradient Descent (SGD) method, with decreasing step sizes. In all of the approximations 1-5, the bias terms were taken to be identical. Specifically, they were not trained by SGD but rather chosen, before the SGD runs, by (14). The parallel factors were regularized using the $\ell_2$ norm, using nine-fold cross-validation to pick the regularization parameter, and cross-validation to measure the performance of the best configuration.
5.2 The Datasets
The MovieLens dataset [Gro14] contains a million user ratings of movies on a scale of one to five. High ratings were considered to be positive events, and lower ratings negative events; both negative and positive rating events were considered. The three factors we consider are those of user, item and hour of week (totaling 168 bins).
The Fannie Mae Single-Family Loan Performance dataset [Mae14] is a publicly available dataset which, at the time of submission, holds fixed-rate prime mortgage acquisition and performance data, at monthly resolution, for the period from January 1999 through June 2013, inclusive. Only first-time home buyers whose loan purpose was buying or undefined were considered. The three factors chosen were credit score, property location (denoted by property state and 3-digit zip code), and origination month. We chose not to group or smooth different values of credit scores or time periods longer than a month, so as not to make the prediction problem easier. A mortgage was considered to have defaulted if delinquent more than 150 days over the full period. Non-default events were uniformly downsampled; both default and non-default acquisition events were considered.
5.3 Results
Table 2: Cross-validation performance of the five approximations of Section 5.1 on the MovieLens dataset. For each method (1 bias only; 2 CP with the NCLF latent dimension; 3 best CP; 4 primitive NCLF; 5 NCLF) the table reports the AUC, L1 and L2 errors together with sample standard errors. The last row gives the absolute difference between the best-rank CP and the NCLF.
Table 3: The same cross-validation metrics for the Fannie Mae dataset.
Cross-validation performance of the five approximations of Section 5.1 applied to the MovieLens dataset is given in Table 2, and their performance on the Fannie Mae dataset is given in Table 3. In both cases, we see that the NCLF model considerably outperforms the standard CP model of the same latent dimension, and significantly outperforms CP models of lower dimensions, as measured by all metrics: AUC, $L_1$ error and $L_2$ error.
The numerical experiments therefore strongly corroborate Conjecture 1, at least for these datasets and with the SGD numerical method: under these assumptions, noncommutative latent factors outperform the standard CP.
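As a sketch of the momentum SGD with decreasing step sizes used for training (all hyperparameters here are illustrative, not the paper's):

```python
import numpy as np

def sgd_momentum(grad, w0, steps=300, lr0=0.1, beta=0.9, decay=1e-3):
    # Momentum SGD with a slowly decreasing step size lr0 / (1 + decay * t).
    w = np.array(w0, dtype=float)
    v = np.zeros_like(w)
    for t in range(steps):
        v = beta * v + grad(w)
        w = w - (lr0 / (1.0 + decay * t)) * v
    return w

# Toy check on a strongly convex quadratic: the minimizer of (w - 3)^2 is w = 3.
w = sgd_momentum(lambda w: 2.0 * (w - 3.0), w0=[0.0])
assert np.allclose(w, 3.0, atol=1e-3)
```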
6 Discussion and Future Directions
In this study, we develop a novel tensor-completion method for three-way arrays, which is both diagonal and built upon noncommutative latent factors. In order to do this, we apply symmetrizing operations to the simplest noncommutative purely trilinear operation we could find: the three-matrix product on a two-dimensional space. We test our model and numerical method on a binary-response supervised-learning problem from two publicly available datasets, finding that it outperforms the CP model.
The specific application we are interested in is modeling sparse, large-scale three-way relations in the supervised-learning setting, i.e., in three-way CF problems. However, we find no a priori reason that this model may not be extended to a broader setting. Some future avenues for research include:
- Unsupervised learning: An interesting question is if and how much a noncommutative model may be used to discover noncommutative patterns in three-way-relation data. The intuitions leading to its development in Section 3 should still apply.
- (Dense) tensor factorization: A possible future direction may be the analysis of this model in the context of tensor factorization, i.e., of approximating a full tensor with no missing values. We note that in this setting there are Fourier-based generalizations of the SVD [KM11] in addition to the HOSVD of de Lathauwer et al., and a comparison of the three options may be interesting.
- Extension to quaternions: The space $\mathbb{C}^{\perp}$ is in fact a two-dimensional subspace of the ring of quaternions. One may consider applying the symmetrizing operators (6) to three-quaternion products instead of products on $\mathbb{C}^{\perp}$; in fact, this was the original direction of this work. The resulting approximation might be more expressive than NCLF, but would have double the latent dimension, and so be more likely to overfit. Nevertheless, in a world where the volume of data keeps increasing, such an extension might some day prove superior.
In summary, Non-Commuting Latent Factors present a simple, scalable extension of the CP model which outperforms it on the two datasets tried.
References
 [CLdA09] P. Comon, X. Luciani, and A. L. F. de Almeida. Tensor decompositions, alternating least squares and other tales. Journal of Chemometrics, 23:393–405, Aug. 2009.
 [dLdMV00] L. de Lathauwer, B. de Moor, and J. Vandewalle. A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl., 21:1253–1278, 2000.
 [dSL08] V. de Silva and L.H. Lim. Tensor rank and the illposedness of the best lowrank approximation problem. SIAM J. Matrix Anal. Appl., 30(3):1084–1127, 2008.
 [GKT13] L. Grasedyck, D. Kressner, and C. Tobler. A literature survey of low-rank tensor approximation techniques. arXiv:1302.7121, 2013.
 [Gro14] GroupLens. MovieLens dataset, 2014.
 [Hit27a] F. L. Hitchcock. The expression of a tensor or a polyadic as a sum of products. J. Math. Phys, 6(1):164–189, 1927.
 [Hit27b] F. L. Hitchcock. Multiple invariants and generalized rank of a p-way matrix or tensor. J. Math. Phys, 7(1):39–79, 1927.
 [KB09] T. G. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.
 [Ker10] R. Kerner. Cubic and ternary algebras, ternary symmetries, and the Lorentz group. RIMS Kokyuroku, volume 1705, pages 134–146, 2010.
 [KM11] Misha E. Kilmer and Carla D. Martin. Factorization strategies for thirdorder tensors. Linear Algebra Appl., 435(3):641–658, 2011.
 [Kol01] Tamara G Kolda. Orthogonal tensor decompositions. SIAM J. Matrix Anal. Appl., 23(1):243–255, 2001.
 [KSV] D. Kressner, M. Steinlechner, and B. Vandereycken. Low-rank tensor completion by Riemannian optimization. BIT Numerical Mathematics, pages 1–22.
 [Mae14] Fannie Mae. Fannie mae singlefamily loan performance data, 2014.
 [Tuc63] L.R. Tucker. Implications of factor analysis of threeway matrices for measurement of change. Problems in measuring change, pages 122–137, 1963.
 [Tuc64] L. R. Tucker. The extension of factor analysis to threedimensional matrices. Contributions to mathematical psychology, pages 109–127, 1964.
 [Tuc66] L. R. Tucker. Some mathematical notes on threemode factor analysis. Psychometrika, 31:279–311, 1966.