Matrix decompositions are critical enabling algorithms for scientific computing and data analysis applications across every field of the engineering, social, biological, and physical sciences. Of particular importance is the singular value decomposition (SVD), which provides a principled method for dimensionality reduction and computation of interpretable subspaces within which the data reside. So widespread is the usage of the algorithm, and minor modifications thereof, that it has generated a myriad of names across various communities, including Principal Component Analysis (PCA) [1], the Karhunen-Loève (KL) decomposition, the Hotelling transform [2, 3], Empirical Orthogonal Functions (EOFs) [4], and Proper Orthogonal Decomposition (POD) [5, 6]. However, in order to use the SVD, data, which generally may be of distinct dimensions, must be flattened into a matrix form, potentially compromising the statistical accuracy of the subspaces computed. Tensor decompositions are a generalization of the SVD concept to higher dimensions, allowing for $N$-way arrays ($N \ge 3$) of data to be decomposed into their constituent, low-rank subspaces without flattening, which is especially advantageous for categorical data types. It is often the case that one of the dimensions considered in the tensor is the time variable. In this paper, we develop a version of a tensor decomposition algorithm that restricts the time dynamics to analytically tractable solutions sparsely selected from a large, over-complete library of candidate functions. In so doing, we provide a more interpretable framework for the tensor modes in the decomposition process and analytic expressions for their associated time dynamics.
With the rise of data science and data-driven discovery, tensor decompositions are of increasing value and importance for characterizing the underlying structure and dimensionality of data [8, 9]. In applications, one of the important dimensions of the data set is a time variable which measures how the other quantities of interest evolve over a prescribed time course. A tensor decomposition produces the low-rank time variable evolution. However, the low-rank time modes are often complicated and noisy due to the form of the data itself. In contrast, we often expect simple and highly structured temporal signatures, whether it be oscillations of a prescribed frequency or exponential growth/decay of a signal, for instance. The natural remedy is to constrain the form of the temporal modes extracted from the tensor decomposition. By specifying an over-complete library of temporal functions, we are able to extract analytic forms for the best-fit temporal evolution of the data. The appropriate time behavior in our over-complete library is selected through sparse regression techniques so as to select a minimal, but most informative, set of time dynamics. This is highly advantageous for characterizing the structure of the data and for data-driven discovery of the underlying processes responsible for producing the dynamics observed.
Unlike the SVD for matrices, tensor decompositions are not unique, and there are a variety of decompositions that can be applied to $N$-way arrays ($N \ge 3$) of data. We consider the CANDECOMP/PARAFAC (CP) decomposition [10, 11] illustrated in Fig. 1, which represents $N$-way data as a sum of outer products of vectors. There are other decompositions available, including the Tucker tensor decomposition [7] and the recently developed tensor-based method [12] for Dynamic Mode Decomposition (DMD) [13]. The former method is widely used in the tensor community, while the latter provides a regression that enforces Fourier mode behavior in the time mode. All three methods fall short of our primary goal, which is to provide a tensor decomposition with analytically tractable time dynamics capable of modeling transient phenomena. The DMD algorithm solves the first part of this objective but fails in modeling transient phenomena. Although a multi-resolution DMD method has been proposed to handle transients [14], it has a multiple-pass architecture that sometimes struggles to extract spatio-temporal structures in a completely unsupervised manner. In this work, we demonstrate that the CP tensor decomposition can be modified to constrain the time mode to a broad range of analytic solutions that are selected from a large and over-complete library of candidate functions. By using sparse regression techniques, the best candidate functions are selected for representing the temporal dynamics. We call this technique Shape Constrained Tensor Decomposition (SCTD). The clear advantage of the SCTD over standard CP decompositions is that it gives analytic results which are readily interpretable. We demonstrate the method on a number of examples, including high-dimensional data sets built from Houston crime records and global temperature measurements.
The rest of the paper is organized as follows: Sec. II develops the basic mathematical architecture of the CP tensor decomposition and our refinement, the SCTD algorithm. This is followed by Sec. III in which we discuss practical details such as how to select tuning parameters and how to construct an appropriate over-complete library. Sec. IV tests the algorithm on simulated data, and Sec. V provides examples demonstrating the effectiveness of the algorithm on real-world data sets. Conclusions and an outlook for the SCTD algorithm are discussed in Sec. VI. For full details, all MATLAB and R codes used for this paper are available online at github.com/BethanyL/SCTD.
In this manuscript, we present a number of modifications to the standard CP tensor decomposition that are intuitively appealing and improve interpretability of the low-rank modes extracted from data. Specifically, we introduce an over-complete library of temporal responses that constrains the time mode dynamics. A sparsity-promoting algorithm further selects a small number of these modes to represent the data. Thus the procedure can be thought of as a sparsity-promoting, constrained optimization problem.
The SCTD method is illustrated at a high level in Figs. 2 and 3. Fig. 2 shows the selection process whereby a small number of modes from an over-complete library of temporal functions are selected to best represent the temporal evolution of the data. We rely on an optimization procedure so as to obtain a sparse representation of the temporal dynamics. The algorithm thus restricts the temporal mode in the decomposition. While the temporal functions populating the library may be arbitrary, the power of the SCTD relies on the fact that temporal dynamics are typically far from arbitrary. Fig. 3 illustrates some temporal functions that serve as prototypes that characterize real-world temporal dynamics. Note that some of these functions are ideally suited for handling transient dynamics. Indeed, the success of the method is directly related to the temporal library functions included in the regression procedure.
A specific demonstration of the SCTD is shown in Fig. 4. In this example, three different spatial mode structures are combined with three specific time dynamics. The imposed time dynamics are representative of simple functional forms that are often difficult for the standard CP or DMD methods to model or resolve, i.e., temporal responses that have finite time windows of activity. In this example, the sequence of data snapshots is gathered into a 3-way data tensor $\mathcal{X}$. Different snapshots of the dynamics depict the spatial structure arising from the combination of the different modal structures. The objective of the SCTD is to solve the inverse problem: given the data tensor $\mathcal{X}$, find the low-rank decomposition that correctly reconstructs the spatial modes and their time dynamics. The algorithm proposed here, which is based on the CP decomposition, can indeed recover the three modes and their time dynamics, as shown in Fig. 4.
In the subsections that follow, the technical details of the CP tensor decomposition algorithm are considered along with strategies for building an over-complete library and enforcing a parsimonious combination of temporal dynamics prototypes.
II-A CP Tensor Decompositions
We begin by first reviewing some useful notation. We denote the $k$th column of a matrix $\mathbf{A}$ by $\mathbf{a}_k$. Given matrices $\mathbf{A} \in \mathbb{R}^{I \times K}$ and $\mathbf{B} \in \mathbb{R}^{J \times K}$, their Khatri-Rao product is denoted by $\mathbf{A} \odot \mathbf{B}$ and is defined to be the $IJ \times K$ matrix of column-wise Kronecker products, namely $\mathbf{A} \odot \mathbf{B} = [\mathbf{a}_1 \otimes \mathbf{b}_1 \;\; \cdots \;\; \mathbf{a}_K \otimes \mathbf{b}_K]$.
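As a concrete sanity check of the notation, the Khatri-Rao product is straightforward to implement. The sketch below uses Python/NumPy (the paper's released code is in MATLAB and R, so this is an illustration rather than the authors' implementation):

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: (I x K) and (J x K) -> (I*J x K)."""
    I, K = A.shape
    J, K2 = B.shape
    assert K == K2, "A and B must have the same number of columns"
    # T[i, j, k] = A[i, k] * B[j, k]; flattening (i, j) row-major gives
    # kron(A[:, k], B[:, k]) in column k.
    return np.einsum('ik,jk->ijk', A, B).reshape(I * J, K)
```

Each column of the result equals `np.kron(A[:, k], B[:, k])`.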
For an $N$-way tensor $\mathcal{X}$ of size $I_1 \times \cdots \times I_N$, we denote its $(i_1, \ldots, i_N)$ entry by $x_{i_1 \cdots i_N}$. The inner product between two $N$-way tensors $\mathcal{X}$ and $\mathcal{Y}$ of compatible dimensions is given by $\langle \mathcal{X}, \mathcal{Y} \rangle = \sum_{i_1, \ldots, i_N} x_{i_1 \cdots i_N}\, y_{i_1 \cdots i_N}$.
The Frobenius norm of a tensor $\mathcal{X}$, denoted by $\|\mathcal{X}\|$, is the square root of the inner product of $\mathcal{X}$ with itself, namely $\|\mathcal{X}\| = \sqrt{\langle \mathcal{X}, \mathcal{X} \rangle}$. Finally, the mode-$n$ matricization or unfolding of a tensor $\mathcal{X}$ is denoted by $\mathbf{X}_{(n)}$.
The CP decomposition approximates the data as a sum of $R$ rank-1 terms, $\mathcal{X} \approx \sum_{r=1}^{R} \lambda_r \, \mathbf{a}^{(1)}_r \circ \cdots \circ \mathbf{a}^{(N)}_r$, where $\circ$ represents the outer product and $\mathbf{a}^{(n)}_r$ represents the $r$th column of the factor matrix $\mathbf{A}^{(n)}$ of size $I_n \times R$. We refer to each summand as a component. Assuming each factor matrix has been column-normalized to have unit Euclidean length, we refer to the $\lambda_r$'s as weights. We will use the shorthand notation $\mathcal{X} \approx [\![ \boldsymbol{\lambda}; \mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)} ]\!]$, where $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_R)$. A tensor that has a CP decomposition is sometimes referred to as a Kruskal tensor.
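In code, a Kruskal tensor is simply a weighted sum of outer products of factor columns. A minimal 3-way version (NumPy; the helper name is ours, not from the paper's codebase):

```python
import numpy as np

def kruskal_tensor(weights, factors):
    """Assemble a 3-way CP (Kruskal) tensor sum_r lambda_r a_r o b_r o c_r
    from weights (length R) and factor matrices A, B, C with R columns each."""
    A, B, C = factors
    # einsum over the shared component index r implements the sum of outer products
    return np.einsum('r,ir,jr,kr->ijk', weights, A, B, C)
```

The same contraction generalizes to $N$ modes by extending the subscript string.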
For the rest of this article, we consider a 3-way tensor $\mathcal{X}$ where two modes index state variation and the third mode indexes time variation.
Let $\mathbf{A}$ and $\mathbf{B}$ denote the factor matrices corresponding to the two state modes and $\mathbf{C}$ denote the factor matrix corresponding to the time mode. This 3-way decomposition is illustrated in Fig. 1.
II-B Sparse Representations in Over-Complete Libraries
We can further impose structure on the factors of the low-rank decomposition. For example, we could impose sparsity and smoothness on the factors [16, 17]. Here we assume that each time mode can be coded as a sparse linear combination of columns from a known over-complete library $\boldsymbol{\Theta} \in \mathbb{R}^{T \times p}$, namely $\mathbf{c}_r = \boldsymbol{\Theta} \boldsymbol{\beta}_r$,
where the elements of $\boldsymbol{\beta}_r \in \mathbb{R}^{p}$ are predominantly zero, i.e., $\|\boldsymbol{\beta}_r\|_0 \ll p$. This setup can be thought of as a sparse version of CANDELINC (canonical decomposition with linear constraints) [18]. Figs. 2 and 3 show both the constrained decomposition and some example library functions used.
We seek the CP model that maximizes a penalized correlation with the data tensor
Here the matrix norm $\|\cdot\|_1$ is the sum of the absolute values of all the entries, not the induced matrix 1-norm. Thus, the non-negative parameter $\gamma$ trades off the degree of correlation of the model to the data against the sparsity level in the loadings $\boldsymbol{\beta}_r$. The inequality constraints are added to ensure that the feasible set of the optimization problem is compact, so the Bolzano-Weierstrass theorem ensures that a solution to the problem exists. Without these constraints, the optimization problem is not well posed, as there is no global maximum.
If we additionally fix the scale of the model at some constant, then maximizing the penalized correlation is equivalent to minimizing the penalized squared error
Thus, we see that the optimization problem given in (2) is closely related to a special case of the sparse coding problem [19], where we seek to learn sparse coefficients as well as a dictionary matrix which must obey rather strong structural constraints.
We now describe an algorithm for computing our structured low-rank approximation. Note that the CP constraint renders the optimization problem in (2) non-convex. Note, however, that if we fix all but one of the block variables $\mathbf{a}_r$, $\mathbf{b}_r$, or $\boldsymbol{\beta}_r$, the optimization problem in the remaining block is a straightforward concave optimization whose solution can be written in closed form. Thus, we propose a block coordinate ascent (BCA) algorithm.
One complication of adopting a BCA algorithm is that the updates for the factor matrices $\mathbf{A}$ and $\mathbf{B}$ each separate into identical optimization problems. To be explicit, consider the problem of updating $\mathbf{A}$ when $\mathbf{B}$ and the time factors are fixed.
Consequently, solving for the entire factor matrix at once will yield identical columns. To deal with this degeneracy, we construct the decomposition via deflation. The idea is to find the most correlated rank-1 tensor and then subtract it from the data tensor. We then repeat the procedure on the residual tensor.
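The deflation idea can be sketched in a few lines. The sketch below (NumPy) finds an approximately most-correlated rank-1 tensor via a plain higher-order power iteration rather than the full library-constrained update, then subtracts it from the running residual; it is a simplified stand-in for the procedure described here, not the SCTD algorithm itself:

```python
import numpy as np

def rank1_power(X, iters=50):
    """Rank-1 approximation of a 3-way tensor via higher-order power
    iteration (an ALS sweep with a single component)."""
    I, J, K = X.shape
    rng = np.random.default_rng(0)
    a = rng.standard_normal(I); a /= np.linalg.norm(a)
    b = rng.standard_normal(J); b /= np.linalg.norm(b)
    c = rng.standard_normal(K); c /= np.linalg.norm(c)
    for _ in range(iters):
        a = np.einsum('ijk,j,k->i', X, b, c); a /= np.linalg.norm(a)
        b = np.einsum('ijk,i,k->j', X, a, c); b /= np.linalg.norm(b)
        c = np.einsum('ijk,i,j->k', X, a, b); c /= np.linalg.norm(c)
    lam = np.einsum('ijk,i,j,k->', X, a, b, c)  # correlation with the data
    return lam, a, b, c

def deflate(X, R):
    """Greedy deflation: peel off R rank-1 components one at a time."""
    residual = X.copy()
    components = []
    for _ in range(R):
        lam, a, b, c = rank1_power(residual)
        components.append((lam, a, b, c))
        residual = residual - lam * np.einsum('i,j,k->ijk', a, b, c)
    return components, residual
```

For an exactly rank-1 tensor, one deflation round drives the residual to (numerical) zero.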
We are now ready to summarize the BCA algorithm at a high level. Suppose we have completed $r - 1$ rounds so far and let $\mathcal{R}_r$ denote the current residual, namely the data tensor minus the $r - 1$ rank-1 components extracted so far. At the $r$th round, we solve the following optimization problem.
We again solve the above maximization problem with block coordinate ascent. Once BCA has converged, we determine the weight $\lambda_r$ by solving a scalar optimization problem whose solution is available in closed form.
We now detail the updates for the factors $\mathbf{a}_r$ and $\mathbf{b}_r$. The update for $\mathbf{a}_r$ multiplies a matricization of the residual tensor against the remaining factors and then simply requires normalizing the result to unit length. The update for $\mathbf{b}_r$ is analogous, using the other state-mode matricization, and again simply requires normalizing.
The derivation of this update rule is given in the Appendix. We then update $\boldsymbol{\beta}_r$, followed by calculating the next residual $\mathcal{R}_{r+1}$.
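The workhorse inside the $\boldsymbol{\beta}_r$ update is an $\ell_1$-penalized least-squares (lasso) solve. One generic way to carry this out, which we use here as an illustrative stand-in rather than the paper's exact routine, is the iterative soft-thresholding algorithm (ISTA):

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of the l1 norm: shrink each entry toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(Theta, y, gamma, iters=500):
    """Solve min_beta 0.5*||y - Theta beta||^2 + gamma*||beta||_1 via ISTA:
    a gradient step on the smooth term followed by soft-thresholding."""
    L = np.linalg.norm(Theta, 2) ** 2          # Lipschitz constant of the gradient
    beta = np.zeros(Theta.shape[1])
    for _ in range(iters):
        grad = Theta.T @ (Theta @ beta - y)
        beta = soft_threshold(beta - grad / L, gamma / L)
    return beta
```

With an orthonormal library the fixed point is the entrywise soft-thresholded correlation, which matches the KKT analysis in the Appendix.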
III Details of the SCTD Algorithm
Now that we are familiar with the basic procedure, we next discuss important details of the SCTD. We first describe how the sparsity-inducing parameter $\gamma$ is chosen. We then describe how the over-complete library is constructed.
III-A Picking the Regularization Parameter
At each iteration, we use the Bayesian Information Criterion (BIC) [21] to pick the regularization parameter $\gamma$. The BIC is a quantitative score that balances how well the model fits the data against how complicated the model is. In the context of the SCTD, a constrained rank-1 Kruskal tensor with low BIC corresponds to a rank-1 Kruskal tensor which fits the data well in light of how many free parameters were used in fitting it. As defined in [21], for this problem, the BIC criterion is $\mathrm{BIC}(\gamma) = T \log(\mathrm{RSS}_\gamma / T) + k_\gamma \log T$,
where $k_\gamma$ is the number of non-zero elements of $\boldsymbol{\beta}_r$. This form arises because each $\boldsymbol{\beta}_r$ update is an $\ell_1$-norm penalized regression problem.
We use the BIC criterion to pick the best $\gamma$ from a range of options. We further refine the value of $\gamma$ by checking the neighborhood of the current best option until the neighborhood is sufficiently small or the BIC curve is sufficiently constant on that neighborhood. An upper limit for $\gamma$ is the point at which all entries of $\boldsymbol{\beta}_r$ are zero. Since we accumulate small amounts of error on each iteration, we wish to encourage increasing levels of sparsity as the round $r$ increases. We thus use the previous round's choice of $\gamma$ as a lower bound for the current round, unless it is greater than the current upper bound.
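A possible implementation of this model-selection loop is sketched below (NumPy). The lasso-style BIC form $T\log(\mathrm{RSS}/T) + k\log T$ and the closed-form orthonormal-library solver are our assumptions for illustration; the paper's exact scoring may differ:

```python
import numpy as np

def ortho_lasso(Theta, y, gamma):
    """Closed-form lasso solution, valid when Theta has orthonormal columns."""
    z = Theta.T @ y
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def bic_score(rss, k, T):
    """BIC-style score: fit term plus a k*log(T) complexity penalty
    (an assumed lasso-BIC form, not taken verbatim from the paper)."""
    return T * np.log(max(rss, 1e-12) / T) + k * np.log(T)

def pick_gamma(Theta, y, gammas, solver=ortho_lasso):
    """Fit the sparse coding problem for each candidate gamma and return
    the (gamma, beta) pair minimizing the BIC."""
    T = len(y)
    best = None
    for g in gammas:
        beta = solver(Theta, y, g)
        rss = float(np.sum((y - Theta @ beta) ** 2))
        score = bic_score(rss, int(np.count_nonzero(beta)), T)
        if best is None or score < best[0]:
            best = (score, g, beta)
    return best[1], best[2]
```

In a typical run, very small $\gamma$ is penalized for fitting noise with many prototypes, very large $\gamma$ for fitting nothing, and an intermediate sparse solution wins.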
III-B Constructing the Over-Complete Library
We can choose an over-complete library based on knowledge of the application area or data set. Some natural candidates are displayed in Fig. 3. If we expect periodic but transient dynamics, we may choose to populate the library with windowed sines and cosines, varying the frequencies of the sines and cosines and the widths and shifts of the windows. If we anticipate transient phenomena that are non-periodic, it may be appropriate to include Gaussians with a range of means and variances. If the time domain itself is periodic, such as hour of the day, then we might improve the results by including dynamics that share this period. For example, to allow a Gaussian-like mode to vary smoothly through the night, we could generate cosines with varying frequencies and shifts, but include only one period of each cosine (see “wrapped cosines” in Fig. 3).
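A library of this kind is easy to generate programmatically. The sketch below (NumPy; the parameter grids are illustrative choices, not the paper's) builds unit-norm columns of Gaussians, windowed sines/cosines, and wrapped cosines:

```python
import numpy as np

def build_library(t, freqs, centers, widths):
    """Assemble an over-complete library of candidate time dynamics:
    Gaussian bumps, windowed sines/cosines, and 'wrapped' cosines spanning
    one period of the time domain. Columns are normalized to unit length."""
    cols = []
    for c in centers:
        for w in widths:
            window = np.exp(-0.5 * ((t - c) / w) ** 2)
            cols.append(window)                                  # Gaussian bump
            for f in freqs:
                cols.append(window * np.sin(2 * np.pi * f * t))  # windowed sine
                cols.append(window * np.cos(2 * np.pi * f * t))  # windowed cosine
    T = t[-1] - t[0]
    for shift in centers:
        # exactly one period of a cosine over the (periodic) time domain
        cols.append(np.cos(2 * np.pi * (t - shift) / T))
    Theta = np.column_stack(cols)
    norms = np.linalg.norm(Theta, axis=0)
    return Theta / np.where(norms > 0, norms, 1.0)
```

Denser grids of frequencies, centers, and widths make the library larger and more over-complete.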
IV Simulation Experiments
We begin by testing the SCTD on a simulated data set similar to Fig. 4. Recall that this data set is composed of three spatio-temporal modes (specifically, it is a Kruskal tensor with rank three). We can think of this data set as a video or sequence of frames. Our goal is to decompose it into three modes (a rank-three Kruskal tensor) with an analytic description of the temporal dimension. Although the SCTD performs exceptionally well on the data in Fig. 4, the example is limited since no noise was included in the data.
We next consider a more realistic example shown in Fig. 5. In this experiment, we added white Gaussian noise with a prescribed standard deviation in the frequency domain to the data $\mathcal{X}$, resulting in a signal-to-noise ratio of 0.1374, where the signal-to-noise ratio is defined as the ratio of the summed squared magnitude of the signal to the summed squared magnitude of the noise. The algorithm outlined in Sec. II can now be applied to the data, and a direct comparison can be made to a CP decomposition and a DMD reduction. In particular, for the CP decomposition, we use the CP_ALS function in the Matlab Tensor Toolbox [22, 23], which uses an alternating least squares algorithm. Fig. 5 shows that despite the inclusion of noise, the modes and temporal dynamics can be cleanly extracted using the SCTD. Indeed, analytic forms for the time dynamics can be discovered. In comparison, the CP algorithm gives a decomposition with noisy time modes which lack an analytic description. The DMD algorithm (using data flattening) can give analytic expressions for the time dynamics, but the temporal expressions are significantly flawed because DMD cannot handle such transient and/or intermittent time dynamics, i.e., only time dynamics of the form $e^{\omega t}$ are allowed.
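The signal-to-noise ratio as defined here is simple to compute, and by Parseval's theorem it can be evaluated equivalently from time-domain or frequency-domain representations (NumPy sketch; function name is ours):

```python
import numpy as np

def snr(signal, noise):
    """Signal-to-noise ratio as defined in the text: the summed squared
    magnitude of the signal divided by that of the noise."""
    return np.sum(np.abs(signal) ** 2) / np.sum(np.abs(noise) ** 2)
```

Because NumPy's unnormalized FFT scales total power by the same factor for both arguments, `snr(x, n)` equals `snr(np.fft.fft(x), np.fft.fft(n))`, which is convenient when the noise is injected in the frequency domain as in this experiment.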
The SCTD also provides a diagnostic for performing a low-rank truncation. For an SVD, the singular values provide the requisite metric for truncation. Similarly, Fig. 6 shows the decay of the reconstruction error as a function of the number of tensor modes. We can choose the rank, or number of components to keep in the SCTD, by considering the trade-off between error and complexity. Here we see diminishing returns in reconstruction error after the inclusion of the first three components, suggesting that a rank-3 approximation captures the majority of the systematic variation in the data.
To further explore the example shown in Fig. 5, we consider a number of different cases which highlight the use of the algorithm and the choice of library prototypes. Thus we consider the following:
Case (a): Library contains true modes. We start with an easy case. We construct a library with 3000 prototypes, including the true temporal modes, and we do not add noise to the data. We see in Fig. 7 and Table I that in two of the three iterations, the SCTD picks exactly one prototype, while in the remaining iteration, it picks 299 of the 3000 prototypes and accumulates a small amount of error.
Case (b): Library does not contain true modes. Next, we want to assess how robust the SCTD is to “model misspecification”: we construct another library of 3000 prototypes but do not “cheat” by including the true time dynamics. As we can see in Fig. 7 and Tab. I, in this experiment, the method uses extra prototypes and accumulates more error. However, the factor accuracy only drops from 0.989 to 0.948.
Case (c): Library does not contain true modes and the data is noisy. Finally, we increase the difficulty by adding white Gaussian noise to the data. The results are very similar to Case (b) without noise (see Fig. 7 and Tab. I). Note that although the resulting analytic expression combining hundreds of modes is not simple, if a simple analytic expression is desired, one can pick the prototype with the largest coefficient and still maintain accuracy. This top mode is plotted in green on top of the linear combination (blue) and the true mode (black) in Fig. 7.
(Table I reports, for each case, the number of prototypes selected, the relative error, the factor accuracy, and the selection frequency.)
Next, we consider Case (c) in more detail. So far, we have seen two examples of this case. In Fig. 5, the library contains 40,000 prototypes, while in Fig. 7 (and Tab. I), the library contains 3,000 prototypes; the two examples use different noise levels. We now consider the effect of the noise level $\sigma$ (Fig. 8) and the effect of the library size (Fig. 9).
In Fig. 8, we fix the library size to 50,000 and vary the noise level $\sigma$. As the magnitude of the noise increases, so does the error. However, this growth in error is slow when the error is measured against the original (noiseless) data. For context, see Fig. 5 for a visualization of the noisy data.
In Fig. 9, we consider data that has no noise and vary the library size. Once the library is sufficiently large, the relative error does not improve. However, the number of prototypes chosen grows.
V Real Data Examples
We now apply the SCTD to two real-world data sets exhibiting complex spatio-temporal dynamics with intermittency. These examples illustrate the power of the SCTD to produce interpretable results, especially in the constrained time dynamics. Figures were rendered with the ggmap and ggplot2 R packages [24, 25].
V-A Houston Crime
Data mining is beginning to be applied to a myriad of law enforcement problems [26]. These techniques can be used to help agencies deploy their employees more efficiently, predict the outcomes of new initiatives, and identify trends in crime in order to take preventative measures.
We apply the SCTD to a data set, collected by the Houston Police Department, with 85,622 crimes occurring in Houston from January to August 2010. We use a preprocessed version of the data included in the ggmap R package [24]. We create a 3-way tensor of counts of these crimes. The dimensions are type of crime (aggravated assault, auto theft, burglary, robbery, or theft), crime beat (118 options), and hour of day (0–23). We then apply the SCTD to this data set using a mix of the three types of functions displayed in Fig. 3: windowed sines and cosines, Gaussians, and wrapped cosines.
The first three components of the SCTD are displayed in Fig. 10. In the first component, most beats are at least lightly included, although some are more intense. Theft in the evening is especially emphasized. The second component adds non-theft crimes to a different set of hot spots and subtracts non-theft crime from some of the beats that were important in the first component. The third component re-emphasizes some of the same beats, this time adding burglary and subtracting theft in the morning and conversely adding theft and subtracting burglary in the evening.
In Fig. 11, we compare our results to the first three components from CP_APR (the non-negative CP tensor decomposition with alternating Poisson regression [27], as implemented in the Matlab Tensor Toolbox [22]). The results using the SCTD are much smoother and more interpretable in the time dimension, but many of the same beats are considered important. Again, the ability to produce analytically tractable time functions improves our ability both to interpret and to predict the dynamics associated with this complex, socially-inspired network.
V-B El Niño
Sensor and imaging technologies (oceanic, terrestrial, and satellite) have led to a significant increase in climate data and a limited but growing understanding of how to extract meaningful information from it. Interest in this interdisciplinary field has spawned, for example, the annual International Workshop on Climate Informatics [28].
We demonstrate the SCTD on a data set of sea surface temperatures. The data are freely available from the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA. We used the weekly sea surface temperature from the NOAA_OI_SST_V2 data set, which can be downloaded from http://www.esrl.noaa.gov/psd/. In particular, we consider the Pacific Ocean from 1995 through the end of 2000. We subtracted the background from the data using DMD [29], and then we created a library with a combination of Gaussians and windowed sines and cosines. Two of the first twelve modes that the SCTD extracted are shown in Fig. 12. The second component finds one-time phenomena related to the El Niño event of 1997–1998. In particular, we see unusually warm temperatures in the eastern Pacific Ocean, especially near Peru, but stretching almost to New Guinea. By mid-to-late 1997, unusually cool waters occurred near the coast of Australia. The third component finds annual variation in temperature, split over the equator.
These results could not be obtained with standard DMD because the El Niño event is not a Fourier mode, although recent innovations combining multi-resolution analysis with DMD (the multi-resolution DMD algorithm [14]) do allow for a significantly improved description. Likewise, while a traditional CP tensor decomposition might extract similar patterns, they would not be accompanied by a sparse analytic description. We show the sparse linear combination that the SCTD chooses from our over-complete library in Fig. 13.
Data-driven discovery has become ubiquitous across the sciences, leading to the rise of the fourth paradigm of scientific discovery [30]. Critical to meeting the challenges of this emerging paradigm is the development of algorithms that are capable of extracting meaningful and interpretable low-dimensional features from data that is high-dimensional and spans many distinct dimensions. The success of machine learning is largely due to its ability to represent data in low-dimensional feature spaces where data can be more effectively analyzed, classified, and clustered. Matrix decomposition techniques, which project to low-rank subspaces via some underlying optimization algorithm, are the workhorses of the data science industry. For instance, Principal Component Analysis is now standard across almost every field of the engineering, social, biological, and physical sciences. This SVD-based method provides a least-squares fit to the data, thus producing low-rank subspaces that best represent the features of the data.
The success of the SVD is difficult to overstate; it is simply the most dominant and successful matrix decomposition method in use today. The SVD requires, however, that multi-dimensional data first be flattened before being processed through the decomposition. This can lead to less parsimonious fitting than if the data were preserved in its original $N$-way tensor form. Tensor decompositions, on the other hand, allow the data to be preserved in its original multi-dimensional context, which is especially advantageous for categorical data. Although tensors have been the subject of active research for the past four decades, it has been difficult for tensor decompositions to displace the standard flatten-then-SVD approach. This is in part due to the multitude of potential tensor decompositions available to the practitioner, i.e., the decomposition is not unique. Moreover, the SVD has numerous enhancements for handling high-dimensional data, such as the randomized SVD [31, 32], which enables efficient computation of the matrix decomposition even for extraordinarily large data.
In this manuscript, we have developed what we believe is a highly useful innovation to the standard CP tensor decomposition. By constraining the time dimension of the tensor decomposition, a more intuitively appealing and interpretable decomposition can be achieved. Indeed, analytic solution forms for the time dependency of the data decomposition can be extracted. This is done by using an over-complete library of potential temporal functions and selecting the best candidate functions via sparse regression. This work merges three distinct mathematical methods: tensor decompositions, sparse regression, and over-complete libraries. The success of the SCTD method is demonstrated on a number of simulated problems and two real-world applications where preserving the tensor nature of the data is highly desirable and advantageous. The SCTD method provides a viable data-discovery algorithm that can be used in a host of settings where low-rank features of an $N$-way data tensor need to be analyzed. It should also be noted that one can easily envision constraining other dimensions of the data as well, not just the time dimension.
Ultimately, the most useful data analysis techniques are those that provide interpretable diagnostics and are also predictive in nature. The SCTD advances a theoretical framework for tensor decompositions that provides an intuitively appealing way to understand the rich time dynamics of low-rank decompositions without requiring data flattening. With the emergence of many categorical data structures, this can be especially appealing. Thus, we provide a tensor decomposition package that is user-friendly and aids in identifying important dynamic structures in data, including intermittent phenomena, which are very difficult for standard tensor, DMD, and PCA-like methods to deduce.
Matricization of a tensor: The mode-$n$ matricization or unfolding of a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ is denoted by $\mathbf{X}_{(n)}$ and is of size $I_n \times J$ where $J = \prod_{k \ne n} I_k$. In this case, the tensor element with index $(i_1, \ldots, i_N)$ maps to matrix element $(i_n, j)$ where $j = 1 + \sum_{k \ne n} (i_k - 1) J_k$ with $J_k = \prod_{m < k,\, m \ne n} I_m$.
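In NumPy (zero-based indexing), this mapping is exactly what a move-axis-then-Fortran-order reshape produces, giving a one-line unfolding consistent with the mode-$n$ matricization described above:

```python
import numpy as np

def unfold(X, n):
    """Mode-n matricization: bring mode n to the front, then reshape with
    column-major (Fortran) order so the remaining indices vary fastest in
    their original order, matching the standard index mapping."""
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order='F')
```

For a 3-way tensor of shape $(I_1, I_2, I_3)$, element $(i_1, i_2, i_3)$ of the mode-2 unfolding lands at row $i_2$, column $i_1 + i_3 I_1$ (zero-based).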
Derivation of update:
We prove that the update for $\boldsymbol{\beta}_r$ is the solution to the optimization problem given in (7).
The negative of the Lagrangian for (7) (to express the optimization as a minimization) is given by
The KKT conditions are given by
There are two cases to consider. In the first case, the pair satisfies the KKT conditions: the stationarity condition holds since $\boldsymbol{\beta}_r$ solves the lasso problem, and the other conditions are easily verified.
In the second case, the only condition that is not immediate to verify is stationarity. Since $\boldsymbol{\beta}_r$ is the solution to the lasso problem, the corresponding subgradient inequality holds for every component.
Therefore, in both cases the pair satisfies the KKT conditions. ∎
J. N. Kutz would like to acknowledge support from the Air Force Office of Scientific Research (FA9550-15-1-0385).
-  Karl Pearson. On lines and planes of closest fit to systems of points in space. Philosophical Magazine Series 6, 2(11):559–572, 1901.
-  Harold Hotelling. Analysis of a complex of statistical variables with principal components. Journal of Educational Psychology, 24:417–441, September 1933.
-  Harold Hotelling. Analysis of a complex of statistical variables with principal components. Journal of Educational Psychology, 24:498–520, October 1933.
-  Edward N. Lorenz. Empirical orthogonal functions and statistical weather prediction. Technical report, Massachusetts Institute of Technology, December 1956.
-  John L. Lumley. Stochastic tools in turbulence. Applied mathematics and mechanics. Academic press, New York, London, 1970.
-  Philip Holmes, John L. Lumley, and Gal Berkooz. Turbulence, coherent structures, dynamical systems, and symmetry. Cambridge monographs on mechanics. Cambridge University Press, Cambridge, England, 2nd edition, 2012.
-  Tamara G. Kolda and Brett W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, September 2009.
-  Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.
-  Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.
-  J. Douglas Carroll and Jih-Jie Chang. Analysis of individual differences in multidimensional scaling via an N-way generalization of “Eckart-Young” decomposition. Psychometrika, 35:283–319, 1970.
-  Richard A. Harshman. Foundations of the PARAFAC procedure: Models and conditions for an “explanatory” multi-modal factor analysis. UCLA working papers in phonetics, 16:1–84, 1970. Available at http://www.psychology.uwo.ca/faculty/harshman/wpppfac0.pdf.
-  Stefan Klus, Patrick Gelß, Sebastian Peitz, and Christof Schütte. Tensor based dynamic mode decomposition. arXiv:1512.06527 [math.NA], 2016.
-  Jonathan H. Tu, Clarence W. Rowley, Dirk M. Luchtenburg, Steven L. Brunton, and J. Nathan Kutz. On dynamic mode decomposition: Theory and applications. Journal of Computational Dynamics, 1(2):391–421, 2014.
-  J. Nathan Kutz, Xing Fu, and Steven L. Brunton. Multiresolution dynamic mode decomposition. SIAM Journal on Applied Dynamical Systems, 15(2):713–735, 2016.
-  Brett W. Bader and Tamara G. Kolda. Efficient MATLAB computations with sparse and factored tensors. SIAM Journal on Scientific Computing, 30(1):205–231, December 2007.
-  Daniela M. Witten, Robert Tibshirani, and Trevor Hastie. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10(3):515–534, 2009.
-  Genevera I. Allen. Sparse higher-order principal components analysis. In Proceedings of the 15th International Conference on Artificial Intelligence and Statistics, 2012.
-  J. Douglas Carroll, Sandra Pruzansky, and Joseph B. Kruskal. Candelinc: A general approach to multidimensional analysis of many-way arrays with linear constraints on parameters. Psychometrika, 45(1):3–24, 1980.
-  Bruno A. Olshausen and David J. Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37(23):3311–3325, 1997.
-  Eric C. Chi, Genevera I. Allen, Hua Zhou, Omid Kohannim, Kenneth Lange, and Paul M. Thompson. Imaging genetics via sparse canonical correlation analysis. In Biomedical Imaging (ISBI), 2013 IEEE 10th International Symposium on, pages 740–743, 2013.
-  Gideon Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6(2):461–464, March 1978.
-  Brett W. Bader, Tamara G. Kolda, et al. Matlab tensor toolbox version 2.6. Available online, February 2015.
-  Brett W. Bader and Tamara G. Kolda. Algorithm 862: MATLAB tensor classes for fast algorithm prototyping. ACM Transactions on Mathematical Software, 32(4):635–653, December 2006.
-  David Kahle and Hadley Wickham. ggmap: Spatial visualization with ggplot2. The R Journal, 5(1):144–161, 2013.
-  Hadley Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2009.
-  Colleen McCue. Data mining and predictive analysis: Intelligence gathering and crime analysis. Butterworth-Heinemann, 2014.
-  Eric C. Chi and Tamara G. Kolda. On tensors, sparsity, and nonnegative factorizations. SIAM Journal on Matrix Analysis and Applications, 33(4):1272–1299, December 2012.
-  Valliappa Lakshmanan, Eric Gilleland, Amy McGovern, and Martin Tingley, editors. Machine Learning and Data Mining Approaches to Climate Science, Proceedings of the 4th International Workshop on Climate Informatics. Springer, 2015.
-  Jacob Grosek and J. Nathan Kutz. Dynamic mode decomposition for real-time background/foreground separation in video. arXiv preprint arXiv:1404.7592, 2014.
-  Tony Hey, Stewart Tansley, and Kristin M. Tolle, editors. The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research, 2009.
-  Nathan Halko, Per-Gunnar Martinsson, and Joel A Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM review, 53(2):217–288, 2011.
-  N. B. Erichson, J. N. Kutz, S. L. Brunton, and S. Voronin. Randomized matrix decompositions using R. arXiv preprint arXiv:1608.02148, 2016.