Hybrid Kronecker Product Decomposition and Approximation

12/06/2019, by Chencheng Cai, et al., Rutgers University

Discovering the underlying low dimensional structure of high dimensional data has attracted a significant amount of research recently and has been shown to have a wide range of applications. As an effective dimension reduction tool, singular value decomposition is often used to analyze high dimensional matrices, which are traditionally assumed to have a low rank matrix approximation. In this paper, we propose a new approach. We assume a high dimensional matrix can be approximated by a sum of a small number of Kronecker products of matrices with potentially different configurations, named a hybrid Kronecker outer Product Approximation (hKoPA). It provides an extremely flexible way of dimension reduction compared to the low-rank matrix approximation. Challenges arise in estimating an hKoPA when the configurations of the component Kronecker products are different or unknown. We propose an estimation procedure when the set of configurations is given, and a joint configuration determination and component estimation procedure when the configurations are unknown. Specifically, a least squares backfitting algorithm is used when the configuration is given. When the configuration is unknown, an iterative greedy algorithm is used. Both simulation and real image examples show that the proposed algorithms have promising performance. The hybrid Kronecker product approximation may have potentially wider applications in low dimensional representation of high dimensional data.


1 Introduction

High dimensional data often has low dimensional structure that allows significant dimension reduction and compression. In applications such as data compression, image denoising and processing, and matrix completion, high dimensional matrices are often assumed to be of low rank and can be represented as a sum of several rank-one matrices (vector outer products) in a singular value decomposition (SVD) form.

Eckart and Young (1936) revealed the connection between singular value decomposition and low-rank matrix approximation. Recent studies include image low-rank approximation (Freund et al., 1999), principal component analysis (Wold et al., 1987; Zou et al., 2006), factorization in high dimensional time series (Lam and Yao, 2012; Yu et al., 2016), non-negative matrix factorization (Hoyer, 2004; Cai et al., 2009), matrix factorization for community detection (Zhang and Yeung, 2012; Yang and Leskovec, 2013; Le et al., 2016), matrix completion problems (Candès and Recht, 2009; Candes and Plan, 2010; Yuan and Zhang, 2016), low rank tensor approximation (Grasedyck et al., 2013), and machine learning applications (Guillamet and Vitrià, 2002; Pauca et al., 2004; Zhang et al., 2008; Sainath et al., 2013), among many others.

As an alternative to the vector outer product, the Kronecker product is another way to represent a high dimensional matrix with a much smaller number of elements. The decomposition of a high dimensional matrix into the sum of several Kronecker products of identical configuration is known as the Kronecker product decomposition (Van Loan and Pitsianis, 1993). Here configuration refers to the dimensions of the component matrices of the Kronecker product. The Kronecker product form appears in many fields including signal processing, image processing and quantum physics (Werner et al., 2008; Duarte and Baraniuk, 2012; Kaye et al., 2007), where the data has an intrinsic Kronecker product structure. Cai et al. (2019) consider modeling a high dimensional matrix with a sum of several Kronecker products of the same but unknown configuration. For a given configuration, the approximation by a sum of several Kronecker products can be turned into an approximation by a sum of several rank-one matrices after a rearrangement operation on the matrix elements (Van Loan and Pitsianis, 1993; Cai et al., 2019). The unknown configuration can be estimated by minimizing an information criterion as proposed by Cai et al. (2019).

However, it is often the case that a Kronecker outer Product Approximation (KoPA) using a single configuration requires a large number of terms to make the approximation accurate. By allowing a sum of Kronecker products of different configurations, an observed high dimensional matrix (image) can be approximated more effectively using a much smaller number of parameters (elements). We note that the observed matrix can often have a much more complex structure than a single Kronecker product can handle. For example, representing an image with Kronecker products of the same configuration is often not satisfactory since the configuration dimensions determine the block structure of the recovered image, similar to the pixel size of the image. A single configuration often cannot provide as much detail as needed, as seen from the top row of Figure 10 later. Similar to the extension from low rank matrix approximation to KoPA of a single configuration, we propose to extend the Kronecker product approach to allow for multiple configurations. It is more flexible and may provide a more accurate representation with a smaller number of parameters.

In this paper, we generalize the KoPA model in Cai et al. (2019) to a multi-term setting, where the observed high dimensional matrix is assumed to be generated from a sum of several Kronecker products of different configurations. We name the model hybrid KoPA (hKoPA). As a special case, when all the Kronecker products are vector outer products, hKoPA corresponds to a low rank matrix approximation.

We consider two problems in this paper. We first propose a procedure to estimate an hKoPA with a set of known configurations. The procedure is based on an iterative backfitting algorithm whose basic operation is finding the least squares single Kronecker product approximation of a given configuration to a given matrix. This operation can be carried out through a rearrangement operation and an SVD. Next, we consider the problem of determining the configurations in an hKoPA for a given observed matrix. As exploring the space of all possible configuration combinations is computationally expensive, we propose an iterative greedy algorithm similar to the boosting algorithm (Freund et al., 1999). In each iteration, a single Kronecker product term is added to the model by fitting the residual matrix from the previous iteration. The configuration of the added Kronecker product is determined using the procedure proposed in Cai et al. (2019). This algorithm efficiently fits an hKoPA model with a potentially sub-optimal solution as a compromise between computation and accuracy.

The rest of the paper is organized as follows. The hKoPA model is introduced and discussed in Section 2, with a set of identifiability assumptions. In Section 3, we provide the details of the iterative backfitting estimation procedure for the model with known configurations and the greedy algorithm to fit an hKoPA with unknown configurations. Section 4 demonstrates the performance of the proposed procedures with a simulation study and a real image example. Section 5 concludes.

Notations: For a matrix $A$, $\|A\|_F$ stands for its Frobenius norm and $\|A\|_2$ stands for its spectral norm, which corresponds to the largest singular value of $A$.

2 Hybrid Kronecker Product Model

For simplicity, in this paper we assume the dimensions of the observed matrix are powers of 2, that is, the observed matrix $Y$ is of dimension $2^{m_0} \times 2^{n_0}$. As a consequence, the dimensions of all the component matrices in hKoPA are also powers of 2. The model and procedures apply to general dimensions $M \times N$, for which the dimensions of the component matrices in hKoPA are then factors of $M$ and $N$, at the cost of more complex notation.

In a $K$-term hKoPA model, we assume that the observed matrix $Y$ is generated from the sum of $K$ Kronecker products of configurations $(m_k, n_k)$, $k = 1, \ldots, K$,

$$Y = \sum_{k=1}^{K} \lambda_k\, A_k \otimes B_k + \sigma E, \qquad (1)$$

where the matrix $A_k$ is of dimension $2^{m_k} \times 2^{n_k}$ and the matrix $B_k$ is of dimension $2^{m_0 - m_k} \times 2^{n_0 - n_k}$, for $k = 1, \ldots, K$. The matrix $E$ is the noise matrix with i.i.d. standard Gaussian entries. The operation $\otimes$ denotes the Kronecker product such that for any two matrices $A = (a_{ij})$ of dimension $m \times n$ and $B$, $A \otimes B$ is a block matrix defined by

$$A \otimes B = \begin{pmatrix} a_{11} B & \cdots & a_{1n} B \\ \vdots & \ddots & \vdots \\ a_{m1} B & \cdots & a_{mn} B \end{pmatrix},$$

where $a_{ij}$ is the $(i,j)$-th element of $A$. To better present the block structure of the Kronecker product, for any matrix $C = A \otimes B$, we denote by $C[i,j]$ the $(i,j)$-th block of $C$, so that $C[i,j] = a_{ij} B$. We refer to the dimensions $(2^{m_k} \times 2^{n_k},\ 2^{m_0 - m_k} \times 2^{n_0 - n_k})$ as the configuration of the Kronecker product $A_k \otimes B_k$. When $m_0$ and $n_0$ are fixed, we simplify the configuration notation to $(m_k, n_k)$.
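The block structure above can be checked directly in a few lines of numpy. The following sketch uses small, arbitrary illustrative dimensions (not taken from the paper) to verify that the $(i,j)$-th block of $A \otimes B$ equals $a_{ij} B$.

```python
# Minimal numpy check of the block structure of the Kronecker product.
# The matrix sizes here are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 4))   # 2^1 x 2^2, configuration (1, 2)
B = rng.standard_normal((4, 2))   # 2^2 x 2^1, so A kron B is 8 x 8
C = np.kron(A, B)

i, j = 1, 3                       # pick one block
block = C[i * 4:(i + 1) * 4, j * 2:(j + 1) * 2]
assert np.allclose(block, A[i, j] * B)   # the (i, j)-th block equals a_ij * B
```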

We denote the configuration of the hKoPA model in (1) as the collection $\{(m_k, n_k): k = 1, \ldots, K\}$. The Kronecker product components in (1) are allowed to have different configurations. When the model configuration is known, we need to estimate the coefficients $\lambda_k$ and the component matrices $A_k$ and $B_k$, for $k = 1, \ldots, K$, in model (1). When the configuration is unknown, the estimation of model (1) requires the determination of the configuration as well, resulting in a configuration determination problem in addition to the estimation problem.

Some existing research on Kronecker product structured data can be viewed as special cases of model (1). When $K = 1$ and the configuration is unknown, model (1) reduces to the single-term KoPA model investigated in Cai et al. (2019). When the configurations of the Kronecker products are known and equal such that $(m_1, n_1) = \cdots = (m_K, n_K)$, an estimation of model (1) is provided by the Kronecker product decomposition approximation (Van Loan and Pitsianis, 1993).

First we note that model (1) is not identifiable. Similar to the single-term KoPA model in Cai et al. (2019), Assumption 1 below is imposed so that $A_k$ and $B_k$ are normalized and the Kronecker products are in decreasing order of the coefficients $\lambda_k$, such that, given the values of the Kronecker product terms, $\lambda_k$, $A_k$ and $B_k$ can be uniquely determined, up to simultaneous sign changes of $A_k$ and $B_k$.

Assumption 1 (Identifiability Condition 1).

We assume $\|A_k\|_F = \|B_k\|_F = 1$ for $k = 1, \ldots, K$, and $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_K > 0$.

Notice that if a matrix $A_k$ of dimension $2^{m_k} \times 2^{n_k}$ is smaller than a matrix $A_l$ of dimension $2^{m_l} \times 2^{n_l}$ in that $m_k \leq m_l$ and $n_k \leq n_l$, then for any matrix $C$, we have

$$(A_k \otimes C) \otimes B = A_k \otimes (C \otimes B) \qquad (2)$$

for any $C$ of dimension $2^{m_l - m_k} \times 2^{n_l - n_k}$ and $B$ of dimension $2^{m_0 - m_l} \times 2^{n_0 - n_l}$. That is, a term whose larger component matrix has the form $A_k \otimes C$ can be absorbed into a term of configuration $(m_k, n_k)$. Therefore, an additional identifiability condition needs to be imposed on $A_k$ and $A_l$ when $m_k \leq m_l$ and $n_k \leq n_l$, as shown in the following Assumption 2.

Assumption 2 (Identifiability Condition 2).

For any $k, l$ such that $m_k \leq m_l$ and $n_k \leq n_l$, we assume

$$\langle A_l,\; A_k \otimes e_{ij} \rangle = 0$$

for all $i$ and $j$, where $e_{ij}$ denotes the $2^{m_l - m_k} \times 2^{n_l - n_k}$ matrix whose $(i,j)$-th element is 1 with all other elements being 0. Furthermore, if $m_k = m_l$ and $n_k = n_l$, we assume

$$\langle B_k, B_l \rangle = 0.$$

Assumption 2 is an orthogonality assumption on $A_k$ and $A_l$ when both dimensions of $A_k$ are less than or equal to the ones of $A_l$. If $A_k$ and $A_l$ do not satisfy the condition in Assumption 2, one can always perform an orthogonalization operation by finding a non-zero matrix $C$ of dimension $2^{m_l - m_k} \times 2^{n_l - n_k}$, whose $(i,j)$-th element is given by

$$C_{ij} = \frac{\langle A_l,\; A_k \otimes e_{ij} \rangle}{\|A_k\|_F^2}, \qquad (3)$$

such that Assumption 2 is satisfied for $A_k$ and $A_l - A_k \otimes C$. Note that $C$ in (3) is the least squares solution of $\min_C \|A_l - A_k \otimes C\|_F$. The procedure of orthogonalizing $A_k$ and $A_l$ can be generalized to multiple terms through the Gram-Schmidt process depicted in Figure 1. Note that the identifiability Assumption 2 can be replaced with the same condition on the $B_k$'s, but there is no need to impose the condition on both the $A_k$'s and the $B_k$'s.
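To make the orthogonalization step concrete, the following numpy sketch implements the least squares solution in (3) under the assumptions above: given a smaller component matrix and a larger one whose dimensions are multiples of the smaller, it removes from the larger matrix the part that can be written as a Kronecker product with the smaller one. The function name and the test matrices are illustrative, not from the paper.

```python
# Hedged sketch of the orthogonalization behind Assumption 2 and equation (3).
import numpy as np

def orthogonalize(A_small, A_big):
    p, q = A_small.shape
    P, Q = A_big.shape
    r, s = P // p, Q // q                      # dimensions of C
    # Least squares solution of min_C || A_big - A_small kron C ||_F
    blocks = A_big.reshape(p, r, q, s)         # blocks[u, i, v, j] = A_big[u*r+i, v*s+j]
    C = np.einsum('uv,uivj->ij', A_small, blocks) / np.sum(A_small ** 2)
    residual = A_big - np.kron(A_small, C)     # orthogonal to every A_small kron e_ij
    return residual, C

rng = np.random.default_rng(1)
A_small, A_big = rng.standard_normal((4, 4)), rng.standard_normal((16, 8))
R, C = orthogonalize(A_small, A_big)
# Check: <R, A_small kron e_ij> = 0 for all (i, j)
inner = np.einsum('uv,uivj->ij', A_small, R.reshape(4, 4, 4, 2))
assert np.allclose(inner, 0)
```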


Figure 1: Gram-Schmidt Process for the hKoPA Model
  • Let $\pi$ be a permutation of the set $\{1, \ldots, K\}$ such that

    1. for all ,

    2. for all such that .

  • Set , and .

  • For ,

    • Let .

    • Fit the following least squares problem

      and denote the solution as for .

    • Set

    • For , update

  • The resulting $\lambda_k$, $A_k$ and $B_k$, $k = 1, \ldots, K$, satisfy Assumptions 1 and 2.

3 Methodology

In this section, we propose the estimation procedures for fitting the hKoPA model in (1). Specifically, when the configurations are known, we adopt the backfitting approach (or alternating least squares approach) to fit the model. When the configurations are unknown, we propose a greedy approach that adds one Kronecker product component at a time.

3.1 Hybrid Kronecker Product Model with Known Configurations

When the configurations $(m_k, n_k)$, $k = 1, \ldots, K$, are known, we consider the following least squares problem:

$$\min_{\lambda_k, A_k, B_k} \Big\| Y - \sum_{k=1}^{K} \lambda_k\, A_k \otimes B_k \Big\|_F^2. \qquad (4)$$

When $K = 1$, such a problem can be solved by a singular value decomposition of a rearranged version of the matrix $Y$, as shown in Cai et al. (2019). Specifically, the rearrangement operation $\mathcal{R}_{m,n}[\cdot]$ reshapes the matrix $Y$ into a new matrix $\mathcal{R}_{m,n}[Y]$ whose rows are the vectorized blocks $\mathrm{vec}(Y[i,j])'$ of $Y$, where $Y[i,j]$ stands for the $(i,j)$-th block of matrix $Y$ of dimension $2^{m_0 - m} \times 2^{n_0 - n}$ and $\mathrm{vec}(\cdot)$ is the vectorization operation that flattens a matrix into a column vector. It is pointed out by Van Loan and Pitsianis (1993) and Cai et al. (2019) that the rearrangement operation transforms a Kronecker product into a vector outer product such that

$$\mathcal{R}_{m,n}[A \otimes B] = \mathrm{vec}(A)\, \mathrm{vec}(B)'.$$

This can be seen from the fact that all the elements of the matrix $A \otimes B$ are of the form $a_{ij} b_{kl}$, which are exactly the elements of $\mathrm{vec}(A)\,\mathrm{vec}(B)'$, where $a_{ij}$ is the $(i,j)$-th element of $A$ and $b_{kl}$ is the $(k,l)$-th element of $B$.

Therefore, the least squares optimization problem

$$\min_{\lambda, A, B} \| Y - \lambda\, A \otimes B \|_F^2$$

is equivalent to a rank-one matrix approximation problem, since

$$\| Y - \lambda\, A \otimes B \|_F = \| \mathcal{R}_{m,n}[Y] - \lambda\, \mathrm{vec}(A)\,\mathrm{vec}(B)' \|_F,$$

whose solution is given by the leading component of the singular value decomposition of $\mathcal{R}_{m,n}[Y]$, as proved by Eckart and Young (1936).
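The rearrangement operator and the single-term least squares fit can be sketched in numpy as follows. This is only an illustration under the definitions above (using numpy's row-major vectorization consistently); the function names are ours, not the authors'.

```python
# Sketch of the rearrangement operation R_{m,n}[.] and the single-term
# Kronecker product fit via SVD.  Row-major (numpy) vectorization is used
# consistently, which is enough for the identity R[A kron B] = vec(A) vec(B)'.
import numpy as np

def rearrange(Y, block_rows, block_cols):
    """Return the matrix whose rows are the vectorized blocks of Y,
    where each block has shape (block_rows, block_cols)."""
    M, N = Y.shape
    m, n = M // block_rows, N // block_cols          # number of blocks per axis
    blocks = Y.reshape(m, block_rows, n, block_cols).transpose(0, 2, 1, 3)
    return blocks.reshape(m * n, block_rows * block_cols)

def fit_single_kop(Y, a_shape):
    """Least squares fit of Y by lambda * (A kron B) for a given shape of A."""
    b_shape = (Y.shape[0] // a_shape[0], Y.shape[1] // a_shape[1])
    U, d, Vt = np.linalg.svd(rearrange(Y, *b_shape), full_matrices=False)
    lam = d[0]                                       # leading singular value
    A = U[:, 0].reshape(a_shape)                     # vec^{-1} of leading left vector
    B = Vt[0, :].reshape(b_shape)                    # vec^{-1} of leading right vector
    return lam, A, B

# Quick check: recover a noisy single Kronecker product term.
rng = np.random.default_rng(1)
A0 = rng.standard_normal((8, 4))
A0 /= np.linalg.norm(A0)
B0 = rng.standard_normal((16, 32))
B0 /= np.linalg.norm(B0)
Y = 5.0 * np.kron(A0, B0) + 0.01 * rng.standard_normal((128, 128))
lam, A, B = fit_single_kop(Y, (8, 4))    # lam is close to 5; A, B recovered up to sign
```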

When there are multiple terms in model (1), we propose to solve the optimization problem (4) through a backfitting algorithm (or alternating least squares algorithm) by iteratively updating $(\lambda_k, A_k, B_k)$ as the least squares single Kronecker product fit to the partial residual

$$Y_k = Y - \sum_{l \neq k} \hat\lambda_l\, \hat A_l \otimes \hat B_l,$$

using the rearrangement operator and the SVD, with $\hat\lambda_l$, $\hat A_l$ and $\hat B_l$ ($l \neq k$) fixed at their values from previous iterations.

When all configurations are distinct, the backfitting procedure for hKoPA is depicted in Figure 2, where $\mathrm{vec}^{-1}(\cdot)$ is the inverse of the vectorization operation that converts a column vector back to a matrix. When several terms in the hKoPA model have the same configuration, these terms are updated simultaneously in the backfitting algorithm by keeping the corresponding number of leading components from the SVD of the rearranged residual matrix. We also orthonormalize the components by the Gram-Schmidt process at the end of each backfitting round.

Since each iteration of the backfitting procedure reduces the residual sum of squares, the algorithm always converges, though it may land in a local optimum. Empirical experience shows that the global minimum is reached most of the time. Starting with different initial values and different backfitting orders helps.


Figure 2: Backfitting Least Squares Procedure
  • Set $\hat\lambda_k = 0$, $\hat A_k = 0$ and $\hat B_k = 0$ for $k = 1, \ldots, K$.

  • Repeat until convergence:

    • For $k = 1, \ldots, K$,

      • Let $Y_k = Y - \sum_{l \neq k} \hat\lambda_l\, \hat A_l \otimes \hat B_l$.

      • Compute the SVD of $\mathcal{R}_{m_k, n_k}[Y_k]$ such that $\mathcal{R}_{m_k, n_k}[Y_k] = \sum_i d_i u_i v_i'$,

        where the singular values are in decreasing order such that $d_1 \geq d_2 \geq \cdots$.

      • Update $\hat\lambda_k = d_1$, $\hat A_k = \mathrm{vec}^{-1}(u_1)$ and $\hat B_k = \mathrm{vec}^{-1}(v_1)$.

    • Orthonormalize the components by the Gram-Schmidt process in Figure 1.

  • Output $(\hat\lambda_k, \hat A_k, \hat B_k)$, $k = 1, \ldots, K$.
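As an illustration of the backfitting procedure in Figure 2, the following sketch alternates single-term fits on partial residuals, reusing fit_single_kop from the snippet in Section 3.1. The sweep limit and convergence tolerance are arbitrary, and the Gram-Schmidt orthonormalization step is omitted for brevity.

```python
# Sketch of the backfitting (alternating least squares) loop of Figure 2,
# assuming all configurations are distinct.  Reuses fit_single_kop defined above.
import numpy as np

def backfit_hkopa(Y, a_shapes, n_sweeps=50, tol=1e-8):
    """a_shapes: list of shapes of A_1, ..., A_K, e.g. [(8, 4), (16, 16)]."""
    terms = [(0.0, np.zeros(s), np.zeros((Y.shape[0] // s[0], Y.shape[1] // s[1])))
             for s in a_shapes]
    prev_rss = np.inf
    for _ in range(n_sweeps):
        for k, a_shape in enumerate(a_shapes):
            # Partial residual: subtract every currently fitted term except the k-th
            partial = Y - sum(lam * np.kron(A, B)
                              for j, (lam, A, B) in enumerate(terms) if j != k)
            terms[k] = fit_single_kop(partial, a_shape)
        rss = np.linalg.norm(Y - sum(lam * np.kron(A, B) for lam, A, B in terms)) ** 2
        if prev_rss - rss < tol:          # stop once the fit no longer improves
            break
        prev_rss = rss
    return terms                           # list of (lambda_k, A_k, B_k)
```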

3.2 Hybrid KoPA with Unknown Configurations

In this section, we consider the case where the model configuration is unknown. We use a greedy method similar to boosting to obtain the approximation by iteratively adding one Kronecker product at a time, based on the residual matrix obtained from the previous iteration. Specifically, at iteration $k$, we obtain the residual matrix

$$\tilde Y_k = Y - \sum_{l=1}^{k-1} \hat\lambda_l\, \hat A_l \otimes \hat B_l,$$

where $\hat\lambda_l$, $\hat A_l$ and $\hat B_l$ are obtained in the previous iterations, starting with $\tilde Y_1 = Y$. Then we use the single-term KoPA with unknown configuration proposed in Cai et al. (2019) to obtain

$$(\hat\lambda_k, \hat A_k, \hat B_k) = \arg\min_{\lambda, A, B} \| \tilde Y_k - \lambda\, A \otimes B \|_F^2,$$

where the configuration $(m_k, n_k)$ of $\hat A_k$ and $\hat B_k$ is obtained by minimizing the information criterion

$$\mathrm{IC}_\tau(m, n) = 2^{m_0 + n_0} \log\!\left( \frac{\| \tilde Y_k - \hat\lambda\, \hat A \otimes \hat B \|_F^2}{2^{m_0 + n_0}} \right) + \tau\, \kappa(m, n), \qquad (5)$$

where $\kappa(m, n) = 2^{m + n} + 2^{m_0 - m + n_0 - n}$ is the number of parameters of the single-term model with configuration $(m, n)$ and $\tau$ is the penalty coefficient on model complexity. When $\tau = 0$, the information criterion is a monotone function of the mean squared error (MSE). When $\tau = 2$ and $\tau = \log(2^{m_0 + n_0})$, the information criterion is the same as the Akaike information criterion (AIC) (Akaike, 1998) and the Bayesian information criterion (BIC) (Schwarz, 1978), respectively. As shown in Cai et al. (2019), in the single-term Kronecker product case, when the signal-to-noise ratio is large enough, minimizing the information criterion in (5) produces consistent estimators of the true configuration.
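A sketch of the configuration search by information criterion is given below. The exact penalty form shown is our reading of (5) and should be treated as an assumption; with tau = log(MN) it acts as a BIC-type rule. The snippet reuses fit_single_kop from Section 3.1.

```python
# Sketch of single-term configuration selection by the information criterion (5).
# The penalty form is an assumption consistent with the text; the two trivial
# configurations are skipped.
import numpy as np

def select_configuration(Y_resid, m0, n0, tau=None):
    M, N = 2 ** m0, 2 ** n0
    tau = np.log(M * N) if tau is None else tau
    best = None
    for m in range(m0 + 1):
        for n in range(n0 + 1):
            if (m, n) in [(0, 0), (m0, n0)]:        # skip trivial configurations
                continue
            lam, A, B = fit_single_kop(Y_resid, (2 ** m, 2 ** n))
            rss = np.linalg.norm(Y_resid - lam * np.kron(A, B)) ** 2
            kappa = 2 ** (m + n) + 2 ** (m0 - m + n0 - n)   # number of parameters
            ic = M * N * np.log(rss / (M * N)) + tau * kappa
            if best is None or ic < best[0]:
                best = (ic, (m, n), lam, A, B)
    return best                                     # (ic, config, lambda, A, B)
```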

The procedure is repeated until a stopping criterion is reached. The algorithm is depicted in Figure 3.

Remark: The iterative algorithm in Figure 3 is a greedy algorithm, which does not guarantee a global optimum over all configuration combinations. The output of the algorithm satisfies Assumption 1 but does not satisfy Assumption 2. However, searching the configuration space in a greedy and additive way requires less computational power. It is possible that, given the configurations obtained by the greedy algorithm, a refinement step can be applied using the algorithm proposed in Section 3.1. If more computational resources are available, the refinement can be done at the end of each iteration based on the terms obtained so far, producing a better partial residual matrix so that the configuration determination in later iterations is more accurate.

Remark: The stopping criterion can be selected according to the objective of the study. For denoising applications, one may specify a desired level of the proportion of total variation explained by the hKoPA to be reached. Similarly, a scree plot approach can be used. It is also possible to minimize an overall information criterion of the form

$$\mathrm{IC}_\tau = 2^{m_0 + n_0} \log\!\left( \frac{\| Y - \sum_{k=1}^{K} \hat\lambda_k\, \hat A_k \otimes \hat B_k \|_F^2}{2^{m_0 + n_0}} \right) + \tau \sum_{k=1}^{K} \kappa(m_k, n_k), \qquad (6)$$

where $\sum_{k=1}^{K} \kappa(m_k, n_k)$ is the total number of parameters for the configuration $\{(m_k, n_k): k = 1, \ldots, K\}$. For image compression applications, the information criterion can be replaced with the ratio of explained variation to the total storage size. In Section 4.2, we illustrate a practical stopping criterion using random matrix theory.

Remark: Compared with the backfitting algorithm for the known-configuration problem, the iterative algorithm in Figure 3 focuses on determining the configurations in an iterative and additive way. When the configurations are known exactly, the backfitting algorithm in Figure 2 gives more accurate estimates of the components.


Figure 3: Iterative Algorithm for hKoPA Estimation
  • Set $\tilde Y_1 = Y$.

  • For $k = 1, 2, \ldots$:

    • For every candidate configuration $(m, n)$:

      • Compute the SVD of $\mathcal{R}_{m,n}[\tilde Y_k]$ such that $\mathcal{R}_{m,n}[\tilde Y_k] = \sum_i d_i u_i v_i'$.

      • Set $\hat\lambda = d_1$, $\hat A = \mathrm{vec}^{-1}(u_1)$ and $\hat B = \mathrm{vec}^{-1}(v_1)$.

      • Obtain $\mathrm{IC}_\tau(m, n)$ in (5).

    • Obtain $(m_k, n_k) = \arg\min_{(m, n)} \mathrm{IC}_\tau(m, n)$.

    • Set $\hat\lambda_k$, $\hat A_k$ and $\hat B_k$ to the estimates under configuration $(m_k, n_k)$.

    • Break if a stopping criterion is met.

    • Calculate $\tilde Y_{k+1} = \tilde Y_k - \hat\lambda_k\, \hat A_k \otimes \hat B_k$.

  • Return $\sum_{k=1}^{\hat K} \hat\lambda_k\, \hat A_k \otimes \hat B_k$, where $\hat K$ is the number of terms determined by the stopping criterion.
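Putting the pieces together, a sketch of the greedy loop in Figure 3 is shown below. It reuses select_configuration from the previous snippet; the maximum number of terms and the stopping callback are placeholders (a random-matrix-based choice is discussed in Section 4.2).

```python
# Sketch of the iterative greedy algorithm of Figure 3.
import numpy as np

def greedy_hkopa(Y, m0, n0, max_terms=20, stop=lambda resid: False):
    terms, resid = [], Y.astype(float)
    for _ in range(max_terms):
        if stop(resid):                      # e.g. the criterion in (7)
            break
        _, (m, n), lam, A, B = select_configuration(resid, m0, n0)
        terms.append(((m, n), lam, A, B))
        resid = resid - lam * np.kron(A, B)  # update the residual matrix
    return terms, resid
```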

4 Empirical Examples

4.1 Simulation

In this simulation, we examine the performance of the least squares backfitting algorithm in Figure 2 for a two-term Kronecker product model and determine the factors that may affect the accuracy and convergence speed of the algorithm.

Specifically, we simulate the data according to

where the coefficients and component matrices satisfy Assumption 1. Here we assume one pair of component matrices is linearly independent to ensure identifiability Assumption 2, as discussed in Section 2. If they are not linearly independent, we can always apply the Gram-Schmidt process to reformulate the model so that they are linearly independent. One of the objectives of the simulation study is to see the impact of linear dependence in the other pair of component matrices, once we control the linear dependence of the first pair. The shape parameters of the component matrices are fixed throughout this simulation study.

To simulate the component matrices, we first generate and by normalizing corresponding i.i.d standard Gaussian random matrices. and are then orthogonalized in the sense that is minimized at . Orthonormal and are generated in a similar way. We set

where is a matrix of ones and controls the linear dependency between and .

Note that when , and are linearly independent. When , and the model can be represented with a single term Kronecker product.

In this simulation we consider and . The benchmark setting is and , under which the signal-to-noise ratio is and of the variation in can be explained by .

We first examine the effect of the linear dependency between the component matrices, controlled by the dependency parameter. Hence we fix the noise level and check the performance of the backfitting algorithm under different values of the dependency parameter, using the known configurations. Figure 4 shows the relative error for the first 40 iterations for the five different values. Under the simulation setting, a perfect fit is expected to have a relative error determined by the noise level. It is seen from Figure 4 that the estimates tend to overfit, as the final relative errors are all smaller than the expected value. This is due to the fact that the observed error term is not orthogonal to the observed signal. However, the weaker the linear dependency between the component matrices, the less the model is overfitted.

Figure 4: Fitting error against the number of iterations for different values of the dependency parameter.
Figure 5: Errors in the estimated components against the number of iterations for different values of the dependency parameter.

By comparing the convergence speed of different values, we notice that larger value of , corresponding to higher linear dependency between and , results in a slower convergence rate. When and are linearly independent (), only one iteration is needed.

The errors of the estimation of the coefficients , , , , and plotted against the number of iterations are shown in Figure 5. For all the components, higher values of result in slower convergence rates and less accurate final results.

Notice that when the component matrices are linearly dependent, the four matrices involved in the two-term model do not play symmetric roles under the identifiability conditions and the iterative backfitting algorithm. On the one hand, one pair of component matrices is enforced to be orthogonal while the other pair is not. On the other hand, the iterative backfitting algorithm fits the first Kronecker product term first, resulting in overfitting of the first Kronecker product in the first iteration. Due to this asymmetry, the estimation performance for one term is better than that for the other when the component matrices have linear dependency. A strong correlation between the errors of the two terms is observed. We also note that the convergence rates for the coefficients are slower than those for the component matrices, especially in the high linear dependence cases.

Next, we examine the effect of the noise level. We fix the dependency parameter and consider five different values of the noise level. The fitting error is reported in Figure 6. It is seen that a higher noise level results in larger errors, as expected. A small difference in the convergence speed is observed as well. The algorithm converges faster when the noise level is high, but the effect is not as pronounced as that of changing the linear dependence level.

Figure 6: Fitting error against the number of iterations for different noise levels.

Errors in estimating the different components of the model are plotted in Figure 7 for the different noise levels. The difference in convergence rates is less obvious. We also observe that the performance for estimating the smaller component matrices is better than that for the larger matrices.

Figure 7: Errors in the estimated components against the number of iterations for different noise levels.

4.2 Real image example

In this exercise, we apply hKoPA to analyze the image of Lenna, which has been widely used as a benchmark image in image processing studies. Lenna's image, shown in Figure 8, is a gray-scale picture represented by a $512 \times 512$ ($2^9 \times 2^9$) real matrix $Y_0$. The elements of $Y_0$ are real numbers between 0 and 1, where 0 represents black and 1 represents white. Besides the original image, in this example we also consider some artificially blurred images generated by

$$Y = Y_0 + \sigma E,$$

where $E$ is a matrix of i.i.d. standard Gaussian random variables and $\sigma$ denotes the noise level.

Figure 8: Lenna’s image

Here we consider three noise levels. Note that the original image takes values between 0 and 1, so the image with the largest noise level is considered to be heavily blurred. The blurred images are shown in Figure 9.

Figure 9: Lenna’s image with three noise levels

For this real image, the configurations in the hKoPA model (1) are unknown. Therefore, we adopt the iterative greedy algorithm proposed in Section 3.2, where the configuration in each iteration is determined by BIC. For each noise level, we fit the image with at most 20 Kronecker product terms. The configurations selected, the fitted coefficients $\hat\lambda_k$ and the cumulative percentage of variance explained (c.p.v.) for the first 10 iterations are reported in Table 1. It is seen that for all noise levels, the first several Kronecker product terms explain most of the variance of the observed image. The fitted coefficients $\hat\lambda_k$ decrease as $k$ increases, because the residual matrix has smaller and smaller variance, and the decrease is slower for larger noise levels. We also note that as the number of terms increases, configurations with a smaller number of parameters tend to be selected, with some exceptions. This is partially because the residual matrix tends to become less complex, as higher level complexity tends to be explained by the earlier terms. Note that, for example, configuration $(6, 7)$ has $2^{13} + 2^{5} = 8224$ parameters in total, while configuration $(4, 5)$ has $2^{9} + 2^{9} = 1024$ parameters in total. The proportion of variance of the true image is shown in the bottom row of Table 1. Overfitting is observed for the highest noise level after the seventh iteration.

In the heavily blurred cases, configurations close to the center, such as $(4, 5)$ and $(5, 4)$, are more likely to be selected by BIC. These configurations correspond to more nearly square component matrices.

      original image            noise level 1             noise level 2             noise level 3
 k    config   coef.  c.p.v.    config   coef.  c.p.v.    config   coef.  c.p.v.    config   coef.  c.p.v.
 1    (6, 7)   95.07  91.81     (5, 6)   91.21  66.76     (5, 6)   91.80  41.40     (4, 6)   88.28  23.21
 2    (6, 6)   14.21  93.86     (5, 6)   21.88  70.60     (3, 6)   15.42  42.57     (4, 5)   26.97  25.37
 3    (5, 7)   12.18  95.39     (5, 6)   19.48  73.65     (5, 4)   14.07  43.58     (4, 5)   18.05  26.34
 4    (6, 6)   10.17  96.47     (4, 5)    8.00  74.16     (5, 4)   13.52  44.47     (3, 6)   17.37  27.24
 5    (5, 6)    6.47  96.90     (5, 4)    7.66  74.63     (5, 4)   12.60  45.25     (4, 5)   15.68  27.97
 6    (5, 5)    4.65  97.12     (4, 5)    7.00  75.03     (3, 6)   11.91  45.96     (4, 5)   15.24  28.66
 7    (4, 5)    3.48  97.24     (4, 5)    6.67  75.39     (3, 6)   11.07  46.60     (3, 6)   14.70  29.32
 8    (4, 5)    3.29  97.35     (5, 4)    6.40  75.72     (5, 4)   10.56  47.13     (4, 5)   13.84  29.90
 9    (5, 5)    3.66  97.49     (5, 4)    6.19  76.03     (5, 4)    9.92  47.62     (4, 5)   13.76  30.47
10    (4, 5)    2.92  97.58     (5, 4)    6.05  76.32     (3, 6)    9.59  48.08     (2, 7)   13.63  31.00
 -       -        -  100           -        -  79.01         -        -  48.38         -        -  29.32
Table 1: Configurations selected, fitted coefficients ($\hat\lambda_k$) and cumulative percentage of variance explained (c.p.v.) for the first 10 iterations, for the original image and the three noise levels in increasing order of noise. The proportion of variance of the true image in each observed matrix is shown in the bottom row.
Figure 10: Fitted images in the first 6 iterations.

The fitted images using one to six Kronecker product terms at different noise levels are plotted in Figure 10, with the number of parameters shown under each image. It is seen from the figures that, as expected, smaller noise levels result in better recovery of the image with fewer Kronecker product terms. Note that the images plotted in Figure 10 are ordered according to the number of Kronecker product terms used instead of the number of parameters used. The configuration selected in the first iteration in the noiseless case uses roughly four times the number of parameters of the configurations selected in the first iteration of the noisy cases.

In addition to the iterative greedy algorithm, we use a stopping criterion based on random matrix theory to determine the number of Kronecker products. Specifically, at iteration $k$, an estimate of the noise level $\sigma$ is

$$\hat\sigma_k = \frac{\|\tilde Y_{k+1}\|_F}{2^{(m_0 + n_0)/2}}.$$

Under the i.i.d. Gaussian assumption on $E$, we have

$$P\left( \|E\|_2 \geq 2^{m_0/2} + 2^{n_0/2} + t \right) \leq 2 e^{-t^2/2},$$

according to the non-asymptotic analysis of random matrices (Vershynin, 2010). Here we set $t$ such that the probability bound $2 e^{-t^2/2}$ is 0.01. We terminate the algorithm at step $k$ if

$$\|\tilde Y_{k+1}\|_2 \leq \hat\sigma_k \left( 2^{m_0/2} + 2^{n_0/2} + t \right), \qquad (7)$$

and use the first $k$ terms as the optimal approximation. Specifically, for the original image and the lowest noise level, the stopping criterion is never met in the first 20 iterations and we use all 20 terms. For the medium noise level, a 9-term model is selected, and for the highest noise level, the stopping criterion results in a 7-term model.

Singular value decomposition (SVD) is another widely used approach in image denoising and compression. It approximates the image by a sum of rank-one matrices,

$$Y \approx \sum_{i=1}^{r} d_i\, u_i v_i'.$$

The complexity level in such an approach is $r$, the number of rank-one matrices used in the approximation. Note that SVD is a special case of the hKoPA model in which each Kronecker product is a vector outer product, corresponding to configuration $(m_0, 0)$ in the hKoPA model. Hence no configuration determination is needed. The number of parameters used in such an approximation is $r(2^{m_0} + 2^{n_0})$, where $r$ is the number of rank-one matrices used. To compare the performance of hKoPA with the SVD approach, we calculate the relative squared error (RSE) of the fitted matrix by

$$\mathrm{RSE} = \frac{\|\hat Y - Y_0\|_F^2}{\|Y_0\|_F^2},$$

where $Y_0$ is the original image without noise and $\hat Y$ is the fitted matrix based on the noisy version $Y$. For the different noise levels, we plot the RSE against the number of parameters used in the approximation in Figure 11, for both the SVD approximation and hKoPA. Models with different numbers of terms are marked in Figure 11, and the hKoPA fit with the proposed random matrix stopping criterion is highlighted with a separate marker. Comparing the error curve of SVD with that of hKoPA, Figure 11 reveals that for any level of model complexity (or number of parameters), hKoPA is more accurate than the standard low rank SVD approximation. When noise is added, overfitting is observed for both hKoPA and the SVD approximation, as the error (compared to the true image) increases when too many terms are used, seen from the U-shape of the curves. The stopping criterion in (7) prevents the model from significantly overfitting. The realized relative error of the hKoPA with the number of terms selected by (7) is close to the minimum attainable error, though the selected number of terms is not exactly the optimal one.
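The quantities used in this comparison are simple to compute; the following sketch collects them (the function names are ours).

```python
# Relative squared error and parameter counts used in the comparison above.
import numpy as np

def rse(Y_hat, Y_true):
    """Relative squared error of a fitted matrix against the clean image."""
    return np.linalg.norm(Y_hat - Y_true) ** 2 / np.linalg.norm(Y_true) ** 2

def n_params_svd(r, M, N):
    """r rank-one terms: each uses one left and one right singular vector."""
    return r * (M + N)

def n_params_hkopa(configs, m0, n0):
    """configs: list of (m_k, n_k); each term uses 2^(m+n) + 2^(m0-m+n0-n) entries."""
    return sum(2 ** (m + n) + 2 ** (m0 - m + n0 - n) for m, n in configs)
```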

Figure 11: Error of fitted matrix against the number of parameters used.

5 Conclusion and Discussion

In this paper, we extend the single-term KoPA model proposed in Cai et al. (2019) to a more flexible setting, which allows multiple terms with different configurations and allows the configurations to be unknown. Identifiability conditions are studied to ensure a unique representation of the model, and we propose two iterative estimation algorithms.

With a given set of configurations, we propose a least squares backfitting algorithm that fits each Kronecker product component in an alternating way. The simulation study shows the performance of the algorithm and the impact of the linear dependency between the component matrices.

When the configurations are unknown, the extra flexibility of hKoPA allows for a more parsimonious representation of the underlying matrix, though it comes with the challenge of configuration determination. An iterative greedy algorithm is proposed to jointly determine the configurations and estimate each Kronecker product component. The algorithm iteratively adds one Kronecker product term to the model by finding the best single-term KoPA fit to the residual matrix obtained from the previous iteration, using the procedure proposed in Cai et al. (2019). By analyzing a real image, we demonstrate that the proposed algorithm is able to obtain a reasonable hKoPA and that the results are significantly superior to the standard low rank matrix approximation using SVD.

The matrix dimensions discussed in this article are powers of 2. This is certainly not necessary. If the observed matrix is of dimension $M \times N$, then the possible configuration set consists of all combinations of a factor of $M$ and a factor of $N$, excluding the two trivial cases in which one of the component matrices degenerates to a scalar. Note that the configuration in which the component matrices are an $M \times 1$ vector and a $1 \times N$ vector corresponds to the traditional low rank SVD approximation setting. All algorithms presented in this paper apply to the general case. Of course, the more factors $M$ and $N$ have, the larger the possible configuration set is, and hence the more flexibility we have to find a better approximation. One can augment the observed matrix with additional rows and columns to increase the size of the allowable configuration set. In image processing, a common tool for such augmentation is super-sampling, which interpolates more pixels. In matrix denoising and compression, one can add zeros or duplicate certain rows and columns of the original matrix.
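For general dimensions, the candidate configuration set can be enumerated from the divisors of $M$ and $N$, as in the short sketch below (the exclusion of the two trivial cases follows the discussion above).

```python
# Enumerate candidate configurations (row and column dimensions of A) for a
# general M x N matrix from the divisors of M and N, excluding the two trivial
# cases in which A or B degenerates to a scalar.
def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def candidate_configurations(M, N):
    return [(p, q) for p in divisors(M) for q in divisors(N)
            if (p, q) != (1, 1) and (p, q) != (M, N)]

# For example, candidate_configurations(512, 512) contains (512, 1), which
# corresponds to the vector outer products used in the SVD approximation.
```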

As discussed in Section 3, the greedy algorithm for configuration determination is similar to a boosting algorithm. It can be improved at the expense of higher computational cost. In addition to combining it with the backfitting estimation procedure under the fixed configurations obtained by the greedy algorithm, the configurations themselves can also be adjusted and improved under the backfitting framework. Their theoretical properties need to be investigated. A more sophisticated stopping criterion also needs to be investigated. Previous research on rank determination for principal component analysis, such as Minka (2001), Lam and Yao (2012) and Bai et al. (2018), may be extended to our case.

References

  • H. Akaike (1998) Information theory and an extension of the maximum likelihood principle. In Selected Papers of Hirotugu Akaike, pp. 199–213.
  • Z. Bai, K. P. Choi, and Y. Fujikoshi (2018) Consistency of AIC and BIC in estimating the number of significant components in high-dimensional principal component analysis. The Annals of Statistics 46, pp. 1050–1076.
  • C. Cai, R. Chen, and H. Xiao (2019) KoPA: automated Kronecker product approximation. Preprint, https://arxiv.org/abs/1912.02392.
  • D. Cai, X. He, X. Wang, H. Bao, and J. Han (2009) Locality preserving nonnegative matrix factorization. In Twenty-First International Joint Conference on Artificial Intelligence.
  • E. J. Candes and Y. Plan (2010) Matrix completion with noise. Proceedings of the IEEE 98 (6), pp. 925–936.
  • E. J. Candès and B. Recht (2009) Exact matrix completion via convex optimization. Foundations of Computational Mathematics 9 (6), pp. 717.
  • M. F. Duarte and R. G. Baraniuk (2012) Kronecker compressive sensing. IEEE Transactions on Image Processing 21 (2), pp. 494–504.
  • C. Eckart and G. Young (1936) The approximation of one matrix by another of lower rank. Psychometrika 1 (3), pp. 211–218.
  • Y. Freund, R. Schapire, and N. Abe (1999) A short introduction to boosting. Journal of the Japanese Society for Artificial Intelligence 14 (771-780), pp. 1612.
  • L. Grasedyck, D. Kressner, and C. Tobler (2013) A literature survey of low-rank tensor approximation techniques. GAMM-Mitteilungen 36 (1), pp. 53–78.
  • D. Guillamet and J. Vitrià (2002) Non-negative matrix factorization for face recognition. In Catalonian Conference on Artificial Intelligence, pp. 336–344.
  • P. O. Hoyer (2004) Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research 5 (Nov), pp. 1457–1469.
  • P. Kaye, R. Laflamme, and M. Mosca (2007) An Introduction to Quantum Computing. Oxford University Press.
  • C. Lam and Q. Yao (2012) Factor modeling for high-dimensional time series: inference for the number of factors. The Annals of Statistics 40 (2), pp. 694–726.
  • C. M. Le, E. Levina, and R. Vershynin (2016) Optimization via low-rank approximation for community detection in networks. The Annals of Statistics 44 (1), pp. 373–400.
  • T. P. Minka (2001) Automatic choice of dimensionality for PCA. In Advances in Neural Information Processing Systems, pp. 598–604.
  • V. P. Pauca, F. Shahnaz, M. W. Berry, and R. J. Plemmons (2004) Text mining using non-negative matrix factorizations. In Proceedings of the 2004 SIAM International Conference on Data Mining, pp. 452–456.
  • T. N. Sainath, B. Kingsbury, V. Sindhwani, E. Arisoy, and B. Ramabhadran (2013) Low-rank matrix factorization for deep neural network training with high-dimensional output targets. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6655–6659.
  • G. Schwarz (1978) Estimating the dimension of a model. The Annals of Statistics 6 (2), pp. 461–464.
  • C. F. Van Loan and N. Pitsianis (1993) Approximation with Kronecker products. In Linear Algebra for Large Scale and Real-Time Applications, pp. 293–314.
  • R. Vershynin (2010) Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing.
  • K. Werner, M. Jansson, and P. Stoica (2008) On estimation of covariance matrices with Kronecker product structure. IEEE Transactions on Signal Processing 56 (2), pp. 478–491.
  • S. Wold, K. Esbensen, and P. Geladi (1987) Principal component analysis. Chemometrics and Intelligent Laboratory Systems 2 (1-3), pp. 37–52.
  • J. Yang and J. Leskovec (2013) Overlapping community detection at scale: a nonnegative matrix factorization approach. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, pp. 587–596.
  • H. Yu, N. Rao, and I. S. Dhillon (2016) Temporal regularized matrix factorization for high-dimensional time series prediction. In Advances in Neural Information Processing Systems, pp. 847–855.
  • M. Yuan and C. Zhang (2016) On tensor completion via nuclear norm minimization. Foundations of Computational Mathematics 16 (4), pp. 1031–1068.
  • T. Zhang, B. Fang, Y. Y. Tang, G. He, and J. Wen (2008) Topology preserving non-negative matrix factorization for face recognition. IEEE Transactions on Image Processing 17 (4), pp. 574–584.
  • Y. Zhang and D. Yeung (2012) Overlapping community detection via bounded nonnegative matrix tri-factorization. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 606–614.
  • H. Zou, T. Hastie, and R. Tibshirani (2006) Sparse principal component analysis. Journal of Computational and Graphical Statistics 15 (2), pp. 265–286.