Many applications involves observations in a high dimensional matrix form and often the observed matrix has many missing entries and is observed with error. In many recent machine learning studies of high dimensional matrix observations, one of the main approaches is to assume that the high dimensional observed matrix has a underlying low rank structure. The low rank representation explains the interaction between matrix entities with a smaller number of parameters and reveals the core factors that drive and control the high dimensional observations, resulting in significant dimension reduction. Such a low rank assumption also makes it possible to recover the missing entries in an observed matrix, which is known as the matrix completion problem. Some of the matrix completion applications include collaborative filtering(Goldberg et al., 1992), global positioning (Biswas et al., 2006) and remote sensing (Schmidt, 1986). One of the most famous examples is the Netflix recommendation system contest, in which the winning algorithm recovers the movie-rating matrix by a rank one matrix based on the observed entities.
Two different settings of matrix completion problems have been studied in the literature. One is the exact matrix completion problem whose goal is to recover the original matrix exactly when a proportion of the matrix entries is missing. When the original matrix rank is known, it can be recovered through the alternating minimization algorithm proposed by Jain et al. (2013) under certain conditions. When the matrix rank is unknown, it is still possible to exactly recover the matrix through nuclear norm optimization (Candès and Recht, 2009; Candès and Tao, 2010)
. The nuclear norm optimization approach can also be applied to tensor completion problems whose goal is to recover a tensor structure(Yuan and Zhang, 2016). The second setting considers the circumstances when the observed entities are corrupted with noises while a portion of the matrix entries is missing. It is known as the stable matrix completion problem. Candes and Plan (2010) extends the nuclear norm optimization approach in the exact matrix completion problem to the stable matrix completion problem by relaxing the constraint. Assuming the matrix rank is known, Keshavan et al. (2010) approaches the problem using a combination of spectral techniques and manifold optimization. Specifically for stable rank one matrix completion problem, Cosse and Demanet (2017) proposes to solve it using two rounds of semi-definite programming relaxation. Note that the alternating minimization algorithm in Jain et al. (2013) is applicable for the stable matrix completion problem as well.
It is observed that in many applications of image processing, signal processing and quantum computing, the high dimensional data in matrix forms often has a low-rank structure in terms of Kronecker product instead of singular value decomposition(Werner et al., 2008; Duarte and Baraniuk, 2012; Kamm and Nagy, 1998). In cases when the configuration of the underlying Kronecker product is unknown, it is usually difficult to directly apply classical matrix completion algorithms to such data since different configurations of Kronecker product may exist for the same observed matrix. A configuration determination procedure is required before matrix completion can be carried out.
In this article, we consider the matrix completion problem under the setting that the original matrix has a -term Kronecker product structure with an unknown configuration. We propose to use an information criterion to determine the configuration as that in Cai et al. (2019)
. The procedure can be summarized to two steps. In the first step, for all conformable configurations, a standard matrix completion process using the alternating minimization algorithm is conducted and the information criterion is calculated. In the second step, the configuration is estimated by minimizing the information criterion and the estimated complete matrix is obtained using the determined configuration.
The rest of the paper is organized as follows. In Section 2, we formally define the matrix completion problem under a Kronecker product structure assumption on the underlying matrix. The estimation procedure and configuration determination procedure are presented in Section 3. Section 4 provides a simulation study and a real image example on the configuration selection procedure and the performance of the proposed matrix completion procedure. Section 5 concludes.
For a vector, denotes its Euclidean norm. For a matrix , represents its Frobenius norm such that and represents its spectral norm defined by .
2 Matrix Completion Problem with the Kronecker product structure assumption
Let be a high dimensional matrix. Suppose both and can be factorized as and . Then has a complete Kronecker product decomposition (KPD) in the form
where and , and the Kronecker product of and is given as
where is the element of in -th row and -th column. The identifiability condition
are needed and will be assumed so that given and the configuration, , and are uniquely defined up to a sign change. The or simply is called the configuration of the Kronecker product decomposition. When and have multiple factors, there may exist multiple configurations.
A low rank KPD assumes that
with . This is similar to matrix low rank assumption in which the matrix is assumed to have the form
Let be the observed matrix with missing entries to be estimated. We assume
where i.i.d Bernoulli, and is the complete data matrix. The rate is called the observing rate. We further assume is a corrupted version of an underlying signal matrix in that
where is a matrix with i.i.d. standard Gaussian entities and denotes the noise level. The underlying signal matrix is assumed to have a low rank KPD in (2).
Let be the set of the indices of observed entities in and define be the observed version of such that
Note that, the standard matrix completion algorithms (Candès and Recht, 2009; Candes and Plan, 2010; Jain et al., 2013) assumes low rank structure of the signal matrix in the form of (3). It is a special case of our model, with and . In this paper we propose to use an information criterion to determine the configuration and the rank as well, for the purpose of matrix completion under the low rank KPD setting. We name the problem as the stable matrix completion problem for Kronecker product structured data, or the Kronecker matrix completion problem for short.
In this section, we propose a two-step procedure to solve the stable matrix completion problem. In the first step, we explore all candidate configurations of the underlying KPD. For each configuration, the Kronecker matrix completion problem is equivalent to the classical matrix completion problem after a rearrangement operation of the matrix elements and hence is solved with existing low rank matrix completion algorithms. We obtain an information criterion for each possible configurations. In the second step, we determine the configuration by minimizing the information criterion over all candidate configurations and the final completion estimate is obtained using the estimated optimal configuration. The procedure is similar to that in Cai et al. (2019) but modified to solve the Kronecker matrix completion problem.
As discussed in Van Loan and Pitsianis (1993) and Cai et al. (2019), the Kronecker product of two matrices can be transformed to the outer product of their vectorized version through rearrangement of elements. It can be seen from (1) that all the elements in have the form and all the elements in have the same form , where is the vectorization operation that flattens a matrix to a column vector. Hence and have exactly the same elements in the matrices, except the arrangement – one is a matrix and the other is a matrix. We will define a rearrangement operator so that . Specifically, assuming is a matrix, a rearrangement operation is defined as
where is the -th block of of size . Then for any matrix and matrix , we have
That is, for any given configuration, the rearrangement operation turns a Kronecker product into a rank-one matrix. Note that the rearrangement operation is linear, in that and the inverse operator exist so that . Hence problems arisen in the Kronecker product context can therefore be solved under rank-one matrix context. For example, (2) is a direct consequence of a SVD decomposition of
Let be any allowed configuration for a matrix. That is, is a factor of and is a factor of . Let be a fixed rank. Given any partially observed corrupted matrix of dimensions and , the Kronecker matrix completion problem under configuration is the following optimization problem.
where , , and records the indices of observed entities after the rearrangement such that for any matrix , we have . To solve the optimization in (7), we adopt the alternating minimization algorithm proposed by Jain et al. (2013), where the initial values for and are directly estimated from the singular value decomposition of as in Keshavan et al. (2010). The algorithm is depicted in Figure 1. The recovered matrix is therefore , where is the inverse operation of rearrangement and , and are the optimal solution of (7). In practice, the rank needs to be determined. Here we simply follow the standard low rank approximation using matrix SVD on the rearranged matrix as in Eckart and Young (1936).
Notice that in classical matrix completion problems (Candès and Recht, 2009; Candes and Plan, 2010), several assumptions are made on the observed set and the components and . One assumption is that the information of and spreads across all the columns and rows of the matrix , which is known as the incoherence condition. In the rank-r case as in (7), the incoherence condition bounds the norm of all and . Another assumption is imposed on the set (or the set corresponding to the optimization (7)). It is often assumed that for each row or column of , at least one element is observed. Otherwise, if row of is completely missing, the -th value of is not recoverable as it can have an arbitrary value and the value of the objective function in (7) does not change. Therefore, whenever a missing column or a missing row is observed in , the algorithm in Figure 1 returns nothing.
If any column or row in is completely unobserved, return null.
Initialize as the -th leading singular value of and ,
as the corresponding left and right eigenvectors for.
Repeat until convergence:
Fixing , calculate that minimizes (7). Specifically, for , let
where is the -th row of , is the -th row of , and is a diagonal matrix whose -th diagonal is 1 if and is 0 otherwise.
Fixing , calculate that minimizes (7). Specifically, for , let
where is the -th row of , is the -th column of , and is a diagonal matrix whose -th diagonal is 1 if and is 0 otherwise.
Normalizing the and such that
is a SVD in standard form.
Return the values of from the last iteration.
3.2 Configuration Determination
Section 3.1 provides the algorithm to estimate from the partially observed when an arbitrary allowed Kronecker product configuration is given. It remains to estimate the true configuration from all possible ones. Again, let be the dimension of and let be all the factors of and be all the factors of . Note that and for all and . Hence all possible configurations are in the set
where the configurations and are excluded because they correspond to trivial Kronecker products in which one of the matrices is a scalar. To simplify notation, in the following, we use to denote the configuration, instead of . Note that for any configuration in the candidate set , the algorithm discussed in Section 3.1 gives an empty output when a missing row or column is observed after rearrangement and returns the recovered matrix when it is feasible. Denote the set of candidate configurations for which the algorithm in Section 3.1 returns an recovered matrix as
Since is the set of all possible candidates under which can be recovered, before conducting any configuration selection procedure over , we need to assume that the true configuration belongs to . In other words, the Kronecker matrix completion problem defined in Section 2 is feasible at least for the true configuration.
(Feasibility Assumption) For any column or row in the matrix , at least one of its elements is observed, where is the true KPD configuration of the signal matrix .
Remarks: The feasibility assumption ensures that the complete signal matrix can be faithfully estimated in theory. In practice when an identified configuration is infeasible, certain missing entries in cannot be recovered. We call these entries ’infeasible entries’ under the specific configuration. Note that different infeasible configurations yield different infeasible entries. Hence the infeasible entries under the optimal configuration may be estimated with an alternative (and possibly nearly as good) configuration that are feasible for these specific entries. We also note that the inverse arrangement operation tends to scatter the infeasible entries in the original matrix formation. If we assume the signal matrix has certain local smoothness, the infeasible entries can also be filled using the average of its neighboring entries. These strategies are currently under investigation. In this paper we focus on the cases that the true configuration is feasible.
With the feasibility assumption we propose to determine the configuration through minimizing the information criterion below over .
where is the recovered matrix under configuration , is the number of parameters in the Kronecker product model (4) and is the coefficient of penalty on model complexity. When , the information criterion is a monotone function of the mean squared error (on observed entities). When , corresponds to the Akaike information criterion (AIC) (Akaike, 1998). corresponds to the Bayes information criterion (BIC) (Schwarz, 1978). An estimate of the true configuration is obtained by
Here we heuristically discuss why the minimization in (9) gives the true configuration
. For a fixed observing probability, when the size of matrix is large enough such that , the number of observed entities increases to infinity as well such that . Therefore, since the elements in are observed independently. As pointed out in Section 3.1, is approximately the (Kronecker) low rank approximation of under configuration . The information criterion in (8) approximates the information criterion at , which is
As proved in Cai et al. (2019), the information criterion
can select the true configuration consistently when the signal-to-noise ratioexceeds certain threshold. Consequently, as long as approximates well, the true configuration is expected to be selected consistently by the minimization in (9). A rigorous theoretical investigation is needed though.
Another difference between the configuration selection procedure (9) for Kronecker matrix completion and the one for Kronecker product models in Cai et al. (2019) is the candidate set. The Kronecker product model corresponds to the case when all entities are observed () and the configuration selection is conducted over all non-trivial configurations . In Kronecker matrix completion, the configuration selection is limited to a subset under the feasibility assumption. Configurations close to the corners and are more likely to be excluded from . To see it, recall is the probability that each element in is observed. The probability that at least one entire column is missing is
Similarly, the probability of missing an entire row in is
Assuming sufficiently large , the configurations close to has and the configurations close to has . The probability as a function of when for different values of is plotted in Figure 2, with , , and .
It is pointed out in Cai et al. (2019) that the most competitive wrong configurations in terms of the full matrix information criterion are likely being those with or . And the requirement on the signal-to-noise ratio for consistently selecting ensures the corner configurations do not outperform the true configuration. However, in the Kronecker matrix completion problem, the corner cases have a high probability to be excluded from the candidate set as shown in Figure 2. It implies that the configuration selection in (9) can perform well even when the signal-to-noise ratio is small as long as is sufficiently large.
Remark: In practice the matrix dimension and may not have many factors, which limits the flexibility of the KPD approach as the candidate set can be quite small. In this case it is possible to augment the observed matrix with additional missing rows and columns so that the new dimension and significantly expand the candidate set. One such choice is to make and . Actually by using different and the candidate set can be expanded significantly. With a good configuration determination procedure, significant improvement in matrix completion tasks can be obtained. Of course, augmentation violates the i.i.d missing assumption hence the theoretical properties may be different and difficult to obtain, though it has no impact on practical implementations.
4 Empirical Examples
In this simulation experiment, we exam the performance of the configuration determination procedure in the Kronecker product decomposition under a rank-1 setting. Specifically, the data is generated in a random scheme, where and are realizations of Gaussian random matrices of size and size , respectively, with i.i.d entries. Let
where is the parameter used to control the signal-to-noise ratio. After re-parametrization of above construction, we have the identifiable parameters as
The underlying complete matrix is generated according to
where is a Gaussian noise matrix with i.i.d. standard normal entities. The observed set is generated independently such that for all .
In this simulation, we set (hence the observed matrix is and .) A combination of six different values and five different probabilities are considered. Three different types of information criteria are used: the mean squared error (MSE) (), AIC () and BIC(). For each combination of and , we repeat the simulation 100 times and record the number of repetitions that the true configuration is selected by the minimization in (9). The result is reported in Table 1.
In general, AIC and BIC perform better than MSE, especially when the is large and the signal-to-noise ratio is small. For MSE criterion, we observe that the configuration selection works better under smaller value of , while for AIC and BIC criterion, larger values of have higher successes in configuration selection. This phenomenon can be explained by the following two factors, which play important roles in the configuration determination. On one hand, with more entities observed (large ), the information criterion can approximate better because and more entities are involved in calculating the mean squared error. On the other hand, higher value of means more corner configurations are included in the candidate set and the information criterion values of these corner configurations can be close to the one of the true configuration. In AIC and BIC, the corner configurations are automatically excluded by the penalty term on number of parameters. As a result, the first factor plays an important role, which implies more accurate configuration selection under larger . In MSE criterion, the model complexity is not penalized and the second factor dominates the first one. A lower value of helps ruling out corner configurations and improves the performance in configuration selection. The threshold value for MSE to be consistent in configuration selection is below 0.6 for this simulation, which is lower than that of Kronecker product model considered in Cai et al. (2019).
With the configuration selected by , matrix can be recovered through the Kronecker matrix completion with respect to . We measure the error of recovered matrix by
which is the Frobenius norm of error matrix, normalized by the Frobenius norm of . Since BIC performs the best in determining the configurations in this setting, we only report the estimation error under the BIC configuration. The averaged error of over 100 repetitions are reported in Table 2. Table 2 reveals that the error in decreases when more entities are observed and when the signal-to-noise ratio increases. As a comparison, SVD matrix completion, whose rank is selected such that the number of parameters are similar to the one selected by BIC, is reported in Table 2 as well. We notice that, with (almost) same number of parameters used, KPD matrix completion with configuration selected by BIC is uniformly better than its SVD counterpart when the true matrix has a low Kronecker rank structure instead of low matrix rank structure. This simulation also reveals that it is not optimal to apply classical matrix completion on matrices with Kronecker product structure.
|KPD with BIC||0.1||1.158||0.833||0.571||0.417||0.300||0.206|
4.2 Real image example
In this section, we apply the matrix completion approach for Kronecker products to images based on the image of Lenna. The original image is colored picture, which has been widely used as a benchmark in image processing researches. The colored image has three channels: red, blue and green, corresponding to three matrices, whose entities are integers between 0 and 255. To simplify the demonstration, we converted the original color image to grayscale as shown in Figure 3 (left image) such that the image can be represented by a matrix of real numbers between 0 and 1. 0 stands for black and 1 stands for white.
To simulate the process of noise corruption and missing data, we first add a noise matrix to the grayscale image such that
where the entries of are i.i.d. standard Gaussian errors. We set and the corrupted image is shown in the middle of Figure 3. We set the missing rate to 80% () and generated the missing set with i.i.d Bernoulli(). Let be the observed matrix which is plotted in the right of Figure 3, where missing entities are filled with white.
We follow the configuration determination procedure proposed in Section 3.2. Based on , all information criteria MSE (), AIC () and BIC () select the configuration , corresponding to decompose to the Kronecker product of a matrix and a matrix. Recovered images using configuration and ranks 1 to 3 are shown in in Figure 4. The face can be recognized from the rank-one recovered matrix and more details are added as the rank increases.
To compare the performance of matrix completion through KPD with the classical approach through low rank SVD, the observed matrix with 80% missing is fitted by the alternating minimization algorithm assuming a SVD structure, which is equivalent to apply KPD matrix completion with the configuration . SVD matrix completion with ranks 4, 8 and 12 are fitted to match the numbers of parameters of ranks one to three of KPD matrix configuration . The recovered matrices from SVD matrix completion are shown in Figure 5. The superiority of the KPD approach is obvious from the images. Besides judging the recovered images by eyesight, we measure the error of recovered matrix by
where is the image without noise and is the recovered matrix by matrix completion through either KPD or SVD. Table 3 reports the errors of fitted matrices from KPD approaches and the ones of SVD matrix completion with similar number of parameters. Notice that, extreme values in can be trimmed such that elements greater than 1 are replaced with value 1 and elements less than 0 are replaced with value 0. The errors on the trimmed matrices are also reported in Table 3. The discrepancy between the error of fitted matrix and the error of trimmed fitted matrix indicates any overfitting in the completion problem – the errors on the observed entities are minimized, while the errors on the unobserved entities are not. By trimming extreme pixel values, the error for KPD matrix completion improves a little bit, while the one for SVD matrix completion improves a lot in rank 8 and rank 12 cases. It shows that the SVD approach is not as robust as the KPD approach. The severe overfitting in SVD is partially due to high missing proportion. With similar number of parameters, KPD matrix completion can recover the Lenna’s image more accurately compared with SVD matrix completion. The result is anticipated since KPD matrix completion has more flexibility in selecting the configurations, with SVD matrix completion being one of its special cases. The proposed configuration determination procedure is able to find the best configuration, and the best results from the best configuration is expected to performance at least as good as the SVD one.
In this article, we extend the classical matrix completion algorithms under a low rank matrix assumption to that under a low rank Kronecker product assumption. The new model includes the traditional model as a special case. The major challenge is to determine the unknown underlying Kronecker product configuration. We propose an information criterion for the configuration determination. By choosing the configuration with minimum information criterion among all candidate configurations, the matrix completion task can be completed with the estimated configuration. The estimation of the underlying full matrix under a given configuration is done by rearrange the observed matrix to align with the classical matrix completion problem and the problem is solved with alternating minimization algorithm (Jain et al., 2013). The information criterion is based on the estimation error of the observed entities and the model complexity measured by the number of parameters used under the configuration.
As we pointed out in Section 3.2, the information criterion used in Kronecker matrix completion problem is an approximation of the one proposed by Cai et al. (2019) for denoising a (fully observed) matrix using a Kronecker product model. Cai et al. (2019) showed that the information criterion is consisted in determining the underlying configuration under certain signal to noise ratio condition. We are currently investigating the theoretical properties of the information criterion under the stable matrix completion setting. The simulation study in Section 4.1 shows promising results that such a configuration determination procedure may be consistent in selecting the true configuration when the signal-to-noise ratio exceeds certain threshold and when the missing proportion is controlled. We also showed, in the empirical study including the completion of an image with unknown underlying structure, that the KPD approach to matrix completion can significantly improve the performance obtained by the traditional approach under low rank matrix assumption.
Our extension of classical matrix completion problem to the setting of Kronecker product has a wide range of applications. On one hand, the classical low rank matrix completion problem corresponds to a special configuration ( and ) in the Kronecker matrix completion problem. The Kronecker matrix completion considers a wider range of possible configurations. On the other hand, we have noticed many high dimensional data in signal processing and image processing nowadays appear to have the Kronecker product structure (Kamm and Nagy, 1998; Werner et al., 2008; Duarte and Baraniuk, 2012). The classical matrix completion methods cannot handle such data effectively. The Kronecker product approach also provides significant flexibility for solving matrix completion problems by expanding the candidate configuration set using augmentation of the observed matrix, as discussed in Section 3.
The flexibility of KPD approach also introduces further improvement through model averaging – the averaging of completion estimates under different configurations. Averaging also helps to deal with missing patterns. Note that the traditional completion algorithm under low rank matrix assumption cannot deal with the situation of missing complete rows or columns. KPD under one configuration also has the problem that it cannot recover some entries if the rearranged matrix under the configuration has rows and columns missing completely. Model averaging can significantly increase the likelihood of being able to recover all missing entries under multiple configurations. The approach is currently under study.
- Information theory and an extension of the maximum likelihood principle. In Selected Papers of Hirotugu Akaike, pp. 199–213. External Links: Cited by: §3.2.
- Semidefinite programming based algorithms for sensor network localization. ACM Transactions on Sensor Networks (TOSN) 2 (2), pp. 188–220. Cited by: §1.
- On the consistency and configuration determination of kronecker product decomposition. Note: Technical Report Cited by: §1, §3.1, §3.2, §3.2, §3.2, §3, §4.1, §5.
- Matrix completion with noise. Proceedings of the IEEE 98 (6), pp. 925–936. Cited by: §1, §2, §3.1.
- Exact matrix completion via convex optimization. Foundations of Computational mathematics 9 (6), pp. 717. Cited by: §1, §2, §3.1.
- The power of convex relaxation: near-optimal matrix completion. IEEE Transactions on Information Theory 56 (5), pp. 2053–2080. Cited by: §1.
- Stable rank one matrix completion is solved by two rounds of semidefinite programming relaxation. arXiv preprint arXiv:1801.00368. Cited by: §1.
- Kronecker compressive sensing. IEEE Transactions on Image Processing 21 (2), pp. 494–504. External Links: Cited by: §1, §5.
- The approximation of one matrix by another of lower rank. Psychometrika 1 (3), pp. 211–218. Cited by: §3.1.
- Using collaborative filtering to weave an information tapestry. Communications of the ACM 35 (12), pp. 61–71. Cited by: §1.
Low-rank matrix completion using alternating minimization.
Proceedings of the forty-fifth annual ACM symposium on Theory of computing, pp. 665–674. Cited by: §1, §2, §3.1, §5.
- Kronecker product and svd approximations in image restoration. Linear Algebra and its Applications 284 (1), pp. 177 – 192. Note: International Linear Algebra Society (ILAS) Symposium on Fast Algorithms for Control, Signals and Image Processing External Links: Cited by: §1, §5.
- Matrix completion from noisy entries. Journal of Machine Learning Research 11 (Jul), pp. 2057–2078. Cited by: §1, §3.1.
- Multiple emitter location and signal parameter estimation. IEEE transactions on antennas and propagation 34 (3), pp. 276–280. Cited by: §1.
- Estimating the dimension of a model. Ann. Statist. 6 (2), pp. 461–464. External Links: Cited by: §3.2.
- Approximation with kronecker products. In Linear algebra for large scale and real-time applications, pp. 293–314. Cited by: §3.1.
- On estimation of covariance matrices with kronecker product structure. IEEE Transactions on Signal Processing 56 (2), pp. 478–491. External Links: Cited by: §1, §5.
- On tensor completion via nuclear norm minimization. Foundations of Computational Mathematics 16 (4), pp. 1031–1068. Cited by: §1.