Clipped Matrix Completion: a Remedy for Ceiling Effects

09/13/2018 ∙ by Takeshi Teshima, et al.

We consider the recovery of a low-rank matrix from its clipped observations. Clipping is a common prohibiting factor in many scientific areas that obstructs statistical analyses. On the other hand, matrix completion (MC) methods can recover a low-rank matrix from various information deficits by using the principle of low-rank completion. However, the current theoretical guarantees for low-rank MC do not apply to clipped matrices, as the deficit depends on the underlying values. Therefore, the feasibility of clipped matrix completion (CMC) is not trivial. In this paper, we first provide a theoretical guarantee for an exact recovery of CMC by using a trace norm minimization algorithm. Furthermore, we introduce practical CMC algorithms by extending MC methods. The simple idea is to use the squared hinge loss in place of the squared loss well used in MC methods for reducing the penalty of over-estimation on clipped entries. We also propose a novel regularization term tailored for CMC. It is a combination of two trace norm terms, and we theoretically bound the recovery error under the regularization. We demonstrate the effectiveness of the proposed methods through experiments using both synthetic data and real-world benchmark data for recommendation systems.


1 Introduction

The ceiling effect is a measurement limitation that occurs when the highest possible score on a measurement instrument is reached, thereby decreasing the likelihood that the testing instrument has accurately measured the intended domain (Salkind, 2010). The ceiling effect has long been discussed across a wide range of scientific fields such as sociology (DeMaris, 2004), educational science (Kaplan, 1992; Benjamin, 2005), biomedical research (Austin and Brunner, 2003; Cox, 1984), and health science (Austin et al., 2000; Catherine et al., 2004; Voutilainen et al., 2016; Rodrigues et al., 2013), because it is a crucial information deficit known to inhibit effective statistical analysis (Austin and Brunner, 2003).

The ceiling effect is also conceivable in the context of machine learning, e.g., in recommendation systems with a five-star rating. After giving an item five stars, a user may later find another item that is much better. In this case, the true rating for the latter item should be above five, but the recorded value is still five stars. In fact, we can observe right-truncated shapes indicating ceiling effects in the histograms of well-known benchmark data sets for recommendation systems, as shown in Figure 1.

Restoring data from ceiling effects can lead to multiple benefits in many fields. The recovered data may provide us with further findings in the case of scientific research that suffers from ceiling effects in its measurements. The recommendation system may be able to find latent superiority or inferiority between items with the highest ranking and predict unobserved entries better.

In this paper, we investigate methods for restoring matrix data from ceiling effects, which is a potential novel remedy for ceiling effects. In particular, we consider the recovery of a clipped matrix, i.e., a matrix whose values are clipped at a predefined threshold prior to observation, because ceiling effects are often modeled as a clipping phenomenon (Austin and Brunner, 2003).

Figure 1: Ceiling effects may exist in benchmark data sets of recommendation systems (details of the data are in Section 6). The histograms of the rated values are plotted for Movielens 100K and FilmTrust (horizontal axis: rating; vertical axis: probability). The right-truncated look of the histogram is typical for a variable under ceiling effects (Greene, 2012).

Figure 2: Illustration of the task of CMC, with panels (a) True matrix, (b) Observation, and (c) Restored. The original low-rank matrix $M$ has a distinct structure of large values. However, the observed data is clipped at a predefined threshold $C$. The goal of CMC is to restore $M$ from the value of $C$ and the clipped observations. The restored matrix in panel (c) is an actual result of applying a proposed method (Fro-CMC).

1.1 Our problem: clipped matrix completion (CMC)

We consider the recovery of a low-rank matrix whose observations are clipped at a predefined threshold (Figure 2). We call this problem clipped matrix completion (CMC). Let us first introduce its background, low-rank matrix completion.

Low-rank matrix completion (MC) aims to recover a low-rank matrix from various information deficits, e.g., missing entries, noise, or discretization (Candès and Recht, 2009; Candès and Plan, 2010; Recht, 2011; Chen et al., 2015; Davenport et al., 2014; Lan et al., 2014; Bhaskar, 2016). The principle that enables low-rank MC is the dependency among entries of a low-rank matrix; each element can be expressed as an inner product of the latent feature vectors of the corresponding row and column. With the principle of low-rank MC, we may be able to recover the entries of a matrix from a ceiling effect.

Clipped matrix completion (CMC)

The clipped matrix completion (CMC) problem is illustrated in Figure 2. It is a problem to recover a low-rank matrix from random observations of its entries.

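Formally, the goal of CMC in this paper can be stated as follows. Let $M \in \mathbb{R}^{n_1 \times n_2}$ be the ground-truth low-rank matrix where $n_1, n_2 \in \mathbb{N}$, and $C \in \mathbb{R}$ be the clipping threshold. Let $\mathrm{Clip}(\cdot) := \min\{C, \cdot\}$ be the clipping operator that operates on matrices element-wise. We observe a subset of entries of $M^{\mathrm{c}} := \mathrm{Clip}(M)$. The set of observed indices is denoted by $\Omega$. The goal of CMC is to accurately recover $M$ from $M^{\mathrm{c}}_{\Omega} := \{M^{\mathrm{c}}_{ij}\}_{(i,j) \in \Omega}$ and $C$.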
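The observation model described here can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the function names, the rank-2 toy matrix, the 80% quantile threshold, and the observation probability of 0.7 are all assumptions chosen for the example.

```python
import numpy as np

def clip_matrix(M, C):
    """Element-wise clipping operator: Clip(M)_ij = min(M_ij, C)."""
    return np.minimum(M, C)

def observe(M, C, p, rng):
    """Clip M at C and draw a boolean mask of observed entries.

    Each entry is observed independently with probability p
    (an assumption made for this illustration).
    """
    M_c = clip_matrix(M, C)
    mask = rng.random(M.shape) < p
    return M_c, mask

rng = np.random.default_rng(0)

# Hypothetical rank-2 ground truth built from latent feature vectors:
# entry (i, j) is the inner product of row i of P and row j of Q.
P = rng.random((6, 2))
Q = rng.random((5, 2))
M = P @ Q.T

# Hypothetical threshold: the 80% quantile, so about 20% of entries are clipped.
C = float(np.quantile(M, 0.8))
M_c, mask = observe(M, C, p=0.7, rng=rng)

# Observed entries that hit the ceiling (the "observed clipped indices").
clipped = mask & (M_c == C)
```

Only `M_c[mask]` and `C` would be available to a CMC method; `M` itself is kept here solely to evaluate a recovery.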

The limitations of MC.

There are two limitations regarding the application of existing MC methods to the CMC problem.

  1. The applicability of the principle of low-rank MC to clipped matrices is non-trivial, because clipping occurs depending on the underlying values whereas the existing theoretical guarantees of MC methods presume the information deficit (e.g., missing or noise) to be independent of the values (Bhojanapalli and Jain, 2014; Chen et al., 2015; Király et al., 2015; Liu et al., 2017).

  2. Most existing MC methods fail to take ceiling effects into account, as they assume that the observed values are equal to or close to the true values (Candès and Recht, 2009; Candès and Plan, 2010), whereas the true values of clipped entries may differ greatly from the observed values.

The goal of this paper is to overcome these limitations and to propose low-rank completion methods suited for CMC.

1.2 Our contribution and approach

From the perspective of MC research, our contribution is three-fold.

1) We provide a theoretical analysis to establish the validity of the low-rank principle in CMC (Section 2).

To do so, we provide an exact recovery guarantee: a sufficient condition for a trace norm minimization algorithm to perfectly recover the ground truth matrix with high probability. As the first step, our analysis is based on the notion of incoherence (Candès and Recht, 2009; Recht, 2011; Chen et al., 2015).

2) We propose practical algorithms for CMC (Section 3) and provide an analysis of the recovery error (Section 4).

We propose practical CMC methods which are extensions of the Frobenius norm minimization that is widely used for MC (Toh and Yun, 2010). The simple idea of the extension is to replace the squared loss function with the squared hinge loss to reduce the penalty of over-estimation on clipped entries. We also propose a regularizer consisting of two trace norm terms, which is motivated by a theoretical analysis of a recovery error bound.

3) We conducted experiments using synthetic and real-world data to demonstrate the validity of the proposed methods (Section 6).

Using synthetic data with known ground truth, we confirmed that the proposed CMC methods can actually recover randomly-generated matrices from clipping. We also investigated the improved robustness of CMC methods against the existence of clipped training entries in comparison with MC methods. Using real-world data, we conducted two experiments to validate the effectiveness of the CMC methods.

1.3 Additional notation

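For commonly-used notation, please see Table 4 in Appendix. The symbols $M$, $M^{\mathrm{c}}$, $\Omega$, $C$, and $\mathrm{Clip}$ are used throughout the paper. Let $r$ be the rank of $M$. The set of observed clipped indices is denoted by $\mathcal{C} := \{(i,j) \in \Omega : M^{\mathrm{c}}_{ij} = C\}$. Given a set of indices $\mathcal{S}$, we define its projection operator $\mathcal{P}_{\mathcal{S}}$ by $(\mathcal{P}_{\mathcal{S}}(X))_{ij} := \mathbb{1}\{(i,j) \in \mathcal{S}\} X_{ij}$, where $\mathbb{1}\{\cdot\}$ denotes the indicator function giving $1$ if the condition is true and $0$ otherwise.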

2 Feasibility of the CMC problem

As noted earlier, it is not trivial whether the principle of low-rank MC is guaranteed to recover clipped matrices. In this section, we establish that the principle of low-rank completion is still valid for some matrices by providing a sufficient condition under which an exact recovery by trace norm minimization is achieved with high probability.

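We consider a trace-norm minimization for CMC, in which the observed non-clipped entries must be matched exactly and the observed clipped entries act as element-wise lower bounds:

$$\hat{M} \in \operatorname*{arg\,min}_{X \in \mathbb{R}^{n_1 \times n_2}} \|X\|_{\mathrm{tr}} \quad \text{s.t.} \quad \mathcal{P}_{\Omega \setminus \mathcal{C}}(X) = \mathcal{P}_{\Omega \setminus \mathcal{C}}(M^{\mathrm{c}}), \quad \mathcal{P}_{\mathcal{C}}(M^{\mathrm{c}}) \le \mathcal{P}_{\mathcal{C}}(X), \tag{1}$$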

where “s.t.” stands for “subject to.” Note that the optimization problem Eq. (1) is convex, and there are algorithms that can solve it numerically (Liu and Vandenberghe, 2010).

2.1 Definitions and intuition of characteristic quantities

Here, we define the quantities required for stating the theorem. The quantities reflect the difficulty of recovering $M$; therefore, the sufficient condition stated in the theorem will be that these quantities are small enough. Let us begin with the definition of coherence that captures how the information of a matrix is distributed around its entries (Candès and Recht, 2009; Recht, 2011; Chen et al., 2015).

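Def. 1 (Coherence and joint coherence (Chen et al., 2015)).

Let $M \in \mathbb{R}^{n_1 \times n_2}$ have a skinny singular value decomposition $M = U \Sigma V^\top$. We define

$$\mu^{U} := \max_{i \in [n_1]} \|U_{i,\cdot}\|^2, \qquad \mu^{V} := \max_{j \in [n_2]} \|V_{j,\cdot}\|^2,$$

where $U_{i,\cdot}$ ($V_{j,\cdot}$) is the $i$-th (resp. $j$-th) row of $U$ (resp. $V$). Now the coherence of $M$ is defined by

$$\mu_0 := \max\left\{ \frac{n_1}{r} \mu^{U}, \; \frac{n_2}{r} \mu^{V} \right\}.$$

In addition, we define the following joint coherence

$$\mu_1 := \sqrt{\frac{n_1 n_2}{r}} \, \|U V^\top\|_{\infty},$$

where $\|\cdot\|_{\infty}$ denotes the largest absolute value among the entries.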
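As a rough illustration, the coherence quantities of Def. 1 can be computed from a truncated SVD. The sketch below assumes the standard normalized definitions of Chen et al. (2015), namely $\mu_0 = \max\{\frac{n_1}{r}\max_i \|U_{i,\cdot}\|^2, \frac{n_2}{r}\max_j \|V_{j,\cdot}\|^2\}$ and $\mu_1 = \sqrt{n_1 n_2 / r}\,\|UV^\top\|_\infty$; the function name `coherence` and the two test matrices are hypothetical.

```python
import numpy as np

def coherence(M, r):
    """Return (mu0, mu1) for a matrix M of rank r.

    Assumes the normalized definitions of Chen et al. (2015);
    this is an illustrative sketch, not the authors' code.
    """
    n1, n2 = M.shape
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    U, V = U[:, :r], Vt[:r, :].T           # skinny SVD factors
    mu_U = np.max(np.sum(U**2, axis=1))    # largest squared row norm of U
    mu_V = np.max(np.sum(V**2, axis=1))    # largest squared row norm of V
    mu0 = max(n1 / r * mu_U, n2 / r * mu_V)
    mu1 = np.sqrt(n1 * n2 / r) * np.max(np.abs(U @ V.T))
    return mu0, mu1

# Information spread evenly over all entries: smallest possible coherence.
flat = np.ones((4, 6))
# Information concentrated in a single entry: largest possible coherence.
spike = np.zeros((4, 6))
spike[0, 0] = 1.0

mu_flat = coherence(flat, r=1)
mu_spike = coherence(spike, r=1)
```

The two extremes show why small coherence is needed: a spiky matrix cannot be completed unless its single informative entry happens to be observed.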

The feasibility of CMC depends upon the amount of information that the clipping can hide. To characterize the amount of information obtained from observations of , we define a subspace that is used in the existing recovery guarantees for MC (Candès and Recht, 2009).

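Def. 2 (The information subspace of $M$ (Candès and Recht, 2009)).

Let $M = U \Sigma V^\top$ be a skinny singular value decomposition ($U \in \mathbb{R}^{n_1 \times r}$ and $V \in \mathbb{R}^{n_2 \times r}$). We define

$$T := \operatorname{span}\left( \{ u_k y^\top : k \in [r],\ y \in \mathbb{R}^{n_2} \} \cup \{ x v_k^\top : k \in [r],\ x \in \mathbb{R}^{n_1} \} \right),$$

where $u_k, v_k$ are the $k$-th column of $U$ and $V$, respectively. Let $\mathcal{P}_T$ and $\mathcal{P}_{T^\perp}$ denote the projections onto $T$ and $T^\perp$, respectively.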

Using , we define the quantities to capture the amount of information loss due to clipping, in terms of different matrix norms representing different types of dependencies. To express the factor of clipping, we define a transformation on that describes the amount of information left after observation. Therefore, if these quantities are small, it is implied that enough information for recovering is preserved after clipping.

Def. 3 (The information loss measured in various norms).

Define

where the operator is defined by

where .

In addition, we define the following quantity that captures how much information owes to the clipped entries of . If this quantity is small, it is implied that enough information of is left in non-clipped entries.

Def. 4 (The importance of clipped entries for ).

Define

where .

We follow Chen et al. (2015) to assume the following observation scheme. As a result, it amounts to assuming that is a result of random sampling where each entry is observed with probability independently.

Assumption 1 (Assumption on the observation scheme).

Let . Let and . For each , let be a random set of matrix indices that were sampled according to independently. Then, was generated by .

The need for Assumption 1 is technical (Chen et al., 2015). Refer to the proof in Appendix C for details.

2.2 The theorem

We are now ready to state the theorem.

Theorem 1 (Exact recovery guarantee for CMC).

Assume , and Assumption 1 for some . For simplicity of the statement, assume and . If, additionally,

is satisfied, then the solution of Eq. (1) is unique and equal to with probability at least , where

The proof and the precise expressions of and are available in Appendix C. The characteristic quantities (Def. 3 and Def. 4) do not appear in either the order of or that of , but they appear as coefficients and deterministic conditions that enable the theorem to hold. The existence of a deterministic condition is in accordance with the intuition that an all-clipped matrix can never be completed no matter how many entries are observed.

3 Practical algorithms

In this section, we introduce practical algorithms for CMC (clipped matrix completion). The trace norm minimization (Eq. (1)) is known to require impractical running time as the size of the problem increases from small to moderate or large (Cai et al., 2010).

A popular method for matrix completion is to minimize the Frobenius norm between the predicted matrix and the observed matrix, under some regularization (Toh and Yun, 2010). We develop our CMC methods from this approach.

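Throughout this section, $X$ generally denotes an optimization variable, which may be further parametrized by $X = PQ^\top$ (where $P \in \mathbb{R}^{n_1 \times k}$ and $Q \in \mathbb{R}^{n_2 \times k}$ for some $k \le \min(n_1, n_2)$). Regularization terms are denoted by $\mathcal{R}$, and regularization coefficients by $\lambda, \lambda_1, \lambda_2 \ge 0$.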

Frobenius norm minimization for MC.

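In the MC methods based on the Frobenius norm minimization (Toh and Yun, 2010), we define

$$f^{\mathrm{MC}}(X) := \frac{1}{2} \left\| \mathcal{P}_{\Omega}(M^{\mathrm{c}} - X) \right\|_F^2 \tag{2}$$

and obtain the estimator by

$$\hat{M} \in \operatorname*{arg\,min}_{X} \; f^{\mathrm{MC}}(X) + \mathcal{R}(X). \tag{3}$$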

The problem in using this method for CMC is that it is not robust to clipped entries as the loss function is designed under the belief that the true values are close to the observed values. We extend this method for CMC with a simple idea.

The general idea of extension.

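The general idea of the extension is not to penalize the estimator on clipped entries when the predicted value exceeds the observed value. Therefore, we modify the loss function to

$$f^{\mathrm{CMC}}(X) := \frac{1}{2} \left\| \mathcal{P}_{\Omega \setminus \mathcal{C}}(M^{\mathrm{c}} - X) \right\|_F^2 + \frac{1}{2} \sum_{(i,j) \in \mathcal{C}} (M^{\mathrm{c}}_{ij} - X_{ij})_+^2, \tag{4}$$

where $\mathcal{C} \subseteq \Omega$ is the set of observed clipped indices and $(\cdot)_+^2 := (\max\{0, \cdot\})^2$ is the squared hinge loss, which does not penalize over-estimation. Then we obtain the estimator by

$$\hat{M} \in \operatorname*{arg\,min}_{X} \; f^{\mathrm{CMC}}(X) + \mathcal{R}(X). \tag{5}$$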
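The difference between the two loss functions can be made concrete with a small NumPy sketch. It assumes the forms described in the text (squared loss on all observed entries for MC; squared loss on non-clipped entries plus squared hinge loss on clipped entries for CMC); the function names and the 2×2 example are hypothetical.

```python
import numpy as np

def f_mc(X, M_c, mask):
    """Squared loss on every observed entry (MC)."""
    resid = (M_c - X)[mask]
    return 0.5 * np.sum(resid**2)

def f_cmc(X, M_c, mask, C):
    """Squared loss on non-clipped entries, squared hinge on clipped ones (CMC)."""
    clipped = mask & (M_c >= C)        # observed entries at the ceiling
    unclipped = mask & ~clipped
    resid = M_c - X
    squared = 0.5 * np.sum(resid[unclipped]**2)
    # Only under-estimation (X < M_c) is penalized on clipped entries.
    hinge = 0.5 * np.sum(np.maximum(resid[clipped], 0.0)**2)
    return squared + hinge

C = 5.0
M_c = np.array([[1.0, 5.0],
                [5.0, 2.0]])           # two entries sit at the ceiling
mask = np.ones((2, 2), dtype=bool)     # everything observed, for simplicity
X = np.array([[1.0, 7.0],              # over-estimates one clipped entry by 2
              [4.0, 2.0]])             # under-estimates the other by 1

loss_mc = f_mc(X, M_c, mask)           # 0.5 * (2**2 + 1**2) = 2.5
loss_cmc = f_cmc(X, M_c, mask, C)      # 0.5 * (0 + 1**2)    = 0.5
```

The over-estimate of 7 on a clipped entry costs nothing under the CMC loss, while the under-estimate of 4 is still penalized, which is exactly the asymmetry the squared hinge loss is meant to provide.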

From here, we discuss three designs of regularization terms for CMC. The methods are summarized in Table 1, and further details of the algorithms can be found in Appendix A.

Double trace norm regularization.

We first propose to use $\mathcal{R}(X) = \lambda_1 \|X\|_{\mathrm{tr}} + \lambda_2 \|\mathrm{Clip}(X)\|_{\mathrm{tr}}$. For this method, we conducted a theoretical analysis of the recovery error, which is provided in Section 4. For the optimization, we employ an iterative method based on subgradient descent (Avron et al., 2012). Even though the second term, $\|\mathrm{Clip}(X)\|_{\mathrm{tr}}$, is a composition of a nonlinear mapping and a non-smooth convex function, we can take advantage of its simple structure to approximate it with a convex function of $X$ whose subgradient can be calculated for each iteration. We refer to this algorithm as DTr-CMC (Double Trace-norm regularized CMC).

Trace norm regularization.

Trace norm regularization is a method to relax the trace norm minimization (Eq. (1)) by replacing the exact constraints by the quadratic penalties (Eq. (2) for MC and Eq. (4) for CMC). For the optimization, we can employ an accelerated proximal gradient (APG) algorithm proposed by Toh and Yun (2010), by taking advantage of the differentiability of the squared hinge loss. We refer to this algorithm as Tr-CMC (Trace-norm-regularized CMC), in contrast to Tr-MC (its MC counterpart; Toh and Yun, 2010).
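To illustrate why differentiability of the squared hinge loss matters, the sketch below shows one plain (non-accelerated) proximal gradient step for a trace-norm-regularized CMC objective. It is not the APG algorithm of Toh and Yun (2010), which adds acceleration and further refinements; it only shows the two ingredients such methods build on: the gradient of the smooth loss and singular value thresholding, the proximal operator of the trace norm. All function names are hypothetical.

```python
import numpy as np

def svt(Y, tau):
    """Singular value thresholding: proximal operator of tau * trace norm."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def grad_f_cmc(X, M_c, mask, C):
    """Gradient of the CMC loss (squared loss plus squared hinge) at X."""
    clipped = mask & (M_c >= C)
    resid = X - M_c
    # On clipped entries the gradient vanishes once X exceeds the ceiling.
    return np.where(clipped, np.minimum(resid, 0.0), resid) * mask

def prox_grad_step(X, M_c, mask, C, lam, step=1.0):
    """One proximal gradient step: gradient step on the loss, then SVT."""
    return svt(X - step * grad_f_cmc(X, M_c, mask, C), step * lam)

C = 5.0
M_c = np.array([[1.0, 5.0],
                [5.0, 2.0]])
mask = np.ones((2, 2), dtype=bool)

# Over-estimating every entry by 1 only produces gradient on non-clipped entries.
G = grad_f_cmc(M_c + 1.0, M_c, mask, C)
X_next = prox_grad_step(np.zeros((2, 2)), M_c, mask, C, lam=0.1)
```

A step size of 1 is used because the gradient of this loss is 1-Lipschitz; iterating `prox_grad_step` gives a simple, if slow, baseline solver.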

Frobenius norm regularization.

This method first parametrizes $X$ as $PQ^\top$ and uses the squared Frobenius norms $\|P\|_F^2$ and $\|Q\|_F^2$ for regularization. A commonly used method for the optimization in the case of MC is the alternating least squares (ALS) method (Jain et al., 2013). Here, we employ an approximate optimization scheme motivated by ALS for our experiments. We refer to this algorithm as Fro-CMC (Frobenius-norm-regularized CMC), in contrast to Fro-MC (its MC counterpart; Jain et al., 2013).
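A minimal sketch of the Frobenius-norm-regularized formulation is given below. It deliberately swaps the ALS-like scheme used in the paper for plain gradient descent on the factors, which is easier to write down but is not the authors' optimizer. The objective is assumed to be the CMC loss on $PQ^\top$ plus $\frac{\lambda}{2}(\|P\|_F^2 + \|Q\|_F^2)$; the function names, step size, iteration count, and toy data are all hypothetical.

```python
import numpy as np

def fro_cmc_objective(P, Q, M_c, mask, C, lam):
    """CMC loss on X = P Q^T plus squared Frobenius regularization."""
    clipped = mask & (M_c >= C)
    resid = M_c - P @ Q.T
    # Squared hinge on clipped entries, squared loss on the other observed ones.
    loss = np.where(clipped, np.maximum(resid, 0.0), resid) * mask
    return 0.5 * np.sum(loss**2) + 0.5 * lam * (np.sum(P**2) + np.sum(Q**2))

def fro_cmc_gd(M_c, mask, C, k, lam=0.01, step=0.01, iters=2000, seed=0):
    """Gradient descent on (P, Q); a stand-in for the ALS-like scheme."""
    rng = np.random.default_rng(seed)
    n1, n2 = M_c.shape
    P = 0.1 * rng.standard_normal((n1, k))
    Q = 0.1 * rng.standard_normal((n2, k))
    clipped = mask & (M_c >= C)
    for _ in range(iters):
        resid = P @ Q.T - M_c
        # Gradient of the loss with respect to X = P Q^T.
        G = np.where(clipped, np.minimum(resid, 0.0), resid) * mask
        # Simultaneous update: both right-hand sides use the old P and Q.
        P, Q = P - step * (G @ Q + lam * P), Q - step * (G.T @ P + lam * Q)
    return P, Q

# Hypothetical toy problem: rank-2 matrix, about 20% clipped, 70% observed.
rng = np.random.default_rng(1)
M = rng.random((20, 2)) @ rng.random((15, 2)).T
C = float(np.quantile(M, 0.8))
M_c = np.minimum(M, C)
mask = rng.random(M.shape) < 0.7

P, Q = fro_cmc_gd(M_c, mask, C, k=2)
X_hat = P @ Q.T   # estimate; entries above C are allowed on clipped positions
```

Because clipped entries are only bounded from below, `X_hat` is free to exceed `C` there, which is what allows a CMC method to restore values beyond the ceiling.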

| Method | Param. | Loss on $\mathcal{C}$ | Reg. | Opt. |
|---|---|---|---|---|
| DTr-CMC | $X$ | Sq. hinge | Tr + Tr | SUGD |
| Tr-CMC | $X$ | Sq. hinge | Tr | APG |
| Fro-CMC | $PQ^\top$ | Sq. hinge | Fro | ALS |

Table 1: List of the proposed methods for CMC (Fro: Frobenius norm, Tr: Trace norm, Sq. hinge: Squared hinge loss, SUGD: SUb-Gradient Descent, APG: Accelerated Proximal Gradient, ALS: Alternating Least Squares, Param.: Parametrization, Reg.: Regularization, Opt.: Optimization).

4 Theoretical analysis for DTr-CMC

In this section, we provide a theoretical guarantee for DTr-CMC. Let be the hypothesis space

for some and . Here, we analyze the estimator

(6)

The minimization objective of Eq. (6) is not convex. However, it is upper-bounded by the convex loss function (Eq. (4)) (the proof is provided in Appendix A.1). Therefore, DTr-CMC can be seen as a convex relaxation of Eq. (6) with the constraints turned into regularization terms. To state our theorem, we define the unnormalized coherence of a matrix.

Def. 5 (Unnormalized coherence).

Here, we consider unnormalized coherence defined by

using and from Def. 1. Here we use an unnormalized definition for ease of notation.

Now we are ready to state our theorem.

Theorem 2 (Theoretical guarantee for DTr-CMC).

Suppose that , and that is generated by independent observation of entries with probability . Let , and be a solution to the optimization problem Eq. (6). Then there exist universal constants and , for which with probability at least we have

(7)

and

We provide the proof in Appendix D. The right hand side of Eq. (7) converges to zero as with , and fixed. From this theorem, it is expected that if and are believed to be small, DTr-CMC can accurately recover .

5 Related work

In this section, we describe related work from the literature on matrix completion and that on ceiling effects. Table 2 provides a brief comparison of the related work on matrix completion.

5.1 Matrix completion methods.

Theory:

Our feasibility analysis in Section 2 followed the approach of Recht (2011) while basing some details of the proof on Chen et al. (2015). There is further research to weaken the assumption of the uniformly random observation (Chen et al., 2015; Bhojanapalli and Jain, 2014). For simplicity, we omit such extensions. Nevertheless, we believe it is relatively easy to incorporate such additional factors into our theoretical analysis.

Our theoretical analysis for DTr-CMC in Section 4 is inspired by the theory for 1-bit matrix completion (Davenport et al., 2014). The difference is that our analysis effectively captures the additional low-rank structure in the clipped matrices in addition to the original matrix.

Problem setting:

Our problem setting of clipping can be related to quantized matrix completion (Q-MC; Lan et al., 2014; Bhaskar, 2016). Lan et al. (2014) and Bhaskar (2016) formulated a probabilistic model which assigns discrete values according to a distribution conditional on the underlying values of a matrix. Bhaskar (2016) provided an error bound for restoring the underlying values, assuming that the quantization model is fully known. The model of Q-MC can provide a different formulation for ceiling effects than ours by assuming the existence of latent random variables. However, Q-MC methods require the data to be fully discrete (Lan et al., 2014; Bhaskar, 2016). Therefore, neither the method nor the theory can be applied to real-valued observations. On the other hand, our methods and theories allow observations to be real-valued. We believe that the ceiling effect is worth studying independently from quantization, since the data analyzed under ceiling effects are not necessarily discrete.

Methodology:

The use of the Frobenius norm for MC has been studied for MC from noisy data (Candès and Plan, 2010; Toh and Yun, 2010). Our algorithms are based on this line of research, while extending them for CMC.

Methodologically, the work of Mareček et al. (2017) is closely related to our Fro-CMC. Mareček et al. (2017) considered completion of missing entries under “interval uncertainty,” which yields interval constraints indicating the ranges in which the true values should reside. They employed the squared hinge loss for enforcing the interval constraints in their formulation, hence coinciding with our formulation of Fro-CMC. There are a few key differences between their work and ours. First, our motivations are quite different: they considered completion of missing entries with robustness, whereas we considered recovery of clipped entries. Second, they did not provide any theoretical analysis of the problem, whereas we provided an analysis by specifically looking at the problem of clipping. Lastly, as a minor difference, we employed an ALS-like algorithm whereas they used a coordinate descent method (Mareček et al., 2017; Marecek et al., 2018), as we found the ALS-like method to work well for moderate-sized matrices.

5.2 Related work on ceiling effects.

From the perspective of dealing with ceiling effects, the present paper adds a novel, potentially effective method for the analysis of data affected by a ceiling effect. The ceiling effect is also referred to as censoring (Greene, 2012) or limited response variables (DeMaris, 2004). In this paper, we used “ceiling effect” to represent these phenomena. In econometrics, the de facto standard of dealing with ceiling effects is to use the Tobit model (Greene, 2012). In Tobit models, a censored likelihood function is modeled, and is maximized with respect to the parameters of interest. Although this method is justified by the theory of M-estimation (Schnedler, 2005; Greene, 2012), its use for matrix completion is not automatically justified. In addition, Tobit models require strong distributional assumptions. This is problematic especially if the distribution cannot be safely assumed.

| Type of deficit | Related work |
|---|---|
| Missing | Candès and Recht (2009) etc. |
| Noise | Candès and Plan (2010) etc. |
| Quantization | Bhaskar (2016) etc. |
| Clipping | This paper |

Table 2: Our target problem is the restoration of a low-rank matrix from clipping at a predefined threshold. No existing work has considered this type of information deficit.

6 Experimental results

In this section, we show the results for numerical experiments to compare the proposed CMC methods to the MC methods to demonstrate the effectiveness of our approach.

6.1 Experiment with synthetic data

We conducted an experiment to recover randomly generated data from clipping. The primary purpose of the experiment was to confirm that the principle of low-rank completion is still effective for the recovery of a clipped matrix, as indicated by Theorem 1. Additionally, within the same experiment, we investigated how sensitive the MC methods are to the disturbance of the existence of clipped training entries by looking at the growth of the recovery error on non-clipped test entries in relation to increased rates of clipping.

Data generation process.

We randomly generated non-negative integer matrices in that are exactly rank- and of the same magnitude parameter (see Appendix B). The generated data was randomly split into three with ratio , then the first part was clipped at the threshold , to generate the training (), the validation (), and the testing () matrix, respectively. We iterated over , and was fixed at .

Evaluation metrics.

We used the relative root mean square error (rel-RMSE) as the evaluation metric, and we considered a result as a good recovery when the error is of order (Toh and Yun, 2010). We separately reported the rel-RMSE on two sets of entries: the whole indices, and the non-clipped test entries. For tuning of hyperparameters, we used the rel-RMSE on validation indices. We reported the mean of five independent runs. The clipping rate was calculated by the ratio of entries of $M$ above $C$.
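One common way to compute a relative RMSE restricted to a set of indices is sketched below. The exact normalization used in the experiments is not recoverable from the text here, so the form $\|\mathcal{P}_{\mathcal{S}}(\hat{M} - M)\|_F / \|\mathcal{P}_{\mathcal{S}}(M)\|_F$ is an assumption, and `rel_rmse` is a hypothetical helper name.

```python
import numpy as np

def rel_rmse(X_hat, M, index_mask=None):
    """Relative RMSE of X_hat against M on the entries selected by index_mask.

    Assumed form: ||P_S(X_hat - M)||_F / ||P_S(M)||_F, where S is the
    set of selected indices (all entries when index_mask is None).
    """
    if index_mask is None:
        index_mask = np.ones(M.shape, dtype=bool)
    diff = (X_hat - M)[index_mask]
    return np.linalg.norm(diff) / np.linalg.norm(M[index_mask])

M = np.array([[1.0, 2.0],
              [3.0, 4.0]])
X_hat = M.copy()
X_hat[1, 1] = 10.0                 # error only on the largest entry

C = 4.0
non_clipped = M < C                # entries that would not be clipped at C

err_all = rel_rmse(X_hat, M)                       # counts the error at (1, 1)
err_non_clipped = rel_rmse(X_hat, M, non_clipped)  # ignores it
```

Reporting the metric separately on all entries and on non-clipped entries, as described above, separates the ability to restore clipped values from the robustness of the fit elsewhere.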

Compared methods.

We evaluated the proposed CMC methods (DTr-CMC, Tr-CMC, and Fro-CMC) and their MC counterparts (Tr-MC and Fro-MC). We also applied MC methods after ignoring all clipped training entries (Tr-MCi and Fro-MCi, with “i” standing for “ignore”). While this treatment wastes some training data, it may improve the robustness of MC methods against the existence of clipped training entries.

Result 1: the validity of low-rank completion.

In Figure 3(a), we show the rel-RMSE for different clipping rates. The proposed methods successfully recover the true matrices with very low error of order even when half the observed training entries are clipped. One of them (Fro-CMC) was able to recover the matrix after the clipping rate was above . This may be explained in part by the fact that the synthetic data were exactly low rank, and that the correct rank was in the search space of the bilinear model of the Frobenius norm based methods.

Result 2: the robustness against the existence of clipped training entries.

As seen in Figure 3(b), the test error of recovery by MC methods on non-clipped entries increased with the rate of clipping. This indicates the disturbance effect for MC methods due to the existence of the clipped training entries. The MC methods that ignore the clipped training entries (Tr-MCi and Fro-MCi) were also prone to increasing test error on non-clipped entries in the region of high clipping rates, most likely due to wasting too much information. On the other hand, the proposed methods show an improved profile of growth, indicating an improved robustness.

Figure 3: Relative RMSE for varied clipping threshold $C$ (Dotted: previous MC methods, Solid: proposed CMC methods). Panels: (a) on all test entries; (b) on non-clipped test entries. The horizontal axis of both panels is the clipping rate.

6.2 Experiments with real-world data

We conducted two experiments using real-world data. The difficulty of evaluating CMC with real-world data is that there are no known ground truths, i.e., the true values unaffected by the ceiling effect. Therefore, instead of evaluating the accuracy of recovery for the ground truth (which is unavailable), we evaluated the ability to distinguish entries with the ceiling effect from those without. Specifically, to measure the performance of CMC methods, we considered two binary classification tasks in which we predict whether held-out test entries have high ratings. The tasks are reasonable, because the purpose of a recommendation system is usually to predict which entries have high scores.

Preparation of data sets.

We used the following benchmark data sets of recommendation systems.

  • FilmTrust (Guo et al., 2013; https://www.librec.net/datasets.html) consists of ratings from 1,508 users to 2,071 movies on a scale from $0.5$ to $4.0$ with a stride of $0.5$ (approximately 99.0% missing). For ease of comparison with other data, we doubled the values of the data so that the ratings are integers from $1$ to $8$.

  • Movielens (100K) (http://grouplens.org/datasets/movielens/100k/) consists of ratings from 943 users to 1,682 movies on an integer scale from $1$ to $5$ (approximately 94.8% missing).

Task 1: using artificially clipped training data

In the first task, we artificially clipped the training data and predicted whether the test entries were above the threshold. We artificially clipped the training entries at the threshold . We used for FilmTrust, and for Movielens. We then predicted whether the ratings of the test entries were above the threshold. For prediction, we set the prediction threshold at , and predicted positively for entries above and negatively otherwise.

Task 2: using raw data

In the second task, we used the raw training data and predicted whether the test entries were the maximum value. For running CMC methods, we treated the maximum value of the rating as $C$, i.e., $C = 8$ for FilmTrust, and $C = 5$ for Movielens. We then predicted whether the ratings of the test entries were the maximum value. For prediction, we set the prediction threshold at , and predicted positively for entries above and negatively otherwise.

Protocols and evaluation metrics.

In both experiments, we first split the observed indices of the raw data with ratio , which were used as training, validation, and test indices. Then for the first task, we artificially clipped the training data at . If a user or an item had no training entries, we removed them from all matrices.

We measured the performance by the $\mathrm{f}_1$ score. Hyperparameters were selected according to the $\mathrm{f}_1$ score on the validation entries. We reported the mean and the standard error after five independent runs.

Compared methods.

We compared the proposed CMC methods with the corresponding MC methods. The uninformative baseline in these experiments (indicated as “baseline”) is to predict positive for all entries, for which the recall is $1$ and the precision is the ratio of the positive class.
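The score of this baseline follows directly from the definition of the f1 score as the harmonic mean of precision and recall: with recall $1$ and precision equal to the positive ratio $\pi$, it is $2\pi / (1 + \pi)$. The short sketch below checks this; the helper names and the toy labels are hypothetical.

```python
def f1_score(y_true, y_pred):
    """f1 score (harmonic mean of precision and recall) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def all_positive_baseline_f1(positive_ratio):
    """Closed form for the predict-everything-positive baseline."""
    return 2 * positive_ratio / (1 + positive_ratio)

# Toy labels with a positive ratio of 0.5.
y_true = [1, 0, 0, 1]
y_pred = [1, 1, 1, 1]              # the uninformative baseline

baseline_empirical = f1_score(y_true, y_pred)
baseline_closed_form = all_positive_baseline_f1(0.5)
```

For instance, a positive ratio of 0.25 gives a baseline of 0.4, so the baseline is not trivially weak when the positive class is reasonably common.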

Result for task 1

The results are compiled in Table 3. By comparing the results between the CMC methods and their corresponding MC methods, we conclude that CMC methods have an improved ability to recover clipped values in real-world data as well. Considering that the completion methods are not designed to optimize the accuracy on the high values, we regard it as acceptable that some MC methods scored below the baseline.

Result for task 2

The results are compiled in Table 3. The CMC methods show better performance than MC methods in predicting entries with the maximum rating. Considering that the only difference between the CMC methods and the corresponding MC methods is the use of the squared hinge loss function, we regard this as an indication of an improved robustness against the existence of the clipped training entries.

One interesting fact is that we obtain the performance improvement by only changing the loss function to be robust to ceiling effects and without adding extra complexity to the model (such as introducing an ordinal regression model).

| Data | Method | Task 1 $\mathrm{f}_1$ | Task 2 $\mathrm{f}_1$ |
|---|---|---|---|
| FilmTrust | DTr-CMC | **0.47** (0.01) | **0.46** (0.01) |
| | Fro-CMC | 0.35 (0.01) | 0.40 (0.01) |
| | Fro-MC | 0.27 (0.01) | 0.35 (0.01) |
| | Tr-CMC | 0.36 (0.00) | 0.39 (0.00) |
| | Tr-MC | 0.22 (0.00) | 0.35 (0.01) |
| | (baseline) | 0.41 (0.00) | 0.41 (0.00) |
| Movielens (100K) | DTr-CMC | 0.39 (0.00) | 0.38 (0.00) |
| | Fro-CMC | **0.41** (0.00) | **0.41** (0.01) |
| | Fro-MC | 0.21 (0.01) | 0.38 (0.01) |
| | Tr-CMC | 0.40 (0.00) | 0.40 (0.00) |
| | Tr-MC | 0.12 (0.00) | 0.38 (0.00) |
| | (baseline) | 0.35 (0.00) | 0.35 (0.00) |

Table 3: Results of the two tasks measured in $\mathrm{f}_1$ (mean, with the standard error in parentheses). Bold-face indicates the highest score.

7 Conclusion

In this paper, we showed the first exact recovery guarantee for the novel problem of clipped matrix completion. We proposed practical algorithms as well as a theoretically-motivated regularization term. Through numerical experiments, we showed the effectiveness of the proposed methods, and that the CMC methods obtained by modifying MC methods are more robust to clipped data. An important direction for future work is to specialize our theoretical analysis to the case of discrete data in order to analyze the ability of Q-MC methods to recover discrete data from ceiling effects.

8 Acknowledgments

TT would like to thank Ikko Yamane, Han Bao, and Liyuan Xu, for helpful discussions. MS was supported by the International Research Center for Neurointelligence (WPI-IRCN) at The University of Tokyo Institutes for Advanced Study.

References

  • Austin and Brunner [2003] Peter C Austin and Lawrence J Brunner. Type I error inflation in the presence of a ceiling effect. The American Statistician, 57(2):97–104, 2003. ISSN 0003-1305, 1537-2731. doi: 10.1198/0003130031450. URL http://www.tandfonline.com/doi/abs/10.1198/0003130031450.
  • Austin et al. [2000] Peter C. Austin, Michael Escobar, and Jacek A. Kopec. The use of the Tobit model for analyzing measures of health status. Quality of Life Research, 9(8):901–910, September 2000. ISSN 1573-2649. doi: 10.1023/A:1008938326604. URL https://doi.org/10.1023/A:1008938326604.
  • Avron et al. [2012] Haim Avron, Satyen Kale, Shiva Prasad Kasiviswanathan, and Vikas Sindhwani. Efficient and practical stochastic subgradient descent for nuclear norm regularization. In Proceedings of the 29th International Conference on Machine Learning, 2012.
  • Benjamin [2005] Rifkin Benjamin. A ceiling effect in traditional classroom foreign language instruction: Data from Russian. The Modern Language Journal, 89(1):3–18, February 2005. ISSN 0026-7902. doi: 10.1111/j.0026-7902.2005.00262.x. URL https://onlinelibrary.wiley.com/doi/full/10.1111/j.0026-7902.2005.00262.x.
  • Bhaskar [2016] Sonia A. Bhaskar. Probabilistic Low-Rank Matrix Completion from Quantized Measurements. Journal of Machine Learning Research, 17(60):1–34, 2016. URL http://jmlr.org/papers/v17/15-273.html.
  • Bhojanapalli and Jain [2014] Srinadh Bhojanapalli and Prateek Jain. Universal Matrix Completion. In Proceedings of the 31st International Conference on Machine Learning, pages 1881–1889, January 2014. URL http://proceedings.mlr.press/v32/bhojanapalli14.html.
  • Boucheron et al. [2013] Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, Oxford, 1st edition, 2013. ISBN 978-0-19-953525-5. OCLC: ocn818449985.
  • Cai et al. [2010] Jian-Feng Cai, Emmanuel J. Candès, and Zuowei Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956–1982, January 2010. ISSN 1052-6234. doi: 10.1137/080738970. URL https://epubs.siam.org/doi/abs/10.1137/080738970.
  • Candès and Plan [2010] Emmanuel J. Candès and Yaniv Plan. Matrix completion with noise. Proceedings of the IEEE, 98(6):925–936, June 2010. ISSN 0018-9219. doi: 10.1109/JPROC.2009.2035722.
  • Candès and Recht [2009] Emmanuel J. Candès and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717, 2009.
  • Candès et al. [2011] Emmanuel J. Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? Journal of the ACM, 58(3):1–37, May 2011. ISSN 00045411. doi: 10.1145/1970392.1970395. URL http://portal.acm.org/citation.cfm?doid=1970392.1970395.
  • Catherine et al. [2004] Lam Catherine, Young Nancy, Marwaha Jasvir, McLimont Marjorie, and Feldman Brian M. Revised versions of the Childhood Health Assessment Questionnaire (CHAQ) are more sensitive and suffer less from a ceiling effect. Arthritis Care & Research, 51(6):881–889, December 2004. ISSN 0004-3591. doi: 10.1002/art.20820. URL https://onlinelibrary.wiley.com/doi/full/10.1002/art.20820.
  • Chen et al. [2015] Yudong Chen, Srinadh Bhojanapalli, Sujay Sanghavi, and Rachel Ward. Completing any low-rank matrix, provably. Journal of Machine Learning Research, 16:2999–3034, 2015. URL http://jmlr.org/papers/v16/chen15b.html.
  • Cox [1984] D. R. Cox. Analysis of Survival Data. Routledge, June 1984. ISBN 978-1-351-46661-5. doi: 10.1201/9781315137438. URL https://www.taylorfrancis.com/books/9781351466615.
  • Davenport et al. [2014] Mark A Davenport, Yaniv Plan, Ewout Van Den Berg, and Mary Wootters. 1-bit matrix completion. Information and Inference: A Journal of the IMA, 3(3):189–223, 2014.
  • DeMaris [2004] Alfred DeMaris. Regression with Social Data: Modeling Continuous and Limited Response Variables. Wiley series in probability and statistics. Wiley-Interscience, Hoboken, NJ, 2004. ISBN 978-0-471-22337-5.
  • Greene [2012] William H. Greene. Econometric Analysis. Prentice Hall, Boston, 7th edition, 2012. ISBN 978-0-13-139538-1. OCLC: ocn692292382.
  • Gross [2011] David Gross. Recovering low-rank matrices from few coefficients in any basis. IEEE Transactions on Information Theory, 57(3):1548–1566, March 2011. ISSN 0018-9448, 1557-9654. doi: 10.1109/TIT.2011.2104999. URL http://arxiv.org/abs/0910.1879.
  • Guo et al. [2013] G. Guo, J. Zhang, and N. Yorke-Smith. A Novel Bayesian Similarity Measure for Recommender Systems. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), pages 2619–2625, 2013.
  • Horn [1995] Roger A. Horn. Norm bounds for Hadamard products and an arithmetic-geometric mean inequality for unitarily invariant norms. Linear Algebra and its Applications, 223-224:355–361, July 1995. ISSN 0024-3795. doi: 10.1016/0024-3795(94)00034-B. URL http://www.sciencedirect.com/science/article/pii/002437959400034B.
  • Jain et al. [2013] Prateek Jain, Praneeth Netrapalli, and Sujay Sanghavi. Low-rank matrix completion using alternating minimization. In Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, pages 665–674, 2013. ISBN 978-1-4503-2029-0. doi: 10.1145/2488608.2488693. URL http://doi.acm.org/10.1145/2488608.2488693.
  • Kaplan [1992] Charles Kaplan. Ceiling effects in assessing high-IQ children with the WPPSI–R. Journal of Clinical Child Psychology, 21(4):403–406, December 1992. ISSN 0047-228X. doi: 10.1207/s15374424jccp2104_11. URL https://doi.org/10.1207/s15374424jccp2104_11.
  • Király et al. [2015] Franz J. Király, Louis Theran, and Ryota Tomioka. The Algebraic Combinatorial Approach for Low-Rank Matrix Completion. Journal of Machine Learning Research, 16:1391–1436, 2015. URL http://jmlr.org/papers/v16/kiraly15a.html.
  • Kohler and Lucchi [2017] Jonas Moritz Kohler and Aurelien Lucchi. Sub-sampled Cubic Regularization for Non-convex Optimization. arXiv:1705.05933 [cs, math, stat], May 2017. URL http://arxiv.org/abs/1705.05933.
  • Lan et al. [2014] Andrew S. Lan, Christoph Studer, and Richard G. Baraniuk. Matrix recovery from quantized and corrupted measurements. In IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, 2014.
  • Ledoux and Talagrand [1991] Michel Ledoux and Michel Talagrand. Probability in Banach Spaces: Isoperimetry and Processes. Springer, Berlin, 1991.
  • Lee and Seung [2001] Daniel D. Lee and H. Sebastian Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems, pages 556–562, 2001.
  • Liu et al. [2017] Guangcan Liu, Qingshan Liu, and Xiaotong Yuan. A new theory for matrix completion. In Advances in Neural Information Processing Systems, pages 785–794, 2017.
  • Liu and Vandenberghe [2010] Zhang Liu and Lieven Vandenberghe. Interior-Point Method for Nuclear Norm Approximation with Application to System Identification. SIAM Journal on Matrix Analysis and Applications, 31(3):1235–1256, January 2010. ISSN 0895-4798, 1095-7162. doi: 10.1137/090755436. URL http://epubs.siam.org/doi/10.1137/090755436.
  • Mareček et al. [2017] Jakub Mareček, Peter Richtárik, and Martin Takáč. Matrix completion under interval uncertainty. European Journal of Operational Research, 256(1):35–43, January 2017. ISSN 0377-2217. doi: 10.1016/j.ejor.2016.07.014. URL http://www.sciencedirect.com/science/article/pii/S0377221716305513.
  • Marecek et al. [2018] Jakub Marecek, Stathis Maroulis, Vana Kalogeraki, and Dimitrios Gunopulos. Low-Rank Methods in Event Detection. arXiv:1802.03649 [cs], February 2018. URL http://arxiv.org/abs/1802.03649.
  • Recht et al. [2010] B. Recht, M. Fazel, and P. Parrilo. Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization. SIAM Review, 52(3):471–501, January 2010. ISSN 0036-1445. doi: 10.1137/070697835. URL https://epubs.siam.org/doi/abs/10.1137/070697835.
  • Recht [2011] Benjamin Recht. A simpler approach to matrix completion. Journal of Machine Learning Research, 12(Dec):3413–3430, 2011.
  • Rodrigues et al. [2013] Simey de Lima Lopes Rodrigues, Roberta Cunha Matheus Rodrigues, Thais Moreira Sao-Joao, Renata Bigatti Bellizzotti Pavan, Katia Melissa Padilha, and Maria-Cecilia Gallani. Impact of the disease: Acceptability, ceiling and floor effects and reliability of an instrument on heart failure. Revista da Escola de Enfermagem da USP, 47(5):1090–1097, October 2013. ISSN 0080-6234. doi: 10.1590/S0080-623420130000500012. URL http://www.scielo.br/scielo.php?script=sci_abstract&pid=S0080-62342013000501090&lng=en&nrm=iso&tlng=en.
  • Salkind [2010] Neil J. Salkind, editor. Encyclopedia of Research Design. SAGE Publications, Thousand Oaks, Calif, 2010. ISBN 978-1-4129-6127-1. OCLC: ocn436031218.
  • Schnedler [2005] Wendelin Schnedler. Likelihood estimation for censored random vectors. Econometric Reviews, 24(2):195–217, 2005.
  • Toh and Yun [2010] Kim-Chuan Toh and Sangwoon Yun. An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems. Pacific Journal of Optimization, 6:615–640, 2010.
  • Tropp [2012] Joel A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389–434, August 2012. ISSN 1615-3375, 1615-3383. doi: 10.1007/s10208-011-9099-z. URL http://arxiv.org/abs/1004.4389.
  • Voutilainen et al. [2016] Ari Voutilainen, Taina Pitkäaho, Tarja Kvist, and Katri Vehviläinen-Julkunen. How to ask about patient satisfaction? The visual analogue scale is less vulnerable to confounding factors and ceiling effect than a symmetric Likert scale. Journal of Advanced Nursing, 72(4):946–957, April 2016. ISSN 03092402. doi: 10.1111/jan.12875. URL http://doi.wiley.com/10.1111/jan.12875.

Appendix A Details of the algorithms

Here we describe instances of CMC (clipped matrix completion) algorithms in detail. A more general description can be found in Section 3 of the main article.

A.1 Details of DTr-CMC

The optimization method for DTr-CMC based on subgradient descent [Avron et al., 2012] is described in Algorithm 1. In the algorithm, we let denote a skinny singular value decomposition subroutine, and the all-one matrix.

0:  , , , , , and
  Initialize:
  for  do
      // Take the gradient of
      // Take a subgradient of
      // Same as above
      // Update in the direction of the subgradient
     
     
     
     
  end for
  return  
Algorithm 1 A subgradient descent algorithm for DTr-CMC

Derivation of the algorithm.

Let denote the Hadamard product. The second regularization term of DTr-CMC can be rewritten as

where is defined by . This is a composition of a non-smooth convex function and a nonlinear operator , so minimizing it is not straightforward. To minimize the objective function, we therefore employ an iterative scheme that approximates this function with one that has a known subgradient. For each iteration , we find a subgradient of the following heuristic objective at :

and update the parameter in the descent direction. This function is a combination of the trace norm and a linear transformation. A subgradient can be calculated by first performing a singular value decomposition , and then calculating .
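The subgradient computation described above can be illustrated numerically. The following is a minimal sketch, not the authors' implementation. Because the formulas are not reproduced in this text, it assumes that clipping is the entry-wise minimum with the threshold C, that the frozen weight matrix W is the indicator of entries of the previous iterate lying below C, and that the heuristic objective is the trace norm of W ∘ (X − C·1) + C·1. The function name and the tolerance `tol` are hypothetical.

```python
import numpy as np

def heuristic_trace_norm_subgradient(X, X_prev, C, tol=1e-10):
    """Return (subgradient, value) of h(X) = || W * (X - C) + C ||_tr,
    where the mask W = 1[X_prev < C] is frozen at the previous iterate.
    ASSUMPTION: this form of h is our reading of the heuristic objective."""
    W = (X_prev < C).astype(float)       # frozen entry-wise weights
    Y = W * (X - C) + C                  # affine function of X
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)  # skinny SVD
    k = int(np.sum(s > tol))             # keep nonzero singular values only
    G = U[:, :k] @ Vt[:k, :]             # a subgradient of the trace norm at Y
    # Chain rule for the affine map X -> W * (X - C) + C: multiply by W again.
    return W * G, float(s.sum())
```

When `X_prev` equals `X`, the matrix `Y` is exactly the clipped matrix, so the returned value is the trace norm of the clipped matrix under the assumptions above.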

Initialization.

While we expect the regularization term to encourage the recovery of the values above the threshold, the task is difficult because it requires extrapolating beyond the range of any observed entries. To compensate for this difficulty, we initialize the parameter matrix with values strictly above the threshold, so that the algorithm starts from a matrix whose values are above the threshold and then simplifies the hypothesis. In the experiments, we therefore initialized all elements of with (here, we used to reflect the spacing between choices on the rating scale of the recommendation-system benchmark data; this value can be configured arbitrarily).

Range of hyperparameters.

In the experiments, we used . The regularization coefficients and were grid-searched from .

Relation to the theoretical analysis in Section 4.

Here, we show that the loss function (Eq. (4)) is a convex upper bound of the loss function in Eq. (6).

Proof.

By a simple calculation,

Therefore, is an upper bound of the objective function of Eq. (6). ∎
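The upper-bound relation can also be checked numerically. The sketch below is an illustration under stated assumptions, since neither formula is reproduced in this text: Eq. (4) is taken to be the squared loss on observed non-clipped entries plus the squared hinge loss max(0, C − x)² on observed clipped entries, and Eq. (6) is taken to be the squared loss between the clipped prediction and the clipped observation. Common constant factors are omitted, and the function names are hypothetical. Entry by entry, the two losses coincide on clipped observations, and on non-clipped observations clipping the prediction can only move it closer to the observed value.

```python
import numpy as np

def loss_hinge(X, Mc, C, observed):
    """ASSUMED form of Eq. (4): squared loss + squared hinge loss."""
    clipped = observed & (Mc >= C)       # observed entries hitting the ceiling
    unclipped = observed & (Mc < C)      # observed entries below the ceiling
    squared = np.sum((X[unclipped] - Mc[unclipped]) ** 2)
    hinge = np.sum(np.maximum(0.0, C - X[clipped]) ** 2)
    return squared + hinge

def loss_clip(X, Mc, C, observed):
    """ASSUMED form of Eq. (6): squared loss on the clipped prediction."""
    return np.sum((np.minimum(X, C)[observed] - Mc[observed]) ** 2)
```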

A.2 Details of Tr-CMC

For trace-norm regularized clipped matrix completion, we used the accelerated proximal gradient singular value thresholding algorithm (APG) introduced by Toh and Yun [2010]. APG is an iterative algorithm that uses the gradient of the loss function. Because the squared hinge loss is differentiable, we can use APG to minimize the CMC objective (Eq. (4)) with . We obtained an implementation of APG for matrix completion from http://www.math.nus.edu.sg/~mattohkc/NNLS.html and modified the code for Tr-CMC.
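To make the role of differentiability concrete, the following is a minimal sketch of a plain proximal gradient iteration with singular value thresholding. It is not the NNLS code referred to above and it omits the acceleration (momentum) step of APG. It assumes the smooth part of the objective is one half of the squared loss on non-clipped observations plus one half of the squared hinge loss on clipped observations; this scaling, the function names, and the default parameters are assumptions for illustration.

```python
import numpy as np

def grad_cmc_loss(X, Mc, C, observed):
    """Gradient of the ASSUMED smooth loss
    0.5 * sum_unclipped (X - Mc)^2 + 0.5 * sum_clipped max(0, C - X)^2."""
    clipped = observed & (Mc >= C)
    unclipped = observed & ~clipped
    G = np.zeros_like(X)
    G[unclipped] = X[unclipped] - Mc[unclipped]
    G[clipped] = -np.maximum(0.0, C - X[clipped])  # zero once X exceeds C
    return G

def svt(Y, tau):
    """Singular value thresholding: the proximal operator of tau * trace norm."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def prox_grad_tr_cmc(Mc, C, observed, lam, n_iter=200, step=1.0):
    """Plain proximal gradient for loss + lam * ||X||_tr (no acceleration).
    step <= 1 is safe because the gradient above is 1-Lipschitz."""
    X = np.zeros_like(Mc, dtype=float)
    for _ in range(n_iter):
        X = svt(X - step * grad_cmc_loss(X, Mc, C, observed), step * lam)
    return X
```

On clipped entries the gradient vanishes whenever the prediction is at or above the threshold, which is how the squared hinge loss avoids penalizing over-estimation.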

Experiments

In the experiments, we used , and . For the regularization coefficient, the default values proposed by Toh and Yun [2010] were used, i.e., a continuation method that minimizes Eq. (5) with for iteration , to eventually minimize Eq. (5) with .

A.3 Details of Fro-CMC

This method first parametrizes as , where , and minimizes Eq. (5) with . Here we use to denote the (transposed) row vectors of and .

Original algorithm.

The minimization objective is not jointly convex in . Nevertheless, it is separately convex when one of and is fixed. The idea of alternating least squares (ALS) is to fix when minimizing Eq. (3) with respect to , and vice versa. In its original form, each update is analytic and takes the form

Proposed algorithm.

The squared hinge loss is differentiable, and its derivative is . Thanks to its differentiability, the loss function Eq. (4) is also differentiable. However, in the case of CMC, a closed-form minimizer cannot be obtained as in the derivation of the original ALS, owing to the indicator function in its derivative. As an alternative, we derive a method that alternately updates the parameters by minimizing an approximate objective at each iteration. Denoting , we use the following heuristic update rules to approximately obtain the minimizers.

(8a)
(8b)

where is the identity matrix. We iterate between Eq. (8a) and Eq. (8b) as in Algorithm 2. For the same reason as in DTr-CMC, we let the algorithm start from a matrix whose values are all . In the algorithm, we let and be the all-one matrices.

0:  , , , and .
  Initialize: and
  for  do
     Obtain according to Eq. (8a)
     Obtain according to Eq. (8b)
  end for
  return  
Algorithm 2 An approximate alternating least squares algorithm for Fro-CMC.
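For comparison with Algorithm 2, the closed-form update of the original ALS (squared loss only, without the hinge term) can be sketched as below. This is the textbook regularized least-squares update, not the heuristic rules of Eq. (8a) and Eq. (8b), which are not reproduced in this text. The objective scaling 0.5 · (squared error on observed entries) + 0.5 · lam · (squared Frobenius norms of the factors), the function names, and the random initialization are assumptions for illustration.

```python
import numpy as np

def als_update_rows(M, observed, Q, lam):
    """Solve, for every row i, the ridge problem over observed columns:
    p_i = (Q_i^T Q_i + lam * I)^{-1} Q_i^T m_i."""
    n1, k = M.shape[0], Q.shape[1]
    P = np.zeros((n1, k))
    for i in range(n1):
        idx = observed[i]                 # boolean mask of observed columns
        Qi = Q[idx]                       # factor rows for those columns
        A = Qi.T @ Qi + lam * np.eye(k)   # invertible for lam > 0
        P[i] = np.linalg.solve(A, Qi.T @ M[i, idx])
    return P

def als(M, observed, k, lam, n_iter=20, seed=0):
    """Alternate exact minimization over P (Q fixed) and Q (P fixed)."""
    rng = np.random.default_rng(seed)
    P = rng.normal(size=(M.shape[0], k))
    Q = rng.normal(size=(M.shape[1], k))
    for _ in range(n_iter):
        P = als_update_rows(M, observed, Q, lam)
        Q = als_update_rows(M.T, observed.T, P, lam)
    return P, Q
```

Each half-step is an exact minimizer of a convex subproblem, so the objective never increases. The squared hinge loss breaks this closed form because the set of active clipped entries depends on the current estimate.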

Experiments

In the experiments, hyperparameters were grid-searched from , and .

Symbol Meaning
Ground truth matrix (rank )
Clipped ground truth matrix
Estimated matrix
Observed entries
Observed clipped entries
Characteristic operator
Subspace of matrices spanned by products of singular vectors of
Skinny singular value decomposition of
The row-wise (column-wise) coherence parameters
Coherence of
Joint coherence of
The set of feasible perturbations
The importance of for
The Lipschitz property of w.r.t. the Frobenius norm
The Lipschitz property of w.r.t. the infinity norm
The Lipschitz property of w.r.t. the operator norm
The number of partitions that generated (introduced for the theoretical analysis)
The loss function for CMC using only the squared loss
The loss function for CMC using both the squared loss and the squared hinge loss
The set of real numbers
The set of natural numbers
Landau’s asymptotic notation for
where
Element of the matrix
Projection to ();
the linear operator to set matrix elements outside to zero
The transpose
The Euclidean norm of vectors
The trace norm
The operator norm
The Frobenius norm
The entry-wise infinity norm
The subspace of spanned by
The range of a mapping
Table 4: List of symbols used in the main text.

Appendix B The generation process of the synthetic data

In order to obtain rank- matrices with different rates of clipping, synthetic data were generated by the following procedure.

  1. For a fixed , we first generated a matrix whose entries were independent samples from a uniform distribution over .

  2. We used a non-negative matrix factorization algorithm [Lee and Seung, 2001] to approximate with a matrix of rank at most .

  3. We repeated the generation of until the rank of was exactly . Note that with this procedure, may become larger than .

  4. We randomly split into with ratio , which were used for training, validation and testing, respectively.

  5. We clipped at the clipping threshold to generate and removed entries randomly with probability .
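The procedure above can be sketched as follows. This is a hedged illustration rather than the authors' script: the matrix size, rank, uniform range, split ratio, clipping threshold, and missing probability are not reproduced in this text, so every default value and parameter name below is hypothetical. The non-negative factorization uses the multiplicative updates of Lee and Seung [2001] for the squared error.

```python
import numpy as np

def nmf_multiplicative(A, r, n_iter=200, seed=0, eps=1e-12):
    """Rank-at-most-r non-negative approximation of A (multiplicative updates)."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], r)) + eps
    H = rng.random((r, A.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W @ H

def make_clipped_data(n1, n2, r, C, p_missing, high=1.0,
                      ratio=(0.8, 0.1, 0.1), seed=0):
    """All arguments are hypothetical stand-ins for the values lost in this text."""
    rng = np.random.default_rng(seed)
    while True:                                    # steps 1 to 3
        A = rng.uniform(0.0, high, size=(n1, n2))
        M = nmf_multiplicative(A, r, seed=int(rng.integers(1 << 31)))
        if np.linalg.matrix_rank(M) == r:          # repeat until rank is exactly r
            break
    # Step 4: label each entry as train (0), validation (1), or test (2).
    split = rng.choice(3, size=(n1, n2), p=ratio)
    # Step 5: clip at C, then drop entries independently with prob. p_missing.
    Mc = np.minimum(M, C)
    observed = rng.random((n1, n2)) >= p_missing
    return M, Mc, observed, split
```

Because the low-rank approximation is not constrained to the sampling range, entries of the generated matrix may exceed `high`, consistent with the note in step 3.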

The visual demonstration of CMC in Figure 2 was generated by the process above with , and . The CMC result shown in Figure 2 was obtained by applying Fro-CMC to the generated matrix.

Appendix C Proof of Theorem 1

We define and . We also define linear operators , and by , and . Note that are all self-adjoint. We denote the identity map by . The summations indicate summation over . The maxima indicate the maximum over