1 Introduction
Recommender systems aim at improving customers’ experience by maximizing the use of the available information, including user-item interaction data, such as ratings or clicking behavior, and attribute information, such as category or context profiles. Methods that utilize the interaction data are referred to as collaborative filtering [10, 22, 26, 29], while methods that use the textual information are referred to as content-based methods [5, 24, 30]. In particular, collaborative filtering is a method for predicting the missing ratings given by a specific user to a specific item. Based on the idea that users and items are highly correlated with each other, the unspecified ratings can be estimated by learning the hidden relations.
Collaborative filtering can be seen as a special case of the matrix completion task. It has become a cornerstone of most powerful recommender systems, and it is mainly founded on two main streams of methods: neighbourhood-based methods [10, 17, 22, 26, 29] and model-based methods [1, 6, 18, 23, 31, 32]. Though neighbourhood-based methods are easy to interpret and implement, they cannot extract enough information and suffer from low prediction accuracy when the observed data is sparse. In this case, dimension reduction methods [1, 4, 28] and graphs [14, 25] have been tried to address the sensitivity issue. Alternatively, model-based methods define a parameterized model which is optimized on the available data during the training process. Numerous model-based approaches were tested in previous research, including Support Vector Machines [15], Maximum Entropy [36], Restricted Boltzmann Machines [27] and Singular Value Decomposition (SVD) [18, 23, 31, 32]. Under the assumption that the continuation of data points is meaningful, standard collaborative filtering methods take the observed entries of a rating matrix as real numbers. However, the adequacy of this measurement is questionable when the intervals between data points differ. For instance, personal judgments from different customers vary as a result of personality: generous customers tend to give fairly higher ratings than curmudgeonly customers. Thus, instead of taking data as continuous numbers, it is more feasible to consider them as categories, especially in the binary case. For instance, researchers [2, 7, 9] use a binary subset generated from the real-valued entries, namely ‘+1’ for “Recommended” and ‘−1’ for “Not Recommended”. Experiments show their approaches perform significantly better than continuous matrix completion methods.
Although 1-bit matrix completion has proven its success in recommender systems, like most other matrix completion methods it suffers from a fundamental limitation: every user/item is treated merely as a standalone individual, which ignores the homogeneity of products and the clustering characteristic of social behaviors. For instance, fundamental management theory points out that people have a propensity toward conformity based on demographic, psychographic and behavioral variables [21]. Some recent research has focused on integrating preliminary clusters into the continuous matrix completion task [3], and experiments demonstrated that this approach outperformed traditional SVD methods. However, to the best of our knowledge, so far there is no 1-bit matrix completion method taking cluster information into consideration. Moreover, state-of-the-art recommender systems either take clustering as an independent task or treat clusters as preliminaries; there is no existing method for discovering clusters along with matrix completion. Since the clustering nature of individuals plays a vital role in social behavior research, it is consequently significant to introduce a new method that learns the clusters while also utilizing the clustering effects for matrix completion. In this work, we focus on two tasks: (i) integrating group information into 1-bit matrix completion, namely group-specific 1-bit matrix completion (GS1MC), and (ii) proposing an efficient algorithm for cluster-developing matrix completion in the binary case, viz. cluster developing matrix completion (CDMC).
To exploit the grouping effects, based on the current latent variable model, we expand the scope of quantized matrix completion to developing clusters automatically as well as leveraging their effects. The proposed methods can be used to take advantage of preliminarily known user/item clusters, or to learn the groups during the training process according to the subspace correlations of targets. Experimentally, we show that the proposed GS1MC outperforms existing model-based 1-bit matrix completion methods. More importantly, CDMC successfully captures targets’ generic features and achieves convergence of both user and item clusters.
The rest of the paper is organized as follows: In Section 2, we discuss preliminary knowledge and the background of the problem setting. In Section 3, we introduce group-specific 1-bit matrix completion (GS1MC). In Section 4, the method is further extended to actively learn the cluster identities: cluster developing matrix completion (CDMC). In Section 5, we evaluate our method on synthetic data as well as a real-world application. Section 6 presents the discussion and future work.
2 Background
In this section, we discuss some preliminary knowledge for the research, including traditional SVD-based matrix completion, the framework of probabilistic 1-bit matrix completion, and sparse subspace clustering techniques.
2.1 Matrix Completion
Consider $R \in \mathbb{R}^{n_1 \times n_2}$ as the original utility matrix, where $n_1$ and $n_2$ are the number of users and items, respectively. Within $R$, each $r_{ui}$ is the explicit feedback given by user $u$ towards item $i$ on a scale, e.g., from $1$ to $5$, where the intervals probably differ as a result of personal bias. The Regularized SVD (RSVD) [13] predictor assumes $R$ is a low-rank matrix because of instance correlations and makes the approximation (prediction) by:

(1) $\hat{r}_{ui} = p_u^\top q_i$

where $p_u$ and $q_i$ are $K$-dimensional latent variables associated with user $u$ and item $i$, respectively. RSVD estimates the latent variables by minimizing the sum of squared residuals of the observed entries via the gradient descent method with a regularization term:

$\min_{P} \sum_{u} \sum_{i \in I_u} \big( r_{ui} - p_u^\top q_i \big)^2 + \lambda \|p_u\|^2 \quad \text{and} \quad \min_{Q} \sum_{i} \sum_{u \in U_i} \big( r_{ui} - p_u^\top q_i \big)^2 + \lambda \|q_i\|^2,$

where $I_u$ denotes all items rated by user $u$ and $U_i$ stands for all users who rated item $i$.
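The RSVD estimation described above can be sketched in a few lines of NumPy (a minimal illustration, not the paper's implementation; the learning rate, regularization weight, and function names are our own assumptions):

```python
import numpy as np

def rsvd(R, mask, k=3, lr=0.01, reg=0.01, epochs=1000, seed=0):
    """Regularized SVD sketch: fit user factors P and item factors Q by
    gradient descent on the squared error of the observed entries only."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = 0.1 * rng.standard_normal((n_users, k))
    Q = 0.1 * rng.standard_normal((n_items, k))
    for _ in range(epochs):
        E = mask * (R - P @ Q.T)            # residuals, zeroed on unobserved entries
        P_new = P + lr * (E @ Q - reg * P)  # gradient step for user factors
        Q = Q + lr * (E.T @ P - reg * Q)    # gradient step for item factors
        P = P_new
    return P, Q
```

Each missing entry is then predicted as $\hat{r}_{ui} = p_u^\top q_i$, i.e. `(P @ Q.T)[u, i]`.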
As the most fundamental SVD method, RSVD has been extended in different directions. For instance, a variety of regularization terms were applied for specific considerations [35], and biased versions of SVD methods [19, 20, 23] were also introduced. To take advantage of the general preference of each user and the discrimination of each item, a set of biasing variables was incorporated into biased SVD methods. Then, apart from taking individual-specific bias, users/items can also be allocated into clusters and aggregated with group effects. For instance, taking preliminary cluster identities as inputs, a set of latent variables representing the group bias [3] can be learned via the training process.
2.2 1-Bit Matrix Completion
Though matrix completion methods have long been used in recommender systems, 1-bit matrix completion [9] was introduced only recently. Unlike the continuous model, which applies numerical computation directly to discrete rating data, the original observation is converted into a binary matrix $Y \in \{+1, -1\}^{n_1 \times n_2}$ by comparing each observed entry to the average rating score. The objective of the task is then formalized as learning a latent variable matrix $M$. The predicted binary feedback is finally computed by:

(2) $\hat{y}_{ui} = \begin{cases} +1, & f(m_{ui}) \ge 1/2, \\ -1, & \text{otherwise}, \end{cases} \qquad (u, i) \notin \Omega,$

where $\Omega$ is the set of all the observed entries and $f$ can be the Sigmoid function defined as:

(3) $f(x) = \dfrac{1}{1 + e^{-x}}.$
Similar to other low-rank matrix completion methods, a wide variety of approaches have been applied to constrain the latent variable matrix. For instance, a trace-norm constraint [9] was considered under the assumption of uniform sampling. Then, a max-norm method as a convex relaxation [7] was explored under a general sampling model. Moreover, the theory has been extended further to discuss the exact low-rank constraint [2]. However, all these existing 1-bit matrix completion methods treat every instance as an autonomous individual. In other words, predictions are made while ignoring the fact that users/items tend to have a specific baseline or belong to certain clusters. Furthermore, as far as we know, there is no methodology that can both learn the cluster identities and leverage their group effects for matrix completion at the same time.
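The ingredients of the 1-bit model in (2)-(3) — the Sigmoid link, the Bernoulli log-likelihood over the observed entries, and the sign prediction rule — can be sketched as follows (a NumPy illustration; the function names are ours, not from the cited papers):

```python
import numpy as np

def sigmoid(x):
    """Link function f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def neg_log_likelihood(M, Y, mask):
    """Negative log-likelihood of binary observations Y (+1/-1) under the
    latent matrix M, restricted to the observed entries (mask == 1)."""
    p = sigmoid(M)                          # P(y_ui = +1 | m_ui)
    pos = (Y == 1) & (mask == 1)
    neg = (Y == -1) & (mask == 1)
    return -(np.log(p[pos]).sum() + np.log(1.0 - p[neg]).sum())

def predict(M):
    """Predicted binary feedback: +1 where sigmoid(M) exceeds 1/2, else -1."""
    return np.where(sigmoid(M) > 0.5, 1, -1)
```

A well-fit latent matrix drives the likelihood up: flipping the sign of $M$ for the same observations strictly increases the negative log-likelihood.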
2.3 Sparse Subspace Clustering
Sparse subspace clustering (SSC) [12] aims at clustering data points in their low-dimensional subspaces via the self-expressive matrix, which represents each instance by an affine combination of other points within the same subspace.
Nevertheless, requiring the representation of each data point by the others to be as sparse as possible results in an NP-hard problem, so a convex relaxation must be adopted to get around the NP difficulty. Thus, SSC formalizes the original problem as an $\ell_1$-norm optimization task. Taking the most standard procedure as an example, SSC assumes the whole noise-free dataset $X$ can be separated into $k$ subspaces $\{\mathcal{S}_i\}_{i=1}^{k}$ of dimensions $\{d_i\}_{i=1}^{k}$. Alternatively speaking, the matrix of the whole dataset can be written as:

$X = [X_1, X_2, \ldots, X_k]\,\Gamma,$

where $\Gamma$ is an unknown permutation matrix and each $X_i$ is a subset of the data points lying in $\mathcal{S}_i$, namely a rank-$d_i$ matrix of $N_i$ points ($N_i > d_i$). Now, each data point $x_j$ can be reconstructed by a combination of other points within the same subspace as:

(4) $x_j = X c_j, \qquad c_{jj} = 0.$

Then, different norm functions can be applied for the estimation of (4). Finally, under the $\ell_1$-norm constraint, the problem is defined as:

(5) $\min_{C} \|C\|_1 \quad \text{s.t.} \quad X = XC, \ \operatorname{diag}(C) = 0,$

where $C = [c_1, c_2, \ldots, c_N]$ corresponds to the non-trivial subspace-sparse representation for all the data points $x_j$.
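Problem (5) decomposes into an independent lasso-type problem per data point. A minimal NumPy sketch (our own ISTA solver with illustrative parameter choices, not the solver used in [12]) is:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ssc_coefficients(X, lam=0.01, iters=300):
    """Sparse self-expressive coefficients: each column X[:, j] is approximated
    as a sparse combination of the OTHER columns (the diagonal of C is kept at
    zero to forbid the trivial self-representation). Solved per point by ISTA
    on the lasso relaxation 0.5*||x_j - X c||^2 + lam*||c||_1."""
    d, n = X.shape
    C = np.zeros((n, n))
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)    # 1/L for the smooth part
    for j in range(n):
        c = np.zeros(n)
        for _ in range(iters):
            grad = X.T @ (X @ c - X[:, j])
            c = soft_threshold(c - step * grad, step * lam)
            c[j] = 0.0                          # enforce diag(C) = 0
        C[:, j] = c
    return C
```

For data drawn from two orthogonal one-dimensional subspaces, the recovered coefficients connect each point only to points in its own subspace, which is exactly the subspace-sparse property SSC relies on.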
Since user-item interaction data is exceedingly sparse and high-dimensional, many dimensions are irrelevant and dominated by noise. In the meantime, the correlation between individuals can be interpreted as similarities of their private latent variables, which are not strictly gathered around any centroids. Thus, conventional clustering methods that utilize spatial proximity are not applicable in this case. Differently, subspace clustering methods aim at grouping points that are not necessarily close but lie in the same subspace, which does not depend on the spatial characteristics of the data. Moreover, as sparse subspace clustering deploys a convex approach to pick out the sparse representation of each point, the optimization process automatically avoids some common issues of clustering methods, such as sensitivity to the assumed cluster size and the boundary problem of overlapping subspaces.
3 Group-Specific 1-Bit Matrix Completion (GS1MC)
In this section, we integrate group effects into the 1-bit matrix completion task such that the biases of clusters can be learned along with the latent variable training process.
3.1 Model Framework
Suppose $Y$ is the observed binary rating matrix with entries equal to ‘+1’ or ‘−1’, corresponding to “interested” or “not interested”, where $n_1$/$n_2$ is the number of users/items; the “not observed” entries are represented by ‘0’. $\Omega$ stands for the observed user-item pairs, i.e. entries with the same indexes as the ‘+1’s and ‘−1’s in $Y$. We construct the latent variable matrix as $M \in \mathbb{R}^{n_1 \times n_2}$. To make predictions for the missing entries by (2), our main objective is to find the estimation of $M$ that best explains the observed data.
Since it has been proved that the exact low-rank method results in a high convergence rate [2], especially when the fraction of revealed entries is small (cold-start problem), we choose to apply an exact low-rank constraint on $M$. We assume that every user/item is classified into one single user/item group, respectively. We formulate the latent variable matrix $M$ by integrating group bias into matrix factorization. Then each entry of $M$ can be written as:

(6) $m_{ui} = (p_u + s_{g(u)})^\top (q_i + t_{h(i)})$

Here $p_u$ and $q_i$ are $K$-dimensional latent factors standing for user $u$’s preference and item $i$’s character, while $s_{g(u)}$ and $t_{h(i)}$ represent the biases of the clusters that the individuals belong to. For instance, $s_{g(u)}$ means the cluster effect of user cluster $g(u)$, i.e. the cluster user $u$ belongs to. Here we have assumed that there are $m$ user clusters and $n$ item clusters, such that $1 \le g(u) \le m$ and $1 \le h(i) \le n$. Then, the group effects of the user and item clusters can be formalized as:

$S_c = (s_1, \ldots, s_m)^\top \in \mathbb{R}^{m \times K} \quad \text{and} \quad T_c = (t_1, \ldots, t_n)^\top \in \mathbb{R}^{n \times K}.$
For the sake of convenience, in terms of matrix notation, we assume the user-item interaction data and its corresponding latent variables have been permuted such that the first $n^{(1)}$ rows correspond to user cluster 1, followed by $n^{(2)}$ rows corresponding to user cluster 2, …, and the last $n^{(m)}$ rows corresponding to user cluster $m$. Similarly, the columns have been rearranged accordingly. After this alteration, the decomposition (6) can be written in the following matrix format:

(7) $M = (P + S)(Q + T)^\top$

where

$S = \big(\mathbf{1}_{n^{(1)}} s_1^\top; \ \ldots; \ \mathbf{1}_{n^{(m)}} s_m^\top\big), \qquad T = \big(\mathbf{1}_{n'^{(1)}} t_1^\top; \ \ldots; \ \mathbf{1}_{n'^{(n)}} t_n^\top\big).$

Here $\mathbf{1}_d$ stands for the $d$-dimensional (column) vector of all ‘1’s. In other words, the rows of the group effect matrices $S_c$ and $T_c$ have been duplicated in order to match the dimensions of the matrices $P$ and $Q$. For the convenience of the transformation between $S$, $S_c$ and $T$, $T_c$, we define the following two membership matrices:

$Z_u \in \{0, 1\}^{n_1 \times m} \ \text{with} \ (Z_u)_{u, g} = 1 \ \text{iff} \ g(u) = g, \qquad Z_v \in \{0, 1\}^{n_2 \times n} \ \text{with} \ (Z_v)_{i, h} = 1 \ \text{iff} \ h(i) = h.$

Thus, it is clear that $S$, $S_c$ and $T$, $T_c$ can be transformed into each other by:

(8) $S = Z_u S_c, \qquad T = Z_v T_c$

Then, (7) can be rewritten as:

(9) $M = (P + Z_u S_c)(Q + Z_v T_c)^\top$
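The membership matrices and the duplication in (8) are easy to realize concretely. The sketch below (with hypothetical sizes and randomly drawn factors) shows how left-multiplying the cluster-level biases by an indicator matrix replicates each cluster's bias across all of its members:

```python
import numpy as np

def membership_matrix(labels, n_clusters):
    """Indicator matrix Z (n x m): Z[i, g] = 1 iff instance i is in cluster g."""
    Z = np.zeros((len(labels), n_clusters))
    Z[np.arange(len(labels)), labels] = 1.0
    return Z

rng = np.random.default_rng(0)
user_labels = np.array([0, 0, 1, 1, 1])   # 5 users in 2 clusters (hypothetical)
item_labels = np.array([0, 1, 1, 0])      # 4 items in 2 clusters (hypothetical)
P = rng.standard_normal((5, 2))           # per-user latent factors, K = 2
Q = rng.standard_normal((4, 2))           # per-item latent factors
S_c = rng.standard_normal((2, 2))         # user-cluster biases
T_c = rng.standard_normal((2, 2))         # item-cluster biases

Z_u = membership_matrix(user_labels, 2)
Z_v = membership_matrix(item_labels, 2)
M = (P + Z_u @ S_c) @ (Q + Z_v @ T_c).T   # the group-specific model of (9)
```

Users in the same cluster receive exactly the same bias row of `Z_u @ S_c`, which is the duplication effect of (8).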
3.2 Objective and Optimization
Following the objective function of the basic 1-bit matrix completion method [2], the fundamental loss function is defined as:

$L(M) = -\sum_{(u, i) \in \Omega} \Big[ \mathbb{1}_{[y_{ui} = +1]} \log f(m_{ui}) + \mathbb{1}_{[y_{ui} = -1]} \log\big(1 - f(m_{ui})\big) \Big],$

where $f(M)$ is the matrix operation of applying $f$ over $M$ element-wise, and $J$ is the all-1’s matrix. Here $\mathbb{1}_{[\cdot]}$ is the indicator function, i.e. $\mathbb{1}_{[A]} = 1$ when $A$ is true, else $\mathbb{1}_{[A]} = 0$. $\Omega$ can be implemented as two mask matrices $W^{+}$ and $W^{-}$ of the same size as $Y$, where $W^{+}_{ui} = 1$ if $y_{ui} = +1$, otherwise $W^{+}_{ui} = 0$, and $W^{-}_{ui} = 1$ if $y_{ui} = -1$, otherwise $W^{-}_{ui} = 0$. Then, the fundamental loss function can be transformed into:

(10) $L(M) = -\sum_{u, i} \big[ W^{+} \circ \log f(M) + W^{-} \circ \log\big(J - f(M)\big) \big]_{ui}$

where $\circ$ means the element-wise product of two matrices. We notate $\Theta_1 = \{P, S_c\}$ and $\Theta_2 = \{Q, T_c\}$. After adding the regularization term, the new loss function can be formulated as:

(11) $L_\lambda(\Theta_1, \Theta_2) = L(M) + \lambda \big( \|P\|_F^2 + \|S_c\|_F^2 + \|Q\|_F^2 + \|T_c\|_F^2 \big)$
Our goal is to predict the missing entries of the rating matrix, which can be computed by:

(12) $\hat{M} = (\hat{P} + Z_u \hat{S}_c)(\hat{Q} + Z_v \hat{T}_c)^\top, \qquad (\hat{\Theta}_1, \hat{\Theta}_2) = \arg\min_{\Theta_1, \Theta_2} L_\lambda(\Theta_1, \Theta_2)$

We solve the optimization problem (12) via the alternating direction method of multipliers (ADMM). Firstly, to update the latent factors of users and user clusters, we fix $Q$ and $T_c$, and minimize (12) by estimating $P$ and $S_c$:

(13) $\hat{P} = \arg\min_{P} L_\lambda(\{P, S_c\}, \{Q, T_c\})$

(14) $\hat{S}_c = \arg\min_{S_c} L_\lambda(\{P, S_c\}, \{Q, T_c\})$

Then for items and item clusters, we fix $P$ and $S_c$, conducting the following computations:

(15) $\hat{Q} = \arg\min_{Q} L_\lambda(\{P, S_c\}, \{Q, T_c\})$

(16) $\hat{T}_c = \arg\min_{T_c} L_\lambda(\{P, S_c\}, \{Q, T_c\})$
Each of the subproblems (13)-(16) can be solved by the gradient descent algorithm. We can work out the gradients in the following way. First we take $f$ as the Sigmoid function defined in (3); then it is easy to check that:

$f'(x) = f(x)\big(1 - f(x)\big).$

Considering (7), with the matrix differentiation chain rule, it can be proved that:

(17) $\dfrac{\partial L}{\partial P} = \dfrac{\partial L}{\partial M}\,(Q + Z_v T_c)$

(18) $\dfrac{\partial L}{\partial Q} = \Big(\dfrac{\partial L}{\partial M}\Big)^{\!\top} (P + Z_u S_c)$

where $\dfrac{\partial L}{\partial M} = (W^{+} + W^{-}) \circ f(M) - W^{+}$. On the one hand, we have

$\dfrac{\partial L}{\partial S} = \dfrac{\partial L}{\partial M}\,(Q + Z_v T_c), \qquad \dfrac{\partial L}{\partial T} = \Big(\dfrac{\partial L}{\partial M}\Big)^{\!\top} (P + Z_u S_c).$

On the other hand, according to (8), it is clear that:

$S = Z_u S_c, \qquad T = Z_v T_c.$

According to the chain rule, we finally get:

$\dfrac{\partial L}{\partial S_c} = Z_u^\top \dfrac{\partial L}{\partial S}, \qquad \dfrac{\partial L}{\partial T_c} = Z_v^\top \dfrac{\partial L}{\partial T}.$

In other words, the sum of the first $n^{(1)}$ rows of $\partial L / \partial S$ is the first row of $\partial L / \partial S_c$, the sum of the next $n^{(2)}$ rows of $\partial L / \partial S$ is the second row of $\partial L / \partial S_c$, …, and the sum of the last $n^{(m)}$ rows of $\partial L / \partial S$ becomes the $m$-th row (the last row) of $\partial L / \partial S_c$. A similar construction yields $\partial L / \partial T_c$ from $\partial L / \partial T$.
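This row-summing relation amounts to a single multiplication by the transposed membership matrix. A small NumPy check with hypothetical sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
labels = np.array([0, 0, 0, 1, 1])      # 3 users in cluster 1, 2 users in cluster 2
Z_u = np.zeros((5, 2))
Z_u[np.arange(5), labels] = 1.0         # membership matrix
grad_S = rng.standard_normal((5, 3))    # gradient w.r.t. the duplicated biases S
grad_Sc = Z_u.T @ grad_S                # chain rule: dL/dS_c = Z_u^T dL/dS
```

Row $g$ of `grad_Sc` is the sum of the rows of `grad_S` belonging to cluster $g$, which is exactly the aggregation described above.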
4 Cluster Developing Matrix Completion (CDMC)
In this section, we intend to learn the cluster identities of users/items during the latent variable training process and integrate the clustering results with groupspecific matrix completion.
4.1 Problem Setting
The model (GS1MC) proposed in Section 3 takes cluster identities as preliminary information. However, in most practical scenarios, such details might be inaccessible, especially in the cold-start problem. Secondly, since the original binary user-item interaction data is extremely sparse, it is questionable to apply standard clustering techniques to it directly. Moreover, common clustering methods typically take advantage of the distance between points to divide the space into different partitions. Nevertheless, in a latent variable model, market segments may not necessarily congregate based on spatial proximity but instead lie in a subspace. Thus, building on GS1MC, we aim at clustering users/items that belong to a union of low-dimensional subspaces, respectively.
A common dilemma for most clustering techniques is that they might be decidedly sensitive to improper initialization, such as the cluster size and centroids. Since the size of the user/item clusters is unrevealed and each data point can have an infinite number of expressions in terms of the others, we incorporate the sparse subspace clustering (SSC) technique to pick out a sparse representation among these expressions through a convex relaxation approach.
4.2 Algorithm
Based on GS1MC, we extend the scope of the method to developing clusters during the latent variable training process.
In the last section, we deployed ADMM to optimize the latent variables $\Theta_1$ and $\Theta_2$ in an iterative manner. Now, to develop clusters based on the gradually recovered matrix, after each iteration of updating the latent variables, we construct the rating likelihood matrices for users and items via (9) and (3). We consider that the user-wise rating likelihood vectors lie in a union of disjoint subspaces, and likewise for the item-wise vectors. According to Theorems 2 and 3 from [12], we employ the $\ell_1$-norm relaxation of the self-expressive matrix to obtain the sparse representations $C_u$/$C_v$ for users’/items’ features respectively, namely:

(19) $\min_{C} \|C\|_1 \quad \text{s.t.} \quad X = XC, \ \operatorname{diag}(C) = 0$

Here, each column of $C_u$ and $C_v$ stands for a user’s/item’s hidden profile, and within each column, the non-zero entries correspond, in the ideal case, to the other homogeneous points that lie in the same subspace as this point.
Next, a non-directional weighted graph is built as $G = (V, E)$, where $V$ is the set of nodes regarding all sparse representations in $C$, and $E$ is the set of weighted edges between each pair of nodes. A natural choice of the weight matrix is that nodes within the same subspace share non-zero weighted edges while the other edges are zero-weighted. Alternatively speaking, an affinity matrix can be constructed by $W = |C| + |C|^\top$, where the non-zero entries represent latent variable pairs that actually lie in the same subspace. Then, we apply the spectral clustering method on $W$ to procure the item clusters; a similar procedure is conducted to develop the user clusters. After observing new clusters from the last step, we update the group identities of each user/item. Then, to leverage the group effects of the latest clusters in matrix completion, we estimate the latent variables by (13) to (16) again. Thus, CDMC conducts sparse subspace clustering and GS1MC iteratively. The complete algorithm is shown in Algorithm 1.
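The affinity-then-spectral-clustering step can be sketched in NumPy for the two-cluster case, using the sign pattern of the Fiedler eigenvector of the graph Laplacian (a minimal stand-in for a full spectral clustering routine; the function name is ours):

```python
import numpy as np

def spectral_bipartition(C):
    """Two-way clustering from an SSC coefficient matrix C: symmetrize it into
    the affinity W = |C| + |C|^T, form the graph Laplacian L = D - W, and split
    the nodes by the sign of the eigenvector associated with the second-smallest
    eigenvalue (the Fiedler vector)."""
    W = np.abs(C) + np.abs(C).T
    D = np.diag(W.sum(axis=1))
    L = D - W
    _, vecs = np.linalg.eigh(L)       # eigenvalues returned in ascending order
    fiedler = vecs[:, 1]
    return (fiedler > 0).astype(int)
```

With more than two clusters, one would instead embed the nodes with the first $k$ eigenvectors and run k-means, as in standard spectral clustering.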
5 Experiments
In this section, we evaluate the proposed GS1MC and its extension CDMC, separately. The experiments are based on simulation analysis as well as benchmark comparison on a realworld dataset.
5.1 Dataset and Experiment Settings
To start with, to verify the effectiveness of GS1MC, a synthetic dataset with group information was designed in the following way. Firstly, we fix the numbers of users and items, the latent dimension $K$, and the numbers of user and item clusters $m$ and $n$. Then we generate the latent factors $P$ and $Q$ from a multivariate normal distribution with covariance $I_K$, where $I_K$ is the $K$th-order identity matrix. To include the group information, we design the cluster biases $S_c$ and $T_c$ and assign each user/item to one cluster. Then, we construct the latent variable matrix $M$ by (9) and scale it. Now, we take the 1-bit transformation and add noise. We keep a certain percentage of entries as observations, where this percentage is the observation rate.
Notably, we also tested our methods on one of the most common recommender system benchmark datasets: MovieLens [16]. This user-item interaction dataset consists of 100,000 ratings (1-5) from 943 users on 1682 movies. Following the problem setting of previous literature [2, 7, 9], the original observations, scaled from 1 to 5, have been quantized as ‘+1’ and ‘−1’ according to whether they are above or below the average score.
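The quantization step used for MovieLens can be sketched as follows (a NumPy illustration of the thresholding rule; the function name is ours):

```python
import numpy as np

def binarize(R, mask):
    """1-bit quantization: map each observed rating to +1 if it is above the
    mean observed rating and to -1 otherwise; unobserved entries stay 0."""
    avg = R[mask == 1].mean()
    Y = np.zeros_like(R)
    obs = mask == 1
    Y[obs] = np.where(R[obs] > avg, 1.0, -1.0)
    return Y
```

For example, with observed ratings {5, 1, 3} the mean is 3, so 5 maps to +1 while 1 and 3 map to −1.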
The proposed method is implemented and tested in Matlab R2017b on a PC with an Intel(R) Core(TM) i5-7600 CPU @ 3.50GHz and 8.00GB RAM.
5.2 Experiments on GS1MC
5.2.1 Simulation Analysis
We randomly split the data with different training sizes for cross-validation. We assume the correct group identities are given as preliminaries and compare our method with a trace-norm approach [9]. The tuning parameter for the proposed method is selected as 37 by minimizing the average relative error, while the parameter search for the trace-norm approach is embedded in its original implementation. The best results for both methods, shown in Table 1, are chosen among 100 replications. The results indicate that GS1MC has a much smaller relative error compared to traditional 1-bit matrix completion, especially when the observed data is sparse (cold-start problem) or when the latent variables have higher dimensions.
It is straightforward to comprehend this result: since group effects can be regarded as extra information on top of the observed sparse matrix, GS1MC can achieve a much more robust performance compared to fundamental 1-bit matrix completion when the observed information is limited or when the complexity of the latent variables is high.
Table 1: Relative error under different observation rates.

No. of latent factors | Method              | 10%  | 15%  | 20%  | 25%
K = 3                 | The Proposed Method | 1.00 | 0.85 | 0.78 | 0.73
K = 3                 | Trace-Norm          | 1.89 | 1.74 | 1.67 | 1.59
K = 6                 | The Proposed Method | 1.00 | 0.92 | 0.81 | 0.74
K = 6                 | Trace-Norm          | 2.53 | 2.27 | 2.15 | 2.02
Table 2: Prediction accuracy (%) under different training sizes.

Training size (%) | The proposed method | Exact-rank | HL   | Logit | Trace-norm | Max-norm
95                | 74.1                | 73.0       | 72.0 | 68.0  | 73.0       | 72.2
10                | 66.3                | 61.0       | -    | -     | 59.0       | 59.0
5                 | 63.3                | 54.5       | -    | -     | 49.9       | 50.5
5.2.2 MovieLens Dataset
Since the cluster information of most user-item interaction data is not available, to provide GS1MC with cluster information, we group the original dataset according to its implicit feedback. Implicit feedback refers to the density of items receiving comments or the frequency of people giving feedback. In other words, people tend not to choose items randomly but to choose things they already expected [11]. Thus, implicit feedback concerns only the existence of ratings, irrespective of the actual rating values. It is expected that people giving more ratings tend to be more curmudgeonly, while items with more feedback tend to have higher average ratings [3]. Thus, we group users and items according to the number of ratings they have given or received.
We compared GS1MC with the other existing 1-bit matrix completion methods, namely: (a) hinge loss with variational approximation (HL) [8], (b) Bayesian logistic model with variational approximation (Logit) [8], (c) the trace-norm frequentist logistic model (Trace-norm) [9], (d) the exact low-rank model (Exact-rank) [2] and (e) a max-norm constrained minimization approach (Max-norm) [7]. Following their experimental setup, the MovieLens dataset has been split into different training-test sizes (note: here, the training size is not the observation rate of the simulation analysis). Since some methods are not open-sourced, we compared our results with the best results reported in the previous literature. The converged accuracy results are displayed in Table 2. The proposed method outperforms all the other baselines. Conspicuously, in the scenario where the training size is extremely small (5%), our method greatly improves upon traditional binary matrix completion methods by utilizing the group information.
5.3 Experiments on CDMC
5.3.1 Robustness Analysis
To the best of our knowledge, there is no comparable baseline for clustering problems in recommender systems research. Thus, to evaluate the convergence performance of CDMC, we conduct the first experiment on the MovieLens dataset.
To start with, we split the data (95%, 5%) and initialize the group identities of users/items randomly. Then we train the CDMC model for a number of epochs until the clustering results tend to stabilize (200 epochs for the 95% MovieLens split). The produced cluster identities of each instance are stored as the baseline. Afterwards, we re-conduct the process multiple times with completely random initialization. Namely, another (95%, 5%) split of the dataset is drawn for cross-validation, and all the cluster identities, as well as all latent variables, are initialized arbitrarily. We use adjusted mutual information (AMI) [33] as the evaluation score to measure the degree of matching between the clustering results from the multiple cross-validation processes. As Figure 0(a) shows, both user and item clusters converge to highly similar distributions over the training epochs. Meanwhile, during each iteration of the optimization process, we construct the completed matrix and make predictions on the test set. The recorded misclassification rate is shown in Figure 0(b). The misclassification rate gradually stabilizes as the cluster developing process proceeds, and the resulting prediction accuracy is highly comparable with that of GS1MC proposed in Section 3, even though in this case the cluster information is totally unknown.
5.3.2 Clustering Outcome
For the item clusters, we use three dimensions of the item-related latent variables as axes. The learned clusters are visualized in Figure 1(a). Similarly, the developed user clusters are plotted on the user-related latent variables. As Figure 2 shows, the item clusters are more dispersed and differentiable while the user clusters gather in closer proximity.
In order to validate the practical influence of CDMC, we projected the actual profile features of each user/item onto the latent variables CDMC learned and discovered some noteworthy findings.
Firstly, there are 19 categories of items available, and each movie can be labeled with multiple genres. We extracted this information and constructed a genre matrix $G$, where $G_{ic} = 1$ means item $i$ can be classified into category $c$. As the items in $G$ share the exact same (1-to-1) indexes as those in the rating matrix, we applied the k-means clustering method to this genre information and visualized its results against the latent variables that CDMC learned.
As shown in Figure 2(b), it is compelling that the learned latent variables have a clearly discernible pattern regarding items’ genre features. In other words, even though our proposed CDMC method did not take any genre information as input, it has captured items’ factual profiles based only on the sparse rating matrix. Besides, as CDMC conducts sparse subspace clustering and group-specific matrix completion in an iterative manner, while gradually learning the hidden profiles, the model can integrate this information immediately into the matrix completion task, which in turn positively boosts the next iteration’s clustering.
Similarly, we build a feature matrix of users based on their context profiles, including age, gender and occupation. The clustering result is projected and shown in Figure 2(c). As we expected, understanding human preference is a much more complicated task, and the clustering result is visibly noisier. But it is still noticeable in the plot that blue and purple nodes gather in the vertically higher part of the space while yellow and green ones are distributed below. As pointed out in previous literature [34], it is a quite common issue that multiple individuals might share a single account, which biases the accuracy of the profile information.
6 Conclusions and Future Work
In this paper, we introduced group-specific matrix factorization into the 1-bit matrix completion task and proposed GS1MC. Then, for the first time, we integrated sparse subspace clustering with the matrix completion task and proposed CDMC, extending the scope of GS1MC from passively receiving preliminary cluster information to actively developing clusters and leveraging their effects. Experiments show GS1MC outperforms existing methods on both synthetic and real-world data, especially for the cold-start problem, and CDMC successfully captures items’ hidden genre features from a highly sparse binary rating matrix. It is noteworthy that GS1MC and CDMC provide new insight for evaluating the quality of clusters or detecting undiscovered segments. For instance, when integrating implicit-feedback clusters into GS1MC, the prediction accuracy was greatly boosted compared to previous methods. In terms of CDMC, our experiments show movies’ genres have a large impact on their popularity among certain audiences, while users’ age, gender and occupation tend to have weaker effects on their preferences. For future work, it will be valuable to apply GS1MC and CDMC to more real-world applications and discover possibly unrevealed social behavior and market phenomena.
References
 [1] Bell, R., Koren, Y., Volinsky, C.: Modeling relationships at multiple scales to improve accuracy of large recommender systems. Proceedings of the 13th ACM SIGKDD pp. 95–104 (2007)
 [2] Bhaskar, S., Javanmard, A.: 1-bit matrix completion under exact low-rank constraint. arXiv:1502.06689 (2015)
 [3] Bi, X., Qu, A., Wang, J., Shen, X.: A groupspecific recommender system. Journal of the American Statistical Association pp. 1344–1353 (2017)
 [4] Billsus, D., Pazzani, M.J.: Learning collaborative information filters. ICML pp. 46–54 (1998)
 [5] Billsus, D., Pazzani, M.J.: User modeling for adaptive news access. UMUAI pp. 147–180 (2000)
 [6] Breese, J.S., Heckerman, D., Kadie, C.: Empirical analysis of predictive algorithms for collaborative filtering. UAI pp. 43–52 (1998)
 [7] Cai, T., Zhou, W.X.: A max-norm constrained minimization approach to 1-bit matrix completion. JMLR pp. 3619–3647 (2013)
 [8] Cottet, V., Alquier, P.: 1-bit matrix completion: PAC-Bayesian analysis of a variational approximation. arXiv preprint arXiv:1604.04191 (2016)
 [9] Davenport, M.A., Plan, Y., Van Den Berg, E., Wootters, M.: 1-bit matrix completion. Information and Inference: A Journal of the IMA pp. 189–223 (2014)
 [10] Deshpande, M., Karypis, G.: Item-based top-N recommendation algorithms. ACM TOIS pp. 143–177 (2004)
 [11] Devooght, R., Kourtellis, N., Mantrach, A.: Dynamic matrix factorization with priors on unknown values. Proceedings of the 21th ACM SIGKDD pp. 189–198 (2015)
 [12] Elhamifar, E., Vidal, R.: Sparse subspace clustering: Algorithm, theory, and applications. IEEE TPAMI pp. 2765–2781 (2013)
 [13] Funk, S.: Netflix update: Try this at home. https://sifter.org/simon/journal/20061211.html (2006)
 [14] Gori, M., Pucci, A., Roma, V., Siena, I.: ItemRank: A random-walk based scoring algorithm for recommender engines. IJCAI pp. 2766–2771 (2007)
 [15] Grčar, M., Fortuna, B., Mladenič, D., Grobelnik, M.: kNN versus SVM in the collaborative filtering framework. pp. 251–260. Springer (2006)
 [16] Harper, F.M., Konstan, J.A.: The movielens datasets: History and context. ACM TIIS p. 19 (2016)
 [17] Joaquin, D., Naohiro, I.: Memory-based weighted-majority prediction for recommender systems. ACM SIGIR (1999)
 [18] Koren, Y.: Factorization meets the neighborhood: a multifaceted collaborative filtering model. Proceedings of the 14th ACM SIGKDD pp. 426–434 (2008)
 [19] Koren, Y., Bell, R.: Advances in collaborative filtering. pp. 77–118. Springer (2015)
 [20] Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer (2009)
 [21] Kotler, P.: Marketing management: A south Asian perspective. Pearson Education India (2009)
 [22] Linden, G., Smith, B., York, J.: Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing pp. 76–80 (2003)
 [23] Paterek, A.: Improving regularized singular value decomposition for collaborative filtering. Proceedings of KDD Cup and Workshop pp. 5–8 (2007)
 [24] Pazzani, M., Billsus, D.: Learning and revising user profiles: The identification of interesting web sites. ML pp. 313–331 (1997)
 [25] Pirotte, A., Renders, J.M., Saerens, M., et al.: Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE TKDE pp. 355–369 (2007)
 [26] Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., Riedl, J.: Grouplens: an open architecture for collaborative filtering of netnews. Proceedings of the 1994 ACM CSCW pp. 175–186 (1994)

 [27] Salakhutdinov, R., Mnih, A., Hinton, G.: Restricted Boltzmann machines for collaborative filtering. Proceedings of the 24th ICML pp. 791–798 (2007)
 [28] Sarwar, B., Karypis, G., Konstan, J., Riedl, J.: Application of dimensionality reduction in recommender system: a case study. Tech. Rep. (2000)
 [29] Sarwar, B.M., et al.: Item-based collaborative filtering recommendation algorithms. WWW pp. 285–295 (2001)
 [30] Shoham, Y.: Combining contentbased and collaborative recommendation. Communications of the ACM (1997)
 [31] Takács, G., Pilászy, I., Németh, B., Tikk, D.: Investigation of various matrix factorization methods for large recommender systems pp. 553–562 (2008)
 [32] Takács, G., Pilászy, I., Németh, B., Tikk, D.: Scalable collaborative filtering approaches for large recommender systems. JMLR pp. 623–656 (2009)
 [33] Vinh, N.X., Epps, J., Bailey, J.: Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. JMLR pp. 2837–2854 (2010)
 [34] Zhang, A., Fawaz, N., Ioannidis, S., Montanari, A.: Guess who rated this movie: Identifying users through subspace clustering. arXiv e-prints 1208.1544 (2012)
 [35] Zhu, Y., Shen, X., Ye, C.: Personalized prediction and sparsity pursuit in latent factor models. Journal of the American Statistical Association pp. 241–252 (2016)
 [36] Zitnick, C.L., Kanade, T.: Maximum entropy for collaborative filtering. Proceedings of the 20th Conference on UAI pp. 636–643 (2004)
2 Background
In this section, we discuss some preliminary knowledge for the research, including traditional SVD-based matrix completion, the framework of probabilistic 1-bit matrix completion, and sparse subspace clustering techniques.
2.1 Matrix Completion
Consider $R \in \mathbb{R}^{n \times m}$ as the original utility matrix, where $n$ and $m$ are the numbers of users and items, respectively. Within $R$, each entry $r_{ui}$ is the explicit feedback given by user $u$ towards item $i$ on a scale, e.g., from 1 to 5, where the intervals probably differ as a result of personal bias. The Regularized SVD (RSVD) [13] predictor assumes $R$ to be a low-rank matrix because of instance correlations and makes the approximation (prediction) by:

$$\hat{r}_{ui} = p_u^{\top} q_i, \qquad (1)$$

where $p_u$ and $q_i$ are $K$-dimensional latent variables associated with user $u$ and item $i$, respectively. RSVD estimates the latent variables by minimizing the sum of squared residuals of the observed entries via the gradient descent method with a regularization term:

$$\min_{p_u} \sum_{i \in I_u} \big(r_{ui} - p_u^{\top} q_i\big)^2 + \lambda \lVert p_u \rVert^2 \quad \text{and} \quad \min_{q_i} \sum_{u \in U_i} \big(r_{ui} - p_u^{\top} q_i\big)^2 + \lambda \lVert q_i \rVert^2,$$

where $I_u$ denotes all items rated by user $u$ and $U_i$ stands for all users who rated item $i$.
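As a concrete illustration, the regularized updates above can be sketched as stochastic gradient steps. This is a minimal sketch, not the authors' implementation; the rank `K`, learning rate, and regularization weight are illustrative choices:

```python
import numpy as np

def rsvd(ratings, n_users, n_items, K=8, lr=0.02, reg=0.02, epochs=2000, seed=0):
    """Fit RSVD by stochastic gradient descent on observed (u, i, r) triples."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, K))  # user latent factors p_u
    Q = 0.1 * rng.standard_normal((n_items, K))  # item latent factors q_i
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]                # residual r_ui - p_u^T q_i
            # regularized gradient steps, one per observed rating
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q

# toy rank-1 rating pattern: predictions should land near the observed values
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 5.0), (1, 1, 3.0)]
P, Q = rsvd(ratings, n_users=2, n_items=2)
print(round(float(P[0] @ Q[0]), 2), round(float(P[1] @ Q[1]), 2))
```

With the small regularization weight, the fitted inner products shrink only slightly below the observed ratings.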
As the most fundamental SVD method, RSVD has been extended in several directions. For instance, a variety of regularization terms have been applied for specific considerations [35], and biased versions of SVD methods [19, 20, 23] have also been introduced. To take advantage of the general preference of each user and the distinctive quality of each item, a set of biasing variables is incorporated into biased SVD methods. Beyond individual-specific biases, users/items can also be allocated into clusters and associated with group effects. For instance, taking preliminary cluster identities as inputs, a set of latent variables representing the group biases [3] can be learned during the training process.
2.2 1-Bit Matrix Completion
Though matrix completion methods have long been used in recommender systems, 1-bit matrix completion [9] was introduced only recently. Unlike the continuous model, which applies numerical computation to discrete rating data directly, the original observation $R$ is converted into a binary matrix $Y$ by comparing each observed entry to the average rating score. Then, the objective of the task is formalized as learning an $n \times m$ latent variable matrix $M$. The predicted binary feedback is finally computed by:

$$\hat{M} = \arg\max_{M} \sum_{(u,i) \in \Omega} \Big( \mathbb{1}_{[Y_{ui}=1]} \log f(M_{ui}) + \mathbb{1}_{[Y_{ui}=-1]} \log\big(1 - f(M_{ui})\big) \Big), \qquad \hat{Y}_{ui} = \operatorname{sign}(\hat{M}_{ui}), \qquad (2)$$

where $\Omega$ is the set of all the observed entries and $f(\cdot)$ can be the Sigmoid function defined as:

$$f(x) = \frac{1}{1 + e^{-x}}. \qquad (3)$$
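To make the observation model concrete, the sketch below (with hypothetical toy values, not data from the paper) evaluates the Sigmoid link (3) and the log-likelihood of observed binary entries under a candidate latent matrix:

```python
import numpy as np

def sigmoid(x):
    # the link function f in (3): f(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def log_likelihood(M, Y):
    """Log-likelihood of the observed +/-1 entries of Y under latent matrix M.

    Entries coded 0 are unobserved and contribute nothing;
    P(Y_ui = 1) = f(M_ui) and P(Y_ui = -1) = 1 - f(M_ui).
    """
    A = (Y == 1)    # mask of observed '1's
    B = (Y == -1)   # mask of observed '-1's
    return float(np.sum(A * np.log(sigmoid(M)) + B * np.log(1.0 - sigmoid(M))))

Y = np.array([[1, -1, 0],
              [0,  1, 1]])                  # 0 marks "not observed"
M_good = np.array([[ 2.0, -2.0, 0.0],
                   [ 0.0,  2.0, 2.0]])      # aligned with the observed signs
print(log_likelihood(M_good, Y) > log_likelihood(-M_good, Y))  # True: aligned M scores higher
```

A latent matrix whose signs agree with the observations attains a strictly higher likelihood than its negation, which is exactly what the maximization in (2) exploits.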
Similar to other low-rank matrix completion methods, a wide variety of approaches have been applied to constrain the latent variable matrix. For instance, a trace-norm constraint [9] was considered under the assumption of uniform sampling. Then, a max-norm method serving as a convex relaxation [7] was explored under a general sampling model. Moreover, the theory has been extended further to handle an exact low-rank constraint [2]. However, all these existing 1-bit matrix completion methods treat every instance as an autonomous individual. In other words, predictions are made while ignoring the fact that users/items tend to have specific baselines or belong to certain clusters. Furthermore, as far as we know, no existing methodology can simultaneously learn the cluster identities and leverage their group effects for matrix completion.
2.3 Sparse Subspace Clustering
Sparse subspace clustering (SSC) [12] aims at clustering data points in their low-dimensional subspaces via the self-expressive matrix, which represents each instance by an affine combination of other points within the same subspace.
However, requiring the representation of each data point by the others to be as sparse as possible results in an NP-hard problem, so a convex relaxation is needed to get around the NP difficulty. Thus, SSC formalizes the original problem as an $\ell_1$-norm optimization task. Taking the most standard procedure as an example, SSC assumes the whole noise-free dataset can be separated into $L$ subspaces $\{\mathcal{S}_\ell\}_{\ell=1}^{L}$ of dimensions $\{d_\ell\}_{\ell=1}^{L}$. In other words, the matrix of the whole dataset can be written as:

$$X = [x_1, \dots, x_N] = [X_1, \dots, X_L]\,\Gamma,$$

where $\Gamma$ is an unknown permutation matrix and $X_\ell$ is a subset of the data points lying in $\mathcal{S}_\ell$, namely a rank-$d_\ell$ matrix of $N_\ell$ points ($N_\ell > d_\ell$). Now, each data point can be reconstructed by a combination of other points within the same subspace as:

$$x_j = X c_j, \qquad c_{jj} = 0. \qquad (4)$$

Then, different norm functions can be applied for the estimation of (4). Finally, under the $\ell_1$-norm constraint, the problem is defined as:

$$\min \lVert C \rVert_1 \quad \text{s.t.} \quad X = XC, \;\; \operatorname{diag}(C) = 0, \qquad (5)$$

where $C = [c_1, \dots, c_N]$ corresponds to the non-trivial subspace-sparse representation of all the data points $x_j$'s.
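A minimal sketch of the self-expressive step: for each point, the $\ell_1$-regularized regression is solved here by plain proximal gradient descent (ISTA). The two toy one-dimensional subspaces and the penalty weight are illustrative assumptions, not part of the original formulation:

```python
import numpy as np

def self_expressive(X, lam=0.01, lr=0.01, iters=2000):
    """Sparse self-representation (SSC): for each column x_j of X, solve
    min_c 0.5*||x_j - X c||^2 + lam*||c||_1  subject to c_j = 0,
    by proximal gradient descent (ISTA). Returns C with X ~= X C."""
    n_points = X.shape[1]
    C = np.zeros((n_points, n_points))
    for j in range(n_points):
        c = np.zeros(n_points)
        for _ in range(iters):
            grad = X.T @ (X @ c - X[:, j])                          # quadratic part
            c = c - lr * grad
            c = np.sign(c) * np.maximum(np.abs(c) - lr * lam, 0.0)  # soft-thresholding
            c[j] = 0.0                      # forbid the trivial self-representation
        C[:, j] = c
    return C

# two 1-dimensional subspaces (lines) in R^3; points on the same line should
# represent each other, with zero weight across subspaces
u, v = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 1.0])
X = np.column_stack([u, 2 * u, 3 * u, v, 2 * v])
C = self_expressive(X)
print(np.round(C, 2))   # nonzero pattern is block-diagonal by subspace
```

Because the two lines are orthogonal here, cross-subspace coefficients stay exactly zero, so the support of `C` reveals the cluster structure directly.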
Since user-item interaction data is exceedingly sparse and high-dimensional, many dimensions are irrelevant or dominated by noise. Meanwhile, the correlation between individuals can be interpreted through the similarity of their latent variables, which are not necessarily concentrated around any centroids. Thus, conventional clustering methods that rely on spatial proximity are not applicable in this case. In contrast, subspace clustering methods aim at grouping points that are not necessarily close but lie in the same subspace, and hence do not depend on the spatial characteristics of the data. Moreover, as sparse subspace clustering deploys a convex approach to pick out the sparse representation of each point, the optimization process automatically sidesteps some common issues of clustering methods, such as sensitivity to the choice of cluster number and ambiguity at the boundaries of overlapping subspaces.
3 Group-specific 1-Bit Matrix Completion (GS1MC)
In this section, we integrate group effects into the 1-bit matrix completion task such that the biases of clusters can be learned along with the latent variables during the training process.
3.1 Model Framework
Suppose $Y$ is the observed binary rating matrix with entries equal to '1' or '-1', corresponding to "interested" or "not interested", where $n$ ($m$) is the number of users (items); the "not observed" entries are represented by '0'. $\Omega$ stands for the observed user-item pairs, i.e., entries with the same indexes as the '1's and '-1's in $Y$. We construct the latent variable matrix as $M \in \mathbb{R}^{n \times m}$. To make predictions for the missing entries by (2), our main objective is to find the estimation of $M$ that best explains the observed data.
Since it has been proved that the exact low-rank method results in a high convergence rate [2], especially when the fraction of revealed entries is small (the cold-start problem), we choose to apply an exact low-rank constraint on $M$. We assume that every user/item is classified into one single user/item group, respectively. We formulate the latent variable matrix $M$ by integrating group bias into matrix factorization. Then each entry in $M$ can be written as:

$$M_{ui} = (p_u + s_{v_u})^{\top} (q_i + t_{j_i}). \qquad (6)$$

Here $p_u$ and $q_i$ are $K$-dimensional latent factors standing for user $u$'s preference and item $i$'s character, while $s_{v_u}$ and $t_{j_i}$ represent the biases of the clusters that the individuals belong to. For instance, $s_{v_u}$ means the cluster effect of user cluster $v_u$, i.e., the cluster user $u$ belongs to. Here we have assumed that there are $G$ user clusters and $H$ item clusters, such that $1 \le v_u \le G$ and $1 \le j_i \le H$. Then, the group effects of the user and item clusters can be formalized as:

$$S = (s_1, \dots, s_G)^{\top} \in \mathbb{R}^{G \times K} \quad \text{and} \quad T = (t_1, \dots, t_H)^{\top} \in \mathbb{R}^{H \times K}.$$
For the sake of convenience in matrix notation, we assume the user-item interaction data and its corresponding latent variables have been permuted such that the first $n_1$ rows correspond to user cluster 1, followed by $n_2$ rows corresponding to user cluster 2, ..., and the last $n_G$ rows corresponding to user cluster $G$. Similarly, the columns have been rearranged accordingly with item-cluster sizes $m_1, \dots, m_H$. After this alteration, the decomposition (6) can be written in the following matrix format:

$$M = (P + S^{*})\,(Q + T^{*})^{\top}, \qquad (7)$$

where

$$P = (p_1, \dots, p_n)^{\top}, \quad Q = (q_1, \dots, q_m)^{\top}, \quad
S^{*} = \begin{pmatrix} \mathbf{1}_{n_1} s_1^{\top} \\ \vdots \\ \mathbf{1}_{n_G} s_G^{\top} \end{pmatrix}, \quad
T^{*} = \begin{pmatrix} \mathbf{1}_{m_1} t_1^{\top} \\ \vdots \\ \mathbf{1}_{m_H} t_H^{\top} \end{pmatrix}.$$

Here $\mathbf{1}_{n_k}$ stands for the $n_k$-dimensional (column) vector of all '1's. In other words, the rows of the group effect matrices $S$ and $T$ have been duplicated in order to match the dimensions of the matrices $P$ and $Q$. For the convenience of the transformation between $S$, $T$ and $S^{*}$, $T^{*}$, we define the following two block indicator matrices:

$$Z = \begin{pmatrix} \mathbf{1}_{n_1} & & \\ & \ddots & \\ & & \mathbf{1}_{n_G} \end{pmatrix} \in \mathbb{R}^{n \times G}
\quad \text{and} \quad
W = \begin{pmatrix} \mathbf{1}_{m_1} & & \\ & \ddots & \\ & & \mathbf{1}_{m_H} \end{pmatrix} \in \mathbb{R}^{m \times H}.$$

Thus, it is clear that $S$, $T$ and $S^{*}$, $T^{*}$ can be transformed to each other by:

$$S^{*} = Z S, \qquad T^{*} = W T. \qquad (8)$$

Then, (7) can be rewritten as:

$$M = (P + Z S)\,(Q + W T)^{\top}. \qquad (9)$$
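The construction in (7)-(9) can be sketched directly; the cluster sizes and dimensions below are illustrative toy values:

```python
import numpy as np

def membership_matrix(sizes):
    """Block indicator matrix Z = diag(1_{n_1}, ..., 1_{n_G}) for rows already
    permuted so that each cluster's members are contiguous."""
    Z = np.zeros((sum(sizes), len(sizes)))
    row = 0
    for g, size in enumerate(sizes):
        Z[row:row + size, g] = 1.0
        row += size
    return Z

rng = np.random.default_rng(0)
n, m, K = 5, 4, 3                              # users, items, latent dimension
Z = membership_matrix([2, 3])                  # G = 2 user clusters
W = membership_matrix([1, 3])                  # H = 2 item clusters
P, Q = rng.standard_normal((n, K)), rng.standard_normal((m, K))
S, T = rng.standard_normal((2, K)), rng.standard_normal((2, K))

M = (P + Z @ S) @ (Q + W @ T).T                # matrix form, as in (9)

# entrywise form, as in (6): user 3 sits in user cluster 1, item 2 in item cluster 1
u, i, v_u, j_i = 3, 2, 1, 1
assert np.isclose(M[u, i], (P[u] + S[v_u]) @ (Q[i] + T[j_i]))
print(M.shape)  # (5, 4)
```

The assertion confirms that the matrix form (9) and the entrywise form (6) agree: multiplying by `Z` and `W` is exactly the row duplication of (8).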
3.2 Objective and Optimization
Following the objective function of the basic 1-bit matrix completion method [2], the fundamental loss function is defined as:

$$L(M) = -\sum_{(u,i) \in \Omega} \Big( \mathbb{1}_{[Y_{ui}=1]} \log f(M)_{ui} + \mathbb{1}_{[Y_{ui}=-1]} \log\big(\mathbf{1} - f(M)\big)_{ui} \Big),$$

where $f(M)$ is the matrix operation of applying $f$ over $M$ elementwise, and $\mathbf{1}$ is the all-'1's matrix. Here $\mathbb{1}_{[\cdot]}$ is the indicator function, i.e., $\mathbb{1}_{[x]} = 1$ when $x$ is true, else $\mathbb{1}_{[x]} = 0$. It can be implemented as two mask matrices $A$ and $B$ of the same size as $Y$, where $A_{ui} = 1$ if $Y_{ui} = 1$, otherwise $A_{ui} = 0$, and $B_{ui} = 1$ if $Y_{ui} = -1$, otherwise $B_{ui} = 0$. Then, the fundamental loss function can be transformed into:

$$L(M) = -\operatorname{sum}\Big( A \circ \log f(M) + B \circ \log\big(\mathbf{1} - f(M)\big) \Big), \qquad (10)$$

where $\circ$ means the elementwise product of two matrices and $\operatorname{sum}(\cdot)$ adds up all entries. For brevity, we write $\bar{P} = P + Z S$ and $\bar{Q} = Q + W T$, so that $M = \bar{P} \bar{Q}^{\top}$. After adding the regularization term, the new loss function can be formulated as:

$$L_{\lambda}(P, Q, S, T) = L(M) + \lambda \big( \lVert P \rVert_F^2 + \lVert Q \rVert_F^2 + \lVert S \rVert_F^2 + \lVert T \rVert_F^2 \big). \qquad (11)$$
Our goal is to predict the missing entries of the rating matrix, which can be computed via:

$$(\hat{P}, \hat{Q}, \hat{S}, \hat{T}) = \arg\min_{P, Q, S, T} L_{\lambda}(P, Q, S, T). \qquad (12)$$

We solve the optimization problem (12) via the alternating direction method of multipliers (ADMM). Firstly, to update the latent factors of users and user clusters, we fix $Q$ and $T$, and minimize (12) by estimating $P$ and $S$:

$$\hat{P} = \arg\min_{P} L_{\lambda}(P, \hat{Q}, S, \hat{T}), \qquad (13)$$

$$\hat{S} = \arg\min_{S} L_{\lambda}(\hat{P}, \hat{Q}, S, \hat{T}). \qquad (14)$$

Then, for items and item clusters, we fix $P$ and $S$, conducting the following computations:

$$\hat{Q} = \arg\min_{Q} L_{\lambda}(\hat{P}, Q, \hat{S}, \hat{T}), \qquad (15)$$

$$\hat{T} = \arg\min_{T} L_{\lambda}(\hat{P}, \hat{Q}, \hat{S}, T). \qquad (16)$$
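A minimal sketch of the alternating scheme (13)-(16): the paper's solver is replaced here by plain gradient steps on each block in turn, and the rank, step size, regularization weight, and toy cluster assignments are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def grad_M(M, A, B):
    # elementwise dL/dM for the masked loss (10)
    return B * sigmoid(M) - A * (1.0 - sigmoid(M))

def fit_gs1mc(Y, Z, W, K=4, lam=0.01, lr=0.05, outer=300, inner=5, seed=0):
    """Alternate gradient steps over the four blocks P, S, Q, T, as in (13)-(16)."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    A, B = (Y == 1).astype(float), (Y == -1).astype(float)
    P = 0.1 * rng.standard_normal((n, K)); Q = 0.1 * rng.standard_normal((m, K))
    S = 0.1 * rng.standard_normal((Z.shape[1], K)); T = 0.1 * rng.standard_normal((W.shape[1], K))
    for _ in range(outer):
        for _ in range(inner):   # (13): update P with the other blocks fixed
            E = grad_M((P + Z @ S) @ (Q + W @ T).T, A, B)
            P -= lr * (E @ (Q + W @ T) + 2 * lam * P)
        for _ in range(inner):   # (14): update the user-cluster effects S
            E = grad_M((P + Z @ S) @ (Q + W @ T).T, A, B)
            S -= lr * (Z.T @ E @ (Q + W @ T) + 2 * lam * S)
        for _ in range(inner):   # (15): update Q
            E = grad_M((P + Z @ S) @ (Q + W @ T).T, A, B)
            Q -= lr * (E.T @ (P + Z @ S) + 2 * lam * Q)
        for _ in range(inner):   # (16): update the item-cluster effects T
            E = grad_M((P + Z @ S) @ (Q + W @ T).T, A, B)
            T -= lr * (W.T @ E.T @ (P + Z @ S) + 2 * lam * T)
    return (P + Z @ S) @ (Q + W @ T).T

Y = np.array([[ 1, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]])                       # 0 marks "not observed"
Z = np.array([[1., 0.], [1., 0.], [0., 1.]])       # 2 user clusters
W = np.array([[1., 0.], [0., 1.], [0., 1.]])       # 2 item clusters
M_hat = fit_gs1mc(Y, Z, W)
print(np.all(np.sign(M_hat[Y != 0]) == Y[Y != 0]))
```

On this toy instance, enough alternating steps drive the fitted latent matrix to agree in sign with every observed entry.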
Each of the subproblems (13)-(16) can be solved by the gradient descent algorithm. We can work out the gradients in the following way. First, we take $f$ as the Sigmoid function defined in (3); then it is easy to check that:

$$f'(x) = f(x)\big(1 - f(x)\big), \qquad \big(\log f(x)\big)' = 1 - f(x), \qquad \big(\log(1 - f(x))\big)' = -f(x).$$

Considering (7), with the matrix differentiation chain rule, it can be proved that:

$$\frac{\partial L_{\lambda}}{\partial P} = E\,(Q + W T) + 2 \lambda P, \qquad (17)$$

$$\frac{\partial L_{\lambda}}{\partial S} = Z^{\top} E\,(Q + W T) + 2 \lambda S, \qquad (18)$$

where $E = B \circ f(M) - A \circ \big(\mathbf{1} - f(M)\big)$ collects the elementwise derivatives $\partial L / \partial M_{ui}$. The gradients with respect to $Q$ and $T$ follow symmetrically as $\partial L_{\lambda} / \partial Q = E^{\top} (P + Z S) + 2 \lambda Q$ and $\partial L_{\lambda} / \partial T = W^{\top} E^{\top} (P + Z S) + 2 \lambda T$.
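The data-fit part of the chain-rule gradients (17)-(18) can be checked numerically against central finite differences (the regularization gradient $2\lambda P$ is immediate); the toy dimensions and random data below are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss(P, S, Q, T, Z, W, A, B):
    # the fundamental loss (10) evaluated at M = (P + ZS)(Q + WT)^T
    M = (P + Z @ S) @ (Q + W @ T).T
    return -np.sum(A * np.log(sigmoid(M)) + B * np.log(1.0 - sigmoid(M)))

rng = np.random.default_rng(1)
n, m, K = 4, 3, 2
Z = np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])   # 2 user clusters
W = np.array([[1., 0.], [1., 0.], [0., 1.]])             # 2 item clusters
P, Q = rng.standard_normal((n, K)), rng.standard_normal((m, K))
S, T = rng.standard_normal((2, K)), rng.standard_normal((2, K))
Yobs = rng.choice([-1, 0, 1], size=(n, m))
A, B = (Yobs == 1).astype(float), (Yobs == -1).astype(float)

# analytic gradients: E = dL/dM, then dL/dP = E (Q + WT) and dL/dS = Z^T E (Q + WT)
M = (P + Z @ S) @ (Q + W @ T).T
E = B * sigmoid(M) - A * (1.0 - sigmoid(M))
gP = E @ (Q + W @ T)
gS = Z.T @ E @ (Q + W @ T)

# central finite differences on one coordinate of P and one of S
eps = 1e-6
for X, g, idx in [(P, gP, (0, 0)), (S, gS, (1, 0))]:
    X[idx] += eps; up = loss(P, S, Q, T, Z, W, A, B)
    X[idx] -= 2 * eps; down = loss(P, S, Q, T, Z, W, A, B)
    X[idx] += eps
    assert abs((up - down) / (2 * eps) - g[idx]) < 1e-4
print("gradients match")
```

Such a check is a cheap safeguard before running the full alternating optimization on real data.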