Cluster Developing 1-Bit Matrix Completion

04/07/2019 · by Chengkun Zhang, Junbin Gao, et al. · The University of Sydney

Matrix completion has a long history of use as the core technique of recommender systems. In particular, 1-bit matrix completion, which casts prediction as a "Recommended" or "Not Recommended" question, has proved its significance and validity in the field. However, although customers and products aggregate into interacting clusters, state-of-the-art model-based 1-bit recommender systems do not take grouping bias into consideration. To close this gap, this paper introduces Group-Specific 1-bit Matrix Completion (GS1MC), which for the first time consolidates group-specific effects into 1-bit recommender systems under the low-rank latent variable framework. Additionally, to empower GS1MC even when grouping information is unobtainable, Cluster Developing Matrix Completion (CDMC) is proposed by integrating the sparse subspace clustering technique into GS1MC. Namely, CDMC clusters users/items and leverages their group effects in matrix completion at the same time. Experiments on synthetic and real-world data show that GS1MC outperforms current 1-bit matrix completion methods. Meanwhile, it is compelling that CDMC can successfully capture items' genre features based only on sparse binary user-item interaction data. Notably, GS1MC provides new insight into incorporating and evaluating the efficacy of clustering methods, while CDMC can serve as a new tool to explore unrevealed social behaviors or market phenomena.

1 Introduction

Recommender systems aim at improving customers' experience by maximizing the use of the available information, including user-item interaction data, such as ratings or clicking behavior, and attribute information, such as category or context profiles. Methods that utilize the interaction data are referred to as collaborative filtering [10, 22, 26, 29], while methods that use the textual information are referred to as content-based methods [5, 24, 30]. In particular, collaborative filtering predicts the missing ratings given by a specific user to a specific item. Based on the idea that users and items are highly correlated with each other, the unspecified ratings can be estimated by learning the hidden relations.

Collaborative filtering can be seen as a special case of the matrix completion task. It has become a cornerstone of the most powerful recommender systems and is mainly built on two streams of methods: neighbourhood-based methods [10, 17, 22, 26, 29] and model-based methods [1, 6, 18, 23, 31, 32]. Though neighbourhood-based methods are easy to interpret and implement, they cannot extract enough information and suffer from low prediction accuracy when the observed data is sparse. Dimension reduction methods [1, 4, 28] and graphs [14, 25] have been tried to address this sensitivity issue. Alternatively, model-based methods define a parameterized model which is optimized on the available data during the training process. Numerous model-based approaches have been tested in previous research, including Support Vector Machines [15], Maximum Entropy [36], Boltzmann Machines [27] and Singular Value Decomposition (SVD) [18, 23, 31, 32].

Under the assumption that ratings vary continuously, standard collaborative filtering methods take the observed entries of a rating matrix as real numbers. However, the adequacy of this measurement is questionable when the intervals between data points differ. For instance, personal judgments from different customers vary as a result of personality: generous customers tend to give considerably higher ratings than curmudgeonly customers. Thus, instead of taking the data as continuous numbers, it is more feasible to treat them as categories, especially in the binary case. For instance, researchers [2, 7, 9] use a small binary subset generated from the real-valued entries, namely '+1' for "Recommended" and '-1' for "Not Recommended". Experiments show their approaches perform significantly better than continuous matrix completion methods.

Although 1-bit matrix completion has proven its success in recommender systems, like most other matrix completion methods it suffers from a fundamental limitation: every user/item is treated merely as a standalone individual, which ignores the homogeneity of products and the clustering characteristic of social behaviors. For instance, fundamental management theory points out that people have a propensity toward conformity based on demographic, psychographic and behavioral variables [21]. Recent research has focused on integrating preliminary clusters into the continuous matrix completion task [3], and experiments demonstrated that this approach outclassed traditional SVD methods. However, to the best of our knowledge, so far no 1-bit matrix completion method takes cluster information into consideration. Moreover, state-of-the-art recommender systems either take clustering as an independent task or treat clusters as preliminaries; there is no existing method for discovering clusters along with matrix completion. Since the clustering nature of individuals plays a vital role in social behavior research, it is consequently significant to introduce a new method that learns the clusters while also utilizing the clustering effects for matrix completion. In this work, we focus on two tasks: (i) integrating group information into 1-bit matrix completion, namely group-specific 1-bit matrix completion (GS1MC), and (ii) proposing an efficient algorithm for developing clusters in the binary case, viz. cluster developing matrix completion (CDMC).

To exploit the grouping effects, based on the current latent variable model, we expand the scope of quantized matrix completion to developing clusters automatically as well as leveraging their effects. The proposed methods can either take advantage of preliminarily known user/item clusters or learn the groups during the training process according to the subspace correlations of the targets. Experimentally, we show that the proposed GS1MC outperforms existing model-based 1-bit matrix completion methods. More importantly, CDMC successfully captures targets' genre features and achieves convergence of both user and item clusters.

The rest of the paper is organized as follows: In Section 2, we discuss preliminary knowledge and the background of the problem setting. In Section 3, we introduce group-specific 1-bit matrix completion (GS1MC). In Section 4, the method is further extended to actively learn the cluster identities: cluster developing matrix completion (CDMC). In Section 5, we evaluate our method on synthetic data as well as a real-world application. Section 6 presents the conclusions and future directions.

2 Background

In this section, we discuss some preliminary knowledge of the research, including traditional SVD-based matrix completion, the framework of probabilistic 1-bit matrix completion and sparse subspace clustering techniques.

2.1 Matrix Completion

Consider $R \in \mathbb{R}^{m \times n}$ as the original utility matrix, where $m$ and $n$ are the numbers of users and items, respectively. Within $R$, each $r_{ui}$ is the explicit feedback given by user $u$ towards item $i$ on a scale, e.g., from $1$ to $5$, where the intervals probably differ as a result of personal bias. The Regularized SVD (RSVD) [13] predictor assumes $R$ is a low-rank matrix because of instance correlations and makes the approximation (prediction) by:

$$\hat{r}_{ui} = \mathbf{p}_u^\top \mathbf{q}_i, \qquad (1)$$

where $\mathbf{p}_u$ and $\mathbf{q}_i$ are $K$-dimensional latent variables associated with user $u$ and item $i$, respectively. RSVD estimates the latent variables by minimizing the sum of squared residuals of the observed entries via the gradient descent method with a regularization term:

$$\min_{\mathbf{p}_u} \sum_{i \in \Omega_u} \big(r_{ui} - \mathbf{p}_u^\top \mathbf{q}_i\big)^2 + \lambda \|\mathbf{p}_u\|^2 \quad \text{and} \quad \min_{\mathbf{q}_i} \sum_{u \in \Omega_i} \big(r_{ui} - \mathbf{p}_u^\top \mathbf{q}_i\big)^2 + \lambda \|\mathbf{q}_i\|^2,$$

where $\Omega_u$ denotes all items rated by user $u$ and $\Omega_i$ stands for all users who rated item $i$.
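To make the updates concrete, the following is a minimal Python sketch (our own illustration, not the authors' implementation) that fits RSVD by stochastic gradient descent; the hyperparameters `K`, `lam`, `lr` and `n_epochs` are illustrative assumptions.

```python
# Minimal RSVD sketch: SGD on the regularized squared error over observed
# (user, item, rating) triples. Hyperparameters are illustrative only.
import numpy as np

def rsvd(triples, m, n, K=10, lam=0.05, lr=0.01, n_epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((m, K))  # user latent factors p_u
    Q = 0.1 * rng.standard_normal((n, K))  # item latent factors q_i
    for _ in range(n_epochs):
        for u, i, r in triples:
            err = r - P[u] @ Q[i]          # residual on one observed entry
            pu = P[u].copy()               # update both factors from old values
            P[u] += lr * (err * Q[i] - lam * P[u])
            Q[i] += lr * (err * pu - lam * Q[i])
    return P, Q                            # predict missing entries via P @ Q.T
```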

As the most fundamental SVD method, RSVD has been extended in different directions. For instance, a variety of regularization terms have been applied for specific considerations [35], and biased versions of SVD methods [19, 20, 23] were also introduced. To take advantage of the general preference of each user and the discrimination of each item, a set of biasing variables is incorporated in biased SVD methods. Then, apart from taking individual-specific bias, users/items can also be allocated into clusters and aggregated with group effects. For instance, taking preliminary cluster identities as inputs, a set of latent variables representing the group bias [3] can be learned during the training process.

2.2 1-Bit Matrix Completion

Though matrix completion methods have long been used for recommender systems, 1-bit matrix completion [9] was formally introduced only recently. In contrast to the continuous model, which applies numerical computation to the discrete rating data directly, the original observation is converted into a binary matrix $A$ by comparing each observed entry to the average rating score. The objective of the task is then formalized as learning an $m \times n$ latent variable matrix $\Theta$. The predicted binary feedback is finally computed by:

$$P(A_{ui} = +1) = f(\Theta_{ui}), \quad (u, i) \in \Omega, \qquad (2)$$

where $\Omega$ is the set of all the observed entries and $f(\cdot)$ can be the Sigmoid function defined as:

$$f(x) = \frac{1}{1 + e^{-x}}. \qquad (3)$$
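The observation model (2)-(3) can be sketched as follows; this is our own minimal illustration, with +1/-1/0 encoding observed positives, observed negatives and missing entries.

```python
# Sketch of the 1-bit observation model (2)-(3): an observed entry is +1 with
# probability f(Theta_ui); a candidate Theta is scored by the negative
# log-likelihood over the observed entries only.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_log_likelihood(Theta, A):
    """A holds +1 / -1 for observed entries and 0 for 'not observed'."""
    F = sigmoid(Theta)
    return -(np.log(F[A == 1]).sum() + np.log1p(-F[A == -1]).sum())
```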

Similar to other low-rank matrix completion methods, a wide variety of approaches have been applied to constrain the latent variable matrix. For instance, a trace-norm constraint [9] was considered under the assumption of uniform sampling. Then, a max-norm method, as a convex relaxation [7], was explored under a general sampling model. Moreover, the theory has been extended further to the exact low-rank constraint [2]. However, all these existing 1-bit matrix completion methods treat every instance as an autonomous individual. In other words, predictions are made while ignoring the fact that users/items tend to have specific baselines or belong to certain clusters. Furthermore, as far as we know, there is no methodology that can both learn the cluster identities and leverage their group effects for matrix completion at the same time.

2.3 Sparse Subspace Clustering

Sparse subspace clustering (SSC) [12] aims at clustering data points in their low-dimensional subspaces via the self-expressive matrix, which represents each instance by an affine combination of other points within the same subspace.

However, requiring the representation of each data point in terms of the others to be as sparse as possible results in an NP-hard problem, so a convex relaxation must be adopted to get around the NP difficulty. Thus, SSC formalizes the original problem as an $\ell_1$-norm optimization task. Taking the most standard procedure as an example, SSC assumes the whole noise-free dataset can be separated into $L$ subspaces $\{\mathcal{S}_l\}_{l=1}^{L}$ of dimensions $\{d_l\}_{l=1}^{L}$. Alternatively speaking, the matrix of the whole dataset can be written as:

$$Y = [\,y_1, \dots, y_N\,] = [\,Y_1, \dots, Y_L\,]\,\Gamma,$$

where $\Gamma$ is an unknown permutation matrix and $Y_l$ is the subset of the data points lying in $\mathcal{S}_l$, namely a rank-$d_l$ matrix of $N_l$ points ($N_l > d_l$). Now, each data point can be reconstructed by a combination of other points within the same subspace as:

$$y_i = Y c_i, \quad c_{ii} = 0. \qquad (4)$$

Then, different norm functions can be applied for the estimation of (4). Finally, under the $\ell_1$-norm constraint, the problem is defined as:

$$\min_{C}\ \|C\|_1 \quad \text{s.t.} \quad Y = YC, \ \operatorname{diag}(C) = 0, \qquad (5)$$

where $C = [\,c_1, \dots, c_N\,]$ corresponds to the non-trivial subspace-sparse representation of all the data points $y_i$.

Since user-item interaction data is exceedingly sparse and high-dimensional, many dimensions are irrelevant and dominated by noise. In the meantime, the correlation between individuals can be interpreted as similarity of their private latent variables, which do not strictly congregate around any centroids. Thus, conventional clustering methods that utilize spatial proximity are not applicable in this case. In contrast, subspace clustering methods aim at grouping points that are not necessarily close but lie in the same subspace, which does not depend on the spatial characteristics of the data. Moreover, as sparse subspace clustering deploys a convex approach to pick out the sparse representation of each point, the optimization process automatically avoids some common issues of clustering methods, such as sensitivity to the assumed cluster size and boundary issues of overlapping subspaces.
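A compact sketch of the SSC pipeline (4)-(5) is given below. It is our own illustration: a Lasso penalty serves as a standard noise-tolerant surrogate for the exact constraint $Y = YC$, and the weight `alpha` and the cluster count are assumptions.

```python
# SSC sketch: l1-regularized self-expression per data point (Lasso relaxation
# of Y = YC with diag(C) = 0), then spectral clustering on |C| + |C|^T.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.linear_model import Lasso

def ssc(Y, n_clusters, alpha=0.01):
    """Columns of Y are data points; returns cluster labels and C."""
    N = Y.shape[1]
    C = np.zeros((N, N))
    for i in range(N):
        mask = np.arange(N) != i            # enforce c_ii = 0
        lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        lasso.fit(Y[:, mask], Y[:, i])      # y_i ~ Y c_i (equation (4))
        C[mask, i] = lasso.coef_
    W = np.abs(C) + np.abs(C).T             # symmetric affinity matrix
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed").fit_predict(W)
    return labels, C
```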

3 Group-specific 1-Bit Matrix Completion (GS1MC)

In this section, we integrate group effects into 1-bit matrix completion task such that biases of clusters can be learned along with latent variable training process.

3.1 Model Framework

Suppose $A$ is the observed binary rating matrix with entries equal to '+1' or '-1', corresponding to "interested" or "not interested", where $m$ ($n$) is the number of users (items) and the "not observed" entries are represented by '0'. $\Omega$ stands for the observed user-item pairs, i.e. the entries with the same indexes as the '+1's and '-1's in $A$. We construct the latent variable matrix as $\Theta \in \mathbb{R}^{m \times n}$. To make predictions for the missing entries by (2), our main objective is to find the estimate of $\Theta$ that best explains the observed data.

Since it has been proved that the exact low-rank method results in a high convergence rate [2], especially when the fraction of revealed entries is small (the cold-start problem), we choose to apply an exact low-rank constraint on $\Theta$. We assume that every user/item is classified into one single user/item group, respectively. We formulate the latent variable matrix $\Theta$ by integrating group bias into the matrix factorization. Then each entry in $\Theta$ can be written as:

$$\Theta_{ui} = \big(\mathbf{u}_u + \mathbf{s}_{g(u)}\big)^\top \big(\mathbf{v}_i + \mathbf{t}_{h(i)}\big). \qquad (6)$$

Here $\mathbf{u}_u$ and $\mathbf{v}_i$ are $K$-dimensional latent factors standing for user $u$'s preference and item $i$'s character, while $\mathbf{s}_{g(u)}$ and $\mathbf{t}_{h(i)}$ represent the biases of the clusters the individuals belong to. For instance, $\mathbf{s}_{g(u)}$ means the cluster effect of user cluster $g(u)$, i.e. the cluster user $u$ belongs to. Here we have assumed that there are $N_u$ user clusters and $N_v$ item clusters, such that $g(u) \in \{1, \dots, N_u\}$ and $h(i) \in \{1, \dots, N_v\}$. Then, the group effects of the user and item clusters can be collected as:

$$S = [\,\mathbf{s}_1, \dots, \mathbf{s}_{N_u}\,]^\top \in \mathbb{R}^{N_u \times K}, \qquad T = [\,\mathbf{t}_1, \dots, \mathbf{t}_{N_v}\,]^\top \in \mathbb{R}^{N_v \times K}.$$

For the sake of convenience, in terms of matrix notation, we assume the user-item interaction data and its corresponding latent variables have been permuted such that the first $m_1$ rows correspond to user cluster 1, followed by $m_2$ rows corresponding to user cluster 2, …, and the last $m_{N_u}$ rows corresponding to user cluster $N_u$. Similarly, the columns have been rearranged accordingly. After this alteration, the decomposition (6) can be written in the following matrix format:

$$\Theta = \big(U + \tilde{S}\big)\big(V + \tilde{T}\big)^\top, \qquad (7)$$

where

$$\tilde{S} = \begin{bmatrix} \mathbf{1}_{m_1}\mathbf{s}_1^\top \\ \vdots \\ \mathbf{1}_{m_{N_u}}\mathbf{s}_{N_u}^\top \end{bmatrix}, \qquad \tilde{T} = \begin{bmatrix} \mathbf{1}_{n_1}\mathbf{t}_1^\top \\ \vdots \\ \mathbf{1}_{n_{N_v}}\mathbf{t}_{N_v}^\top \end{bmatrix}.$$

Here $\mathbf{1}_{m_k}$ stands for the $m_k$-dimensional (column) vector of all '1's. In other words, the rows of the group effect matrices $S$ and $T$ have been duplicated in order to match the dimensions of the matrices $U$ and $V$. For the convenience of the transformation between $S$, $\tilde{S}$ and $T$, $\tilde{T}$, we define the following two block-diagonal indicator matrices:

$$P = \operatorname{diag}\big(\mathbf{1}_{m_1}, \dots, \mathbf{1}_{m_{N_u}}\big) \in \{0,1\}^{m \times N_u}, \qquad Q = \operatorname{diag}\big(\mathbf{1}_{n_1}, \dots, \mathbf{1}_{n_{N_v}}\big) \in \{0,1\}^{n \times N_v}.$$

Thus, $S$, $\tilde{S}$ and $T$, $\tilde{T}$ can be transformed into each other by:

$$\tilde{S} = PS, \qquad \tilde{T} = QT. \qquad (8)$$

Then, (7) can be rewritten as:

$$\Theta = (U + PS)(V + QT)^\top. \qquad (9)$$
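A short sketch of the decomposition (9) may help fix the shapes; `membership_matrix` is a hypothetical helper name, and the 0-indexed labels are an implementation convenience.

```python
# Sketch of the group-specific decomposition (9): P and Q are 0/1
# membership-indicator matrices that duplicate the cluster-bias rows,
# so that Theta = (U + P S)(V + Q T)^T.
import numpy as np

def membership_matrix(labels, n_groups):
    """labels[i] in {0, ..., n_groups - 1}; returns the 0/1 indicator matrix."""
    M = np.zeros((len(labels), n_groups))
    M[np.arange(len(labels)), labels] = 1.0
    return M

def build_theta(U, V, S, T, user_labels, item_labels):
    P = membership_matrix(user_labels, S.shape[0])  # m x N_u
    Q = membership_matrix(item_labels, T.shape[0])  # n x N_v
    return (U + P @ S) @ (V + Q @ T).T              # equation (9)
```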

3.2 Objective and Optimization

Following the objective function of the basic 1-bit matrix completion method [2], the fundamental loss function is defined as:

$$F(\Theta) = -\sum_{(u,i) \in \Omega} \Big[ \mathbb{1}_{[A_{ui} = +1]} \log f(\Theta_{ui}) + \mathbb{1}_{[A_{ui} = -1]} \log\big(1 - f(\Theta_{ui})\big) \Big],$$

where $f(\Theta)$ denotes applying $f$ over $\Theta$ element-wise and $J$ below is the all-1's matrix. Here $\mathbb{1}_{[c]}$ is the indicator function, i.e. $\mathbb{1}_{[c]} = 1$ when $c$ is true, else $\mathbb{1}_{[c]} = 0$. $\Omega$ can be implemented as two mask matrices $W^{+}$ and $W^{-}$ of the same size as $A$, where $W^{+}_{ui} = 1$ if $A_{ui} = +1$, otherwise $0$, and $W^{-}_{ui} = 1$ if $A_{ui} = -1$, otherwise $0$. Then, the fundamental loss function can be transformed into:

$$F(\Theta) = -\mathbf{1}^\top \Big[ W^{+} \circ \log f(\Theta) + W^{-} \circ \log\big(J - f(\Theta)\big) \Big] \mathbf{1}, \qquad (10)$$

where $\circ$ means the element-wise (Hadamard) product of two matrices. We denote $X = U + PS$ and $Y = V + QT$. After adding the regularization term, the new loss function can be formulated as:

$$L(U, S, V, T) = F\big(XY^\top\big) + \lambda \big(\|U\|_F^2 + \|S\|_F^2 + \|V\|_F^2 + \|T\|_F^2\big). \qquad (11)$$
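The masked loss (10)-(11) translates directly into the following sketch (ours, with a small clipping safeguard added for numerical stability; `lam` is an illustrative regularization weight).

```python
# Sketch of the regularized loss (11) built from the masked likelihood (10).
# W_plus / W_minus are the 0/1 masks defined above.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gs1mc_loss(U, S, V, T, P, Q, W_plus, W_minus, lam):
    Theta = (U + P @ S) @ (V + Q @ T).T
    F = np.clip(sigmoid(Theta), 1e-12, 1 - 1e-12)   # guard the logarithms
    nll = -(W_plus * np.log(F) + W_minus * np.log1p(-F)).sum()
    reg = lam * sum(np.sum(M * M) for M in (U, S, V, T))
    return nll + reg
```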

Our goal is to predict the missing entries of the rating matrix, for which we solve:

$$\min_{U, S, V, T}\ L(U, S, V, T). \qquad (12)$$

We solve the optimization problem (12) via the alternating direction method of multipliers (ADMM). Firstly, to update the latent factors of users and user clusters, we fix $V$ and $T$, and minimize (12) by estimating $U$ and $S$:

$$U^{(k+1)} = \operatorname*{arg\,min}_{U}\ L\big(U, S^{(k)}, V^{(k)}, T^{(k)}\big), \qquad (13)$$

$$S^{(k+1)} = \operatorname*{arg\,min}_{S}\ L\big(U^{(k+1)}, S, V^{(k)}, T^{(k)}\big). \qquad (14)$$

Then for items and item clusters, we fix $U$ and $S$, conducting the following computations:

$$V^{(k+1)} = \operatorname*{arg\,min}_{V}\ L\big(U^{(k+1)}, S^{(k+1)}, V, T^{(k)}\big), \qquad (15)$$

$$T^{(k+1)} = \operatorname*{arg\,min}_{T}\ L\big(U^{(k+1)}, S^{(k+1)}, V^{(k+1)}, T\big). \qquad (16)$$

Each of the sub-problems (13)-(16) can be solved by the gradient descent algorithm. We can work out the gradient in the following way. First we take $f$ as the Sigmoid function defined in (3); then it is easy to check that:

$$f'(x) = f(x)\big(1 - f(x)\big), \qquad \text{so} \qquad \frac{\partial F}{\partial \Theta} = W^{-} \circ f(\Theta) - W^{+} \circ \big(J - f(\Theta)\big).$$

Considering (7), with the matrix differentiation chain rule, it can be proved that:

$$\frac{\partial L}{\partial U} = \frac{\partial F}{\partial \Theta}\,(V + QT) + 2\lambda U, \qquad (17)$$

$$\frac{\partial L}{\partial V} = \Big(\frac{\partial F}{\partial \Theta}\Big)^{\!\top} (U + PS) + 2\lambda V. \qquad (18)$$

On the one hand, we have

$$\frac{\partial F}{\partial \tilde{S}} = \frac{\partial F}{\partial \Theta}\,(V + QT).$$

On the other hand, according to (8), it is clear that $\tilde{S} = PS$. According to the chain rule, we finally get:

$$\frac{\partial L}{\partial S} = P^\top \frac{\partial F}{\partial \Theta}\,(V + QT) + 2\lambda S.$$

In other words, the sum of the first $m_1$ rows of $\partial F / \partial \tilde{S}$ is the first row of $P^\top \partial F / \partial \tilde{S}$, the sum of the next $m_2$ rows is the second row, …, and the sum of the last $m_{N_u}$ rows becomes the $N_u$-th (last) row. A similar construction yields $\partial L / \partial T$ from $\partial F / \partial \tilde{T}$.
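For reference, here is a sketch of the resulting gradient computations, sharing the conventions of the loss sketch above; the matrix $P^\top$ realizes the group-wise row summation just described.

```python
# Sketch of the gradients (17)-(18) and their group-effect counterparts.
# G = dF/dTheta follows from f'(x) = f(x)(1 - f(x)) and the masks.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gs1mc_grads(U, S, V, T, P, Q, W_plus, W_minus, lam):
    X, Y = U + P @ S, V + Q @ T
    F = sigmoid(X @ Y.T)
    G = W_minus * F - W_plus * (1.0 - F)   # dF/dTheta on observed entries
    dU = G @ Y + 2 * lam * U               # equation (17)
    dV = G.T @ X + 2 * lam * V             # equation (18)
    dS = P.T @ (G @ Y) + 2 * lam * S       # P^T sums rows cluster-wise
    dT = Q.T @ (G.T @ X) + 2 * lam * T
    return dU, dS, dV, dT
```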

4 Cluster Developing Matrix Completion (CDMC)

In this section, we intend to learn the cluster identities of users/items during the latent variable training process and integrate the clustering results with group-specific matrix completion.

4.0.1 Problem Setting

The model (GS1MC) proposed in Section 3 takes cluster identities as preliminary information. However, in most practical scenarios such details might be inaccessible, especially in the cold-start problem. Secondly, since the original binary user-item interaction data is extremely sparse, it is questionable to apply standard clustering techniques to it directly. Moreover, common clustering methods exploit the distance between points to divide the space into partitions. Nevertheless, in a latent variable model, market segments may not necessarily congregate based on spatial proximity but rather lie in a subspace. Thus, building on GS1MC, we aim at clustering users/items that belong to unions of low-dimensional subspaces, respectively.

A common dilemma for most clustering techniques is that they can be decidedly sensitive to improper initialization, such as the cluster size and centroids. Since the sizes of the user/item clusters are unknown and each data point can have an infinite number of representations in terms of the others, we incorporate the sparse subspace clustering (SSC) technique to optimize a sparse representation among these expressions through a convex relaxation approach.

4.0.2 Algorithm

Based on GS1MC, we extend the scope of the method to developing clusters during the latent variable training process.

In the last section, we deployed ADMM to optimize the latent variables $U, S, V, T$ in an iterative manner. Now, to develop clusters based on the gradually recovering matrix, after each iteration of updating the latent variables, we construct the rating likelihood matrix $\hat{X} = f(\Theta)$ via (9) and (3). We consider that the rows (users) of the rating likelihood matrix lie in one union of disjoint subspaces while its columns (items) lie in another. According to Theorems 2 and 3 from [12], we employ the $\ell_1$-norm relaxation of the self-expressive matrix to obtain the sparse representations $C_{\mathrm{user}}$ / $C_{\mathrm{item}}$ of the users'/items' features respectively, namely:

$$\min_{C}\ \|C\|_1 \quad \text{s.t.} \quad \hat{X} = \hat{X}C, \ \operatorname{diag}(C) = 0. \qquad (19)$$

Here, each column of $C_{\mathrm{user}}$ and $C_{\mathrm{item}}$ stands for a user's/item's hidden profile, and within each column, the non-zero entries correspond, in the ideal case, to the other homogeneous points lying in the same subspace as that point.

Next, a non-directional weighted graph $G = (\mathcal{V}, \mathcal{E})$ is built, where $\mathcal{V}$ is the set of nodes corresponding to all sparse representations in $C$, and $\mathcal{E}$ is the set of weighted edges between each pair of nodes. A natural choice of the weight matrix is that nodes within the same subspace share non-zero weighted edges while the other edges are zero-weighted. Alternatively speaking, an affinity matrix can be constructed by $W = |C| + |C|^\top$, where the non-zero entries represent latent variable pairs that actually lie in the same subspace. Then, we apply the spectral clustering method on $W_{\mathrm{item}}$ to procure the item clusters. A similar method is conducted to build $W_{\mathrm{user}}$ for user cluster developing.

After observing the new clusters from the last step, we update the group identities of each user/item. Then, to leverage the group effects of the latest clusters in matrix completion, we estimate the latent variables by (13) to (16) again. Thus, CDMC conducts sparse subspace clustering and GS1MC iteratively. The complete algorithm is shown in Algorithm 1.

1 procedure CDMC
2     Randomly initialize user/item groups
3     Update latent variables $U, S, V, T$ by (13) to (16)
4     loop:
5     Construct the likelihood matrix $\hat{X}$ by (9) and (3)
6     Build adjacency matrices $W_{\mathrm{user}}$, $W_{\mathrm{item}}$ and the corresponding weighted graphs
7     Apply spectral clustering on $W_{\mathrm{user}}$ and $W_{\mathrm{item}}$
8     Update cluster identities
9     Update latent variables by (13) to (16) in a smaller inner loop
10     If not converged, go to Step 4: loop.
Algorithm 1: Cluster Developing 1-bit Matrix Completion
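Algorithm 1 can be condensed into the following sketch, which reuses the hypothetical helpers from the earlier sketches (`gs1mc_grads`, `ssc`, `membership_matrix`, `sigmoid`); the loop counts and step sizes are illustrative.

```python
# Condensed CDMC sketch: alternate GS1MC gradient steps (13)-(16) with
# SSC-based relabeling of user/item clusters on the likelihood matrix.
import numpy as np

def cdmc(A, K, N_u, N_v, n_outer=20, n_inner=50, lr=0.05, lam=0.1, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W_plus, W_minus = (A == 1).astype(float), (A == -1).astype(float)
    U = 0.1 * rng.standard_normal((m, K))
    V = 0.1 * rng.standard_normal((n, K))
    S, T = np.zeros((N_u, K)), np.zeros((N_v, K))
    users = rng.integers(N_u, size=m)       # random initial group identities
    items = rng.integers(N_v, size=n)
    for _ in range(n_outer):
        P = membership_matrix(users, N_u)
        Q = membership_matrix(items, N_v)
        for _ in range(n_inner):            # inner GS1MC updates
            dU, dS, dV, dT = gs1mc_grads(U, S, V, T, P, Q, W_plus, W_minus, lam)
            U, S, V, T = U - lr * dU, S - lr * dS, V - lr * dV, T - lr * dT
        X_hat = sigmoid((U + P @ S) @ (V + Q @ T).T)  # likelihood matrix via (9), (3)
        users, _ = ssc(X_hat.T, N_u)        # users are rows of X_hat
        items, _ = ssc(X_hat, N_v)          # items are columns of X_hat
    return U, S, V, T, users, items
```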

5 Experiments

In this section, we evaluate the proposed GS1MC and its extension CDMC, separately. The experiments are based on simulation analysis as well as benchmark comparison on a real-world dataset.

5.1 Dataset and Experiment Settings

To start with, to verify the effectiveness of GS1MC, a synthetic dataset with group information was designed in the following way. Firstly, we fix the numbers of users and items, the latent dimension $K$, and the numbers of user/item clusters $N_u$ and $N_v$. Then we generate the latent factors $U$ and $V$ with rows drawn from $\mathcal{N}(\mathbf{0}, I_K)$, where $I_K$ is a $K$-order identity matrix. To include the group information, we design the group-effect matrices $S$ and $T$ with distinct biases for each cluster. Then, we construct the latent variable matrix by $\Theta = (U + PS)(V + QT)^\top$ and scale it. Now, we take the 1-bit transformation and add noise through the Sigmoid link, i.e. $A_{ui} = +1$ with probability $f(\Theta_{ui})$ and $-1$ otherwise. We keep a certain percentage $\rho$ of the entries as observations, where $\rho$ is the observation rate.
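Under our reading of this setup, the generation process can be sketched as follows; all sizes and scales here are illustrative placeholders, not the paper's exact values, and the helpers come from the earlier sketches.

```python
# Sketch of the synthetic 1-bit data generation: Gaussian factors, cluster
# biases, a Sigmoid 1-bit transformation and a Bernoulli observation mask.
import numpy as np

def make_synthetic(m=300, n=300, K=3, N_u=3, N_v=3, rho=0.2, seed=0):
    rng = np.random.default_rng(seed)
    U, V = rng.standard_normal((m, K)), rng.standard_normal((n, K))
    S = rng.standard_normal((N_u, K))        # user-cluster biases
    T = rng.standard_normal((N_v, K))        # item-cluster biases
    users = rng.integers(N_u, size=m)
    items = rng.integers(N_v, size=n)
    P, Q = membership_matrix(users, N_u), membership_matrix(items, N_v)
    Theta = (U + P @ S) @ (V + Q @ T).T
    Theta /= np.abs(Theta).max()             # scale the latent matrix
    flips = rng.random((m, n)) < sigmoid(Theta)
    A_full = np.where(flips, 1, -1)          # noisy 1-bit transformation
    observed = rng.random((m, n)) < rho      # keep a rho fraction of entries
    return np.where(observed, A_full, 0), Theta, users, items
```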

Notably, we also tested our methods on one of the most common recommender system benchmark datasets: Movielens [16]. This user-item interaction dataset consists of 100,000 ratings (1-5) from 943 users on 1682 movies. Following the problem setting of previous literature [2, 7, 9], the original observations, on a scale from 1 to 5, have been quantized to '+1' and '-1' according to whether they are above or below the average score.

The proposed method is implemented and tested in Matlab R2017b on a PC with Intel(R) Core(TM) i5-7600 CPU @ 3.500GHz and 8.00GB RAM.

5.2 Experiments on GS1MC

5.2.1 Simulation Analysis

Using the synthetic data generated above, we randomly split the data with different observation rates for cross-validation. We assume the true group identities are given as preliminary information and compare our method with the Trace-Norm approach [9]. The tuning parameter for the proposed method is selected as 37 by minimizing the average relative error, while the parameter search for the Trace-Norm approach is embedded in its original implementation. The best results for both methods, shown in Table 1, are chosen among 100 replications. The results indicate that GS1MC has a much smaller relative error than traditional 1-bit matrix completion, especially when the observed data is sparse (cold-start problem) or when the latent variables have higher dimensions.

It is straightforward to comprehend the result: since group effects can be regarded as extra information compared to the observed sparse matrix, GS1MC can have a much more robust performance compared to the fundamental 1-bit matrix completion when the observed information is limited or when the complexity of the latent variable is high.

No. of latent factors | Method              | Obs. 10% | Obs. 15% | Obs. 20% | Obs. 25%
K = 3                 | The Proposed Method | 1.00     | 0.85     | 0.78     | 0.73
K = 3                 | Trace-Norm          | 1.89     | 1.74     | 1.67     | 1.59
K = 6                 | The Proposed Method | 1.00     | 0.92     | 0.81     | 0.74
K = 6                 | Trace-Norm          | 2.53     | 2.27     | 2.15     | 2.02
Table 1: Relative error, computed as $\|\hat{\Theta} - \Theta\|_F / \|\Theta\|_F$, for synthetic data of different ranks ($K = 3, 6$) and observation rates (10%, 15%, 20% and 25%).

Training size (%) | The proposed method | Exact-rank | HL   | Logit | Trace-norm | Max-norm
95                | 74.1                | 73.0       | 72.0 | 68.0  | 73.0       | 72.2
10                | 66.3                | 61.0       | -    | -     | 59.0       | 59.0
5                 | 63.3                | 54.5       | -    | -     | 49.9       | 50.5
Table 2: Prediction accuracy (%) of GS1MC versus methods in previous literature when the training size is 95%, 10% and 5%.

5.2.2 Movielens Dataset

Since the cluster information of most user-item interaction data is not available, to provide GS1MC with cluster information, we group the original dataset according to implicit feedback. Implicit feedback refers to the density of items receiving comments or the frequency of people giving feedback; in other words, people tend not to choose items randomly but choose things they already expect [11]. Thus, implicit feedback only concerns the presence of ratings, irrespective of the actual rating values. It is expected that people giving more ratings tend to be more curmudgeonly, while items with more feedback tend to have higher average ratings [3]. Thus, we group users and items according to the number of ratings they have given or received.
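As an illustration of this grouping scheme, here is a sketch (our own; the quantile bucketing and group counts are assumptions) that binarizes ratings around the global mean and buckets users/items by their rating counts.

```python
# Sketch of implicit-feedback grouping: quantize ratings to +/-1 around the
# global mean, then bucket users/items by how many ratings they gave/received.
import numpy as np

def binarize_and_group(R, n_user_groups=3, n_item_groups=3):
    """R: raw rating matrix with 0 for missing and ratings 1-5 elsewhere."""
    observed = R > 0
    A = np.where(observed, np.where(R >= R[observed].mean(), 1, -1), 0)
    user_counts, item_counts = observed.sum(axis=1), observed.sum(axis=0)
    u_cuts = np.quantile(user_counts, np.linspace(0, 1, n_user_groups + 1)[1:-1])
    i_cuts = np.quantile(item_counts, np.linspace(0, 1, n_item_groups + 1)[1:-1])
    users = np.digitize(user_counts, u_cuts)   # group id per user
    items = np.digitize(item_counts, i_cuts)   # group id per item
    return A, users, items
```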

We compared GS1MC with the other existing 1-bit matrix completion methods, namely: (a) hinge loss with variational approximation (HL) [8], (b) Bayesian logistic model with variational approximation (Logit) [8], (c) the trace-norm frequentist logistic model (Trace-norm) [9], (d) the exact low-rank model (Exact-rank) [2] and (e) a max-norm constrained minimization approach (Max-norm) [7]. Following their experimental setup, the Movielens dataset has been split into different training-test sizes (note: here, the training size is not the observation rate of the simulation analysis). Since some methods are not open-sourced, we compared our results with the best results reported in previous literature. The converged accuracy results are displayed in Table 2. The proposed method outperforms all the other baselines. Conspicuously, in the scenario where the training size is extremely small (5%), our method greatly improves on traditional binary matrix completion by utilizing the group information.

5.3 Experiments on CDMC

5.3.1 Robustness Analysis

To the best of our knowledge, there is no comparable baseline for clustering problems in recommender systems research. Thus, to evaluate the convergence performance of CDMC, we conduct the first experiment on the Movielens dataset.

To start with, we split the data (95%, 5%) and initialize the group identities of users/items randomly. Then we train the CDMC model for a number of epochs until the clustering results tend to stabilize (200 epochs for the 95% Movielens dataset). The produced cluster identities of each instance are stored as the baseline. Afterwards, we re-conduct the process multiple times with completely random initialization. Namely, another (95%, 5%) split of the dataset is drawn for cross-validation, and all the cluster identities, as well as all latent variables, are determined arbitrarily. We use adjusted mutual information (AMI) [33] as the evaluation score to measure the degree of matching between the clustering results from the multiple cross-validation runs. As Figure 1(a) shows, both user and item clusters converge to highly similar distributions over the training epochs.
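For completeness, the agreement check can be sketched with scikit-learn's AMI implementation (labels from two independent restarts are assumed to be aligned index-wise):

```python
# Sketch of the convergence check: compare cluster labels from independent
# random restarts; AMI is 1 for identical partitions (up to relabeling) and
# close to 0 for unrelated ones.
from sklearn.metrics import adjusted_mutual_info_score

def clustering_agreement(labels_baseline, labels_rerun):
    return adjusted_mutual_info_score(labels_baseline, labels_rerun)
```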

Meanwhile, during each iteration of the optimization process, we construct $\hat{X}$ and make predictions on the test set. The recorded misclassification rate is shown in Figure 1(b). It indicates that the misclassification rate gradually stabilizes as the cluster developing process proceeds, and the resulting prediction accuracy is highly comparable with that of the GS1MC proposed in Section 3, even though in this case the cluster information is totally unknown.

(a) Cluster convergence.
(b) Misclassification rate.
Figure 1: (a) AMI fitting score of clusters via cross-validation based on fully random initialization. Converged clusters tend to match each other. (b) The misclassification rate decreases as training proceeds, and the prediction accuracy converges to a value comparable with GS1MC even though CDMC did not take any preliminary cluster information.

5.3.2 Clustering Outcome

For item clusters, we use three dimensions of the item-related latent variables $V$ as axes. The learned clusters are visualized in Figure 2(a). Similarly, the developed user clusters are plotted on the user-related latent variables $U$. As Figure 2 shows, the item clusters are more dispersed and differentiable while the user clusters gather in closer proximity.

(a) Item clusters.
(b) User clusters.
Figure 2: The clustering results of CDMC.

In order to validate the practical influence of CDMC, we project the actual profile features of each user/item onto the latent variables CDMC learned and discover some noteworthy findings.

Firstly, there are 19 item categories available, and each movie can be labeled with multiple genres. We extracted this information and constructed a binary genre matrix whose $(i, j)$ entry indicates that item $i$ can be classified in category $j$. As the items in the genre matrix share exactly the same indexes as those in the rating matrix, we applied the k-means clustering method to this genre information and visualized its results against the latent variables that CDMC learned.

As shown in Figure 3(b), it is compelling that the learned latent variables have a clearly discernible pattern with respect to the items' genre features. In other words, even though our proposed CDMC method did not take any genre information, it has captured the items' factual profiles based only on the sparse rating matrix. Besides, as CDMC conducts sparse subspace clustering and group-specific matrix completion in an iterative manner, along with gradually learning the hidden profiles, the model can integrate this information immediately into the matrix completion task, which in turn positively boosts the next iteration's clustering.

Similarly, we build a feature matrix of users based on their context profiles, including age, gender and occupation. The clustering result is projected onto the user latent variables and shown in Figure 3(c). As we expected, understanding human preference is a much more complicated task, and the clustering result is visibly more confused. But it is still noticeable in the plot that blue and purple nodes gather in the vertically higher part of the space while yellow and green ones are distributed below. As pointed out in previous literature [34], it is quite common that multiple individuals share a single account, which biases the accuracy of the profile information.

(a) CDMC: Items.
(b) Item Genres.
(c) User profile.
Figure 3: (a) Item clustering results of CDMC. (b) Items' genre information is reflected in the latent variables in a clearly differentiable pattern. (c) User clusters show much higher complexity, but it is still noticeable that blue and purple nodes aggregate in the vertically higher part of the space while yellow and green points are distributed below.

6 Conclusions and Future Works

In this paper, we introduced group-specific matrix factorization into the 1-bit matrix completion task and proposed GS1MC. Then we integrated, for the first time, sparse subspace clustering with the matrix completion task and proposed CDMC, extending the scope of GS1MC from passively receiving preliminary cluster information to actively developing clusters and leveraging their effects. Experiments show that GS1MC outperforms existing methods on both synthetic and real-world data, especially for the cold-start problem, and that CDMC successfully captures items' hidden genre features from a highly sparse binary rating matrix. It is noteworthy that GS1MC and CDMC provide new insight for evaluating the quality of clusters and for detecting undiscovered segments. For instance, when integrating implicit feedback clusters into GS1MC, the prediction accuracy was greatly boosted compared to previous methods. In terms of CDMC, our experiments show that movies' genres have a large impact on their popularity among certain audiences, while users' age, gender and occupation tend to have weaker effects on their preferences. For future work, it will be valuable to apply GS1MC and CDMC in more real-world applications and to discover possibly unrevealed social behaviors and market phenomena.

References

  • [1] Bell, R., Koren, Y., Volinsky, C.: Modeling relationships at multiple scales to improve accuracy of large recommender systems. Proceedings of the 13th ACM SIGKDD pp. 95–104 (2007)
  • [2] Bhaskar, S., Javanmard, A.: 1-bit matrix completion under exact low-rank constraint. arXiv:1502.06689 (2015)
  • [3] Bi, X., Qu, A., Wang, J., Shen, X.: A group-specific recommender system. Journal of the American Statistical Association pp. 1344–1353 (2017)
  • [4] Billsus, D., Pazzani, M.J.: Learning collaborative information filters. ICML pp. 46–54 (1998)
  • [5] Billsus, D., Pazzani, M.J.: User modeling for adaptive news access. UMUAI pp. 147–180 (2000)
  • [6] Breese, J.S., Heckerman, D., Kadie, C.: Empirical analysis of predictive algorithms for collaborative filtering. UAI pp. 43–52 (1998)
  • [7] Cai, T., Zhou, W.X.: A max-norm constrained minimization approach to 1-bit matrix completion. JMLR pp. 3619–3647 (2013)
  • [8] Cottet, V., Alquier, P.: 1-bit matrix completion: PAC-Bayesian analysis of a variational approximation. arXiv preprint arXiv:1604.04191 (2016)
  • [9] Davenport, M.A., Plan, Y., Van Den Berg, E., Wootters, M.: 1-bit matrix completion. Information and Inference: A Journal of the IMA pp. 189–223 (2014)
  • [10] Deshpande, M., Karypis, G.: Item-based top-n recommendation algorithms. ACM TOIS pp. 143–177 (2004)
  • [11] Devooght, R., Kourtellis, N., Mantrach, A.: Dynamic matrix factorization with priors on unknown values. Proceedings of the 21th ACM SIGKDD pp. 189–198 (2015)
  • [12] Elhamifar, E., Vidal, R.: Sparse subspace clustering: Algorithm, theory, and applications. IEEE TPAMI pp. 2765–2781 (2013)
  • [13] Funk, S.: Netflix update: Try this at home. https://sifter.org/simon/journal/20061211.html (2006)
  • [14] Gori, M., Pucci, A., Roma, V., Siena, I.: Itemrank: A random-walk based scoring algorithm for recommender engines. IJCAI pp. 2766–2771 (2007)
  • [15] Grčar, M., Fortuna, B., Mladenič, D., Grobelnik, M.: KNN versus SVM in the collaborative filtering framework. pp. 251–260. Springer (2006)

  • [16] Harper, F.M., Konstan, J.A.: The movielens datasets: History and context. ACM TIIS p. 19 (2016)
  • [17] Joaquin, D., Naohiro, I.: Memory-based weighted-majority prediction for recommender systems. ACM SIGIR (1999)
  • [18] Koren, Y.: Factorization meets the neighborhood: a multifaceted collaborative filtering model. Proceedings of the 14th ACM SIGKDD pp. 426–434 (2008)

  • [19] Koren, Y., Bell, R.: Advances in collaborative filtering. pp. 77–118. Springer (2015)
  • [20] Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer (2009)
  • [21] Kotler, P.: Marketing management: A south Asian perspective. Pearson Education India (2009)
  • [22] Linden, G., Smith, B., York, J.: Amazon. com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing pp. 76–80 (2003)
  • [23] Paterek, A.: Improving regularized singular value decomposition for collaborative filtering. Proceedings of KDD Cup and Workshop pp. 5–8 (2007)
  • [24] Pazzani, M., Billsus, D.: Learning and revising user profiles: The identification of interesting web sites. ML pp. 313–331 (1997)
  • [25] Pirotte, A., Renders, J.M., Saerens, M., et al.: Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE TKDE pp. 355–369 (2007)
  • [26] Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., Riedl, J.: Grouplens: an open architecture for collaborative filtering of netnews. Proceedings of the 1994 ACM CSCW pp. 175–186 (1994)
  • [27] Salakhutdinov, R., Mnih, A., Hinton, G.: Restricted Boltzmann machines for collaborative filtering. Proceedings of the 24th ICML pp. 791–798 (2007)

  • [28] Sarwar, B., Karypis, G., Konstan, J., Riedl, J.: Application of dimensionality reduction in recommender system-a case study. Tech. Rep. (2000)
  • [29] Sarwar, B.M., et al.: Item-based collaborative filtering recommendation algorithms. WWW pp. 285–295 (2001)
  • [30] Shoham, Y.: Combining content-based and collaborative recommendation. Communications of the ACM (1997)
  • [31] Takács, G., Pilászy, I., Németh, B., Tikk, D.: Investigation of various matrix factorization methods for large recommender systems pp. 553–562 (2008)
  • [32] Takács, G., Pilászy, I., Németh, B., Tikk, D.: Scalable collaborative filtering approaches for large recommender systems. JMLR pp. 623–656 (2009)
  • [33] Vinh, N.X., Epps, J., Bailey, J.: Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. JMLR pp. 2837–2854 (2010)
  • [34] Zhang, A., Fawaz, N., Ioannidis, S., Montanari, A.: Guess who rated this movie: Identifying users through subspace clustering. arXiv e-prints p. 1208.1544 (2012)
  • [35] Zhu, Y., Shen, X., Ye, C.: Personalized prediction and sparsity pursuit in latent factor models. Journal of the American Statistical Association pp. 241–252 (2016)
  • [36] Zitnick, C.L., Kanade, T.: Maximum entropy for collaborative filtering. Proceedings of the 20th Conference on UAI pp. 636–643 (2004)
