Introduction
With the development of online applications, recommender systems have been widely adopted by online services to help users find desirable items. However, accurately and efficiently matching items to their potential users remains challenging, particularly as the numbers of items and users keep growing [1].
In the past, Collaborative Filtering (CF), as exemplified by Matrix Factorization (MF) algorithms [11], has achieved great success in both academia and industry. MF factorizes a user-item rating matrix to project both users and items into an $r$-dimensional latent feature space, where a user's preference score for an item is predicted by the inner product of their latent features. However, for $n$ users and $m$ items, generating top-$k$ recommendations for all users takes $O(nmr + nm\log k)$ time. Therefore, MF-based methods are often computationally expensive when handling large-scale recommendation applications [3, 2].
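As a rough illustration of where this cost comes from, the following NumPy sketch scores every user-item pair with an inner product and then keeps the top-$k$ items per user. The function name `mf_topk` and the random factor matrices are hypothetical stand-ins for a trained MF model. Selection here uses `np.argpartition` rather than a heap, so the selection term differs slightly from the $nm\log k$ bound, but the dominant $O(nmr)$ scoring term is the same.

```python
import numpy as np

def mf_topk(user_factors, item_factors, k):
    """Top-k item indices per user, best first.

    user_factors: (n, r) latent user features; item_factors: (m, r) latent item features.
    The dense score matrix alone costs O(n * m * r) time and O(n * m) memory.
    """
    scores = user_factors @ item_factors.T             # (n, m) inner-product preferences
    # Partial selection: the k largest scores of each row land in the first k slots.
    top = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    rows = np.arange(scores.shape[0])[:, None]
    # Sort only the k selected candidates of each user by descending score.
    order = np.argsort(-scores[rows, top], axis=1)
    return top[rows, order]

# Toy model: n = 5 users, m = 12 items, r = 8 latent dimensions (random placeholders).
rng = np.random.default_rng(0)
users = rng.normal(size=(5, 8))
items = rng.normal(size=(12, 8))
top3 = mf_topk(users, items, 3)
```

Every user still needs a full pass over all $m$ items, which is exactly the bottleneck that hashing-based methods target.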
Recent studies show that hashing-based recommendation algorithms, which encode both users and items into binary codes in Hamming space, are promising for tackling the efficiency challenge [26, 29]. In these methods, the preference score can be computed efficiently via the Hamming distance. However, learning binary codes is generally NP-hard [9] due to the discrete constraints. To tackle this problem, researchers resort to a two-stage hash learning procedure [18, 29]: relaxed optimization followed by binary quantization. Continuous representations are first computed by the relaxed optimization, and the hash codes are then generated by binary quantization. This strategy indeed simplifies the optimization, but it inevitably suffers from significant quantization loss [27]. Hence, several solutions have been developed to directly optimize the binary hash codes of matrix factorization under discrete constraints. Despite this progress, they still suffer from two problems: 1) Their recommendation process mainly relies on user-item interactions and a single specific content feature. Under such circumstances, they cannot provide meaningful recommendations for new users (e.g., users who have no interaction history with the items). 2) They learn the hash codes with Discrete Coordinate Descent (DCD), which updates the codes bit by bit; this either results in significant quantization loss or consumes considerable computation time.
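To make the efficiency argument concrete: once $r$-bit codes are packed into machine integers, the Hamming distance is one XOR followed by a population count. For $\pm 1$ codes the inner product equals $r - 2\,d_H$, so ranking items by ascending Hamming distance is the same as ranking them by descending inner product. The pure-Python sketch below uses hypothetical helper names (`pack_codes`, `hamming`) and toy codes; production systems would rely on hardware popcount instructions rather than `bin().count`.

```python
def pack_codes(bits):
    """Pack a +1/-1 code into an int: bit i is set when bits[i] == +1."""
    value = 0
    for i, b in enumerate(bits):
        if b == 1:
            value |= 1 << i
    return value

def hamming(a, b):
    """Hamming distance of two packed codes: XOR, then count the set bits."""
    return bin(a ^ b).count("1")

def inner_from_hamming(dist, r):
    """For r-bit +/-1 codes b and d: b . d = r - 2 * hamming(b, d)."""
    return r - 2 * dist

user = [1, -1, 1, 1, -1, -1, 1, -1]
item_codes = {
    "item_a": [1, -1, 1, 1, -1, -1, 1, 1],    # differs from the user in 1 bit
    "item_b": [-1, 1, -1, -1, 1, 1, -1, 1],   # differs in all 8 bits
}
u = pack_codes(user)
ranked = sorted(item_codes, key=lambda name: hamming(u, pack_codes(item_codes[name])))
```

Here `ranked` lists `item_a` before `item_b`, and `inner_from_hamming(1, 8)` reproduces the inner product of `user` and `item_a`.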
In this paper, we propose a fast cold-start recommendation method, called Multi-Feature Discrete Collaborative Filtering (MFDCF), to alleviate these problems. Specifically, we propose a low-rank self-weighted multi-feature fusion module that adaptively preserves the multiple content features of users in compact yet informative hash codes by sufficiently exploiting their complementarity. Our method is inspired by the success of multiple feature fusion in other relevant areas [25, 24, 19, 31]. Further, we develop an efficient discrete optimization approach that directly solves for the binary hash codes with simple, efficient operations and without quantization errors. Finally, we evaluate the proposed method on two public recommendation datasets and demonstrate its superior performance over state-of-the-art competing baselines.
The main contributions of this paper are summarized as follows:
- We propose a Multi-Feature Discrete Collaborative Filtering (MFDCF) method to alleviate the cold-start recommendation problem. MFDCF directly and adaptively projects the multiple content features of users into binary hash codes by sufficiently exploiting their complementarity. To the best of our knowledge, no similar work exists.
- We develop an efficient discrete optimization strategy that learns the binary hash codes directly, without relaxation and quantization. This strategy avoids both the computational cost of the widely adopted discrete coordinate descent and the storage cost of the huge interaction matrix.
- We design a feature-adaptive hash code generation strategy that produces user hash codes capturing the dynamic variations of cold-start user features. Experiments on public recommendation datasets demonstrate the superior performance of the proposed method over state-of-the-art methods.
Related Work
In this paper, we investigate hashing-based collaborative filtering in the presence of multiple content features for fast cold-start recommendation. In this section, we therefore review recent hashing-based recommendation and cold-start recommendation methods.
A pioneering work [4] exploits Locality-Sensitive Hashing (LSH) [7] to generate hash codes for Google News readers based on the similarity of their item-sharing histories. Building on this, [10, 30] follow the idea of Iterative Quantization [8] to project real-valued latent representations into hash codes. To enhance the discriminative capability of hash codes, a de-correlation constraint [18] and a Constant Feature Norm (CFN) constraint [29] are imposed when learning user/item latent representations. These works basically follow a two-step learning strategy: relaxed optimization and binary quantization. As indicated by [22], this two-step approach suffers from significant quantization loss.
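The quantization loss of the two-step strategy can be seen in a few lines. If the relaxed solution $\mathbf{U}$ is binarized entrywise with sgn, the squared error $\|\mathrm{sgn}(\mathbf{U}) - \mathbf{U}\|_F^2$ equals $\sum_{ij}(1 - |U_{ij}|)^2$, which is zero only when every relaxed entry already has magnitude 1. In this NumPy sketch the relaxed matrix is random noise standing in for a real relaxed MF solution, and mapping zeros to $+1$ is an assumed tie-breaking convention.

```python
import numpy as np

def binarize(U):
    """Stage 2 (quantization): entrywise sign, with exact zeros mapped to +1."""
    return np.where(U >= 0, 1.0, -1.0)

def quantization_loss(U, B):
    """Squared Frobenius distance between the relaxed matrix U and its codes B."""
    return float(np.sum((B - U) ** 2))

rng = np.random.default_rng(1)
U = rng.normal(size=(16, 100))   # stand-in for a relaxed (stage 1) r x n solution
B = binarize(U)
loss = quantization_loss(U, B)
```

Nothing in stage 1 pushes the relaxed entries toward $\pm 1$, which is why the gap is generally large.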
Discrete Collaborative Filtering (DCF) [26] is the first binarized collaborative filtering method, and it demonstrates superior performance over the aforementioned two-stage recommendation methods. However, it is not applicable to cold-start recommendation scenarios. To address the cold-start problem, Discrete Deep Learning (DDL) [28] builds on DCF: it applies a Deep Belief Network (DBN) to extract item representations from item content information and combines the DBN with DCF. Discrete content-aware matrix factorization methods [14, 15] develop discrete optimization algorithms to learn binary codes for users and items in the presence of their respective content information. Discrete Factorization Machines (DFM) [17] learn hash codes for any side feature and model the pair-wise interactions between feature codes. However, since these binary cold-start recommendation frameworks solve for the hash codes with bit-by-bit discrete optimization, they still consume considerable computation time.
The Proposed Method
Throughout this paper, we use bold lowercase letters to represent vectors and bold uppercase letters to represent matrices. All vectors are column vectors, and non-bold letters represent scalars. We denote $\mathrm{Tr}(\cdot)$ as the trace of a matrix, $\|\cdot\|_F$ as the Frobenius norm of a matrix, and $\mathrm{sgn}(\cdot)$ as the round-off (sign) function, which maps each entry to $+1$ or $-1$.
Low-rank Self-weighted Multi-Feature Fusion
We are given a training dataset $\mathcal{O} = \{\mathbf{X}^{(v)}\}_{v=1}^{V}$ that contains the information of $n$ users, represented by $V$ different content features (e.g., demographic information such as age, gender, and occupation, and interaction preferences extracted from item side information). The $v$-th content feature is $\mathbf{X}^{(v)} \in \mathbb{R}^{d_v \times n}$, where $d_v$ is its dimensionality. Since the users' content features are diverse and heterogeneous, we aim to adaptively map them into a consensus multi-feature representation $\mathbf{H} \in \mathbb{R}^{r \times n}$ ($r$ is the hash code length) in a shared homogeneous space. In doing so, it is important to consider both the complementarity of the content features and the generalization ability of the fusion module. Motivated by these considerations, we introduce a self-weighted fusion strategy and formulate the multi-feature fusion part as:

$$\min_{\{\mathbf{W}^{(v)}\}_{v=1}^{V},\,\mathbf{H}} \ \sum_{v=1}^{V} \big\| \mathbf{H} - \mathbf{W}^{(v)}\mathbf{X}^{(v)} \big\|_F \tag{1}$$
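where $\mathbf{W}^{(v)} \in \mathbb{R}^{r \times d_v}$ is the mapping matrix of the $v$-th content feature and $\mathbf{H}$ is the consensus multi-feature representation. According to [20], Eq.(1) is equivalent to

$$\min_{\boldsymbol{\mu},\{\mathbf{W}^{(v)}\},\mathbf{H}} \ \sum_{v=1}^{V} \frac{1}{\mu^{(v)}} \big\| \mathbf{H} - \mathbf{W}^{(v)}\mathbf{X}^{(v)} \big\|_F^2 \quad \text{s.t.}\ \boldsymbol{\mu} \in \Delta \tag{2}$$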
where $\mu^{(v)}$ is the weight of the $v$-th content feature, which measures its importance, and $\Delta = \{\boldsymbol{\mu} \mid \mu^{(v)} \ge 0,\ \sum_{v=1}^{V}\mu^{(v)} = 1\}$ is the probabilistic simplex.
In real-world recommender systems such as Taobao (www.taobao.com) and Amazon (www.amazon.com), there are many different kinds of users and items with rich and diverse characteristics. However, a specific user interacts with only a small number of items in the system. Consequently, the side information of users and items is highly sparse, and we need to handle a very high-dimensional, sparse feature matrix. To avoid spurious correlations caused by the mapping matrix, we impose a low-rank constraint on $\mathbf{W}^{(v)}$:

$$\min_{\boldsymbol{\mu}\in\Delta,\{\mathbf{W}^{(v)}\},\mathbf{H}} \ \sum_{v=1}^{V} \frac{1}{\mu^{(v)}} \big\| \mathbf{H} - \mathbf{W}^{(v)}\mathbf{X}^{(v)} \big\|_F^2 + \gamma \sum_{v=1}^{V} \mathrm{rank}\big(\mathbf{W}^{(v)}\big) \tag{3}$$
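where $\gamma$ is a penalty parameter and $\mathrm{rank}(\cdot)$ is the rank operator of a matrix. The low-rank constraint on $\mathbf{W}^{(v)}$ helps highlight the latent features shared across different users and handles the extremely sparse observations. However, it also makes the optimization more difficult, since the rank operator is discrete and non-convex. To tackle this problem, we adopt an explicit form of the low-rank constraint: for a target rank $l$, we replace $\mathrm{rank}(\mathbf{W}^{(v)})$ with

$$\sum_{i=l+1}^{c} \sigma_i\big(\mathbf{W}^{(v)}\mathbf{W}^{(v)\top}\big) \tag{4}$$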
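where $c$ is the total number of singular values of $\mathbf{W}^{(v)}\mathbf{W}^{(v)\top}$ and $\sigma_i(\cdot)$ denotes the $i$-th largest singular value. Since $\mathbf{W}^{(v)}\mathbf{W}^{(v)\top}$ has the same rank as $\mathbf{W}^{(v)}$, this sum vanishes if and only if $\mathrm{rank}(\mathbf{W}^{(v)}) \le l$. Note that

$$\sum_{i=l+1}^{c} \sigma_i\big(\mathbf{W}^{(v)}\mathbf{W}^{(v)\top}\big) = \min_{\mathbf{Q}^{(v)\top}\mathbf{Q}^{(v)} = \mathbf{I}} \mathrm{Tr}\big(\mathbf{Q}^{(v)\top}\mathbf{W}^{(v)}\mathbf{W}^{(v)\top}\mathbf{Q}^{(v)}\big) \tag{5}$$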
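where $\mathbf{Q}^{(v)} \in \mathbb{R}^{r \times (c-l)}$ consists of the singular vectors corresponding to the $(c-l)$ smallest singular values of $\mathbf{W}^{(v)}\mathbf{W}^{(v)\top}$. Thus, the multi-feature fusion module can be rewritten as:

$$\min_{\boldsymbol{\mu}\in\Delta,\{\mathbf{W}^{(v)},\mathbf{Q}^{(v)}\},\mathbf{H}} \ \sum_{v=1}^{V} \frac{1}{\mu^{(v)}} \big\| \mathbf{H} - \mathbf{W}^{(v)}\mathbf{X}^{(v)} \big\|_F^2 + \gamma \sum_{v=1}^{V} \mathrm{Tr}\big(\mathbf{Q}^{(v)\top}\mathbf{W}^{(v)}\mathbf{W}^{(v)\top}\mathbf{Q}^{(v)}\big) \quad \text{s.t.}\ \mathbf{Q}^{(v)\top}\mathbf{Q}^{(v)} = \mathbf{I} \tag{6}$$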
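The identity in Eq.(5) and its link to the rank can be checked numerically. For a target rank $l$, with $c$ eigenvalues in total: the matrix $\mathbf{W}\mathbf{W}^\top$ is symmetric positive semi-definite, so its singular values coincide with its eigenvalues, and the eigenvectors of its $(c-l)$ smallest eigenvalues form an orthonormal $\mathbf{Q}$ that attains the minimum trace (Ky Fan's theorem). The penalty also vanishes for a matrix of rank at most $l$. The NumPy sketch below uses random matrices of arbitrary size and a hypothetical helper `low_rank_penalty`.

```python
import numpy as np

def low_rank_penalty(W, l):
    """Tail-eigenvalue penalty of W W^T and the minimizing orthonormal Q.

    Returns (sum of the c - l smallest eigenvalues, Tr(Q^T W W^T Q), Q),
    where c = W.shape[0] and Q holds the matching eigenvectors."""
    M = W @ W.T                                # symmetric PSD, c x c
    eigvals, eigvecs = np.linalg.eigh(M)       # eigenvalues in ascending order
    c = eigvals.size
    Q = eigvecs[:, : c - l]                    # eigenvectors of the c - l smallest
    return eigvals[: c - l].sum(), np.trace(Q.T @ M @ Q), Q

rng = np.random.default_rng(2)
r, d, l = 6, 10, 3
W_full = rng.normal(size=(r, d))                               # full rank r
W_low = rng.normal(size=(r, l)) @ rng.normal(size=(l, d))      # rank <= l by construction
tail_full, trace_full, Q_full = low_rank_penalty(W_full, l)
tail_low, _, _ = low_rank_penalty(W_low, l)

# Any other orthonormal r x (r - l) matrix gives a trace at least as large.
Q_rand, _ = np.linalg.qr(rng.normal(size=(r, r - l)))
trace_rand = np.trace(Q_rand.T @ W_full @ W_full.T @ Q_rand)
```

The full-rank matrix gets a strictly positive penalty, while the rank-$l$ matrix gets a penalty of zero up to rounding.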
Multi-Feature Discrete Collaborative Filtering
In this paper, we fuse multiple content features into binary hash codes with matrix factorization, which has proven accurate and scalable for collaborative filtering. Discrete collaborative filtering generally maps both users and items into a joint low-dimensional Hamming space, where the user-item preference is measured by the Hamming similarity between their binary hash codes.
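Let $\mathbf{S} \in \mathbb{R}^{n \times m}$ be the user-item rating matrix, where $n$ and $m$ are the numbers of users and items, respectively, and each entry $S_{ij}$ is the rating of user $i$ for item $j$. Let $\mathbf{b}_i \in \{\pm 1\}^{r}$ denote the binary hash code of the $i$-th user and $\mathbf{d}_j \in \{\pm 1\}^{r}$ the binary hash code of the $j$-th item. The rating of user $i$ for item $j$ is then approximated by the Hamming similarity $\mathbf{b}_i^{\top}\mathbf{d}_j$. The goal is thus to learn the user binary matrix $\mathbf{B} = [\mathbf{b}_1, \dots, \mathbf{b}_n] \in \{\pm 1\}^{r \times n}$ and the item binary matrix $\mathbf{D} = [\mathbf{d}_1, \dots, \mathbf{d}_m] \in \{\pm 1\}^{r \times m}$, where $r$ is the hash code length. Similar to conventional collaborative filtering, basic discrete collaborative filtering can be formulated as:

$$\min_{\mathbf{B},\mathbf{D}} \ \big\| \mathbf{S} - \mathbf{B}^{\top}\mathbf{D} \big\|_F^2 \quad \text{s.t.}\ \mathbf{B} \in \{\pm 1\}^{r \times n},\ \mathbf{D} \in \{\pm 1\}^{r \times m} \tag{7}$$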
To address the sparsity and cold-start problems, we integrate multiple content features into the above model by substituting the user binary matrix $\mathbf{B}$ with the rotated multi-feature representation $\mathbf{R}\mathbf{H}$ ($\mathbf{R}$ is a rotation matrix) and keeping the two consistent during optimization. The formulation is given as follows:
This formulation has three advantages. 1) Only one of the decomposed variables is subject to the discrete constraint. As shown in the optimization part, the hash codes can therefore be learned with a simple operation instead of the bit-by-bit discrete optimization used by existing discrete recommendation methods, while the second regularization term keeps the information loss acceptable. 2) The learned hash codes reflect users' multiple content features via $\mathbf{H}$ and simultaneously involve the latent interactive features in $\mathbf{S}$. 3) We extract users' interactive preferences from the side information of their rated items and use them as content features. This design avoids approximating the item binary matrix $\mathbf{D}$ and reduces the complexity of the proposed model, while still effectively capturing the content features of items.
Overall Objective Formulation
By integrating the above two parts into a unified learning framework, we derive the overall objective formulation of Multi-Feature Discrete Collaborative Filtering (MFDCF) as:
where the coefficients weighting the individual terms are balance parameters. The first term projects the multiple content features of users into a shared homogeneous space. The second and third terms minimize the information loss incurred when integrating the multiple content features with basic discrete CF. The last term is the low-rank constraint on $\mathbf{W}^{(v)}$, which highlights the latent features shared across different users.
Fast Discrete Optimization
Solving for the hash codes in Eq.(9) is essentially an NP-hard problem due to the discrete constraint on the binary feature matrix. Existing discrete recommendation methods typically learn the hash codes bit by bit with DCD [22]. Although this strategy alleviates the quantization loss caused by the conventional two-step relax-and-round optimization strategy, it is still time-consuming.
In this paper, the structure of our objective formulation allows us to learn the discrete hash codes directly with a fast optimization procedure. Specifically, unlike existing discrete recommendation methods [26, 28, 14, 17], we avoid explicit computation with the full user-item rating matrix $\mathbf{S}$, which yields computation and storage costs that are linear in the data size. We propose an effective optimization algorithm based on the augmented Lagrangian multiplier (ALM) method [16, 21]. In particular, we introduce an auxiliary variable to separate the constraint on , and transform the objective function Eq.(9) into an equivalent one that can be tackled more easily. Eq.(9) is then transformed into:
where  denotes the variables to be solved in the objective function,  measures the difference between the target and auxiliary variables, and  is a balance parameter. With this transformation, we adopt alternating optimization, updating each variable in turn while fixing the others.
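Step 1: learning $\boldsymbol{\mu}$. For convenience, we denote $\|\mathbf{H} - \mathbf{W}^{(v)}\mathbf{X}^{(v)}\|_F$ as $h^{(v)}$. Fixing the other variables and ignoring the terms irrelevant to $\boldsymbol{\mu}$, the original problem can be rewritten as:

$$\min_{\boldsymbol{\mu}} \ \sum_{v=1}^{V} \frac{\big(h^{(v)}\big)^2}{\mu^{(v)}} \quad \text{s.t.}\ \mu^{(v)} \ge 0,\ \sum_{v=1}^{V}\mu^{(v)} = 1 \tag{11}$$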
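With the Cauchy-Schwarz inequality, we derive that

$$\sum_{v=1}^{V} \frac{\big(h^{(v)}\big)^2}{\mu^{(v)}} \overset{(a)}{=} \left(\sum_{v=1}^{V} \frac{\big(h^{(v)}\big)^2}{\mu^{(v)}}\right)\left(\sum_{v=1}^{V}\mu^{(v)}\right) \overset{(b)}{\ge} \left(\sum_{v=1}^{V} h^{(v)}\right)^2$$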
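where (a) holds since $\sum_{v=1}^{V}\mu^{(v)} = 1$, and equality in (b) holds when $\sqrt{\mu^{(v)}} \propto h^{(v)}/\sqrt{\mu^{(v)}}$, i.e., $\mu^{(v)} \propto h^{(v)}$. Since $\sum_{v=1}^{V}\mu^{(v)} = 1$, we obtain the optimal $\boldsymbol{\mu}$ in Eq.(11) by

$$\mu^{(v)} = \frac{h^{(v)}}{\sum_{u=1}^{V} h^{(u)}} \tag{12}$$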
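The closed-form weight update is cheap to evaluate. The NumPy sketch below computes the weights from residual norms $h^{(v)}$ and then checks, on random points of the simplex, that none achieves a smaller value of the objective in Eq.(11) than $(\sum_v h^{(v)})^2$. The residual norms are made-up numbers. The sketch assumes every $h^{(v)} > 0$, since a zero residual would give a zero weight and a $0/0$ term.

```python
import numpy as np

def update_weights(h):
    """Closed-form self-weighting: mu_v = h_v / sum_u h_u (assumes all h_v > 0)."""
    h = np.asarray(h, dtype=float)
    return h / h.sum()

def weighted_objective(h, mu):
    """Objective sum_v h_v^2 / mu_v for a weight vector mu on the simplex."""
    return float(np.sum(np.asarray(h, dtype=float) ** 2 / mu))

h = np.array([3.0, 1.0, 2.0])           # illustrative residual norms for V = 3 features
mu_star = update_weights(h)
best = weighted_objective(h, mu_star)   # (3 + 1 + 2)^2 = 36, up to rounding

rng = np.random.default_rng(3)
samples = rng.dirichlet(np.ones(3), size=1000)   # random points on the simplex
gap = min(weighted_objective(h, m) for m in samples) - best
```

Features with larger residuals receive larger $\mu^{(v)}$ and hence smaller effective weights $1/\mu^{(v)}$ in the fusion term, which is how the scheme down-weights poorly fitting features.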
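Step 2: learning $\mathbf{W}^{(v)}$. Removing the terms irrelevant to $\mathbf{W}^{(v)}$, the optimization problem is rewritten as

$$\min_{\mathbf{W}^{(v)}} \ \frac{1}{\mu^{(v)}}\big\|\mathbf{H} - \mathbf{W}^{(v)}\mathbf{X}^{(v)}\big\|_F^2 + \gamma\,\mathrm{Tr}\big(\mathbf{Q}^{(v)\top}\mathbf{W}^{(v)}\mathbf{W}^{(v)\top}\mathbf{Q}^{(v)}\big) \tag{13}$$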
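We calculate the derivative of Eq.(13) with respect to $\mathbf{W}^{(v)}$, set it to zero, and multiply by $\mu^{(v)}/2$, which gives

$$\gamma\mu^{(v)}\mathbf{Q}^{(v)}\mathbf{Q}^{(v)\top}\mathbf{W}^{(v)} + \mathbf{W}^{(v)}\mathbf{X}^{(v)}\mathbf{X}^{(v)\top} = \mathbf{H}\mathbf{X}^{(v)\top} \tag{14}$$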
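By using the following substitutions,

$$\mathbf{A} = \gamma\mu^{(v)}\mathbf{Q}^{(v)}\mathbf{Q}^{(v)\top}, \quad \mathbf{G} = \mathbf{X}^{(v)}\mathbf{X}^{(v)\top}, \quad \mathbf{K} = \mathbf{H}\mathbf{X}^{(v)\top} \tag{15}$$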
Eq.(14) can be rewritten as $\mathbf{A}\mathbf{W}^{(v)} + \mathbf{W}^{(v)}\mathbf{G} = \mathbf{K}$, a Sylvester equation that can be solved efficiently with the sylvester function in MATLAB.
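For readers without MATLAB, the Sylvester equation $\mathbf{A}\mathbf{W} + \mathbf{W}\mathbf{G} = \mathbf{K}$ (with $\mathbf{A} = \gamma\mu^{(v)}\mathbf{Q}^{(v)}\mathbf{Q}^{(v)\top}$, $\mathbf{G} = \mathbf{X}^{(v)}\mathbf{X}^{(v)\top}$, $\mathbf{K} = \mathbf{H}\mathbf{X}^{(v)\top}$) can be solved in a few lines. The sketch below vectorizes it as $(\mathbf{I}\otimes\mathbf{A} + \mathbf{G}^\top\otimes\mathbf{I})\,\mathrm{vec}(\mathbf{W}) = \mathrm{vec}(\mathbf{K})$. This is exact but only practical for small $r$ and $d_v$; `scipy.linalg.solve_sylvester(a, b, q)` (Bartels-Stewart) is the scalable alternative. All data are random placeholders, and `solve_sylvester_kron` is a hypothetical helper. A unique solution exists here because $\mathbf{A}$ is positive semi-definite and $\mathbf{G}$ is positive definite, so $\mathbf{A}$ and $-\mathbf{G}$ share no eigenvalues.

```python
import numpy as np

def solve_sylvester_kron(A, G, K):
    """Solve A @ W + W @ G = K for W (A: r x r, G: d x d, K: r x d).

    Uses vec(A W) = (I_d kron A) vec(W) and vec(W G) = (G^T kron I_r) vec(W)
    with column-major vec; the dense (r*d) x (r*d) system limits this to small sizes."""
    r, d = K.shape
    M = np.kron(np.eye(d), A) + np.kron(G.T, np.eye(r))
    w = np.linalg.solve(M, K.reshape(-1, order="F"))
    return w.reshape((r, d), order="F")

# Random placeholders for one content feature (d x n) and the consensus H (r x n).
rng = np.random.default_rng(4)
r, d, n, l = 4, 6, 50, 2
gamma, mu = 0.5, 0.4
X = rng.normal(size=(d, n))
H = rng.normal(size=(r, n))
Q, _ = np.linalg.qr(rng.normal(size=(r, r - l)))   # orthonormal r x (r - l)

A = gamma * mu * Q @ Q.T     # substitutions of Eq.(15)
G = X @ X.T
K = H @ X.T
W = solve_sylvester_kron(A, G, K)
```

The returned `W` zeroes the gradient of Eq.(13), which is a convenient sanity check for any production solver.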
Step 3: learning . Similarly, the optimization formula for updating can be represented as
We introduce an auxiliary variable and substitute with , the Eq.(16) can be transformed into the following form
The optimal  is defined as , where  and  consist of the left-singular and right-singular vectors of , respectively [32].
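Updates of the form "left singular vectors times transposed right singular vectors", as in Steps 3 and 7, solve an orthogonal Procrustes problem. For $\min_{\mathbf{R}^\top\mathbf{R}=\mathbf{I}}\|\mathbf{Y}-\mathbf{R}\mathbf{Z}\|_F$, which is equivalent to maximizing $\mathrm{Tr}(\mathbf{R}^\top\mathbf{Y}\mathbf{Z}^\top)$, the optimum is $\mathbf{R}=\mathbf{U}\mathbf{V}^\top$, with $\mathbf{U}$ and $\mathbf{V}$ the singular vectors of $\mathbf{C}=\mathbf{Y}\mathbf{Z}^\top$. The names $\mathbf{Y}$, $\mathbf{Z}$ and $\mathbf{C}$ are generic placeholders, since the exact matrix decomposed in Eq.(17) is not reproduced here. The NumPy sketch recovers a known rotation from noiseless data.

```python
import numpy as np

def procrustes(C):
    """Orthogonal R maximizing Tr(R^T C): R = U V^T, where C = U diag(s) V^T."""
    U, _, Vt = np.linalg.svd(C)
    return U @ Vt

rng = np.random.default_rng(5)
r, n = 5, 40
Z = rng.normal(size=(r, n))
R_true, _ = np.linalg.qr(rng.normal(size=(r, r)))   # a random orthogonal matrix
Y = R_true @ Z                                      # noiseless rotated data
R = procrustes(Y @ Z.T)
```

Because the update needs only one $r \times r$ SVD, its cost is independent of the number of users once $\mathbf{C}$ has been formed.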
Note that the user-item rating matrix $\mathbf{S}$ is involved in the Step 3 update. Real-world retail platforms such as Taobao and Amazon have hundreds of millions of users and even more items. Consequently, the user-item rating matrix is enormous and sparse, and both computing with it directly and storing it in full are extremely expensive. In this paper, we apply singular value decomposition to obtain the left and right singular vectors of $\mathbf{S}$ together with the corresponding singular values. We use a diagonal matrix $\boldsymbol{\Sigma}_o \in \mathbb{R}^{o \times o}$ to store the $o$ largest singular values ($o \ll \min(n, m)$), and employ an $n \times o$ matrix $\mathbf{U}_o$ and an $m \times o$ matrix $\mathbf{V}_o$ to store the corresponding left and right singular vectors, respectively. We substitute $\mathbf{S}$ with $\mathbf{U}_o\boldsymbol{\Sigma}_o\mathbf{V}_o^{\top}$, so that the computational cost becomes linear in $n$ and $m$.
Thus, the calculation of can be transformed as
With Eq.(18), both the computational and storage costs are reduced while accuracy is maintained.
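The factored substitution can be sketched as follows: keep the $o$ largest singular triplets, $\mathbf{S}\approx\mathbf{U}_o\boldsymbol{\Sigma}_o\mathbf{V}_o^\top$, and apply the factors to other matrices right to left, so the $n\times m$ matrix is never formed again. For brevity this NumPy sketch computes a dense SVD of a small, exactly rank-5 toy matrix. At the scale described above, one would instead use a sparse truncated solver such as `scipy.sparse.linalg.svds(S, k=o)`. The helper names are hypothetical.

```python
import numpy as np

def truncated_svd(S, o):
    """Keep the o largest singular triplets: S ~ U_o @ diag(s_o) @ V_o.T.
    Storage drops from n*m numbers to (n + m + 1) * o."""
    U, s, Vt = np.linalg.svd(S, full_matrices=False)   # s is in descending order
    return U[:, :o], s[:o], Vt[:o].T                    # n x o, (o,), m x o

def apply_factored(U_o, s_o, V_o, Z):
    """Compute (U_o diag(s_o) V_o^T) @ Z in O((n + m) * o * p) for Z of shape m x p."""
    return U_o @ (s_o[:, None] * (V_o.T @ Z))

rng = np.random.default_rng(6)
n, m, o = 30, 40, 5
S = rng.normal(size=(n, o)) @ rng.normal(size=(o, m))   # exactly rank-o toy "ratings"
U_o, s_o, V_o = truncated_svd(S, o)
Z = rng.normal(size=(m, 3))
SZ = apply_factored(U_o, s_o, V_o, Z)
stored = U_o.size + s_o.size + V_o.size   # (30 + 40 + 1) * 5 = 355 versus 30 * 40 = 1200
```

For a real rating matrix, which is not exactly low rank, the factored product is an approximation whose quality depends on how much of the spectrum the $o$ kept components cover.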
Step 4: learning . We calculate the derivative of the objective function with respect to  and set it to zero, and obtain
where $\mathbf{S}$ is substituted with its truncated singular value decomposition, and then we have
The time complexity of computing is reduced to .
Step 5: learning . We calculate the derivatives of the objective function with respect to  and , respectively, set them to zero, and obtain the closed-form solutions of  as
where $\mathbf{S}$ is also substituted with its truncated singular value decomposition, and the update rule is transformed as
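Step 6: learning $\mathbf{Q}^{(v)}$. As described in Eq.(5), $\mathbf{Q}^{(v)}$ is stacked by the singular vectors corresponding to the $(c-l)$ smallest singular values of $\mathbf{W}^{(v)}\mathbf{W}^{(v)\top}$. Thus, we obtain $\mathbf{Q}^{(v)}$ by solving the eigen-decomposition problem

$$\min_{\mathbf{Q}^{(v)\top}\mathbf{Q}^{(v)} = \mathbf{I}} \ \mathrm{Tr}\big(\mathbf{Q}^{(v)\top}\mathbf{W}^{(v)}\mathbf{W}^{(v)\top}\mathbf{Q}^{(v)}\big) \tag{23}$$

whose solution consists of the eigenvectors of $\mathbf{W}^{(v)}\mathbf{W}^{(v)\top}$ associated with its $(c-l)$ smallest eigenvalues.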
Step 7: learning . The objective function with respect to can be represented as
where . The optimal is defined as , where and are comprised of left-singular and right-singular vectors of respectively.
Step 8: learning . By fixing other variables, the update rule of is
Feature-adaptive Hash Code Generation for Cold-start Users
In online recommendation, we aim to map the multiple content features of target users into binary hash codes with the hash projection learned offline. When cold-start users have no rating history in the training set and are associated only with initial demographic information, the fixed feature weights obtained from offline hash code learning cannot handle the missing features.
In this paper, building on the offline hash learning, we propose to generate hash codes for cold-start users with a self-weighting scheme. The objective function is formulated as
where  is the linear projection matrix from Eq.(9),  is the content feature of the target users, and  is the number of target users. As proved by [20], Eq.(26) is equivalent to
We employ alternating optimization to update and . The update rules are
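Because the cold-start objective and its update rules are not reproduced here, the following NumPy sketch only illustrates the general shape of such a self-weighting scheme under an explicit assumption. We assume the objective is $\min_{\mathbf{B}_t\in\{\pm1\}^{r\times n_t}}\sum_v\|\mathbf{B}_t-\mathbf{P}^{(v)}\mathbf{X}_t^{(v)}\|_F$, where the hypothetical $\mathbf{P}^{(v)}$ stands for whatever fixed projection is learned offline. Its self-weighted form alternates $\mathbf{B}_t=\mathrm{sgn}(\sum_v\mathbf{P}^{(v)}\mathbf{X}_t^{(v)}/\mu^{(v)})$ with $\mu^{(v)}\propto\|\mathbf{B}_t-\mathbf{P}^{(v)}\mathbf{X}_t^{(v)}\|_F$. A feature that is missing for the target users is simply left out of the lists, and the weights of the remaining features adapt. This sketches the idea; it is not a transcription of the paper's exact updates.

```python
import numpy as np

def sgn(x):
    """Entrywise sign with zeros mapped to +1 (assumed tie-breaking)."""
    return np.where(x >= 0, 1.0, -1.0)

def cold_start_codes(projections, features, n_iter=10, eps=1e-12):
    """Alternate B = sgn(sum_v Y_v / mu_v) and mu_v proportional to ||B - Y_v||_F.

    projections: list of fixed r x d_v matrices (hypothetical offline projections)
    features:    list of d_v x n_t feature matrices of the target users
    Only the features actually available for these users should be passed in."""
    Y = [P @ X for P, X in zip(projections, features)]   # r x n_t per feature
    mu = np.full(len(Y), 1.0 / len(Y))                    # start from uniform weights
    for _ in range(n_iter):
        B = sgn(sum(Yv / w for Yv, w in zip(Y, mu)))
        h = np.array([np.linalg.norm(B - Yv) for Yv in Y])   # Frobenius norms
        h = np.maximum(h, eps)                               # avoid zero weights
        mu = h / h.sum()
    return B, mu

# Toy cold-start batch: r = 8 bits, n_t = 4 users, two available features.
rng = np.random.default_rng(7)
projections = [rng.normal(size=(8, 5)), rng.normal(size=(8, 7))]
features = [rng.normal(size=(5, 4)), rng.normal(size=(7, 4))]
B_t, mu_t = cold_start_codes(projections, features)
```

With the weights fixed, the sign step is the exact minimizer over binary codes, because $\|\mathbf{B}_t\|_F^2 = r n_t$ is constant.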
Evaluation Datasets
We evaluate the proposed method on two public recommendation datasets: MovieLens-1M (https://grouplens.org/datasets/movielens/) and BookCrossing (https://grouplens.org/datasets/book-crossing/). In both datasets, each user rates a given item at most once.
MovieLens-1M: This dataset was collected from the MovieLens website by GroupLens Research. It originally includes 1,000,209 ratings from 6,040 users for 3,952 movies. Ratings are integers from 1 to 5. The users in this dataset are associated with demographic information (e.g., gender, age, and occupation), and each movie is associated with 3-5 labels from a dictionary of 18 genre labels.
BookCrossing: This dataset was collected by Cai-Nicolas Ziegler from the Book-Crossing community. It contains 278,858 users providing 1,149,780 ratings (including both implicit and explicit feedback) on 271,379 books. Explicit ratings are integers from 1 to 10, and implicit feedback is expressed as 0. Most users in this dataset are associated with demographic information (e.g., age and location).
Considering the extreme sparsity of the original BookCrossing dataset, we remove users with fewer than 20 ratings and items rated by fewer than 20 users. After filtering, 2,151 users, 6,830 items, and 180,595 ratings remain in the BookCrossing dataset. For the MovieLens-1M dataset, we keep all users and items without any filtering. The statistics of the datasets are summarized in Table 2. Bag-of-words encoding is used to extract the side information of items, and one-hot encoding is adopted to generate the feature representation of users' demographic information. To accelerate training, we follow [23] and apply PCA to reduce the dimension of the interactive preference feature to 128. In our experiments, we randomly select a subset of users as cold-start users and remove their ratings. We repeat the experiments with 5 random splits and report the average values.
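The preprocessing above can be sketched in plain Python. Two points are assumptions rather than details taken from the text. First, the filter is applied repeatedly until every remaining user and item meets its threshold, since a single pass may leave some counts below 20. Second, the fraction of cold-start users, `frac`, is a hypothetical parameter, because the exact proportion is not reproduced here. The toy triples use thresholds of 2 so that the effect is visible.

```python
import random
from collections import Counter

def filter_min_count(ratings, min_user=20, min_item=20):
    """Drop users with < min_user ratings and items with < min_item raters,
    repeating until both conditions hold. ratings: list of (user, item, score)."""
    while True:
        user_counts = Counter(u for u, _, _ in ratings)
        item_counts = Counter(i for _, i, _ in ratings)
        kept = [(u, i, s) for u, i, s in ratings
                if user_counts[u] >= min_user and item_counts[i] >= min_item]
        if len(kept) == len(ratings):   # nothing removed: counts are stable
            return kept
        ratings = kept

def split_cold_start(ratings, frac, seed=0):
    """Pick round(frac * #users) users as cold-start users; their ratings become test data."""
    users = sorted({u for u, _, _ in ratings})
    cold = set(random.Random(seed).sample(users, round(frac * len(users))))
    train = [t for t in ratings if t[0] not in cold]
    test = [t for t in ratings if t[0] in cold]
    return train, test, cold

toy = [("u1", "i1", 5), ("u1", "i2", 3), ("u2", "i1", 4),
       ("u2", "i2", 2), ("u3", "i1", 1), ("u3", "i3", 5)]
kept = filter_min_count(toy, min_user=2, min_item=2)   # u3 is dropped after i3 is
train, test, cold = split_cold_start(kept, frac=0.5)
```

In the toy data, a single pass would keep `("u3", "i1", 1)` even though `u3` is left with only one rating; the repeated filter removes it.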
Evaluation Metrics
The goal of our proposed method is to find the top-$k$ items that a user may be interested in. In our experiments, we adopt the Accuracy metric [28, 5] to evaluate whether the target user's favorite items appear in the top-$k$ recommendation list:
$$\mathrm{Accuracy@}k = \frac{\#hit@k}{\#test}$$

where $\#test$ is the number of test cases and $\#hit@k$ is the total number of hits in the test set.
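A minimal pure-Python sketch of Accuracy@$k$ follows. It assumes each test case is a (user, held-out relevant item) pair and that a ranked candidate list is available for every test user. The data structures and names are hypothetical, and how the candidate lists are built (e.g., which non-rated items are mixed in) is not specified here.

```python
def accuracy_at_k(test_cases, ranked_lists, k):
    """Accuracy@k = #hit@k / #test.

    test_cases:   list of (user, item) pairs, one per test case
    ranked_lists: dict mapping user -> list of item ids, best first"""
    hits = sum(1 for user, item in test_cases if item in ranked_lists[user][:k])
    return hits / len(test_cases)

test_cases = [("u1", "a"), ("u1", "b"), ("u2", "c")]
ranked_lists = {"u1": ["a", "x", "b"], "u2": ["y", "z", "c"]}
acc2 = accuracy_at_k(test_cases, ranked_lists, 2)   # only ("u1", "a") is a hit -> 1/3
acc3 = accuracy_at_k(test_cases, ranked_lists, 3)   # all three are hits -> 1.0
```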
In this paper, we compare our approach with two state-of-the-art continuous-value-based recommendation methods and two hashing-based binary recommendation methods.
- Discrete Factorization Machines (DFM) [17] is the first binarized factorization machine method; it learns hash codes for any side feature and models the pair-wise interactions between feature codes.
- Discrete Deep Learning (DDL) [28] is a binary deep recommendation approach. It adopts a Deep Belief Network (DBN) to extract item representations from item side information, and combines the DBN with DCF to solve the cold-start recommendation problem.
In our experiments, we use 5-fold cross-validation on random splits of the training data to tune the hyper-parameters of all compared approaches, and find the best hyper-parameters by grid search.
In this subsection, we evaluate the recommendation accuracy of MFDCF and the baselines in the cold-start recommendation scenario. Figures 1 and 2 show the Accuracy of the compared approaches on the two real-world recommendation datasets for the cold-start recommendation task. Compared with existing hashing-based recommendation approaches, the proposed MFDCF consistently achieves better performance. DFM exploits factorization machines to model the potential relevance between user characteristics and product features, but it ignores the collaborative interactions. DDL is based on discrete collaborative filtering and adopts a DBN to generate item feature representations from their side information. Nevertheless, the DBN structure is independent of the overall optimization process, which limits the learning capability of DDL. Additionally, the experimental results show that MFDCF outperforms the compared continuous-value-based hybrid recommendation methods under the same cold-start settings. The advantage of MFDCF over CBFKNN and ZSR validates the effectiveness of the proposed multi-feature fusion strategy.
Parameter sensitivity and convergence analysis. We conduct experiments to observe how the performance varies with the involved balance parameters. We fix the hash code length to 128 bits and report results on MovieLens-1M; similar results are observed on the other dataset and for other hash code lengths. Since the three balance parameters appear in the same objective function, we vary each of them over a range of values while fixing the others. Detailed results are presented in Figure 5, which shows that the performance is relatively better within specific ranges of each parameter. The performance variation with the low-rank parameter shows that the low-rank constraint effectively highlights the latent features shared across different users. Figure 4(a) shows the convergence curve of the MFDCF objective function versus the number of iterations, indicating that the proposed method converges quickly.
Efficiency vs. hash code length and data size. We investigate how the efficiency of MFDCF varies with the hash code length and the training data size on the two datasets. The average time cost per training iteration is shown in Figure 4(b-c). With the hash code length fixed to 32, each training iteration takes several seconds, and its cost scales linearly with the data size. When running MFDCF on 100% of the training data, the cost of each iteration scales quadratically with the code length, because the time complexity of the optimization process is quadratic in $r$.
Run time comparison. In this experiment, we compare the computational efficiency of our approach with two state-of-the-art hashing-based recommendation methods, DFM and DDL. Table 2 reports the training time of these methods on MovieLens-1M using a 3.4GHz Intel Core(TM) i7-6700 CPU. MFDCF is about 50 and 3 times faster than DDL and DFM, respectively. This speedup arises because both DDL and DFM iteratively learn the hash codes bit by bit with discrete coordinate descent. In addition, DDL must iteratively update the parameters of the DBN, which consumes more time.
In this paper, we design a unified multi-feature discrete collaborative filtering method that projects the multiple content features of users into binary hash codes to support fast cold-start recommendation. Our model has four advantages: 1) it handles the data sparsity problem with a low-rank constraint; 2) it enhances the discriminative capability of hash codes with multi-feature binary embedding; 3) it generates feature-adaptive hash codes for varied cold-start users; and 4) it achieves computation- and storage-efficient discrete binary optimization. Experiments on two public recommendation datasets demonstrate the state-of-the-art performance of the proposed method.
The authors would like to thank the anonymous reviewers for their constructive and helpful suggestions. The work is partially supported by the National Natural Science Foundation of China (61802236, 61902223, U1836216), in part by the Natural Science Foundation of Shandong, China (No. ZR2019QF002), in part by the Youth Innovation Project of Shandong Universities, China (No. 2019KJN040), and in part by Taishan Scholar Project of Shandong, China.
References
- [1] (2019) A review on deep learning for recommender systems: challenges and remedies. Artif. Intell. Rev. 52 (1), pp. 1–37.
- [2] (2019) MMALFM: explainable recommendation by leveraging reviews and images. TOIS 37 (2), pp. 1–28.
- [3] A^3NCF: an adaptive aspect attention model for rating prediction. In IJCAI, pp. 3748–3754.
- [4] (2007) Google news personalization: scalable online collaborative filtering. In WWW, pp. 271–280.
- [5] (2018) Personalized video recommendation using rich contents from videos. TKDE.
- [6] (2010) Learning attribute-to-feature mappings for cold-start recommendations. In ICDM, pp. 176–185.
- [7] (1999) Similarity search in high dimensions via hashing. In VLDB, pp. 518–529.
- [8] Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval. TPAMI 35 (12), pp. 2916–2929.
- [9] (2001) Some optimal inapproximability results. J. ACM 48 (4), pp. 798–859.
- [10] (2010) Collaborative filtering on a budget. In AISTATS, pp. 389–396.
- [11] (2009) Matrix factorization techniques for recommender systems. IEEE Computer 42 (8), pp. 30–37.
- [12] (2019) Leveraging the invariant side of generative zero-shot learning. In CVPR, pp. 7402–7411.
- [13] (2019) From zero-shot learning to cold-start recommendation. In AAAI, pp. 4189–4196.
- [14] (2017) Discrete content-aware matrix factorization. In KDD, pp. 325–334.
- [15] (2019) Discrete matrix factorization and extension for fast item recommendation. TKDE, DOI: 10.1109/TKDE.2019.2951386.
- [16] (2010) The augmented lagrange multiplier method for exact recovery of corrupted low-rank matrices. CoRR abs/1009.5055.
- [17] (2018) Discrete factorization machines for fast feature-based recommendation. In IJCAI, pp. 3449–3455.
- [18] (2014) Collaborative hashing. In CVPR, pp. 2147–2154.
- [19] (2019) Flexible online multi-modal hashing for large-scale multimedia retrieval. In MM, pp. 1129–1137.
- [20] (2019) Online multi-modal hashing with dynamic query-adaption. In SIGIR, pp. 715–724.
- [21] (2007) Nonlinear programming theory and algorithms. Technometrics 49 (1), pp. 105.
- [22] (2015) Supervised discrete hashing. In CVPR, pp. 37–45.
- [23] (2017) Learning on big graph: label inference and regularization with anchor hierarchy. TKDE 29 (5), pp. 1101–1114.
- [24] (2012) Multimodal graph-based reranking for web image search. TIP 21 (11), pp. 4649–4661.
- [25] (2018) First-person daily activity recognition with manipulated object proposals and non-linear feature fusion. TCSVT 28 (10), pp. 2946–2955.
- [26] (2016) Discrete collaborative filtering. In SIGIR, pp. 325–334.
- [27] (2017) Discrete personalized ranking for fast collaborative filtering from implicit feedback. In AAAI, pp. 1669–1675.
- [28] (2018) Discrete deep learning for fast content-aware recommendation. In WSDM, pp. 717–726.
- [29] (2014) Preference preserving hashing for efficient recommendation. In SIGIR, pp. 183–192.
- [30] (2012) Learning binary codes for collaborative filtering. In KDD, pp. 498–506.
- [31] (2017) Discrete multi-modal hashing with canonical views for robust mobile landmark search. TMM 19 (9), pp. 2066–2079.
- [32] (2017) Unsupervised visual hashing with semantic assistant for content-based image retrieval. TKDE 29 (2), pp. 472–486.