Introduction
With the development of online applications, recommender systems have been widely adopted by many online services to help their users find desirable items. However, it remains challenging to accurately and efficiently match items to their potential users, particularly with the ever-growing scales of items and users [1].
In the past, Collaborative Filtering (CF), as exemplified by Matrix Factorization (MF) algorithms [11], has demonstrated great success in both academia and industry. MF factorizes a user-item rating matrix to project both users and items into a low-dimensional latent feature space, where a user's preference scores for items are predicted by the inner product between their latent features. However, generating top-k item recommendations for all users is computationally expensive under this scheme [27]. Therefore, MF-based methods are often inefficient when handling large-scale recommendation applications [3, 2].
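As a toy illustration of MF-style scoring (hypothetical random factors, not a trained model), every user-item preference is an inner product, so exhaustive top-k retrieval must score all user-item pairs:

```python
import numpy as np

# Toy MF scoring with hypothetical random latent factors (for illustration
# only; a real system would learn P and Q from the rating matrix).
rng = np.random.default_rng(0)
P = rng.normal(size=(4, 8))   # 4 users in an 8-dim latent space
Q = rng.normal(size=(5, 8))   # 5 items in the same space

# Preference scores are inner products between user and item factors.
scores = P @ Q.T              # shape (4, 5)

# Exhaustive top-k retrieval scores every user-item pair, which is the
# cost that makes MF-based recommendation expensive at scale.
k = 2
topk = np.argsort(-scores, axis=1)[:, :k]
```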
Recent studies show that hashing-based recommendation algorithms, which encode both users and items into binary codes in Hamming space, are promising for tackling the efficiency challenge [26, 29]. In these methods, the preference score can be efficiently computed via Hamming distance. However, learning binary codes is generally NP-hard [9] due to the discrete constraints. To tackle this problem, researchers resort to a two-stage hash learning procedure [18, 29]: relaxed optimization followed by binary quantization. Continuous representations are first computed by relaxed optimization, and the hash codes are subsequently generated by binary quantization. This learning strategy indeed simplifies the optimization, but it inevitably suffers from significant quantization loss [26]. Hence, several solutions have been developed to directly optimize the binary hash codes in matrix factorization with discrete constraints. Although much progress has been achieved, these methods still suffer from two problems: 1) Their recommendation process relies mainly on user-item interactions and a single specific content feature. Under such circumstances, they cannot provide meaningful recommendations for new users (e.g., users who have no interaction history with the items). 2) They learn the hash codes with Discrete Coordinate Descent (DCD), which optimizes the hash codes bit-by-bit and thus either incurs significant quantization loss or consumes considerable computation time.
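The two-stage procedure and Hamming-distance scoring can be sketched as follows (random values stand in for the relaxed solutions; this illustrates the general scheme, not any specific method):

```python
import numpy as np

rng = np.random.default_rng(1)
# Stage 1 (relaxed optimization): continuous user/item representations,
# simulated here with random values purely for illustration.
user_real = rng.normal(size=(4, 16))
item_real = rng.normal(size=(6, 16))

# Stage 2 (binary quantization): thresholding by sign -- the step that
# introduces the quantization loss discussed above.
user_codes = np.sign(user_real).astype(int)   # entries in {-1, +1}
item_codes = np.sign(item_real).astype(int)

# For {-1,+1}^r codes, hamming(b, d) = (r - b.d) / 2, so ranking by
# Hamming distance is equivalent to ranking by inner product.
r = user_codes.shape[1]
hamming = (r - user_codes @ item_codes.T) // 2
```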
In this paper, we propose a fast cold-start recommendation method, called Multi-Feature Discrete Collaborative Filtering (MFDCF), to alleviate these problems. Specifically, we propose a low-rank self-weighted multi-feature fusion module that adaptively preserves users' multiple content features in compact yet informative hash codes by sufficiently exploiting their complementarity. Our method is inspired by the success of multiple-feature fusion in other relevant areas [25, 24, 19, 31]. Further, we develop an efficient discrete optimization approach that directly solves for the binary hash codes via simple, efficient operations without quantization errors. Finally, we evaluate the proposed method on two public recommendation datasets and demonstrate its superior performance over state-of-the-art competing baselines.
The main contributions of this paper are summarized as follows:

We propose a Multi-Feature Discrete Collaborative Filtering (MFDCF) method to alleviate the cold-start recommendation problem. MFDCF directly and adaptively projects multiple content features of users into binary hash codes by sufficiently exploiting their complementarity. To the best of our knowledge, no similar work exists.

We develop an efficient discrete optimization strategy to directly learn the binary hash codes without relaxed quantization. This strategy avoids both the performance penalty of the widely adopted discrete coordinate descent and the storage cost of the huge interaction matrix.

We design a feature-adaptive hash code generation strategy that produces user hash codes accurately capturing the dynamic variations of cold-start user features. Experiments on public recommendation datasets demonstrate the superior performance of the proposed method over state-of-the-art approaches.
Related Work
In this paper, we investigate hashing-based collaborative filtering in the presence of multiple content features for fast cold-start recommendation. Hence, in this section, we mainly review recent advances in hashing-based recommendation and cold-start recommendation.
A pioneering work [4] exploits Locality-Sensitive Hashing (LSH) [7] to generate hash codes for Google News readers based on their item-sharing history similarity. Building on this, [10, 30] followed the idea of Iterative Quantization [8] to project real-valued latent representations into hash codes. To enhance the discriminative capability of hash codes, a decorrelation constraint [18] and a Constant Feature Norm (CFN) constraint [29] are imposed when learning user/item latent representations. The above works basically follow a two-step learning strategy: relaxed optimization followed by binary quantization. As indicated by [26], this two-step approach suffers from significant quantization loss.
To alleviate the quantization loss, direct binary code learning via discrete optimization was proposed [22]. In the recommendation area, Discrete Collaborative Filtering (DCF) [26] is the first binarized collaborative filtering method and demonstrates superior performance over the aforementioned two-stage recommendation methods. However, it is not applicable to cold-start recommendation scenarios. To address the cold-start problem, on the basis of DCF, Discrete Deep Learning (DDL) [28] applies a Deep Belief Network (DBN) to extract item representations from item content information and combines the DBN with DCF. Discrete content-aware matrix factorization methods [14, 15] develop discrete optimization algorithms to learn binary codes for users and items in the presence of their respective content information. Discrete Factorization Machines (DFM) [17] learns hash codes for any side feature and models the pairwise interactions between feature codes. Besides, since the above binary cold-start recommendation frameworks solve the hash codes with bit-by-bit discrete optimization, they still consume considerable computation time.
The Proposed Method
Notations.
Throughout this paper, we use bold lowercase letters to represent vectors and bold uppercase letters to represent matrices. All vectors in this paper denote column vectors, and non-bold letters represent scalars. We denote tr(·) as the trace of a matrix, ||·||_F as the Frobenius norm of a matrix, and round(·) as the round-off function.
Low-rank Self-weighted Multi-Feature Fusion
We are given a training dataset that describes each user with multiple content features (e.g., demographic information such as age, gender, and occupation, and interaction preferences extracted from item side information). The m-th content feature is denoted X^(m), with dimensionality d_m. Since the users' multiple content features are quite diverse and heterogeneous, in this paper we aim to adaptively map them into a consensus multi-feature representation H, of dimension equal to the hash code length, in a shared homogeneous space. Specifically, it is important to consider the complementarity of the multiple content features and the generalization ability of the fusion module. Motivated by these considerations, we introduce a self-weighted fusion strategy and formulate the multi-feature fusion part as:
(1) 
where ||·||_F is the Frobenius norm of a matrix, W^(m) is the mapping matrix of the m-th content feature, and H is the consensus multi-feature representation. According to [20], Eq.(1) is equivalent to
(2) 
where μ_m is the weight of the m-th content feature and measures the importance of that feature, and the weights lie on the probabilistic simplex (non-negative and summing to one).
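A common closed form for such self-weights is sketched below, assuming the equivalence used in [20] (each weight inversely proportional to its feature's fitting residual); the exact update behind Eq.(2) may differ, and all shapes here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical fused representation H (r x n) and two content features
# X_m (d_m x n) with mapping matrices W_m (r x d_m).
H = rng.normal(size=(8, 20))
Xs = [rng.normal(size=(5, 20)), rng.normal(size=(12, 20))]
Ws = [rng.normal(size=(8, 5)), rng.normal(size=(8, 12))]

# Self-weighting: a feature fitted with a smaller residual receives a
# larger weight, so the fusion adapts to feature quality automatically.
residuals = [np.linalg.norm(H - W @ X, "fro") for W, X in zip(Ws, Xs)]
mu = np.array([1.0 / (2.0 * res) for res in residuals])
```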
In real-world recommender systems, such as Taobao (www.taobao.com) and Amazon (www.amazon.com), there are many different kinds of users and items with rich and diverse characteristics. However, a specific user has only a small number of interactions with limited items in the system. Consequently, the side information of users and items is quite sparse, and we need to handle a very high-dimensional and sparse feature matrix. To avoid spurious correlations caused by the mapping matrix, we impose a low-rank constraint on it:
(3) 
where λ is a penalty parameter and rank(·) is the rank operator of a matrix. The low-rank constraint on the mapping matrix helps highlight the latent shared features across different users and handles the extremely sparse observations. Meanwhile, it also makes the optimization more difficult. To tackle this problem, we adopt an explicit form of the low-rank constraint as follows:
(4) 
where c is the total number of singular values of the mapping matrix and σ_i represents its i-th singular value. Note that
(5)
where the auxiliary matrix consists of the singular vectors corresponding to the smallest singular values of the mapping matrix. Thus, the multi-feature fusion module can be rewritten as:
(6) 
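The explicit surrogate can be checked numerically: the trace form over the right-singular vectors of the smallest singular values recovers the sum of those (here squared) singular values. A sketch with a hypothetical mapping matrix; the paper's surrogate may use the unsquared form:

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.normal(size=(8, 12))   # hypothetical mapping matrix

# Right-singular vectors V_k for the k smallest singular values of W.
k = 3
_, sigma, Vt = np.linalg.svd(W, full_matrices=False)
Vk = Vt[-k:].T                 # numpy returns singular values descending

# trace(V_k^T W^T W V_k) equals the sum of the k smallest sigma_i^2,
# giving an explicit handle on the low-rank penalty.
surrogate = np.trace(Vk.T @ (W.T @ W) @ Vk)
direct = np.sum(np.sort(sigma)[:k] ** 2)
```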
Multi-Feature Discrete Collaborative Filtering
In this paper, we fuse multiple content features into binary hash codes with matrix factorization, which has proved accurate and scalable in addressing collaborative filtering problems. Discrete collaborative filtering generally maps both users and items into a joint low-dimensional Hamming space, where the user-item preference is measured by the Hamming similarity between the binary hash codes.
We are given a user-item rating matrix with n users and v items, where each entry indicates the rating of a user i for an item j. Let b_i denote the binary hash code of the i-th user and d_j denote the binary hash code of the j-th item; the rating of user i for item j is then approximated by the Hamming similarity between b_i and d_j. Thus, the goal is to learn a user binary matrix B and an item binary matrix D, where r is the hash code length. Similar to conventional collaborative filtering, the basic discrete collaborative filtering problem can be formulated as:
(7) 
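What makes the Hamming-space formulation fast in practice is that packed binary codes reduce scoring to XOR plus popcount. A small sketch with hypothetical codes:

```python
import numpy as np

rng = np.random.default_rng(5)
r = 64
# Hypothetical binary codes in {0,1}^r for 3 users and 4 items.
B = rng.integers(0, 2, size=(3, r), dtype=np.uint8)
D = rng.integers(0, 2, size=(4, r), dtype=np.uint8)

# Pack each code into r/8 bytes; Hamming distance is then XOR + popcount.
Bp = np.packbits(B, axis=1)
Dp = np.packbits(D, axis=1)
popcount = np.unpackbits(Bp[:, None, :] ^ Dp[None, :, :], axis=2).sum(axis=2)

# Sanity check against the naive element-wise comparison.
naive = (B[:, None, :] != D[None, :, :]).sum(axis=2)
```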
To address the sparsity and cold-start problems, we integrate multiple content features into the above model by substituting the user binary feature matrix with the rotated multi-feature representation (via an orthogonal rotation matrix) and keeping them consistent during the optimization process. The formula is given as follows:
(8) 
This formulation has three advantages: 1) Only one of the decomposed variables is subject to the discrete constraint. As shown in the optimization part, the hash codes can be learned with a simple operation instead of the bit-by-bit discrete optimization used by existing discrete recommendation methods, while the second regularization term keeps the information loss acceptable. 2) The learned hash codes reflect users' multiple content features and simultaneously involve the latent interactive features. 3) We extract users' interactive preferences from the side information of their rated items as content features. This design not only avoids approximating the item binary matrix and reduces the complexity of the proposed model, but also effectively captures the content features of items.
Overall Objective Formulation
By integrating the above two parts into a unified learning framework, we derive the overall objective formulation of Multi-Feature Discrete Collaborative Filtering (MFDCF) as:
(9) 
where the coefficients are balance parameters. The first term projects multiple content features of users into a shared homogeneous space. The second and third terms minimize the information loss when integrating the multiple content features with the basic discrete CF. The last term is the low-rank constraint on the mapping matrix, which highlights the latent shared features across different users.
Fast Discrete Optimization
Solving the hash codes in Eq.(9) is essentially an NP-hard problem due to the discrete constraint on the binary feature matrix. Existing discrete recommendation methods typically learn the hash codes bit-by-bit with DCD [22]. Although this strategy alleviates the quantization loss caused by the conventional two-step relaxing-rounding optimization strategy, it is still time-consuming.
In this paper, supported by our objective formulation, we propose to directly learn the discrete hash codes with fast optimization. Specifically, different from existing discrete recommendation methods [26, 28, 14, 17], we avoid explicitly computing the user-item rating matrix and achieve linear computation and storage efficiency. We propose an effective optimization algorithm based on the Augmented Lagrangian Multiplier (ALM) method [16, 21]. In particular, we introduce an auxiliary variable to separate the discrete constraint and transform the objective function Eq.(9) into an equivalent form that can be tackled more easily. Eq.(9) is then transformed as:
(10) 
Here, the transformed objective involves the variables to be solved, a term measuring the difference between the target and auxiliary variables, and a balance parameter controlling this penalty. With this transformation, we follow an alternating optimization process, updating each variable in turn while keeping the others fixed.
Step 1: learning the feature weights. For convenience, we denote the fitting residual of each content feature by a shorthand term. By fixing the other variables and ignoring the terms irrelevant to the weights, the original problem can be rewritten as:
(11) 
With the Cauchy-Schwarz inequality, we derive an upper bound on this objective; the equality condition of the bound directly yields a closed-form solution. Accordingly, we can obtain the optimal weights in Eq.(11) by
(12) 
Step 2. Removing the terms that are irrelevant to the variable being updated, the optimization formula is rewritten as
(13) 
We calculate the derivative of Eq.(13) with respect to this variable and set it to zero, obtaining
(14) 
By using the following substitutions,
(15)
Eq.(14) can be rewritten as a standard Sylvester equation, which can be efficiently solved with the sylvester function in Matlab.
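Outside Matlab, the same kind of Sylvester equation AX + XB = C can be solved with SciPy. The coefficient matrices below are hypothetical stand-ins for those defined by the substitutions in Eq.(15):

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(6)
# Spectra of A and -B are kept disjoint so a unique solution exists.
A = rng.normal(size=(4, 4)) + 4.0 * np.eye(4)
B = rng.normal(size=(3, 3)) + 4.0 * np.eye(3)
C = rng.normal(size=(4, 3))

# Solves A @ X + X @ B = C via the Bartels-Stewart algorithm.
X = solve_sylvester(A, B, C)
```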
Step 3. Similarly, the optimization formula for this update can be represented as
(16) 
We introduce an auxiliary variable for substitution, and Eq.(16) can be transformed into the following form
(17) 
The optimal solution is given by the product of the left-singular and right-singular vector matrices of the corresponding term [32].
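This is the standard orthogonal Procrustes solution: for any matrix M with SVD M = P Σ Q^T, the orthogonal matrix maximizing tr(R^T M) is R = P Q^T. A quick numerical sketch, with a hypothetical M standing in for the term in Eq.(17):

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.normal(size=(6, 6))    # hypothetical stand-in matrix

# SVD-based Procrustes solution: R = P @ Q^T.
P, s, Qt = np.linalg.svd(M)
R = P @ Qt

# R is orthogonal and attains tr(R^T M) = sum of singular values of M.
attained = np.trace(R.T @ M)
```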
Note that the user-item rating matrix is involved in the term used for this update. In real-world retail giants, such as Taobao and Amazon, there are hundreds of millions of users and even more items. Consequently, the user-item rating matrix is enormous and sparse, and computing with it directly is extremely expensive in both calculation and storage. In this paper, we apply singular value decomposition to the rating matrix to obtain its left-singular and right-singular vectors as well as the corresponding singular values. We utilize a diagonal matrix to store the o largest singular values (with o far smaller than the matrix dimensions), and employ two thin matrices to store the corresponding left-singular and right-singular vectors, respectively. Substituting this low-rank approximation for the rating matrix reduces the computational complexity substantially. Thus, the calculation can be transformed as
(18) 
With Eq.(18), both the computation and storage costs are reduced without sacrificing accuracy.
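A sketch of the truncated-SVD substitution, with a random sparse matrix standing in for the real rating matrix and hypothetical sizes:

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Simulated sparse rating matrix: 200 users x 300 items, ~1% observed.
S = sparse_random(200, 300, density=0.01, format="csr", random_state=8)

# Keep only the o largest singular triplets; downstream updates then use
# the thin factors U (n x o), s (o,), Vt (o x v) instead of S itself.
o = 10
U, s, Vt = svds(S, k=o)
```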
Step 4. We calculate the derivative of the objective function with respect to the variable being updated and set it to zero; then we get
(19) 
where the rating matrix is substituted with its low-rank approximation, and then we have
(20) 
The time complexity of this computation is reduced accordingly.
Step 5. We calculate the derivatives of the objective function with respect to the relevant variables, set them to zero, and obtain the closed-form solutions as
(21) 
where the rating matrix is again substituted with its low-rank approximation, and the update rule is transformed as
(22) 
Step 6. As described in Eq.(5), the auxiliary matrix is stacked with the singular vectors corresponding to the smallest singular values of the mapping matrix. Thus, we can solve an eigendecomposition problem to obtain it:
(23) 
Step 7. The objective function for this subproblem can be represented as
(24) 
where the shorthand term is defined accordingly. The optimal solution is given by the product of the left-singular and right-singular vector matrices of this term.
Step 8. By fixing the other variables, the update rule is
(25) 
Feature-adaptive Hash Code Generation for Cold-start Users
During online recommendation, we aim to map multiple content features of the target users into binary hash codes with the learned hash projection matrices. Since cold-start users have no rating history in the training set and are only associated with initial demographic information, the fixed feature weights obtained from offline hash code learning cannot address this feature-missing problem.
In this paper, with the support of offline hash learning, we propose to generate hash codes for cold-start users with a self-weighting scheme. The objective function is formulated as
(26) 
where the linear projection matrices are those learned from Eq.(9) and the inputs are the content features of the target users. As proved in [20], Eq.(26) is equivalent to
(27) 
We employ alternating optimization to update the hash codes and the feature weights. The update rules are
(28) 
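A sketch of the online alternating scheme (the update forms below are assumptions modeled on the offline self-weighting: the code is the sign of the weighted fused projection, and each weight shrinks with its feature's residual; the learned projections and user features are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(9)
# Learned projections for two content features (hypothetical shapes).
Ws = [rng.normal(size=(16, 5)), rng.normal(size=(16, 9))]
# A cold-start user described only by demographic-style features.
xs = [rng.normal(size=5), rng.normal(size=9)]

# Alternating updates: binary code from the weighted fusion, then
# weights from the per-feature residuals, as in the offline scheme.
mu = np.ones(len(Ws)) / len(Ws)
b = np.sign(sum(m * W @ x for m, W, x in zip(mu, Ws, xs)))
for _ in range(10):
    residuals = [np.linalg.norm(b - W @ x) for W, x in zip(Ws, xs)]
    mu = np.array([1.0 / (2.0 * r + 1e-12) for r in residuals])
    mu = mu / mu.sum()
    b = np.sign(sum(m * W @ x for m, W, x in zip(mu, Ws, xs)))
```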
Experiments
Evaluation Datasets
We evaluate the proposed method on two public recommendation datasets: MovieLens-1M (https://grouplens.org/datasets/movielens/) and Book-Crossing (https://grouplens.org/datasets/bookcrossing/). In these two datasets, each user has at most one rating for any item.

MovieLens-1M: This dataset was collected from the MovieLens website by GroupLens Research. It originally includes 1,000,209 ratings from 6,040 users for 3,952 movies. Rating scores are integers from 1 to 5. The users in this dataset are associated with demographic information (e.g., gender, age, and occupation), and each movie is tagged with 3-5 labels from a dictionary of 18 genre labels.

Book-Crossing: This dataset was collected by Cai-Nicolas Ziegler from the Book-Crossing community. It contains 278,858 users providing 1,149,780 ratings (both implicit and explicit feedback) on 271,379 books. Rating scores are integers from 1 to 10 for explicit feedback, or 0 for implicit feedback. Most users in this dataset are associated with demographic information (e.g., age and location).
Considering the extreme sparsity of the original Book-Crossing dataset, we remove the users with fewer than 20 ratings and the items rated by fewer than 20 users. After this filtering, 2,151 users, 6,830 items, and 180,595 ratings remain in the Book-Crossing dataset. For the MovieLens-1M dataset, we keep all users and items without any filtering. The statistics of the datasets are summarized in Table 1. The bag-of-words encoding method is used to extract the side information of items, and one-hot encoding is adopted to generate feature representations of users' demographic information. To accelerate the running speed, we follow [23] and perform PCA to reduce the interactive preference feature dimension to 128. In our experiments, we randomly select a subset of users as cold-start users and remove their ratings. We repeat the experiments with 5 random splits and report the average values as the experimental results.
Evaluation Metrics
The goal of our proposed method is to find the top-k items that a user may be interested in. In our experiments, we adopt the evaluation metric Accuracy@k [28, 5] to test whether the target user's favorite items appear in the top-k recommendation list. Given the value of k, similar to [28, 5], we calculate the Accuracy@k value as:
(29) 
where the Accuracy@k value is the total number of hits in the test set divided by the number of test cases.
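A minimal sketch of the Accuracy@k computation (hypothetical toy ranked lists; each test case holds out one favorite item):

```python
import numpy as np

def accuracy_at_k(ranked_lists, held_out, k):
    """Fraction of test cases whose held-out item appears in the top-k list."""
    hits = sum(1 for ranked, item in zip(ranked_lists, held_out)
               if item in ranked[:k])
    return hits / len(held_out)

# Toy example: 3 test users, one held-out favorite item each.
ranked = [[5, 2, 9, 1], [7, 3, 0, 4], [8, 6, 2, 5]]
truth = [2, 4, 1]
acc = accuracy_at_k(ranked, truth, k=2)   # only user 0's item is in top-2
```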
Dataset  #User  #Item  #Rating  Sparsity
MovieLens-1M  6,040  3,952  1,000,209  95.81%
Book-Crossing  2,151  6,830  180,595  98.77%
Evaluation Baselines
In this paper, we compare our approach with two state-of-the-art continuous-value-based recommendation methods and two hashing-based binary recommendation methods.

Discrete Factorization Machines (DFM) [17] is the first binarized factorization machine method. It learns hash codes for any side feature and models the pairwise interactions between feature codes.

Discrete Deep Learning (DDL) [28] is a binary deep recommendation approach. It adopts a Deep Belief Network (DBN) to extract item representations from item side information and combines the DBN with DCF to solve the cold-start recommendation problem.
In the experiments, we adopt 5-fold cross-validation on a random split of the training data to tune the optimal hyper-parameters of all compared approaches. All the best hyper-parameters are found by grid search.
Accuracy Comparison
In this subsection, we evaluate the recommendation accuracy of MFDCF and the baselines in the cold-start recommendation scenario. Figures 1 and 2 show the Accuracy of the compared approaches on two real-world recommendation datasets for the cold-start recommendation task. The proposed MFDCF consistently outperforms the existing hashing-based recommendation approaches. DFM exploits the factorization machine to model the potential relevance between user characteristics and product features, but it ignores the collaborative interactions. DDL is based on discrete collaborative filtering and adopts a DBN to generate item feature representations from item side information; however, the DBN is trained independently of the overall optimization process, which limits the learning capability of DDL. Additionally, the experimental results show that the proposed MFDCF outperforms the compared continuous-value-based hybrid recommendation methods under the same cold-start settings. The superior performance of MFDCF over CBF-KNN and ZSR validates the effectiveness of the proposed multiple-feature fusion strategy.
Model Analysis
Method/#Bits  8  16  32  64  128 

DDL  3148.86  3206.09  3289.81  3372.19  3855.81 
DFM  166.55  170.86  172.75  196.45  246.5 
Ours  58.87  60.49  62.55  68.41  90.17 
Parameter and convergence sensitivity analysis. We conduct experiments to observe the performance variations with respect to the involved balance parameters. We fix the hash code length to 128 bits and report results on MovieLens-1M; similar results are found on other datasets and hash code lengths. Since the three balance parameters appear in the same objective function, we vary each of them over a wide range while fixing the others. Detailed experimental results are presented in Figure 5. From it, we observe that the performance is relatively better when each parameter lies within a moderate range. The performance variations with the low-rank penalty parameter show that the low-rank constraint is effective in highlighting the latent shared features across different users. The convergence curves recording the objective function value of MFDCF against the number of iterations are shown in Figure 4(a). This experimental result indicates that our proposed method converges very fast.
Efficiency vs. hash code length and data size. We conduct experiments to investigate how the efficiency of MFDCF varies with the hash code length and the training data size on the two datasets. The average time cost per training iteration is shown in Figure 4(b-c). When the hash code length is fixed at 32, each training iteration costs several seconds and scales linearly with the data size. When running MFDCF on 100% of the training data, each iteration scales quadratically with the code length, since the time complexity of the optimization process is quadratic in the number of bits.
Run time comparison. In this experiment, we compare the computational efficiency of our approach with two state-of-the-art hashing-based recommendation methods, DFM and DDL. Table 2 reports the training time of these methods on MovieLens-1M using a 3.4GHz Intel Core(TM) i7-6700 CPU. Compared with DDL and DFM, our MFDCF is about 50 and 3 times faster, respectively. The superior efficiency of the proposed method is attributed to the fact that both DDL and DFM iteratively learn the hash codes bit-by-bit with discrete coordinate descent. Additionally, DDL needs to update the parameters of the DBN iteratively, which consumes more time.
Conclusion
In this paper, we design a unified multi-feature discrete collaborative filtering method that projects multiple content features of users into binary hash codes to support fast cold-start recommendation. Our model has four advantages: 1) it handles the data sparsity problem with a low-rank constraint; 2) it enhances the discriminative capability of hash codes with multi-feature binary embedding; 3) it generates feature-adaptive hash codes for varied cold-start users; 4) it achieves computation- and storage-efficient discrete binary optimization. Experiments on two public recommendation datasets demonstrate the state-of-the-art performance of the proposed method.
Acknowledgements
The authors would like to thank the anonymous reviewers for their constructive and helpful suggestions. The work is partially supported by the National Natural Science Foundation of China (61802236, 61902223, U1836216), in part by the Natural Science Foundation of Shandong, China (No. ZR2019QF002), in part by the Youth Innovation Project of Shandong Universities, China (No. 2019KJN040), and in part by Taishan Scholar Project of Shandong, China.
References
[1] (2019) A review on deep learning for recommender systems: challenges and remedies. Artificial Intelligence Review 52(1), pp. 1-37.
[2] (2019) MMALFM: explainable recommendation by leveraging reviews and images. TOIS 37(2), pp. 1-28.
[3] (2018) A^3NCF: an adaptive aspect attention model for rating prediction. In IJCAI, pp. 3748-3754.
[4] (2007) Google news personalization: scalable online collaborative filtering. In WWW, pp. 271-280.
[5] (2018) Personalized video recommendation using rich contents from videos. TKDE.
[6] (2010) Learning attribute-to-feature mappings for cold-start recommendations. In ICDM, pp. 176-185.
[7] (1999) Similarity search in high dimensions via hashing. In VLDB, pp. 518-529.
[8] (2013) Iterative quantization: a Procrustean approach to learning binary codes for large-scale image retrieval. TPAMI 35(12), pp. 2916-2929.
[9] (2001) Some optimal inapproximability results. J. ACM 48(4), pp. 798-859.
[10] (2010) Collaborative filtering on a budget. In AISTATS, pp. 389-396.
[11] (2009) Matrix factorization techniques for recommender systems. IEEE Computer 42(8), pp. 30-37.
[12] (2019) Leveraging the invariant side of generative zero-shot learning. In CVPR, pp. 7402-7411.
[13] (2019) From zero-shot learning to cold-start recommendation. In AAAI, pp. 4189-4196.
[14] (2017) Discrete content-aware matrix factorization. In KDD, pp. 325-334.
[15] (2019) Discrete matrix factorization and extension for fast item recommendation. TKDE, DOI: 10.1109/TKDE.2019.2951386.
[16] (2010) The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. CoRR abs/1009.5055.
[17] (2018) Discrete factorization machines for fast feature-based recommendation. In IJCAI, pp. 3449-3455.
[18] (2014) Collaborative hashing. In CVPR, pp. 2147-2154.
[19] (2019) Flexible online multi-modal hashing for large-scale multimedia retrieval. In MM, pp. 1129-1137.
[20] (2019) Online multi-modal hashing with dynamic query-adaption. In SIGIR, pp. 715-724.
[21] (2007) Nonlinear programming: theory and algorithms. Technometrics 49(1), pp. 105.
[22] (2015) Supervised discrete hashing. In CVPR, pp. 37-45.
[23] (2017) Learning on big graph: label inference and regularization with anchor hierarchy. TKDE 29(5), pp. 1101-1114.
[24] (2012) Multimodal graph-based reranking for web image search. TIP 21(11), pp. 4649-4661.
[25] (2018) First-person daily activity recognition with manipulated object proposals and non-linear feature fusion. TCSVT 28(10), pp. 2946-2955.
[26] (2016) Discrete collaborative filtering. In SIGIR, pp. 325-334.
[27] (2017) Discrete personalized ranking for fast collaborative filtering from implicit feedback. In AAAI, pp. 1669-1675.
[28] (2018) Discrete deep learning for fast content-aware recommendation. In WSDM, pp. 717-726.
[29] (2014) Preference preserving hashing for efficient recommendation. In SIGIR, pp. 183-192.
[30] (2012) Learning binary codes for collaborative filtering. In KDD, pp. 498-506.
[31] (2017) Discrete multi-modal hashing with canonical views for robust mobile landmark search. TMM 19(9), pp. 2066-2079.
[32] (2017) Unsupervised visual hashing with semantic assistant for content-based image retrieval. TKDE 29(2), pp. 472-486.