1 Introduction
Nowadays, recommender systems are widely used in people’s daily life [Liu et al.2011, Lian et al.2016, Li et al.2018a], but the growing scale of users and products makes recommendation challenging. Because implicit feedback is more common and easier to collect than explicit feedback, we concentrate on accelerating top-K recommendation based on implicit feedback. However, there are two challenges to address. First, compared with explicit feedback, implicit feedback is more difficult to utilize because of the lack of negative samples [Pan et al.2008]. Second, generating the top-K preferred items for each user is extremely time-consuming.
For the first problem, SpectralCF [Zheng et al.2018] recently combined a collaborative filtering model with a graph convolutional network [Henaff et al.2015] to mine hidden interactions between users and items in the spectral domain, and showed enormous potential for the implicit feedback problem. However, SpectralCF ignores high-order feature interactions.
For the second problem, extracting the top-K preferred items for every user takes $O(mnr + mn\log K)$ time when there are $m$ users, $n$ items and $r$ dimensions in the latent space, which is a critical efficiency bottleneck. Moreover, recommendation models and recommendation lists must be updated in a timely manner because user interests evolve frequently. Fortunately, hashing, which encodes real-valued vectors/matrices into binary codes (e.g., vectors in $\{\pm 1\}^{r}$), is promising for this challenge, because inner products between binary codes can be computed efficiently via bit operations. With indexing techniques, approximate top-K items can even be found in sublinear or logarithmic time [Wang et al.2012, Muja and Lowe2009].

Several methods have applied hashing to recommendation. Two-stage approximation methods such as BCCF [Zhou and Zha2012], PPH [Zhang et al.2014] and CH [Liu et al.2014] incur large quantization loss [Zhang et al.2016], while the direct optimization model DCF [Zhang et al.2016] easily falls into local optima because it relies on local search. To improve the accuracy of hashing-based recommender systems for implicit feedback, we propose a binarized collaborative filtering framework with a distilling graph convolutional network. In the framework, we first train a CF-based GCN model (GCNCF) that captures high-order feature interactions via a cross operation. We then distill the ranking information of the trained GCNCF model into a binarized model (DGCNBinCF) using knowledge distillation (KD) [Hinton et al.2015]. More specifically, we introduce a novel distillation loss that penalizes not only the discrepancy between the distributions of positive items defined by GCNCF and by BinCF, but also the discrepancy between the distributions of sampled negative items. Since learning hash codes is generally NP-hard [Håstad2001], approximation methods are appropriate, but they may lose information during training. To this end, inspired by [Dai et al.2016], we transform the binary optimization problem into an equivalent continuous optimization problem by imposing a stochastic penalty term. Therefore, any gradient-based solver can optimize the overall loss, which combines the ranking-based loss with the knowledge distillation loss.
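The claim that inner products between binary codes reduce to bit operations can be illustrated with a small sketch. Assuming codes in $\{\pm 1\}^{r}$ are packed into integers (bit set for $+1$), matching bits contribute $+1$ and differing bits $-1$, so $\mathbf{b}^\top\mathbf{d} = r - 2\,\mathrm{popcount}(b \oplus d)$. The helper names are illustrative, and the exhaustive top-K scan deliberately omits the index structures cited above.

```python
import heapq
from typing import List, Sequence


def pack_code(code: Sequence[int]) -> int:
    """Pack a {-1, +1} code into an int: bit k is 1 iff code[k] == +1."""
    word = 0
    for k, v in enumerate(code):
        if v == 1:
            word |= 1 << k
    return word


def binary_inner_product(bu: int, di: int, r: int) -> int:
    """Inner product of two packed r-bit {-1, +1} codes.

    Matching bits add +1 and differing bits add -1, so the result is
    r - 2 * HammingDistance, where the Hamming distance is popcount(XOR).
    """
    return r - 2 * bin(bu ^ di).count("1")


def top_k_items(user_code: int, item_codes: List[int], r: int, k: int) -> List[int]:
    """Exhaustive top-K by binary inner product (no index structure)."""
    scores = ((binary_inner_product(user_code, d, r), idx) for idx, d in enumerate(item_codes))
    return [idx for _, idx in heapq.nlargest(k, scores)]
```

For example, with user code $[1,-1,1,1]$ and item codes $[1,-1,1,1]$, $[-1,1,-1,-1]$, $[1,1,1,1]$, the scores are $4$, $-4$ and $2$, so the top-2 items are the first and the third.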
Our contributions are summarized as follows:

We propose a novel framework, DGCNBinCF, to distill the ranking information of the proposed GCNCF model into the binary model. To the best of our knowledge, DGCNBinCF is the first model that uses knowledge distillation to improve the performance of a binarized model. We also improve GCN by adding a cross operation that aggregates users’ and items’ own high-order features.

We propose a generic method to relax the binary-constrained problem into an equivalent bound-constrained continuous optimization problem. Hence, the original problem can be optimized directly by popular gradient-based solvers.

Through extensive experiments on three real-world datasets, we show the superiority of the proposed framework over state-of-the-art baselines.
2 Related Work
In this section, we review work related to our task, including GCNs for recommender systems, recent hashing-based collaborative filtering methods, and knowledge distillation techniques for ranking.
2.1 GCN for Recommender Systems
Exploiting the rich linkage information in the user-item bipartite graph is crucial for implicit feedback. Several works use GCNs for this purpose, such as SpectralCF [Zheng et al.2018], GCMC [Berg et al.2017], RMGCNN [Monti et al.2017], GCNWSRS [Ying et al.2018] and LGCN [Gao et al.2018]. (1) SpectralCF was the first model to learn directly from the spectral domain of the user-item bipartite graph for collaborative filtering. Because it can discover deep connections between users and items, it may alleviate the cold-start problem. (2) GCMC combined a GCN model with a graph autoencoder to learn the latent factors of users and items. (3) RMGCNN proposed a matrix completion architecture that combines a multi-graph convolutional neural network with a recurrent neural network. (4) GCNWSRS focused on applying GCN models effectively to web-scale recommendation tasks with billions of items and hundreds of millions of users. (5) LGCN proposed a subgraph training strategy that greatly reduces memory and computational requirements, and its experiments showed that it was more efficient than prior approaches.
2.2 Discrete Hashing for Collaborative Filtering
A pioneering work exploited Locality-Sensitive Hashing [Datar et al.2004] to generate binary codes for Google News readers based on their click history [Das et al.2007]. Then [Karatzoglou et al.2010] proposed mapping the latent factors of users and items into Hamming space to obtain binary representations. Following this, two-stage methods [Zhou and Zha2012, Zhang et al.2014] first relaxed the binary constraints and then quantized the continuous solutions into binary codes. Nonetheless, [Zhang et al.2016] showed that such two-stage methods suffer from large quantization loss, and therefore proposed DCF to optimize binary codes directly. However, DCF optimizes binary codes by searching neighborhoods within Hamming distance one, so it easily falls into local optima.
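To illustrate why relax-then-quantize can be lossy, the following sketch (a toy example with made-up vectors, not data from the paper) quantizes relaxed real-valued embeddings with the sign function, measures the quantization loss $\|\mathrm{sign}(\mathbf{x}) - \mathbf{x}\|^2$, and shows that the item ranking can flip after quantization.

```python
from typing import List


def sign(v: List[float]) -> List[float]:
    # Quantize a relaxed solution to {-1, +1}; zeros are mapped to +1.
    return [1.0 if x >= 0 else -1.0 for x in v]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def quantization_loss(v: List[float]) -> float:
    # Squared Euclidean distance between the relaxed vector and its binary code.
    return sum((b - x) ** 2 for b, x in zip(sign(v), v))


# Toy relaxed embeddings from a hypothetical first (relaxation) stage.
p_user = [0.9, 0.1]
q_items = {"item_a": [0.8, -0.05], "item_b": [0.1, 0.9]}

real_rank = sorted(q_items, key=lambda i: dot(p_user, q_items[i]), reverse=True)
binary_rank = sorted(q_items, key=lambda i: dot(sign(p_user), sign(q_items[i])), reverse=True)
print(real_rank)    # ['item_a', 'item_b']
print(binary_rank)  # ['item_b', 'item_a']
```

The second coordinate of `item_a` is tiny, yet after quantization it counts as fully negative, which is the kind of information loss that direct binary optimization tries to avoid.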
2.3 Distilling Knowledge for Ranking
[Hinton et al.2015] proposed knowledge distillation, which trains a complex neural network and then transfers its knowledge to a small model. The complex model plays the role of a teacher and the small model that of a student. Following this, DarkRank [Chen et al.2018] combined deep metric learning and learning-to-rank techniques with KD to solve pedestrian re-identification, image retrieval and image clustering tasks. In addition, [Tang and Wang2018] applied KD with pointwise ranking to recommendation. Unfortunately, it focused neither on the implicit feedback problem nor on how to transfer unobserved interaction information.
3 Definitions and Preliminaries
Throughout the paper, we denote vectors by boldfaced lowercase letters and matrices by boldfaced uppercase letters. All vectors are column vectors. We use the following definitions:
Definition 1
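(Bipartite Graph) A user-item bipartite graph with $N$ vertices and $E$ edges is defined as $\mathcal{G}=\{\mathcal{U},\mathcal{I},\mathcal{E}\}$, where $\mathcal{U}$ and $\mathcal{I}$ are two disjoint vertex sets of users and items, with $|\mathcal{U}|=m$ and $|\mathcal{I}|=n$ (so $N=m+n$). Each edge $e\in\mathcal{E}$ has the form $e=(u,i)$, where $u\in\mathcal{U}$ and $i\in\mathcal{I}$, which indicates that user $u$ interacted with item $i$ in the training set.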
Definition 2
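(Laplacian Matrix) Given a bipartite graph $\mathcal{G}$ with $N$ vertices and $E$ edges, the Laplacian matrix $\mathbf{L}\in\mathbb{R}^{N\times N}$ is defined as $\mathbf{L}=\mathbf{I}-\mathbf{D}^{-1}\mathbf{A}$, where $\mathbf{A}\in\mathbb{R}^{N\times N}$ is the adjacency matrix and $\mathbf{D}$ is the diagonal degree matrix defined as $D_{kk}=\sum_{k'}A_{kk'}$.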
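A minimal sketch of this definition, assuming the random-walk form $\mathbf{L}=\mathbf{I}-\mathbf{D}^{-1}\mathbf{A}$ and numbering users before items (both are assumptions for illustration). Each user-item edge produces two symmetric entries of $\mathbf{A}$, and every row of $\mathbf{L}$ for a non-isolated node sums to zero.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def bipartite_adjacency(m: int, n: int, edges: List[Tuple[int, int]]) -> Matrix:
    """(m+n) x (m+n) adjacency: users are nodes 0..m-1, items are m..m+n-1."""
    size = m + n
    A = [[0.0] * size for _ in range(size)]
    for u, i in edges:
        A[u][m + i] = 1.0  # user -> item
        A[m + i][u] = 1.0  # item -> user (the graph is undirected)
    return A


def random_walk_laplacian(A: Matrix) -> Matrix:
    """L = I - D^{-1} A with D_kk = sum_k' A_kk'; isolated nodes keep L_kk = 1."""
    L = []
    for k, row_a in enumerate(A):
        deg = sum(row_a)
        row = [-(a / deg) if deg > 0 else 0.0 for a in row_a]
        row[k] += 1.0  # A has a zero diagonal, so this is the identity term
        L.append(row)
    return L
```

For two users and two items with edges $(0,0)$, $(0,1)$, $(1,1)$, the first user's row of $\mathbf{L}$ is $[1, 0, -0.5, -0.5]$.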
Our work focuses on recommendation based on implicit feedback, where we only observe whether a user has viewed or clicked an item. We denote by $\mathcal{I}_u^{+}$ the set of all items clicked by user $u$ and by $\mathcal{I}_u^{-}=\mathcal{I}\setminus\mathcal{I}_u^{+}$ the set of remaining items.
3.1 Binary Collaborative Filtering
Matrix factorization maps users and items onto a joint $r$-dimensional latent space, where the user embedding matrix is $\mathbf{P}\in\mathbb{R}^{m\times r}$ and the item embedding matrix is $\mathbf{Q}\in\mathbb{R}^{n\times r}$. In contrast, binary collaborative filtering (BinCF) maps users and items onto a joint $r$-dimensional Hamming space. Denoting $\phi(\mathbf{p}_u)$ and $\phi(\mathbf{q}_i)$ as the binary codes of user $u$ and item $i$, respectively, the BinCF problem for implicit feedback is formulated as follows:
(1)
where $\phi(\cdot)$ is a hash function: $\mathbb{R}^{r}\rightarrow\{\pm 1\}^{r}$.
3.2 BinaryContinuous Equivalent Transformation
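Let us first consider the following generic binary program:
$$\min_{\mathbf{x}}\ f(\mathbf{x})\quad\text{s.t.}\quad\mathbf{x}\in\mathcal{W}=\{-1,1\}^{d},\tag{2}$$
and the transformed problem
$$\min_{\mathbf{x}}\ f(\mathbf{x})+\frac{1}{\epsilon}\,\varphi(\mathbf{x})\quad\text{s.t.}\quad\mathbf{x}\in\mathcal{X}=[-1,1]^{d},\tag{3}$$
where $\varphi:\mathbb{R}^{d}\rightarrow\mathbb{R}$ is a penalty term for $\mathcal{W}$ and $\epsilon>0$ is its penalty coefficient. [Giannessi and Tardella1998, Lucidi and Rinaldi2010] show that the two problems are equivalent when certain conditions hold.
Lemma 1
Let $\|\cdot\|$ be a chosen norm. Suppose the following conditions hold:

(i) $f$ is bounded on $\mathcal{X}$. In addition, there exist an open set $\mathcal{A}\supset\mathcal{W}$ and a real positive number $L_f$ such that, for all $\mathbf{x},\mathbf{y}\in\mathcal{A}$,
$$|f(\mathbf{x})-f(\mathbf{y})|\le L_f\,\|\mathbf{x}-\mathbf{y}\|.\tag{4}$$

(ii) $\varphi$ satisfies:

(a) $\varphi$ is continuous on $\mathcal{X}$;

(b) $\varphi(\mathbf{x})=0$ for all $\mathbf{x}\in\mathcal{W}$ and $\varphi(\mathbf{x})>0$ for all $\mathbf{x}\in\mathcal{X}\setminus\mathcal{W}$;

(c) for every $\mathbf{y}\in\mathcal{W}$, there exist a neighborhood $S(\mathbf{y})$ of $\mathbf{y}$ and a real positive number $\bar{\epsilon}(\mathbf{y})$ such that
$$\varphi(\mathbf{x})\ge\bar{\epsilon}(\mathbf{y})\,\|\mathbf{x}-\mathbf{y}\|\quad\forall\,\mathbf{x}\in S(\mathbf{y})\cap(\mathcal{X}\setminus\mathcal{W}).\tag{5}$$

Then there exists a real value $\bar{\epsilon}>0$ such that, for all $\epsilon\in(0,\bar{\epsilon}]$, problem (2) and problem (3) are equivalent.
It can be verified that the penalty term we adopt satisfies the above conditions, so we use it as the penalty term.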
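A one-dimensional toy check of this equivalence. The penalty $\varphi(x)=1-x^2$ is our assumption for illustration (it is zero exactly at $x=\pm1$, positive inside $(-1,1)$, and grows at least linearly away from $\pm1$, so it satisfies the lemma's conditions); the objective $f(x)=(x-0.3)^2$ is likewise made up. With a strong penalty (small $\epsilon$), the minimizer over the box coincides with the binary minimizer; with a weak penalty it does not, which is why the lemma needs $\epsilon\le\bar{\epsilon}$.

```python
def f(x: float) -> float:
    # Toy objective; over {-1, 1} its minimizer is x = 1.
    return (x - 0.3) ** 2


def penalty(x: float) -> float:
    # Assumed penalty on [-1, 1]: zero exactly at x = +-1, positive inside.
    return 1.0 - x * x


def penalized(x: float, eps: float) -> float:
    # Objective of the relaxed problem: f(x) + (1 / eps) * penalty(x).
    return f(x) + penalty(x) / eps


grid = [-1.0 + 2.0 * k / 2000 for k in range(2001)]  # discretized box [-1, 1]
binary_best = min([-1.0, 1.0], key=f)
small_eps_best = min(grid, key=lambda x: penalized(x, eps=0.1))   # strong penalty
large_eps_best = min(grid, key=lambda x: penalized(x, eps=10.0))  # weak penalty
print(binary_best, small_eps_best)  # 1.0 1.0
print(round(large_eps_best, 3))     # 0.333, not a binary point
```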
4 Binarized Collaborative Filtering with Distilling Graph Convolutional Network
For the binarized collaborative filtering problem with implicit feedback in Eqn. (1), there are three problems to solve. First, the interactions between users and items are extremely sparse. Second, a lot of information is lost when learning binary codes. Third, binary optimization is generally NP-hard, so an efficient approximate method is required. We propose a novel framework, Binarized Collaborative Filtering with Distilling Graph Convolutional Network, to deal with these problems. Because a GCN model can mine hidden connections between users and items in the spectral domain of the user-item graph, we train a GCN-based collaborative filtering model (GCNCF) to address the first problem. Then we use knowledge distillation to transfer the ranking information of GCNCF into the binary model to compensate for the information loss. Finally, we transform the binary optimization problem into a continuous optimization problem.
4.1 GCNbased Collaborative Filtering
Following SpectralCF, our graph convolution operation is defined as
$$\begin{bmatrix}\mathbf{U}^{(l+1)}\\ \mathbf{V}^{(l+1)}\end{bmatrix}=\sigma\left((\mathbf{I}+\mathbf{L})\begin{bmatrix}\mathbf{U}^{(l)}\\ \mathbf{V}^{(l)}\end{bmatrix}\boldsymbol{\Theta}^{(l)}\right),\tag{6}$$
where $\mathbf{L}$ is the Laplacian matrix of the bipartite graph $\mathcal{G}$, $\mathbf{I}$ is an identity matrix, $\sigma(\cdot)$ is an activation function, and $\boldsymbol{\Theta}^{(l)}$ is a layer-specific trainable filter parameter. In this model, we use a two-layer GCN built from this operation. Similar to matrix factorization (MF) methods, which score a user-item pair by an inner product as in Eqn. (1), this GCN does not exploit the high-order interactions within a user’s own or an item’s own features, which limits its performance. Inspired by CrossNet [Wang et al.2017], we define a cross operation to fix this problem. The cross operation can be formulated as
(7)
where $\mathbf{x}$ is a user’s or item’s embedding vector and $\mathbf{w}$ is a parameter vector. The term $\mathbf{x}\mathbf{x}^\top$ replaces the term $\mathbf{x}_0\mathbf{x}^\top$ of CrossNet (with $\mathbf{x}_0$ being CrossNet’s input vector), which yields higher-order interactions than CrossNet for the same number of iterations. Moreover, the time complexity of the proposed cross operation remains the same as that of CrossNet. The improved GCN model can be vectorized as follows:
(8)  
(9)  
(10)  
(11) 
where $\odot$ denotes the Hadamard product, $\mathbf{1}$ is a column vector of ones, and the remaining parameters are weight matrices. Moreover, we apply batch normalization [Ioffe and Szegedy2015] before Eqn. (8) and Eqn. (11). To make full use of the features from every GCN layer, we follow SpectralCF and concatenate them into the final latent factors of users and items:
(12)  
(13) 
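The following sketch illustrates the cross operation and its vectorized form under stated assumptions: we assume the cross operation is $c(\mathbf{x})=\mathbf{x}\mathbf{x}^\top\mathbf{w}+\mathbf{x}$ (CrossNet's $\mathbf{x}_0\mathbf{x}^\top\mathbf{w}+\mathbf{x}$ with $\mathbf{x}_0$ replaced by $\mathbf{x}$, bias omitted), and that layer outputs are concatenated along the feature axis. It shows why the complexity stays linear, $(\mathbf{x}\mathbf{x}^\top)\mathbf{w}=\mathbf{x}(\mathbf{x}^\top\mathbf{w})$, and where $\odot$ and $\mathbf{1}$ come from when applying the operation to all rows of a matrix at once. All function names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_naive(x: Vector, w: Vector) -> Vector:
    # Builds the d x d outer product x x^T explicitly: O(d^2) time and memory.
    outer = [[xi * xj for xj in x] for xi in x]
    return [dot(row, w) + xi for row, xi in zip(outer, x)]


def cross_fast(x: Vector, w: Vector) -> Vector:
    # Associativity: (x x^T) w = x (x^T w), so one dot product suffices: O(d).
    s = dot(x, w)
    return [xi * s + xi for xi in x]


def cross_rows(X: Matrix, w: Vector) -> Matrix:
    # Vectorized form X ⊙ ((X w) 1^T) + X: row r is scaled by x_r^T w.
    s = [dot(row, w) for row in X]                 # X w
    ones = [1.0] * len(X[0])                       # 1, a vector of ones
    scale = [[si * o for o in ones] for si in s]   # (X w) 1^T
    return [[a * x + x for a, x in zip(srow, xrow)] for srow, xrow in zip(scale, X)]


def concat_layers(layers: List[Matrix]) -> Matrix:
    # Final latent factors: per-layer embeddings concatenated along the feature axis.
    return [sum((layer[r] for layer in layers), []) for r in range(len(layers[0]))]
```

Under this assumed form, each application maps a degree-$k$ polynomial in the input features to degree $3k$, whereas CrossNet's $\mathbf{x}_0\mathbf{x}^\top\mathbf{w}$ term only raises the degree from $k$ to $k+1$, consistent with the claim of higher-order interactions at equal cost.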
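For the loss function, we employ the popular and effective BPR loss [Rendle et al.2009]. Given the user matrix $\mathbf{U}$ and the item matrix $\mathbf{V}$ in Eqn. (12) and Eqn. (13), the loss function of GCNCF is
(14)
where $\mathbf{u}_u$ and $\mathbf{v}_i$ denote the $u$-th and $i$-th rows of $\mathbf{U}$ and $\mathbf{V}$, respectively, and $\lambda$ is the regularization coefficient. The negative item $j$ is sampled randomly from $\mathcal{I}\setminus\mathcal{I}_u^{+}$, and the training data are generated as $\mathcal{D}=\{(u,i,j)\mid i\in\mathcal{I}_u^{+},\ j\in\mathcal{I}\setminus\mathcal{I}_u^{+}\}$.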
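A minimal sketch of BPR training data and loss as described above. One negative item is drawn uniformly per observed pair, and the loss sums $-\ln\sigma(\mathbf{u}_u^\top\mathbf{v}_i-\mathbf{u}_u^\top\mathbf{v}_j)$ over the triples. The regularizer here, $\lambda(\|\mathbf{U}\|_F^2+\|\mathbf{V}\|_F^2)$, is an assumption, since which parameters Eqn. (14) regularizes is not restated in the text.

```python
import math
import random
from typing import Dict, List, Set, Tuple

Matrix = List[List[float]]
Triple = Tuple[int, int, int]


def sample_triples(pos: Dict[int, Set[int]], n_items: int, rng: random.Random) -> List[Triple]:
    """One (u, i, j) per observed pair; j is drawn uniformly from items u has not clicked."""
    triples = []
    for u, clicked in pos.items():
        for i in sorted(clicked):
            j = rng.randrange(n_items)
            while j in clicked:  # rejection sampling; assumes u has not clicked every item
                j = rng.randrange(n_items)
            triples.append((u, i, j))
    return triples


def neg_log_sigmoid(x: float) -> float:
    """Numerically stable -ln(sigmoid(x)) = ln(1 + exp(-x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def bpr_loss(U: Matrix, V: Matrix, triples: List[Triple], lam: float) -> float:
    """Sum of -ln sigmoid(u_u.v_i - u_u.v_j) plus an assumed L2 penalty on U and V."""
    def dot(a: List[float], b: List[float]) -> float:
        return sum(x * y for x, y in zip(a, b))

    data_term = sum(neg_log_sigmoid(dot(U[u], V[i]) - dot(U[u], V[j])) for u, i, j in triples)
    reg = sum(x * x for row in U + V for x in row)
    return data_term + lam * reg
```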
4.1.1 Distilling GCN into Binarized Collaborative Filtering
In this part, we distill the ranking information of the GCNCF model and transfer it to a simple binary collaborative filtering model via a mixed objective function. We denote the resulting model by DGCNBinCF for short. The key motivation of the distillation in DGCNBinCF is two-fold. On the one hand, the distribution over positive (negative) samples in the binary model should be close to that in GCNCF. On the other hand, positive and negative samples should be far enough apart in DGCNBinCF. However, because the BPR model guarantees that positive samples score higher than negative ones, negative samples would receive much lower probabilities than positive ones if we considered a single distribution over both positive and negative samples. Thus, we consider positive and negative samples in GCNCF separately.
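The effect described above is easy to reproduce with made-up teacher scores (the numbers below are hypothetical, chosen only to mimic a BPR-trained model that separates positives from negatives). A single softmax over all items leaves the negatives with almost no probability mass, so their relative order carries almost no signal, while a separate softmax over the negatives keeps a usable distribution.

```python
import math
from typing import List


def softmax(scores: List[float], tau: float = 1.0) -> List[float]:
    """Temperature softmax; subtracting the max keeps exp() from overflowing."""
    m = max(scores)
    exps = [math.exp((s - m) / tau) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


# Hypothetical teacher scores for one user after BPR training (made-up numbers).
pos_scores = [6.0, 5.0, 4.5]
neg_scores = [-2.0, -3.0, -4.0]

# One softmax over all items: the negatives share almost no probability mass.
joint = softmax(pos_scores + neg_scores)
neg_mass_joint = sum(joint[len(pos_scores):])

# A separate softmax over the negatives keeps a usable distribution over them.
neg_separate = softmax(neg_scores)
```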
Specifically, to distill the ranking information of GCNCF, we want the positive (negative) items of a user to have approximately the same order in the binary model as in the continuous model. For instance, if user $u$’s preference over three items is ranked as $i_1\succ i_2\succ i_3$ in GCNCF, we want the binary model to preserve this ranking. Following ListNet [Cao et al.2007], we characterize the ranking information as follows:
(15) 
where $\tau$ is the temperature parameter. In Eqn. (15), we convert the items’ score lists into probability distributions via the softmax function and use cross entropy to penalize their discrepancy. Following [Hinton et al.2015], combining the distillation loss $\mathcal{L}_{KD}$ with the BPR loss $\mathcal{L}_{BPR}$ of the binary model as a multi-task learning problem can transfer the ranking knowledge to the binary model. It is worth mentioning that since the magnitudes of the gradients produced by $\mathcal{L}_{KD}$ scale as $1/\tau^{2}$, it is necessary to multiply them by $\tau^{2}$ when mixing $\mathcal{L}_{BPR}$ and $\mathcal{L}_{KD}$. The loss function of DGCNBinCF is thus formulated as
(16)
where $\mathbf{P}$ and $\mathbf{Q}$ are the user and item embedding matrices of DGCNBinCF, respectively, and $\mathbf{U}$, $\mathbf{V}$ are the trained user and item embedding matrices of GCNCF. The loss $\mathcal{L}_{KD}$ encodes the first motivation and the loss $\mathcal{L}_{BPR}$ encodes the second.
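The sketch below is one reading of the distillation term described above, not the paper's exact Eqn. (16): teacher scores come from GCNCF and student scores from the binary model; each score list is turned into a ListNet top-one distribution with temperature $\tau$; the teacher-to-student cross entropy is computed separately for positives and for sampled negatives; and the sum is scaled by $\tau^2$. How this term is weighted against $\mathcal{L}_{BPR}$ (e.g., $\mathcal{L}_{BPR}+\gamma\,\mathcal{L}_{KD}$ with a hypothetical weight $\gamma$) is an assumption left to the caller.

```python
import math
from typing import List


def log_softmax(scores: List[float], tau: float) -> List[float]:
    """Log of the ListNet top-one probabilities exp(s/tau) / sum exp(s'/tau)."""
    scaled = [s / tau for s in scores]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def listnet_ce(teacher: List[float], student: List[float], tau: float) -> float:
    """Cross entropy from the teacher's top-one distribution to the student's."""
    p_teacher = [math.exp(lp) for lp in log_softmax(teacher, tau)]
    log_p_student = log_softmax(student, tau)
    return -sum(p * lq for p, lq in zip(p_teacher, log_p_student))


def kd_loss(t_pos: List[float], s_pos: List[float],
            t_neg: List[float], s_neg: List[float], tau: float) -> float:
    """Distill positives and sampled negatives separately; tau**2 offsets the
    1/tau**2 scaling of soft-target gradients noted by Hinton et al."""
    return tau ** 2 * (listnet_ce(t_pos, s_pos, tau) + listnet_ce(t_neg, s_neg, tau))
```

The cross entropy is smallest when the student induces the same ranking distribution as the teacher, and it is invariant to adding a constant to all student scores, so only the relative order and spacing of scores matter.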
For the binary optimization problem, a direct approach is to approximate the sign function by a smooth function with a small temperature. However, [Li et al.2018b] points out that setting a small temperature harms the optimization process. [Courbariaux et al.2015] mentions that generating binary codes stochastically is a finer and more correct averaging process than generating them via the sign function. Hence, we generate binary codes by sampling from a Bernoulli distribution. More specifically, given an entry $p\in[-1,1]$ of $\mathbf{P}$ or $\mathbf{Q}$, its binary code is $b=p+\epsilon$, where $\epsilon$ is a random variable that can only take the values $1-p$ and $-1-p$ (so that $b\in\{-1,1\}$), with probabilities controlled by a temperature parameter. To keep the noise small, we add the expectation of the noise as a penalty term. Therefore, DGCNBinCF can be transformed into the following optimization problem:
(17)
We use the tanh function to bound the entries of $\mathbf{P}$ and $\mathbf{Q}$ between $-1$ and $1$.
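A minimal sketch of stochastic binarization, under an assumption that differs from the paper's temperature-controlled probabilities: we use BinaryConnect's hard-sigmoid rule, $\Pr(b=+1)=(1+p)/2$ for $p\in[-1,1]$. Under this rule the code is unbiased ($\mathbb{E}[b]=p$), and the expected squared noise is $\mathbb{E}[\epsilon^2]=1-p^2$, which vanishes exactly at $p=\pm1$ and therefore behaves like a penalty that pushes entries toward binary values.

```python
import random
from typing import List


def stochastic_binarize(p: List[float], rng: random.Random) -> List[int]:
    """b = p + eps with eps in {1 - p, -1 - p}; assumed Pr(b = +1) = (1 + p) / 2."""
    return [1 if rng.random() < (1.0 + x) / 2.0 else -1 for x in p]


def expected_sq_noise(p: List[float]) -> float:
    """Sum over entries of E[eps^2]; equals sum(1 - p^2) under the rule above."""
    total = 0.0
    for x in p:
        prob_up = (1.0 + x) / 2.0
        total += prob_up * (1.0 - x) ** 2 + (1.0 - prob_up) * (-1.0 - x) ** 2
    return total
```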
5 Experiments
In this section, we evaluate our proposed DGCNBinCF framework with the aim of answering the following research questions.

Does the proposed DGCNBinCF framework outperform state-of-the-art hashing-based recommendation methods?

Is the proposed GCNCF effective?

Does distilling ranking information help to learn the binary model?

Does the proposed framework converge well?
We first introduce the experimental settings and then answer the above questions in the following sections.
5.1 Experiment Settings
5.1.1 Dataset
We use three public real-world datasets, MovieLens1M, MovieLens10M and Yelp, to evaluate the proposed algorithm. Since the three datasets contain explicit feedback, we convert them into implicit feedback by treating all ratings as positive samples. Because of their extreme sparsity, we filter out users with fewer than 20 ratings and items rated by fewer than 20 users. Table 1 summarizes the filtered datasets. For each user, we randomly sample 50 positive samples for training and use the remaining ones for testing. We repeat this with five random splits and report the averaged results.
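Table 1: Statistics of the filtered datasets.

| Dataset | #User | #Item | #Rating | Density |
|---|---|---|---|---|
| MovieLens1M | 6,022 | 3,043 | 995,154 | 5.43% |
| MovieLens10M | 69,878 | 10,681 | 10,000,054 | 1.34% |
| Yelp | 9,235 | 7,353 | 423,354 | 0.62% |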
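The preprocessing protocol can be sketched as follows. Two details are assumptions not stated in the text: the count filter is applied once rather than iterated until stable, and a user with at most 50 ratings keeps all of them for training. Ratings are given as (user, item) pairs, since every rating is treated as a positive sample.

```python
import random
from collections import Counter
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]


def filter_min_count(ratings: List[Pair], min_user: int = 20, min_item: int = 20) -> List[Pair]:
    """Drop users with < min_user ratings and items with < min_item ratings (single pass)."""
    user_cnt = Counter(u for u, _ in ratings)
    item_cnt = Counter(i for _, i in ratings)
    return [(u, i) for u, i in ratings if user_cnt[u] >= min_user and item_cnt[i] >= min_item]


def split_per_user(ratings: List[Pair], n_train: int = 50, seed: int = 0):
    """For each user, sample n_train positives for training; the rest form the test set."""
    rng = random.Random(seed)
    by_user: Dict[int, List[int]] = {}
    for u, i in ratings:
        by_user.setdefault(u, []).append(i)
    train: Dict[int, Set[int]] = {}
    test: Dict[int, Set[int]] = {}
    for u, items in by_user.items():
        chosen = set(rng.sample(items, min(n_train, len(items))))
        train[u] = chosen
        test[u] = set(items) - chosen
    return train, test
```

Repeating `split_per_user` with five different seeds and averaging the metrics corresponds to the five random splits described above.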
5.1.2 Comparison Methods
To evaluate DGCNBinCF as a hashing-based recommender system, we compare it with three popular state-of-the-art methods: DCF, BCCF and PPH. DCF solves the binary optimization problem directly via bitwise optimization, while BCCF and PPH are two-stage methods.
To measure the effectiveness of the improved GCN model, we compare GCNCF with SpectralCF. We also compare DGCNBinCF with the binary model BinCF; to isolate the role of the KD loss in binary optimization, BinCF is optimized by our proposed relaxation method as well.
5.1.3 Evaluation Metric
To evaluate recommendation performance, we choose four widely used ranking-based metrics, including (1) NDCG (Normalized Discounted Cumulative Gain), (2) Recall, and (3) MAP (Mean Average Precision). For each user, we predict the top-K preferred items from the test set.
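For reference, a minimal per-user implementation of the three named metrics with binary relevance. Normalization conventions vary across papers; normalizing AP@K by $\min(|\text{relevant}|, K)$ is our assumption, and MAP is the mean of AP@K over users.

```python
import math
from typing import List, Set


def recall_at_k(ranked: List[int], relevant: Set[int], k: int) -> float:
    hits = sum(1 for i in ranked[:k] if i in relevant)
    return hits / len(relevant) if relevant else 0.0


def average_precision_at_k(ranked: List[int], relevant: Set[int], k: int) -> float:
    hits, score = 0, 0.0
    for pos, i in enumerate(ranked[:k], start=1):
        if i in relevant:
            hits += 1
            score += hits / pos  # precision at each position where a hit occurs
    denom = min(len(relevant), k)  # assumed normalization
    return score / denom if denom else 0.0


def ndcg_at_k(ranked: List[int], relevant: Set[int], k: int) -> float:
    dcg = sum(1.0 / math.log2(pos + 1)
              for pos, i in enumerate(ranked[:k], start=1) if i in relevant)
    idcg = sum(1.0 / math.log2(pos + 1) for pos in range(1, min(len(relevant), k) + 1))
    return dcg / idcg if idcg else 0.0
```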
5.1.4 Parameter Settings
In our experiments, the regularization coefficient $\lambda$ of GCNCF is set to the same value on all datasets, and for DGCNBinCF the temperatures, the weight of the distillation loss and the penalty coefficients are likewise shared across the three datasets. The dimension of the latent factors in GCNCF is set to 16 on MovieLens10M and 64 on the other two datasets. The learned matrices U, V of GCNCF are used to initialize P, Q in DGCNBinCF. All parameters of SpectralCF are set according to [Zheng et al.2018].
For DCF, BCCF and PPH, we tune the hyperparameters via grid search on a held-out split randomly drawn from the training data; the two hyperparameters of DCF, the hyperparameter of BCCF and that of PPH are each searched over a candidate set.
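Table 2: Comparison with hashing-based baselines.

| Method | Recall@100 | MAP@100 | NDCG@100 |
|---|---|---|---|
| DCF | 0.0416 | 0.0101 | 0.0558 |
| BCCF | 0.1234 | 0.0720 | 0.1724 |
| PPH | 0.0277 | 0.0027 | 0.0249 |
| DGCNBinCF | 0.3059 | 0.1187 | 0.3061 |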
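Table 3: Comparison with hashing-based baselines.

| Method | Recall@100 | MAP@100 | NDCG@100 |
|---|---|---|---|
| DCF | 0.0791 | 0.0180 | 0.0809 |
| BCCF | 0.0789 | 0.0267 | 0.0869 |
| PPH | 0.0978 | 0.0123 | 0.0695 |
| DGCNBinCF | 0.2405 | 0.0537 | 0.1895 |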
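Table 4: Comparison with hashing-based baselines.

| Method | Recall@100 | MAP@100 | NDCG@100 |
|---|---|---|---|
| DCF | 0.0661 | 0.0053 | 0.0382 |
| BCCF | 0.0966 | 0.0122 | 0.0658 |
| PPH | 0.0627 | 0.0051 | 0.0361 |
| DGCNBinCF | 0.1738 | 0.0192 | 0.1008 |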
5.2 Comparison with Baselines
Although hashing-based recommendation has significant advantages in both time and storage, it often yields lower recommendation accuracy than real-valued recommender systems, because binary codes have limited representation ability and lose a lot of information. DGCNBinCF aims to improve this accuracy.
In this part, we answer the first question by comparing the recommendation accuracy of DGCNBinCF with three state-of-the-art binary recommendation methods, DCF, BCCF and PPH, on the three datasets. Table 2, Table 3 and Table 4 summarize the results.
The three tables show that DGCNBinCF performs much better than all baselines on the three datasets. We attribute this to two factors. First, because we train the improved GCN model GCNCF to discover deep interactions between users and items and then transfer its ranking information to the binary model, DGCNBinCF loses less information than the baselines. Second, the binary optimization problem is optimized directly with the proposed penalty terms, which leads to less quantization loss. Therefore, DGCNBinCF has great advantages over DCF, BCCF and PPH.
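The size of the margin can be checked directly from the reported numbers. The short script below copies the scores from Tables 2 to 4 and computes, per metric, the ratio of DGCNBinCF's score to that of the strongest baseline; it introduces no new data.

```python
from typing import Dict, Tuple

METRICS = ("Recall@100", "MAP@100", "NDCG@100")
Scores = Tuple[float, float, float]

# Scores copied from Tables 2-4.
TABLES: Dict[str, Dict[str, Scores]] = {
    "Table 2": {"DCF": (0.0416, 0.0101, 0.0558), "BCCF": (0.1234, 0.0720, 0.1724),
                "PPH": (0.0277, 0.0027, 0.0249), "DGCNBinCF": (0.3059, 0.1187, 0.3061)},
    "Table 3": {"DCF": (0.0791, 0.0180, 0.0809), "BCCF": (0.0789, 0.0267, 0.0869),
                "PPH": (0.0978, 0.0123, 0.0695), "DGCNBinCF": (0.2405, 0.0537, 0.1895)},
    "Table 4": {"DCF": (0.0661, 0.0053, 0.0382), "BCCF": (0.0966, 0.0122, 0.0658),
                "PPH": (0.0627, 0.0051, 0.0361), "DGCNBinCF": (0.1738, 0.0192, 0.1008)},
}


def gain_over_best_baseline(table: Dict[str, Scores]) -> Dict[str, float]:
    """Ratio of DGCNBinCF's score to the best baseline score, per metric."""
    ours = table["DGCNBinCF"]
    baselines = [scores for name, scores in table.items() if name != "DGCNBinCF"]
    return {m: ours[k] / max(b[k] for b in baselines) for k, m in enumerate(METRICS)}


for name, table in TABLES.items():
    print(name, {m: round(g, 2) for m, g in gain_over_best_baseline(table).items()})
```

Every ratio is above 1.5, so on each dataset and metric DGCNBinCF exceeds the strongest baseline by more than half of that baseline's score.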
5.3 The Effectiveness of GCNCF
As mentioned above, SpectralCF does not aggregate users’ and items’ own high-order features, which may limit its representation ability. In this part, we answer the second question.
We conduct the experiment on the MovieLens1M dataset and use three metrics to evaluate SpectralCF and GCNCF. Figure 1 shows the results: in the two bar charts, orange bars show the performance of GCNCF and blue bars that of SpectralCF. GCNCF improves over SpectralCF by more than 20% on every metric, which demonstrates the effectiveness of GCNCF.
5.4 The Effectiveness of Distillation
Because a lot of useful information is lost when learning binary representations, it is vital to use the ranking information of the trained GCNCF model as a supplement for learning discrete codes. In this part, we investigate the role of ranking information in binary optimization. To this end, we optimize Eqn. (1) directly with the proposed penalty term; in other words, we set the weight of the distillation loss to zero in the DGCNBinCF objective, and compare the resulting model (BinCF) with DGCNBinCF. We test the two methods on the MovieLens1M dataset and evaluate them with the four ranking metrics.
Figure 2 reports the comparison results: blue bars represent BinCF and orange bars represent DGCNBinCF. The two bar charts show that DGCNBinCF outperforms BinCF by 10% on all evaluation metrics. Thus, we conclude that the distillation method helps the model learn high-quality binary representations.
Figure 3: Convergence of the proposed models. Left: loss versus epoch of GCNCF; right: loss versus epoch of DGCNBinCF.
5.5 Convergence
In this section, we answer the fourth question. Because deep models and discrete optimization may fail to converge, we test the convergence of GCNCF and DGCNBinCF.
We run the experiment on MovieLens1M and record the value of the GCNCF loss (Eqn. (14)) and of the DGCNBinCF objective at each epoch, with the maximum number of iterations set to 200. Figure 3 shows the convergence of the two models: the loss of GCNCF keeps decreasing, and DGCNBinCF converges well.
6 Conclusion
In this paper, we propose a hashing-based method, DGCNBinCF, to accelerate recommendation with implicit feedback. Because implicit feedback lacks negative samples and learning binary codes loses a lot of information, we train GCNCF, which aggregates users’ and items’ own high-order features, to mine rich connection information, and then distill the ranking information of GCNCF into the binary model. In addition, we propose a penalty-based method to learn binary codes directly with gradient descent. Experiments on three real-world datasets demonstrate the superiority of our framework.
Acknowledgements
The work was supported in part by grants from the National Natural Science Foundation of China (Grant No. U1605251, 61832017, 61631005 and 61502077).
References
 [Berg et al.2017] Rianne van den Berg, Thomas N Kipf, and Max Welling. Graph convolutional matrix completion. arXiv preprint arXiv:1706.02263, 2017.
 [Cao et al.2007] Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li. Learning to rank: from pairwise approach to listwise approach. In Proceedings of ICML’07, pages 129–136. ACM, 2007.
 [Chen et al.2018] Yuntao Chen, Naiyan Wang, and Zhaoxiang Zhang. DarkRank: Accelerating deep metric learning via cross sample similarities transfer. In Proceedings of AAAI’18, 2018.
 [Courbariaux et al.2015] Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. BinaryConnect: Training deep neural networks with binary weights during propagations. In Proceedings of NeurIPS’15, pages 3123–3131, 2015.
 [Dai et al.2016] Qi Dai, Jianguo Li, Jingdong Wang, and Yu-Gang Jiang. Binary optimized hashing. In Proceedings of MM’16, pages 1247–1256. ACM, 2016.
 [Das et al.2007] Abhinandan S Das, Mayur Datar, Ashutosh Garg, and Shyam Rajaram. Google news personalization: scalable online collaborative filtering. In Proceedings of WWW’07, pages 271–280. ACM, 2007.
 [Datar et al.2004] Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the Twentieth Annual Symposium on Computational Geometry, pages 253–262. ACM, 2004.
 [Gao et al.2018] Hongyang Gao, Zhengyang Wang, and Shuiwang Ji. Large-scale learnable graph convolutional networks. In Proceedings of KDD’18, pages 1416–1424. ACM, 2018.

 [Giannessi and Tardella1998] Franco Giannessi and Fabio Tardella. Connections between nonlinear programming and discrete optimization. In Handbook of Combinatorial Optimization, pages 149–188. Springer, 1998.
 [Håstad2001] Johan Håstad. Some optimal inapproximability results. Journal of the ACM (JACM), 48(4):798–859, 2001.
 [Henaff et al.2015] Mikael Henaff, Joan Bruna, and Yann LeCun. Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163, 2015.
 [Hinton et al.2015] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
 [Ioffe and Szegedy2015] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.

 [Karatzoglou et al.2010] Alexandros Karatzoglou, Alex Smola, and Markus Weimer. Collaborative filtering on a budget. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 389–396, 2010.
 [Li et al.2018a] Zhi Li, Hongke Zhao, Qi Liu, Zhenya Huang, Tao Mei, and Enhong Chen. Learning from history and present: Next-item recommendation via discriminatively exploiting user behaviors. In Proceedings of KDD’18, pages 1734–1743. ACM, 2018.
 [Li et al.2018b] Zhuohan Li, Di He, Fei Tian, Wei Chen, Tao Qin, Liwei Wang, and Tie-Yan Liu. Towards binary-valued gates for robust LSTM training. arXiv preprint arXiv:1806.02988, 2018.
 [Lian et al.2016] Defu Lian, Yuyang Ye, Wenya Zhu, Qi Liu, Xing Xie, and Hui Xiong. Mutual reinforcement of academic performance prediction and library book recommendation. In Proceedings of ICDM’16, pages 1023–1028. IEEE, 2016.
 [Liu et al.2011] Qi Liu, Yong Ge, Zhongmou Li, Enhong Chen, and Hui Xiong. Personalized travel package recommendation. In Proceedings of ICDM’11, pages 407–416. IEEE, 2011.
 [Liu et al.2014] Xianglong Liu, Junfeng He, Cheng Deng, and Bo Lang. Collaborative hashing. In Proceedings of CVPR’14, pages 2139–2146, 2014.
 [Lucidi and Rinaldi2010] Stefano Lucidi and Francesco Rinaldi. Exact penalty functions for nonlinear integer programming problems. Journal of optimization theory and applications, 145(3):479–488, 2010.
 [Monti et al.2017] Federico Monti, Michael Bronstein, and Xavier Bresson. Geometric matrix completion with recurrent multigraph neural networks. In Advances in Neural Information Processing Systems, pages 3697–3707, 2017.
 [Muja and Lowe2009] Marius Muja and David G Lowe. Fast approximate nearest neighbors with automatic algorithm configuration. VISAPP (1), 2(331-340):2, 2009.
 [Pan et al.2008] Rong Pan, Yunhong Zhou, Bin Cao, Nathan N Liu, Rajan Lukose, Martin Scholz, and Qiang Yang. One-class collaborative filtering. In Proceedings of ICDM’08, pages 502–511. IEEE, 2008.
 [Rendle et al.2009] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of UAI’09, pages 452–461. AUAI Press, 2009.
 [Tang and Wang2018] Jiaxi Tang and Ke Wang. Ranking distillation: Learning compact ranking models with high performance for recommender system. In Proceedings of KDD’18, pages 2289–2298. ACM, 2018.
 [Wang et al.2012] Jun Wang, Sanjiv Kumar, and Shih-Fu Chang. Semi-supervised hashing for large-scale search. IEEE Transactions on Pattern Analysis & Machine Intelligence, (12):2393–2406, 2012.
 [Wang et al.2017] Ruoxi Wang, Bin Fu, Gang Fu, and Mingliang Wang. Deep & cross network for ad click predictions. In Proceedings of the ADKDD’17, page 12. ACM, 2017.
 [Ying et al.2018] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. Graph convolutional neural networks for web-scale recommender systems. In Proceedings of KDD’18, pages 974–983. ACM, 2018.
 [Zhang et al.2014] Zhiwei Zhang, Qifan Wang, Lingyun Ruan, and Luo Si. Preference preserving hashing for efficient recommendation. In Proceedings of SIGIR’14, pages 183–192. ACM, 2014.
 [Zhang et al.2016] Hanwang Zhang, Fumin Shen, Wei Liu, Xiangnan He, Huanbo Luan, and Tat-Seng Chua. Discrete collaborative filtering. In Proceedings of SIGIR’16, pages 325–334. ACM, 2016.
 [Zheng et al.2018] Lei Zheng, Chun-Ta Lu, Fei Jiang, Jiawei Zhang, and Philip S Yu. Spectral collaborative filtering. In Proceedings of RecSys’18, pages 311–319. ACM, 2018.
 [Zhou and Zha2012] Ke Zhou and Hongyuan Zha. Learning binary codes for collaborative filtering. In Proceedings of KDD’12, pages 498–506. ACM, 2012.