
Graph Factorization Machines for Cross-Domain Recommendation

Recently, graph neural networks (GNNs) have been successfully applied to recommender systems. In recommender systems, the user's feedback behavior on an item is usually the result of multiple factors acting at the same time. However, a long-standing challenge is how to effectively aggregate multi-order interactions in GNN. In this paper, we propose a Graph Factorization Machine (GFM) which utilizes the popular Factorization Machine to aggregate multi-order interactions from neighborhood for recommendation. Meanwhile, cross-domain recommendation has emerged as a viable method to solve the data sparsity problem in recommender systems. However, most existing cross-domain recommendation methods might fail when confronting the graph-structured data. In order to tackle the problem, we propose a general cross-domain recommendation framework which can be applied not only to the proposed GFM, but also to other GNN models. We conduct experiments on four pairs of datasets to demonstrate the superior performance of the GFM. Besides, based on general cross-domain recommendation experiments, we also demonstrate that our cross-domain framework could not only contribute to the cross-domain recommendation task with the GFM, but also be universal and expandable for various existing GNN models.


1 Introduction

With the explosive growth of personalized online applications, recommender systems have been widely used in various real-world scenarios, such as recommending movies to watch on MovieLens and products to purchase on Amazon. The data collected by these online applications have been effectively leveraged to study users' online activities and patterns, which provides unparalleled opportunities to build personalized recommender systems.

Collaborative filtering is a widely adopted method [15, 9] in recommender systems; it assumes that users and items that were similar in the past will remain similar in the future. Recently, more and more studies have found that graph-structured data are of great benefit to the performance of recommender systems [31, 16]. However, it is hard for traditional collaborative filtering methods to exploit graph-structured data. In recent years, considerable efforts have been made to learn from graph-structured data via Graph Neural Networks (GNNs) [14, 28, 7], and we have witnessed the rapid development and popularity of graph neural networks in recommender systems [31, 16]. The main intuition behind this line of methods is that the latent representation of a node can be obtained by iteratively transforming, propagating, and aggregating node features from its local neighborhood [3].

In the recommendation field, data sparsity is a well-known major problem, and it is important to effectively exploit the sparse data [25, 8, 6, 17] to capture multi-factor interaction information. However, the aggregation schemes of existing GNN-based methods are too simplistic, e.g., the mean or pooling aggregators [7], which makes it difficult to capture sufficient interaction information from the neighborhood.

To tackle the above problem, we propose a novel Graph Factorization Machine (GFM) that inherits the advantages of the popular Factorization Machine (FM) [25]. FM has been successfully used to exploit sparse data and capture feature interactions in recommender systems [25, 8, 6], but it cannot work with graph-structured data. To this end, our GFM utilizes FM to aggregate second-order neighbor messages and stacks multiple GFM layers to aggregate higher-order neighbor messages.

Besides, to address data sparsity in recommender systems by leveraging auxiliary data from other domains, cross-domain recommendation [23, 27, 21, 1, 10, 34] is an effective method. However, most existing cross-domain recommendation methods might fail when confronting graph-structured data. To tackle this problem, we propose a general cross-domain recommendation framework which can be applied not only to the proposed GFM to form the cross-domain GFM (CD-GFM), but also to other GNN models, e.g., GCN [14], GAT [28], and GraphSAGE [7]. On the one hand, the framework uses shared node representations to initialize the graph nodes in the source and target domains for learning domain-shared features; these shared nodes can be users, items, or both, so the framework no longer has to assume a specific sharing pattern. On the other hand, the framework uses the graph-structured data of each domain to learn domain-specific features and coordinates the learning of the graph topology by sharing the graph parameters. Finally, the domain-shared and domain-specific features are combined in each domain and used for the prediction tasks.

To summarize, the contributions of this paper are as follows:

  • We propose a novel GNN model called Graph Factorization Machine (GFM), which captures features from graph-structured data more effectively than existing GNN-based methods.

  • We propose a general cross-domain recommendation framework, which can be naturally applied not only to the proposed GFM to form the cross-domain GFM (CD-GFM), but also to other GNN models.

  • We perform experiments on four pairs of real-world datasets to demonstrate the effectiveness of the GFM. In addition, we demonstrate that our cross-domain recommendation framework generalizes to various existing GNN models on both user-shared and item-shared cross-domain tasks.

2 Related Work

In this section, we present the related work in three parts: Factorization Machines, Graph Neural Networks, and Cross-Domain Recommendation.

2.1 Factorization Machines

Many prediction tasks need to handle categorical variables (e.g., user IDs, item IDs) to obtain excellent performance. A popular solution is to convert them into binary features via one-hot encoding, but the resulting encoding is high-dimensional and sparse. The Factorization Machine (FM) [25] is a widely used method that automatically models second-order feature interactions over such high-dimensional, sparse one-hot features via inner products of raw embedding vectors. To combine the FM's strength at modeling second-order interactions with the neural network's strength at modeling higher-order interactions, several studies have extended the FM to neural networks [35, 24, 8, 6]. For example, Factorization-machine supported Neural Networks (FNN) [35] utilize a pre-trained factorization machine as the bottom layer of a multi-layer neural network. Product-based Neural Networks (PNN) [24] utilize an embedding layer to learn a distributed representation of the categorical data, a product layer to capture interactive patterns between inter-field categories, and further fully connected layers to explore high-order feature interactions. The Neural Factorization Machine (NFM) [8] uses a Hadamard-product-based FM followed by an MLP. Guo et al. proposed DeepFM [6], a factorization-machine-based neural network whose "wide" (FM) and "deep" (MLP) parts share the same input and are fed to the output layer in parallel. Other approaches have attempted to learn higher-order feature interactions explicitly instead of through an implicit "deep" part [29, 17]. However, all of the above methods might fail when confronting the graph-structured data innate to recommender systems.
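To make the second-order term concrete, the following is a minimal NumPy sketch of the classic inner-product FM interaction from [25] (bias and first-order terms omitted for brevity); the embeddings and names are illustrative, not the paper's implementation.

```python
import numpy as np

def fm_second_order(V: np.ndarray) -> float:
    """V: (n, k) embeddings of the n active (non-zero) one-hot features.
    Returns the sum of <v_i, v_j> over all feature pairs."""
    n = V.shape[0]
    return float(sum(V[i] @ V[j] for i in range(n) for j in range(i + 1, n)))

rng = np.random.default_rng(0)
V = rng.normal(size=(4, 8))        # 4 active features, embedding size 8
print(fm_second_order(V))          # sum over all 6 pairwise interactions
```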

2.2 Graph Neural Networks

Graph neural networks (GNNs) are deep learning based methods that operate on graph-structured data. Due to their convincing performance and high interpretability, GNNs have become widely applied graph learning methods and have recently achieved remarkable performance [14, 7, 28, 3, 5, 31]. The concept of the GNN was first proposed in [26], the pioneering work on learning graph node representations with neural networks. By designing different schemes for the graph convolutional layer, many graph convolutional networks (GCNs) have emerged and demonstrated superiority in learning node representations based on spectral graph theory; GCN [14] simplified the spectral convolution to a linear filter and achieved better performance. Most of the prevailing GNN models follow the neighborhood aggregation strategy and propose different schemes for message aggregation. Among them, Graph Attention Networks (GATs) [28] learn different weights for neighbor messages via an attention mechanism when aggregating neighbors, and GraphSAGE [7] designs three different message aggregators (mean, LSTM, and pooling). However, the aggregation schemes of these GNN methods are too simplistic and not well suited to recommender systems, where multi-factor interactions are more informative.

2.3 Cross-Domain Recommendation

Cross-domain recommendation can take advantage of existing large-scale data in a source domain to alleviate data sparsity and improve recommendation quality in a related target domain. Traditional methods such as Coordinate System Transfer (CST) [23] aimed to discover the principal coordinates of both users and items in the source data matrices and transfer them to the target domain to reduce the effect of data sparsity. Some works [27, 19, 18] extended classical Collaborative Filtering (CF) to the cross-domain scenario. Recently, neural networks have been used for cross-domain recommendation; for example, Chen et al. introduced attention mechanisms to automatically assign a personalized transfer scheme to each user [1]. These cross-domain studies cover various sharing scenarios, such as shared users [11, 33, 34, 10], shared items [4], and shared accounts [20]. However, most of these existing cross-domain recommendation methods were designed for traditional structured data; they might fail when encountering the massive graph data in recommender systems.

3 Methodology

Fig. 1: Two neighborhood examples in the user-item interaction graph. In the first, a user's neighborhood consists of the movies the user watched; in the second, a movie's neighborhood consists of the users who watched it.
Fig. 2: The proposed GFM model.

In this section, we first formulate the problem, and then present the details of the proposed GFM model and the cross-domain framework, as shown in Figures 2 and 3, respectively.

3.1 Problem Formulation

Assume $\mathcal{U}$ is a set of users and $\mathcal{I}$ is a set of items. There is a source domain $S$ containing a user set $\mathcal{U}^S$ and an item set $\mathcal{I}^S$, and a target domain $T$ containing a user set $\mathcal{U}^T$ and an item set $\mathcal{I}^T$. In each domain, there is an explicit or implicit feedback matrix between users and items (e.g., rating, watching, clicking, buying). The task of single-domain recommendation is to improve the recommendation performance in the target domain by utilizing its own feedback matrix. Now assume some information is shared between the two domains, such as overlapped users or items. The task of cross-domain recommendation is to combine the data and knowledge of the source domain $S$ to help improve the recommendation performance in the target domain $T$.

3.2 Graph Factorization Machines

In this subsection, we present the proposed GFM as illustrated in Figure 2. The GFM first samples neighbors for each node, then aggregates messages from the neighbors via Equation (4), and finally makes predictions with the aggregated messages.

3.2.1 Factorization Machines

In recommender systems, user and item features are usually one-hot encoded categorical features whose dimension is high and whose vectors are sparse. FM is an effective method for such high-dimensional, sparse problems and can be seen as modeling the latent relation between any two features. In the graph-structured scenario we focus on, the features are node features, e.g., user or item nodes.

Firstly, we project each non-zero node $i$ to a low-dimensional dense vector representation $\mathbf{v}_i$. Embedding is a popular solution in neural networks across various application scenarios: it learns one embedding vector $\mathbf{v}_i \in \mathbb{R}^k$ for each node $i$, where $k$ is the dimension of the embedding vectors.

Different from the traditional FM, which uses the inner product to obtain a scalar, in a neural network we need a vector representation, obtained via the Hadamard product as done in [8]:

$f_{\mathrm{FM}}(\mathcal{X}) = \sum_{i=1}^{N} \sum_{j=i+1}^{N} x_i \mathbf{v}_i \odot x_j \mathbf{v}_j$    (1)

where $x_i, x_j \in \{0, 1\}$ indicate the presence or absence of nodes $i$ and $j$ in the input node set $\mathcal{X}$, as shown in Figure 1. The Hadamard product $\odot$ denotes the element-wise product of two vectors:

$(\mathbf{v}_i \odot \mathbf{v}_j)_d = v_{id} \, v_{jd}, \quad d = 1, \dots, k$    (2)

The computational complexity of Equation (1) above is $O(kN^2)$, where $N$ is the number of nodes in $\mathcal{X}$, since all pairwise relations need to be computed. However, the Hadamard-product-based FM can be reformulated to run in linear time [8], just like the inner-product-based FM [25]:

$f_{\mathrm{FM}}(\mathcal{X}) = \frac{1}{2} \Big[ \big( \sum_{i=1}^{N} x_i \mathbf{v}_i \big)^2 - \sum_{i=1}^{N} ( x_i \mathbf{v}_i )^2 \Big]$    (3)

where $\mathbf{v}^2$ denotes $\mathbf{v} \odot \mathbf{v}$. Besides, in sparse settings the sums only need to be computed over the non-zero nodes, so the actual computational complexity is $O(kN_x)$, where $N_x$ is the number of non-zero nodes in $\mathcal{X}$. By adding an MLP on top, the FM can model higher-order feature interactions [8]:

$f(\mathcal{X}) = \mathrm{MLP}\big( f_{\mathrm{FM}}(\mathcal{X}) \big)$    (4)
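The following NumPy sketch illustrates Equations (1)-(4) as reconstructed above: the pairwise Hadamard sum, its linear-time reformulation, and the MLP on top. Weights, shapes, and the hidden size are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def hadamard_fm(V: np.ndarray) -> np.ndarray:
    """Equation (1): sum of element-wise products over all pairs, O(k N^2)."""
    out = np.zeros(V.shape[1])
    for i in range(V.shape[0]):
        for j in range(i + 1, V.shape[0]):
            out += V[i] * V[j]
    return out

def hadamard_fm_linear(V: np.ndarray) -> np.ndarray:
    """Equation (3): the same vector computed in O(k N)."""
    s = V.sum(axis=0)
    return 0.5 * (s * s - (V * V).sum(axis=0))

def fm_mlp(V, W1, b1, W2, b2):
    """Equation (4): a one-hidden-layer MLP over the FM output vector."""
    return np.maximum(0.0, hadamard_fm_linear(V) @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
V = rng.normal(size=(6, 16))                      # 6 non-zero nodes, k = 16
assert np.allclose(hadamard_fm(V), hadamard_fm_linear(V))  # (1) == (3)
```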

3.2.2 Graph Factorization Machine Layer

While FM is an effective method for high-dimensional, sparse features in recommender systems, it is not designed for graph-structured data and cannot exploit the graph topology, i.e., the multi-order neighbor information. Here, we extend the FM to graph-structured data to form a new GNN model that is better suited to recommender systems. Similar to GraphSAGE [7], the GFM parameters can be learned using standard stochastic gradient descent and backpropagation.

Sampling Neighborhood. In this work, we first uniformly sample a fixed-size set of neighbors $\mathcal{N}(v)$ for each node $v$. If the number of neighbors of node $v$ is larger than the sampling threshold $\tau$, sampling without replacement is used; otherwise, sampling with replacement is used, as sketched below. It should be mentioned that designing a different neighbor sampling scheme is not the focus of this paper, as we aim at designing a powerful message aggregator to learn efficient node representations. In fact, any advanced neighbor sampling scheme can be easily integrated, making the proposed GFM general and flexible.
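A small sketch of this uniform fixed-size sampling, under assumed adjacency-list and threshold conventions (the self-loop addition follows the note in the next subsection):

```python
import numpy as np

def sample_neighbors(adj: dict, v: int, tau: int, rng: np.random.Generator):
    nbrs = adj[v] + [v]                  # self-loop added before sampling
    replace = len(nbrs) <= tau           # not more than tau: allow repeats
    return rng.choice(nbrs, size=tau, replace=replace)

rng = np.random.default_rng(0)
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(sample_neighbors(adj, 0, tau=2, rng=rng))   # e.g. [1 0]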

Aggregating Messages. Most of the prevailing GNN models follow the neighborhood aggregation strategy and are analogous to the Weisfeiler-Lehman (WL) graph isomorphism test [32]: the representation of a node is obtained by iteratively aggregating messages from its neighbors. We adopt Equation (4) as the neighborhood aggregator, so the representation $\mathbf{h}_v^{(l)}$ of node $v$ in the $l$-th layer depends on its own representation and its neighbors' representations in the $(l-1)$-th layer. Note that the node $v$ can be a user or an item.

$\mathbf{h}_v^{(l)} = f\big( \{ \mathbf{h}_u^{(l-1)} : u \in \mathcal{N}(v) \} \big)$    (5)

where $f$ denotes the FM-plus-MLP aggregator of Equation (4). Note that we add a self-loop to every node before sampling the neighborhood, so the node $v$ itself may be sampled into $\mathcal{N}(v)$. By stacking multiple GFM layers, the message aggregator can capture higher-order neighbor messages, as sketched below. Specifically, for the user-item feedback graph (i.e., if a user has feedback on an item, there is an edge between the user node and the item node), the GFM uses the same aggregation scheme for all users and items. For example, if user $u$ has feedback on the sampled items $\{i_1, \dots, i_n\}$, the aggregator models the feature interactions among $\{u, i_1, \dots, i_n\}$ to obtain the representation of user $u$; similarly, if the sampled users $\{u_1, \dots, u_m\}$ have feedback on item $i$, the aggregator models the feature interactions among $\{i, u_1, \dots, u_m\}$ to obtain the representation of item $i$. By stacking GFM layers, the representations of users and items are iteratively aggregated from their multi-order (multi-hop) neighbors.
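A sketch of stacking GFM layers per Equation (5): each layer re-embeds every node by running the FM-plus-MLP aggregator over its sampled neighborhood. The parameter layout, shapes, and toy graph are illustrative assumptions.

```python
import numpy as np

def fm_aggregate(V):
    """Linear-time Hadamard FM (Equation (3)) over neighbor embeddings."""
    s = V.sum(axis=0)
    return 0.5 * (s * s - (V * V).sum(axis=0))

def gfm_layer(H, sampled, W1, b1, W2, b2):
    """H: (num_nodes, k); sampled[v]: v's sampled neighborhood, self-loop
    included. Returns next-layer representations with the same shape as H."""
    out = np.empty_like(H)
    for v in range(H.shape[0]):
        agg = fm_aggregate(H[sampled[v]])
        out[v] = np.maximum(0.0, agg @ W1 + b1) @ W2 + b2   # Equation (4) MLP
    return out

rng = np.random.default_rng(0)
k, hidden = 8, 32
H = rng.normal(size=(4, k))
sampled = {0: [0, 1, 2], 1: [1, 0], 2: [2, 0], 3: [3, 0]}
params = (rng.normal(size=(k, hidden)), np.zeros(hidden),
          rng.normal(size=(hidden, k)), np.zeros(k))
for _ in range(2):              # two stacked layers -> two-hop neighborhoods
    H = gfm_layer(H, sampled, *params)
```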

Making prediction. To predict the interaction probability $\hat{y}_{ui}$ between a given pair of user $u$ and item $i$, we adopt a simple but widely used inner-product predictive function over the user representation $\mathbf{h}_u$ and item representation $\mathbf{h}_i$ learned via the GFM according to Equation (5):

$\hat{y}_{ui} = \sigma\big( \mathbf{h}_u^{\top} \mathbf{h}_i \big)$    (6)

where $\sigma$ is the sigmoid function.

In our implicit feedback recommendation scenario, we can only observe implicit interactions between users and items. Thus, to train the GFM end-to-end, we use the negative logarithm of the joint probability as the loss function (i.e., logloss), which is widely used for implicit feedback recommendation tasks [22, 12, 9, 4, 10]:

$\mathcal{L}(\Theta) = - \sum_{(u,i) \in \mathcal{Y}^{+}} \log \hat{y}_{ui} - \sum_{(u,j) \in \mathcal{Y}^{-}} \log \big( 1 - \hat{y}_{uj} \big)$    (7)

where $\mathcal{Y}^{+}$ denotes the set of observed implicit feedback, $\mathcal{Y}^{-}$ denotes the set of negative samples drawn from the unobserved implicit feedback, and $\Theta$ is the parameter set containing all embedding vectors and the parameters of the MLP in Equation (4).

To construct a mini-batch, we follow existing works [4, 10] and first sample a batch of user-item interaction pairs. For each pair, we then adopt negative sampling to randomly select unobserved items for the user with a sampling ratio of $\rho$, obtaining triplets for each user in the batch. Note that we do not perform negative sampling in advance, since that would generate a fixed training set of negatives; instead, we regenerate negative samples during each epoch, enabling diverse and augmented training sets of negative examples [10].
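A sketch of the Equation (6) scoring and Equation (7) logloss with per-epoch negative sampling; the sampling ratio, data layout, and toy values are assumptions of this sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_negatives(pos_pairs, num_items, ratio, rng):
    """Redrawn every epoch, so the negative set varies across epochs."""
    observed, neg = set(pos_pairs), []
    for u, _ in pos_pairs:
        for _ in range(ratio):
            j = int(rng.integers(num_items))
            while (u, j) in observed:
                j = int(rng.integers(num_items))
            neg.append((u, j))
    return neg

def logloss(Hu, Hi, pos, neg, eps=1e-12):
    """Equation (7): -log y_hat over positives, -log(1 - y_hat) over negatives."""
    loss = -sum(np.log(sigmoid(Hu[u] @ Hi[i]) + eps) for u, i in pos)
    loss -= sum(np.log(1.0 - sigmoid(Hu[u] @ Hi[j]) + eps) for u, j in neg)
    return loss

rng = np.random.default_rng(0)
Hu, Hi = rng.normal(size=(3, 8)), rng.normal(size=(5, 8))
pos = [(0, 1), (1, 2), (2, 4)]
print(logloss(Hu, Hi, pos, sample_negatives(pos, 5, ratio=2, rng=rng)))
```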

3.3 General Cross-Domain Framework

Fig. 3: The proposed general cross-domain framework. For shared nodes, we initialize the node representations with the same vector $\mathbf{v}$ in both domains; for unshared nodes, the representations $\mathbf{v}^S$ and $\mathbf{v}^T$ are initialized separately in the source and target domains, respectively.

In this subsection, we present the proposed cross-domain recommendation framework as illustrated in Figure 3 and show how to apply it to other GNN models.

3.3.1 Cross-Domain Graph Factorization Machines

As mentioned before, cross-domain recommendation [23, 27, 21, 1, 10, 34] is an effective way to leverage auxiliary data from other domains to alleviate data sparsity and improve the recommendation quality of the target domain. To apply the proposed GFM to cross-domain recommendation on graph-structured data, we propose a general cross-domain recommendation framework which can be applied not only to the proposed GFM to form the cross-domain GFM (CD-GFM), but also to other GNN models.

First, for the graphs $G^S$ and $G^T$ in the source domain $S$ and target domain $T$, respectively, we assume that $G^S$ and $G^T$ have some shared nodes, which can be users, items, or both in the user-item feedback graphs. For these shared nodes, we initialize the node representations in Equation (1) with the same embedding vectors $\mathbf{v}$ in both the source and target domains. These representations can be seen as domain-shared features and are learned automatically during training in collaboration with the GFM in each domain. For unshared nodes, we initialize the node representations in Equation (1) with different embedding vectors $\mathbf{v}^S$ and $\mathbf{v}^T$ in the source and target domains.

Then, the source and target domains use the graph-structured data in their respective domains for multi-layer GFM learning; that is, nodes in each domain learn domain-specific representations based on the topology of their own domain. Besides, to further integrate the knowledge of the two domains, the GFMs in the two domains learn the node representations cooperatively by sharing the parameters of the MLP in Equation (4). Thus, we obtain the domain-specific node representations $\mathbf{h}^S$ and $\mathbf{h}^T$ of the source and target domains simultaneously.

Next, the domain-shared and domain-specific representations in each domain are combined into the final node representations:

$\mathbf{z}^{S} = \mathbf{v}^{S} \oplus \mathbf{h}^{S}$    (8)
$\mathbf{z}^{T} = \mathbf{v}^{T} \oplus \mathbf{h}^{T}$    (9)

where $\mathbf{v}^{S}$ and $\mathbf{v}^{T}$ are the same vector $\mathbf{v}$ when the node is shared between the source and target domains, and $\oplus$ denotes the concatenation of two node representations.
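A sketch of the domain-shared initialization and the concatenation in Equations (8)-(9). The node-id bookkeeping is an assumption: shared nodes draw their input embedding from a single table used by both domains, while unshared nodes get per-domain tables.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 8
shared_nodes = {"u1", "u2"}                   # e.g. overlapped users
shared_emb = {n: rng.normal(size=k) for n in shared_nodes}
src_emb = {n: rng.normal(size=k) for n in {"iA", "iB"}}   # source-only items
tgt_emb = {n: rng.normal(size=k) for n in {"iC"}}         # target-only items

def init_vec(n, domain):
    """Same vector v in both domains for shared nodes, v^S / v^T otherwise."""
    if n in shared_emb:
        return shared_emb[n]
    return src_emb[n] if domain == "src" else tgt_emb[n]

def final_repr(n, domain, h):
    """Equations (8)-(9): z = v (+) h, where (+) is concatenation."""
    return np.concatenate([init_vec(n, domain), h])

z = final_repr("u1", "src", rng.normal(size=k))   # 2k-dimensional output
```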

Finally, for cross-domain recommendation tasks, the framework can be designed in an end-to-end scheme to make predictions via the inner product between a given pair of user and item in each domain:

$\hat{y}^{S}_{ui} = \sigma\big( {\mathbf{z}^{S}_{u}}^{\top} \mathbf{z}^{S}_{i} \big)$    (10)
$\hat{y}^{T}_{ui} = \sigma\big( {\mathbf{z}^{T}_{u}}^{\top} \mathbf{z}^{T}_{i} \big)$    (11)

where $\mathbf{z}^{S}_{u}$ and $\mathbf{z}^{T}_{u}$ are the user node representations in the source and target domains, respectively, and $\mathbf{z}^{S}_{i}$ and $\mathbf{z}^{T}_{i}$ are the corresponding item node representations; they are all obtained from Equations (8) and (9). $\sigma$ is the sigmoid function.

The loss function of the CD-GFM combines the two single-domain losses into a unified multi-task learning objective:

$\mathcal{L}\big( \Theta^{S}, \Theta^{T} \big) = \mathcal{L}_{S}\big( \Theta^{S} \big) + \lambda \, \mathcal{L}_{T}\big( \Theta^{T} \big)$    (12)

where $\Theta^{S}$ and $\Theta^{T}$ are the parameter sets of the source and target domains, respectively, $\mathcal{L}_{S}$ and $\mathcal{L}_{T}$ are the loss functions defined in Equation (7), and the tunable hyper-parameter $\lambda$ controls the relative strength of the two components.
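A one-line sketch of Equation (12) as reconstructed above: the two single-domain loglosses from Equation (7) are combined into one multi-task objective, with `lam` (the hyper-parameter $\lambda$) trading off their strength.

```python
def cd_gfm_loss(loss_src: float, loss_tgt: float, lam: float) -> float:
    # Gradients flow into both domains; because the FM-aggregator MLP weights
    # are shared between them, each term also regularizes the other.
    return loss_src + lam * loss_tgt

print(cd_gfm_loss(loss_src=12.3, loss_tgt=8.7, lam=0.5))  # illustrative values
```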

The mini-batch construction and training scheme are the same as for the single-domain GFM model described in Section 3.2.

3.3.2 Apply the Framework to Other GNN Models

The general cross-domain framework shown in Figure 3 can be applied on top of various existing GNN models; the key is to define the domain-shared and domain-specific representations. For our CD-GFM, the domain-shared representations are learned from the initialized shared node representations, and this carries over directly: in other GNN models, the shared nodes should likewise be initialized with the same randomly drawn vectors. The domain-specific representations are learned from the graph topology of each domain by the respective GNN model. Besides, whereas the CD-GFM learns the node representations cooperatively by sharing the parameters of the MLP in Equation (4), for other GNN models the parameters within each GNN model can be shared to further integrate the knowledge of the two domains. The loss function and training process are the same as for the CD-GFM. Based on the above strategies, the proposed general cross-domain framework can be applied to GCN [14], GAT [28], GraphSAGE [7], and so on, as sketched below.
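A sketch of instantiating the framework with an arbitrary GNN: the wrapper needs only (a) shared input embeddings and (b) one set of GNN parameters used in both domains. `gnn_fn` stands in for any aggregator (GFM, GCN, GAT, or GraphSAGE); its signature and the mean-aggregator stand-in are assumptions of this sketch.

```python
import numpy as np

def cross_domain_forward(H0_src, H0_tgt, adj_src, adj_tgt, gnn_fn, params):
    h_src = gnn_fn(H0_src, adj_src, params)   # domain-specific, source topology
    h_tgt = gnn_fn(H0_tgt, adj_tgt, params)   # same shared parameters
    z_src = np.concatenate([H0_src, h_src], axis=1)   # Equation (8)
    z_tgt = np.concatenate([H0_tgt, h_tgt], axis=1)   # Equation (9)
    return z_src, z_tgt

def mean_gnn(H, adj, W):
    """A trivial GraphSAGE-mean-style stand-in aggregator."""
    return np.stack([H[adj[v]].mean(axis=0) @ W for v in range(H.shape[0])])
```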

4 Experiments

Dataset  Shared (#)     | Source Domain              | Target Domain
                        | Unshared (#)    #Feedback  | Unshared (#)    #Feedback
TCIQI    Item (5,568)   | User (35,398)   314,621    | User (19,999)   78,429
MLNF     Item (5,565)   | User (30,279)   11,555,621 | User (11,498)   199,765
MOMU     User (27,898)  | Item (15,465)   7,366,992  | Item (14,521)   3,784,331
MUBO     User (27,898)  | Item (14,521)   3,784,331  | Item (15,774)   1,936,754
TABLE I: Statistics of the datasets. “#” denotes the number of the corresponding entries.

In this section, we perform experiments to evaluate the proposed model and framework against various baselines on real-world datasets. We first introduce the datasets, evaluation protocol, implementation details, and baseline methods, and then present our experimental results and analysis.

4.1 Datasets

We utilize four pairs of frequently used real-world datasets, comprising two pairs of user-shared datasets and two pairs of item-shared datasets. For all datasets, we only use the user IDs, item IDs, and their implicit feedback information. For simplicity, we transform the rating data into binary values (1/0, indicating whether a user has interacted with an item or not) to fit the implicit feedback setting, following [4] and as sketched below. The statistics of the four dataset pairs are listed in Table I.
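A sketch of this rating-to-implicit transformation: every observed (user, item, rating) triple becomes a positive interaction regardless of the rating value; the tuple format is an assumption.

```python
def binarize(ratings):
    """ratings: iterable of (user_id, item_id, rating) -> set of positives."""
    return {(u, i) for u, i, _ in ratings}

pos = binarize([(1, 10, 4.0), (1, 12, 2.5), (2, 10, 5.0)])
assert (1, 12) in pos   # even a low rating counts as an interaction
```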

  • TCIQI [33] come from two mainstream video websites in China, Tencent (TC, https://v.qq.com) and iQIYI (IQI, https://www.iqiyi.com), which have many overlapping items (movies). We take TC and IQI as the source and target domains, respectively. We obtained the processed dataset pair directly from [33].

  • MLNF come from two popular movie recommendation platforms, MovieLens (https://grouplens.org/datasets/movielens) and Netflix (https://www.kaggle.com/laowingkin/netflix-movie-recommendation/data), which have many overlapping items (movies). We take MovieLens (ML) as the source domain and Netflix (NF) as the target domain. We identify the same movies by their names (case insensitive) and years to avoid wrong identifications as much as possible, a data processing method similar to [4].

  • MOMU come from the famous Chinese social network platform Douban (https://www.douban.com). Overlapping users have feedback on both Movie (MO) and Music (MU). We take MO as the source domain and MU as the target domain.

  • MUBO also come from Douban. Overlapping users have feedback on both Music (MU) and Book (BO). We take MU as the source domain and BO as the target domain.

Dataset  Model               HR(NDCG)@1      HR@10           HR@50           NDCG@10         NDCG@50         Average
IQI      NCF                 0.1545±0.0029   0.5004±0.0039   0.9153±0.0015   0.2986±0.0020   0.4185±0.0088   0.4575
         GCN                 0.0877±0.0040   0.4747±0.0233   0.6620±0.0323   0.2937±0.0116   0.3361±0.0137   0.3708
         GAT                 0.1497±0.0545   0.5878±0.0765   0.9589±0.0100   0.3359±0.0797   0.4368±0.0632   0.4938
         GraphSAGE-mean      0.0912±0.0243   0.5671±0.0388   0.9618±0.0013   0.3145±0.0298   0.3943±0.0234   0.4658
         GraphSAGE-pooling   0.1122±0.0217   0.5796±0.0522   0.9508±0.0041   0.3083±0.0346   0.3956±0.0231   0.4693
         GFM                 0.1591±0.0278   0.5821±0.0486   0.9671±0.0060   0.3376±0.0315   0.4391±0.0228   0.4970
NF       NCF                 0.2102±0.0038   0.5840±0.0040   0.8706±0.0025   0.3804±0.0036   0.4446±0.0034   0.4980
         GCN                 0.1048±0.0141   0.1688±0.0141   0.4981±0.0212   0.1328±0.0144   0.2009±0.0159   0.2211
         GAT                 0.1918±0.0045   0.5564±0.0027   0.9028±0.0030   0.3554±0.0021   0.4318±0.0026   0.4876
         GraphSAGE-mean      0.1920±0.0053   0.5525±0.0008   0.8874±0.0025   0.3542±0.0025   0.4280±0.0030   0.4828
         GraphSAGE-pooling   0.2059±0.0027   0.6054±0.0034   0.9217±0.0014   0.3906±0.0027   0.4696±0.0023   0.5186
         GFM                 0.2140±0.0042   0.6077±0.0131   0.9184±0.0054   0.3918±0.0072   0.4613±0.0055   0.5186
MU       NCF                 0.2046±0.0043   0.6078±0.0026   0.9590±0.0007   0.3835±0.0036   0.5093±0.0031   0.5328
         GCN                 0.1594±0.0002   0.4984±0.0019   0.7589±0.0034   0.2946±0.0006   0.3981±0.0008   0.4219
         GAT                 0.2335±0.0159   0.6833±0.0072   0.9545±0.0005   0.4463±0.0128   0.5002±0.0112   0.5636
         GraphSAGE-mean      0.1927±0.0121   0.5923±0.0196   0.8901±0.0220   0.3742±0.0161   0.4406±0.0167   0.4980
         GraphSAGE-pooling   0.2215±0.0193   0.6210±0.0190   0.9484±0.0026   0.4145±0.0208   0.4965±0.0171   0.5404
         GFM                 0.2399±0.0026   0.6887±0.0009   0.9507±0.0028   0.4470±0.0011   0.5055±0.0028   0.5664
BO       NCF                 0.2567±0.0081   0.6733±0.0070   0.9422±0.0024   0.4558±0.0081   0.5164±0.0070   0.5689
         GCN                 0.1899±0.0004   0.5007±0.0017   0.6991±0.0010   0.3558±0.0002   0.3900±0.0002   0.4271
         GAT                 0.2805±0.0258   0.7034±0.0365   0.9369±0.0202   0.4776±0.0321   0.5303±0.0286   0.5857
         GraphSAGE-mean      0.2137±0.0009   0.6036±0.0007   0.8741±0.0022   0.3920±0.0007   0.4525±0.0010   0.5072
         GraphSAGE-pooling   0.2716±0.0148   0.6987±0.0143   0.9351±0.0051   0.4653±0.0155   0.5166±0.0136   0.5775
         GFM                 0.2867±0.0050   0.7055±0.0063   0.9431±0.0042   0.4757±0.0061   0.5392±0.0058   0.5900
TABLE II: The experimental results evaluated by HR@K and NDCG@K on the single-domain recommendation task, with 95% confidence intervals.

4.2 Evaluation Protocol

Following existing works [9, 10], we adopt Leave-One-Out (LOO) evaluation: we randomly sample one interaction per user for the validation set and one for the test set. We also follow the common strategy [10, 4] of randomly sampling 99 unobserved (negative) items for each user and evaluating how well the model ranks the test item against these negatives. We then adopt two standard metrics, HR@K and NDCG@K, which are widely used in recommendation [4, 10, 9, 30, 2], to evaluate the ranking performance of each method. HR@K is computed as

$\mathrm{HR@K} = \frac{1}{|\mathcal{U}|} \sum_{u \in \mathcal{U}} \mathbb{I}(p_u \le K)$    (13)

where $p_u$ is the hit position of user $u$'s test item and $\mathbb{I}(\cdot)$ is the indicator function. NDCG@K is computed as

$\mathrm{NDCG@K} = \frac{1}{|\mathcal{U}|} \sum_{u \in \mathcal{U}} \frac{\mathbb{I}(p_u \le K)}{\log_2(p_u + 1)}$    (14)

We report HR@K and NDCG@K with K = 1, 10, and 50; larger values indicate better performance for all metrics. For all experiments, we report the metrics with 95% confidence intervals over five runs.
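A sketch of the leave-one-out HR@K and NDCG@K in Equations (13)-(14): each user has one held-out test item ranked against 99 sampled negatives, and $p_u$ is its 1-based rank. The example positions are illustrative.

```python
import numpy as np

def hr_ndcg_at_k(hit_positions, K):
    p = np.asarray(hit_positions, dtype=float)                # one rank per user
    hit = p <= K
    hr = hit.mean()                                           # Equation (13)
    ndcg = np.where(hit, 1.0 / np.log2(p + 1.0), 0.0).mean()  # Equation (14)
    return hr, ndcg

print(hr_ndcg_at_k([1, 3, 57, 9], K=10))   # -> (0.75, ~0.45)
```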

4.3 Implementation Details

If a user has feedback on an item, there is an edge between the user node and the item node; in this way we construct the feedback graph used in our experiments, as sketched below.
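A sketch of this feedback-graph construction: one undirected edge per observed interaction, with item ids offset into a joint node-id space (the adjacency-list layout is an assumption).

```python
from collections import defaultdict

def build_feedback_graph(pos_pairs, num_users):
    adj = defaultdict(list)
    for u, i in pos_pairs:
        item_node = num_users + i          # place item ids after user ids
        adj[u].append(item_node)
        adj[item_node].append(u)
    return adj

adj = build_feedback_graph({(0, 0), (0, 1), (1, 1)}, num_users=2)
print(dict(adj))   # user and item nodes linked by observed feedback
```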

For the single-domain recommendation task, we perform experiments on the four target-domain datasets (i.e., IQI, NF, MU, BO). For all datasets we use two GFM layers, a mini-batch size of 256, a learning rate of 0.001, and dropout with probability 0.4; the embedding dimension $k$, the neighbor sampling threshold $\tau$, and the negative sampling ratio $\rho$ are set to the values chosen by the grid search described below.

For the cross-domain recommendation task, we perform experiments on the four pairs of cross-domain datasets. For all datasets we use one GFM layer, a mini-batch size of 256, a learning rate of 0.001, and dropout with probability 0.4; the embedding dimension $k$, the neighbor sampling threshold $\tau$, the negative sampling ratio $\rho$, and the tunable hyper-parameter $\lambda$ in Equation (12) are chosen likewise.

All these values and the hyper-parameters of all baselines are chosen via a grid search on the IQI validation set; we do not perform any dataset-specific tuning except early stopping on the validation sets. All models are implemented in TensorFlow (https://www.tensorflow.org) and trained on a GTX 1080 Ti GPU. Training proceeds by stochastic gradient descent over shuffled mini-batches with the Adam [13] update rule.

4.4 Baseline Methods

We construct three groups of experiments to demonstrate the effectiveness of the proposed model and framework.

4.4.1 Single Domain Recommendation

We compare the proposed GFM model with the following baseline models.

  • NCF [9]: Neural Collaborative Filtering (NCF) is the state-of-the-art solution for recommendation tasks with implicit feedback. We use one of the variants of NCF, which is also called Generalized Matrix Factorization (GMF).

  • GCN [14]: The vanilla GCN learns latent node representations based on the first-order approximation of spectral graph convolutions.

  • GAT [28]: It applies the attention mechanism to learn different weights for aggregating node features from neighbors.

  • GraphSAGE-mean [7]: It learns to aggregate node messages from a node’s local neighborhood by the mean aggregator.

  • GraphSAGE-pooling [7]: It learns to aggregate node messages from a node’s local neighborhood by the pooling aggregator.

For GCN, GAT, GraphSAGE-mean, and GraphSAGE-pooling, we apply the inner product to the user and item node representations to produce the output.

4.4.2 Cross-Domain Recommendation

We compare the proposed CD-GFM model with the following baseline models.

  • CST [23]: Coordinate System Transfer (CST) assumes that both users and items overlap and adds two regularization terms to its objective function. Here, we adapt CST to our datasets by retaining only the single-side (i.e., user-side or item-side) regularization term.

  • CD-NCF [9]: Neural Collaborative Filtering (NCF) is the state-of-the-art solution for single domain recommendation tasks with implicit feedback. Here, we adapt it to our cross-domain recommendation task via sharing the overlapped user or item embeddings.

  • EMCDR [21]: An embedding-and-mapping framework for cross-domain recommendation. The framework consists of a latent factor model, latent space mapping, and cross-domain recommendation stages, and it is not an end-to-end method.

  • EATNN [1]: This is the state-of-the-art solution for cross-domain recommendation tasks. By introducing attention mechanisms, the model automatically assigns a personalized transfer scheme for each user.

4.4.3 General Cross-Domain Recommendation

We apply the proposed cross-domain framework to other baseline GNN models.

  • CD-GCN [14]: It applies the proposed general framework to the GCN as described in Section 3.3.2.

  • CD-GAT [28]: It applies the proposed general framework to the GAT.

  • CD-GraphSAGE-mean [7]: It applies the proposed general framework to the GraphSAGE-mean.

  • CD-GraphSAGE-pooling [7]: It applies the proposed general framework to the GraphSAGE-pooling.

4.5 Performance Comparison

Dataset  Model     HR(NDCG)@1      HR@10           HR@50           NDCG@10         NDCG@50         Average
TCIQI    CST       0.1948±0.0039   0.6678±0.0136   0.9455±0.0028   0.4178±0.0099   0.4858±0.0030   0.5423
         CD-NCF    0.1701±0.0314   0.5408±0.0445   0.8702±0.0402   0.3392±0.0411   0.4131±0.0396   0.4667
         EMCDR     0.2058±0.0239   0.3962±0.0628   0.7438±0.0436   0.2897±0.0394   0.3640±0.0358   0.3999
         EATNN     0.1959±0.0102   0.6473±0.0089   0.9314±0.0026   0.4103±0.0100   0.4906±0.0087   0.5351
         CD-GFM    0.2105±0.0089   0.6536±0.0159   0.9758±0.0088   0.4222±0.0108   0.4963±0.0080   0.5517
MLNF     CST       0.1878±0.0058   0.5413±0.0024   0.8551±0.0007   0.3486±0.0015   0.4178±0.0023   0.4701
         CD-NCF    0.1997±0.0260   0.5540±0.0457   0.8539±0.0246   0.3600±0.0353   0.4266±0.0310   0.4788
         EMCDR     0.0968±0.0260   0.3406±0.0240   0.6522±0.0730   0.2027±0.0170   0.2708±0.0070   0.3126
         EATNN     0.2103±0.0018   0.5892±0.0038   0.8745±0.0016   0.3835±0.0015   0.4472±0.0013   0.5009
         CD-GFM    0.2243±0.0047   0.6247±0.0069   0.9228±0.0033   0.4062±0.0055   0.4732±0.0043   0.5302
MOMU     CST       0.2378±0.0085   0.5934±0.0024   0.9051±0.0073   0.3986±0.0115   0.4775±0.0035   0.5225
         CD-NCF    0.2599±0.0200   0.7232±0.0430   0.9480±0.0261   0.4747±0.0315   0.5281±0.0281   0.5868
         EMCDR     0.2290±0.0290   0.5610±0.0703   0.8430±0.0560   0.3834±0.0320   0.4234±0.0410   0.4880
         EATNN     0.2680±0.0021   0.7253±0.0035   0.9457±0.0026   0.4881±0.0013   0.5282±0.0014   0.5911
         CD-GFM    0.2728±0.0054   0.7314±0.0072   0.9671±0.0020   0.4851±0.0060   0.5389±0.0049   0.5991
MUBO     CST       0.2524±0.0089   0.6973±0.0102   0.9355±0.0098   0.4575±0.0105   0.5143±0.0068   0.5714
         CD-NCF    0.2770±0.0158   0.7184±0.0332   0.9472±0.0261   0.4841±0.0215   0.5334±0.0836   0.5920
         EMCDR     0.2004±0.2972   0.4864±0.5881   0.7612±0.4115   0.3324±0.4423   0.3920±0.4082   0.4345
         EATNN     0.2731±0.0015   0.7064±0.0036   0.9277±0.0026   0.4634±0.0013   0.5070±0.0017   0.5755
         CD-GFM    0.2978±0.0481   0.7267±0.0688   0.9424±0.0295   0.4872±0.0609   0.5502±0.0523   0.6009
TABLE III: The experimental results evaluated by HR@K and NDCG@K on the cross-domain recommendation task, with 95% confidence intervals.
Fig. 4: The HR@K results of the general cross-domain framework on 4 (datasets) × 10 (models) = 40 tasks.

4.5.1 Single Domain Recommendation Task

We demonstrate the effectiveness of our GFM on the four target-domain datasets. The experimental results evaluated by HR@K and NDCG@K on IQI, NF, MU, and BO are presented in Table II. From these results, we make the following observations.

  • Among the GNN baselines, GCN achieves acceptable performance on multiple datasets. GraphSAGE-mean improves on GCN by introducing the mean aggregator over each node's local neighborhood. GraphSAGE-pooling achieves a further improvement over GraphSAGE-mean by replacing the mean aggregator with the more complex pooling aggregator, which applies an element-wise max-pooling operation to neighbor messages transformed by a fully connected neural network. GAT obtains a further performance improvement by assigning different learnable weights to neighbor messages.

  • NCF also obtains competitive recommendation performance, which further validates why simple collaborative filtering methods are so widely used in recommender systems. On most tasks, our GFM outperforms NCF, which demonstrates that graph-structured data are useful for recommender systems.

  • Our GFM obtains the best performance on almost all datasets, outperforming the GNN baselines on most metrics. Besides, although the improvement of the GFM over GAT is marginal on a few metrics and datasets, the Average values of the GFM are better on all four datasets, indicating that the GFM generalizes better than GAT.

The essence of recommender systems is finding similarity, and local neighbor nodes often carry such similarity. Our GFM aggregates local neighbor messages via high-order feature interactions; therefore, it achieves better performance and is better suited to recommendation tasks. Overall, these improvements indicate that our GFM can effectively integrate neighbor messages into more effective node representations and is better suited to graph-structured data.

4.5.2 Cross-Domain Recommendation Task

We also demonstrate the effectiveness of our CD-GFM on the four pairs of cross-domain datasets. The experimental results evaluated by HR@K and NDCG@K are presented in Table III. From these results, we have the following findings.

  • The collaborative-filtering-based CD-NCF still obtains competitive recommendation performance by sharing the embeddings of overlapped users or items, and it outperforms CST on all datasets except TCIQI. We conjecture that collaborative filtering methods need a lot of data to perform well, while the TCIQI pair has relatively little feedback data. This also demonstrates that collaborative filtering is indeed a simple and efficient method for recommender systems.

  • EMCDR is not an end-to-end method, and its poor performance may result from the accumulation of errors at each step.

  • EATNN is the state-of-the-art cross-domain recommendation baseline, and it achieves nearly the best results among these baselines across multiple datasets.

  • By exploiting the graph topology, our CD-GFM improves the recommendation performance over all of these methods, demonstrating that the proposed cross-domain framework combined with the proposed GFM is better suited to the graph-structured data in cross-domain recommendation.

4.5.3 General Cross-Domain Recommendation Task

Our cross-domain framework is a general framework that can be applied on top of various existing GNN models. Here we apply it to GCN, GAT, GraphSAGE-mean, and GraphSAGE-pooling. To verify that the framework is applicable to various GNN models, we conduct experiments on 40 tasks (4 dataset pairs × 10 models). The results are shown in Figure 4: the red lines are the single-domain baselines trained only on the target training set (also shown in Table II), and the blue lines are the cross-domain models obtained by applying the general cross-domain framework. From the results, we have the following findings:

  • On most tasks, our cross-domain framework is effective to improve the performance of the single domain models which also demonstrates the cross-domain framework can be applied upon various existing GNN models.

  • The improvement on GCN is larger than on the other four GNN models. The main reason might be that single-domain GCN is significantly weaker than the other, improved GNN models, as shown in Table II, so the relative improvement the cross-domain framework brings to the other GNN models is smaller than for GCN.

  • The performance of GraphSAGE-mean and GraphSAGE-pooling is unsatisfying on several datasets. The reason might be that the mean and pooling aggregators are too simple, and their fewer shared parameters make it difficult to train them coordinately across the two domains.

Overall, the performance improvement from the cross-domain framework is significant, and it improves the performance of the base GNN models on different datasets, which shows that the framework is compatible with many GNN models.

4.6 Ablation Study

Model         TCIQI                           MOMU
              HR@1      HR@10     HR@50       HR@1      HR@10     HR@50
CD-GFM-base   0.1681    0.5914    0.9362      0.2445    0.6989    0.9054
CD-GFM        0.2105*   0.6536*   0.9758*     0.2728*   0.7314*   0.9671*
              MLNF                            MUBO
CD-GFM-base   0.2178    0.6196    0.9182      0.2756    0.6963    0.9395
CD-GFM        0.2243*   0.6247*   0.9228      0.2978*   0.7267*   0.9424
TABLE IV: Results of the ablation study on the cross-domain recommendation task based on CD-GFM. “*” indicates that the improvement is statistically significant with p-value < 0.05 on independent samples t-tests.

Moreover, to understand the contribution of the shared node initialization in CD-GFM, we conduct ablation experiments comparing CD-GFM-base and CD-GFM on the four dataset pairs. CD-GFM-base uses only the domain-specific node representations $\mathbf{h}^S$ and $\mathbf{h}^T$ output directly by the GFM, without concatenating the initialized input in Equations (8) and (9), i.e., $\mathbf{z}^S = \mathbf{h}^S$ and $\mathbf{z}^T = \mathbf{h}^T$. The results are presented in Table IV. We conduct independent samples t-tests, and p-values < 0.05 indicate that the improvement of CD-GFM over CD-GFM-base is statistically significant. The improvement demonstrates that the CD-GFM model can efficiently exploit the domain-shared and domain-specific node representations simultaneously and obtains the best performance on all datasets, indicating that both representations matter for cross-domain recommendation performance.
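A sketch of the significance test behind the "*" marks in Table IV: an independent two-sample t-test over per-run metric values (five runs each, per Section 4.2). The run values below are illustrative, not the paper's.

```python
from scipy import stats

base_runs = [0.1675, 0.1702, 0.1669, 0.1688, 0.1671]   # e.g. CD-GFM-base HR@1
full_runs = [0.2098, 0.2121, 0.2087, 0.2110, 0.2109]   # e.g. CD-GFM HR@1
t, p = stats.ttest_ind(full_runs, base_runs)
print(f"t = {t:.2f}, p = {p:.4f}")   # mark '*' when p < 0.05
```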

5 Conclusion

In this paper, we first proposed a novel graph neural network model called the Graph Factorization Machine (GFM), which utilizes the popular Factorization Machine (FM) to aggregate multi-order neighbor messages, overcoming the overly simplistic neighbor aggregation of existing GNN models. Then, we proposed a general cross-domain framework, which can be applied not only to the proposed GFM to form the cross-domain GFM (CD-GFM), but also to other GNN models. Extensive experimental results on real-world datasets demonstrate the superior performance of the proposed GFM model and the general cross-domain framework compared with various state-of-the-art baseline methods.

Acknowledgments

This research is supported by the National Key Research and Development Program of China under Grant No. 2018YFB1004300, the National Natural Science Foundation of China under Grants No. U1836206, U1811461, and 61773361, and the Project of Youth Innovation Promotion Association CAS under Grant No. 2017146.

References

  • [1] C. Chen, M. Zhang, C. Wang, W. Ma, M. Li, Y. Liu, and S. Ma (2019) An efficient adaptive transfer neural network for social-aware recommendation. In Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 225–234. Cited by: §1, §2.3, §3.3.1, 4th item.
  • [2] J. Ding, G. Yu, X. He, Y. Quan, Y. Li, T. Chua, D. Jin, and J. Yu (2018) Improving implicit recommender systems with view data. In International Joint Conference on Artificial Intelligence (IJCAI), pp. 3343–3349. Cited by: §4.2.
  • [3] K. Ding, Y. Li, J. Li, C. Liu, and H. Liu (2019) Graph neural networks with high-order feature interactions. arXiv preprint arXiv:1908.07110. Cited by: §1, §2.2.
  • [4] C. Gao, X. Chen, F. Feng, K. Zhao, X. He, Y. Li, and D. Jin (2019) Cross-domain recommendation without sharing user-relevant data. In International Conference on World Wide Web (WWW), pp. 491–502. Cited by: §2.3, §3.2.2, §3.2.2, 2nd item, §4.1, §4.2.
  • [5] A. Grover and J. Leskovec (2016) Node2vec: scalable feature learning for networks. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 855–864. Cited by: §2.2.
  • [6] H. Guo, R. Tang, Y. Ye, Z. Li, and X. He (2017) DeepFM: a factorization-machine based neural network for ctr prediction. In International Joint Conference on Artificial Intelligence (IJCAI), pp. 1725–1731. Cited by: §1, §1, §2.1.
  • [7] W. Hamilton, Z. Ying, and J. Leskovec (2017) Inductive representation learning on large graphs. In Annual Conference on Neural Information Processing Systems (NeurIPS), pp. 1024–1034. Cited by: §1, §1, §1, §2.2, §3.2.2, §3.3.2, 4th item, 5th item, 3rd item, 4th item.
  • [8] X. He and T. Chua (2017) Neural factorization machines for sparse predictive analytics. In Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 355–364. Cited by: §1, §1, §2.1, §3.2.1.
  • [9] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T. Chua (2017) Neural collaborative filtering. In International Conference on World Wide Web (WWW), pp. 173–182. Cited by: §1, §3.2.2, 1st item, 2nd item, §4.2.
  • [10] G. Hu, Y. Zhang, and Q. Yang (2019) Transfer meets hybrid: a synthetic approach for cross-domain collaborative filtering with text. In International Conference on World Wide Web (WWW), pp. 2822–2829. Cited by: §1, §2.3, §3.2.2, §3.2.2, §3.3.1, §4.2.
  • [11] L. Hu, J. Cao, G. Xu, L. Cao, Z. Gu, and C. Zhu (2013) Personalized recommendation via cross-domain triadic factorization. In International Conference on World Wide Web (WWW), pp. 595–606. Cited by: §2.3.
  • [12] S. Kabbur, X. Ning, and G. Karypis (2013) Fism: factored item similarity models for top-n recommender systems. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 659–667. Cited by: §3.2.2.
  • [13] D. P. Kingma and J. Ba (2015) Adam: a method for stochastic optimization. In International Conference on Learning Representations (ICLR), Cited by: §4.3.
  • [14] T. N. Kipf and M. Welling (2017) Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR), Cited by: §1, §1, §2.2, §3.3.2, 2nd item, 1st item.
  • [15] Y. Koren, R. Bell, and C. Volinsky (2009) Matrix factorization techniques for recommender systems. Computer 42 (8), pp. 30–37. Cited by: §1.
  • [16] Z. Li, Z. Cui, S. Wu, X. Zhang, and L. Wang (2019) Fi-GNN: modeling feature interactions via graph neural networks for CTR prediction. In ACM International Conference on Information and Knowledge Management (CIKM). Cited by: §1.
  • [17] J. Lian, X. Zhou, F. Zhang, Z. Chen, X. Xie, and G. Sun (2018) Xdeepfm: combining explicit and implicit feature interactions for recommender systems. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 1754–1763. Cited by: §1, §2.1.
  • [18] Y. Liu, C. Hsu, and S. Wu (2015) Non-linear cross-domain collaborative filtering via hyper-structure transfer. In International Conference on Machine Learning (ICML), pp. 1190–1198. Cited by: §2.3.
  • [19] B. Loni, Y. Shi, M. Larson, and A. Hanjalic (2014) Cross-domain collaborative filtering with factorization machines. In European Conference on Information Retrieval (ECIR), pp. 656–661. Cited by: §2.3.
  • [20] M. Ma, P. Ren, Y. Lin, Z. Chen, J. Ma, and M. de Rijke (2019) π-Net: a parallel information-sharing network for shared-account cross-domain sequential recommendations. In Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 685–694. Cited by: §2.3.
  • [21] T. Man, H. Shen, X. Jin, and X. Cheng (2017) Cross-domain recommendation: an embedding and mapping approach.. In International Joint Conference on Artificial Intelligence (IJCAI), pp. 2464–2470. Cited by: §1, §3.3.1, 3rd item.
  • [22] A. Mnih and R. R. Salakhutdinov (2008) Probabilistic matrix factorization. In Annual Conference on Neural Information Processing Systems (NeurIPS), pp. 1257–1264. Cited by: §3.2.2.
  • [23] W. Pan, E. W. Xiang, N. N. Liu, and Q. Yang (2010) Transfer learning in collaborative filtering for sparsity reduction. In AAAI Conference on Artificial Intelligence (AAAI), Cited by: §1, §2.3, §3.3.1, 1st item.
  • [24] Y. Qu, H. Cai, K. Ren, W. Zhang, Y. Yu, Y. Wen, and J. Wang (2016) Product-based neural networks for user response prediction. In IEEE International Conference on Data Mining (ICDM), pp. 1149–1154. Cited by: §2.1.
  • [25] S. Rendle (2010) Factorization machines. In IEEE International Conference on Data Mining (ICDM), pp. 995–1000. Cited by: §1, §1, §2.1, §3.2.1.
  • [26] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini (2008) The graph neural network model. IEEE Transactions on Neural Networks 20 (1), pp. 61–80. Cited by: §2.2.
  • [27] J. Tang, S. Wu, J. Sun, and H. Su (2012) Cross-domain collaboration recommendation. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 1285–1293. Cited by: §1, §2.3, §3.3.1.
  • [28] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio (2018) Graph attention networks. In International Conference on Learning Representations (ICLR), Cited by: §1, §1, §2.2, §3.3.2, 3rd item, 2nd item.
  • [29] R. Wang, B. Fu, G. Fu, and M. Wang (2017) Deep & cross network for ad click predictions. In Proceedings of the ADKDD, pp. 12. Cited by: §2.1.
  • [30] X. Wang, X. He, F. Feng, L. Nie, and T. Chua (2018) Tem: tree-enhanced embedding model for explainable recommendation. In International Conference on World Wide Web (WWW), pp. 1543–1552. Cited by: §4.2.
  • [31] X. Wang, X. He, M. Wang, F. Feng, and T. Chua (2019) Neural graph collaborative filtering. In Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 165–174. Cited by: §1, §2.2.
  • [32] K. Xu, W. Hu, J. Leskovec, and S. Jegelka (2019) How powerful are graph neural networks?. In International Conference on Learning Representations (ICLR), Cited by: §3.2.2.
  • [33] H. Yan, C. Yang, D. Yu, Y. Li, D. Jin, and D. Chiu (2019) Multi-site user behavior modeling and its application in video recommendation. IEEE Transactions on Knowledge and Data Engineering. Cited by: §2.3, 1st item.
  • [34] F. Yuan, L. Yao, and B. Benatallah (2019) DARec: deep domain adaptation for cross-domain recommendation via transferring rating patterns. In International Joint Conference on Artificial Intelligence (IJCAI), Cited by: §1, §2.3, §3.3.1.
  • [35] W. Zhang, T. Du, and J. Wang (2016) Deep learning over multi-field categorical data. In European Conference on Information Retrieval (ECIR), pp. 45–57. Cited by: §2.1.