1 Introduction
With the explosive growth of personalized online applications, recommender systems have been widely used in various real-world scenarios, such as recommending movies to watch on MovieLens and products to purchase on Amazon. The data collected by these online applications have been effectively leveraged to study users' online activities and patterns, which provides unparalleled opportunities to build personalized recommender systems.
Collaborative filtering is a widely adopted method [15, 9] in recommender systems, which assumes that users and items that were similar in the past will remain similar in the future. Recently, more and more studies have found that graph-structured data are of great benefit to the performance of recommender systems [31, 16]. However, it is hard for traditional collaborative filtering methods to utilize graph-structured data. In recent years, considerable efforts have been made to learn from graph-structured data via Graph Neural Networks (GNNs) [14, 28, 7], and we have also witnessed the rapid development and popularity of graph neural networks in recommender systems [31, 16]. The main intuition behind this line of methods is that the latent representation of a node can be obtained by iteratively transforming, propagating, and aggregating node features from its local neighborhood [3].
In the recommendation field, data sparsity is a well-known major problem, and it is important to effectively exploit the sparse data [25, 8, 6, 17] to capture multi-factor interaction information. However, the aggregation schemes of existing GNN-based methods are too simplistic, e.g., the mean or pooling aggregators [7], which makes it difficult to capture sufficient interaction information from the neighborhood.
In order to tackle the above problem, we propose a novel Graph Factorization Machine (GFM) that inherits the advantages of the popular Factorization Machine (FM) [25]. The FM has been shown to effectively exploit sparse data and capture feature interactions for recommender systems [25, 8, 6], but it cannot work with graph-structured data. To this end, our GFM utilizes the FM to aggregate second-order neighbor messages, and stacks multiple GFM layers to aggregate higher-order neighbor messages.
Besides, cross-domain recommendation [23, 27, 21, 1, 10, 34], which leverages auxiliary data from other domains, is an effective way to address data sparsity in recommender systems. However, most existing cross-domain recommendation methods might fail when confronting graph-structured data. In order to tackle this problem, we propose a general cross-domain recommendation framework which can be applied not only to the proposed GFM to form the cross-domain GFM (CDGFM), but also to other GNN models, e.g., GCN [14], GAT [28], GraphSAGE [7], and so on. On the one hand, the framework utilizes shared node representations to initialize the graph nodes in the source and target domains for learning domain-shared features; these shared nodes can be users, items, or both, so the framework no longer has to assume a specific sharing pattern. On the other hand, the framework uses the graph-structured data of each domain to learn domain-specific features and coordinates the learning of the graph topology by sharing graph parameters. Finally, the domain-shared and domain-specific features are combined in each domain and used for the prediction tasks.
To summarize, the contributions of this paper are as follows:

We propose a novel GNN model called Graph Factorization Machine (GFM), which captures features from graph-structured data more effectively than existing GNN-based methods.

We propose a general cross-domain recommendation framework, which can be naturally applied not only to the proposed GFM to form the cross-domain GFM (CDGFM), but also to other GNN models.

We perform experiments on four pairs of real-world datasets to demonstrate the effectiveness of the GFM. In addition, we demonstrate that our cross-domain recommendation framework generalizes to various existing GNN models on both user-shared and item-shared cross-domain tasks.
2 Related Work
In this section, we review related work from three aspects: Factorization Machines, Graph Neural Networks, and Cross-Domain Recommendation.
2.1 Factorization Machines
Many prediction tasks need to handle categorical variables (e.g., user IDs, item IDs) to obtain good performance. A popular solution is to convert them into binary features via one-hot encoding, but the resulting encoding is high-dimensional and sparse. The Factorization Machine (FM) [25] is a widely used method that automatically models second-order feature interactions from such high-dimensional and sparse one-hot features via the inner product of raw embedding vectors. To combine the advantage of the FM in modeling second-order interactions with that of neural networks in modeling higher-order feature interactions, several studies have extended the FM with neural networks [35, 24, 8, 6]. For example, Factorization-machine supported Neural Networks (FNN) [35] use a pre-trained factorization machine as the bottom layer of a multi-layer neural network. Product-based Neural Networks (PNN) [24] use an embedding layer to learn a distributed representation of the categorical data, a product layer to capture interactive patterns between inter-field categories, and further fully connected layers to explore high-order feature interactions. The Neural Factorization Machine (NFM) [8] uses a Hadamard-product-based FM followed by an MLP. Guo et al. proposed DeepFM [6], a factorization-machine-based neural network whose "wide" (FM) and "deep" (MLP) parts share the same input and are fed to the output layer in parallel. Some other approaches have attempted to learn higher-order feature interactions explicitly instead of through the implicit "deep" part [29, 17]. However, the above methods might fail when confronting the graph-structured data innate in recommender systems.
2.2 Graph Neural Networks
Graph neural networks (GNNs) are deep learning based methods that operate on graph-structured data. Due to their convincing performance and high interpretability, GNNs have become widely applied graph learning methods and have recently achieved remarkable performance [14, 7, 28, 3, 5, 31]. The concept of the GNN was first proposed in [26], the pioneering work on learning graph node representations with neural networks. By designing different schemes for the graph convolutional layer, many graph convolutional networks (GCNs) have emerged and demonstrated superiority in learning node representations based on graph spectral theory. The simplified GCN [14] utilizes a linear filter and achieves better performance. Most of the prevailing GNN models follow the neighborhood aggregation strategy and propose different schemes for message aggregation. Among them, Graph Attention Networks (GATs) [28] learn different weights for neighbor messages via an attention mechanism when aggregating neighbors. GraphSAGE [7] designs three different message aggregators (mean, LSTM, and pooling) to aggregate neighbor messages. However, the aggregation schemes of these GNN methods are too simplistic and are not well suited to recommender systems, where multi-factor interactions are more effective.
2.3 Cross-Domain Recommendation
Cross-domain recommendation can take advantage of the existing large-scale data in a source domain to alleviate data sparsity and improve recommendation quality in a related target domain. Traditional methods such as the Coordinate System Transfer (CST) [23] aim to discover the principal coordinates of both users and items in the source data matrices and transfer them to the target domain in order to reduce the effect of data sparsity. Some works [27, 19, 18] extend classical Collaborative Filtering (CF) to the cross-domain scenario. Recently, neural networks have been used to implement cross-domain recommendation. For example, Chen et al. introduced attention mechanisms to automatically assign a personalized transfer scheme to each user [1]. There are various sharing scenarios in this line of research, such as shared users [11, 33, 34, 10], shared items [4], shared accounts [20], and so on. However, most of these existing cross-domain recommendation methods were designed for traditional structured data. They might fail when encountering the massive graph data in recommender systems.
3 Methodology
In this section, we first formulate the problem, and then present the details of the proposed GFM model and the cross-domain framework, as shown in Figures 2 and 3, respectively.
3.1 Problem Formulation
Assume $\mathcal{U}$ is a set of users and $\mathcal{I}$ is a set of items. There is a source domain $S$ containing a user set $\mathcal{U}^S$ and an item set $\mathcal{I}^S$, and a target domain $T$ containing a user set $\mathcal{U}^T$ and an item set $\mathcal{I}^T$. In each domain, there is an explicit or implicit feedback matrix between users and items (recording ratings, watches, clicks, purchases, etc.). The task of single-domain recommendation is to improve the recommendation performance on the target domain $T$ by utilizing its own feedback matrix. Now assume some information is shared between the two domains, such as overlapped users or items. The task of cross-domain recommendation is to combine the data and knowledge of the source domain $S$ to help improve the recommendation performance on the target domain $T$.
3.2 Graph Factorization Machines
In this subsection, we present the proposed GFM as illustrated in Figure 2. The GFM first samples neighbors for each node, then aggregates messages from the neighbors via Equation (4), and finally makes predictions with the aggregated messages.
3.2.1 Factorization Machines
In recommender systems, user and item features are usually one-hot encoded categorical features, so the feature dimension is high and the feature vectors are sparse. The FM is an effective method for such high-dimensional and sparse features and can be seen as modeling the latent relation between any two features. In the graph-structured scenario we focus on, the features are node features, e.g., of user or item nodes.
Firstly, we project each nonzero node to a low-dimensional dense vector representation. Embedding is a popular solution in neural networks across various application scenarios: it learns an embedding vector $\mathbf{v}_i \in \mathbb{R}^d$ for each node $i$, where $d$ is the dimension of the embedding vectors.
Different from the traditional FM, which uses the inner product to obtain a scalar, in a neural network we need a vector representation, obtained via the Hadamard product as done in [8]:

$$f_{\mathrm{BI}}(\mathcal{V}_x) = \sum_{i=1}^{n} \sum_{j=i+1}^{n} x_i \mathbf{v}_i \odot x_j \mathbf{v}_j \qquad (1)$$

where $x_i, x_j \in \{0, 1\}$ indicate the presence or absence of nodes $i$ and $j$, as shown in Figure 1. The Hadamard product $\odot$ denotes the element-wise product of two vectors:

$$(\mathbf{v}_i \odot \mathbf{v}_j)_k = v_{ik} v_{jk} \qquad (2)$$
The computational complexity of Equation (1) is $O(n^2 d)$, where $n$ is the number of nodes, since all pairwise interactions need to be computed. Actually, the Hadamard-product-based FM can be reformulated to run in linear time [8], just like the inner-product-based FM [25]:

$$f_{\mathrm{BI}}(\mathcal{V}_x) = \frac{1}{2} \left[ \Big( \sum_{i=1}^{n} x_i \mathbf{v}_i \Big)^2 - \sum_{i=1}^{n} (x_i \mathbf{v}_i)^2 \right] \qquad (3)$$

where $\mathbf{v}^2$ denotes $\mathbf{v} \odot \mathbf{v}$. Besides, in sparse settings, the sums only need to be computed over the nonzero elements, so the actual complexity is $O(N_x d)$, where $N_x$ is the number of nonzero nodes. By adding an MLP, the FM can model higher-order feature interactions [8]:

$$f(\mathcal{V}_x) = \mathrm{MLP}\big(f_{\mathrm{BI}}(\mathcal{V}_x)\big) \qquad (4)$$
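To make the equivalence between Equations (1) and (3) concrete, here is a minimal NumPy sketch (illustrative only, not the authors' code) comparing the quadratic-time and linear-time forms over the embeddings of the nonzero nodes:

```python
import numpy as np

def bi_interaction_naive(V):
    # V: (n, d) embeddings of the n nonzero nodes (x_i = 1 for all rows).
    # Direct O(n^2 d) evaluation of Equation (1).
    n, d = V.shape
    out = np.zeros(d)
    for i in range(n):
        for j in range(i + 1, n):
            out += V[i] * V[j]          # Hadamard product, Equation (2)
    return out

def bi_interaction_linear(V):
    # Linear-time reformulation, Equation (3): 0.5 * ((sum v)^2 - sum v^2).
    s = V.sum(axis=0)
    return 0.5 * (s * s - (V * V).sum(axis=0))

V = np.random.randn(5, 8)
assert np.allclose(bi_interaction_naive(V), bi_interaction_linear(V))
```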
3.2.2 Graph Factorization Machine Layer
While the FM is an effective method for high-dimensional and sparse features in recommender systems, it is not designed for graph-structured data and cannot exploit graph topology information, i.e., multi-order neighbor information. Here, we extend the FM to graph-structured data to form a new GNN model that is more suitable for recommender systems. Similar to GraphSAGE [7], the GFM parameters can be learned using standard stochastic gradient descent and backpropagation.
Sampling Neighborhood. In this work, we first uniformly sample a fixed-size set of neighbors $\mathcal{N}(v)$ for each node $v$. If the number of neighbors of node $v$ is larger than the sampling threshold, sampling without replacement is used; otherwise, sampling with replacement is used. It should be mentioned that designing a different neighbor sampling scheme is not the focus of this paper, as we aim at designing a powerful message aggregator to learn efficient node representations. In fact, any advanced neighbor sampling scheme can be easily integrated into our framework, making the proposed GFM general and flexible.
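A minimal sketch of this sampling rule (function and argument names are our own, hypothetical choices):

```python
import random

def sample_neighbors(neighbors, threshold):
    # Uniformly sample a fixed-size neighbor set for one node.
    # `neighbors` is the node's adjacency list (self-loop already added).
    if len(neighbors) > threshold:
        return random.sample(neighbors, threshold)   # without replacement
    return random.choices(neighbors, k=threshold)    # with replacement
```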
Aggregating Messages. Most of the prevailing GNN models follow the neighborhood aggregation strategy and are analogous to the Weisfeiler-Lehman (WL) graph isomorphism test [32]: the representation of a node is obtained by iteratively aggregating messages from its neighbors. We adopt Equation (4) as the neighborhood aggregator in our graph neural network, so the representation $\mathbf{h}_v^{(k)}$ of node $v$ in the $k$-th layer depends on its own representation and those of its sampled neighbors $\mathcal{N}(v)$ in the $(k-1)$-th layer. Note that a node can be a user or an item.

$$\mathbf{h}_v^{(k)} = \mathrm{MLP}\Big( \sum_{i \in \mathcal{N}(v)} \sum_{j \in \mathcal{N}(v),\, j > i} \mathbf{h}_i^{(k-1)} \odot \mathbf{h}_j^{(k-1)} \Big) \qquad (5)$$
Note that we add a self-loop to every node before sampling the neighborhood, so the node itself may be sampled into $\mathcal{N}(v)$. By stacking multiple GFM layers, the message aggregator can capture higher-order neighbor messages. Specifically, for the user-item feedback graph (i.e., if a user has feedback on an item, there is an edge between the user node and the item node), the GFM uses the same aggregation scheme for all users and items. For example, if user $u$ has feedback on a set of sampled items, the aggregator models the feature interactions among them to obtain the representation of user $u$. Similarly, if a set of sampled users have feedback on item $i$, the aggregator models the feature interactions among them to obtain the representation of item $i$. By stacking GFM layers, the representations of users and items are iteratively aggregated from their multi-order (multi-hop) neighbors.
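Putting the sampling and aggregation steps together, a minimal NumPy sketch of one GFM layer might look as follows; a single ReLU layer stands in for the MLP, and all names are illustrative assumptions rather than the authors' implementation:

```python
import numpy as np

def gfm_layer(H_prev, sampled_neighbors, W, b):
    # One GFM layer in the spirit of Equation (5): bi-interaction over each
    # node's sampled neighbors (linear-time form, Equation (3)), then an MLP.
    # H_prev: (num_nodes, d) representations from layer k-1.
    # sampled_neighbors: one index array per node, self-loop included.
    H_next = np.empty((H_prev.shape[0], W.shape[0]))
    for v, nbrs in enumerate(sampled_neighbors):
        V = H_prev[nbrs]                          # (|N(v)|, d)
        s = V.sum(axis=0)
        bi = 0.5 * (s * s - (V * V).sum(axis=0))  # bi-interaction pooling
        H_next[v] = np.maximum(W @ bi + b, 0.0)   # one ReLU layer (assumed MLP)
    return H_next
```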
Making prediction. To predict the interaction probability $\hat{y}_{ui}$ between a given pair of user $u$ and item $i$, we adopt a simple but widely used inner product predictive function. The inner product acts on the user representation $\mathbf{h}_u$ and the item representation $\mathbf{h}_i$ learned via the GFM according to Equation (5):

$$\hat{y}_{ui} = \sigma\big( \mathbf{h}_u^{\top} \mathbf{h}_i \big) \qquad (6)$$

where $\sigma(\cdot)$ is the sigmoid function.
In our implicit feedback recommendation scenario, we can only observe implicit interactions between users and items. Thus, to train the GFM end to end, we use the negative logarithm of the joint probability as the loss function (i.e., log-loss), which is widely used to optimize implicit feedback recommendation tasks [22, 12, 9, 4, 10]:

$$\mathcal{L}(\Theta) = - \sum_{(u,i) \in \mathcal{Y}^{+} \cup \mathcal{Y}^{-}} y_{ui} \log \hat{y}_{ui} + (1 - y_{ui}) \log (1 - \hat{y}_{ui}) \qquad (7)$$

where $\mathcal{Y}^{+}$ denotes the set of observed implicit feedback, and $\mathcal{Y}^{-}$ denotes the set of negative samples drawn from the unobserved implicit feedback. $\Theta$ is the parameter set, which contains all embedding vectors and the parameters of the MLP in Equation (4).
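A short sketch of Equation (7) over a batch of labeled user-item pairs (a plain NumPy rendering under our notation assumptions):

```python
import numpy as np

def log_loss(y_true, y_pred, eps=1e-7):
    # Equation (7): negative log-likelihood over observed positives (y=1)
    # and sampled negatives (y=0); eps guards against log(0).
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
```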
To construct a mini-batch, we follow existing works [4, 10] and first sample a batch of observed user-item interaction pairs. For each pair, we then adopt negative sampling to randomly select unobserved items for the user at a given sampling ratio. Finally, we obtain (user, item, label) triplets for each user in the batch. Note that we do not perform a predefined negative sampling in advance, since that would only generate a fixed training set of negative samples. Instead, we regenerate the negative samples in each epoch, enabling diverse and augmented training sets of negative examples to be used [10].
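A minimal sketch of this per-epoch triplet construction (the names and dict-based bookkeeping are our own assumptions):

```python
import random

def build_triplets(pos_pairs, all_items, user_pos, ratio):
    # Called once per epoch so the negatives differ across epochs.
    # pos_pairs: observed (user, item) pairs in the batch.
    # user_pos: dict mapping each user to the set of items they interacted with.
    triplets = []
    for u, i in pos_pairs:
        triplets.append((u, i, 1))            # observed feedback, label 1
        sampled = 0
        while sampled < ratio:                # `ratio` negatives per positive
            j = random.choice(all_items)
            if j not in user_pos[u]:
                triplets.append((u, j, 0))    # unobserved item, label 0
                sampled += 1
    return triplets
```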
3.3 General Cross-Domain Framework
In this subsection, we present the proposed cross-domain recommendation framework as illustrated in Figure 3 and show how to apply it to other GNN models.
3.3.1 Cross-Domain Graph Factorization Machines
As mentioned before, cross-domain recommendation [23, 27, 21, 1, 10, 34] is an effective way to leverage auxiliary data from other domains to alleviate data sparsity and improve the recommendation quality of the target domain. In order to apply the proposed GFM to cross-domain recommendation on graph-structured data, we propose a general cross-domain recommendation framework which can be applied not only to the proposed GFM to form the cross-domain GFM (CDGFM), but also to other GNN models.
First, for the graphs $\mathcal{G}^S$ and $\mathcal{G}^T$ in the source domain $S$ and target domain $T$, respectively, we assume that $\mathcal{G}^S$ and $\mathcal{G}^T$ have some shared nodes, which can be users, items, or both in the user-item feedback graphs. For these shared nodes, we initialize the node representations in Equation (1) with the same embedding vectors in both the source and target domains. These representations can be seen as the domain-shared features and are learned automatically during training in collaboration with the GFM in each domain. For unshared nodes, we initialize the node representations in Equation (1) with different embedding vectors $\mathbf{v}^S$ and $\mathbf{v}^T$ in the source and target domains.
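The following illustrative sketch (our own naming, not the authors' code) initializes one domain's embedding table; note that in a real implementation the shared rows must be tied to a single trainable table so that gradients from both domains update the same vectors:

```python
import numpy as np

def init_domain_embeddings(num_nodes, shared_map, shared_table, dim, rng):
    # shared_map: dict {node_id_in_this_domain: row_in_shared_table}.
    # Unshared nodes receive their own random vectors; shared nodes reuse
    # rows of `shared_table`, which is common to the source and target domains.
    emb = rng.normal(scale=0.01, size=(num_nodes, dim))
    for node_id, row in shared_map.items():
        emb[node_id] = shared_table[row]   # copy shown for illustration only
    return emb
```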
Then, the source and target domains use the graph-structured data in their respective domains for multi-layer GFM learning. In effect, the nodes in each domain learn domain-specific representations based on the topology of their own domain. Besides, in order to further integrate the knowledge of the two domains, the GFMs in the two domains learn the node representations cooperatively by sharing the parameters of the MLP in Equation (4). Thus, we obtain the domain-specific node representations $\mathbf{h}^S$ and $\mathbf{h}^T$ of the source and target domains simultaneously.
Next, the domain-shared and domain-specific representations in each domain are combined together as the final node representations:

$$\mathbf{z}^S = \mathbf{v}^S \oplus \mathbf{h}^S \qquad (8)$$

$$\mathbf{z}^T = \mathbf{v}^T \oplus \mathbf{h}^T \qquad (9)$$

where $\mathbf{v}^S$ and $\mathbf{v}^T$ are the same when the node is shared between the source and target domains, and $\oplus$ denotes the concatenation of two node representations.
Finally, for cross-domain recommendation tasks, the framework can be designed in an end-to-end scheme to make predictions via the inner product between a given pair of user and item in each domain:

$$\hat{y}^S_{ui} = \sigma\big( (\mathbf{z}^S_u)^{\top} \mathbf{z}^S_i \big) \qquad (10)$$

$$\hat{y}^T_{ui} = \sigma\big( (\mathbf{z}^T_u)^{\top} \mathbf{z}^T_i \big) \qquad (11)$$

where $\mathbf{z}^S_u$ and $\mathbf{z}^T_u$ are the user node representations in the source and target domains, respectively, and $\mathbf{z}^S_i$ and $\mathbf{z}^T_i$ are the item node representations in the source and target domains, respectively. They are all obtained from Equations (8) and (9). $\sigma(\cdot)$ is the sigmoid function.
The loss function of the CDGFM combines the two components into a unified multi-task learning framework:

$$\mathcal{L}_{CD} = \lambda \, \mathcal{L}^S(\Theta^S) + (1 - \lambda) \, \mathcal{L}^T(\Theta^T) \qquad (12)$$

where $\Theta^S$ and $\Theta^T$ are the parameter sets of the source and target domains, respectively, $\mathcal{L}^S$ and $\mathcal{L}^T$ are the loss functions defined in Equation (7), and the tunable hyperparameter $\lambda$ controls the relative strength of the two components.
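Continuing the earlier log-loss sketch, the joint objective can be combined as follows (the convex-combination reading of Equation (12) is our assumption):

```python
def cdgfm_loss(loss_src, loss_tgt, lam):
    # Equation (12): weighted multi-task combination of the two domain losses.
    return lam * loss_src + (1 - lam) * loss_tgt

total = cdgfm_loss(loss_src=0.42, loss_tgt=0.37, lam=0.5)  # hypothetical values
```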
The mini-batch construction and training scheme are the same as for the single-domain GFM model described in Section 3.2.
3.3.2 Applying the Framework to Other GNN Models
The general cross-domain framework shown in Figure 3 can be applied on top of various existing GNN models. The key is to define the domain-shared and domain-specific representations for each model. For our CDGFM, the domain-shared representations are learned from the initialized shared node representations, and this carries over directly to other GNN models: the shared nodes should be initialized with the same random vectors in both domains. The domain-specific representations are learned from the graph topology of each domain by the respective GNN model. Besides, while our CDGFM learns node representations cooperatively by sharing the parameters of the MLP in Equation (4), for other GNN models the parameters within each GNN model can be shared to further integrate the knowledge of the two domains. The loss function and training process are consistent with the CDGFM. Based on the above strategies, the proposed general cross-domain framework can be applied to GCN [14], GAT [28], GraphSAGE [7], and so on.
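To illustrate how an arbitrary GNN encoder slots into the framework, here is a hedged sketch; the callable interface and names are our assumptions, not the authors' API:

```python
import numpy as np

def cross_domain_forward(gnn, graph, emb):
    # `gnn` is any parameter-shared encoder (GFM, GCN, GAT, GraphSAGE, ...)
    # applied to one domain's graph and initial node embeddings.
    h = gnn(graph, emb)                        # domain-specific representations
    return np.concatenate([emb, h], axis=-1)   # Equations (8)/(9): z = v (+) h

# Usage sketch: the same `shared_gnn` object is applied to both domains.
# z_src = cross_domain_forward(shared_gnn, graph_src, emb_src)
# z_tgt = cross_domain_forward(shared_gnn, graph_tgt, emb_tgt)
```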
4 Experiments
Table I: Statistics of the four pairs of datasets.

| Dataset | Shared (#) | Source: Unshared (#) | Source: #Feedback | Target: Unshared (#) | Target: #Feedback |
|---|---|---|---|---|---|
| TC-IQI | Item (5,568) | User (35,398) | 314,621 | User (19,999) | 78,429 |
| ML-NF | Item (5,565) | User (30,279) | 11,555,621 | User (11,498) | 199,765 |
| MO-MU | User (27,898) | Item (15,465) | 7,366,992 | Item (14,521) | 3,784,331 |
| MU-BO | User (27,898) | Item (14,521) | 3,784,331 | Item (15,774) | 1,936,754 |
In this section, we perform experiments to evaluate the proposed model and framework against various baselines on real-world datasets. We first introduce the datasets, evaluation protocol, implementation details, and baseline methods of our experiments, and then present our experimental results and analysis.
4.1 Datasets
We utilize four pairs of frequently used real-world datasets: two user-shared pairs and two item-shared pairs. For all datasets, we only use the user IDs, item IDs, and their implicit feedback information. For simplicity, we transform the rating data into binary values (1/0, indicating whether a user has interacted with an item or not) to fit the implicit feedback setting, following [4]. The statistics of the four dataset pairs are listed in Table I.

TC-IQI [33] come from two mainstream video websites in China, Tencent Video (TC, https://v.qq.com) and iQIYI (IQI, https://www.iqiyi.com). There are many overlapped items (movies) on the two websites. We take TC and IQI as the source and target domains, respectively. We obtained the processed dataset pair directly from [33].

ML-NF come from two popular movie recommendation platforms, MovieLens (https://grouplens.org/datasets/movielens) and Netflix (https://www.kaggle.com/laowingkin/netflixmovierecommendation/data), which have many overlapped items (movies). We take MovieLens (ML) as the source domain and Netflix (NF) as the target domain. We identify identical movies by their names (case-insensitive) and release years to avoid misidentification as much as possible, a data processing method similar to [4].

MO-MU come from Douban (https://www.douban.com), a famous social network platform in China. Overlapped users have feedback on both Movie (MO) and Music (MU). We take MO as the source domain and MU as the target domain.

MU-BO also come from Douban. Overlapped users have feedback on both Music (MU) and Book (BO). We take MU as the source domain and BO as the target domain.
| Dataset | Model | HR(NDCG)@1 | HR@10 | HR@50 | NDCG@10 | NDCG@50 | Average |
|---|---|---|---|---|---|---|---|
| IQI | NCF | 0.1545±0.0029 | 0.5004±0.0039 | 0.9153±0.0015 | 0.2986±0.0020 | 0.4185±0.0088 | 0.4575 |
| | GCN | 0.0877±0.0040 | 0.4747±0.0233 | 0.6620±0.0323 | 0.2937±0.0116 | 0.3361±0.0137 | 0.3708 |
| | GAT | 0.1497±0.0545 | 0.5878±0.0765 | 0.9589±0.0100 | 0.3359±0.0797 | 0.4368±0.0632 | 0.4938 |
| | GraphSAGE-mean | 0.0912±0.0243 | 0.5671±0.0388 | 0.9618±0.0013 | 0.3145±0.0298 | 0.3943±0.0234 | 0.4658 |
| | GraphSAGE-pooling | 0.1122±0.0217 | 0.5796±0.0522 | 0.9508±0.0041 | 0.3083±0.0346 | 0.3956±0.0231 | 0.4693 |
| | GFM | 0.1591±0.0278 | 0.5821±0.0486 | 0.9671±0.0060 | 0.3376±0.0315 | 0.4391±0.0228 | 0.4970 |
| NF | NCF | 0.2102±0.0038 | 0.5840±0.0040 | 0.8706±0.0025 | 0.3804±0.0036 | 0.4446±0.0034 | 0.4980 |
| | GCN | 0.1048±0.0141 | 0.1688±0.0141 | 0.4981±0.0212 | 0.1328±0.0144 | 0.2009±0.0159 | 0.2211 |
| | GAT | 0.1918±0.0045 | 0.5564±0.0027 | 0.9028±0.0030 | 0.3554±0.0021 | 0.4318±0.0026 | 0.4876 |
| | GraphSAGE-mean | 0.1920±0.0053 | 0.5525±0.0008 | 0.8874±0.0025 | 0.3542±0.0025 | 0.4280±0.0030 | 0.4828 |
| | GraphSAGE-pooling | 0.2059±0.0027 | 0.6054±0.0034 | 0.9217±0.0014 | 0.3906±0.0027 | 0.4696±0.0023 | 0.5186 |
| | GFM | 0.2140±0.0042 | 0.6077±0.0131 | 0.9184±0.0054 | 0.3918±0.0072 | 0.4613±0.0055 | 0.5186 |
| MU | NCF | 0.2046±0.0043 | 0.6078±0.0026 | 0.9590±0.0007 | 0.3835±0.0036 | 0.5093±0.0031 | 0.5328 |
| | GCN | 0.1594±0.0002 | 0.4984±0.0019 | 0.7589±0.0034 | 0.2946±0.0006 | 0.3981±0.0008 | 0.4219 |
| | GAT | 0.2335±0.0159 | 0.6833±0.0072 | 0.9545±0.0005 | 0.4463±0.0128 | 0.5002±0.0112 | 0.5636 |
| | GraphSAGE-mean | 0.1927±0.0121 | 0.5923±0.0196 | 0.8901±0.0220 | 0.3742±0.0161 | 0.4406±0.0167 | 0.4980 |
| | GraphSAGE-pooling | 0.2215±0.0193 | 0.6210±0.0190 | 0.9484±0.0026 | 0.4145±0.0208 | 0.4965±0.0171 | 0.5404 |
| | GFM | 0.2399±0.0026 | 0.6887±0.0009 | 0.9507±0.0028 | 0.4470±0.0011 | 0.5055±0.0028 | 0.5664 |
| BO | NCF | 0.2567±0.0081 | 0.6733±0.0070 | 0.9422±0.0024 | 0.4558±0.0081 | 0.5164±0.0070 | 0.5689 |
| | GCN | 0.1899±0.0004 | 0.5007±0.0017 | 0.6991±0.0010 | 0.3558±0.0002 | 0.3900±0.0002 | 0.4271 |
| | GAT | 0.2805±0.0258 | 0.7034±0.0365 | 0.9369±0.0202 | 0.4776±0.0321 | 0.5303±0.0286 | 0.5857 |
| | GraphSAGE-mean | 0.2137±0.0009 | 0.6036±0.0007 | 0.8741±0.0022 | 0.3920±0.0007 | 0.4525±0.0010 | 0.5072 |
| | GraphSAGE-pooling | 0.2716±0.0148 | 0.6987±0.0143 | 0.9351±0.0051 | 0.4653±0.0155 | 0.5166±0.0136 | 0.5775 |
| | GFM | 0.2867±0.0050 | 0.7055±0.0063 | 0.9431±0.0042 | 0.4757±0.0061 | 0.5392±0.0058 | 0.5900 |

Table II: Experimental results (HR@K and NDCG@K) on the single-domain recommendation task, with 95% confidence intervals.
4.2 Evaluation Protocol
Following existing works [9, 10], we adopt the Leave-One-Out (LOO) evaluation: for each user, we randomly hold out one interaction for the validation set and one for the test set. We also follow the common strategy [10, 4] of randomly sampling 99 unobserved (negative) items for each user and then evaluating how well the model ranks the test item against these negative ones. We adopt two standard metrics, HR@K and NDCG@K, which are widely used in recommendation [4, 10, 9, 30, 2], to evaluate the ranking performance of each method. HR@K is computed as follows:
$$\mathrm{HR@K} = \frac{1}{N} \sum_{u=1}^{N} \mathbb{I}(p_u \le K) \qquad (13)$$

where $p_u$ is the hit position of user $u$'s test item, $N$ is the number of users, and $\mathbb{I}(\cdot)$ is the indicator function. NDCG@K is computed as follows:

$$\mathrm{NDCG@K} = \frac{1}{N} \sum_{u=1}^{N} \frac{\mathbb{I}(p_u \le K)}{\log_2 (p_u + 1)} \qquad (14)$$
We report HR@K and NDCG@K with K = 1, 10, and 50. The larger the value, the better the performance for all evaluation metrics. For all experiments, we report the metrics with 95% confidence intervals over five runs.
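Under the LOO protocol, both metrics reduce to simple functions of the rank of each user's single test item among the 100 candidates; a minimal sketch (our own rendering of Equations (13) and (14)):

```python
import numpy as np

def hr_ndcg_at_k(ranks, k):
    # ranks: 1-based position of each user's held-out test item among the
    # ranked candidates (1 test item + 99 sampled negatives).
    ranks = np.asarray(ranks)
    hits = ranks <= k                                             # Equation (13)
    hr = hits.mean()
    ndcg = np.where(hits, 1.0 / np.log2(ranks + 1), 0.0).mean()   # Equation (14)
    return hr, ndcg

hr10, ndcg10 = hr_ndcg_at_k([1, 3, 12, 50, 7], k=10)  # hypothetical ranks
```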
4.3 Implementation Details
If a user has feedback on an item, there is an edge between the user node and the item node; in this way, we construct the feedback graph used in our experiments.
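A small sketch of this graph construction (the shared id space and adjacency-list layout are our own assumptions):

```python
from collections import defaultdict

def build_feedback_graph(pairs, num_users):
    # Bipartite user-item feedback graph as adjacency lists; item ids are
    # offset by num_users so users and items live in one id space.
    adj = defaultdict(list)
    for u, i in pairs:
        adj[u].append(num_users + i)
        adj[num_users + i].append(u)
    for v in list(adj):
        adj[v].append(v)   # self-loop, added before neighbor sampling
    return adj
```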
For the single-domain recommendation task, we perform experiments on the four target-domain datasets (i.e., IQI, NF, MU, BO). For all datasets, we use the same embedding dimension, neighbor sampling threshold (with two GFM layers), and negative sampling ratio, a mini-batch size of 256, and a learning rate of 0.001. We also use dropout with probability 0.4.
For the cross-domain recommendation task, we perform experiments on the four pairs of cross-domain datasets. For all datasets, we use the same embedding dimension, neighbor sampling threshold (with one GFM layer), negative sampling ratio, and tunable hyperparameter $\lambda$ in Equation (12), a mini-batch size of 256, and a learning rate of 0.001. We also use dropout with probability 0.4.
These values and the hyperparameters of all baselines were chosen via a grid search on the IQI validation set. We do not perform any dataset-specific tuning except early stopping on the validation sets. All models are implemented using TensorFlow (https://www.tensorflow.org) and trained on a GTX 1080 Ti GPU. Training is performed through stochastic gradient descent over shuffled mini-batches with the Adam [13] update rule.
4.4 Baseline Methods
We construct three groups of experiments to demonstrate the effectiveness of the proposed model and framework.
4.4.1 Single Domain Recommendation
We compare the proposed GFM model with the following baseline models.

NCF [9]: Neural Collaborative Filtering (NCF) is a state-of-the-art solution for recommendation tasks with implicit feedback. We use one of its variants, also called Generalized Matrix Factorization (GMF).

GCN [14]: The vanilla GCN learns latent node representations based on a first-order approximation of spectral graph convolutions.

GAT [28]: It applies the attention mechanism to learn different weights for aggregating node features from neighbors.

GraphSAGE-mean [7]: It learns to aggregate node messages from a node's local neighborhood by the mean aggregator.

GraphSAGE-pooling [7]: It learns to aggregate node messages from a node's local neighborhood by the pooling aggregator.
For GCN, GAT, GraphSAGE-mean, and GraphSAGE-pooling, we apply the inner product on the user and item node representations as the output.
4.4.2 Cross-Domain Recommendation
We compare the proposed CDGFM model with the following baseline models.

CST [23]: Coordinate System Transfer (CST) assumes that both users and items overlap and adds two regularization terms to its objective function. Here, we adapt the CST to our datasets by keeping only the single-side (i.e., user-side or item-side) regularization term.

CDNCF [9]: Neural Collaborative Filtering (NCF) is a state-of-the-art solution for single-domain recommendation with implicit feedback. Here, we adapt it to our cross-domain recommendation task by sharing the overlapped user or item embeddings.

EMCDR [21]: This is an embedding-and-mapping framework for cross-domain recommendation. The framework consists of a latent factor model, latent space mapping, and cross-domain recommendation, and it is not an end-to-end method.

EATNN [1]: This is a state-of-the-art solution for cross-domain recommendation tasks. By introducing attention mechanisms, the model automatically assigns a personalized transfer scheme to each user.
4.4.3 General Cross-Domain Recommendation
We apply the proposed cross-domain framework to the other baseline GNN models.
4.5 Performance Comparison
| Dataset | Model | HR(NDCG)@1 | HR@10 | HR@50 | NDCG@10 | NDCG@50 | Average |
|---|---|---|---|---|---|---|---|
| TC-IQI | CST | 0.1948±0.0039 | 0.6678±0.0136 | 0.9455±0.0028 | 0.4178±0.0099 | 0.4858±0.0030 | 0.5423 |
| | CDNCF | 0.1701±0.0314 | 0.5408±0.0445 | 0.8702±0.0402 | 0.3392±0.0411 | 0.4131±0.0396 | 0.4667 |
| | EMCDR | 0.2058±0.0239 | 0.3962±0.0628 | 0.7438±0.0436 | 0.2897±0.0394 | 0.3640±0.0358 | 0.3999 |
| | EATNN | 0.1959±0.0102 | 0.6473±0.0089 | 0.9314±0.0026 | 0.4103±0.0100 | 0.4906±0.0087 | 0.5351 |
| | CDGFM | 0.2105±0.0089 | 0.6536±0.0159 | 0.9758±0.0088 | 0.4222±0.0108 | 0.4963±0.0080 | 0.5517 |
| ML-NF | CST | 0.1878±0.0058 | 0.5413±0.0024 | 0.8551±0.0007 | 0.3486±0.0015 | 0.4178±0.0023 | 0.4701 |
| | CDNCF | 0.1997±0.0260 | 0.5540±0.0457 | 0.8539±0.0246 | 0.3600±0.0353 | 0.4266±0.0310 | 0.4788 |
| | EMCDR | 0.0968±0.0260 | 0.3406±0.0240 | 0.6522±0.0730 | 0.2027±0.0170 | 0.2708±0.0070 | 0.3126 |
| | EATNN | 0.2103±0.0018 | 0.5892±0.0038 | 0.8745±0.0016 | 0.3835±0.0015 | 0.4472±0.0013 | 0.5009 |
| | CDGFM | 0.2243±0.0047 | 0.6247±0.0069 | 0.9228±0.0033 | 0.4062±0.0055 | 0.4732±0.0043 | 0.5302 |
| MO-MU | CST | 0.2378±0.0085 | 0.5934±0.0024 | 0.9051±0.0073 | 0.3986±0.0115 | 0.4775±0.0035 | 0.5225 |
| | CDNCF | 0.2599±0.0200 | 0.7232±0.0430 | 0.9480±0.0261 | 0.4747±0.0315 | 0.5281±0.0281 | 0.5868 |
| | EMCDR | 0.2290±0.0290 | 0.5610±0.0703 | 0.8430±0.0560 | 0.3834±0.0320 | 0.4234±0.0410 | 0.4880 |
| | EATNN | 0.2680±0.0021 | 0.7253±0.0035 | 0.9457±0.0026 | 0.4881±0.0013 | 0.5282±0.0014 | 0.5911 |
| | CDGFM | 0.2728±0.0054 | 0.7314±0.0072 | 0.9671±0.0020 | 0.4851±0.0060 | 0.5389±0.0049 | 0.5991 |
| MU-BO | CST | 0.2524±0.0089 | 0.6973±0.0102 | 0.9355±0.0098 | 0.4575±0.0105 | 0.5143±0.0068 | 0.5714 |
| | CDNCF | 0.2770±0.0158 | 0.7184±0.0332 | 0.9472±0.0261 | 0.4841±0.0215 | 0.5334±0.0836 | 0.5920 |
| | EMCDR | 0.2004±0.2972 | 0.4864±0.5881 | 0.7612±0.4115 | 0.3324±0.4423 | 0.3920±0.4082 | 0.4345 |
| | EATNN | 0.2731±0.0015 | 0.7064±0.0036 | 0.9277±0.0026 | 0.4634±0.0013 | 0.5070±0.0017 | 0.5755 |
| | CDGFM | 0.2978±0.0481 | 0.7267±0.0688 | 0.9424±0.0295 | 0.4872±0.0609 | 0.5502±0.0523 | 0.6009 |

Table III: Experimental results (HR@K and NDCG@K) on the cross-domain recommendation task, with 95% confidence intervals.
4.5.1 Single Domain Recommendation Task
We demonstrate the effectiveness of our GFM on the four target-domain datasets. The experimental results evaluated by HR@K and NDCG@K on IQI, NF, MU, and BO are presented in Table II. From these results, we make the following observations.

Among the GNN baselines, the GCN achieves acceptable performance on multiple datasets. GraphSAGE-mean improves on the GCN by introducing the mean aggregator over each node's local neighborhood. GraphSAGE-pooling achieves a further improvement over GraphSAGE-mean by replacing the mean aggregator with the more complex pooling aggregator, which applies an element-wise max-pooling operation to the neighbor messages transformed by a fully connected neural network. The GAT obtains further gains by assigning different learnable weights to neighbor messages.

NCF also obtains competitive recommendation performance, which further validates why simple collaborative filtering methods are widely used in recommender systems. On most tasks, our GFM outperforms NCF, which demonstrates that graph-structured data are useful for recommender systems.

Our GFM obtains the best performance on most datasets and outperforms the GNN baselines on most metrics. Besides, although the improvement of the GFM over the GAT is marginal on a few metrics and datasets, the Average values of these metrics for the GFM are better on all four datasets, which indicates that the GFM generalizes better than the GAT.
The essence of recommender systems is to find similarity, and local neighbor nodes often carry such similarity. Our GFM aggregates local neighbor messages via high-order feature interactions, so it achieves better performance and is more suitable for recommendation tasks. Overall, these improvements indicate that our GFM can effectively integrate neighbor messages to generate more effective node representations and is better suited to graph-structured data.
4.5.2 Cross-Domain Recommendation Task
We also demonstrate the effectiveness of our CDGFM on the four pairs of cross-domain datasets. The experimental results evaluated by HR@K and NDCG@K are presented in Table III. From these results, we have the following findings.

The collaborative filtering based CDNCF still obtains competitive recommendation performance by sharing the embeddings of overlapped users or items, and it improves over the CST on all datasets except TC-IQI. We conjecture that collaborative filtering methods need a lot of data to perform well, while TC-IQI has less feedback data. This also demonstrates that collaborative filtering is indeed a simple and efficient method in recommender systems.

EMCDR is not an end-to-end method, and its poor performance may result from the accumulation of errors at each step.

EATNN is the state-of-the-art cross-domain recommendation baseline, and it achieves nearly the best results among the baselines across multiple datasets.

By utilizing the graph topology, our CDGFM improves recommendation performance over the various baseline methods. This demonstrates that the proposed cross-domain framework combined with the proposed GFM is better suited to graph-structured data in cross-domain recommendation.
4.5.3 General Cross-Domain Recommendation Task
Our cross-domain framework is a general framework that can be applied on top of various existing GNN models. Here we apply the cross-domain framework to GCN, GAT, GraphSAGE-mean, and GraphSAGE-pooling. In order to show that our cross-domain framework is applicable to various GNN models, we conduct experiments on 40 tasks (4 dataset pairs × 10 models). The results are shown in Figure 4. The red lines are the baselines, which only use the target training set to train the model (also shown in Table II), and the blue lines are the cross-domain models obtained by applying the general cross-domain framework. From the results, we have the following findings:

On most tasks, our cross-domain framework effectively improves the performance of the single-domain models, which also demonstrates that the cross-domain framework can be applied on top of various existing GNN models.

The improvement on the GCN is larger than on the other four GNN models. The main reason might be that the single-domain GCN is significantly weaker than the other, improved GNN models, as shown in Table II, so the improvement the cross-domain framework brings to those models is relatively smaller than for the GCN.

The performance of GraphSAGE-mean and GraphSAGE-pooling is unsatisfying on several datasets. The reason might be that the mean and pooling aggregators are too simple, and their fewer shared parameters make them difficult to train coordinately across the two domains.
Overall, we observe that the performance improvement from the cross-domain framework is significant, and that it improves the base GNN models on different datasets, which shows that the cross-domain framework is compatible with many GNN models.
4.6 Ablation Study
| Model | TC-IQI HR@1 | HR@10 | HR@50 | MO-MU HR@1 | HR@10 | HR@50 |
|---|---|---|---|---|---|---|
| CDGFM-base | 0.1681 | 0.5914 | 0.9362 | 0.2445 | 0.6989 | 0.9054 |
| CDGFM | 0.2105* | 0.6536* | 0.9758* | 0.2728* | 0.7314* | 0.9671* |

| Model | ML-NF HR@1 | HR@10 | HR@50 | MU-BO HR@1 | HR@10 | HR@50 |
|---|---|---|---|---|---|---|
| CDGFM-base | 0.2178 | 0.6196 | 0.9182 | 0.2756 | 0.6963 | 0.9395 |
| CDGFM | 0.2243* | 0.6247* | 0.9228 | 0.2978* | 0.7267* | 0.9424 |

Table IV: Ablation results. * indicates p < 0.05 on independent-samples t-tests.
Moreover, to understand the contribution of the shared node initialization in the CDGFM, we conduct ablation experiments comparing CDGFM-base and CDGFM on the four dataset pairs. CDGFM-base only uses the domain-specific node representations output directly by the GFM and does not concatenate the initialized input in Equations (8) and (9), i.e., $\mathbf{z}^S = \mathbf{h}^S$ and $\mathbf{z}^T = \mathbf{h}^T$. The results are presented in Table IV. We conduct independent-samples t-tests, and a p-value < 0.05 indicates that the improvement of CDGFM over CDGFM-base is statistically significant. The improvement demonstrates that the CDGFM model can efficiently exploit the domain-shared and domain-specific node representations simultaneously and obtains the best performance on all datasets, which indicates that both representations matter for cross-domain recommendation performance.
5 Conclusion
In this paper, we first proposed a novel graph neural network model called the Graph Factorization Machine (GFM), which utilizes the popular Factorization Machine (FM) to aggregate multi-order neighbor messages and thereby overcomes the overly simplistic neighbor aggregation of existing GNN models. Then, we proposed a general cross-domain framework, which can be applied not only to the proposed GFM to form the cross-domain GFM (CDGFM), but also to other GNN models. Extensive experimental results on real-world datasets demonstrate the superior performance of the proposed GFM model and the general cross-domain framework compared with various state-of-the-art baseline methods.
Acknowledgments
This research work is supported by the National Key Research and Development Program of China under Grant No. 2018YFB1004300, the National Natural Science Foundation of China under Grant Nos. U1836206, U1811461, and 61773361, and the Project of Youth Innovation Promotion Association CAS under Grant No. 2017146.
References
[1] (2019) An efficient adaptive transfer neural network for social-aware recommendation. In Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 225–234.
[2] (2018) Improving implicit recommender systems with view data. In International Joint Conference on Artificial Intelligence (IJCAI), pp. 3343–3349.
[3] (2019) Graph neural networks with high-order feature interactions. arXiv preprint arXiv:1908.07110.
[4] (2019) Cross-domain recommendation without sharing user-relevant data. In International Conference on World Wide Web (WWW), pp. 491–502.
[5] (2016) Node2vec: scalable feature learning for networks. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 855–864.
[6] (2017) DeepFM: a factorization-machine based neural network for CTR prediction. In International Joint Conference on Artificial Intelligence (IJCAI), pp. 1725–1731.
[7] (2017) Inductive representation learning on large graphs. In Annual Conference on Neural Information Processing Systems (NeurIPS), pp. 1024–1034.
[8] (2017) Neural factorization machines for sparse predictive analytics. In Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 355–364.
[9] (2017) Neural collaborative filtering. In International Conference on World Wide Web (WWW), pp. 173–182.
[10] (2019) Transfer meets hybrid: a synthetic approach for cross-domain collaborative filtering with text. In International Conference on World Wide Web (WWW), pp. 2822–2829.
[11] (2013) Personalized recommendation via cross-domain triadic factorization. In International Conference on World Wide Web (WWW), pp. 595–606.
[12] (2013) FISM: factored item similarity models for top-N recommender systems. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 659–667.
[13] (2015) Adam: a method for stochastic optimization. In International Conference on Learning Representations (ICLR).
[14] (2017) Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR).
[15] (2009) Matrix factorization techniques for recommender systems. Computer 42(8), pp. 30–37.
[16] (2019) FiGNN: modeling feature interactions via graph neural networks for CTR prediction. In ACM International Conference on Information and Knowledge Management (CIKM).
[17] (2018) xDeepFM: combining explicit and implicit feature interactions for recommender systems. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 1754–1763.
[18] (2015) Non-linear cross-domain collaborative filtering via hyper-structure transfer. In International Conference on Machine Learning (ICML), pp. 1190–1198.
[19] (2014) Cross-domain collaborative filtering with factorization machines. In European Conference on Information Retrieval (ECIR), pp. 656–661.
[20] (2019) π-Net: a parallel information-sharing network for shared-account cross-domain sequential recommendations. In Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 685–694.
[21] (2017) Cross-domain recommendation: an embedding and mapping approach. In International Joint Conference on Artificial Intelligence (IJCAI), pp. 2464–2470.
[22] (2008) Probabilistic matrix factorization. In Annual Conference on Neural Information Processing Systems (NeurIPS), pp. 1257–1264.
[23] (2010) Transfer learning in collaborative filtering for sparsity reduction. In AAAI Conference on Artificial Intelligence (AAAI).
[24] (2016) Product-based neural networks for user response prediction. In IEEE International Conference on Data Mining (ICDM), pp. 1149–1154.
[25] (2010) Factorization machines. In IEEE International Conference on Data Mining (ICDM), pp. 995–1000.
[26] (2008) The graph neural network model. IEEE Transactions on Neural Networks 20(1), pp. 61–80.
[27] (2012) Cross-domain collaboration recommendation. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 1285–1293.
[28] (2018) Graph attention networks. In International Conference on Learning Representations (ICLR).
[29] (2017) Deep & cross network for ad click predictions. In Proceedings of the ADKDD, p. 12.
[30] (2018) TEM: tree-enhanced embedding model for explainable recommendation. In International Conference on World Wide Web (WWW), pp. 1543–1552.
[31] (2019) Neural graph collaborative filtering. In Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 165–174.
[32] (2019) How powerful are graph neural networks? In International Conference on Learning Representations (ICLR).
[33] (2019) Multi-site user behavior modeling and its application in video recommendation. IEEE Transactions on Knowledge and Data Engineering.
[34] (2019) DARec: deep domain adaptation for cross-domain recommendation via transferring rating patterns. In International Joint Conference on Artificial Intelligence (IJCAI).
[35] (2016) Deep learning over multi-field categorical data. In European Conference on Information Retrieval (ECIR), pp. 45–57.