A Unified Framework for Cross-Domain and Cross-System Recommendations

08/18/2021 ∙ by Feng Zhu, et al. ∙ Macquarie University

Cross-Domain Recommendation (CDR) and Cross-System Recommendation (CSR) have been proposed to improve the recommendation accuracy in a target dataset (domain/system) with the help of a source one with relatively richer information. However, most existing CDR and CSR approaches are single-target: there is a single target dataset, so knowledge flows only from the source dataset to the target dataset, and the source dataset itself never benefits. In this paper, we focus on three new scenarios, i.e., Dual-Target CDR (DTCDR), Multi-Target CDR (MTCDR), and CDR+CSR, and aim to improve the recommendation accuracy in all datasets simultaneously for all scenarios. To do this, we propose a unified framework, called GA (based on Graph embedding and Attention techniques), for all three scenarios. In GA, we first construct separate heterogeneous graphs to generate more representative user and item embeddings. Then, we propose an element-wise attention mechanism to effectively combine the embeddings of common entities (users/items) learned from different datasets. Moreover, to avoid negative transfer, we further propose a Personalized training strategy to minimize the embedding difference of common entities between a richer dataset and a sparser dataset, deriving three new models, i.e., GA-DTCDR-P, GA-MTCDR-P, and GA-CDR+CSR-P, for the three scenarios respectively. Extensive experiments conducted on four real-world datasets demonstrate that our proposed GA models significantly outperform the state-of-the-art approaches.


1 Introduction

1.1 Background

Targeting the data sparsity problem, Cross-Domain Recommendation (CDR) [2] and Cross-System Recommendation (CSR) [66, 70] have been proposed to leverage the richer information from a richer dataset (domain/system) to help improve the recommendation accuracy in a sparser one, resulting in single-target CDR (Conventional Scenario 1) and single-target CSR (Conventional Scenario 2). For example, in the Douban system (https://www.douban.com), the recommender system can recommend books to a target user (e.g., Alice in Fig. 1(a)) according to her movie knowledge, i.e., this is single-target CDR. In contrast, the recommender system can recommend movies to a target user in MovieLens (http://www.movielens.org) according to the knowledge of these movies (e.g., Titanic in Fig. 1(b)) learned from Netflix (https://www.netflix.com), i.e., this is single-target CSR. In addition to the above-mentioned rating systems, CDR and CSR have been applied to other application scenarios as well, including academic search (e.g., Arnetminer, http://arnetminer.org/ [46]), e-commerce (e.g., Amazon, https://www.amazon.com/ [5]), and social networking (e.g., Facebook, https://www.facebook.com [41], and Tencent Weibo, t.qq.com, which was shut down on September 28th, 2020 [16, 15]).

CDR and CSR rely on different kinds of overlapping entities that serve as the 'bridge' linking the two data sources. These overlapping (common) entities relate the two domains/systems, which are thus termed related domains/systems. In CDR, there are two related domains (e.g., a movie domain and a book domain) in the same system (e.g., Douban), and thus CDR techniques can utilize common users to transfer/share their knowledge across domains. Likewise, in CSR, the two related systems (e.g., Netflix and MovieLens) have the same domain (e.g., the movie domain) and thus contain common items (e.g., movies). Technically, we only need to replace the 'bridge' in a CDR model from common users to common items to support CSR, and vice versa; this means that CDR and CSR techniques can be applied to each other's scenarios, and thus our proposed approaches can be applied to all related domains/systems. In this paper, unless otherwise stated, we focus on CDR when discussing solutions, except for the CDR+CSR scenario.

(a) Conventional Scenario 1: Single-target CDR
(b) Conventional Scenario 2: Single-target CSR
(c) The limitation of conventional single-target CDR
(d) Our Target Scenario 1: Dual-target CDR
(e) Our Target Scenario 2: Multi-target CDR
(f) Our Target Scenario 3: CDR+CSR
Fig. 1: Different scenarios of cross-domain and cross-system recommendations

1.2 Limitations of Conventional Single-Target CDR

Existing single-target CDR approaches can be generally classified into two groups: content-based transfer approaches and feature-based transfer approaches.

Content-based transfer tends to link different domains by identifying similar content information, such as user profiles, item details [2], user-generated reviews [44], and social tags [4]. Feature-based transfer [60, 28, 70, 61, 12, 34, 13, 24, 64, 20] first trains different Collaborative Filtering (CF) based models, such as Bayesian Personalised Ranking (BPR) [39], Neural Matrix Factorization (NeuMF) [9], and Deep Matrix Factorization (DMF) [53], to obtain user/item embeddings or patterns, and then transfers these embeddings through common or similar users across domains. In contrast to the content-based transfer approaches, feature-based transfer approaches typically employ machine learning techniques, such as transfer learning [63] and neural networks [32], to transfer knowledge across domains.

Motivating Example 1

Fig. 1(c) depicts a special case in a conventional single-target CDR system (i.e., Douban) that contains two domains, DoubanMovie (the richer domain) and DoubanBook (the sparser domain), including users, items (movies or books), and interactions (e.g., ratings and reviews). In contrast to Alice in Fig. 1(a), who is typical of the majority of users in the dataset, Bob in Fig. 1(c) belongs to a minority of users: he reviewed few movies but many books, and thus Bob's knowledge (e.g., user embedding) in the book domain is more accurate than his knowledge in the movie domain. However, the knowledge in the book domain cannot be used to improve the knowledge in the movie domain, since the conventional single-target CDR system can only leverage the information from the richer domain to improve the recommendation accuracy in the sparser domain.

However, all these existing single-target CDR approaches only focus on how to leverage the source domain to help improve the recommendation accuracy in the target domain, not vice versa, as illustrated in Motivating Example 1. In fact, each of the two domains may be relatively richer in certain types of information (e.g., ratings, reviews, user profiles, item details, and tags); if such information can be leveraged well, it is likely to improve the recommendation performance in both domains simultaneously, rather than in a single target domain only. Therefore, dual-target CDR [69, 25, 71, 26] has recently been proposed to improve the recommendation accuracy in both the richer and sparser domains simultaneously by making good use of the information or knowledge from both domains.

1.3 Our Target Scenarios

Dual-target CDR is our first target scenario. Intuitively, it seems that the existing single-target CDR approaches could solve dual-target CDR (Target Scenario 1, see Fig. 1(d)) by simply changing their transfer direction from "Richer→Sparser" to "Sparser→Richer". However, this idea does not work because of Negative Transfer [36]: in principle, the knowledge learned from the sparser domain is less accurate than that learned from the richer domain, and thus the recommendation accuracy in the richer domain is likely to decline when the transfer direction is simply and directly changed. Therefore, dual-target CDR/CSR demands novel and effective solutions.

Additionally, inspired by dual-target CDR, multi-target CDR (Target Scenario 2, see Fig. 1(e)), namely, improving the recommendation accuracy in multiple domains simultaneously, is also an interesting and challenging research problem for CDR. However, unlike in dual-target CDR, in Target Scenario 2 more non-IID (independent and identically distributed) data from multiple domains may negatively affect the recommendation performance, which is likely to cause negative transfer; this is the new challenge. Though there are no solutions reported in the literature yet, multi-target CDR is similar to Multi-Domain Recommendation (MDR) to some extent. Nevertheless, MDR [62, 35, 37, 63] tends to improve the recommendation accuracy in a single target domain, or the accuracy for a mixed user set from multiple domains, by leveraging the auxiliary information from multiple domains. Therefore, a feasible multi-target CDR/CSR solution is in demand.

Moreover, it would be promising to devise a hybrid approach that can leverage the auxiliary information from both multiple domains and multiple systems to further improve the accuracy in these domains and systems simultaneously, i.e., CDR+CSR (Target Scenario 3, see Fig. 1(f)). This means that CDR+CSR should utilize the information of both common users and common items in the same approach. This is also an interesting and challenging research problem.

1.4 Challenges

Targeting Scenario 1, there are two challenges (CH1 and CH2) as follows.

CH1: how to leverage the data richness and diversity to generate more representative single-domain user and item embeddings for improving recommendation accuracy in each of the domains? Both traditional Collaborative Filtering (CF) models, e.g., BPR [39], and novel neural CF models, e.g., NeuMF [9] and DMF [53], are based on the user-item relationship to learn user and item embeddings. However, most of them ignore the user-user and item-item relationships, and thus can hardly enhance the quality of embeddings.

CH2: how to effectively optimize the user or item embeddings in each target domain for improving recommendation accuracy?

The state-of-the-art dual-target CDR approaches either adopt fixed combination strategies, e.g., average-pooling, max-pooling, and concatenation [69, 30], or simply adapt the existing single-target transfer learning to dual transfer learning [25]. However, none of them can effectively combine the embeddings of common entities, and thus it is hard to achieve an effective embedding optimization in each target domain.

Targeting Scenario 2, there is a new challenge (CH3).

CH3: how to avoid negative transfer when combining the embeddings of common users from multiple domains? Compared with dual-target CDR (Target Scenario 1), the core goal of multi-target CDR (Target Scenario 2) is to leverage more auxiliary information from more domains to improve the recommendation performance. However, it is worth noting that more non-IID data from more domains may negatively affect the recommendation performance, because such incomplete non-IID data, especially in sparser domains, can only reflect biased features of common users. Therefore, in Target Scenario 2, the recommendation performance in some domains may decline as more, and sparser, domains join in, i.e., negative transfer can happen.

Targeting Scenario 3, there is a new challenge (CH4).

CH4: how to effectively leverage the auxiliary information of both common users and common items simultaneously? In a dual-target or multi-target CDR scenario, we only need to optimize the embeddings of common users from dual or multiple domains; then, based on the CF models in each domain, the embeddings of distinct users and items can be optimized gradually. In a CDR+CSR scenario, however, the model should effectively leverage the embeddings of common users and common items simultaneously, which can improve the recommendation performance in each dataset (domain/system) more quickly.

1.5 Our Approach and Contributions

To address the above four challenges, in this paper, we propose a unified framework for all dual-target CDR, multi-target CDR, and CDR+CSR scenarios. The characteristics and contributions of our work are summarized as follows:


  • We propose a Graphical and Attentional framework, called GA, for the Dual-Target CDR (GA-DTCDR) scenario, which can leverage the data richness and diversity (e.g., ratings, reviews, and tags) of different datasets and share the knowledge of common entities across domains;

  • To address CH1, we construct a heterogeneous graph, considering not only user-item relationships (based on ratings), but also user-user and item-item relationships (based on content similarities). Then, with this heterogeneous graph, we apply a graph embedding technique, i.e., Node2vec, to generate more representative single-domain user and item embeddings for accurately capturing user and item features;

  • To address CH2, we propose an element-wise attention mechanism to effectively combine the embeddings of common entities learned from dual domains, which can significantly enhance the quality of user/item embeddings and thus improve the recommendation accuracy in each of both domains simultaneously.

It is worth mentioning that this work is an extension of our preliminary work [71]. In this paper, we further deliver the following contributions:


  • Different from GA-DTCDR proposed in [71] that only supports dual-target CDR scenario, we extend the above proposed GA framework and adopt a Personalized training strategy to support all Dual-Target CDR (GA-DTCDR-P), Multi-Target CDR (GA-MTCDR-P), and CDR+CSR (GA-CDR+CSR-P) scenarios;

  • To address CH3, we propose a Personalized training strategy, deriving GA-DTCDR-P and GA-MTCDR-P, to train the recommendation models in different domains, which can first give personalized weights to the pair-wise embedding differences of common users between every two domains and then minimize these pair-wise embedding differences. The embeddings of common users in different domains tend to be similar but remain personalized, and thus the personalized strategy can avoid negative transfer to some extent;

  • To address CH4, we adjust the element-wise attention structure of GA-DTCDR to support CDR+CSR scenario and thus GA-CDR+CSR-P can enhance the qualities of the embeddings of common users and items simultaneously.

We conduct extensive experiments on four real-world datasets, which demonstrate that our GA-DTCDR-P significantly outperforms the best-performing baselines by an average of 9.04% in terms of recommendation accuracy. Additionally, we conduct more multi-target CDR and CDR+CSR experiments (see Tasks 4 and 5 in Experiments and Analysis) to demonstrate that our GA-MTCDR-P and GA-CDR+CSR-P can further improve the best-performing baselines by an average of 9.21%.

2 Related Work

2.1 Single-Target CDR

Most of the existing single-target CDR approaches tend to leverage auxiliary information from the source domain to improve the recommendation accuracy in the target domain. According to their transfer strategies, these single-target CDR approaches are classified into two categories: content-based transfer and feature-based transfer.


  • Content-based transfer. These approaches first link the richer and sparser domains by content information, e.g., user/item attributes [2], tags [42, 48], social relations [16, 15, 17], semantic properties [58], thumbs-up [41], text information [44], metadata [40], and browsing or watching history [18]. Then they transfer/share user preferences or item details across domains.

  • Feature-based transfer. These approaches tend to employ classical machine learning techniques, such as multi-task learning [31], transfer learning [11, 8, 61, 12, 34, 13, 24, 64, 20], clustering [50], reinforcement learning [27], deep neural networks [32, 70, 5, 29], relational learning [43], and semi-supervised learning [19], to map or share features (e.g., user/item embeddings and rating patterns [8, 57]) learned by CF-based models (e.g., classical factorization models and novel neural CF models) across domains.

Additionally, some studies [62, 37, 63, 59] focus on a derivational problem, i.e., multi-domain recommendation, which is to improve the recommendation accuracy on the target domain by leveraging the auxiliary information from multiple domains. However, all of them are single-target models, which means they cannot improve the recommendation accuracy in the richer domain even if the sparser domain may contain certain types of auxiliary information to support the richer domain.

2.2 Dual-Target CDR

Dual-target CDR is still a novel concept for improving the recommendation accuracy in both domains simultaneously. Therefore, existing solutions are limited. The existing dual-target CDR approaches mainly focus on applying fixed combination strategies [69, 30], or they focus on simply changing the existing single-target transfer learning to become dual-transfer learning [25, 26]. However, none of them can effectively combine the embeddings of common users.

In [69], Zhu et al. proposed DTCDR, the first dual-target CDR framework in the literature, which uses multi-source information to generate more representative embeddings of users and items. Based on multi-task learning, the DTCDR framework uses three different combination strategies, i.e., average-pooling, max-pooling, and concatenation, to combine and share the embeddings of common users across domains. Later on, in [30], Liu et al. similarly used a fixed combination strategy based on hyper-parameters and the data sparsity degrees of common users.

In addition, in [25], Li et al. proposed DDTCDR, a deep dual-transfer framework for dual-target CDR. The DDTCDR framework considers the bidirectional latent relations between users and items and applies a latent orthogonal mapping to extract user preferences. Based on the orthogonal mapping, DDTCDR can transfer users' embeddings in a bidirectional way (i.e., Richer→Sparser and Sparser→Richer). Recently, Li et al. proposed an improved version of DDTCDR in [26], i.e., a dual metric learning (DML) model for dual-target CDR.

2.3 Graph Embedding

Graph Embedding is to learn a mapping function that maps the nodes in a graph to low-dimensional latent representations [68]. These latent representations can be used as the features of nodes for different tasks, such as classification and link prediction. According to the embedding techniques used, this section classifies the existing graph-embedding approaches into two categories: dimensionality reduction and neural networks. Dimensionality reduction-based approaches, such as multidimensional scaling [22], principal component analysis [51], and their extensions [54], involve optimising a linear or non-linear function that reduces the dimension of a graph's representative data matrix and then produces low-dimensional embeddings. Neural network-based approaches, such as DeepWalk [38], LINE [45], and Node2vec [6], involve treating nodes as words and the generated random walks on graphs as sentences, and then learning node embeddings based on these words and sentences [68]. Also, some recent graph embedding approaches can leverage both explicit preferences and heterogeneous relationships by graph convolutional networks [65, 56].

2.4 Attention Mechanism

Attention was first introduced in [1], providing more accurate alignment for each position in a machine translation task. Apart from machine translation, the attention mechanism has recently been widely used in recommendation as well [3]. The general idea of the attention mechanism is to focus on selective parts of the whole information, which can capture the outstanding features of objects. For recommendation, the existing attention approaches [10, 47, 49] tend to select more informative parts of explicit or implicit data to improve the representations of users and items.

Symbol | Definition
$c_{uv}$ | the comment (e.g., the review and the tags) of user $u$ on item $v$
$C$ | the user comments
$D$ | the content documents of users and items
$d_i$ | Domain $i$
$F$ | the item details
$G = (\mathcal{N}, \mathcal{E})$ | the heterogeneous graph, where $\mathcal{E}$ is the set of user-user, user-item, and item-item relationships
$k$ | the dimension of the embedding matrix
$m$ | the number of users
$n$ | the number of items
$\mathbf{E}^{c}$ | the combined embeddings of common users
$r_{uv}$ | the rating of user $u$ on item $v$
$\mathbf{R}$ | the rating matrix
$s_i$ | System $i$
$\mathcal{U}$ | the set of users
$\mathbf{U}^{g}$ | the graph embedding matrix of users
$\mathbf{U}^{d}$ | the document embedding matrix of users
$P$ | the user profiles
$\mathcal{V}$ | the set of items
$\mathbf{V}^{g}$ | the graph embedding matrix of items
$\mathbf{V}^{d}$ | the document embedding matrix of items
$y_{uv}$ | the interaction of user $u$ on item $v$
$\mathbf{Y}$ | the user-item interaction matrix
$\cdot_{i}$ | the notations for domain $d_i$, where $i = 1, \dots, N$ and $N$ is the total number of domains; e.g., $m_i$ represents the number of users in domain $d_i$
$\hat{\cdot}$ | the predicted notations; e.g., $\hat{y}_{uv}$ represents the predicted interaction of user $u$ on item $v$
TABLE I: Important notations

3 The Proposed Model

In this section, we first formalize the dual-target CDR, multi-target CDR, and CDR+CSR problems. Then, we preliminarily propose a Graphical and Attentional framework, called GA, for DTCDR (GA-DTCDR) scenario. Next, we extend the above GA framework and adopt a Personalized training strategy to support all dual-target CDR (GA-DTCDR-P), multi-target CDR (GA-MTCDR-P), and CDR+CSR (GA-CDR+CSR-P) scenarios. Finally, we present the detailed components of GA-DTCDR (or GA-DTCDR-P), GA-MTCDR-P, and GA-CDR+CSR-P.

3.1 Problem Statement

First, for the sake of better readability, we list the important notations of this paper in Table I. Then, we define the Dual-Target CDR, Multi-Target CDR, and CDR+CSR as follows.

Definition 1

Dual-Target Cross-Domain Recommendation (DTCDR): Given two related domains $d_1$ and $d_2$, with explicit feedback (e.g., ratings and comments), implicit feedback (e.g., purchase and browsing histories), and side information (e.g., user profiles and item details), DTCDR is to improve the recommendation accuracy in both domains simultaneously by leveraging their observed information.

Definition 2

Multi-Target Cross-Domain Recommendation (MTCDR): Given multiple related domains $d_1$ to $d_N$, with explicit feedback, implicit feedback, and side information, MTCDR is to improve the recommendation accuracy in all domains simultaneously by leveraging their observed information.

Definition 3

Cross-Domain and Cross-System Recommendation (CDR+CSR): Given multiple related domains/systems $d_1$ to $d_N$, with explicit feedback, implicit feedback, and side information, CDR+CSR is to improve the recommendation accuracy in all domains and systems simultaneously by leveraging their observed information.

Note that a certain degree of overlap between the users of different domains, i.e., common users, and overlap between the items of different systems, i.e., common items, play a key role in bridging the different datasets (domains/systems) and exchanging knowledge across them. This is a common idea of the existing CDR and CSR approaches [32, 67, 70].

Fig. 2: The overview of GA-DTCDR-P (or GA-DTCDR). This is our GA framework for the DTCDR scenario, and the only difference between GA-DTCDR and GA-DTCDR-P is their training strategy (i.e., GA-DTCDR adopts the objective function in Section 3.5.1, while GA-DTCDR-P adopts the personalized objective function in Section 3.5.2). Note that, for domain $i$ ($i \in \{1, 2\}$), the combined embedding of a common user is $\mathbf{e}^{c}_{i} = \mathbf{w}_{1i} \odot \mathbf{e}_{1} + \mathbf{w}_{2i} \odot \mathbf{e}_{2}$, where $\mathbf{w}_{ji}$ is the weight vector for the embedding of common users learned from domain $j$

3.2 Overview of GA Framework

In this section, we first take GA-DTCDR-P (or GA-DTCDR) as an example to introduce the general structure of GA. As shown in Fig. 2, the GA-DTCDR-P framework is divided into five main components, i.e., Input Layer, Graph Embedding Layer, Feature Combination Layer, Neural Network Layers, and Output Layer. The main differences between GA-DTCDR-P and the other two sub-frameworks, i.e., GA-MTCDR-P and GA-CDR+CSR-P, are the network structures of the element-wise attention (see the Graph Embedding Layers and Feature Combination Layers of Figs. 2, 3, and 4). For clarity, in Figs. 3 and 4 we omit the components that GA-MTCDR-P and GA-CDR+CSR-P share with GA-DTCDR-P, i.e., (1) Input Layer, (4) Neural Network Layers, and (5) Output Layer. We will present the details of each component in the following sections.

Like the single-target or dual-target CDR approaches in [67, 70, 69], our GA-DTCDR-P and GA-MTCDR-P can be applied to dual-target CSR and multi-target CSR as well, where the two/multiple systems have the same domain but different users, and thus contain common items only — such as DoubanMovie and MovieLens (see Task 3 in Experiments and Analysis). Accordingly, in Figs. 2 and 3, we only need to replace common users with common items for supporting dual-target CSR and multi-target CSR.

In fact, GA-MTCDR-P is an extension of GA-DTCDR-P from dual domains to multiple domains, and GA-CDR+CSR-P is the full version of GA that can handle almost all CDR and/or CSR scenarios. If there are only common users among all datasets, GA-CDR+CSR-P degenerates to GA-DTCDR-P or GA-MTCDR-P. Similarly, if there are only common items among all datasets, GA-CDR+CSR-P degenerates to dual-target or multi-target CSR.

The time complexities of GA-DTCDR-P and GA-MTCDR-P are both linear in the number of interactions and the number of users in each domain: for each domain $d_i$, the cost is dominated by the number of interactions $|\mathbb{Y}^{+}_{i}|$ and the number of users $m_i$ (note that for dual-target or multi-target CSR, the number of users is replaced by the number of items), multiplied by the number of nodes $h$ in each MLP layer (the node number is relative to the embedding dimension $k$) and the depth $L$ of the MLP layers. Similarly, the time complexity of GA-CDR+CSR-P replaces $m_i$ with $m_i + n_i$: compared with GA-DTCDR-P and GA-MTCDR-P, GA-CDR+CSR-P shares the embeddings of both common users and common items across domains/systems, and thus the sum of the number of users and the number of items appears in the time complexity expression. Although $h$ and $L$ are constants in our experiments, the number of interactions is still very large. However, a deep MLP structure can represent a complex and well-trained non-linear relation between users and items, and thus can enhance the recommendation accuracy. This is a trade-off between running time and recommendation accuracy.

We now briefly present each component of GA as follows.


  • Input Layer. First, for the input of our GA-DTCDR-P, GA-MTCDR-P, and GA-CDR+CSR-P, we consider both explicit feedback (ratings and comments) and side information (user profiles and item details). These input data can be generally classified into two categories, i.e., rating information and content information.

  • Graph Embedding Layer. Then, we leverage rating and content information of each domain to construct a heterogeneous graph, representing user-item interaction relationships, user-user similarity relationships, and item-item similarity relationships. Based on the graph, we apply the Graph Embedding model, i.e., Node2vec [6], to generate user and item embedding matrices.

  • Feature Combination Layer. Next, we propose an element-wise attention mechanism to combine the common users' embeddings from dual (GA-DTCDR-P) or multiple (GA-MTCDR-P) domains. This layer intelligently gives a set of weights to the embeddings of a common user learned from dual/multiple domains and generates a combined embedding for the common user, which retains his/her features learned from different domains in different proportions. Additionally, for GA-CDR+CSR-P, the element-wise attention mechanism can be applied to combine both the common users' embeddings and the common items' embeddings.

  • Neural Network Layers. In this component, we apply a fully-connected neural network, i.e., Multi-Layer Perceptrons (MLP), to represent a non-linear relationship between users and items in each domain.

  • Output Layer. Finally, we can generate final user-item interaction predictions. The training of our model is mainly based on the loss between predicted user-item interactions and observed user-item interactions.

Next, we will introduce the details of Graph Embedding Layer, Feature Combination Layer, Neural Network Layers, and Output Layer in the following sections.

3.3 Graph Embedding Layer

The existing embedding strategies for recommender systems mainly focus on representing the user-item interaction relationship. Apart from the user-item interaction relationship, we use a graph to represent user-user and item-item relationships as well. Therefore, based on the rating and content information observed from dual or multiple domains, we construct a heterogeneous graph, including nodes (users and items) and weighted edges (ratings and content similarities), for each domain. Then, we can generate more representative user and item embedding matrices. The Graph Embedding contains three main sub-components, i.e., Document Embedding, Graph Construction, and Output.

3.3.1 Document Embedding

To construct the heterogeneous graph, we need to compute the content similarities between two users or two items. To this end, we consider multi-source content information (e.g., reviews, tags, user profiles, and item details) observed from dual/multiple domains to generate user and item content embedding matrices. In this paper, we adopt the most widely used model, i.e., Doc2vec [23], as the document embedding technique. The detailed document embedding process works as follows: (1) first, in the training set, for a user $u$, we collect the comments (reviews and tags) written by $u$ and the user profile of $u$ into the same content document $D_u$, while for an item $v$, we collect the comments (reviews and tags) on the item and its item details into the same content document $D_v$; (2) next, we segment the words in the documents by using the widely used natural language processing toolkit StanfordCoreNLP [33]; (3) finally, we apply the Doc2vec model to map the documents into the text vectors $\mathbf{U}^{d}$ and $\mathbf{V}^{d}$ for users and items, respectively.
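
The following sketch illustrates this document-embedding step. It assumes gensim's Doc2Vec implementation; the toy corpus, the node identifiers, and the hyper-parameter values are illustrative placeholders rather than the paper's exact configuration.

```python
# A minimal sketch of the Doc2vec step, assuming gensim; the corpus and
# hyper-parameters are illustrative, not the paper's settings.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# One content document per user/item: concatenated reviews, tags, and
# profile/detail text, already tokenized (e.g., by StanfordCoreNLP).
user_docs = {
    "u1": ["loved", "this", "novel", "historical", "fiction"],
    "u2": ["classic", "sci-fi", "space", "opera"],
}
item_docs = {
    "v1": ["epic", "romance", "set", "on", "a", "ship"],
}

corpus = [TaggedDocument(words=tokens, tags=[node_id])
          for docs in (user_docs, item_docs)
          for node_id, tokens in docs.items()]

# Train Doc2vec and read out one content vector per user/item,
# i.e., the rows of U^d and V^d.
model = Doc2Vec(corpus, vector_size=128, window=5, min_count=1, epochs=40)
user_content_vecs = {u: model.dv[u] for u in user_docs}
item_content_vecs = {v: model.dv[v] for v in item_docs}
```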

3.3.2 Graph Construction

First, we link the users and items via their interaction relationships. The weights of these interaction edges are the normalized ratings, i.e., $r_{uv}/\max(\mathbf{R}_i)$. To consider the user-user and item-item relationships in the heterogeneous graph, we generate synthetic edges between two users or two items according to their normalized content similarities (edge weights). The generation probability $p_{ij}$ of the edge between users $u_i$ and $u_j$ is as follows:

$$p_{ij} = \theta \cdot s_{ij}, \tag{1}$$

where $\theta$ is a hyper-parameter which controls the sampling probability and $s_{ij}$ is the normalized cosine similarity between the content embeddings of $u_i$ and $u_j$. Similarly, we can obtain the generation probability between two items. Based on the user-item interaction relationships, user-user similarity relationships, and item-item similarity relationships, we can construct the heterogeneous graph $G_i$ for each domain $d_i$, where $i = 1, \dots, N$.
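
The sketch below illustrates this graph-construction step, assuming the linear sampling form of Eq. (1) ($p_{ij} = \theta \cdot s_{ij}$) and using networkx; the similarity inputs and the value of $\theta$ are illustrative placeholders.

```python
# A minimal sketch of heterogeneous-graph construction: user-item edges
# weighted by normalized ratings, plus synthetic user-user / item-item
# edges sampled with probability theta * similarity (Eq. (1)).
import numpy as np
import networkx as nx

def build_hetero_graph(ratings, user_sims, item_sims, theta=0.5, seed=0):
    """ratings: dict {(user, item): rating}; *_sims: dict {(a, b): normalized
    cosine similarity between the Doc2vec content vectors of a and b}."""
    rng = np.random.default_rng(seed)
    g = nx.Graph()
    r_max = max(ratings.values())
    # User-item interaction edges, weighted by normalized ratings.
    for (u, v), r in ratings.items():
        g.add_edge(u, v, weight=r / r_max)
    # Synthetic similarity edges, sampled with probability theta * s_ij.
    for sims in (user_sims, item_sims):
        for (a, b), s in sims.items():
            if rng.random() < theta * s:
                g.add_edge(a, b, weight=s)
    return g
```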

Similar to the approaches proposed in [16, 15], we also construct a heterogeneous graph to represent the relations among users and items. But we construct a heterogeneous graph in each domain rather than a common graph as in [16, 15].

3.3.3 Output

Based on the heterogeneous graph $G_i$, we employ the graph embedding model, i.e., Node2vec [6], to generate the user embedding matrix $\mathbf{U}^{g}_{i}$ and the item embedding matrix $\mathbf{V}^{g}_{i}$ for domain $d_i$.
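
A minimal sketch of this step follows, assuming the community node2vec package (a reference implementation of Node2vec over networkx graphs); the walk and window hyper-parameters are illustrative, not the paper's settings.

```python
# A minimal sketch of the graph-embedding step on the heterogeneous graph
# G_i built above; returns one embedding per node (users and items).
from node2vec import Node2Vec

def embed_graph(g, dim=128):
    n2v = Node2Vec(g, dimensions=dim, walk_length=30, num_walks=10,
                   weight_key="weight", workers=1)
    model = n2v.fit(window=5, min_count=1)  # trains Word2Vec on the walks
    # The package stores nodes as strings in the learned vocabulary.
    return {node: model.wv[str(node)] for node in g.nodes()}
```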

Fig. 3: The overview of GA-MTCDR-P. Compared with GA-DTCDR-P, this is the extended structure for the MTCDR scenario; in fact, it is an extension of GA-DTCDR-P from dual domains to multiple domains. For clarity, we omit the components shared with GA-DTCDR-P, i.e., the Input Layer, Neural Network Layers, and Output Layer. Note that, for domain $i$, the combined embedding of a common user is $\mathbf{e}^{c}_{i} = \sum_{j=1}^{N} \mathbf{w}_{ji} \odot \mathbf{e}_{j}$, where $N$ is the total number of domains and $\mathbf{w}_{ji}$ is the weight vector for the embedding of common users learned from domain $j$

3.4 Feature Combination Layer

The Feature Combination Layer combines the embeddings of common entities learned from dual/multiple datasets. By doing so, the combined embedding of a common entity for each dataset retains all the features learned from the two/multiple datasets in different proportions. To this end, we propose an element-wise attention mechanism. The traditional attention mechanism tends to select a certain part of representative features and give these features higher weights when generating the combined features [1]. Similarly, for a common entity, our element-wise attention mechanism tends to pay more attention to the more informative elements from each set of embedding elements (the embeddings of this common entity learned from different datasets). Compared with the DTCDR and MTCDR scenarios (common users only), in the CDR+CSR scenario our element-wise attention mechanism needs to combine the embeddings of common users and common items simultaneously. Therefore, we separately introduce the feature combination layers of GA-DTCDR-P and GA-MTCDR-P, and the feature combination layer of GA-CDR+CSR-P.

3.4.1 For GA-DTCDR-P and GA-MTCDR-P

In GA-DTCDR-P and GA-MTCDR-P (see Figs. 2 and 3), the feature combination layers combine the embeddings of common users learned from dual/multiple domains by our element-wise attention mechanism. For a common user, our element-wise attention mechanism tends to pay more attention to the more informative elements from each set of elements in $\{\mathbf{e}_{1}, \dots, \mathbf{e}_{N}\}$, where $N$ is the total number of domains (for DTCDR, $N = 2$, and for MTCDR, $N \geq 3$). Thus our element-wise attention mechanism can generate more representative embeddings of the common user for domains $d_1, \dots, d_N$, respectively. The structures of the element-wise attention are shown in the Feature Combination Layers of Figs. 2 and 3, respectively. The combined embedding $\mathbf{e}^{c}_{i}$ of a common user for domain $d_i$ can be represented as:

$$\mathbf{e}^{c}_{i} = \sum_{j=1}^{N} \mathbf{w}_{ji} \odot \mathbf{e}_{j}, \tag{2}$$

where $\odot$ is the element-wise multiplication and $\mathbf{w}_{ji}$ is the weight vector of the embedding of common users from domain $d_j$ for domain $d_i$.

Note that for the distinct users and all the items in each domain, we simply retain their embeddings without using the attention mechanism, because they do not have dual/multiple embeddings.
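
The following NumPy sketch illustrates the element-wise attention of Eq. (2). It assumes a softmax over domains per embedding element, so that each weight vector $\mathbf{w}_{ji}$ is trainable and the weights sum to one element-wise; this softmax normalization is our assumption, not stated by the paper.

```python
# A minimal sketch of element-wise attention (Eq. (2)): combine the
# embeddings of one common user learned in N domains into a single
# embedding for a target domain i.
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def combine_common_user(embeddings, logits_for_target):
    """embeddings: (N, k) array, the common user's embedding e_j from each
    of the N domains; logits_for_target: (N, k) trainable logits for one
    target domain i. Returns e^c_i = sum_j w_ji * e_j (element-wise)."""
    w = softmax(logits_for_target, axis=0)   # element-wise weights w_ji
    return (w * embeddings).sum(axis=0)

# Toy usage: a common user seen in N = 2 domains with k = 4 dimensions.
e = np.array([[0.2, 0.5, 0.1, 0.9],
              [0.4, 0.3, 0.8, 0.1]])
logits = np.zeros_like(e)                    # equal logits -> average-pooling
print(combine_common_user(e, logits))
```

Note that with all-equal logits this mechanism reduces to average-pooling, the fixed combination strategy used by the ablation variant GA-DTCDR_Average in the experiments.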

Fig. 4: The overview of GA-CDR+CSR-P. Compared with GA-DTCDR-P, this is the extended structure for the CDR+CSR scenario. Similar to GA-MTCDR-P, we omit the components shared with GA-DTCDR-P. Note that, for dataset (domain/system) $i$, the combined embeddings of common users and common items are computed as in Eqs. (3) and (4), where $N$ is the total number of datasets, $\mathbf{w}_{ji}$ is the weight vector for the embedding of common users for CDR, and $\mathbf{w}'_{ji}$ is the weight vector for the embedding of common items for CSR. A dataset may share common users with one dataset and common items with another, playing the roles of both a related domain and a related system. The details are explained in Section 3.4.2

3.4.2 For GA-CDR+CSR-P

In GA-CDR+CSR-P (see Fig. 4), the element-wise attention mechanism is used to combine both the embeddings of common users for related domains and the embeddings of common items for related systems. Unlike in GA-DTCDR-P and GA-MTCDR-P, in GA-CDR+CSR-P, if two or more datasets have common users, they make cross-domain recommendations, and thus these datasets are related domains to each other; if two or more datasets have common items, they make cross-system recommendations, and thus these datasets are related systems to each other. For example, in Fig. 4, a dataset that shares common users with one dataset and common items with another plays two roles at once, i.e., a related domain and a related system, in GA-CDR+CSR-P.

In GA-CDR+CSR-P, there are $N$ related datasets (domains and systems). Similar to Eq. (2), for a common user from dual/multiple domains, his/her combined embedding $\mathbf{e}^{c}_{i}$ for a domain $d_i$ can be represented as:

$$\mathbf{e}^{c}_{i} = \sum_{j=1}^{N} \mathbf{w}_{ji} \odot \mathbf{e}_{j}. \tag{3}$$

Similarly, for a common item from dual/multiple systems, its combined embedding $\mathbf{v}^{c}_{i}$ for a system $s_i$ can be represented as:

$$\mathbf{v}^{c}_{i} = \sum_{j=1}^{N} \mathbf{w}'_{ji} \odot \mathbf{v}_{j}, \tag{4}$$

where $\mathbf{w}'_{ji}$ is the weight vector of the embedding of common items from system $s_j$ for system $s_i$.

3.5 Training for NN and Output Layers

In this section, we introduce two training strategies, i.e., preliminary training and personalized training, for the neural network layers and output layer of our GA models. The preliminary training strategy is adopted by our preliminary work [71] and the personalized training strategy is adopted in this work (marked with ‘-P’, e.g., GA-DTCDR-P).

3.5.1 Preliminary Training

In our preliminary work [71], we train our models with the following objective function in domain $d_i$:

$$\mathcal{L}_{i} = \sum_{(u,v) \in \mathbb{Y}^{+}_{i} \cup \mathbb{Y}^{-}_{i}} \ell\big(y_{uv}, \hat{y}_{uv}\big) + \lambda\, \Omega(\Theta_{i}), \tag{5}$$

where $\ell(\cdot, \cdot)$ is a loss function between an observed interaction $y_{uv}$ and its corresponding predicted interaction $\hat{y}_{uv}$ (see Eq. (7)), $\mathbb{Y}^{+}_{i}$ and $\mathbb{Y}^{-}_{i}$ denote the observed and the unobserved user-item interactions in domain $d_i$ respectively, $\Omega(\Theta_{i})$ is the regularizer, $\lambda$ is a hyper-parameter which controls the importance of the regularizer, and $\Theta_{i}$ is the parameter set. To avoid our model over-fitting to $\mathbb{Y}^{+}_{i}$ (positive instances), we randomly select a certain number of unobserved user-item interactions as negative instances, denoted by $\mathbb{Y}^{-}_{i}$, rather than using all unobserved interactions. This training strategy has been widely used in the existing approaches [9].

Unlike the unified loss functions in [52, 17, 30], we train our recommendation model in each domain respectively and in parallel, which focuses on specifically improving the recommendation accuracy in each of the domains.

Based on rating information, the user-item interaction $y_{uv}$ between a user $u$ and an item $v$ can be represented as:

$$y_{uv} = \begin{cases} r_{uv}, & \text{if } u \text{ has rated } v, \\ 0, & \text{otherwise.} \end{cases} \tag{6}$$

We choose a normalized cross-entropy loss, which can be represented as:

$$\ell\big(y_{uv}, \hat{y}_{uv}\big) = -\left( \frac{y_{uv}}{\max(\mathbf{R}_{i})} \log \hat{y}_{uv} + \Big(1 - \frac{y_{uv}}{\max(\mathbf{R}_{i})}\Big) \log\big(1 - \hat{y}_{uv}\big) \right), \tag{7}$$

where $\max(\mathbf{R}_{i})$ is the maximum rating in a domain.
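
A small NumPy sketch of the normalized cross-entropy of Eqs. (6)-(7) follows; the toy rating scale and values are illustrative.

```python
# A minimal sketch of the normalized cross-entropy loss (Eqs. (6)-(7)).
import numpy as np

def normalized_ce(y, y_hat, r_max, eps=1e-8):
    """y: observed rating (0 for a sampled negative instance), y_hat:
    predicted interaction in (0, 1), r_max: maximum rating in the domain."""
    t = y / r_max                          # normalize the rating to [0, 1]
    y_hat = np.clip(y_hat, eps, 1 - eps)   # numerical stability
    return -(t * np.log(y_hat) + (1 - t) * np.log(1 - y_hat))

# Toy usage on a 5-star scale: a 4-star interaction predicted at 0.7.
print(normalized_ce(y=4.0, y_hat=0.7, r_max=5.0))
```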

As shown in the Neural Network Layers of Fig. 2, our GA sub-frameworks employ a neural network, i.e., MLP, to represent a non-linear relationship between users and items. The input embedding matrices of users and items in domain $d_i$ for the MLP are $\big[\mathbf{E}^{c}_{i}; \mathbf{E}^{dist}_{i}\big]$ and $\mathbf{V}^{g}_{i}$ respectively, where $\mathbf{E}^{c}_{i}$ is the combined embedding matrix of common users for domain $d_i$, and $\mathbf{E}^{dist}_{i}$ is the embedding matrix of distinct users in domain $d_i$. Therefore, the embedding $\mathbf{u}$ of user $u$ and the embedding $\mathbf{v}$ of item $v$ in the output layer of the MLP can be represented as:

$$\mathbf{u} = \phi\big(\mathbf{W}^{U}_{i,L} \cdots \phi\big(\mathbf{W}^{U}_{i,1}\, \mathbf{e}_{u}\big)\big), \qquad \mathbf{v} = \phi\big(\mathbf{W}^{V}_{i,L} \cdots \phi\big(\mathbf{W}^{V}_{i,1}\, \mathbf{e}_{v}\big)\big), \tag{8}$$

where the activation function $\phi$ is ReLU, and $\mathbf{W}^{U}_{i,l}$ and $\mathbf{W}^{V}_{i,l}$ are the weights of the multi-layer networks in layer $l$ of domain $d_i$ for users and items, respectively.

Finally, in the Output Layer of Fig. 2, the predicted interaction between $u$ and $v$ in domain $d_i$ is as follows:

$$\hat{y}_{uv} = \cos(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u}^{\top} \mathbf{v}}{\|\mathbf{u}\|\, \|\mathbf{v}\|}. \tag{9}$$

Compared with the conventional inner product, the biggest advantage of the cosine distance for interaction prediction is that it does not need to be normalized separately.

Similarly, we can train our models in each system.
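
The sketch below illustrates the MLP towers and the cosine-based prediction of Eqs. (8)-(9) in NumPy. The layer sizes and random weights are illustrative placeholders for the trained parameters, and mapping the cosine from $[-1, 1]$ into $(0, 1)$ so it is comparable with the normalized rating target is our assumption.

```python
# A minimal sketch of the user/item MLP towers (Eq. (8)) and the cosine
# interaction prediction (Eq. (9)).
import numpy as np

def mlp_tower(x, weights):
    """Stacked linear layers with ReLU activations (phi = ReLU)."""
    for w in weights:
        x = np.maximum(w @ x, 0.0)
    return x

def predict(e_u, e_v, w_user, w_item, eps=1e-8):
    """Cosine similarity of the two tower outputs, mapped to (0, 1)."""
    u = mlp_tower(e_u, w_user)
    v = mlp_tower(e_v, w_item)
    cos = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + eps)
    return (cos + 1.0) / 2.0

# Toy usage with k = 8 input dimensions and two layers per tower.
rng = np.random.default_rng(0)
k = 8
w_user = [rng.normal(0, 0.1, (8, k)), rng.normal(0, 0.1, (4, 8))]
w_item = [rng.normal(0, 0.1, (8, k)), rng.normal(0, 0.1, (4, 8))]
print(predict(rng.normal(size=k), rng.normal(size=k), w_user, w_item))
```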

3.5.2 Personalized Training

Although we have adopted the element-wise attention to combine the embeddings of common entities (users/items) from different datasets (domains/systems), our preliminary training strategy still suffers from the negative transfer problem. Especially in the multi-target CDR and CDR+CSR scenarios, the recommendation performance may decline as more, and sparser, datasets join in.

Inspired by the optimization problem in [55], we propose a personalized objective function for our GA framework. This personalized training strategy first puts trainable weights on the pair-wise embedding differences of common entities between every two datasets, and then minimizes both the local loss (see Eq. (7)) and these pair-wise embedding differences. The embeddings of common entities in different datasets tend to be similar but remain well personalized. Therefore, this personalized strategy can avoid negative transfer to some extent. The objective function in dataset $d_i$ is represented as follows:

$$\mathcal{L}^{P}_{i} = \mathcal{L}_{i} + \sum_{j=1,\, j \neq i}^{N} w_{ij}\, A\big(\big\| \mathbf{E}^{c}_{i} - \mathbf{E}^{c}_{j} \big\|^{2}\big), \tag{10}$$

where $\mathcal{L}_{i}$ is the normalized cross-entropy loss (see Eq. (7)), $A(\cdot)$ is the attention-inducing function, which measures the embedding difference in a non-linear manner, and $\mathbf{E}^{c}_{i}$ is the embeddings of common entities (common users and/or common items) in dataset $d_i$. We adopt the negative exponential function, i.e., $A(x) = 1 - e^{-x/\sigma}$ with a hyper-parameter $\sigma$, which has been widely used in many existing personalized approaches [14]. Additionally, the existing personalized approaches tend to choose a fixed hyper-parameter to control the weight on the embedding difference. In our objective function, by contrast, we use a set of trainable variables ($w_{ij}$; e.g., $w_{12}$ is the weight for the embedding difference between datasets $d_1$ and $d_2$) to learn suitable weights on the embedding differences. These trainable weights can effectively control the importance of the pair-wise embedding differences of common entities, and thus the trained recommendation model in each dataset can achieve good personalization. Therefore, our GA-DTCDR-P, GA-MTCDR-P, and GA-CDR+CSR-P can alleviate negative transfer by using this personalized training strategy.
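
The following sketch illustrates the personalized objective of Eq. (10), assuming the negative exponential attention-inducing function $A(x) = 1 - e^{-x/\sigma}$; for simplicity the trainable weights $w_{ij}$ are passed in as a plain array rather than as optimizer variables.

```python
# A minimal sketch of the personalized objective (Eq. (10)): local loss
# plus weighted, attention-induced pair-wise embedding differences.
import numpy as np

def attention_inducing(x, sigma=1.0):
    return 1.0 - np.exp(-x / sigma)        # A(x) = 1 - e^(-x / sigma)

def personalized_loss(local_loss_i, i, common_embs, w, sigma=1.0):
    """local_loss_i: dataset i's own loss (Eq. (5)); common_embs: list of
    (num_common, k) arrays, the common-entity embeddings E^c_j of each
    dataset; w: (N, N) trainable weights on pair-wise differences."""
    penalty = 0.0
    for j, e_j in enumerate(common_embs):
        if j == i:
            continue
        diff = np.sum((common_embs[i] - e_j) ** 2)  # squared difference
        penalty += w[i, j] * attention_inducing(diff, sigma)
    return local_loss_i + penalty

# Toy usage: N = 3 datasets sharing 5 common users with k = 4 dimensions.
rng = np.random.default_rng(0)
embs = [rng.normal(size=(5, 4)) for _ in range(3)]
w = np.full((3, 3), 0.1)
print(personalized_loss(local_loss_i=0.42, i=0, common_embs=embs, w=w))
```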

4 Experiments and Analysis

We conduct extensive experiments on four real-world datasets to answer the following key questions:


  • Q1: How do our GA models (GA-DTCDR-P, GA-MTCDR-P, and GA-CDR+CSR-P) perform when compared with the state-of-the-art models (see Result 1)?

  • Q2: How do the element-wise attention mechanism and personalized training strategy contribute to performance improvement (see Result 2)?

  • Q3: How does the dimension of embeddings affect the performance of our models (see Result 3)?

  • Q4: How do our models perform on Top-$K$ recommended lists (see Result 4)?

  • Q5: How do the data sparsity and the scale of overlap affect the performance of our models (see Result 5)?

4.1 Experimental Settings

Datasets Douban MovieLens
Domains Book Music Movie Movie
#Users 2,110 1,672 2,712 10,000
#Items 6,777 5,567 34,893 9,395
#Interactions 96,041 69,709 1,278,401 1,462,905
Density 0.67% 0.75% 1.35% 1.56%
Tasks Sparser Richer Overlap
CDR Task 1 DoubanBook DoubanMovie #Common Users = 2,106
Task 2 DoubanMusic DoubanMovie #Common Users = 1,666
CSR Task 3 DoubanMovie MovieLens #Common Items = 4,115
Tasks Domains/Systems
MTCDR Task 4 DoubanBook+DoubanMusic+DoubanMovie #Common Users = 1,662
CDR+CSR Task 5 DoubanBook+DoubanMovie+MovieLens #Common Users (DoubanBook+DoubanMovie)= 2,106 #Common Items (DoubanMovie+MovieLens) = 4,115
TABLE II: Experimental datasets and tasks
Group | Model | Training Data | Encoding | Embedding | Transfer Strategy
Baselines: Single-Domain Recommendation (SDR) | NeuMF [9] | Rating | One-hot | Non-linear MLP | -
Baselines: Single-Domain Recommendation (SDR) | DMF [53] | Rating | Rating Vector | Non-linear MLP | -
Baselines: Single-Target Cross-Domain Recommendation (CDR) | CTR-RBF [52] | Rating & Content | Topic Modeling | Linear MF | Mapping & Transfer Learning
Baselines: Single-Target Cross-Domain Recommendation (CDR) | BPR_DCDCSR [70] | Rating | Random Initialization | Linear MF | Combination & MLP
Baselines: Single-Target Cross-Domain Recommendation (CDR) | TMH [12] | Rating & Content | One-hot | Non-linear MLP | Mapping & Transfer Learning & Attention
Baselines: Dual-Target CDR | DMF_DTCDR_Concat [69] | Rating & Content | Rating Vector | Non-linear MLP | Multi-task Learning & Concatenation
Baselines: Dual-Target CDR | DDTCDR [25] | Rating | One-hot & Multi-hot | Non-linear MLP | Dual Transfer Learning
Our Methods: Dual-Target CDR | GA-DTCDR_Average [71] (a variant of GA-DTCDR for ablation study) | Rating & Content | Heterogeneous Graph | Graph Embedding | Combination (Average-Pooling)
Our Methods: Dual-Target CDR | GA-DTCDR [71] (our prior work, preliminary training) | Rating & Content | Heterogeneous Graph | Graph Embedding | Combination (Element-wise Attention)
Our Methods: Dual-Target CDR | GA-DTCDR-P (personalized training) | Rating & Content | Heterogeneous Graph | Graph Embedding | Element-wise Attention & Personalization
Our Methods: Multi-Target CDR | GA-MTCDR-P (personalized training) | Rating & Content | Heterogeneous Graph | Graph Embedding | Element-wise Attention & Personalization
Our Methods: CDR+CSR | GA-CDR+CSR-P (personalized training) | Rating & Content | Heterogeneous Graph | Graph Embedding | Element-wise Attention & Personalization
TABLE III: The comparison of the baselines and our methods

4.1.1 Experimental Datasets and Tasks

To validate the recommendation performance of our GA approaches and the baseline approaches, we choose four real-world datasets, i.e., three Douban subsets (DoubanBook, DoubanMusic, and DoubanMovie) [69] and MovieLens 20M [7]. For the three Douban subsets, we retain only the users and items with a minimum number of interactions per user, and for MovieLens 20M we likewise extract a subset containing users meeting the same interaction threshold. This filtering strategy has been widely used in the existing approaches [57, 69]. The three Douban subsets contain ratings, reviews, tags, user profiles, and item details, while MovieLens contains ratings, tags, and item details. Based on these four datasets, we design two CDR tasks (Tasks 1 & 2 in Table II) and one CSR task (Task 3) to validate the recommendation performance in dual-target CDR and CSR scenarios, respectively. In addition, we design one MTCDR task (Task 4) and one CDR+CSR task (Task 5) to validate the recommendation performance in multi-target CDR and CDR+CSR scenarios, respectively. We list the dataset statistics and designed tasks in Table II.

Task | Domain (R: Richer, S: Sparser) | SDR baselines: NeuMF, DMF | Single-Target CDR baselines: CTR-RBF, BPR_DCDCSR, TMH | DTCDR baselines: DMF_DTCDR_Concat, DDTCDR | Our DTCDR (prior works): GA-DTCDR_Average, GA-DTCDR | Our DTCDR: GA-DTCDR-P | Improvement (GA-DTCDR-P vs. best baselines)
Each model is reported as an HR NDCG pair; each task block below corresponds to one embedding dimension $k$.
Task1
()
DoubanBook (S) .3810 .2151 .3841 .2265 .3830 .2217 .3954 .2419 .4199 .2583* .4412* .2571 .4033 .2257 .4057 .2513 .4479 .2759 .4481 .2766 1.56%  7.08%
DoubanMovie (R) .5266 .2911 .5498 .3114 -  - -  - -  - .6032* .3732* .5612 .3185 .5968 .3546 .6518 .4025 .6536 .4059 8.35%  8.76%
Task 1
()
DoubanBook (S) .3833 .2181 .3854 .2356 .3870 .2256 .4014 .2413 .4331 .2522* .4408* .2513 .4054 .2292 .4190 .2577 .4706 .2900 .4718 .2909 7.03%  15.34%
DoubanMovie (R) .5282 .2939 .5573 .3141 -  - -  - -  - .6080* .3721* .5750 .3595 .6013 .3596 .6566 .4014 .6582 .4043 8.26%  8.65%
Task 1
()
DoubanBook (S) .3899 .2182 .3871 .2340 .3956 .2264 .4079 .2436 .4468* .2647* .4318 .2461 .4180 .2344 .4346 .2610 .4758 .2896 .4771 .2899 6.78%  9.52%
DoubanMovie (R) .5411 .2991 .5612 .3254 -  - -  - -  - .6011* .3718* .5739 .3386 .6374 .3896 .6747 .4187 .6742 .4277 12.16%  15.03%
Task 1
()
DoubanBook (S) .3908 .2226 .3917 .2362 .4017 .2314 .4107 .2454 .4504* .2768* .4265 .2452 .4258 .2430 .4423 .2671 .4882 .3026 .4891 .3131 8.60%  13.11%
DoubanMovie (R) .5449 .3152 .5632 .3387 -  - -  - -  - .5998* .3649* .5825 .3553 .6416 .3941 .6817 .4205 .6802 .4249 13.40%  16.44%
Task 1
()
DoubanBook (S) .4012 .2310 .4046 .2451 .4171 .2532 .4111 .2431 .4523* .2814* .4317 .2510 .4225 .2439 .4490 .2691 .4995 .3098 .5011 .3121 10.79%  10.91%
DoubanMovie (R) .5512 .3301 .5776 .3505 -  - -  - -  - .5991* .3680* .5863 .3589 .6449 .3981 .6957 .4406 .6942 .4391 15.87%  19.32%
Task 2
()
DoubanMusic (S) .3135 .1703 .3127 .1812 .3227 .1895 .3259 .1894 .3579 .2034 .3614* .2117* .3302 .1930 .3690 .2109 .3852 .2166 .3871 .2231 7.11%  5.38%
DoubanMovie (R) .5266 .2911 .5498 .3114 -  - -  - -  - .5873* .3867* .5655 .3629 .5987 .3731 .6470 .3983 .6473 .4008 10.22%  3.65%
Task 2
()
DoubanMusic (S) .3190 .1731 .3170 .1891 .3121 .1761 .3261 .1901 .3612 .2137 .3663* .2213* .3451 .2092 .3706 .2037 .3947 .2256 .3976 .2330 8.54%  5.29%
DoubanMovie (R) .5282 .2939 .5573 .3141 -  - -  - -  - .5887* .3863* .5704 .3676 .6058 .3716 .6426 .3950 .6463 .4001 9.78%  3.57%
Task 2
()
DoubanMusic (S) .3198 .1771 .3218 .1912 .3141 .1844 .3271 .1931 .3701* .2202* .3607 .2201 .3463 .2050 .3789 .2056 .4133 .2318 .4165 .2449 12.53%  11.22%
DoubanMovie (R) .5411 .2991 .5612 .3254 -  - -  - -  - .5770* .3758* .5739 .3726 .6145 .3754 .6677 .4141 .6672 .4121 15.63%  9.66%
Task 2
()
DoubanMusic (S) .3242 .1791 .3267 .1926 .3324 .1916 .3304 .2001 .3882* .2323* .3571 .2109 .3466 .2045 .3812 .2144 .4384 .2489 .4402 .2527 13.40%  8.78%
DoubanMovie (R) .5449 .3152 .5632 .3387 -  - -  - -  - .5787* .3705* .5719 .3621 .6120 .3681 .6817 .4284 .6811 .4289 17.69%  15.76%
Task 2
()
DoubanMusic (S) .3314 .1810 .3301 .1971 .3412 .1954 .3452 .2074 .3946* .2430* .3580 .2132 .3520 .2117 .3996 .2207 .4491 .2604 .4496 .2669 13.94%  9.84%
DoubanMovie (R) .5512 .3301 .5776 .3505 -  - -  - -  - .5792* .3742 .5748 .3762* .6311 .3859 .7068 .4526 .7053 .4533 21.77%  20.49%
Task 3
()
DoubanMovie (S) .5266 .2911 .5498 .3114 .5514 .3156 .5762 .3347 .5987 .3487 .6387* .3628* .6070 .3522 .6140 .3572 .6486 .4005 .6491 .4032 16.28%  11.34%
MovieLens (R) .7818 .5024 .8115 .5219 -  - -  - -  - .8328* .5293* .8211 .5283 .8225 .5241 .8541 .5372 .8584 .5388 3.07%  1.79%
Task 3
()
DoubanMovie (S) .5282 .2939 .5573 .3141 .5631 .3213 .5816 .3438 .6031 .3580 .6391* .3606* .6100 .3518 .6266 .3710 .6514 .4018 .6526 .4056 2.11%  12.48%
MovieLens (R) .7901 .5084 .8143 .5212 -  - -  - -  - .8312* .5260* .8263 .5170 .8280 .5277 .8547 .5381 .8542 .5376 2.76%  2.21%
Task 3
()
DoubanMovie (S) .5411 .2991 .5612 .3254 .5721 .3347 .5821 .3447 .6108 .3733* .6530* .3631 .6137 .3460 .6310 .3776 .6598 .4087 .6603 .4123 1.11%  10.45%
MovieLens (R) .7978 .5124 .8180 .5231* -  - -  - -  - .8243* .5213 .8111 .5167 .8301 .5280 .8612 .5478 .8614 .5488 4.50%  4.91%
Task 3
()
DoubanMovie (S) .5449 .3152 .5632 .3387 .5704 .3327 .5926 .3559 .6186 .3754* .6477* .3605 .6200 .3544 .6423 .3841 .6654 .4101 .6665 .4158 2.90%  10.76%
MovieLens (R) .7935 .5149 .8231* .5277 -  - -  - -  - .8200 .5382* .8130 .5198 .8324 .5320 .8668 .5516 .8654 .5532 5.14%  2.79%
Task 3
()
DoubanMovie (S) .5512 .3301 .5776 .3505 .5912 .3741 .6142 .3904 .6314 .3927* .6521* .3642 .6222 .3714 .6489 .3792 .6812 .4198 .6838 .4312 4.86%  9.80%
MovieLens (R) .8042 .5205 .8319* .5344 -  - -  - -  - .8267 .5401* .8210 .5311 .8349 .5381 .8642 .5512 .8651 .5553 3.99%  2.81%
TABLE IV: The experimental results (HR@10 & NDCG@10) for Tasks 1, 2, and 3 (the best-performing baseline results are marked with *, while the best results of our models are marked in bold)

4.1.2 Parameter Setting

For a fair comparison, we optimize the parameters of our GA-DTCDR-P, GA-MTCDR-P, and GA-CDR+CSR-P, and those of the baselines, according to the parameter settings in their original papers. For the Graph Embedding Layer of the GA framework, we set the hyper-parameters of the Doc2vec and Node2vec models as suggested in [23, 6], along with the sampling probability $\theta$. In the Neural Network Layers of the GA framework, the parameters of the neural network are initialized from a Gaussian distribution. For training our GA-DTCDR-P, GA-MTCDR-P, and GA-CDR+CSR-P, we randomly select negative instances for each observed positive instance into $\mathbb{Y}^{-}_{i}$, adopt Adam [21] to train the neural network, and cap the maximum number of training epochs; the learning rate, the regularization coefficient $\lambda$, and the batch size are held fixed. To answer Q3, the dimension $k$ of the embedding is varied.

4.1.3 Evaluation Metrics

To evaluate the recommendation performance of our GA-DTCDR-P, GA-MTCDR-P, and GA-CDR+CSR-P models and the baseline models, we adopt the ranking-based evaluation strategy, i.e., leave-one-out evaluation, which has been widely used in the literature [53, 49]. For each test user, we choose the latest interaction with a test item as the test interaction, randomly sample a set of unobserved interactions for the test user, and then rank the test item among the sampled items. Leave-one-out evaluation includes two main metrics, i.e., Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG) [49]. HR@$K$ is the recall rate, while NDCG@$K$ measures the ranking quality, assigning higher scores to hits at top positions of the ranked list. Note that we only report HR@10 and NDCG@10 results in Results 1-3, and HR@$K$ and NDCG@$K$ results in Result 4.
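
The sketch below illustrates the leave-one-out computation of HR@$K$ and NDCG@$K$ for a single test user; the candidate-set size of 99 sampled negatives is a common convention in the cited literature and is assumed here for illustration.

```python
# A minimal sketch of per-user HR@K and NDCG@K under leave-one-out
# evaluation: rank the held-out test item among sampled unobserved items.
import numpy as np

def hr_ndcg_at_k(scores, test_index, k=10):
    """scores: predicted scores for the candidate list (test item plus
    sampled negatives); test_index: position of the test item."""
    rank_order = np.argsort(-scores)                       # descending
    rank = int(np.where(rank_order == test_index)[0][0])   # 0-based rank
    hit = 1.0 if rank < k else 0.0                         # HR@K
    ndcg = 1.0 / np.log2(rank + 2) if rank < k else 0.0    # NDCG@K, one hit
    return hit, ndcg

# Toy usage: one test item (index 0) among 99 sampled negatives.
rng = np.random.default_rng(0)
scores = rng.random(100)
scores[0] = 0.95              # the model scores the test item highly
print(hr_ndcg_at_k(scores, test_index=0, k=10))
```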

4.1.4 Comparison Methods

As shown in Table III, we compare our GA models with seven baseline models in three groups, i.e., (1) Single-Domain Recommendation (SDR), (2) Single-Target Cross-Domain Recommendation (CDR), and (3) Dual-Target CDR. All seven baselines are representative and/or state-of-the-art approaches for each group. Also, for the ablation study, in addition to GA-DTCDR and GA-DTCDR-P, we implement a simplified version of GA-DTCDR, i.e., GA-DTCDR_Average (replacing element-wise attention with a fixed combination strategy, i.e., average-pooling). For a clear comparison, in Table III, we list the detailed training data types, encoding strategies, embedding strategies, and transfer strategies of all the models implemented in the experiments.

Domain (HR@10 and NDCG@10 reported for each of the five embedding dimensions $k$, in increasing order)
HR NDCG HR NDCG HR NDCG HR NDCG HR NDCG
DoubanBook .4493 .2813 .4721 .2913 .4805 .3084 .4903 .3158 .5022 .3161
DoubanMusic .3889 .2274 .3995 .2419 .4174 .2452 .4377 .2598 .4429 .2671
DoubanMovie .6496 .4042 .6521 .4041 .6761 .4280 .6831 .4314 .7064 .4542
TABLE V: The experimental results of GA-MTCDR-P for Task 4
Domain/System (HR@10 and NDCG@10 reported for each of the five embedding dimensions $k$, in increasing order)
HR NDCG HR NDCG HR NDCG HR NDCG HR NDCG
DoubanBook .4503 .2794 .4701 .2877 .4781 .3063 .4884 .3121 .4991 .3187
DoubanMovie .6493 .4026 .6517 .4033 .6595 .4098 .6750 .4272 .6916 .4435
MovieLens .8512 .5249 .8533 .5303 .8594 .5486 .8685 .5536 .8632 .5527
TABLE VI: The experimental results of GA-CDR+CSR-P for Task 5

4.2 Performance Comparison and Analysis

(a) DoubanBook (HR@$K$)
(b) DoubanBook (NDCG@$K$)
(c) DoubanMovie (HR@$K$)
(d) DoubanMovie (NDCG@$K$)
Fig. 5: The results of Top-$K$ recommendation for Task 1
Datasets DoubanBook
Versions v1 v2 v3 v4
#Users 1,413 1,050 583 276
#Items 6,777 6,777 6,775 6,739
#Interactions 93,165 88,053 72,767 51,581
Density 0.97% 1.24% 1.84% 2.77%
HR@10  NDCG@10 .4746   .2928 .4827   .3021 .4911   .3097 .4860   .3077
Datasets DoubanMusic
Versions v1 v2 v3 v4
#Users 909 634 356 185
#Items 5,567 5,567 5,557 5,543
#Interactions 66,996 63,147 54,390 42,379
Density 1.32% 1.79% 2.75% 4.13%
HR@10  NDCG@10 .4361   .2502 .4457  .2590 .4505   .2617 .4448   .2581
Datasets DoubanMovie
Versions v1 v2 v3 v4
#Users 2,589 2,514 2,337 2,060
#Items 9,555 9,555 9,555 9,555
#Interactions 1,132,973 1,131,899 1,125,831 1,104,814
Density 4.58% 4.71% 5.04% 5.61%
HR@10  NDCG@10 .6822   .4234 .6836   .4246 .6910   .4345 .6815   .4232

#Common Users
727 444 198 58
TABLE VII: The experimental results of GA-MTCDR-P for Task 4 on different sparsity degrees of sub-datasets (density = 1 - sparsity)

4.2.1 Result 1: Performance Comparison (for Q1)

To answer Q1, we compare the performance of our GA-DTCDR-P with those of the seven baseline models. Note that for the SDR baselines, we train them in each domain and then report their performance in each domain; for the single-target CDR baselines, we train them in both domains and then only report their performance on the sparser domain; and for the dual-target CDR models, we train them in both domains and then report their performance in each domain.

Table IV shows the experimental results in terms of HR@10 and NDCG@10 with different embedding dimensions for Tasks 1, 2, and 3, respectively. As indicated in Table IV, our GA-DTCDR-P outperforms all the SDR, single-target CDR, and dual-target CDR baselines by an average improvement of 9.04%. In particular, our GA-DTCDR-P improves the best-performing baselines (with results marked by * in Table IV) by an average of 10.85% for Task 1, an average of 11.21% for Task 2, and an average of 5.06% for Task 3. This is because our GA-DTCDR-P effectively leverages the richness and diversity of the information in both domains, and intelligently and effectively combines the embeddings of common users.

Tables V and VI show the experimental results of our GA-MTCDR-P and GA-CDR+CSR-P with different embedding dimensions for Tasks 4 and 5, respectively. Compared with the results of the seven baselines in Table IV, our GA-MTCDR-P and GA-CDR+CSR-P improve the best-performing baselines by an average of 9.21% (the general improvement of our GA-DTCDR-P is 9.04%). This means that, in general, the recommendation performance in all datasets improves as more datasets join in, hence avoiding negative transfer to some extent.

4.2.2 Result 2: Ablation Study (for Q2)

To answer Q2, we implement a variant of our preliminary GA-DTCDR, i.e., GA-DTCDR_Average, by replacing the element-wise attention with average-pooling, which demonstrates the detailed contribution of the element-wise attention in our GA models. Average-pooling is the combination strategy used by the existing dual-target CDR approaches [69], which gives equal weight, i.e., 0.5, to each of the embeddings of a common user learned from the dual domains. Additionally, to demonstrate the detailed contribution of our proposed personalized training strategy, we also compare the performance of GA-DTCDR with that of GA-DTCDR-P in this section.

On the one hand, as we can see from Table IV, with the element-wise attention, our preliminary GA-DTCDR improves GA-DTCDR_Average by an average of 6.76%. This means that element-wise attention plays a very important role in our GA-DTCDR and the existing fixed combination strategies can hardly achieve an effective embedding optimization in each target dataset.

On the other hand, compared with our preliminary GA-DTCDR, our GA-DTCDR-P achieves an average improvement of 0.54% (according to the results in Table IV). This result indicates that our personalized training strategy can further improve the recommendation accuracy over our preliminary DTCDR models and can alleviate negative transfer.

4.2.3 Result 3: Impact of Embedding Dimension (for Q3)

To answer Q3, we analyze the effect of the embedding dimension on the performance of our preliminary GA-DTCDR, GA-DTCDR-P, GA-MTCDR-P, and GA-CDR+CSR-P, as reported in Tables IV, V, and VI. In general, in terms of HR@10 and NDCG@10, the recommendation accuracy of our GA models increases with the embedding dimension, because a larger embedding can represent a user/item more accurately. However, given the structure of the neural network layers described in Parameter Setting, the training time of our GA models also increases with the embedding dimension. This is a trade-off between accuracy and efficiency. Therefore, considering both aspects, we adopt the embedding dimension that best balances the two in our experiments.

4.2.4 Result 4: Top-N Recommendation (for Q4)

To answer Q4, we compare the performance of top-N recommendation in terms of HR@N and NDCG@N over a range of values of N. In fact, the performance trends of all top-N experiments (for all the tasks with different N) are similar. Thus, due to space limitations, we only report the top-N recommendation results of all seven baseline models, GA-DTCDR_Average, and GA-DTCDR for Task 1. As shown in Fig. 5, in both DoubanBook (the sparser domain) and DoubanMovie (the richer domain), our preliminary GA-DTCDR consistently outperforms all seven baselines. On DoubanBook, across all top-N settings, GA-DTCDR improves the best-performing baselines by an average of 1.74% for HR@N and 5.83% for NDCG@N; on DoubanMovie, the corresponding improvements are 8.13% for HR@N and 7.55% for NDCG@N.
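
For reference, HR@N and NDCG@N in this leave-one-out style of evaluation are typically computed as below. This is a standard-definition sketch (the helper name hr_ndcg_at_n is ours), not code from the paper:

```python
import math

def hr_ndcg_at_n(ranked_items: list, ground_truth, n: int):
    """HR@N is 1 if the held-out item appears in the top-N list, else 0;
    NDCG@N rewards hits at higher ranks via log2 discounting."""
    top_n = ranked_items[:n]
    if ground_truth not in top_n:
        return 0.0, 0.0
    rank = top_n.index(ground_truth)       # 0-based position of the hit
    return 1.0, 1.0 / math.log2(rank + 2)  # NDCG = 1 / log2(rank + 2)
```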

4.2.5 Result 5: Impact of Sparsity and Overlap Scale (for Q5)

To answer Q5, we extract sub-datasets with different sparsity degrees to analyze the effect of sparsity and overlap scale on the performance of our models. Due to space limitations, we only report the experimental results of GA-MTCDR-P for Task 4 on the different sub-datasets in Table VII. From the table, we find that, in general, the recommendation performance increases with the density of the sub-datasets. However, the results on Dataset Version v3 are better than those on Dataset Version v4. This is because the number of common users, i.e., the overlap scale, decreases significantly as density increases. Therefore, according to the experimental results, the recommendation performance of our GA-MTCDR-P generally increases with both density (i.e., decreases with sparsity) and overlap scale.

5 Conclusion and Future Work

In this paper, we have proposed a unified framework, called GA (based on Graph embedding and Attention techniques), for the dual-target CDR (GA-DTCDR-P), multi-target CDR (GA-MTCDR-P), and CDR+CSR (GA-CDR+CSR-P) scenarios. In our GA framework, the element-wise attention mechanism and the personalized training strategy effectively improve the recommendation accuracy in all datasets and avoid negative transfer to some extent. We have also conducted extensive experiments that demonstrate the superior performance of our proposed GA models. In the future, we plan to explore more training strategies to further alleviate negative transfer.

References

  • [1] D. Bahdanau, K. Cho, and Y. Bengio (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
  • [2] S. Berkovsky, T. Kuflik, and F. Ricci (2007) Cross-domain mediation in collaborative filtering. In International Conference on User Modeling, pp. 355–359.
  • [3] J. Chen, H. Zhang, X. He, L. Nie, W. Liu, and T. Chua (2017) Attentive collaborative filtering: multimedia recommendation with item- and component-level attention. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 335–344.
  • [4] I. Fernández-Tobías and I. Cantador (2014) Exploiting social tags in matrix factorization models for cross-domain collaborative filtering. In CBRecSys@RecSys, pp. 34–41.
  • [5] W. Fu, Z. Peng, S. Wang, Y. Xu, and J. Li (2019) Deeply fusing reviews and contents for cold start users in cross-domain recommendation systems. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 94–101.
  • [6] A. Grover and J. Leskovec (2016) node2vec: scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864.
  • [7] F. M. Harper and J. A. Konstan (2016) The MovieLens datasets: history and context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5 (4), pp. 19.
  • [8] M. He, J. Zhang, P. Yang, and K. Yao (2018) Robust transfer learning for cross-domain collaborative filtering using multiple rating patterns approximation. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 225–233.
  • [9] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T. Chua (2017) Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web, pp. 173–182.
  • [10] B. Hu, C. Shi, W. X. Zhao, and P. S. Yu (2018) Leveraging meta-path based context for top-N recommendation with a neural co-attention model. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1531–1540.
  • [11] G. Hu, Y. Zhang, and Q. Yang (2018) CoNet: collaborative cross networks for cross-domain recommendation. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 667–676.
  • [12] G. Hu, Y. Zhang, and Q. Yang (2019) Transfer meets hybrid: a synthetic approach for cross-domain collaborative filtering with text. In The World Wide Web Conference, pp. 2822–2829.
  • [13] L. Huang, Z. Zhao, C. Wang, D. Huang, and H. Chao (2019) LSCD: low-rank and sparse cross-domain recommendation. Neurocomputing 366, pp. 86–96.
  • [14] Y. Huang, L. Chu, Z. Zhou, L. Wang, J. Liu, J. Pei, and Y. Zhang (2020) Personalized federated learning: an attentive collaboration approach. arXiv preprint arXiv:2007.03797.
  • [15] M. Jiang, P. Cui, X. Chen, F. Wang, W. Zhu, and S. Yang (2015) Social recommendation with cross-domain transferable knowledge. IEEE Transactions on Knowledge and Data Engineering 27 (11), pp. 3084–3097.
  • [16] M. Jiang, P. Cui, F. Wang, Q. Yang, W. Zhu, and S. Yang (2012) Social recommendation across multiple relational domains. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 1422–1431.
  • [17] M. Jiang, P. Cui, N. J. Yuan, X. Xie, and S. Yang (2016) Little is much: bridging cross-platform behaviors through overlapped crowds. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 30.
  • [18] H. Kanagawa, H. Kobayashi, N. Shimizu, Y. Tagami, and T. Suzuki (2019) Cross-domain recommendation via deep domain adaptation. In European Conference on Information Retrieval, pp. 20–29.
  • [19] S. Kang, J. Hwang, D. Lee, and H. Yu (2019) Semi-supervised learning for cross-domain recommendation to cold-start users. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 1563–1572.
  • [20] Y. Kang, S. Gai, F. Zhao, D. Wang, and A. Tang (2020) Deep transfer collaborative filtering with geometric structure preservation for cross-domain recommendation. In 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8.
  • [21] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • [22] J. B. Kruskal (1978) Multidimensional Scaling. Sage.
  • [23] Q. Le and T. Mikolov (2014) Distributed representations of sentences and documents. In ICML, pp. 1188–1196.
  • [24] L. Li, Q. Do, and W. Liu (2019) Cross-domain recommendation via coupled factorization machines. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 9965–9966.
  • [25] P. Li and A. Tuzhilin (2019) DDTCDR: deep dual transfer cross domain recommendation. arXiv preprint arXiv:1910.05189.
  • [26] P. Li and A. Tuzhilin (2021) Dual metric learning for effective and efficient cross-domain recommendations. IEEE Transactions on Knowledge and Data Engineering.
  • [27] B. Liu, Y. Wei, Y. Zhang, Z. Yan, and Q. Yang (2018) Transferable contextual bandit for cross-domain recommendation. In Thirty-Second AAAI Conference on Artificial Intelligence, pp. 3619–3626.
  • [28] G. Liu, Y. Liu, K. Zheng, A. Liu, Z. Li, Y. Wang, and X. Zhou (2017) MCS-GPM: multi-constrained simulation based graph pattern matching in contextual social graphs. IEEE Transactions on Knowledge and Data Engineering 30 (6), pp. 1050–1064.
  • [29] J. Liu, P. Zhao, F. Zhuang, Y. Liu, V. S. Sheng, J. Xu, X. Zhou, and H. Xiong (2020) Exploiting aesthetic preference in deep cross networks for cross-domain recommendation. In Proceedings of The Web Conference 2020, pp. 2768–2774.
  • [30] M. Liu, J. Li, G. Li, and P. Pan (2020) Cross domain recommendation via bi-directional transfer graph collaborative filtering networks. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 885–894.
  • [31] Y. Lu, R. Dong, and B. Smyth (2018) Why I like it: multi-task learning for recommendation and explanation. In Proceedings of the 12th ACM Conference on Recommender Systems, pp. 4–12.
  • [32] T. Man, H. Shen, X. Jin, and X. Cheng (2017) Cross-domain recommendation: an embedding and mapping approach. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, pp. 2464–2470.
  • [33] C. D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. J. Bethard, and D. McClosky (2014) The Stanford CoreNLP toolkit. In ACL System Demonstrations, pp. 55–60.
  • [34] J. Manotumruksa, D. Rafailidis, C. Macdonald, and I. Ounis (2019) On cross-domain transfer in venue recommendation. In European Conference on Information Retrieval, pp. 443–456.
  • [35] O. Moreno, B. Shapira, L. Rokach, and G. Shani (2012) TALMUD: transfer learning for multiple domains. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 425–434.
  • [36] S. J. Pan and Q. Yang (2009) A survey on transfer learning. TKDE 22 (10), pp. 1345–1359.
  • [37] W. Pan and Q. Yang (2013) Transfer learning in heterogeneous collaborative filtering domains. Artificial Intelligence 197, pp. 39–55.
  • [38] B. Perozzi, R. Al-Rfou, and S. Skiena (2014) DeepWalk: online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710.
  • [39] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme (2009) BPR: Bayesian personalized ranking from implicit feedback. In UAI, pp. 452–461.
  • [40] S. Sahebi and T. Walker (2014) Content-based cross-domain recommendations using segmented models. In CBRecSys@RecSys, pp. 57–64.
  • [41] B. Shapira, L. Rokach, and S. Freilikhman (2013) Facebook single and cross domain data for recommendation systems. User Modeling and User-Adapted Interaction 23 (2-3), pp. 211–247.
  • [42] Y. Shi, M. Larson, and A. Hanjalic (2011) Tags as bridges between domains: improving recommendation with tag-induced cross-domain collaborative filtering. In International Conference on User Modeling, Adaptation, and Personalization, pp. 305–316.
  • [43] S. Sopchoke, K. Fukui, and M. Numao (2018) Explainable cross-domain recommendations through relational learning. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • [44] S. Tan, J. Bu, X. Qin, C. Chen, and D. Cai (2014) Cross domain recommendation based on multi-type media fusion. Neurocomputing 127, pp. 124–134.
  • [45] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei (2015) LINE: large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, pp. 1067–1077.
  • [46] J. Tang, S. Wu, J. Sun, and H. Su (2012) Cross-domain collaboration recommendation. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1285–1293.
  • [47] Y. Tay, A. T. Luu, and S. C. Hui (2018) Multi-pointer co-attention networks for recommendation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2309–2318.
  • [48] J. Wang and J. Lv (2020) Tag-informed collaborative topic modeling for cross domain recommendations. Knowledge-Based Systems 203, pp. 106–119.
  • [49] X. Wang, X. He, Y. Cao, M. Liu, and T. Chua (2019) KGAT: knowledge graph attention network for recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 950–958.
  • [50] Y. Wang, C. Feng, C. Guo, Y. Chu, and J. Hwang (2019) Solving the sparsity problem in recommendations via cross-domain item embedding based on co-clustering. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pp. 717–725.
  • [51] S. Wold, K. Esbensen, and P. Geladi (1987) Principal component analysis. Chemometrics and Intelligent Laboratory Systems 2 (1-3), pp. 37–52.
  • [52] X. Xin, Z. Liu, C. Lin, H. Huang, X. Wei, and P. Guo (2015) Cross-domain collaborative filtering with review text. In Twenty-Fourth International Joint Conference on Artificial Intelligence, pp. 1827–1834.
  • [53] H. Xue, X. Dai, J. Zhang, S. Huang, and J. Chen (2017) Deep matrix factorization models for recommender systems. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, pp. 3203–3209.
  • [54] S. Yan, D. Xu, B. Zhang, and H. Zhang (2005) Graph embedding: a general framework for dimensionality reduction. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), Vol. 2, pp. 830–837.
  • [55] L. Yang, B. Tan, V. W. Zheng, K. Chen, and Q. Yang (2020) Federated recommendation systems. In Federated Learning, pp. 225–239.
  • [56] Y. Yang, Z. Guan, J. Li, J. Huang, and W. Zhao (2020) Interpretable and efficient heterogeneous graph convolutional network. arXiv preprint arXiv:2005.13183.
  • [57] F. Yuan, L. Yao, and B. Benatallah (2019) DARec: deep domain adaptation for cross-domain recommendation via transferring rating patterns. arXiv preprint arXiv:1905.10760.
  • [58] Q. Zhang, P. Hao, J. Lu, and G. Zhang (2019) Cross-domain recommendation with semantic correlation in tagging systems. In 2019 International Joint Conference on Neural Networks (IJCNN), pp. 1–8.
  • [59] Q. Zhang, J. Lu, and G. Zhang (2020) Cross-domain recommendation with multiple sources. In 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–7.
  • [60] Q. Zhang, D. Wu, J. Lu, F. Liu, and G. Zhang (2017) A cross-domain recommender system with consistent information transfer. Decision Support Systems 104, pp. 49–63.
  • [61] Q. Zhang, D. Wu, J. Lu, and G. Zhang (2018) Cross-domain recommendation with probabilistic knowledge transfer. In International Conference on Neural Information Processing, pp. 208–219.
  • [62] Y. Zhang, B. Cao, and D. Yeung (2012) Multi-domain collaborative filtering. arXiv preprint arXiv:1203.3535.
  • [63] Z. Zhang, X. Jin, L. Li, G. Ding, and Q. Yang (2016) Multi-domain active learning for recommendation. In Thirtieth AAAI Conference on Artificial Intelligence, pp. 2358–2364.
  • [64] C. Zhao, C. Li, R. Xiao, H. Deng, and A. Sun (2020) CATN: cross-domain recommendation for cold-start users via aspect transfer network. arXiv preprint arXiv:2005.10549.
  • [65] J. Zhao, Z. Zhou, Z. Guan, W. Zhao, W. Ning, G. Qiu, and X. He (2019) IntentGC: a scalable graph convolution framework fusing heterogeneous information for recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2347–2357.
  • [66] L. Zhao, S. J. Pan, E. W. Xiang, E. Zhong, Z. Lu, and Q. Yang (2013) Active transfer learning for cross-system recommendation. In Twenty-Seventh AAAI Conference on Artificial Intelligence.
  • [67] L. Zhao, S. J. Pan, and Q. Yang (2017) A unified framework of active transfer learning for cross-system recommendation. Artificial Intelligence 245, pp. 38–55.
  • [68] J. Zhou, G. Cui, Z. Zhang, C. Yang, Z. Liu, and M. Sun (2018) Graph neural networks: a review of methods and applications. arXiv preprint arXiv:1812.08434.
  • [69] F. Zhu, C. Chen, Y. Wang, G. Liu, and X. Zheng (2019) DTCDR: a framework for dual-target cross-domain recommendation. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 1533–1542.
  • [70] F. Zhu, Y. Wang, C. Chen, G. Liu, M. A. Orgun, and J. Wu (2018) A deep framework for cross-domain and cross-system recommendations. In IJCAI International Joint Conference on Artificial Intelligence, pp. 3711–3717.
  • [71] F. Zhu, Y. Wang, C. Chen, G. Liu, and X. Zheng (2020) A graphical and attentional framework for dual-target cross-domain recommendation. In 29th International Joint Conference on Artificial Intelligence, pp. 3001–3008.