1. Introduction
Recommender systems have become a key component for online marketplaces seeking to understand user preferences. Collaborative filtering (CF) (Sarwar et al., 2001) approaches, especially matrix factorization (MF) (Koren et al., 2009) methods, constitute the cornerstone of helping users discover satisfying products. However, these models suffer from the cold-start and data sparsity problems (Schein et al., 2002; Adomavicius and Tuzhilin, 2005), for we only have access to a small fraction of past transactions, which makes it hard to model user preferences accurately and efficiently.
To address these problems, researchers propose to use cross-domain recommendation (Fernández-Tobías et al.) through transfer learning (Pan and Yang, 2010) approaches that learn user preferences in the source domain and transfer them to the target domain. For instance, if a user watches a certain movie, we can recommend to that user the original novel on which the movie is based. Most of the methods along this line focus on unidirectional transfer learning from the source domain to the target domain and achieve great recommendation performance (Fernández-Tobías et al.). In addition, it is beneficial to also transfer user preferences in the other direction via dual transfer learning (Long et al., 2012; Zhong et al., 2009; Wang et al., 2011). For example, once we know the type of books that a user likes to read, we can recommend movies on related topics, forming a loop that yields better recommendations in both domains (Li et al., 2009).
However, previous dual transfer models focus only on explicit information about users and items, without considering the latent and complex relations between them. Besides, they do not include user and item features during the recommendation process, which limits the recommender system's ability to provide satisfying results. Latent embedding methods, on the other hand, constitute a powerful tool for extracting latent user preferences from the data record and modeling user and item features efficiently (Zhang et al., 2019). Therefore, it is crucially important to design a model that utilizes a latent embedding approach to conduct dual transfer learning for cross-domain recommendations.
In this paper, we apply the idea of deep dual transfer learning to cross-domain recommendations using latent embeddings of user and item features. We assume that if two users have similar preferences in a certain domain, their preferences should also be similar across other domains. We address this assumption by proposing a unifying mechanism that extracts the essence of preference information in each domain and improves the recommendations for both domains simultaneously through a better understanding of user preferences.
Specifically, we propose the Deep Dual Transfer Cross Domain Recommendation (DDTCDR) model, which learns latent orthogonal mappings across domains and provides cross-domain recommendations leveraging user preferences from all domains. Furthermore, we empirically demonstrate that by iteratively updating the dual recommendation model, we simultaneously improve recommendation performance in both domains and surpass all baseline models. Compared to previous approaches, the proposed model has the following advantages:

It transfers latent representations of features and user preferences, instead of explicit information, between the source domain and the target domain to capture latent and complex interactions while also modeling feature information.

It utilizes a deep dual transfer learning mechanism to enable bidirectional transfer of user preferences that improves recommendation performance in both domains simultaneously over time.

It learns a latent orthogonal mapping function between the two domains that preserves similarities of user preferences and allows the inverse mapping function to be computed efficiently.
In this paper, we make the following contributions:

We propose to apply the combination of the dual transfer learning mechanism and the latent embedding approach to the problem of cross-domain recommendations.

We empirically demonstrate that the proposed model outperforms state-of-the-art approaches and improves recommendation accuracy across multiple domains and experimental settings.

We theoretically demonstrate the convergence condition for the simplified case of our model and empirically show that the proposed model stabilizes and converges after several iterations.

We illustrate that the proposed model can be easily extended to multiple-domain recommendation applications.
2. Related Work
The proposed model stems from two research directions: cross-domain recommendations and deep learning-based recommendations. We also discuss the literature on dual transfer learning and how it motivates the combination of the dual transfer learning mechanism and latent embedding methods for cross-domain recommendations.
2.1. Cross Domain and Transfer Learning-based Recommendations
The cross-domain recommendation approach (Fernández-Tobías et al.) constitutes a powerful tool for dealing with the data sparsity problem. Typical cross-domain recommendation models are extended from single-domain recommendation models, including CMF (Singh and Gordon, 2008), CDCF (Li et al., 2009; Hu et al., 2013), CDFM (Loni et al., 2014), Canonical Correlation Analysis (Sahebi et al., 2017; Sahebi and Brusilovsky, 2015; Sahebi and Walker, 2014; Sahebi and Brusilovsky, 2013) and Dual Regularization (Wu et al., 2018). These approaches assume that different patterns characterize the way users interact with items of a certain domain and allow interaction information from an auxiliary domain to inform recommendation in a target domain.
The idea of information fusion also motivates the use of transfer learning (Pan and Yang, 2010), which transfers extracted information from the source domain to the target domain. Specifically, researchers (Hu et al., 2018; Lian et al., 2017) propose to learn user preferences in the source domain and transfer this preference information into the target domain for a better understanding of user preferences. These models have achieved great success in addressing the cold-start problem and enhancing recommendation performance.
However, these models do not fundamentally address the relationship between different domains, for they do not improve recommendation performance in both domains simultaneously and thus might not realize the full potential of the cross-domain user interaction information. They also do not explicitly model user and item features during the recommendation process. In this paper, we propose a novel dual transfer learning mechanism, combined with autoencoders, to overcome these issues and significantly improve recommendation performance.
2.2. Dual Transfer Learning
Transfer learning (Pan and Yang, 2010) deals with situations where data obtained from different sources are distributed differently. It assumes the existence of a common knowledge structure that defines the domain relatedness, and incorporates this structure into the learning process by discovering a shared latent feature space in which the data distributions across domains are close to each other. Existing transfer learning methods for cross-domain recommendation include Collaborative Dual-PLSA (Zhuang et al., 2010), Joint Subspace Nonnegative Matrix Factorization (Liu et al., 2013) and JDA (Long et al., 2013), which learn latent factors and associations spanning a shared subspace where the marginal distributions across domains are close.
In addition, to exploit the duality between these two distributions and enhance the transfer capability, researchers have proposed the dual transfer learning mechanism (Long et al., 2012; Zhong et al., 2009; Wang et al., 2011), which simultaneously learns the marginal and conditional distributions. Recently, researchers have achieved great performance in machine translation with the dual-learning mechanism (He et al., 2016; Xia et al., 2017; Wang et al., 2018). All these successful applications underscore the importance of exploiting duality for mutual reinforcement. However, none of them apply the dual transfer learning mechanism to cross-domain recommendation problems, where the duality lies in the symmetrical correlation between source domain and target domain user preferences. In this paper, we utilize a novel dual-learning mechanism and significantly improve recommendation performance.
2.3. Deep Learning-based Recommendations
Recently, deep learning has dramatically reshaped recommendation architectures and brought new opportunities to improve the performance of existing recommender systems. To capture the latent relationships between users and items, researchers propose deep learning based recommender systems (Zhang et al., 2019; Wang et al., 2015), especially embedding methods (He et al., 2017) and autoencoding methods (Sedhain et al., 2015; Wu et al., 2016; Li and She, 2017; Liang et al., 2018), to extract the latent essence of user-item interactions for a better understanding of user preferences.
However, user preferences in different domains are learned separately, without exploiting the duality for mutual reinforcement: researchers have not yet combined deep learning methods with the dual transfer learning mechanism in recommender systems, which would learn user preferences from different domains simultaneously and further improve recommendation performance. To this end, we propose a dual transfer collaborative filtering model that captures latent interactions across different domains. The effectiveness of dual transfer learning over existing methods is demonstrated by extensive experimental evaluation.
3. Method
In this section, we present the proposed Deep Dual Transfer Cross Domain Recommendation (DDTCDR) model. First, we construct feature embeddings from the user and item features using the autoencoding technique. Then we design a latent orthogonal mapping function for transferring feature embeddings between the two domains through the dual transfer learning mechanism. Moreover, we theoretically demonstrate convergence of the proposed model under certain assumptions. The complete architecture of the DDTCDR system is illustrated in Figure 1: we take user and item features as input and map them into feature embeddings, from which we compute the within-domain and cross-domain user preferences and provide recommendations. Important mathematical notations used in our model are listed in Table 1. We explain the details in the following sections.
Symbol  Description

$x_u$  User Features
$x_i$  Item Features
$W_u$  User Feature Embeddings
$W_i$  Item Feature Embeddings
$\gamma$  Learning Rate
$\hat{r}$  Estimated Ratings
$\hat{r}^{within}$  Within-Domain Estimated Ratings
$\hat{r}^{cross}$  Cross-Domain Estimated Ratings
$X$  Latent Orthogonal Mapping
$X^{\top}$  Transpose of the Latent Orthogonal Mapping
$RS$  Domain-Specific Recommender System
$AE$  Domain-Specific Autoencoder
$\alpha$  Hyperparameter in the Hybrid Utility Function
3.1. Feature Embeddings
To effectively extract latent user preferences and efficiently model the features of users and items, we present an autoencoder framework that learns latent representations of user and item features and transforms the heterogeneous and discrete feature vectors into continuous feature embeddings. We denote the feature vector of user $u$ as $x_u \in \mathbb{R}^{d_u}$ and the feature vector of item $i$ as $x_i \in \mathbb{R}^{d_i}$, where $d_u$ and $d_i$ stand for the dimensionality of the user and item feature vectors respectively. The goal is to train two separate neural networks: an encoder that maps feature vectors into latent embeddings, and a decoder that reconstructs feature vectors from latent embeddings. For the effectiveness and efficiency of the training process, we formulate both the encoder and the decoder as multi-layer perceptrons (MLP). The MLP learns the hidden representations by optimizing the reconstruction loss:

(1)  $\mathcal{L}_{rec} = \|x - f_{dec}(f_{enc}(x))\|^2$

where $f_{enc}$ and $f_{dec}$ represent the MLP networks for the encoder and decoder respectively. Note that in this step we train the autoencoders separately for users and items in the two domains to avoid information leakage between the domains.
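As a deliberately tiny illustration of this step, the sketch below builds a one-layer MLP encoder and decoder with random (untrained) weights and evaluates the reconstruction loss of equation (1); the dimensionalities and the weight initialization are hypothetical stand-ins, not the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_layer(x, W, b):
    """Single MLP layer with a ReLU activation."""
    return np.maximum(0.0, x @ W + b)

# Hypothetical sizes: d_u-dimensional user features, k-dimensional embedding.
d_u, k = 20, 8
W_enc, b_enc = rng.normal(scale=0.1, size=(d_u, k)), np.zeros(k)
W_dec, b_dec = rng.normal(scale=0.1, size=(k, d_u)), np.zeros(d_u)

def encode(x):
    """Encoder f_enc: feature vector -> latent embedding."""
    return mlp_layer(x, W_enc, b_enc)

def decode(w):
    """Decoder f_dec: latent embedding -> reconstructed features."""
    return w @ W_dec + b_dec  # linear output layer for reconstruction

x_u = rng.random(d_u)                      # a raw user feature vector
w_u = encode(x_u)                          # latent feature embedding
loss = np.sum((x_u - decode(w_u)) ** 2)    # reconstruction loss of Eq. (1)
```

In training, this loss would be minimized over the encoder and decoder weights, separately per domain and per entity type (users vs. items).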
3.2. Latent Orthogonal Mapping
In this section we introduce the latent orthogonal mapping function for transferring user preferences from the source domain to the target domain. The fundamental assumption of our proposed approach is that if two users have similar preferences in the source domain, their preferences should also be similar in the target domain. This implies that we need to preserve similarities between users during transfer learning across domains. In particular, we propose to use a latent orthogonal matrix $X$ to transfer information across domains, for two reasons. First, it preserves similarities between user embeddings across different latent spaces, since an orthogonal transformation preserves inner products of vectors. Second, it automatically provides the inverse mapping matrix as $X^{-1} = X^{\top}$, because $X^{\top}X = I$ holds for any orthogonal mapping matrix $X$, making the inverse mapping matrix equivalent to its transpose. This simplifies the learning procedure and reduces the complexity of the recommendation model.

Using the latent orthogonal mapping, we can transfer user preferences from one domain to the other. In the next section, we introduce the model that utilizes this latent orthogonal mapping to construct the dual learning mechanism for cross-domain recommendations.
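Both properties are easy to check numerically. In the sketch below, a random orthogonal matrix (obtained via QR decomposition, a standard construction used here only for illustration, not how the mapping is learned in our model) preserves inner products and inverts via its transpose.

```python
import numpy as np

rng = np.random.default_rng(42)
k = 8  # embedding dimensionality (illustrative)

# A random k x k orthogonal matrix via QR decomposition.
X, _ = np.linalg.qr(rng.normal(size=(k, k)))

u1, u2 = rng.random(k), rng.random(k)  # two user embeddings

# Orthogonal maps preserve inner products: <X u1, X u2> = <u1, u2>.
assert np.isclose((X @ u1) @ (X @ u2), u1 @ u2)

# The inverse mapping is just the transpose: X^T X = I.
assert np.allclose(X.T @ X, np.eye(k))

# Round trip: map to the target domain and back without loss.
assert np.allclose(X.T @ (X @ u1), u1)
```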
3.3. Deep Dual Transfer Learning
As an important tool for providing recommendations, matrix factorization methods associate user-item pairs with a shared latent space and use latent feature vectors to represent users and items. To model latent interactions between users and items as well as feature information, it is natural to generalize matrix factorization methods using latent embedding approaches. However, in the cross-domain recommendation setting, we may not achieve optimal performance by uniformly applying a neural network model to all domains of users and items, due to the cold-start and sparsity problems.
To address these problems and improve recommendation performance, we propose to combine the dual transfer learning mechanism with cross-domain recommendations, transferring user preferences between the source domain and the target domain simultaneously. Consider two different domains $A$ and $B$ that contain user-item interactions as well as user and item features. In real business practice, the user group of $A$ often overlaps with that of $B$, meaning that some users have purchased items in both domains, while there is no overlap of items between the two domains, for each item belongs to one single domain. One crucial assumption we make here is that if two users have similar preferences in $A$, they are supposed to have similar preferences in $B$ as well, indicating that we can learn and improve user preferences in both domains simultaneously over time. Therefore, to obtain a better understanding of user preferences in $A$, we also utilize external user preference information from $B$ and combine the two. Similarly, we can obtain better recommendation performance in $B$ if we utilize user preference information from $A$ in the same step. To leverage the duality of the two transfer learning based recommendation models and to improve the effectiveness of both tasks simultaneously, we conduct the dual transfer learning recommendations for the two models together and learn the latent mapping function accordingly.
Specifically, we propose to model user preferences using two components: the within-domain preference, which captures user interactions and predicts user behaviors in the target domain, and the cross-domain preference, which utilizes user actions from the source domain. We also introduce the transfer rate $\alpha$ as a hyperparameter, which represents the relative importance of the two components in the prediction of user preferences. We propose to estimate user ratings in the domain pair as follows:
(2)  $\hat{r}_{A} = (1-\alpha)\,RS_{A}(W_{u_A}, W_{i_A}) + \alpha\,RS_{B}(X W_{u_A}, W_{i_A})$

(3)  $\hat{r}_{B} = (1-\alpha)\,RS_{B}(W_{u_B}, W_{i_B}) + \alpha\,RS_{A}(X^{\top} W_{u_B}, W_{i_B})$
where $W_u$ and $W_i$ represent the user and item embeddings, and $RS_A$ and $RS_B$ stand for the neural recommendation models for domains A and B respectively. The first term in each equation computes within-domain user preferences from user and item features in the same domain, while the second term denotes cross-domain user preferences obtained through the latent orthogonal mapping function to capture the heterogeneity of the different domains. We use a weighted linear combination for the rating estimations in equations (2) and (3). When $\alpha = 0$ or there is no user overlap between the two domains, the dual learning model degenerates to two separate single-domain recommendation models; when $\alpha = 0.5$ and $X$ is the identity mapping, the dual learning model degenerates to one universal recommendation model across the two domains. For users that only appear in one domain, the hyperparameter $\alpha$ is set to 0. For users that appear in both domains, $\alpha$ should normally take a positive value between 0 and 0.2 based on the experimental results in Section 5.3, which indicates that within-domain preferences play the major role in the understanding of user behavior.
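A minimal sketch of this hybrid estimation follows. The linear scorers standing in for the neural recommenders $RS_A$ and $RS_B$, and the specific value of the transfer rate, are illustrative assumptions; in the actual model the recommenders are trained networks.

```python
import numpy as np

rng = np.random.default_rng(1)
k, alpha = 8, 0.1  # embedding size and transfer rate (alpha in [0, 0.2])

X, _ = np.linalg.qr(rng.normal(size=(k, k)))  # latent orthogonal mapping

# Stand-ins for the neural recommenders: each scores a (user embedding,
# item embedding) pair with a bilinear form; real RS_A / RS_B are MLPs.
P_A = rng.normal(scale=0.1, size=(k, k))
P_B = rng.normal(scale=0.1, size=(k, k))
RS_A = lambda w_u, w_i: float(w_u @ P_A @ w_i)
RS_B = lambda w_u, w_i: float(w_u @ P_B @ w_i)

def predict_A(w_u, w_i):
    """Hybrid estimate: within-domain term plus mapped cross-domain term."""
    return (1 - alpha) * RS_A(w_u, w_i) + alpha * RS_B(X @ w_u, w_i)

def predict_B(w_u, w_i):
    """The dual direction uses the inverse mapping, i.e. the transpose X^T."""
    return (1 - alpha) * RS_B(w_u, w_i) + alpha * RS_A(X.T @ w_u, w_i)

w_u, w_i = rng.random(k), rng.random(k)
r_hat = predict_A(w_u, w_i)
```

Setting `alpha = 0` recovers two independent single-domain recommenders, mirroring the degenerate case described above.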
As shown in Figure 1 and equations (2) and (3), dual transfer learning entails a transfer loop across the two domains, and the learning process goes through this loop iteratively. It is therefore important to study the convergence property of the model, which we discuss in the next section.
3.4. Convergence Analysis
In this section, we present the convergence theorem for matrix factorization with the dual transfer learning mechanism. We denote the rating matrices of the two domains A and B as $R_A$ and $R_B$ respectively. The goal is to find approximate nonnegative factorizations that simultaneously minimize the reconstruction losses. We apply the multiplicative update algorithm (Lin, 2007) to the two reconstruction losses iteratively over the factor matrices $U_A$, $V_A$, $U_B$ and $V_B$:

(4)  $\min \; \|R_A - [(1-\alpha)U_A + \alpha X U_B]V_A^{\top}\|_F^2 + \|R_B - [(1-\alpha)U_B + \alpha X^{\top} U_A]V_B^{\top}\|_F^2$
Proposition 1.
Convergence of the iterative optimization of the dual matrix factorization of the rating matrices $R_A$ and $R_B$ in (4) is guaranteed.
Proof.
Combining the two parts of the objective function and fixing the $U_B$ term, we get

(5)  $\min_{U_A, V_A} \; \|(R_A - \alpha X U_B V_A^{\top}) - (1-\alpha)U_A V_A^{\top}\|_F^2$

Similarly, fixing the $U_A$ term, we get

(6)  $\min_{U_B, V_B} \; \|(R_B - \alpha X^{\top} U_A V_B^{\top}) - (1-\alpha)U_B V_B^{\top}\|_F^2$

Based on the analysis in (Lee and Seung, 2001) for classical single-domain matrix factorization, to show that repeated iteration of the update rules is guaranteed to converge to a locally optimal solution, it is sufficient to show that (a) $\alpha < \frac{1}{2}$, (b) $R_A - \alpha X U_B V_A^{\top} \geq 0$ and (c) $R_B - \alpha X^{\top} U_A V_B^{\top} \geq 0$, where $0$ stands for the zero matrix. Condition (a) is an intuitive condition indicating that information from the target domain dominantly determines the user preferences, while information transferred from the source domain only serves as a regularizer during learning. To fulfill the seemingly complicated conditions (b) and (c), we recall that $R_A$ and $R_B$ are rating matrices, which are nonnegative and bounded by the rating scale $r_{max}$. We design two "positive perturbations" $\tilde{R}_A = R_A + \alpha k r_{max}\mathbf{1}$ and $\tilde{R}_B = R_B + \alpha k r_{max}\mathbf{1}$, where $k$ is the rank of the mapping matrix $X$. Condition (a) is independent of the specific rating matrices, and we can check that $\tilde{R}_A$ and $\tilde{R}_B$ satisfy conditions (b) and (c). Thus, the matrix factorization of $\tilde{R}_A$ and $\tilde{R}_B$ is guaranteed to converge; to reconstruct the original matrices $R_A$ and $R_B$, we only need to subtract the "positive perturbation" term, so the original problem is also guaranteed to converge. ∎

In this paper, to capture latent interactions between users and items, we use a neural network based matrix factorization approach, so it remains unclear whether this convergence result carries over to the proposed model. However, our hypothesis is that even in our case we will observe a convergence process similar to the one guaranteed in the classical matrix factorization case, as stated in the proposition. We test this hypothesis in Section 5.2.
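The multiplicative update rule (Lee and Seung, 2001; Lin, 2007) underlying this analysis can be sketched for a single domain as follows; the matrix sizes and rank are illustrative. Its key property, used in the proof above, is that the reconstruction loss never increases from one iteration to the next.

```python
import numpy as np

rng = np.random.default_rng(0)

def nmf_multiplicative(R, k, iters=200):
    """Lee-Seung multiplicative updates for R ~ U @ V.T.
    The Frobenius reconstruction loss is non-increasing at every step."""
    m, n = R.shape
    U = rng.random((m, k)) + 0.1  # strictly positive initialization
    V = rng.random((n, k)) + 0.1
    losses = []
    for _ in range(iters):
        U *= (R @ V) / (U @ V.T @ V + 1e-12)
        V *= (R.T @ U) / (V @ U.T @ U + 1e-12)
        losses.append(np.linalg.norm(R - U @ V.T) ** 2)
    return U, V, losses

R = rng.random((30, 20))  # nonnegative "rating" matrix (toy data)
U, V, losses = nmf_multiplicative(R, k=5)
```

In the dual setting, the cross-domain term is held fixed within each subproblem, which is why conditions (b) and (c) above reduce the analysis to exactly this classical case.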
3.5. Extension to Multiple Domains
In the previous sections, we described the proposed DDTCDR model with a special focus on the idea of combining the dual transfer learning mechanism with latent embedding approaches. We point out that the proposed model not only works for cross-domain recommendations between two domains, but can easily be extended to recommendations among multiple domains as well. Consider $n$ different domains $D_1, D_2, \cdots, D_n$. To provide recommendations for domain $D_i$, we estimate the ratings similarly as the hybrid combination of the within-domain estimation and the cross-domain estimations:
(7)  $\hat{r}_{D_i} = (1-\alpha)\,RS_{D_i}(W_{u}, W_{i}) + \frac{\alpha}{n-1}\sum_{j \neq i} RS_{D_j}(X_{ij} W_{u}, W_{i})$
where $X_{ij}$ represents the latent orthogonal transfer matrix between domains $D_i$ and $D_j$. Therefore, the proposed model is capable of providing recommendations for multiple-domain applications effectively and efficiently.
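With the same kind of stand-in scorers as before, the multiple-domain estimation can be sketched as follows; the per-pair orthogonal mappings and the uniform averaging over source domains are illustrative assumptions rather than the trained model.

```python
import numpy as np

rng = np.random.default_rng(7)
k, n, alpha = 8, 3, 0.1  # embedding size, number of domains, transfer rate

def rand_orth():
    """Random k x k orthogonal matrix (illustrative stand-in for X_ij)."""
    Q, _ = np.linalg.qr(rng.normal(size=(k, k)))
    return Q

# One mapping per ordered domain pair; identity on the diagonal.
X = [[rand_orth() if i != j else np.eye(k) for j in range(n)] for i in range(n)]

# Stand-in bilinear scorers for each domain's recommender RS_d.
P = [rng.normal(scale=0.1, size=(k, k)) for _ in range(n)]
RS = [lambda w_u, w_i, P_d=P_d: float(w_u @ P_d @ w_i) for P_d in P]

def predict(d, w_u, w_i):
    """Within-domain term plus the averaged cross-domain terms."""
    within = (1 - alpha) * RS[d](w_u, w_i)
    cross = sum(RS[j](X[d][j] @ w_u, w_i) for j in range(n) if j != d)
    return within + alpha * cross / (n - 1)

w_u, w_i = rng.random(k), rng.random(k)
r_hat = predict(0, w_u, w_i)
```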
4. Experiment
To validate the performance of the proposed model, we compare its cross-domain recommendation accuracy with several state-of-the-art methods. In addition, we conduct multiple experiments to study the sensitivity of the hyperparameters of the proposed model.
4.1. Dataset
We use a large-scale anonymized dataset obtained from a European online recommendation service to carry out our experiments. The service allows users to rate and review a range of items from various domains, where each domain is treated as an independent subsite with separate within-domain recommendations. Consequently, the combination of explicit user feedback and diverse domains makes the dataset unique and valuable for cross-domain recommendations. We select the subset that includes the three largest domains (books, movies and music), with the three domains linked together through a common user ID identifying the same user. We normalize the rating scale to be between 0 and 1. Note that each user might purchase items from different domains, while each item belongs to only one single domain. The basic statistics of this dataset are shown in Table 2.
Domain  Book  Movie  Music 

# Users  804,285  959,502  45,962 
# Items  182,653  79,866  183,114 
# Ratings  223,007,805  51,269,130  2,536,273 
Density  0.0157%  0.0669%  0.0301% 
4.2. User and Item Features
Besides the records of interactions between users and items, the online platform also collects user information by asking users online questions while they are using the recommender services. Typical examples include questions about preferences between two items, opinions about recent events, lifestyles, demographic information, etc. From all these questions, we select the eight most popular ones and use the corresponding answers as user features. Meanwhile, although the online platform does not directly collect item features, we obtain the metadata through the Google Books API (https://developers.google.com/books/), the OMDb API (http://www.omdbapi.com/) and the Spotify API (https://developer.spotify.com/documentation/web-api/). To ensure the correctness of the collected item features, we validate the retrieved results against the timestamps included in the dataset. We describe the entire feature set used in our experiments in Table 3.
Category  Feature Group  Dimensionality  Type 

User Features  Gender  2  onehot 
Age  numeric  
Movie Taste  12  onehot  
Residence  12  onehot  
Preferred Category  9  onehot  
Recommendation Usage  5  onehot  
Marital Status  3  onehot  
Personality  6  onehot  
Book Features  Category  8  onehot 
Title  onehot  
Author  multihot  
Publisher  onehot  
Language  4  onehot  
Country  4  onehot  
Price  numeric  
Date  date  
Movie Features  Genre  6  onehot 
Title  onehot  
Director  multihot  
Writer  multihot  
Runtime  numeric  
Country  4  onehot  
Rating  numeric  
Votes  numeric  
Music Features  Listener  numeric  
Playcount  numeric  
Artist  onehot  
Album  onehot  
Tag  8  onehot  
Release  date  
Duration  numeric  
Title  onehot 
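The one-hot, multi-hot and numeric feature types of Table 3 can be turned into a single raw feature vector as sketched below; the vocabularies, values and the normalization constant are hypothetical examples, not the actual dataset dictionaries.

```python
import numpy as np

# Hypothetical vocabularies mirroring the feature types in Table 3.
LANGUAGES = ["de", "en", "fr", "it"]            # one-hot, dim 4
AUTHORS = ["author_a", "author_b", "author_c"]  # multi-hot, dim 3

def one_hot(value, vocab):
    """Exactly one active position for a categorical feature."""
    v = np.zeros(len(vocab))
    v[vocab.index(value)] = 1.0
    return v

def multi_hot(values, vocab):
    """Several active positions, e.g. a book with multiple authors."""
    v = np.zeros(len(vocab))
    for val in values:
        v[vocab.index(val)] = 1.0
    return v

# Concatenate heterogeneous features into one raw vector; numeric fields
# (e.g. price) are normalized to [0, 1] before concatenation.
price, price_max = 12.5, 100.0
x_item = np.concatenate([
    one_hot("en", LANGUAGES),
    multi_hot(["author_a", "author_c"], AUTHORS),
    [price / price_max],
])
```

Vectors like `x_item` are what the domain-specific autoencoders of Section 3.1 compress into dense embeddings.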
4.3. Baseline Models
To conduct the experiments and evaluate our model, we utilize record-stratified 5-fold cross-validation and evaluate recommendation performance using the RMSE, MAE, Precision and Recall metrics (Ricci et al., 2011). We compare performance with a group of state-of-the-art methods.
CCCFNet (Lian et al., 2017): Cross-domain Content-boosted Collaborative Filtering neural NETwork (CCCFNet) utilizes factorization to tie CF and content-based filtering together in a unified multi-view neural network.

CDFM (Loni et al., 2014): Cross Domain Factorization Machine (CDFM) proposes an extension of FMs that incorporates domain information, assuming that user interaction patterns differ sufficiently across domains to make it advantageous to model domains separately.

CoNet (Hu et al., 2018): Collaborative Cross Networks (CoNet) enables knowledge transfer across domains through cross connections between base networks.

NCF (He et al., 2017): Neural Collaborative Filtering (NCF) is a neural network architecture that models latent features of users and items using collaborative filtering. The NCF models are trained separately for each domain without transferring any information.

CMF (Singh and Gordon, 2008): Collective Matrix Factorization (CMF) simultaneously factorizes several matrices, sharing parameters among factors when a user participates in multiple domains.
Furthermore, we set the hyperparameter $\alpha$ to 0.03 using Bayesian optimization. We use a one-layer MLP for constructing the encoder and the decoder separately, and the size of the feature embeddings is 8. For each baseline method, we set the same hyperparameters as in our proposed model where applicable. The results are reported in the next section.
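For reference, the four evaluation metrics can be computed as in the following sketch (RMSE and MAE over predicted ratings, Precision@5 and Recall@5 over a ranked item list); the toy inputs are illustrative.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted ratings."""
    d = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(d ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error between true and predicted ratings."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def precision_recall_at_k(relevant, ranked, k=5):
    """relevant: set of item ids the user liked;
    ranked: item ids ordered by predicted score (best first)."""
    top_k = ranked[:k]
    hits = len(set(top_k) & relevant)
    return hits / k, hits / len(relevant)

y_true, y_pred = [0.8, 0.2, 0.6], [0.7, 0.3, 0.5]
p, r = precision_recall_at_k({1, 3, 5}, [1, 2, 3, 4, 5], k=5)
```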
5. Results
5.1. CrossDomain Recommendation Performance
Since we have three different domains (books, movies and music), this results in three domain pairs for evaluation. The performance of DDTCDR is compared with the baselines using the experimental settings of Section 4.
As shown in Tables 4, 5 and 6, the proposed DDTCDR model significantly and consistently outperforms all the baselines in terms of the RMSE, MAE, Precision and Recall metrics across all three domain pairs. As Table 4 shows, the RMSE measures for the book and movie domains obtained using the proposed model are 0.2213 and 0.2213, which outperform the second-best baselines by 3.98% and 2.44%; the MAE measures are 0.1708 and 0.1714, which outperform the second-best baselines by 9.54% and 9.80%. We observe similar improvements for the book and music domain pair, where DDTCDR outperforms the second-best baselines by 4.07%, 8.87%, 2.14% and 4.74%, and for the movie and music domain pair, with improvements of 3.75%, 9.77%, 1.89% and 4.24% respectively. In summary, these results show that the dual transfer learning mechanism works well in practice, and the proposed DDTCDR model achieves significant cross-domain recommendation performance improvements. It is also worth noting that DDTCDR requires only the same level of training time complexity as the other baseline models.
Furthermore, we observe that the improvements of the proposed model in the book and movie domains are relatively greater than those in the music domain. One possible reason is the size of the datasets, for the book and movie domains are significantly larger than the music domain. We plan to explore this point further in future research.
Algorithm  Book  Movie  

RMSE  MAE  Precision@5  Recall@5  RMSE  MAE  Precision@5  Recall@5  
DDTCDR  0.2213*  0.1708*  0.8595*  0.9594*  0.2213*  0.1714*  0.8925*  0.9871* 
Improved %  (+3.98%)  (+9.54%)  (+2.77%)  (+6.30%)  (+2.44%)  (+9.80%)  (+2.75%)  (+2.74%) 
NCF  0.2315  0.1887  0.8357  0.8924  0.2276  0.1895  0.8644  0.9589 
CCFNet  0.2639  0.1841  0.8102  0.8872  0.2476  0.1939  0.8545  0.9300 
CDFM  0.2494  0.2165  0.7978  0.8610  0.2289  0.1901  0.8498  0.9312 
CMF  0.2921  0.2478  0.7972  0.8523  0.2738  0.2293  0.8324  0.9012 
CoNet  0.2305  0.1892  0.8328  0.8990  0.2298  0.1903  0.8680  0.9601 
Algorithm  Book  Music  

RMSE  MAE  Precision@5  Recall@5  RMSE  MAE  Precision@5  Recall@5  
DDTCDR  0.2209*  0.1704*  0.8570*  0.9602*  0.2753*  0.2302*  0.8392*  0.8928* 
Improved %  (+4.07%)  (+8.87%)  (+3.97%)  (+3.15%)  (+2.14%)  (+4.74%)  (+5.51%)  (+5.35%) 
NCF  0.2315  0.1887  0.8230  0.9294  0.2828  0.2423  0.7930  0.8450 
CCFNet  0.2630  0.1842  0.8150  0.9108  0.3090  0.2422  0.7902  0.8388 
CDFM  0.2489  0.2155  0.8104  0.9102  0.3252  0.2463  0.7895  0.8365 
CMF  0.2921  0.2478  0.8072  0.8978  0.3478  0.2698  0.7820  0.8324 
CoNet  0.2307  0.1897  0.8230  0.9300  0.2801  0.2410  0.7912  0.8428 
Algorithm  Movie  Music  

RMSE  MAE  Precision@5  Recall@5  RMSE  MAE  Precision@5  Recall@5  
DDTCDR  0.2174*  0.1720*  0.8926*  0.9869*  0.2758*  0.2311*  0.8370*  0.8902* 
Improved %  (+3.75%)  (+9.77%)  (+5.32%)  (+3.68%)  (+1.89%)  (+4.24%)  (+4.30%)  (+4.38%) 
NCF  0.2276  0.1895  0.8428  0.9495  0.2828  0.2423  0.7970  0.8501 
CCFNet  0.2468  0.1932  0.8398  0.9310  0.3090  0.2433  0.7952  0.8498 
CDFM  0.2289  0.1895  0.8306  0.9382  0.3252  0.2467  0.7880  0.8460 
CMF  0.2738  0.2293  0.8278  0.9222  0.3478  0.2698  0.7796  0.8400 
CoNet  0.2302  0.1908  0.8450  0.9508  0.2811  0.2428  0.8010  0.8512 
5.2. Convergence
In Section 3.4, we proved convergence of the dual transfer learning method for the classical matrix factorization problem. Note, however, that this proposition is not directly applicable to the DDTCDR model because its optimization process is different. We conjecture that convergence should still occur even though our method does not satisfy the stated conditions, and we test this conjecture empirically through the following convergence study.
In particular, we train the model iteratively for 100 epochs, until the change of the loss function is less than 1e-5. We plot the training loss over time in Figures 2, 3 and 4 for the three domain pairs. The key observation is that DDTCDR starts with a relatively higher loss due to the noisy initialization, but it stabilizes quickly and significantly outperforms NCF after only 10 epochs.

5.3. Sensitivity
To validate that the improvements gained by our model are not sensitive to the specific experimental setting, we conduct multiple experiments to examine how recommendation performance changes under different hyperparameter settings. As Figures 5, 6 and 7 show, we observe a certain amount of performance fluctuation with respect to the transfer rate $\alpha$. Note that the cross-domain recommendation results (when $\alpha$ takes a small positive value) are consistently better than the single-domain recommendation results (when $\alpha = 0$). We also verify the recommendation performance using different types of autoencoders, including AE (Sutskever et al., 2014), VAE (Kingma and Welling, 2013), AAE (Makhzani et al., 2015), WAE (Tolstikhin et al., 2017) and HVAE (Davidson et al., 2018); as shown in Table 7, the improvement in recommendation performance is consistent across these settings, so the choice of the particular autoencoder is not crucial to recommendation performance.
Autoencoder  Book  Movie  

RMSE  MAE  RMSE  MAE  
AE  0.2213  0.1708  0.2213  0.1714 
VAE  0.2240  0.1704  0.2196  0.1707 
AAE  0.2236  0.1729  0.2195  0.1715 
WAE  0.2236  0.1739  0.2202  0.1739 
HVAE  0.2220  0.1717  0.2186  0.1704 
6. Conclusion
In this paper, we propose a novel dual transfer learning based model that significantly improves recommendation performance across different domains. We accomplish this by transferring latent information from one domain to the other through embeddings and iteratively going through the transfer learning loop until the models stabilize in both domains. We also prove that this convergence is guaranteed under certain conditions and empirically validate the hypothesis for our model across different experimental settings.
Note that the proposed approach provides several benefits: it (a) transfers information about latent interactions instead of explicit features from the source domain to the target domain; (b) utilizes the dual transfer learning mechanism to enable bidirectional training that improves performance measures in both domains simultaneously; and (c) learns a latent orthogonal mapping function across the two domains that (i) preserves similarities of user preferences, thus enabling proper transfer learning, and (ii) computes the inverse mapping function efficiently.
As future work, we plan to extend the dual learning mechanism to multiple domains by simultaneously improving performance across all domains instead of domain pairs. We also plan to extend the convergence proposition to more general settings.
References
 Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge & Data Engineering (6), pp. 734–749. Cited by: §1.
 Hyperspherical variational autoencoders. arXiv preprint arXiv:1804.00891. Cited by: §5.3.
 Cross-domain recommender systems: a survey of the state of the art. Cited by: §1, §2.1.
 Dual learning for machine translation. In Advances in Neural Information Processing Systems, pp. 820–828. Cited by: §2.2.
 Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web, pp. 173–182. Cited by: §2.3, 4th item.
 CoNet: collaborative cross networks for cross-domain recommendation. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 667–676. Cited by: §2.1, 3rd item.
 Personalized recommendation via cross-domain triadic factorization. In Proceedings of the 22nd international conference on World Wide Web, pp. 595–606. Cited by: §2.1.
 Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114. Cited by: §5.3.
 Matrix factorization techniques for recommender systems. Computer (8), pp. 30–37. Cited by: §1.
 Algorithms for nonnegative matrix factorization. In Advances in neural information processing systems, pp. 556–562. Cited by: §3.4.

 Can movies and books collaborate? Cross-domain collaborative filtering for sparsity reduction. In Twenty-First International Joint Conference on Artificial Intelligence. Cited by: §1, §2.1.
 Collaborative variational autoencoder for recommender systems. In Proceedings of the 23rd ACM SIGKDD, pp. 305–314. Cited by: §2.3.
 CCCFNet: a content-boosted collaborative filtering neural network for cross-domain recommender systems. In Proceedings of the 26th international conference on World Wide Web companion, pp. 817–818. Cited by: §2.1, 1st item.
 Variational autoencoders for collaborative filtering. In Proceedings of the 2018 World Wide Web Conference. Cited by: §2.3.
 On the convergence of multiplicative update algorithms for nonnegative matrix factorization. IEEE Transactions on Neural Networks 18 (6), pp. 1589–1596. Cited by: §3.4.
 Multi-view clustering via joint nonnegative matrix factorization. In Proceedings of the 2013 SIAM International Conference on Data Mining. Cited by: §2.2.
 Dual transfer learning. In Proceedings of the 2012 SIAM International Conference on Data Mining. Cited by: §1, §2.2.

 Transfer feature learning with joint distribution adaptation. In Proceedings of the IEEE ICCV. Cited by: §2.2.
 Cross-domain collaborative filtering with factorization machines. In European conference on information retrieval, pp. 656–661. Cited by: §2.1, 2nd item.
 Adversarial autoencoders. arXiv preprint arXiv:1511.05644. Cited by: §5.3.
 A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering. Cited by: §1, §2.1, §2.2.
 Introduction to recommender systems handbook. In Recommender systems handbook, pp. 1–35. Cited by: §4.3.
 Cross-domain recommendation for large-scale data. In CEUR Workshop Proceedings, Vol. 1887, pp. 9–15. Cited by: §2.1.
 Cross-domain collaborative recommendation in a cold-start context: the impact of user profile size on the quality of recommendation. In International Conference on User Modeling, Adaptation, and Personalization. Cited by: §2.1.
 It takes two to tango: an exploration of domain pairs for cross-domain collaborative filtering. In Proceedings of the 9th ACM RecSys, pp. 131–138. Cited by: §2.1.
 Content-based cross-domain recommendations using segmented models. Cited by: §2.1.
 Item-based collaborative filtering recommendation algorithms. WWW 1, pp. 285–295. Cited by: §1.
 Methods and metrics for cold-start recommendations. In Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 253–260. Cited by: §1.
 AutoRec: autoencoders meet collaborative filtering. In Proceedings of the 24th International Conference on World Wide Web. Cited by: §2.3.
 Relational learning via collective matrix factorization. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 650–658. Cited by: §2.1, 5th item.
 Sequence to sequence learning with neural networks. In Advances in neural information processing systems, pp. 3104–3112. Cited by: §5.3.
 Wasserstein autoencoders. arXiv preprint arXiv:1711.01558. Cited by: §5.3.
 Collaborative deep learning for recommender systems. In Proceedings of the 21st ACM SIGKDD, pp. 1235–1244. Cited by: §2.3.
 Cross-language web page classification via dual knowledge transfer using nonnegative matrix tri-factorization. In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, pp. 933–942. Cited by: §1, §2.2.
 Dual transfer learning for neural machine translation with marginal distribution regularization. In Thirty-Second AAAI Conference on Artificial Intelligence. Cited by: §2.2.
 Dual-regularized matrix factorization with deep neural networks for recommender systems. Knowledge-Based Systems 145, pp. 46–58. Cited by: §2.1.
 Collaborative denoising autoencoders for top-N recommender systems. In Proceedings of the Ninth ACM WSDM. Cited by: §2.3.

 Dual supervised learning. In Proceedings of the 34th International Conference on Machine Learning, Volume 70. Cited by: §2.2.
 Deep learning based recommender system: a survey and new perspectives. ACM Computing Surveys (CSUR) 52 (1), pp. 5. Cited by: §1, §2.3.
 Cross domain distribution adaptation via kernel mapping. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. Cited by: §1, §2.2.
 Collaborative dual-PLSA: mining distinction and commonality across multiple domains for text classification. In Proceedings of the 19th ACM international conference on Information and knowledge management, pp. 359–368. Cited by: §2.2.