Deep Cross Networks with Aesthetic Preference for Cross-domain Recommendation

05/29/2019 ∙ by Jian Liu, et al. ∙ Institute of Computing Technology, Chinese Academy of Sciences; Rutgers University; University of Central Arkansas; The University of Queensland

When purchasing appearance-first products, e.g., clothes, product appearance aesthetics plays an important role in the decision process. Moreover, a user's aesthetic preference, which can be regarded as a personality trait and a basic requirement, is domain-independent and could serve as a bridge for knowledge transfer between domains. However, existing work has rarely considered the aesthetic information in product photos for cross-domain recommendation. To this end, in this paper, we propose a new deep Aesthetic preference Cross-Domain Network (ACDN), in which parameters characterizing personal aesthetic preferences are shared across networks to transfer knowledge between domains. Specifically, we first leverage an aesthetic network to extract relevant features. Then, we integrate the aesthetic features into a cross-domain network to transfer users' domain-independent aesthetic preferences. Moreover, network cross-connections are introduced to enable dual knowledge transfer across domains. Finally, experimental results on real-world data show that ACDN outperforms other benchmark methods in terms of recommendation accuracy. The results also show that users' aesthetic preferences are effective in alleviating the data sparsity issue in cross-domain recommendation.




1. Introduction

Recommendation systems have attracted a great deal of interest in recent years. They are used to handle the information overload problem and to help people make appropriate decisions based on their historical behaviors. When shopping online, we usually look through product images before making a decision, especially for products whose appearance matters, e.g., clothes and shoes. Product images provide abundant visual information, including design, color schemes, decorative patterns, texture, and so on. We can even estimate the quality and authenticity of a product from its images. As such, visual information plays an important role in improving recommendation performance for appearance-first products.

Researchers have started to use image data for recommendation with various image features, such as features extracted by convolutional neural networks (CNN features), the scale-invariant feature transform algorithm (SIFT features), and color histograms (Zhao et al., 2016, 2017; He and McAuley, 2016a). These image features contain semantic information to distinguish items and have proven effective in recommendation tasks. However, one important visual factor, aesthetics, has rarely been considered in previous visual content-enhanced recommendation systems. When purchasing appearance-first products, consumers ask not only "What is the product?" but also "Does the product look good?" and "Does the product match my aesthetic preference?". Unfortunately, image features such as CNN and SIFT features do not inherently encode aesthetic information. Thus, to provide high-quality recommendations, comprehensive, high-level aesthetic features are greatly desired.

Image aesthetics assessment, which requires an in-depth understanding of photographic attributes and semantics in an image, has a variety of applications, such as image search, photo ranking, and personal album curation. Research interest in characterizing complex and personal aesthetic perception has been increasing. For example, deep aesthetic networks have been developed to imitate human aesthetic perception and to represent image content from low-level to high-level features (Wang et al., 2016; Lu et al., 2014; Ma et al., 2017). Image aesthetics is also highly subjective, as individual users have very diverse aesthetic preferences. For instance, some people like products with a simple black-and-white appearance, some like colorful, flowery, and punk-style products, and others like a rugged outdoor style. Hence, the more a recommender knows about a consumer's aesthetic preferences, the more convincingly it can recommend products that suit the consumer's taste. However, few efforts have considered aesthetic preferences for recommendation; an exception is Yu et al. (Yu et al., 2018), who introduce aesthetic information into clothing recommendation systems. Their work demonstrates that incorporating aesthetic features can improve recommendation performance significantly, since aesthetic features and CNN features complement each other. However, it does not consider aesthetic features for cross-domain recommendation.

Figure 1. A user’s aesthetic behaviors are consistent across different domains.

Users are often active on many e-commerce websites and generate large amounts of behavioral data in different domains. Aesthetic preferences vary significantly from user to user, but a single user's aesthetic behaviors in different domains can be consistent. For example, as shown in Figure 1, a user who likes the simple black-and-white style will prefer items in that style in both domains, and a user who likes bells and whistles, such as a hip hop style, will prefer hip hop-style items in both domains. This observation suggests that a user's aesthetic behavioral data in one domain may help model his/her aesthetic preferences in another domain. Such consistency of aesthetic behavior is helpful for cross-domain recommendation, especially when one domain suffers from the data sparsity issue.

To capture aesthetic preferences and transfer knowledge among different domains, we propose a new deep Aesthetic preference Cross-Domain Network, termed ACDN, in which parameters characterizing personal aesthetic preferences are shared across domains to improve recommendation significantly. Specifically, we first use a deep aesthetic network (i.e., ILGNet (Jin et al., 2018)) to extract holistic features that represent the aesthetic elements of a product photo (e.g., color, structure, proportion, and style). Then, we incorporate the aesthetic features into a deep cross-domain recommendation network. Moreover, dual knowledge transfer is achieved through a dual cross transfer unit and a joint loss function, which enable the two domains to benefit from each other. Finally, we conduct extensive experiments on two real-world Amazon datasets to evaluate the effectiveness of ACDN. The results show that ACDN achieves better ranking performance than various baselines. We also conduct a thorough analysis to understand how the aesthetic features and the transferred knowledge improve the performance of ACDN.

To the best of our knowledge, ACDN is the first deep model that transfers knowledge from an auxiliary domain for recommendation with aesthetic preferences. The main contributions of this work are summarized as follows.

  • We leverage novel aesthetic features for cross-domain recommendation to capture users’ domain independent aesthetic preferences. Moreover, we compare the effectiveness of the aesthetic features with different types of conventional features for cross-domain recommendation to demonstrate the advantage of the aesthetic features.

  • We propose a new cross-domain recommendation algorithm ACDN for better modeling an individual’s propensity from the aesthetic perspective for recommendation, in which the aesthetic preference of each individual is shared for knowledge transfer across different domains.

  • We conduct extensive experiments on two real-world cross-domain datasets. Comprehensive analysis shows that the proposed ACDN outperforms state-of-the-art methods and can alleviate the data sparsity issue.

The remainder of this paper is organized as follows. Section 2 briefly introduces the related work. Section 3 provides the notations and the problem definition. In Section 4, we introduce our proposed ACDN model in detail. Our experimental results and analysis are presented in Section 5. Finally, we conclude this paper in Section 6.

2. Related Work

2.1. Collaborative Filtering

A recommender system usually predicts users' preferences on unobserved items based on their past interactions. Collaborative filtering (CF), an early and widely used recommendation method, matches users with similar tastes or interests (Herlocker et al., 1999). One representative CF technique is Matrix Factorization (MF), which learns latent factors of users and items from a user-item rating matrix (Mnih and Salakhutdinov, 2008; Koren et al., 2009). Latent factor models extract feature vectors for users and items mainly based on MF. Factorization Machines (FM) can mimic MF with the flexibility of feature engineering (Rendle, 2012). Moreover, with the revival of neural networks, neural CF methods have been proposed to learn complex user-item interactions with highly nonlinear functions, such as Wide & Deep (Cheng et al., 2016) and NCF (He et al., 2017). However, CF methods that rely solely on the rating matrix suffer from data sparsity and the cold-start problem.

Items are generally associated with content information, such as unstructured text and visual features. As the saying "a picture is worth a thousand words" suggests, images contain rich information, and exploiting them is an effective strategy for addressing the above problems in recommender systems. For instance, He et al. (He and McAuley, 2016a) proposed a scalable factorization model that incorporates visual features from product images into predictors of people’s opinions. Zhao et al. (Zhao et al., 2017) proposed a visual-enhanced probabilistic matrix factorization model for tour recommendation, which integrates visual features into the collaborative filtering model. Recently, Yu et al. (Yu et al., 2018) proposed a coupled matrix and tensor factorization model for aesthetic-based clothing recommendation, in which CNNs are used to learn the image features and aesthetic features. Unlike our work, these methods focus only on single-domain recommendation.

2.2. Cross-domain Recommendation

Cross-Domain Recommendation (CDR) (Cremonesi et al., 2011) is another effective technique for alleviating data sparsity, leveraging the rating information from other domains to enhance performance on the target domain (Zhang et al., 2012). Existing CDR methods can be divided into two groups: content-based and transfer-based. Berkovsky et al. (Berkovsky et al., 2007) proposed a content-based CDR approach that targets the data sparsity problem by importing and aggregating vectors of users’ ratings from different application domains. Later, Winoto et al. (Winoto and Tang, 2008) uncovered the association between user preferences on related items across domains. Transfer-based approaches mainly employ machine learning techniques (e.g., transfer learning and neural networks) to transfer knowledge across domains. Li et al. (Li et al., 2009) proposed a codebook method that transfers user-item rating patterns from an auxiliary task in other domains to a sparse rating matrix in a target domain. Man et al. (Man et al., 2017) proposed an embedding and mapping framework (EMCDR), which uses a multi-layer perceptron to learn a nonlinear mapping function between a source domain and a target domain. Among neural network approaches, Misra et al. (Misra et al., 2016) proposed a convolutional network with cross-stitch units that learns an optimal combination of shared and task-specific representations via multi-task learning, thereby enabling knowledge transfer between two domains. However, these methods treat knowledge transfer as a global process with shared global parameters and do not match source items with the specific target item for a given user. Different from the above works, we introduce novel aesthetic features for cross-domain recommendation to capture users’ domain-independent aesthetic preferences, and we propose a new deep aesthetic preference cross-domain network to better model an individual’s propensity from the aesthetic perspective.

3. Notations and Problem Definition

In this section, we introduce the related notations and our problem setting. Given a target domain $\mathcal{T}$ and a source domain $\mathcal{S}$ that share a set of users $\mathcal{U}$ (of size $m = |\mathcal{U}|$), we want to transfer knowledge across the domains. We denote the set of items in the source domain as $\mathcal{I}_S$, of size $n_S = |\mathcal{I}_S|$. Similarly, we denote the set of items in the target domain as $\mathcal{I}_T$, of size $n_T = |\mathcal{I}_T|$. We use $u$ to index a user, $i$ to index a target item, and $j$ to index a source item. The matrix $R_T \in \{0,1\}^{m \times n_T}$ represents the user-item interactions in the target domain: its entry $r_{ui}$ is 1 if user $u$ has purchased item $i$ and 0 otherwise. Similarly, the matrix $R_S \in \{0,1\}^{m \times n_S}$ describes the user-item interactions in the source domain: its entry $r_{uj}$ is 1 if user $u$ has interacted with item $j$ and 0 otherwise. Each domain can thus be treated as a collaborative filtering problem with implicit feedback (Hu et al., 2008; Pan et al., 2008).

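For the item recommendation task, our goal is to recommend a ranked list of items to each user based on his/her history records, i.e., top-$N$ recommendation. We aim to improve the recommendation performance in the target domain with the help of the user-item interaction information and the users' aesthetic preferences from the source domain. The items are ranked by their predicted scores:

$$\hat{r}_{ui} = f(u, i \mid \Theta),$$

where $f$ is an interaction function and $\Theta$ are the model parameters. For matrix factorization techniques, the interaction function is the fixed dot product:

$$\hat{r}_{ui} = P_u^{\top} Q_i,$$

and the parameters $\Theta = \{P, Q\}$ are the latent vectors of users and items, where $P \in \mathbb{R}^{m \times d}$, $Q \in \mathbb{R}^{n \times d}$, $P_u$ and $Q_i$ are the $u$-th and $i$-th rows, $n$ is the number of items, and $d$ is the dimension size. For neural CF approaches, a neural network parameterizes the function $f$, which is learned from the interactions:

$$\hat{r}_{ui} = f(x_{ui} \mid P, Q, \theta_f) = \phi_o\big(\phi_L(\cdots \phi_1(x_{ui}) \cdots)\big),$$

where the input $x_{ui} = [P^{\top} x_u, Q^{\top} x_i]$ is merged from the projections of the user and the item. The projections are based on their one-hot encodings $x_u \in \{0,1\}^m$ and $x_i \in \{0,1\}^n$ and the embedding matrices $P \in \mathbb{R}^{m \times d}$ and $Q \in \mathbb{R}^{n \times d}$. The output and hidden layers are computed by $\phi_o$ and $\phi_l$ ($l \in [1, L]$) in a multi-layer feedforward neural network (FFNN), and the connection weight matrices and biases are denoted by $\theta_f$.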
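To make the contrast between the two interaction functions concrete, here is a minimal NumPy sketch. The sizes, the random parameters, and the single hidden layer are hypothetical choices for illustration only. It shows that multiplying the transposed embedding matrix by a one-hot vector simply selects a row, that MF scores with a fixed dot product, and that neural CF scores by passing the concatenated embeddings through an FFNN.

```python
import numpy as np

rng = np.random.default_rng(0)

m, n, d = 5, 7, 4  # hypothetical: 5 users, 7 items, latent dimension 4
P = rng.normal(scale=0.1, size=(m, d))  # user embedding matrix
Q = rng.normal(scale=0.1, size=(n, d))  # item embedding matrix


def one_hot(index, size):
    """Return the one-hot encoding with a 1 at `index`."""
    x = np.zeros(size)
    x[index] = 1.0
    return x


def mf_score(u, i):
    """Matrix factorization: fixed dot product P_u^T Q_i."""
    return P[u] @ Q[i]


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical FFNN parameters theta_f: one hidden layer of width 8.
W1 = rng.normal(scale=0.1, size=(8, 2 * d))
b1 = np.zeros(8)
w_o = rng.normal(scale=0.1, size=8)
b_o = 0.0


def mlp_score(u, i):
    """Neural CF: x_ui = [P^T x_u, Q^T x_i] fed through an FFNN."""
    x_ui = np.concatenate([P.T @ one_hot(u, m), Q.T @ one_hot(i, n)])
    h = relu(W1 @ x_ui + b1)       # hidden layer phi_1
    return sigmoid(w_o @ h + b_o)  # output layer phi_o
```

For example, `np.argsort([-mf_score(0, i) for i in range(n)])[:3]` gives the indices of the top-3 items for user 0 under MF; ranking with `mlp_score` works the same way.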

In our aesthetic preference cross-domain recommendation network, each domain is modeled by a neural network, and these networks are jointly learned to improve the performance through mutual knowledge transfer.

4. The Proposed Model

4.1. Model Overview

In this subsection, we briefly describe the proposed Aesthetic preference Cross-Domain Network model (ACDN), in which parameters characterizing the personal aesthetic preferences are shared across different domains to achieve a significant improvement for cross-domain recommendation.

Figure 2. (a) Left: the proposed deep aesthetic preference cross-domain model architecture. (b) Right: the aesthetic network (ILGNet) architecture.

As shown in Figure 2(a), we adopt an FFNN as the base network for each domain to parameterize the interaction function. The base network is similar to the Deep Model in (Cheng et al., 2016; Covington et al., 2016) and the MLP model in (He et al., 2017). The proposed ACDN model processes the information flow from the input to the output through four modules: Aesthetic Feature Extraction, Embedding Layer, Cross Transfer Layer, and Output Layer. At the bottom of the figure is aesthetic feature extraction: for each item in the target domain and each item in the source domain, we use the pre-trained deep aesthetic network to extract aesthetic features from the corresponding image in advance. In the embedding layer, we embed the sparse one-hot encoding into a dense vector. The resulting user (item) embedding can be seen as the latent vector of the user (item) in the latent factor model. Then, the user embedding, item embedding, and aesthetic features are concatenated. Above the embedding layer is the cross transfer layer, which enables dual knowledge transfer across domains, from one base network to the other and vice versa. The core idea of the cross transfer unit is to use a relationship/transfer matrix rather than a scalar weight to transfer knowledge. We enforce a sparse structure ($\ell_1$-norm regularization) on the relationship/transfer matrix to control knowledge transfer, so that the cross transfer layer can adaptively transfer selective and useful information. The final output layer predicts the score of the given user-item pair from the representation of the last layer of the multi-hop module. In the following subsections, we introduce our model in detail.

4.2. Aesthetic Feature Extraction

We use the pre-trained deep aesthetic neural network ILGNet (Jin et al., 2018) to extract aesthetic features from item images. ILGNet (I: Inception, L: Local, G: Global) is a deep convolutional neural network that introduces the inception module into image aesthetics classification and can extract aesthetic features from low level to high level. As shown in Figure 2(b), this network connects the layers of local features to the layer of global features to form a 1024-dimensional concat layer of binary patterns. Specifically, the first and second inception layers extract local image features, and the last inception layer extracts global image features after two max-pooling layers and one average-pooling layer. The outputs of the first two inception layers (256 dimensions each) and of the last inception layer (512 dimensions) are concatenated into the 1024-dimensional concat layer, which serves as the holistic aesthetic feature.
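The dimension bookkeeping above (256 + 256 + 512 = 1024) can be checked with a short NumPy sketch. The three arrays are random placeholders standing in for ILGNet activations of one product image; running the actual pre-trained network is not shown, so only the shapes and the concatenation order are meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random placeholders for ILGNet activations of one product image.
local_1 = rng.random(256)   # first inception layer: local features
local_2 = rng.random(256)   # second inception layer: local features
global_ = rng.random(512)   # last inception layer after pooling: global features

# Holistic aesthetic feature: local and global features concatenated.
aesthetic = np.concatenate([local_1, local_2, global_])
print(aesthetic.shape)  # (1024,)
```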

In our work, for each item $i$ in the target domain, we use the pre-trained ILGNet to extract its aesthetic feature vector $a_i$ from the corresponding image in advance. Similarly, for each item $j$ in the source domain, we obtain its aesthetic feature vector $a_j$. With the aesthetic features of items, we can capture users’ aesthetic preferences across domains and improve recommendation performance in the target domain.

4.3. Embedding Layer

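To represent the input, we encode user-item interaction indices with one-hot encoding. For user $u$, item $i$ from the target domain, and item $j$ from the source domain, we map them into one-hot encodings $x_u \in \{0,1\}^m$, $x_i \in \{0,1\}^{n_T}$, and $x_j \in \{0,1\}^{n_S}$ (where $m$, $n_T$, and $n_S$ are the numbers of users, target items, and source items), in which only the element corresponding to the index is 1 and all others are 0. Then, we embed the one-hot encodings into continuous representations $e_u = P^{\top} x_u$, $e_i = Q_T^{\top} x_i$, and $e_j = Q_S^{\top} x_j$ using the embedding matrices $P$, $Q_T$, and $Q_S$, respectively. Finally, we concatenate the embeddings with the items' aesthetic features $a_i$ and $a_j$ (Section 4.2), giving $[e_u, e_i, a_i]$ and $[e_u, e_j, a_j]$ as the inputs of the target and source base networks.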
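A minimal NumPy sketch of the embedding layer follows. The toy sizes, the placeholder aesthetic features, and the 64-dimensional embeddings are assumptions; 64 is chosen only because 64 + 64 + 1024 = 1152 matches the first hidden-layer size reported in the implementation details, which the text does not break down explicitly. Multiplying a transposed embedding matrix by a one-hot vector is equivalent to selecting one row, which is how embedding lookups are usually implemented in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

m, n_T, n_S = 4, 6, 5  # hypothetical numbers of users / target / source items
d, d_a = 64, 1024      # embedding size (assumed) and aesthetic feature size

P = rng.normal(0.0, 0.01, size=(m, d))      # user embedding matrix (shared)
Q_T = rng.normal(0.0, 0.01, size=(n_T, d))  # target-item embedding matrix
Q_S = rng.normal(0.0, 0.01, size=(n_S, d))  # source-item embedding matrix
A_T = rng.random((n_T, d_a))  # placeholder aesthetic features, target items
A_S = rng.random((n_S, d_a))  # placeholder aesthetic features, source items


def one_hot(index, size):
    x = np.zeros(size)
    x[index] = 1.0
    return x


def embed(one_hot_vec, matrix):
    """e = M^T x, which equals selecting one row of M."""
    return matrix.T @ one_hot_vec


def inputs(u, i, j):
    """Build the target input [e_u, e_i, a_i] and source input [e_u, e_j, a_j]."""
    e_u = embed(one_hot(u, m), P)
    x_ui = np.concatenate([e_u, embed(one_hot(i, n_T), Q_T), A_T[i]])
    x_uj = np.concatenate([e_u, embed(one_hot(j, n_S), Q_S), A_S[j]])
    return x_ui, x_uj


x_ui, x_uj = inputs(u=1, i=2, j=3)
print(x_ui.shape, x_uj.shape)  # (1152,) (1152,)
```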

4.4. Cross Transfer Layer

In this subsection, we will introduce the cross transfer layer for knowledge transfer in detail. Different from CSN (Misra et al., 2016), the core idea of the cross transfer unit is to adopt a relationship/transfer matrix rather than a scalar weight to transfer knowledge. The target domain can receive information from the source domain and vice versa.

As shown in Figure 2(a), we add cross transfer units to the entire FFNN. Denote by $W_T^l$ the weight matrix connecting the $l$-th layer to the $(l+1)$-th layer and by $b_T^l$ the bias in the target domain; similarly, $W_S^l$ and $b_S^l$ are the weights and bias in the source domain. Denote by $H^l$ the relationship matrix from the $l$-th layer to the $(l+1)$-th layer. The two base networks are coupled by the cross transfer unit:

$$a_T^{l+1} = \sigma\big(W_T^l a_T^l + H^l a_S^l + b_T^l\big),$$
$$a_S^{l+1} = \sigma\big(W_S^l a_S^l + H^l a_T^l + b_S^l\big),$$

where $a_T^l$ and $a_S^l$ are the representations of the $l$-th layer in the target and source networks, and $\sigma(\cdot)$ is the activation function; we use ReLU (Nair and Hinton, 2010). In the target domain, the representation $a_T^{l+1}$ receives two information flows: one from the transform gate controlled by the weight matrix $W_T^l$, and another from the transfer gate controlled by $H^l$ (similarly for $a_S^{l+1}$ in the source domain). Knowledge transfer therefore happens in two directions, from the source domain to the target domain and from the target domain to the source domain, which enables dual knowledge transfer and lets the two domains benefit from each other. Similar to CSN (Misra et al., 2016), we use the same relationship/transfer matrix for both directions to reduce the number of model parameters and keep the model compact; using different transfer matrices for the two directions does not improve recommendation performance.

The relationship/transfer matrix is crucial to our model. We assume that not all representations from the other domain are useful, and we expect the representations received from the other domain to be selective and useful. This corresponds to enforcing a sparse prior on the structure, which can be achieved by penalizing the relationship/transfer matrix via regularization. We adopt the widely used sparsity-inducing regularization, the least absolute shrinkage and selection operator (LASSO) (Tibshirani, 1996), and enforce $\ell_1$-norm regularization on the relationship/transfer matrix to induce sparsity:

$$\Omega(H) = \lambda \|H\|_1 = \lambda \sum_{p=1}^{D_1} \sum_{q=1}^{D_2} |h_{pq}|,$$

where $h_{pq}$ is the $(p, q)$ entry of $H$, the hyper-parameter $\lambda$ controls the degree of sparsity, and $D_1 \times D_2$ is the size of the matrix $H$. In the target network, $H$ linearly transforms representations from the source domain, and the result forms part of the input to the next layer.
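The coupling equations and the sparsity penalty can be sketched in NumPy as below. Layer widths and parameter values are hypothetical, and the sketch assumes both base networks have identical layer sizes, which a single shared transfer matrix requires. Setting the transfer matrix to zero recovers two independent FFNN layers, a quick way to see that the transfer gate is the only channel between the domains.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(z):
    return np.maximum(z, 0.0)


def cross_transfer(a_T, a_S, W_T, b_T, W_S, b_S, H):
    """One cross transfer unit coupling the target and source base networks.

    Both directions share the same relationship/transfer matrix H.
    """
    next_T = relu(W_T @ a_T + H @ a_S + b_T)  # transform gate + transfer gate
    next_S = relu(W_S @ a_S + H @ a_T + b_S)
    return next_T, next_S


def l1_penalty(H, lam):
    """Omega(H) = lam * sum_{p,q} |h_pq|, a LASSO-style sparsity penalty."""
    return lam * np.abs(H).sum()


d_in, d_out = 8, 4  # hypothetical layer widths
a_T, a_S = rng.random(d_in), rng.random(d_in)
W_T, W_S = rng.normal(size=(d_out, d_in)), rng.normal(size=(d_out, d_in))
b_T, b_S = np.zeros(d_out), np.zeros(d_out)
H = rng.normal(size=(d_out, d_in))

out_T, out_S = cross_transfer(a_T, a_S, W_T, b_T, W_S, b_S, H)
print(out_T.shape, out_S.shape)  # (4,) (4,)
```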

4.5. Model Learning

Given the item recommendation task and the nature of implicit feedback, we adopt cross-entropy as the loss function for model optimization. The objective function to be minimized is defined as follows:

$$\mathcal{L}_0 = -\sum_{(u,i) \in R^+ \cup R^-} \Big( r_{ui} \log \hat{r}_{ui} + (1 - r_{ui}) \log (1 - \hat{r}_{ui}) \Big), \tag{7}$$

where $R^+$ and $R^-$ are the observed interactions and the randomly sampled negative examples (Pan et al., 2008), respectively. This objective function has a probabilistic interpretation: it is the negative log-likelihood of the following likelihood function:

$$L(\Theta \mid R^+ \cup R^-) = \prod_{(u,i) \in R^+} \hat{r}_{ui} \prod_{(u,i) \in R^-} (1 - \hat{r}_{ui}),$$

where $\Theta$ are the model parameters.
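The following sketch evaluates this loss on a handful of hypothetical positive and sampled negative pairs and checks the probabilistic interpretation numerically: the cross-entropy equals the negative log of the likelihood. The scores are made-up numbers, and clipping with `eps` is a numerical-safety choice of ours, not something the text specifies.

```python
import numpy as np


def cross_entropy_loss(r, r_hat, eps=1e-12):
    """Loss over observed positives (r = 1) and sampled negatives (r = 0)."""
    r_hat = np.clip(r_hat, eps, 1.0 - eps)
    return -np.sum(r * np.log(r_hat) + (1.0 - r) * np.log(1.0 - r_hat))


def likelihood(r, r_hat):
    """Product of r_hat over positives times (1 - r_hat) over negatives."""
    return np.prod(np.where(r == 1, r_hat, 1.0 - r_hat))


r = np.array([1, 1, 0, 0])              # two positives, two sampled negatives
r_hat = np.array([0.9, 0.6, 0.2, 0.4])  # hypothetical predicted scores
print(round(cross_entropy_loss(r, r_hat), 4))  # 1.3502
```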

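We train the proposed model with a joint loss function, which can be optimized efficiently by back-propagation. Instantiating the base loss in Eq. 7 (the cross-entropy loss above) as the loss of the target domain ($\mathcal{L}_T$) and the loss of the source domain ($\mathcal{L}_S$), the objective function of our proposed model is their joint loss:

$$\mathcal{L}(\Theta) = \mathcal{L}_T(\Theta_T) + \mathcal{L}_S(\Theta_S) + \lambda \sum_{l} \|H^l\|_1,$$

where the model parameters are $\Theta = \Theta_T \cup \Theta_S$ and the last term is the $\ell_1$ penalty on the transfer matrices. This objective function can be optimized by stochastic gradient descent (SGD):

$$\Theta \leftarrow \Theta - \eta \frac{\partial \mathcal{L}(\Theta)}{\partial \Theta},$$

where $\eta$ is the learning rate.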
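A small sketch of the joint objective and of one plain SGD update on a transfer matrix is below. The data gradient is passed in as a hypothetical input (in practice it comes from back-propagation through both base networks), and we use the sign of the matrix, scaled by the sparsity weight, as a subgradient of the $\ell_1$ term. The implementation details report using Adam; plain SGD is shown only because it matches the update rule above.

```python
import numpy as np


def joint_loss(loss_T, loss_S, H_list, lam):
    """L = L_T + L_S + lam * sum_l ||H^l||_1."""
    return loss_T + loss_S + lam * sum(np.abs(H).sum() for H in H_list)


def sgd_step_H(H, grad_data, lam, eta):
    """One SGD step on a transfer matrix.

    grad_data stands for dL_T/dH + dL_S/dH (from back-propagation, not shown);
    lam * sign(H) is a subgradient of the L1 penalty (0 where h = 0).
    """
    return H - eta * (grad_data + lam * np.sign(H))


H = np.array([[0.5, -0.2], [0.0, 1.0]])
H_new = sgd_step_H(H, grad_data=np.zeros_like(H), lam=0.1, eta=0.5)
print(H_new)
```

With a zero data gradient, each step shrinks the nonzero entries of the transfer matrix toward zero, which is the sparsity pressure the $\ell_1$ term is meant to apply. Plain subgradient steps rarely produce exact zeros; proximal (soft-thresholding) updates are a common alternative, although the text does not say which variant ACDN uses.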

4.6. Complexity Analysis

The model parameters include the embedding matrices, the weights and biases $\{W^l, b^l\}$ of the two base networks, and the transfer matrices $\{H^l\}$. The user embedding, the item embeddings, and the parameters that depend on the aesthetic features contain large numbers of parameters, because their sizes depend on the input sizes of the user latent vector, the item latent vector, and the aesthetic features. Usually, a hidden layer has about one hundred neurons, so the weight matrices and the cross transfer matrices are hundreds by hundreds. Overall, the number of model parameters is close to that of typical latent factor models (Koren et al., 2009) and is linear in the input size. During training, we update the target network and the source network with the data of the corresponding domain. The learning strategy is similar to that of CSN (Misra et al., 2016), and the cost of learning each base network is approximately equal to that of running a typical neural CF approach (He et al., 2017). In sum, the whole network can be trained efficiently by back-propagation with mini-batch stochastic optimization.
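To illustrate the complexity argument, here is a rough parameter count under stated assumptions: a user embedding shared by both domains, 64-dimensional embeddings, one transfer matrix per pair of consecutive layers shared by both directions, and a single output unit per domain. None of these choices is confirmed by the text; the sizes come from Table 2 (Dataset 1) and the tower configuration given in the implementation details.

```python
def count_parameters(m, n_T, n_S, d, hidden):
    """Rough parameter count for two coupled FFNN base networks.

    Assumptions (hypothetical): one user embedding shared by both domains,
    one transfer matrix per pair of consecutive layers shared by both
    directions, and a scalar output unit per domain.
    """
    embeddings = (m + n_T + n_S) * d
    dense = sum(a * b + b for a, b in zip(hidden, hidden[1:]))  # W^l and b^l
    transfer = sum(a * b for a, b in zip(hidden, hidden[1:]))   # H^l
    output = hidden[-1] + 1
    return {
        "embeddings": embeddings,
        "dense_per_network": dense,
        "transfer": transfer,
        "output_per_network": output,
        "total": embeddings + 2 * dense + transfer + 2 * output,
    }


# Dataset 1 sizes from Table 2; d = 64 is an assumption.
counts = count_parameters(m=8673, n_T=18442, n_S=21317, d=64,
                          hidden=[1152, 512, 256, 128])
for name, value in counts.items():
    print(f"{name}: {value:,}")
```

Under these assumptions the embedding matrices dominate, and they are the only term that grows with the numbers of users and items, which is consistent with the statement that the model size is linear in the input size.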

Dataset 1: Clothing (source) & Home Improvement (target)
Method   HR@5    NDCG@5  MRR@5   HR@10   NDCG@10 MRR@10  HR@20   NDCG@20 MRR@20
BPRMF    0.0902  0.0753  0.0650  0.1730  0.0941  0.0704  0.2757  0.0939  0.0785
VBPR     0.1027  0.0831  0.0778  0.1836  0.1103  0.0831  0.2903  0.1142  0.0811
CMF      0.1201  0.0903  0.0812  0.2014  0.1213  0.0947  0.3189  0.1242  0.0811
CDCF     0.1130  0.0863  0.0794  0.1904  0.1178  0.0877  0.3054  0.1183  0.0803
MLP      0.1251  0.0926  0.0866  0.2079  0.1225  0.0988  0.3266  0.1385  0.0871
MLP++    0.1292  0.0957  0.0974  0.2101  0.1278  0.1033  0.3321  0.1379  0.0944
CSN      0.1388  0.1022  0.0922  0.2179  0.1335  0.1104  0.3465  0.1424  0.1027
CoNet    0.1437  0.1059  0.1014  0.2230  0.1383  0.1185  0.3524  0.1513  0.1143
ACDN     0.1472  0.1077  0.1047  0.2289  0.1403  0.1220  0.3601  0.1560  0.1166
Improve  2.4%    1.69%   3.2%    2.6%    1.44%   2.95%   2.18%   3.10%   2.01%

Dataset 2: Outdoor and Sports (source) & Clothing (target)
Method   HR@5    NDCG@5  MRR@5   HR@10   NDCG@10 MRR@10  HR@20   NDCG@20 MRR@20
BPRMF    0.1105  0.0853  0.0766  0.1743  0.0975  0.0779  0.2848  0.1371  0.0892
VBPR     0.1335  0.0970  0.0885  0.1976  0.1105  0.0861  0.3044  0.1501  0.1023
CMF      0.1479  0.1022  0.0931  0.2214  0.1233  0.1005  0.3237  0.1601  0.1255
CDCF     0.1338  0.0928  0.0876  0.2103  0.1167  0.0931  0.3155  0.1566  0.1148
MLP      0.1533  0.1047  0.0958  0.2321  0.1280  0.1021  0.3304  0.1622  0.1295
MLP++    0.1590  0.1136  0.1011  0.2467  0.1339  0.1104  0.3367  0.1734  0.1356
CSN      0.1655  0.1243  0.1033  0.2498  0.1449  0.1170  0.3390  0.1881  0.1408
CoNet    0.1739  0.1328  0.1124  0.2539  0.1480  0.1241  0.3437  0.1938  0.1510
ACDN     0.1763  0.1357  0.1140  0.2611  0.1529  0.1254  0.3529  0.2003  0.1543
Improve  1.38%   2.18%   1.40%   2.83%   3.31%   1.04%   2.68%   3.35%   2.18%

Table 1. Performance comparison of different methods on the two datasets. ACDN achieves the best performance in every column; "Improve" is ACDN's relative improvement over the strongest baseline, CoNet.

5. Experiments

In this section, we first introduce the experimental settings. We then conduct experiments to answer the following research questions and validate our technical contributions.
RQ1: How does our proposed cross-domain recommender model ACDN perform compared with state-of-the-art recommendation methods, including single-domain and cross-domain methods, visually enhanced methods, and deep/shallow methods?
RQ2: What are the advantages of the aesthetic features for cross-domain recommendation, compared with other conventional features, such as color histograms and CNN features?
RQ3: How do the hyper-parameters affect the performance of the proposed model?

5.1. Experimental Setup

Dataset. We study the effectiveness of our proposed approach on Amazon, a real-world public dataset covering different kinds of domains. It contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 to July 2014, and has been used to evaluate the performance of various approaches. We use three domains, Home Improvement, Clothing, and Outdoor and Sports, and conduct experiments on two datasets with the following combinations. The statistics of the two datasets are summarized in Table 2.

Clothing & Home Improvement (Dataset 1): Source domain = Clothing, Target domain = Home Improvement. There are 8,673 shared users; the target domain has 18,442 items and 56,183 interactions, and the source domain has 21,317 items and 60,942 interactions. Similar to (Hu et al., 2018), we remove users and items with fewer than 5 purchase records. The densities of the target and source domains are 0.035% and 0.032%, respectively.

Outdoor and Sports & Clothing (Dataset 2): Source domain = Outdoor and Sports, Target domain = Clothing. There are 13,164 shared users; the target domain has 22,465 items and 82,416 interactions, and the source domain has 17,765 items and 68,291 interactions. Similar to (Hu et al., 2018), we remove users and items with fewer than 5 purchase records. The density of both domains is 0.029%.

Dataset    Domain                      #Users  #Items  #Interactions  Density
Dataset 1  Source: Clothing            8,673   21,317  60,942         0.032%
           Target: Home Improvement    8,673   18,442  56,183         0.035%
Dataset 2  Source: Outdoor and Sports  13,164  17,765  68,291         0.029%
           Target: Clothing            13,164  22,465  82,416         0.029%
Table 2. Dataset description.

Evaluation Protocol. For the item recommendation task, leave-one-out evaluation is widely used, and we follow the protocol in (He et al., 2017): for each user, we reserve one interaction as the test item. We determine hyper-parameters by randomly sampling another interaction per user as the validation set. Following the common strategy, we randomly sample 99 negative items that the user has not interacted with and evaluate how well the recommender ranks the test item against these negative ones. Since we aim at top-N item recommendation, we use the typical evaluation metrics hit ratio (HR), normalized discounted cumulative gain (NDCG), and mean reciprocal rank (MRR), where the ranked list is cut off at topN = {5, 10, 20}. HR intuitively measures whether the reserved test item is present in the top-N list:

$$\mathrm{HR} = \frac{1}{|\mathcal{U}|} \sum_{u \in \mathcal{U}} \mathbb{I}(p_u \le \mathrm{topN}),$$

where $\mathcal{U}$ is the set of test users, $p_u$ is the hit position of the test item of user $u$, and $\mathbb{I}(\cdot)$ is the indicator function. NDCG and MRR additionally account for the rank of the hit position:

$$\mathrm{NDCG} = \frac{1}{|\mathcal{U}|} \sum_{u \in \mathcal{U}} \mathbb{I}(p_u \le \mathrm{topN}) \frac{\log 2}{\log(p_u + 1)}, \qquad \mathrm{MRR} = \frac{1}{|\mathcal{U}|} \sum_{u \in \mathcal{U}} \mathbb{I}(p_u \le \mathrm{topN}) \frac{1}{p_u}.$$

For all three metrics, a higher value is better.
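The leave-one-out protocol and the three metrics can be sketched as follows. The scores are random placeholders rather than model outputs, and breaking ranking ties in favor of the test item is our assumption; the text does not specify tie handling.

```python
import numpy as np


def hit_position(test_score, negative_scores):
    """1-based rank of the test item among itself and its sampled negatives.

    Ties are broken in favour of the test item (an assumption).
    """
    return 1 + int(np.sum(np.asarray(negative_scores) > test_score))


def metrics_at_n(positions, top_n):
    """Average HR@N, NDCG@N and MRR@N over users, one test item per user."""
    p = np.asarray(positions, dtype=float)
    hit = p <= top_n  # indicator I(p_u <= topN)
    hr = hit.mean()
    ndcg = np.where(hit, np.log(2.0) / np.log(p + 1.0), 0.0).mean()
    mrr = np.where(hit, 1.0 / p, 0.0).mean()
    return hr, ndcg, mrr


rng = np.random.default_rng(0)
# Hypothetical scores: each of 3 users has one test item and 99 negatives.
positions = [hit_position(rng.random(), rng.random(99)) for _ in range(3)]
for n in (5, 10, 20):
    print(n, metrics_at_n(positions, top_n=n))
```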

Baselines. As shown in Table 3, we compare ACDN with various baselines, categorized as single-/cross-domain and shallow/deep methods.

Figure 3. Impact of the $\ell_1$-norm regularization for sparsity. Panels (a) and (b) show the performance on Dataset 1, and panels (c) and (d) show the performance on Dataset 2, in terms of TopN = 10 and 20.
  • BPRMF: Bayesian personalized ranking (Rendle et al., 2009) is a typical collaborative filtering approach that learns user and item latent factors via matrix factorization and a pairwise ranking loss.

  • MLP: The multi-layer perceptron (He et al., 2017) is a neural collaborative filtering approach that learns a user-item interaction function with neural networks.

  • MLP++: We combine two MLPs by sharing the user embedding matrix. This is a degenerate variant without cross transfer units.

  • VBPR: VBPR (He and McAuley, 2016b) is a scalable factorization model to incorporate visual signals into predictors of people’s opinions, which can make use of visual features extracted from product images by pre-trained deep networks.

  • CDCF: Cross-Domain Collaborative Filtering (Loni et al., 2014) is a context-aware cross-domain recommendation method that applies factorization on the merged domains aligned by the shared users. The auxiliary domain is used as context.

  • CMF: Collective matrix factorization (Singh and Gordon, 2008) is a multi-relation learning approach, which jointly factorizes matrices of individual domains. Here, the relation is user-item interaction. The shared user factors enable knowledge transfer between two domains.

  • CSN: The cross-stitch network (Misra et al., 2016) is a deep multitask learning model and jointly learns two base networks. It enables knowledge transfer by a linear combination of activation maps from two domains via a shared coefficient.

  • CoNet: CoNet (Hu et al., 2018) is a recent collaborative cross network model for cross-domain recommendation. It enables dual knowledge transfer across domains by introducing cross connections from one base network to the other and vice versa, letting the two networks benefit from each other.

Baselines       Shallow methods                                             Deep methods
Single-domain   BPRMF (Rendle et al., 2009), VBPR (He and McAuley, 2016b)  MLP (He et al., 2017)
Cross-domain    CDCF (Loni et al., 2014), CMF (Singh and Gordon, 2008)     CoNet (Hu et al., 2018), CSN (Misra et al., 2016), MLP++
Table 3. Categories of baselines.

Implementation. For BPRMF (Rendle et al., 2009), we use the implementation in LightFM, a popular collaborative filtering library. For VBPR (He and McAuley, 2016b), we use the open-source code. For CDCF (Loni et al., 2014), we adopt the official libFM implementation. For MLP (He et al., 2017), we use the code released by its authors. For CMF (Singh and Gordon, 2008), we use a Python version of the original Matlab code (a jit/cmf/). For CSN (Misra et al., 2016), the number of neurons must be the same in every hidden layer; the configuration is denoted as [64] × 4 (i.e., [64, 64, 64, 64]). For CoNet (Hu et al., 2018), we use the code shared by its author. Our methods are implemented in Python with TensorFlow, and parameters are randomly initialized from a Gaussian distribution $\mathcal{N}(0, 0.01)$. We adopt Adam (Kingma and Ba, 2014) as the optimizer with an initial learning rate of 0.001. The negative sampling ratio is 1 and the mini-batch size is 128. For the network structure, we adopt a tower pattern, shrinking the layer size for each successive higher layer. Specifically, the configuration of hidden layers in each base network is [1152, 512, 256, 128]. The size of the first hidden layer (i.e., 1152) is equal to the size of the concatenation of the user embedding, the item embedding, and the aesthetic features.
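The training setup described above can be sketched as follows: Gaussian initialization of the tower's weight matrices, negative sampling with a ratio of 1, and shuffled mini-batches of 128. We read the Gaussian as having standard deviation 0.01, which is an assumption (the notation could also denote the variance), and `sample_training_pairs` and `mini_batches` are hypothetical helpers, not the authors' code; the Adam optimizer itself is not shown. One plausible reading of the 1152-unit first layer, also our inference, is 64 + 64 + 1024, i.e., 64-dimensional user and item embeddings plus the 1024-dimensional aesthetic feature.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tower configuration from the text; one weight matrix per consecutive pair.
hidden = [1152, 512, 256, 128]
# Assumption: 0.01 is the standard deviation of the Gaussian initializer.
weights = [rng.normal(0.0, 0.01, size=(out, inp))
           for inp, out in zip(hidden, hidden[1:])]


def sample_training_pairs(positives, n_items, ratio=1):
    """Hypothetical helper: pair each observed (u, i) with `ratio` negatives.

    Negatives are drawn uniformly from items the user has not interacted
    with. Assumes every user has at least one unobserved item; otherwise
    the loop would never terminate.
    """
    observed = set(positives)
    rows = []
    for u, i in positives:
        rows.append((u, i, 1))
        drawn = 0
        while drawn < ratio:
            j = int(rng.integers(n_items))
            if (u, j) not in observed:
                rows.append((u, j, 0))
                drawn += 1
    return rows


def mini_batches(rows, batch_size=128):
    """Yield shuffled mini-batches of training rows."""
    order = rng.permutation(len(rows))
    for start in range(0, len(rows), batch_size):
        yield [rows[k] for k in order[start:start + batch_size]]


rows = sample_training_pairs([(0, 1), (0, 2), (1, 0)], n_items=5)
print(len(rows), [len(b) for b in mini_batches(rows)])  # 6 [6]
```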

5.2. Performance Comparison (RQ1)

To demonstrate the recommendation performance of ACDN, we compare it with state-of-the-art methods. The results of all methods on the two dataset combinations are reported in Table 1. We make the following observations.

First, cross-domain methods (i.e., CMF and CDCF) outperform single-domain methods (i.e., BPRMF and VBPR) in all settings on both datasets, regardless of whether shallow or deep methods are compared. This indicates that cross-domain methods benefit from knowledge transfer, which is an effective technique for alleviating the data sparsity issue. VBPR also outperforms BPRMF, indicating that visual features extracted from item images can indeed improve recommendation performance.
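To make the BPRMF/VBPR comparison concrete, here is a minimal sketch of the VBPR preference score as formulated by He and McAuley (2016b), together with the pairwise BPR loss that both models optimize. Setting the visual terms to zero recovers the plain matrix-factorization score that BPRMF ranks with. The toy dimensions and values are illustrative, and image feature extraction and training are omitted.

```python
import math
from typing import List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: List[Vec], v: Vec) -> Vec:
    """Product of a (rows x cols) matrix and a length-cols vector."""
    return [dot(row, v) for row in m]


def vbpr_score(alpha: float, beta_u: float, beta_i: float,
               gamma_u: Vec, gamma_i: Vec,
               theta_u: Vec, E: List[Vec], f_i: Vec, beta_vis: Vec) -> float:
    """VBPR: biases + latent match + visual match + visual bias.

    gamma_u . gamma_i  -- the matrix-factorization term BPRMF relies on
    theta_u . (E f_i)  -- E projects the image feature f_i into the visual space
    beta_vis . f_i     -- item bias explained by its appearance
    """
    return (alpha + beta_u + beta_i + dot(gamma_u, gamma_i)
            + dot(theta_u, matvec(E, f_i)) + dot(beta_vis, f_i))


def bpr_loss(x_ui: float, x_uj: float) -> float:
    """-ln sigmoid(x_ui - x_uj) for an observed item i and an unobserved item j."""
    d = x_ui - x_uj
    # Two algebraically equal forms, chosen so that exp() never overflows.
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))
```

The only difference between the two baselines is the pair of visual terms, so the VBPR-over-BPRMF gain can be attributed to the image features.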

Second, deep methods perform better than shallow methods in both the single-domain and cross-domain settings. For example, in the single-domain setting MLP improves by more than 15% over the shallow methods BPRMF and VBPR in all cases. Likewise, deep cross-domain models (i.e., MLP++, CoNet, and CSN) outperform shallow cross-domain models (i.e., CMF and CDCF) in all cases on both datasets. This shows that deep neural models, with their non-linear combinations and larger number of parameters, benefit not only single-domain recommendation but also cross-domain recommendation.
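The contrast between shallow and deep models can be illustrated with a minimal NCF-style MLP scorer (He et al., 2017), sketched under stated assumptions. The user and item embeddings are concatenated, passed through ReLU layers, and mapped to a probability by a sigmoid output unit, whereas matrix factorization would score the pair with a single inner product. Reading N(0, 0.01) as mean and standard deviation is an assumption, as are the zero initial biases. All names are illustrative.

```python
import math
import random
from typing import List, Tuple

Vec = List[float]
Layer = Tuple[List[Vec], Vec]  # (weight rows, bias)


def dense(x: Vec, layer: Layer) -> Vec:
    """Affine map; the weight matrix has one row per output unit."""
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def sigmoid(z: float) -> float:
    # Split on the sign so that math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mlp_score(user_emb: Vec, item_emb: Vec, hidden: List[Layer], out: Layer) -> float:
    """Concatenate embeddings, apply ReLU hidden layers, squash one output unit."""
    h = user_emb + item_emb  # list concatenation = vector concatenation
    for layer in hidden:
        h = [max(0.0, z) for z in dense(h, layer)]  # the non-linearity MF lacks
    return sigmoid(dense(h, out)[0])


def init_tower(sizes: List[int], rng: random.Random, std: float = 0.01) -> List[Layer]:
    """Gaussian weights and zero biases for each consecutive pair of layer sizes."""
    return [([[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]
```

For a tower such as [1152, 512, 256, 128], `init_tower` yields three hidden layers, and a separate 128-to-1 output layer produces the score.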

Third, our proposed neural model ACDN outperforms all baselines on both datasets in every setting. These baselines include the base MLP network, shallow cross-domain models (i.e., CMF and CDCF), and deep cross-domain models (i.e., MLP++, CoNet, and CSN). These results demonstrate the effectiveness of enhancing the cross-domain neural model with aesthetic features. MLP++, which shares user embeddings between domains, is only slightly better than the base MLP network. This small gain comes from unilateral knowledge transfer and points to the need for dual knowledge transfer in a deep model. CSN is inferior to CoNet on both datasets. A possible reason is that CSN's underlying assumption, namely that all representations from the auxiliary domain are equally important and useful, is not appropriate. This motivates us to learn what to transfer adaptively and to filter out information irrelevant to target-domain recommendation by using a cross transfer matrix rather than a scalar weight. Finally, our model outperforms the state-of-the-art method CoNet, which transfers only user-item rating information. This demonstrates that aesthetic features can help improve cross-domain recommendation performance, especially for appearance-first products.
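The difference between a scalar weight and a cross transfer matrix can be sketched with one layer of dual transfer. This sketch assumes a CoNet-style form in which each base network adds a matrix-weighted copy of the other network's activations to its own pre-activation before the ReLU. Biases are omitted. Whether ACDN uses one matrix per direction or a single shared matrix is not stated in this passage, so the two-matrix version below is an assumption. Because the matrix acts per dimension, a zero entry blocks one specific source unit from reaching one specific target unit, whereas a scalar weight scales every transferred unit equally.

```python
from typing import List, Tuple

Vec = List[float]
Mat = List[Vec]


def matvec(m: Mat, v: Vec) -> Vec:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def relu(v: Vec) -> Vec:
    return [max(0.0, z) for z in v]


def cross_layer(a_src: Vec, a_tgt: Vec, w_src: Mat, w_tgt: Mat,
                h_to_src: Mat, h_to_tgt: Mat) -> Tuple[Vec, Vec]:
    """One layer of dual knowledge transfer between two base networks.

    h_to_src maps target-network activations into the source network and
    h_to_tgt maps source-network activations into the target network.
    """
    new_src = relu([x + y for x, y in zip(matvec(w_src, a_src), matvec(h_to_src, a_tgt))])
    new_tgt = relu([x + y for x, y in zip(matvec(w_tgt, a_tgt), matvec(h_to_tgt, a_src))])
    return new_src, new_tgt


eye = [[1.0, 0.0], [0.0, 1.0]]
no_transfer = [[0.0, 0.0], [0.0, 0.0]]
# Only the first source unit reaches the first target unit; everything else is filtered.
selective = [[1.0, 0.0], [0.0, 0.0]]
src, tgt = cross_layer([1.0, 2.0], [3.0, 4.0], eye, eye, no_transfer, selective)
```

Here the target network receives the first source unit but not the second. A single scalar could only pass both units or suppress both.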

In summary, the empirical comparison demonstrates the superiority of the proposed neural model in transferring aesthetic preferences and source-domain knowledge for cross-domain recommendation.

5.3. Necessity of the Aesthetic Features (RQ2)

In this subsection, we examine the necessity of aesthetic features. We plug several widely used feature types into our base model and compare their effects using the following variants:

  • CDN: our model with the aesthetic features removed.

  • CHCDN: our model with the aesthetic features replaced by color histograms.

  • CCDN: our model with the aesthetic features replaced by CNN features.
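As a point of reference for the CHCDN variant, a color histogram can be computed as sketched below. The text does not say how its histograms were built, so the joint RGB quantization and the default of 4 bins per channel (a 64-dimensional feature) are assumptions. The sketch also shows why such features are coarse: they count colors but discard spatial layout, composition, and texture.

```python
from typing import Iterable, List, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B)


def color_histogram(pixels: Iterable[Pixel], bins: int = 4) -> List[float]:
    """Joint RGB histogram with `bins` levels per channel, normalised to sum to 1."""
    hist = [0.0] * (bins ** 3)
    total = 0
    for r, g, b in pixels:
        # c * bins // 256 maps 0..255 onto 0..bins-1 for any bins <= 256
        idx = ((r * bins // 256) * bins + g * bins // 256) * bins + b * bins // 256
        hist[idx] += 1.0
        total += 1
    return [h / total for h in hist] if total else hist
```

Normalizing by the pixel count makes histograms from images of different sizes comparable. Two images with the same colors in completely different arrangements, however, still map to the same vector.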

Figure 4(a) shows the distribution of the 10 highest HR@10 values on Dataset 1 over 40 iterations. CHCDN performs the worst, because these low-level features are too coarse and one-sided to provide more than very limited information about consumers' aesthetic preferences for cross-domain recommendation.

Our model ACDN, which uses aesthetic information, performs the best. CNN features also contain some aesthetic information (e.g., color and texture), but they are far from a comprehensive aesthetic description. The aesthetic features provide such a description because they are derived from abundant raw aesthetic inputs and are trained for cross-domain knowledge transfer. CNN features can perform better than aesthetic features in a single domain (Yu et al., 2018), but our experiments demonstrate the effectiveness of the aesthetic features in cross-domain recommendation. This supports our assumption that a user's aesthetic preference is domain independent and can serve as a bridge between domains for knowledge transfer.

Figure 4. (a) Necessity of the aesthetic features: comparison of various visual features. (b) Loss and performance: analysis of the optimization performance of our model.

5.4. Impact of Hyper-Parameters (RQ3)

5.4.1. Impact of ℓ1-norm Regularization

Figure 3 shows the impact of ℓ1-norm regularization on the entries of the cross transfer matrices in Eq. 6. ACDN- denotes our model with the ℓ1-norm regularization removed. The experimental results show that ACDN performs better than ACDN- on both datasets. This demonstrates the effectiveness of enforcing a sparse structure (via ℓ1-norm regularization) on the cross transfer matrices. The ℓ1-norm regularization controls knowledge transfer between the source and target domains. In other words, it lets our model use the cross transfer matrices to adaptively select which representations to transfer for cross-domain recommendation.
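The sparsity term can be sketched as follows, assuming it is added to the training loss as λ times the sum of the absolute entries of all cross transfer matrices. The function names are illustrative. The soft-thresholding function shows the proximal step commonly paired with an ℓ1 penalty. It is included only to illustrate how the penalty drives small entries to exactly zero; this passage does not say that ACDN's training uses it.

```python
from typing import List

Mat = List[List[float]]


def l1_penalty(matrices: List[Mat], lam: float) -> float:
    """lam times the sum of absolute entries over all cross transfer matrices."""
    return lam * sum(abs(x) for m in matrices for row in m for x in row)


def regularized_loss(data_loss: float, matrices: List[Mat], lam: float) -> float:
    """Recommendation loss plus the sparsity-inducing l1 term."""
    return data_loss + l1_penalty(matrices, lam)


def soft_threshold(m: Mat, t: float) -> Mat:
    """Proximal operator of t * ||m||_1: shrink each entry toward 0 by t and
    set entries with |x| <= t to exactly 0, which yields a sparse matrix."""
    def shrink(x: float) -> float:
        if x > t:
            return x - t
        if x < -t:
            return x + t
        return 0.0
    return [[shrink(x) for x in row] for row in m]
```

Entries that survive the shrinkage mark the source dimensions that are still transferred. Zeroed entries correspond to representations the model has learned to ignore.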

5.4.2. Sensitivity Analysis of the Sparsity Penalty Parameter

The above analysis shows that the ℓ1-norm regularization on the cross transfer matrices is crucial to our model. How, then, should its penalty parameter be set? We analyze the sensitivity of our model's performance to this penalty parameter. As shown in Figure 5, our model achieves the best performance with a penalty of 0.5 on Dataset 1 and 0.01 on Dataset 2. A possible reason is that the two datasets have different information distributions. Thus, choosing an appropriate sparsity penalty for each setting can improve the performance of our model.
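Because the best penalty differs between the two datasets, it has to be chosen per dataset. Below is a minimal sketch of that selection loop. It assumes a caller-supplied `evaluate` function (hypothetical) that trains the model with a given penalty and returns a validation metric such as HR@10. The candidate grid and the stand-in evaluator in the usage lines are hypothetical as well.

```python
from typing import Callable, Dict, Iterable, Tuple


def select_penalty(candidates: Iterable[float],
                   evaluate: Callable[[float], float]) -> Tuple[float, Dict[float, float]]:
    """Score every candidate penalty and return the best one with all scores."""
    scores = {lam: evaluate(lam) for lam in candidates}
    if not scores:
        raise ValueError("no candidate penalties given")
    best = max(scores, key=scores.__getitem__)  # ties keep the earliest candidate
    return best, scores


# Usage with a stand-in evaluator; a real one would train and validate the model.
grid = [0.001, 0.01, 0.1, 0.5, 1.0]
best, scores = select_penalty(grid, lambda lam: -abs(lam - 0.5))
```

Running the loop separately for each dataset, on held-out validation data rather than the test split, allows each dataset to settle on its own penalty.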

5.4.3. Optimization Performance

We analyze the optimization performance of our model as training proceeds. Figure 4(b) shows the training loss and the NDCG@20 test performance on Dataset 2 (HR and MRR show similar trends) at each optimization iteration. As the number of iterations grows, the training loss gradually decreases and recommendation performance improves accordingly. The most effective updates occur in the first 30 iterations, and performance continues to improve gradually until 40 iterations. Beyond that, our model remains relatively stable.
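The metrics used in this section can be sketched for the common leave-one-out protocol, in which each test user has one held-out item that is ranked against a set of candidate items. The evaluation setup is not restated here, so three choices below are assumptions: this protocol, the pessimistic tie handling, and computing MRR over the full candidate list instead of truncating it at K.

```python
import math
from typing import Dict, List, Tuple


def rank_of(target: int, scores: Dict[int, float]) -> int:
    """0-based rank of the held-out item; tied candidates count as ranked ahead."""
    t = scores[target]
    return sum(1 for item, s in scores.items() if item != target and s >= t)


def hit_ratio(rank: int, k: int) -> float:
    return 1.0 if rank < k else 0.0


def ndcg(rank: int, k: int) -> float:
    # With a single relevant item the ideal DCG is 1, so NDCG is just the gain.
    return 1.0 / math.log2(rank + 2) if rank < k else 0.0


def reciprocal_rank(rank: int) -> float:
    return 1.0 / (rank + 1)


def evaluate(cases: List[Tuple[int, Dict[int, float]]], k: int) -> Dict[str, float]:
    """Average HR@k, NDCG@k and MRR over (held-out item, candidate scores) cases."""
    ranks = [rank_of(target, scores) for target, scores in cases]
    if not ranks:
        raise ValueError("no test cases")
    n = len(ranks)
    return {
        f"HR@{k}": sum(hit_ratio(r, k) for r in ranks) / n,
        f"NDCG@{k}": sum(ndcg(r, k) for r in ranks) / n,
        "MRR": sum(reciprocal_rank(r) for r in ranks) / n,
    }
```

HR only checks whether the held-out item appears in the top K. NDCG and MRR also reward placing it higher, which is why the three metrics usually move together, as they do in Figure 4(b).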

Figure 5. Sensitivity analysis of the sparsity penalty parameter: (a) performance on Dataset 1; (b) performance on Dataset 2.

6. Conclusions

In this paper, we introduced a new deep Aesthetic preference Cross-Domain Network (ACDN) that transfers users' aesthetic preferences across domains to enhance recommendation performance. Specifically, we proposed a deep cross-domain recommendation network incorporating aesthetic preferences. It enables dual knowledge transfer across domains by introducing cross transfer units from one base network to the other. Our work improved existing cross-domain recommendation research in two ways: (i) we leveraged novel aesthetic features for cross-domain recommendation to capture users' domain-independent aesthetic preferences; and (ii) we proposed a new cross-domain recommendation algorithm that better models an individual's propensity from the aesthetic perspective, sharing each individual's aesthetic preference for knowledge transfer across domains to alleviate the data sparsity problem. Using the Amazon dataset across three domains, we evaluated the effectiveness of our approach against various baseline methods. The experimental results showed three things. (i) The aesthetic features were effective in cross-domain recommendation, which further supports the view that users' aesthetic preferences are domain independent. (ii) Deep/transfer models were superior to shallow/non-transfer methods, and incorporating aesthetic features into cross-domain recommendation further improved recommendation accuracy. (iii) Dual knowledge transfer, enabled by cross connections from each base network to the other, lets the networks benefit from each other and is superior to knowledge transfer in one direction.


  • Berkovsky et al. (2007) Shlomo Berkovsky, Tsvi Kuflik, and Francesco Ricci. 2007. Cross-domain mediation in collaborative filtering. In ICUM. Springer, 355–359.
  • Cheng et al. (2016) Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, et al. 2016. Wide & deep learning for recommender systems. In Proceedings of the 1st workshop on deep learning for recommender systems. ACM, 7–10.
  • Covington et al. (2016) Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM conference on recommender systems. ACM, 191–198.
  • Cremonesi et al. (2011) Paolo Cremonesi, Antonio Tripodi, and Roberto Turrin. 2011. Cross-domain recommender systems. In 2011 IEEE 11th International Conference on Data Mining Workshops. IEEE, 496–503.
  • He and McAuley (2016a) Ruining He and Julian McAuley. 2016a. VBPR: Visual Bayesian Personalized Ranking from Implicit Feedback. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI'16). AAAI Press, 144–150.
  • He and McAuley (2016b) Ruining He and Julian McAuley. 2016b. VBPR: visual bayesian personalized ranking from implicit feedback. In Thirtieth AAAI Conference on Artificial Intelligence.
  • He et al. (2017) Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 173–182.
  • Herlocker et al. (1999) Jonathan L Herlocker, Joseph A Konstan, Al Borchers, and John Riedl. 1999. An algorithmic framework for performing collaborative filtering. In SIGIR. ACM, 230–237.
  • Hu et al. (2018) Guangneng Hu, Yu Zhang, and Qiang Yang. 2018. CoNet: Collaborative Cross Networks for Cross-Domain Recommendation. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. ACM, 667–676.
  • Hu et al. (2008) Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative filtering for implicit feedback datasets. In 2008 Eighth IEEE International Conference on Data Mining. IEEE, 263–272.
  • Jin et al. (2018) Xin Jin, Le Wu, Xiaodong Li, Xiaokun Zhang, Jingying Chi, Siwei Peng, Shiming Ge, Geng Zhao, and Shuying Li. 2018. ILGNet: inception modules with connected local and global features for efficient image aesthetic quality classification using domain adaptation. IET Computer Vision.

  • Kingma and Ba (2014) Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
  • Koren et al. (2009) Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009), 30–37.
  • Li et al. (2009) Bin Li, Qiang Yang, and Xiangyang Xue. 2009. Can Movies and Books Collaborate? Cross-Domain Collaborative Filtering for Sparsity Reduction. In IJCAI.
  • Loni et al. (2014) Babak Loni, Yue Shi, Martha Larson, and Alan Hanjalic. 2014. Cross-domain collaborative filtering with factorization machines. In European conference on information retrieval. Springer, 656–661.
  • Lu et al. (2014) Xin Lu, Zhe Lin, Hailin Jin, Jianchao Yang, and James Z. Wang. 2014. RAPID: Rating Pictorial Aesthetics Using Deep Learning. In Proceedings of the 22nd ACM International Conference on Multimedia (MM '14). ACM, New York, NY, USA, 457–466.
  • Ma et al. (2017) Shuang Ma, Jing Liu, and Chang-Wen Chen. 2017. A-Lamp: Adaptive Layout-Aware Multi-patch Deep Convolutional Neural Network for Photo Aesthetic Assessment. 722–731.
  • Man et al. (2017) Tong Man, Huawei Shen, Xiaolong Jin, and Xueqi Cheng. 2017. Cross-Domain Recommendation: An Embedding and Mapping Approach. In IJCAI. 2464–2470.
  • Misra et al. (2016) Ishan Misra, Abhinav Shrivastava, Abhinav Gupta, and Martial Hebert. 2016. Cross-stitch networks for multi-task learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3994–4003.
  • Mnih and Salakhutdinov (2008) Andriy Mnih and Ruslan R Salakhutdinov. 2008. Probabilistic matrix factorization. In NIPS. 1257–1264.
  • Nair and Hinton (2010) Vinod Nair and Geoffrey E Hinton. 2010. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10). 807–814.
  • Pan et al. (2008) Rong Pan, Yunhong Zhou, Bin Cao, Nathan N Liu, Rajan Lukose, Martin Scholz, and Qiang Yang. 2008. One-class collaborative filtering. In 2008 Eighth IEEE International Conference on Data Mining. IEEE, 502–511.
  • Rendle (2012) Steffen Rendle. 2012. Factorization machines with libfm. ACM Transactions on Intelligent Systems and Technology (TIST) 3, 3 (2012), 57.
  • Rendle et al. (2009) Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence. AUAI, 452–461.
  • Singh and Gordon (2008) Ajit P Singh and Geoffrey J Gordon. 2008. Relational learning via collective matrix factorization. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 650–658.
  • Tibshirani (1996) Robert Tibshirani. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) (1996), 267–288.
  • Wang et al. (2016) Zhangyang Wang, Florin Dolcos, Diane Beck, Shiyu Chang, and Thomas S. Huang. 2016. Brain-Inspired Deep Networks for Image Aesthetics Assessment. CoRR abs/1601.04155 (2016).
  • Winoto and Tang (2008) Pinata Winoto and Tiffany Tang. 2008. If you like the devil wears prada the book, will you also enjoy the devil wears prada the movie? a study of cross-domain recommendations. New Generation Computing 26, 3 (2008), 209–225.
  • Yu et al. (2018) Wenhui Yu, Huidi Zhang, Xiangnan He, Xu Chen, Li Xiong, and Zheng Qin. 2018. Aesthetic-based clothing recommendation. In Proceedings of the 2018 World Wide Web Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 649–658.
  • Zhang et al. (2012) Yu Zhang, Bin Cao, and Dit Yan Yeung. 2012. Multi-Domain Collaborative Filtering. UAI 93 (2012), 725–732.
  • Zhao et al. (2016) Lili Zhao, Zhongqi Lu, Sinno Jialin Pan, and Qiang Yang. 2016. Matrix Factorization+ for Movie Recommendation. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI'16). AAAI Press, 3945–3951.
  • Zhao et al. (2017) Pengpeng Zhao, Xiefeng Xu, Yanchi Liu, Victor S. Sheng, Kai Zheng, and Hui Xiong. 2017. Photo2Trip: Exploiting Visual Contents in Geo-tagged Photos for Personalized Tour Recommendation. In Proceedings of the 25th ACM International Conference on Multimedia (MM ’17). ACM, New York, NY, USA, 916–924.