By providing personalized item suggestions for each user, recommender systems have become a cornerstone of the E-commerce shopping experience (Adomavicius and Tuzhilin, 2005; Su and Khoshgoftaar, 2009). Among all recommendation algorithms, learning low dimensional user and item embeddigs is a key building block that have been widely studied (Mnih and Salakhutdinov, 2008; Koren et al., 2009; Rendle, 2012). With the learned user and item embeddings, it is convenient to approximate the likelihood or the predicted preference that a user would give to an item by way of a simple inner product between the corresponding user embedding and item embedding.
Many efforts have been devoted to designing sophisticated models to learn precise user and item embeddings. In the typical collaborative filtering scenario with user-item interaction behavior, the latent factor based approaches have received great success in both academia and industry due to their relatively high performance (Koren et al., 2009; Rendle et al., 2009; Mnih and Salakhutdinov, 2008; Rendle, 2012)
. Though successful, the recommendation performance is unsatisfactory due to the sparseness of user-item interaction data. As sometimes users and items are associated with features, factorization machines generalize most latent factor models with an additional linear regression function of user and item features(Rendle, 2012). Recently, researchers also designed more advanced neural models based on latent factor models and FMs (He et al., 2017; Guo et al., 2017). E.g., NeuMF extends over latent factor based models by modeling the complex relationships between user and item embedding with a neural architecture (He et al., 2017). These deep embedding models advance the performance of previous shallow embedding models. Nevertheless, the recommendation performance is still hampered by the sparse data.
Luckily, with the prevalence of online social networks, more and more people like to express their opinions of items in these social platforms. The social recommender systems have emerged as a promising direction, which leverage the social network among users to alleviate the data sparsity issue and enhance recommendation performance (Ma et al., 2011; Jiang et al., 2014; Guo et al., 2015; Tang et al., 2013). These social recommendation approaches are based on the social influence theory that states connected people would influence each other, leading to the similar interests among social connections (Bond et al., 2012; Lewis et al., 2012; Anagnostopoulos et al., 2008). E.g., social regularization has been empirically proven effective for social recommendation, with the assumption that connected users would share similar latent embeddings (Jamali and Ester, 2010; Ma et al., 2011; Jiang et al., 2014). TrustSVD++ extended the classic latent factor based models by incorporating each user’s trusted friends’ feedbacks to items as the auxiliary feedback of the active user (Guo et al., 2015). All these works showed performance improvement by considering the first-order local neighbors of each user. Nevertheless, we argue that, for each user, instead of the interest diffusion from a user’s neighbors to this user at one time, the social diffusion presents a dynamic recursive effect to influence each user’s embedding. In detail, as the social influence propagation process begins (i.e., the diffusion iteration ), each user’s first latent embedding is influenced by the initial embeddings of her trusted connections. With the recursive influence diffuses over time, each user’s latent embedding at -th iteration is influenced by her trusted neighbors at the -th iteration. Therefore, the social influence recursively propagates and diffuses in the social network. Correspondingly, each user’s interests change in the recursive process. Precise simulating the recursive diffusion process in the global social network would better model each user’s embedding, thus improve the social recommendation performance.
In this paper, we propose DiffNet: an Influence Diff
usion neural network based model to stimulate the recursive social influence propagation process for better user and item embedding modeling in social recommendation. The key idea behind the proposed model is a carefully designed layer-wise influence diffusion structure for users, which models how users’ latent embeddings evolve as the social diffusion process continues. Specifically, the diffusion process starts with an initial embedding for each user on top of the fusion of each user’s features and a a free user latent vector that captures the latent behavior preference. For the item side, as items do not propagate in the social network, each item’s embedding is also fused by the free item latent embedding and the item features. With the influence diffuses to a predefined-th diffusion step, the -th layer user interest embedding is obtained. In fact, with the learned user and item embedddings, DiffNet can be seamlessly incorporated into classical CF models, such as BPR and SVD++, and efficiently trained using SGD.
We summarize the contributions of this paper as follows:
We propose a DiffNet model with a layer-wise influence propagation structure to model the recursive dynamic social diffusion in social recommendation. Besides, DiffNet has a fusion layer such that each user and each item could be represented as an embedding that encompasses both the collaborative and the feature content information.
We show that the proposed DiffNet model is time and storage efficient in comparison to most embedding based recommendation models. The proposed model is a generalization of many related recommendation models and it is flexible when user and item attributes are not available.
Experimental results on two real-world datasets clearly show the effectiveness of our proposed DIP model. DiffNet outperforms more than 13.5% on Yelp and 15.5% on Flickr for top-10 recommendation compared to the the baselines with the best performance.
2. Problem Definition and Preliminaries
2.1. Problem Definition
In a social recommender system, there are two sets of entities: a user set (), and an item set (). Users interact with items to show their preference. As the implicit feedback (e.g., watching an movie, purchasing an item, listening to a song ) are more common in, we also consider the recommendation scenario with implicit feedback (Rendle et al., 2009). Let denote users’ implicit feedback based rating matrix, with if user is interested in item , otherwise it equals 0. The social network can be represented as a user-user directed graph , with is the user set and represents the social connections between users. If user trusts or follows user , , otherwise it equals 0. If the social network is undirected, then user connects to user denotes follows , and also follows , i.e., . Then, each user ’s ego social network, i.e., is the i-th column () of . For notational convenience, we use to denote the userset that trusts, i.e., .
Besides, each user is associated with real-valued attributes (e.g, user profile), denoted as in the user attribute matrix . Also, each item has an attribute vector (e.g., item text representation, item visual representation) in item attribute matrix . Without confusion, we use to denote users and to denote items. The matrices are denoted with capital bold letters, and vectors with small bold letters. Then, the social recommendation problem can be defined as:
Definition 2.1 (Social Recommendation).
Given a rating matrix , a social network , and associated real-valued feature matrix and of users and items, our goal is to predict users’ unknown preferences to items as: , where denotes the predicted preferences of users to items.
Classical Embedding Models. Given the user-item rating matrix , the latent embedding based models are among the most successful approaches to capture the collaborative information for building the recommender systems (Rendle et al., 2009; Koren et al., 2009; Rendle, 2012). Specifically, these latent embedding models embed both users and items in a low latent space, such that each user’s predicted preference to an unknown item turns to the inner product between the corresponding user and item embeddings as:
where is the embedding of , which is the a-th column of the user embedding matrix . Similarly, represents item ’s embedding in the -th column of item embedding matrix .
SVD++ is an enhanced version of the latent factor based models that leveraged the rated history items of each user for better user embedding modeling (Koren, 2008). In SVD++, each user’s embedding is composed of a free embedding as classical latent factor based models, as well as an auxiliary embedding that is summarized from her rated items. Therefore, the predicted preference is modeled as:
where is the itemset that shows implicit feedback, and is an implicit factor vector.
As sometimes users and items are associated with attributes, the feature enriched embedding models give the predicted preference of user to item is:
where the first term captures the bias terms with the feature engineering, and the second term models the second-order interaction between users and items. Different embedding based models vary in the embedding matrix formulation and the optimization function. E.g., Bayesian Personalized Ranking (BPR) is one of the most successful pair-wise based optimization function for implicit feedback (Rendle et al., 2009). In BPR, it assumes that the embedding matrices and are free embeddings that follow a Gaussian prior, which is equivalent to adding a L2-norm regularization in the optimization function as:
where is a logistic function that transforms the input into range . is the training data for with the itemset that positively shows feedback, and denotes the items that does not show feedback in the training data.
Social Recommendation Models Social influence occurs when a persons’s emotions, opinions or behaviors are affected by others (??, si_). In a social platform, social scientists have converged that social influence is a natural process for users to disseminate their preferences to the followers in the social network, such that the action (preference, interest) of a user changes with the influence from his/her trusted users (Lewis et al., 2012; Ibarra and Andrews, 1993; Kempe et al., 2003; Kramer et al., 2014). Therefore, as the social diffusion process continues, the social correlation phenomenon exits, with each user’s preference and behavior are similar to her social connections (Lewis et al., 2012; Anagnostopoulos et al., 2008).
The social influence and social correlation among users’ interests are the foundation for building social recommender systems (Ma et al., 2011; Sun et al., 2018; Guo et al., 2015). Due to the superiority of embedding based models for recommendation, most social recommendation models are also built on these embedding models. These social embedding models could be summarized into the following two categories: the social regularization based approaches (Jamali and Ester, 2010; Ma et al., 2011; Jiang et al., 2014) and the user behavior enhancement based approaches (Guo et al., 2015, 2016)
. Specifically, the social regularization based approaches assumed that connected users would show similar embeddings under the social influence diffusion. As such, besides the classical collaborative filtering based loss function (e.g, Eq.(4)), an additional social regularization term is incorporated in the overall optimization function as:
where is a diagonal matrix with .
Instead of the social regularization term, some researchers argued that the social network provides valuable information to enhance each user’s behavior to alleviate the data sparsity issue. TrustSVD is such a model that shows state-of-the-art performance (Guo et al., 2015, 2016). As researchers have well recognized that each user shows similar preferences as their social connections, the implicit feedbacks of a user’s social neighbors’ on items could be regarded as the auxiliary feedback of this user, with the modeling process as:
where denotes the latent embedding of user , who is trusted by . As such, ’s latent embedding is enhanced by considering the influence of her trusted users’ latent embeddings in the social network.
Despite the performance improvement of incorporating the social network for social recommendation, we notice that nearly all of the current social recommendation models leveraged the observed social connections (each user’s social neighbors) for recommendation with a static process at once. However, the social influence is not a static but a recursive process, with each user is influenced by the social connections as time goes on. At each time, users need to balance their previous preferences with the influences from social neighbors to form their updated latent interests. Then, as the current user interest evolves, the influences from social neighbors changes. The process is recursively diffused in the social network. Therefore, current solutions neglected the iterative social diffusion process for social recommendation. What’s worse, when the user features are available, these social recommendation models need to be redesigned to leverage the feature data for better correlation modeling between users (Jiang et al., 2014).
3. The Proposed Model
In this part, we build a DiffNet model that stimulates the influence diffusion for social recommendation. We start with the overall architecture of DiffNet, followed by the model learning process. Finally, we give a detailed discussion of the proposed model.
3.1. Model Architecture
We show the overall neural architecture of DiffNet in Fig 1. By taking an user-item pair
as input, it outputs the probabilitythat would like item . The overall neural architecture of DiffNet contains four main parts: the embedding layer, the fusion layer, the layer-wise influence diffusion layers, and the prediction layer. Specifically, by taking related inputs, the embedding layer outputs free embeddings of users and items. For each user (item), the fusion layer generates a hybrid user (item) embedding by fusing both a user’s ( an item’s) free embedding and the associated features. The fused user embedding is then sent to the influence diffusion layers. The influence diffusion layers are built with a layer-wise structure to model the recursive social diffusion process in the social network, which is the key idea of the DiffNet. After the influence diffusion process reaches stable, the output layer generates the final predicted preference of a user-item pair. We detail each part as follows:
Embedding Layer. Similar as many embedding based recommendation models (Rendle et al., 2009; Rendle, 2012; He et al., 2017), let and represent the free embeddings of users and items. These free embeddings capture the collaborative latent representations of users and items. Given the one hot representations of user and item , the embedding layer performs an index operation and outputs the free user latent vector and free item latent vector from user free embedding matrix and item free embedding matrix .
Fusion Layer. For each user , the fusion layer takes and her associated feature vector as input, and outputs a user fusion embedding that captures the user’s initial interests from different kinds of input data. We model the fusion layer as a one-layer fully connected neural network as:
where is a transformation matrix, and is a non-linear function. Without confusion, we omit the bias term in a fully-connected neural network for notational convenience. This fusion layer could generalize many typical fusion operations, such as the concatenation operation as by setting
as an identity matrix.
Similarly, for each item , the fusion layer models the item embedding
as a non-linear transformation between its free latent vectorand its feature vector as:
Influence Diffusion Layers. By feeding the output of each user ’s fusion embedding from the fusion layer into the influence diffusion part, the influence diffusion layers model the dynamics of users’ latent preference diffusion in the social network . As information diffuses in the social network from time to time, the influence diffusion part is analogously built with a multi-layer structure. Each layer takes the users’ embeddings from the previous layers as input, and output users’ updated embeddings after the current social diffusion process finishes. Then, the updated user embeddings are sent to the -th layer for the next diffusion process.
For each user , let denote her latent embedding in the -th layer of the influence diffusion part. By feeding the output of the -th layer into the -th layer, the influence diffusion operation at the -th social diffusion layer updates each user ’s latent embedding into . Specifically, the updated embedding is composed of two steps: diffusion influence aggregation (AGG) from ’s trusted users from the -th layer, which transforms all the social trusted users’ influences into a fixed length vector :
where the function could be defined as an average pooling that performs a mean operation of all the trusted users’ latent embedding at the -th layer. The can also be defined as a max operation that select the maximum element of all the trusted users’ latent embedding at the -th layer to form .
Then, -th updated embedding is a combination of her latent embedding at the -th layer and the influence diffusion embedding aggregation from her trusted users. Since we do not know how each user balances these two parts, we use a non-linear neural network to model the combination as:
where is non-linear transformation function.
With a predefined diffusion depth , for each user , the influence diffusion layer starts with the layer-0 user embedding (Eq.(7)), i.e., the output of the fusion layer, and the layer-wise influence diffusion process then diffuses to layer 1, followed by layer 1 diffuses to layer 2. This influence diffusion step is repeated for steps to reach the diffusion depth , where each user ’s latent embedding at the -th layer is .
Please note that, DiffNet only diffuses users’ latent vectors in the influence diffusion part without any item vector diffusion modeling. This is quite reasonable as item latent embeddings are static and do not propagate in the social network.
Prediction Layer. Given each user ’s embedding at the -th layer after the iterative diffusion process, each item ’s fusion vector , we model the predicted preference of user to item as:
where is the itemset that likes. In this equation, each user’s final latent representation is composed of two parts: the embeddings from the output of the social diffusion layers as , and the preferences from her historical behaviors as: . Specifically, the first term captures the user’s interests from the recursive social diffusion process in the social network structure. The second term resembles the SVD++ model that leveraged the historical feedbacks of the user to alleviate the data sparsity of classical CF models (Koren, 2008), which has shown better performance over the classical latent factor based models. Thus, the final user embedding part is more representative with the recursive social diffusion modeling and the historical feedbacks of the user. After that, the final predicted rating is still measured by the inner produce between the corresponding user final latent vector and item latent vector.
3.2. Model Training
As we focus on implicit feedbacks of users, similar to the widely used ranking based loss function in BPR (Rendle et al., 2009), we also design a pair-wise ranking based loss function for optimization:
is a sigmoid function., with , and . is a regularization parameter that controls the complexity of user and item free embedding matrices. denotes the pairwise training data for with represents the itemset that positively shows feedback.
All the parameters in the above loss function are differentiable. In practice, we implement the proposed model with TensorFlow222https://www.tensorflow.org to train model parameters with mini-batch Adam. We split the mini-batch according to the userset, i.e., each user’s training records are ensured in the same mini-batch. This mini-batch splitting procedure avoids the repeated computation of each user ’s latent embedding in the iterative influence diffusion layers.
As we could only observe positive feedbacks of users with huge missing unobserved values, similar as many implicit feedback works, for each positive feedback, we randomly sample 10 missing unobserved feedbacks as pseudo negative feedbacks at each iteration in the training process (Wu et al., 2016). As each iteration the pseudo negative samples change, each missing value gives very weak negative signal.
Space complexity. As shown in Eq.(13), the model parameters are composed of two parts: the user and item free embeddings , and the parameter set . Since most embedding based models (e.g., BPR (Rendle et al., 2009), FM (Rendle, 2012)) need to store the embeddings of each user and each item, the space complexity of is the same as classical embedding based models and grows linearly with users and items. For parameters in , they are shared among all users and items, with the dimension of each parameter is far less than the number of users and items. This additional storage cost is a small constant that could be neglected. Therefore, the space complexity of DiffNet is the same as classical embedding models.
Time complexity. Since our proposed loss function resembles the BPR model with the pair-wise loss function that is designed for implicit feedback, we compare the time complexity of DiffNet with BPR. The main additional time cost lies in the layer-wise influence diffusion process. The dynamic diffusion process costs , where is the number of users, and denotes the diffusion depth and denotes the average social neighbors of each user. Similarly, the additional time complexity of updating parameters is . Therefore, the additional time complexity is . In fact, as shown in the empirical findings as well as our experimental results, DiffNet reaches the best performance when . Also, the average social neighbors per user are limited with . Therefore, the additional time complexity is acceptable and the proposed DiffNet could be applied to real-world social recommender systems.
3.3.2. Model Generalization.
The proposed DiffNet model is designed under the problem setting with the input of user feature matrix , item feature matrix , and the social network . Specifically, the fusion layer takes users’ (items’) feature matrix for user (item) representation learning. The layer-wise diffusion layer utilized the social network structure to model how users’ latent preferences are dynamically influenced from the recursive social diffusion process. Next, we would show that our proposed model is generally applicable when different kinds of data input are not available.
When the user (item) features are not available, the fusion layer disappears. In other words, as shown in Eq.(8), each item’s latent embedding degenerates to . Similarly, each user’s initial layer-0 latent embedding (Eq.(7)). Similarly, when either the user attributes or the item attributes do not exist, the corresponding fusion layer of user or item degenerates.
The key idea of our proposed model is the carefully designed social diffusion layers with the input social network . When the recommender system does not contain any social network information, the social diffusion layers disappear with . Under this circumstances, as shown in Eq. (12) our proposed model degenerates to an enhanced SVD++ model (Koren, 2008) for recommendation, with the user and item latent embeddings contain the fused free embeddings and the associated user and item features.
3.3.3. Comparisons to Graph Convolutional based Models
In our proposed DiffNet, the designed layer-wise diffusion part (Eq.(9) and Eq.(10)) presents similar idea as the Graph Convolutional Networks (GCN), which are state-of-the-art representation learning techniques of graphs (Hamilton et al., 2017b; Kipf and Welling, 2017; van den Berg et al., 2017). GCNs generate node embeddings in a recursive message passing or information diffusion manner of a graph, where the representation vector of a node is computed recursively from aggregation features in neighbor nodes. GCNs has shown theoretical elegance as simplified version of spectral based graph models (Kipf and Welling, 2017). Besides, GCNs are time efficient and achieve better performance in many graph-based tasks.
Due to the success of GCNs, several models have attempted to transfer the idea of GCNs for the recommendation tasks. By transferring these models to the recommendation scenario, the main components are how to construct a graph and further exploit the uniqueness of recommendation properties. Among them, the most closely related works are GC-MC (van den Berg et al., 2017) and PinSage (van den Berg et al., 2017).
GC-MC: It is one of the first few attempts that directly applied the graph convolutions for recommendation. GC-MC defines a user-item bipartite graph from user-item interaction behavior (van den Berg et al., 2017). Then, each user embedding is convolved as the aggregation of the embeddings of her rated items. Similarly, each item embedding is convolved as the aggregation of the embeddings of the rated users’ embeddings. However, the graph convolution is only operated with one layer of the observed links between users and items, neglecting the layer-wise diffusion structure of the graph.
PinSage: It is designed for similar item recommendation from a large recommender system. By constructing an item-item correlation graph from users’ behaviors, a data-efficient GCN algorithm PinSage is developed (Ying et al., 2018). PinSage could incorporate both the item correlation graph as well as node features to generate item embeddings. The main contribution lies in how to design efficient sampling techniques to speed up the training process. Instead of message passing on item-item graph, our work performs the recursive information diffusion of the social network, which is more realistic to reflect how users are dynamically influenced by the social influence diffusion. Applying GCNs for social recommendation is quite natural and to the best of our knowledge, has not been studied before.
In this section, we conduct experiments to evaluate the performance of DiffNet on two datasets. Specifically, we aim to answer the following two research questions: First, does DiffNet outperforms the state-of-the-art baselines for the social recommendation task? Second, what is the performance of these models under different data sparsity? Third, the effectiveness of each part in DiffNet, e.g., diffusion modeling, attributes modeling, and so on.
4.1. Experimental Settings
Datasets. Yelp is an online location-based social network. Users make friends with others and express their experience through the form of reviews and ratings. As each user give ratings in the range , similar to many works, we transform the ratings that are larger than 3 as the liked items by this user. As the rich reviews are associated with users and items, we use the popular gensim tool333https://radimrehurek.com/gensim/ to learn the embedding of each word with Word2vec model (Mikolov et al., 2013). Then, we get the feature vector of each user (item) by averaging all the learned word vectors of the user(item).
is a who-trust-whom online image based social sharing platform. Users follow other users and share their preferences to images to their social followers. Users express their preferences through the upvote behavior. For research purpose, we crawl a large dataset from this platform. Given each image, we have a ground truth classification of this image on the dataset. We send images to a VGG16 convolutional neural network and treat the 4096 dimensional representation in the last connected layer in VGG16 as the feature representation of the image(Simonyan and Zisserman, 2015). For each user, her feature representation is the average of the image feature representations she liked in the training data.
In the data preprocessing step, for both datasets, we filtered out users that have less than 2 rating records and 2 social links, and removed the items which have been rated less than 2 times. We randomly select 10% of the data for the test. In the remaining 90% data, to tune the parameters, we select 10% from the training data as the validation set. The detailed statistics of the data after preprocessing is shown in Table 1.
Baselines and Evaluation Metrics.
Baselines and Evaluation Metrics.We compare DiffNet with various state-of-the-art baselines, including the classical pair-wise based recommendation model BPR (Rendle et al., 2009), feature enhanced latent factor model FM (Rendle, 2010), a state of the art social recommendation model TrustSVD (Guo et al., 2015), a context-aware social recommendation model ContextMF that utilized the same input as our proposed model for recommendation (Jiang et al., 2014). Besides, we also compare our proposed model with two graph convolutional based recommendation models: GC-MC (van den Berg et al., 2017) and PinSage (Ying et al., 2018). As the original PinSage focuses on generating high-quality embeddings of items, we generalize this model by constructing a user-item bipartite for recommendation (Ying et al., 2018). Both of these two convolutional recommender models utilized the user-item bipartite and the associated features of users and items for recommendation.
As we focus on recommending top-N items for each user, we use two widely adopted ranking based metrics: Hit Ratio (HR) and Normalized Discounted Cumulative Gain(NDCG) (Sun et al., 2018). Specifically, HR measures the number of items that the user likes in the test data that has been successfully predicted in the top-N ranking list. And NDCG considers the hit positions of the items and gives a higher score if the hit items in the top positions. For both metrics, the larger the values, the better the performance. Since there are too many unrated items, in order to reduce the computational cost, for each user, we randomly sample 1000 unrated items at each time and combine them with the positive items the user likes in the ranking process. We repeat this procedure 10 times and report the average ranking results.
Parameter Setting. For all the models that are based on the latent factor models, we initialize the latent vectors with small random values. In the model learning process, we use Adam as the optimizing method for all models that relied on the gradient descent based methods with an initial learning rate of 0.001. And the batch size is set as 512. In our proposed DiffNet model, we try the regularization parameter in the range , and find reaches the best performance. For the aggregation function in Eq.(9
), we have tried the max pooling and average pooling. We find the average pooling usually shows better performance. Hence, we set the average pooling as the aggregation function. Similar to many GCN models(Ying et al., 2018; Kipf and Welling, 2017), we set the depth parameter . With the user and item free embedding size , in the fusion layer and the following influence diffusion layers, each layer’s output is also set as dimension. For the non-linear function in the fusion layer, we use a sigmoid function that transforms each value into range . And we set the non linear functions of 2015). There are several parameters in the baselines, we tune all these parameters to ensure the best performance of the baselines for fair comparison. Please note that as generating user and item features are not the focus of our paper, we use the feature construction techniques as mentioned above. However, our proposed model can be seamlessly incorporated with more advanced feature engineering techniques.
4.2. Overall Comparison
In this section, we compare the overall performance of all models on two datasets. Specifically, Table 2 shows the HR@10 and NDCG@10 results for both datasets with varying latent dimension size . Among all the baselines, BPR only considered the user-item rating information for recommendation, FM and TrustSVD improve over BPR by leveraging the node features and social network information. PinSage takes the same kind of input as FM and shows better performance than FM, showing the effectiveness of modeling the information passing of a graph. ContextMF is the baseline that uses the user and item features, as well as the social network structure. It performs better than most baselines. Our proposed model consistently outperforms ContextMF, showing the effectiveness of modeling the recursive social diffusion process in the social recommendation process.
When comparing the results of the two datasets, we observe that leveraging the social network structure and the social diffusion process contributes more on Flickr compared to Yelp. On both datasets, PinSAGE is the best baseline that leverages the node features without the social network information. E.g., DiffNet improves over PinSAGE about 13% on Yelp, and nearly 30% on Flickr. We guess a possible reason is that, the Flickr is a social based image sharing platform with a stronger social influence diffusion effect. In contrast, Yelp is a location based social network, and users’ food and shopping preferences are not easily influenced in the social platform. Last but not least, we find the performance of all models does not increase as the latent dimension size increases from 16 to 64. Some models reach the best performance when (e.g., BPR) while other models reach the best performance when (e.g., DiffNet). In the following experiment, we set the proper for each model with the best performance in order to ensure fairness.
Table 3 shows the HR@N and NDCG@N on both datasets with varying top-N recommendation size N. From the results, we also find similar observations as Table 2, with our proposed model DiffNet always shows the best performance. Based on the overall experiment results, we could empirically conclude that our proposed DiffNet model outperforms all the baselines under different ranking metrics and different parameters.
4.3. Performance under Different Data Sparsity
The data sparsity issue is a main challenge for most CF based recommender systems. In this part, we would like to show the performance of various models under different sparsity.
Specifically, we bin users into different groups based on the number of observed feedbacks in the training data. Then we show the performance of each group with different models on NDCG@10 in Fig 2. In this figure, the horizontal axis shows the user group information. E.g., means for each user in this group, the training records satisfy . As can be observed from this figure, for both datasets, with the increase of the user rating records, the performance increases quickly for all models. When the rating records of each user is less than 16, the BPR baseline could not work well as this model only exploited the very sparse user-item interaction behavior for recommendation. Under this situation, all improvement is significant by leveraging various kinds of side information. E.g., FM, SVD++, and ContextMF improves over BPR by 9.6%, 15.2% and 20.7% on Yelp, and 34.9%, 50.3%, 58.4% on Flickr. The improvement of Flick is much more significant than that of Yelp, as Flickr has much more items compared to Yelp. By considering the iteratively social diffusion process in social recommendation, our proposed model improves BPR by 34.8% and 97.1% on Flick and Yelp, which far exceeds the remaining models. With the increase of user rating records, the performance improvements of all models over BPR decrease, but the overall trend is that all models have better performance than BPR. We also observe that when users have more than 256 records, some methods have similar results as BPR or even a little worse than BPR. We guess a possible reason is that BPR could well learn the user interests from enough interaction data. Withe the additional side information, some noises are introduced to decrease the performance.
4.4. Detailed Model Analysis
We would analyze the recursive social diffusion depth , and the impact of the fusion layer that combines the collaborative free embedding and associated entity feature vector.
Table 4 shows the results on DiffNet with different values. The column of “Improve” show the performance changes compared to the best setting of DiffNet, i.e., . When , the layer-wise diffusion part disappears, and our proposed model degenerates to an enhanced SVD++ with entity feature modeling. As can be observed from this figure, as we leverage the layer wise diffusion process from to , the performance increases quickly for for both datasets. For both datasets, the best performance reaches with two recursive diffusion depth, i.e., . When continues to increase to 3, the performance drops for both datasets. We hypothesis that, considering the -step recursive social diffusion process resembles the k-hop neighbors of each user. Since the social diffusion diminishes with time and the distance between each user and the -hop neighbors, setting with 2 is enough for social recommendation. In fact, other related studies have empirically find similar trends, with the best diffusion size is set as or (Ying et al., 2018; Kipf and Welling, 2017).
Table 4 shows the performance on DiffNet with different fusion inputs. As can be seen from this figure, the performance drops when the user and (or) item features are not available. We also notice that it is very important to add the free latent embeddings of users and items in the modeling process. As can be observed from this figure, the performance drops very quickly when either the user free latent embedding matrix or the item free embedding matrix are not considered. E.g., the performance drops about 80% on Flickr when the item free embedding is not considered. The reason is that, the item (user) latent factors could not be captured by the item (user) features. Therefore, learning the collaborative effect between users and items with the free embeddings is very important for the recommendation task.
5. Related Work
Collaborative Filtering. Given an user-item rating matrix , CF usually projected both users and items in a same low latent space for comparison (Koren et al., 2009; Mnih and Salakhutdinov, 2008). In reality, compared to the explicit ratings, it is more common for users implicitly express their feedbacks through action or inaction, such as click, add to cart or consumption (Hu et al., 2008; Rendle et al., 2009). Bayesian Personalized Ranking (BPR) is a state-of-the-art latent factor based technique for dealing with implicit feedback. Instead of directly predicting each user’s point-wise explicit ratings, BPR modeled the pair-wise preferences with the assumption that users prefer the observed implicit feedbacks compared to the unobserved ones (Rendle et al., 2009). Despite the relatively high performance, the data sparsity issue is a barrier to the performance of these collaborative filtering models. To tackle the data sparsity issue, many models have been proposed by extending these classical CF models. E.g., SVD++ is proposed to combine users’ implicit feedbacks and explicit feedbacks for modeling users’ latent interests (Koren, 2008). Besides, as users and items are associated with rich attributes, Factorization Machine (FM) is such a unified model that leverages the user and item attributes in latent factor based models (Wang et al., 2015)
. Recently, some deep learning based models have been proposed to tackle the CF problem. E.g., instead of assuming user and item are interacted as shallow linear interaction function between user and item latent vectors, NeuMF is proposed to model the complex interactions between user and item embeddings(He et al., 2017). As sometimes the user and item features are sparse, many deep learning based models have been proposed to tackle how to model these sparse features (Guo et al., 2017; Wang et al., 2015). In contrast to these works, we do not consider the scenario of sparse features and put emphasis on how to model the recursive social diffusion process for social recommendation.
Social Recommendation With the prevalence of online social platforms, social recommendation has emerged as a promising direction that leverages the social network among users to enhance recommendation performance (Ma et al., 2011; Guo et al., 2015; Jiang et al., 2014). In fact, social scientists have long converged that as information diffuses in the social networks, users are influenced by their social connections with the social influence theory, leading to the phenomenon of similar preferences among social neighbors (Anagnostopoulos et al., 2008; Ibarra and Andrews, 1993; Bond et al., 2012; Qiu et al., 2018). Social regularization has been empirically proven effective for social recommendation, with the assumption that similar users would share similar latent preferences under the popular latent factor based models (Jamali and Ester, 2010; Ma et al., 2011). SBPR model is proposed into the pair-wise BPR model with the assumption that users tend to assign higher ratings to the items their friends prefer (Zhao et al., 2014). By treating the social neighbors’ preferences as the auxiliary implicit feedbacks of an active user, TrustSVD (Guo et al., 2015, 2016) is proposed to incorporate the trust influence from social neighbors on top of SVD++ (Koren, 2008). As items are associated with attribute information (e.g., item description, item visual information), ContextMF is proposed to combine social context and social network under a collective matrix factorization framework with carefully designed regularization terms (Jiang et al., 2014). Social recommendation has also been extended with social circles (Qian et al., 2014) and the temporal context (Sun et al., 2018). Recently, the problem of how bridge a few overlapping users in the two domains of the social network domain and information domain for better recommendation has also been considered (Wang et al., 2017).
Instead of simply considering the local social neighbors of each user, our work differs from these works in explicitly modeling the recursive social diffusion process to better model each user’s latent preference in the global social network.
Graph Convolutional Networks and Applications. Our proposed model with recursive social diffusion process borrows the recent advances of graph convolutional networks (GCN) (Kipf and Welling, 2017; van den Berg et al., 2017; Hamilton et al., 2017b). GCNs have shown success to extend the convolution operation from the regular Euclidean domains to non-Euclidean graph domains. Spectral graph convolutional neural network based approaches provide localized convolutions in the spectral domain (Bruna et al., 2014; Defferrard et al., 2016)
. These spectral models usually handle the whole graph simultaneously, and are difficult to parallel or scale to large graphs. Recently, Kipf et al. designed a graph convolutional network (GCN) for semi-supervised learning on graph data, which can be motivated based on the spectral graph convolutional networks(Hamilton et al., 2017b; Kipf and Welling, 2017; van den Berg et al., 2017; Hamilton et al., 2017a). The key idea of GCNs is to generate node embeddings in a message passing or information diffusion manner of a graph. These GCN based models advanced previous spectral based models with much less computational cost, and could be applied to real-world graphs.
Researchers also exploited the possibility of applying spectral models and GCNs to recommender systems. As in the collaborative setting, the user-item interaction could be defined as a bipartite graph, some works adopted the spectral graph theory for recommendation. These spectral models could leverage the overall graph structure (Zheng et al., 2018; Monti et al., 2017). Nevertheless, these models showed high computational cost and it is non-trivial to incorporate user (item) features in the modeling process. As GCNs showed improved efficiency and effectiveness over the spectral models (Kipf and Welling, 2017), a few research works exploited GCNs for recommendation (van den Berg et al., 2017; Ying et al., 2018; Wu et al., 2018). These models all share the commonality of applying the graph convolution operation that aggregates the information of the graph’s first-order connections. GC-MC is one of the first few attempts that directly applied the graph convolutions on the user-item rating graph (van den Berg et al., 2017). However, the graph convolution is only operated with one layer of the observed links between users and items, neglecting the higher order structure of the graph. GCMC is proposed for bipartite edge prediction with inputs of user-item interaction matrix (Wu et al., 2018). This model is consisted of two steps: constructing a user-user graph and item-item graph from the user-item interaction matrix, then updating user and item vectors are based on the convolutional operations of the constructed graphs. Hence, the performance of GCMC relies heavily on the user-user and item-item construction process, and the two step process is not flexible compared to the end-to-end training process. By constructing an item correlation graph, researchers developed a data-efficient GCN algorithm PinSage, which combines efficient random walks and graph convolutions to generate embeddings of nodes that incorporate both graph structure as well as node feature information (Ying et al., 2018). Our work differs from them as we leverage the graph convolution operation for the recursive social diffusion in the social networks, which is quite natural. Besides, our proposed model is general and could be applied when the user (item) attributes or the social network structure is not available.
In this paper, we proposed a DiffNet neural model for social recommendation. Our main contribution lies in designing a layer-wise influence diffusion part to model how users’ latent preferences are recursively influenced by the her trusted users. We showed that the proposed DiffNet model is time and storage efficient. It is also flexible when the user and item attributes are not available. The experimental results clearly showed the flexibility and effectiveness of our proposed models. E.g., DiffNet improves more than 15% over the best baseline with NDCG@10 on Flickr dataset. In the future, we would like to extend our model for temporal social recommendation, where the temporal changes of users’ interests are implicitly reflected from their temporal feedback patterns.
- ?? (si_) Wikipedia explanation of social influence. https://en.wikipedia.org/wiki/Social_influence. (????). Accessed Jan 10, 2019.
- Adomavicius and Tuzhilin (2005) Gediminas Adomavicius and Alexander Tuzhilin. 2005. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. TKDE 17, 6 (2005), 734–749.
- Anagnostopoulos et al. (2008) Aris Anagnostopoulos, Ravi Kumar, and Mohammad Mahdian. 2008. Influence and correlation in social networks. In SIGKDD. 7–15.
- Bond et al. (2012) Robert M Bond, Christopher J Fariss, Jason J Jones, Adam DI Kramer, Cameron Marlow, Jaime E Settle, and James H Fowler. 2012. A 61-million-person experiment in social influence and political mobilization. Nature 489, 7415 (2012), 295–298.
- Bruna et al. (2014) Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann Lecun. 2014. Spectral networks and locally connected networks on graphs. In ICLR.
- Defferrard et al. (2016) Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In NIPS. 3844–3852.
- Guo et al. (2015) Guibing Guo, Jie Zhang, and Neil Yorke-Smith. 2015. TrustSVD: Collaborative Filtering with Both the Explicit and Implicit Influence of User Trust and of Item Ratings.. In AAAI. 123–125.
- Guo et al. (2016) Guibing Guo, Jie Zhang, and Neil Yorke-Smith. 2016. A novel recommendation model regularized with user trust and item ratings. TKDE 28, 7 (2016), 1607–1620.
- Guo et al. (2017) Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. 2017. Deepfm: a factorization-machine based neural network for ctr prediction. In IJCAI. 1725–1731.
- Hamilton et al. (2017a) Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017a. Inductive representation learning on large graphs. In NIPS. 1024–1034.
- Hamilton et al. (2017b) William L Hamilton, Rex Ying, and Jure Leskovec. 2017b. Representation learning on graphs: Methods and applications. TCDE 40, 3 (2017), 52–74.
- He et al. (2017) Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In WWW. 173–182.
- Hu et al. (2008) Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative filtering for implicit feedback datasets. In ICDM. 263–272.
- Ibarra and Andrews (1993) Herminia Ibarra and Steven B Andrews. 1993. Power, social influence, and sense making: Effects of network centrality and proximity on employee perceptions. ASQ 38, 2 (1993), 277–303.
- Ioffe and Szegedy (2015) Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. ICML (2015), 448–456.
- Jamali and Ester (2010) Mohsen Jamali and Martin Ester. 2010. A matrix factorization technique with trust propagation for recommendation in social networks. In RecSys. 135–142.
- Jiang et al. (2014) Meng Jiang, Peng Cui, Fei Wang, Wenwu Zhu, and Shiqiang Yang. 2014. Scalable recommendation with social contextual information. TKDE 26, 11 (2014), 2789–2802.
- Kempe et al. (2003) David Kempe, Jon Kleinberg, and Éva Tardos. 2003. Maximizing the spread of influence through a social network. In SIGKDD. ACM, 137–146.
- Kipf and Welling (2017) Thomas N Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In ICLR.
- Koren (2008) Yehuda Koren. 2008. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In SIGKDD. 426–434.
- Koren et al. (2009) Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009), 30–37.
- Kramer et al. (2014) Adam DI Kramer, Jamie E Guillory, and Jeffrey T Hancock. 2014. Experimental evidence of massive-scale emotional contagion through social networks. PNAS 111, 24 (2014), 8788–8790.
- Lewis et al. (2012) Kevin Lewis, Marco Gonzalez, and Jason Kaufman. 2012. Social selection and peer influence in an online social network. PNAS 109, 1 (2012), 68–72.
- Ma et al. (2011) Hao Ma, Dengyong Zhou, Chao Liu, Michael R Lyu, and Irwin King. 2011. Recommender systems with social regularization. In WSDM. 287–296.
- Mikolov et al. (2013) Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In NIPS. 3111–3119.
- Mnih and Salakhutdinov (2008) Andriy Mnih and Ruslan R Salakhutdinov. 2008. Probabilistic matrix factorization. In NIPS. 1257–1264.
- Monti et al. (2017) Federico Monti, Michael Bronstein, and Xavier Bresson. 2017. Geometric matrix completion with recurrent multi-graph neural networks. In NIPS. 3697–3707.
- Qian et al. (2014) Xueming Qian, He Feng, Guoshuai Zhao, and Tao Mei. 2014. Personalized recommendation combining user interest and social circle. TKDE 26, 7 (2014), 1763–1777.
- Qiu et al. (2018) JZ Qiu, Jian Tang, Hao Ma, YX Dong, KS Wang, and J Tang. 2018. DeepInf: Modeling influence locality in large social networks. In SIGKDD. 2110–2119.
- Rendle (2010) Steffen Rendle. 2010. Factorization machines. In ICDM. 995–1000.
- Rendle (2012) Steffen Rendle. 2012. Factorization machines with libfm. TIST 3, 3 (2012), 57.
- Rendle et al. (2009) Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In UAI. 452–461.
- Simonyan and Zisserman (2015) Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In ICLR.
- Su and Khoshgoftaar (2009) Xiaoyuan Su and Taghi M Khoshgoftaar. 2009. A survey of collaborative filtering techniques. AAI 2009, 4 (2009), 1–19.
- Sun et al. (2018) Peijie Sun, Le Wu, and Meng Wang. 2018. Attentive Recurrent Social Recommendation. In SIGIR. 185–194.
- Tang et al. (2013) Jiliang Tang, Xia Hu, and Huan Liu. 2013. Social recommendation: a review. SNAM 3, 4 (2013), 1113–1133.
- van den Berg et al. (2017) Rianne van den Berg, Thomas N Kipf, and Max Welling. 2017. Graph Convolutional Matrix Completion. ICLR (2017).
- Wang et al. (2015) Hao Wang, Naiyan Wang, and Dit-Yan Yeung. 2015. Collaborative deep learning for recommender systems. In SIGKDD. ACM, 1235–1244.
- Wang et al. (2017) Xiang Wang, Xiangnan He, Liqiang Nie, and Tat-Seng Chua. 2017. Item silk road: Recommending items from information domains to social users. In SIGIR. ACM, 185–194.
- Wu et al. (2016) Le Wu, Yong Ge, Qi Liu, Enhong Chen, Bai Long, and Zhenya Huang. 2016. Modeling Users’ Preferences and Social Links in Social Networking Services: a Joint-Evolving Perspective. In AAAI. 279–286.
- Wu et al. (2018) Yuexin Wu, Hanxiao Liu, and Yiming Yang. 2018. Graph Convolutional Matrix Completion for Bipartite Edge Prediction. In KDIR. 51–60.
- Ying et al. (2018) Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. 2018. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. In SIGKDD. 974–983.
- Zhao et al. (2014) Tong Zhao, Julian McAuley, and Irwin King. 2014. Leveraging social connections to improve personalized ranking for collaborative filtering. In CIKM. 261–270.
- Zheng et al. (2018) Lei Zheng, Chun-Ta Lu, Fei Jiang, Jiawei Zhang, and Philip S Yu. 2018. Spectral collaborative filtering. In RecSys. 311–319.