Nowadays, users tend to hold accounts in multiple social networks simultaneously for different services. For example, a user may use LinkedIn to hunt for a job while adopting Instagram to share daily life. Identifying linked accounts across different social networks enables us to integrate dispersed user information and obtain a comprehensive understanding of user behavior, which benefits many downstream applications, such as user profile modeling and cross-platform recommendation [UILreview]. However, the correspondences among a user's different accounts, a.k.a. anchor links, are generally unavailable due to the independence of different social networks. Therefore, user identity linkage has become an increasingly popular area of research.
User information in social networks typically includes network structure (i.e., social connections), profile (e.g., screen name, location), and content (e.g., posts), each revealing a different aspect of a single user. Some methods [WhatIsInAName, NameBehavior, ULink, PALE, IONE, FRUI] use a single type of information, which may be noisy and incomplete, and which suffers from inconsistency across different social networks [UILreview]. Other methods [MNA, PNA, HYDRA, COSNET, MASTER, Mego2Vec] explore mechanisms to fuse multiple types of information, which may be complementary to each other.
Existing works focusing on information fusion can be divided into two categories, embedding based methods [MASTER, Mego2Vec, LHNE, FactoidEmbedding] and non-embedding based methods [MNA, PNA, HYDRA, COSNET]. Non-embedding based methods usually define hand-crafted features for each type of information independently and combine them in a supervised or semi-supervised fashion [MNA, HYDRA]. This paradigm fails to capture the deep semantics of user information and the rich interactions among different types of information. Unlike the former, embedding based methods [DeepWalk, node2vec] seek to learn a low-dimensional vector representation for a user that preserves the characteristics of the original data. The vector representation can significantly reduce the cost of computation and storage, and can be easily incorporated with deep learning to obtain a flexible model.
Still, two main problems remain for embedding based methods. First, none of them provide a general solution to fuse all three types of information: structure, profile and content. The embedding of network structure has been well studied [PALE, IONE, DALAUP], and recently some works have explored the joint embedding of structure and profile [MASTER, Mego2Vec], or structure and content [LHNE]. Nevertheless, the latter require specific interaction mechanisms between two types of information, which are not extensible to multiple types of information. Second, there have been few attempts to explicitly model the impact of the neighborhood, i.e., the first-order neighbors of a single user. Although matched neighbors have been utilized by non-embedding based methods, a typical embedding based model is designed only to learn node embeddings for node-level alignment, not neighborhood embeddings for neighborhood-level alignment. Furthermore, as the matched neighbors vary with candidate user pairs, the neighborhood embeddings should be adapted dynamically, which differs significantly from node embedding, which seeks a fixed representation for each user.
To tackle the challenges above, this paper proposes a novel framework with INformation FUsion and Neighborhood Enhancement (INFUNE) for user identity linkage. INFUNE contains two components, an information fusion component and a neighborhood enhancement component. The information fusion component employs a group of encoders and decoders to preserve the characteristics of each type of information and integrate them into the node embeddings. Based on the node embeddings, the potential matched neighbors of a given user pair can be identified, and the neighborhood enhancement component, a novel graph neural network model, is applied to learn adaptive neighborhood representations.
The main contributions of this work are summarized as follows.
An information fusion component is proposed to integrate different user information in a unified manner. To the best of our knowledge, this is the first attempt to fuse user information of structure, profile and content simultaneously for user identity linkage in an embedding based model.
To utilize the potential matched neighbors for user identity linkage, INFUNE employs a novel graph neural network to learn neighborhood representations that vary with candidate user pairs.
Extensive experiments are conducted to validate the performance of INFUNE. The results show the superiority of INFUNE over state-of-the-art models.
2 Related Work
As there are multiple types of user information including network structure, profile and content, the existing works can be divided into two main categories, one exploits only a single type of user information while the other aims to integrate multiple types of information.
For the first category of methods, linking users by comparing profile attributes is the most widely studied approach [WhatIsInAName, NameBehavior, ULink]. Mu et al. [ULink] map multiple attributes to a common latent space, where matched users lie closer than unmatched ones. Apart from the profile, users' generated content can be utilized to extract more features [HYDRA, MNA]. Kong et al. [MNA] convert the posts of each user into a bag-of-words vector weighted by TF-IDF and calculate the cosine similarities. However, profile and content information are generally incomplete and inconsistent, while network structure is more accessible and more consistent across social networks. Many methods [PALE, IONE, GraphUIL] adopt network embedding to encode the structure information into low-dimensional vectors and predict linked users via vector similarities. Nevertheless, the sparsity of the network structure prevents these methods from learning discriminative user representations.
To overcome the drawbacks of each single type of information, many researchers seek to combine them, leading to the second category of methods. Zhong et al. [CoLink] create independent models for profile and structure, and make them reinforce each other iteratively using a co-training algorithm. Su et al. [MASTER] map users into a latent space that preserves both structure and profile similarities. Zhang et al. [Mego2Vec] learn profile embeddings at the character level and word level, and aggregate the information of neighbors using an attention mechanism. To combine structure and content information, Wang et al. [LHNE] extract topics from content and define a user-topic network to learn unified user embeddings. Li et al. [UUIL, SNNA, MSUIL] adopt TADW [TADW] to fuse structure and content information, and link users by aligning the distributions of social networks with known anchor links as learning guidance. However, the aforementioned methods are designed to integrate only two types of information, and their well-designed interaction mechanisms between heterogeneous information can hardly be extended to multiple types of information.
Among all the methods mentioned above, [PALE, IONE, MASTER, Mego2Vec, GraphUIL, LHNE, UUIL, SNNA, MSUIL] are embedding based methods and [WhatIsInAName, NameBehavior, ULink, MNA, HYDRA, CoLink] are non-embedding based methods.
3 Problem Formulation
Let $G = (U, E, P, C)$ denote a social network, where $U$ is the set of users, $E$ is the set of social connections, $P$ is the set of user profiles, and $C$ is the set of user-generated contents. Each user $u \in U$ is associated with a profile $p_u \in P$ and a content set $c_u \in C$. Each $p_u$ contains several user attributes, such as screen name, location and description, depending on the dataset. Each $c_u$ contains a set of texts generated by a single user.
This paper focuses on the problem of linking users between two social networks. Without loss of generality, one is regarded as the source network and the other as the target network, denoted as $G^s$ and $G^t$, respectively. The problem of user identity linkage is defined as follows.
Definition 1 (User Identity Linkage).
Given two social networks $G^s$ and $G^t$, the task of user identity linkage is to find a function $\Phi: U^s \times U^t \to \{0, 1\}$ such that $\Phi(u^s, u^t) = 1$ if and only if $u^s \in U^s$ and $u^t \in U^t$ belong to the same natural person.
From the definition above, it suffices to assess the pairwise similarities among users and generate potential matched user pairs by ranking the similarities. The similarity can be evaluated from different perspectives, since users possess several types of raw features. However, the information from different similarity indicators can be redundant or contradictory. For example, an individual may maintain similar friends and keep a similar writing style in multiple social networks, while using completely different screen names for privacy protection. This results in the agreement of structural similarity and content similarity, and their disagreement with profile similarity. Therefore, it is challenging to unify different similarity indicators for user identity linkage. A naive solution is to train a binary classifier with similarity vectors as the input, but this method fails to capture the complex relations among different similarity indicators. Recently, Hamilton et al. [LeskovecReview] pointed out that various network embedding models can be unified in an encoder-decoder framework that reconstructs the pairwise similarities among nodes within a graph. This paper extends this framework to learn user embeddings that preserve multiple types of similarities simultaneously, and the unified similarities are evaluated based on the embeddings for the final task.
4 Proposed Method
The structure of INFUNE is presented in Figure 1. INFUNE contains two components, the information fusion component and the neighborhood enhancement component. The raw features of users, including structure, profile and content, together with known anchor links, are first preprocessed as different similarity matrices. The resulting similarity matrices are fed into the information fusion component to obtain the node embeddings with heterogeneous information. The node embeddings are ready for preliminary comparison. Based on the node embeddings, the neighborhood enhancement component first identifies potential matched neighbors of candidate user pairs and then learns adaptive neighborhood embeddings that reflect the overlapping degree of the neighborhoods of candidate user pairs. Finally, a weighted sum of node similarity and neighborhood similarity is evaluated as the unified similarity for user identity linkage.
4.1 Information Fusion Component
A simple scheme for information fusion is to learn a user's embeddings for different features independently and unify them into a single vector. This usually requires designing a sophisticated collaboration mechanism, since simple methods like concatenation fail to capture the complex interactions among features. Worse still, the number of parameters grows linearly with the numbers of features and users, which is not scalable to large social networks with heterogeneous information. To address these problems, an information fusion component is proposed as follows.
4.1.1 Component Overview
As shown in Figure 2, the information fusion component assigns each user a vector, named the unified embedding, to integrate multiple types of information in a single social network. The unified embeddings are fed to different pairs of encoders and decoders to preserve user similarities w.r.t. different information, and all pairs work similarly. By leveraging known anchor links as the supervised information, the unified embeddings of users from different social networks are mapped to a common latent space. The resulting embeddings are called the node embeddings. The details of the information fusion component are described as follows.
Let $z \in \mathbb{R}^d$ be the unified embedding of a user $u$, where $d$ is the dimension of the embedding. $z$ is mapped to different feature spaces through different feature-specific encoders to preserve the characteristics of the corresponding raw features. Formally, $z_f$, short for $z_f(u)$, the feature embedding of $u$ w.r.t. feature $f$, is defined by
$$z_f = \mathrm{ENC}_f(z).$$
$\mathrm{ENC}_f$ can be any learnable linear or non-linear mapping, and for simplicity, a two-layer perceptron is adopted. The usage of encoders avoids explicitly maintaining embedding matrices for different features, which greatly reduces the number of parameters.
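As an illustration, such an encoder might look as follows. This is a minimal sketch in plain Python; the weight shapes and the ReLU activation are our assumptions, since the paper only states that a two-layer perceptron is used.

```python
def two_layer_encoder(z, W1, b1, W2, b2):
    """Feature-specific encoder ENC_f: a two-layer perceptron mapping
    a unified embedding z to a feature embedding z_f.
    The ReLU activation is an assumption for illustration."""
    # Hidden layer: h = relu(W1 z + b1)
    h = [max(0.0, sum(w * x for w, x in zip(row, z)) + b)
         for row, b in zip(W1, b1)]
    # Output layer: z_f = W2 h + b2
    return [sum(w * x for w, x in zip(row, h)) + b
            for row, b in zip(W2, b2)]
```

In practice the weights would be trained jointly with the decoders by gradient descent.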
For any two users $u_i$ and $u_j$, a feature-specific similarity indicator $s_f(u_i, u_j)$ is defined on the raw features, where $s_f(u_i, u_j)$ is called the ground truth similarity between $u_i$ and $u_j$ w.r.t. feature $f$. Note that $s_f$ can be intra-network ($u_i$ and $u_j$ from the same network) or inter-network ($u_i$ and $u_j$ from different networks), and can be symmetric or asymmetric, depending on the feature (cf. Section 4.1.2 and Section 4.1.3). Correspondingly, a feature-specific decoder $\mathrm{DEC}_f$ is designed to reconstruct the user similarity between $u_i$ and $u_j$ w.r.t. feature $f$, i.e.,
$$\hat{s}_f(u_i, u_j) = \mathrm{DEC}_f(z_f(u_i), z_f(u_j)),$$
where $\mathrm{DEC}_f$ can be some operator like the inner product or cosine similarity, or a learnable module, and $\hat{s}_f(u_i, u_j)$ is called the reconstructed similarity between $u_i$ and $u_j$ w.r.t. feature $f$.
Notably, different from most of the existing network embedding models that apply to a single graph, the embeddings of users from two social networks are passed through the same encoders and decoders, which can help achieve alignment in different feature spaces.
The discrepancy between the reconstructed similarity $\hat{s}_f(u_i, u_j)$ and the true value $s_f(u_i, u_j)$ is measured by a loss function $\ell$, and the empirical loss over all user pairs is
$$\mathcal{L}_f = \sum_{i=1}^{n} \sum_{j=1}^{n'} \ell\big(\hat{s}_f(u_i, u_j),\, s_f(u_i, u_j)\big),$$
where $n$ and $n'$ equal $n^s$ or $n^t$ depending on whether the indicator is intra- or inter-network, and $n^s$ and $n^t$ are the numbers of users in $G^s$ and $G^t$, respectively.
In this paper, $\ell$ is chosen to be the squared loss for all features. Let $\hat{S}_f$ and $S_f$ be the reconstructed similarity matrix and the ground truth similarity matrix, respectively; then the objective can be rewritten in a compact matrix form,
$$\mathcal{L}_f = \big\| \hat{S}_f - S_f \big\|_F^2,$$
where $\|\cdot\|_F$ is the Frobenius norm.
The formulations above are not only designed for fusing the information of the raw features; they can also be applied to incorporate the supervised information by constructing a binary matrix that indicates whether or not two users are matched (cf. Section 4.1.4). Denoting the corresponding loss as $\mathcal{L}_a$, the overall objective for information fusion is
$$\mathcal{L} = \sum_{f} \mathcal{L}_f + \mathcal{L}_a,$$
where $f$ ranges over the structure, profile and content features.
4.1.2 Structure Embedding
Asymmetric relations, e.g., follower-followee relations, ubiquitously exist in social networks. Regarding them as symmetric relations fails to capture features that are useful for user identity linkage [IONE]. Therefore, for the structure information, it is intuitive to define an intra-network similarity indicator that tells whether there is a directed edge between two users. The similarity indicator is asymmetric, which requires designing an asymmetric decoder. The key is to model a directed edge. Most existing structural embedding methods [IONE] define two embeddings for a node, one as the source node embedding and the other as the target node embedding. However, this requires an extra embedding for each node and fails to model the connection between the two roles of a node in a network. Note that a social connection can be regarded as a separate object that is independent of the nodes it links, and it is shared by the two social networks. This observation inspires us to explicitly model a directed edge by defining a transformation $T$ that maps the structural embedding $z_{str}$, regarded as the source node embedding, to a target node embedding $T(z_{str})$. For simplicity, $T$ is chosen to be a two-layer perceptron as in Eq. (2).
Based on the transformation $T$, an asymmetric decoder can be defined as follows,
$$\hat{s}_{str}(u_i, u_j) = \max\big(0, \cos(T(z_{str}(u_i)), z_{str}(u_j))\big),$$
where the cosine similarity is adopted to measure the linking strength between two nodes, and negative values are truncated to restrict the range of the reconstructed similarities to $[0, 1]$.
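A minimal sketch of this decoder in plain Python follows; the transformation `T` is passed in as a function (in the model it would be the learned two-layer perceptron), which is our simplification.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def asymmetric_decoder(z_i, z_j, transform):
    """Reconstructed strength of a directed edge from u_i to u_j:
    cosine similarity between T(z_i) and z_j, truncated at zero
    so the output lies in [0, 1]."""
    return max(0.0, cosine(transform(z_i), z_j))
```

The asymmetry comes entirely from applying `transform` to the source embedding only.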
4.1.3 Profile and Content Embedding
Inter-network similarity indicators are adopted for profile and content information, since there can be several shared profile attributes between two social networks, and contents can be compared directly via many text similarity indicators. Screen names are selected to measure the profile similarity, as they have proved to be effective in user identity linkage [WhatIsInAName]. The normalized Levenshtein distance is used to compute the string similarities among screen names. To compute the content similarity, all posts of a user are first concatenated into a single document, then the documents of all users from the two social networks are fed to a Doc2Vec model [doc2vec] to obtain the text embeddings, and finally, as in Eq. (8), the truncated cosine similarity is used to measure the similarities among the text embeddings.
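The screen-name similarity can be sketched as follows. Normalizing by the longer name's length is our assumption; the paper only specifies the normalized Levenshtein distance.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]

def name_similarity(n1, n2):
    """Normalized Levenshtein similarity in [0, 1]; a missing name
    (imputed as an empty string, cf. Section 5.1) yields 0 unless
    both are empty."""
    if not n1 and not n2:
        return 1.0
    return 1.0 - levenshtein(n1, n2) / max(len(n1), len(n2))
```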
Naturally, symmetric decoders for profile and content embeddings can be defined as follows,
$$\hat{s}_f(u_i, u_j) = \max\big(0, \cos(z_f(u_i), z_f(u_j))\big), \quad f \in \{prof, cont\}.$$
4.1.4 Supervised Information
The unified embeddings of users from different social networks cannot be compared directly, as they lie in different vector spaces. Therefore, two encoders, $\mathrm{ENC}^s$ and $\mathrm{ENC}^t$, are introduced for the source network and the target network, respectively, to map users to a common latent space, i.e.,
$$z_a^s = \mathrm{ENC}^s(z^s), \quad z_a^t = \mathrm{ENC}^t(z^t),$$
where $z^s$ and $z^t$ represent the unified embeddings of $u^s$ and $u^t$, respectively.
Intuitively, the ground truth similarity $s_a(u^s, u^t)$ can be defined as an indicator function that indicates whether two users are matched. Similar to the decoders for profile and content, a symmetric decoder for the supervised information can be defined as follows,
$$\hat{s}_a(u^s, u^t) = \max\big(0, \cos(z_a^s, z_a^t)\big).$$
Since the overall objective is the sum of similar loss functions for different information, it is sufficient to consider the optimization of a single loss function. To simplify the formulation, the feature superscripts of all related symbols are omitted.
The key challenge for optimization is the sparsity of the ground truth matrix $S$. Generally, only a small fraction of users are highly similar to a given user, while the rest are dissimilar. Regarding the similar and dissimilar users as positive samples and negative samples, respectively, directly optimizing the loss tends to overfit the negative samples and underfit the positive samples, which prevents the model from learning discriminative user embeddings for user identity linkage. Besides, the time complexity is quadratic in the number of users, which is costly for large-scale social networks. Inspired by Mikolov et al. [word2vec], a negative sampling trick is introduced to address the problems above.
Formally, given a percentage $p$, for any user $u_i$, the $p$-quantile of the $i$-th row of $S$ is denoted as $s_i^p$, and the row is split into two disjoint subsets,
$$U_i^+ = \{u_j \mid S_{ij} \ge s_i^p\}, \quad U_i^- = \{u_j \mid S_{ij} < s_i^p\},$$
representing the sets of similar users and dissimilar users, respectively. Then, the loss can be reformulated as follows,
$$\mathcal{L} = \sum_i \Big( \sum_{u_j \in U_i^+} \ell(\hat{S}_{ij}, S_{ij}) + \lambda \sum_{m=1}^{M} \mathbb{E}_{u_j \sim P_n}\, \ell(\hat{S}_{ij}, S_{ij}) \Big),$$
where $\lambda$ is a normalization constant and $M$ is the number of negative samples sampled from the "noisy distribution" $P_n$ over $U_i^-$. Only a $(1-p)$ fraction of each row plus $M$ sampled negatives need to be evaluated, so the time complexity of evaluating the loss function is reduced accordingly. For a large percentage $p$ and a small number $M$, the evaluation is far more efficient than that of the original formulation. Specifically, if $S$ is an adjacency matrix, the positive samples are exactly the observed edges, which coincides with the time complexity of most existing structural embedding models [LINE]. The sampled loss is used as the final loss function, and all parameters are updated via gradient descent.
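The sampling scheme for a single row of the similarity matrix can be sketched as follows. This sketch assumes the squared loss; the quantile computation and the uniform sampling over dissimilar users are our own simplifications (the exact noisy distribution is not specified here).

```python
import random

def sampled_row_loss(s_row, s_hat_row, p=0.9, m=2, rng=None):
    """Squared loss over one row of S: keep all entries at or above
    the p-quantile (positive samples) and m entries sampled uniformly
    from below it (negative samples), instead of the full row."""
    rng = rng or random.Random(0)
    ranked = sorted(s_row)
    threshold = ranked[min(int(p * len(ranked)), len(ranked) - 1)]
    pos = [j for j, s in enumerate(s_row) if s >= threshold]
    neg = [j for j, s in enumerate(s_row) if s < threshold]
    sampled = rng.sample(neg, min(m, len(neg)))
    return (sum((s_hat_row[j] - s_row[j]) ** 2 for j in pos)
            + sum((s_hat_row[j] - s_row[j]) ** 2 for j in sampled))
```

With a large `p` and a small `m`, only a few entries per row are touched, which is the source of the speed-up described above.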
4.2 Neighborhood Enhancement Component
The node embeddings obtained by the information fusion component can be directly applied to user identity linkage. However, this matching scheme ignores the effect of common neighbors, which can lead to mistakes in which the neighbors of the predicted matched users are mostly unmatched. To improve the precision of user identity linkage, a neighborhood enhancement component is applied to learn neighborhood embeddings that reflect the overlapping degree of the neighborhoods of candidate user pairs.
An intuitive solution is to use a graph convolutional network (GCN) [GCN] to aggregate the information of neighbors. However, GCN convolves the embeddings of matched neighbors and unmatched neighbors indiscriminately, which may introduce additional noise that hurts the precision of matching. Besides, the obtained neighborhood embeddings are fixed, while the common neighbors vary with candidate user pairs. To address these problems, the neighborhood enhancement component, shown in Figure 3, aggregates the information of the matched and unmatched neighbors separately and unifies them into neighborhood embeddings that adapt to the candidate user pair.
First, given a candidate user pair, potential matched neighbors are identified under the one-to-one constraint [PNA], i.e., each user from the source network can be mapped to at most one user in the target network. Then, each neighborhood is split into two disjoint subsets, containing potential matched neighbors and unmatched neighbors, respectively, where $N(u^s)$ and $N(u^t)$ stand for the sets of neighbors of $u^s$ and $u^t$, respectively.
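The neighbor split can be sketched as a greedy one-to-one matching on node similarities. The greedy strategy and the similarity threshold are our assumptions for illustration; the paper only states the one-to-one constraint.

```python
def split_neighbors(nbrs_s, nbrs_t, node_sim, threshold=0.5):
    """Split the neighborhoods of a candidate pair into potential
    matched neighbor pairs and unmatched neighbors, greedily enforcing
    the one-to-one constraint by descending node similarity."""
    pairs = sorted(((node_sim(a, b), a, b) for a in nbrs_s for b in nbrs_t),
                   key=lambda x: -x[0])
    used_s, used_t, matched = set(), set(), []
    for sim, a, b in pairs:
        if sim >= threshold and a not in used_s and b not in used_t:
            matched.append((a, b))
            used_s.add(a)
            used_t.add(b)
    unmatched_s = [a for a in nbrs_s if a not in used_s]
    unmatched_t = [b for b in nbrs_t if b not in used_t]
    return matched, unmatched_s, unmatched_t
```

Because the split depends on the candidate pair's neighborhoods, the downstream neighborhood embeddings are recomputed per pair rather than fixed per node.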
Second, a GCN-style aggregation is applied to obtain the neighborhood embeddings of potential matched neighbors and unmatched neighbors. For $u^s$, the two embeddings, denoted $h_m^s$ and $h_u^s$, are obtained by aggregating the node embeddings of the corresponding neighbor subsets. Then, $h_m^s$ and $h_u^s$ are concatenated with the node embedding of $u^s$ to obtain an embedding that integrates the information of both subsets. Finally, this embedding is fed to a two-layer perceptron to obtain the neighborhood embedding $h^s$. A similar procedure is applied to $u^t$ to obtain $h^t$.
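The aggregation step can be sketched as follows; mean pooling stands in for the GCN-style aggregation, and the final perceptron is passed in by the caller (both are our simplifications).

```python
def mean_pool(embs, dim):
    """Mean-pool a list of neighbor embeddings; zero vector if empty."""
    if not embs:
        return [0.0] * dim
    return [sum(e[k] for e in embs) / len(embs) for k in range(dim)]

def neighborhood_embedding(z_u, matched_embs, unmatched_embs, mlp):
    """Adaptive neighborhood embedding of a user for a given candidate
    pair: aggregate matched and unmatched neighbors separately,
    concatenate with the node embedding, and apply a perceptron."""
    d = len(z_u)
    concat = mean_pool(matched_embs, d) + mean_pool(unmatched_embs, d) + list(z_u)
    return mlp(concat)
```

Keeping the matched and unmatched pools separate is what lets the model weigh overlapping neighborhoods differently from non-overlapping ones.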
The ground truth similarity is defined as in Section 4.1.4, and the reconstructed similarity between $u^s$ and $u^t$ is defined as
$$\hat{s}_{nbr}(u^s, u^t) = \max\big(0, \cos(h^s, h^t)\big).$$
The loss function of the neighborhood enhancement component is defined as in Eq. (13) and the parameters can be updated via gradient descent.
With the node embeddings and the neighborhood embeddings, the total similarity between $u^s$ and $u^t$ is defined to be a weighted sum of the node similarity and the neighborhood similarity,
$$s(u^s, u^t) = s_{node}(u^s, u^t) + \beta\, s_{nbr}(u^s, u^t),$$
where $\beta$ is a tunable hyperparameter that measures the importance of the neighborhood similarity.
5 Experimental Evaluation
5.1 Dataset and Experimental Settings
This paper uses a dataset collected from two Chinese social networks, Douban (https://www.douban.com) and Weibo (https://www.weibo.com). Douban is a Chinese social networking service that allows users to record information and create content related to films, books, etc.; it had 200 million registered users as of 2013. Weibo is a leading micro-blogging platform in China, with over 445 million monthly active users as of 2018; users create original content or repost, much as on Twitter.
This dataset contains network structure, profile and content information. An anchor link is constructed if the profile of a Douban user contains a link to a Weibo homepage. Compared with existing public datasets, ours contains many more users, resulting in more anchor links and richer user relationships. Besides, our dataset contains a large amount of content, which existing public datasets do not include. These characteristics make the user identity linkage task more challenging. The statistics are listed in Table 1.
The follower-followee relations are regarded as directed edges. Screen names are used as the profile information, and missing values are imputed with empty strings. For the content information, LTP [LTP], a Chinese language processing toolkit, is used for word segmentation. All posts of a user are merged into a single document, and the bag-of-words model is used for text preprocessing.
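The content preprocessing can be sketched as follows; splitting on whitespace stands in for the LTP word segmentation, which is what actually handles the Chinese text.

```python
from collections import Counter

def user_document(posts):
    """Merge all posts of a user into a single token list."""
    return [tok for post in posts for tok in post.split()]

def bag_of_words(posts):
    """Bag-of-words representation of a user's merged document."""
    return Counter(user_document(posts))
```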
The dimensions of the user embeddings are set to the same value for all methods, and the numbers of hidden neurons of all encoders in our method are fixed accordingly. For the negative sampling procedure, a large percentage $p$ and a small number $M$ of negative samples are used. For the neighborhood enhancement component, only the top similar users are selected to evaluate neighborhood similarities, and the weight of the neighborhood similarity is set according to the grid search results; a detailed discussion is given in Section 5.5.
All methods are evaluated at different ratios $r$ of the training set.
The hit-precision is selected as the evaluation metric to compare the top-$k$ candidates; it is well established and widely used in many real user linkage applications [ULink]. For each test user $x$, the top-$k$ precision is computed as follows,
$$h(x) = \frac{k - (\mathrm{hit}(x) - 1)}{k},$$
where $\mathrm{hit}(x)$ represents the position of the correctly identified user in the returned top-$k$ users, and $h(x) = 0$ if the correct user is not returned. Then, the hit-precision is calculated over $N$ test users as $\frac{1}{N}\sum_{x} h(x)$.
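A direct implementation of the metric; scoring users whose true counterpart falls outside the top-$k$ as 0 is the standard convention for hit-precision.

```python
def hit_precision(ranks, k):
    """Mean hit-precision over test users, where each entry of `ranks`
    is the 1-based position of the true counterpart in the returned
    candidate list: h(x) = (k - (hit(x) - 1)) / k, or 0 beyond top-k."""
    def h(rank):
        return (k - (rank - 1)) / k if 1 <= rank <= k else 0.0
    return sum(h(r) for r in ranks) / len(ranks)
```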
To evaluate the performance of INFUNE, we compare it with several state-of-the-art methods listed as follows.
ULink [ULink]: a non-embedding based method that projects the raw feature vectors of user profiles to a common latent space.
PALE [PALE]: an embedding based method that first embeds the network structure to a low-dimensional space, and then learns a mapping function in a supervised manner.
GraphUIL [GraphUIL]: an embedding based method that applies a graph neural network to jointly capture local and global network structure information.
MASTER [MASTER]: an embedding based method that maps users into a latent space that preserves intra-network structure and profile similarities.
MEgo2Vec [Mego2Vec]: an embedding based method that learns profile embeddings from character level and word level, and aggregates the information of potential matched neighbors using attention mechanism.
The code of ULink (http://www.lamda.nju.edu.cn/code_ULink.ashx) and that of MEgo2Vec (https://github.com/BoChen-Daniel/MEgo2Vec-Embedding-Matched-Ego-Networks-for-User-Alignment-Across-Social-Networks) are public and thus directly used in our experiments. The other baselines are implemented by ourselves according to the original papers. The code of INFUNE is available online (https://github.com/hilbert9221/INFUNE). To verify the effectiveness of the information fusion component and the neighborhood enhancement component, several variants of INFUNE are introduced as follows.
Variants of INFUNE using only a single type of information, i.e., only network structure, only profile or only content information.
Variants of INFUNE using the pairwise combinations of the three types of information.
A variant of INFUNE without the neighborhood enhancement component.
Besides, the user representations of some baselines, such as ULink and GraphUIL, can easily be replaced by the unified embeddings generated by an unsupervised version of INFUNE. To further verify the effectiveness of the information fusion component, this paper feeds the unified embeddings to ULink and GraphUIL, yielding two enhanced variants.
5.3 Comparisons with Baselines
Figure 4 shows the overall performance of all methods on the Douban-Weibo dataset. The proposed method clearly outperforms the baselines (+12.43% on average). Our model achieves higher hit-precision than MEgo2Vec and MASTER, probably because they ignore the content information. Besides, compared with MEgo2Vec, our model considers the effect of not only potential matched neighbors but also unmatched neighbors. Compared with MASTER, our model utilizes not only intra-network but also inter-network similarities to learn user embeddings. Note that our model performs slightly worse than MASTER at the smallest training ratio, possibly because INFUNE additionally fuses content information and thus requires more training data to map users from different social networks to a common space. With larger training ratios, INFUNE significantly outperforms MASTER, indicating that INFUNE can better leverage supervised information. ULink, PALE and GraphUIL perform the worst, as they rely on a single type of information that suffers from inconsistency across social networks.
5.4 Effect of Information Fusion
Variants removing a single type of information and those removing two types of information are compared with INFUNE to verify the effectiveness of information fusion, and the results are shown in Figure 5. Compared with INFUNE, the hit-precisions of the three two-information variants decrease by 13.21%, 9.54% and 8.17% on average, respectively, which demonstrates that each type of information is indispensable for user identity linkage. Among all information, the content information contributes the most according to the decrease in hit-precision, which may be because content contains richer information than the others. The relative importance of structure and profile information depends on the training ratio $r$: when $r$ is large, the structure information contributes more than the profile information, while the situation is reversed when $r$ is small. Besides, the performance of INFUNE is even worse than that of the variant without structure information at the smallest ratio. The results indicate that the alignment of network structure relies more on the known anchor links, while the alignment based on the profile information is less sensitive to the supervised information.
By removing two types of information, the hit-precisions decrease by 19.12-26.41%, which again verifies the effectiveness of information fusion. Judging from the decrease in hit-precision, the combination of structure and content is the most effective with sufficient supervised information, while at small training ratios the combination of structure and profile outperforms the others. The performance of the profile-content combination is close to that of the structure-profile combination. Interestingly, the performance of the structure-profile variant is more stable than that of INFUNE with varying $r$, which suggests that linking users based on the combination of structure and content requires more supervised information.
As shown in Figure 4, the enhanced variants of ULink and GraphUIL fed with the unified embeddings significantly outperform the originals, which again verifies that embeddings with multiple types of information are more effective than those with a single type. Still, INFUNE performs better than both variants, which indicates that our whole framework with two integrated components can better leverage heterogeneous information.
5.5 Effect of Neighborhood Enhancement
As shown in Figure 4, removing the neighborhood enhancement component decreases the performance by 2.20% on average, but the difference is less significant at the smallest and largest training ratios. Possible reasons are as follows. When $r$ is small, the known common neighbors are sparse and therefore inadequate to help identify matched users. With sufficient supervised information, i.e., when $r$ is large, node similarities dominate the results of user identity linkage and the neighborhood information is less helpful.
The weight $\beta$ of the neighborhood similarity plays an important role in leveraging neighborhoods for user identity linkage. When $\beta = 0$, INFUNE reduces to the variant without the neighborhood enhancement component. Figure 6 shows the effect of $\beta$ at selected ratios of the training set. It is observed that, across different ratios, the hit-precision first increases with $\beta$, peaks at a small value, and finally decreases as $\beta$ grows. When $\beta$ is too large, the performance of INFUNE tends to be worse than the variant without the neighborhood enhancement component; that is, a large $\beta$ may bring a negative effect to the whole model. The results indicate that node similarities dominate the performance of the whole model, and neighborhood similarities with a small weight can improve the hit-precision by 1.49-3.03%. This is in line with the common observation that dissimilar users may share many common neighbors in social networks, and therefore, common neighbors contribute less to identifying matched users.
5.6 Visualization of Learned Embeddings
A dimension reduction algorithm, t-SNE [t-SNE], is adopted to project the node embeddings to a two-dimensional space to illustrate the complementarity of heterogeneous information. As users often publish hundreds of posts, it is hard to show the content consistency of matched users within limited space. For simplicity and clarity, the node embeddings generated by the profile-only variant and the structure-profile variant are selected for visualization. Figure 7 visualizes the node embeddings of four randomly selected matched user pairs and one unmatched user pair. The vector spaces of the two types of embeddings are referred to as the profile space and the structure-profile space, respectively. The square points represent users from Douban, and the triangle points represent users from Weibo. Note that the embeddings of the same user can be completely different, as they lie in different vector spaces; thus, it is the relative position between users that matters.
User pairs with similar screen names lie closer to each other than the others in the profile space. Based on the profile information, the profile-only variant correctly identifies the matched user pairs (Maleah K, MaleahK) and (Clyde, Clyde84), but mistakenly matches the user pair (Harrison, Harrison-X). Worse still, it fails to identify the matched user pairs (tang, Vic-zuo) and (At_sea_cb, None), whose screen names are completely different or even missing. Thus, prediction based on profile information alone is not reliable or robust for users with noisy and incomplete profiles.
By adding the structure information, the structure-profile variant inherits the advantages of the profile-only variant and eliminates its limitations over the given user pairs. First of all, the relative positions of the identified matched user pairs are consistent with those in the profile space, which means that they will not be incorrectly predicted as unmatched user pairs. Secondly, the unmatched users Harrison and Harrison-X, despite their similar screen names, are separated from each other in the structure-profile space, which shows that structure information can help distinguish similar users. Finally, the matched user pairs with dissimilar screen names, i.e., (tang, Vic-zuo) and (At_sea_cb, None), are successfully clustered, which again demonstrates the complementarity of profile and structure information.
6 Conclusion and Future Work
This paper presents a novel framework with information fusion and neighborhood enhancement for user identity linkage. The information fusion component effectively learns node embeddings that integrate user information of structure, profile and content by reconstructing pairwise user similarities. The neighborhood enhancement component adopts a novel graph neural network to learn adaptive neighborhood embeddings that reflect the overlapping degree of the neighborhoods of candidate user pairs. The results of extensive experiments on real-world social network data validate the effectiveness of our method.
Currently, embedding mechanisms for heterogeneous contents, e.g., texts and images, are not yet considered, as our dataset contains only textual content. A solution is to learn content embeddings separately by reconstructing intra-network similarities and to map them to a common space by utilizing the supervised information. Exploration of more sophisticated methods is left for future work. Besides, the two components of our model are trained separately, as the neighborhood enhancement component relies on the discriminative node embeddings generated by a well-trained information fusion component. Future work includes jointly training the two components to promote performance.

This work is supported by the National Key R&D Program of China (2018AAA0101203), and the National Natural Science Foundation of China (61673403, U1611262).