1. Introduction
Heterogeneous information networks (HINs) have received increasing attention in the past decade due to their ubiquity and capability of representing rich information (Shi et al., 2017; Sun and Han, 2013). Meanwhile, network embedding has emerged as a scalable representation learning method (Dong et al., 2017; Grover and Leskovec, 2016; Perozzi et al., 2014; Ribeiro et al., 2017; Tang et al., 2015a, b; Wang et al., 2016). Network embedding learns low-dimensional vector representations for nodes to encode their semantic information in the original network. The vectorized representations can be easily combined with off-the-shelf machine learning algorithms for various tasks such as classification and link prediction (Perozzi et al., 2014; Grover and Leskovec, 2016; Tang et al., 2015b; Hamilton et al., 2017), which provides a convenient approach for researchers and engineers to mine and learn from networked data. To marry the advantages of HINs and network embedding, researchers have recently started to explore methods to embed heterogeneous information networks (Shang et al., 2016; Dong et al., 2017; Fu et al., 2017; Chang et al., 2015; Gui et al., 2016; Tang et al., 2015a; Shi et al., 2018), and have demonstrated the effectiveness of HIN embedding in applications including author identification (Chen and Sun, 2017), name disambiguation (Zhang and Hasan, 2017), proximity search (Liu et al., 2017), and event detection (Zhang et al., 2017).
However, the heterogeneity in HINs brings in not only rich information but also potentially incompatible semantics, which poses special challenges to embedding heterogeneous information networks. Take the movie-reviewing network in Figure 1 as an example, where users review movies and list certain actors, directors, and genres as their favorites. Suppose user Stan likes both movies directed by Ang Lee (director) and musicals (genre). Since Ang Lee has never directed any musical, nor is he semantically similar to musical, if this HIN were embedded into one metric space, musical and Ang Lee would be distant from each other, and the user Stan could not be simultaneously close to both of them, due to the triangle inequality property of metric spaces. We have also observed different extents of such incompatibility in real-world data, as discussed in Section 4, which is consistent with the observation from an existing study that different extents of correlation can exist within one HIN (Shi et al., 2017). As a result, an algorithm can be expected to generate better embeddings if it additionally models such semantic incompatibility.
We hence study the problem of comprehensive transcription of heterogeneous information networks, which purely aims to transcribe the rich and potentially incompatible information from HINs to the embeddings, without involving additional expertise, feature engineering, or supervision.
With HINs comprehensively transcribed, one can again pipe the embeddings, learned without supervision, to off-the-shelf machine learning algorithms for a wide range of applications. Therefore, beyond the capability of preserving rich information, another motivation to study comprehensive transcription of HINs is to provide an easy-to-use approach to unleash the power of HINs in a wide variety of applications with no expertise or supervision required in the embedding learning process.
Traditional homogeneous network embedding methods (Grover and Leskovec, 2016; Perozzi et al., 2014; Ribeiro et al., 2017; Tang et al., 2015b; Wang et al., 2016) treat all nodes and edges equally regardless of their types, and therefore do not capture the essential heterogeneity of HINs. A couple of methods have recently been studied for embedding heterogeneous information networks (Shang et al., 2016; Dong et al., 2017; Fu et al., 2017; Chang et al., 2015; Gui et al., 2016; Tang et al., 2015a; Shi et al., 2018). Many of them build their algorithms on top of a set of meta-paths (Shang et al., 2016; Dong et al., 2017), which often requires users to specify the meta-paths or leverage supervision to make the meta-path selection. However, a set of meta-paths specified or selected in this way often reflects only certain aspects of the HIN or is suitable only for specific tasks. As a result, these methods are not always capable of transcribing HINs comprehensively. They are not as easy to use either, because they involve an additional meta-path generation process that entails expertise or supervision. Besides using meta-paths, some approaches have been proposed to embed specific kinds of HINs (Gui et al., 2016; Tang et al., 2015a) for certain tasks or HINs with additional side information (Chang et al., 2015). These methods cannot be applied to comprehensively transcribe general HINs. Additionally, most existing HIN embedding methods (Shang et al., 2016; Dong et al., 2017; Gui et al., 2016; Tang et al., 2015a) employ only one metric space for embedding learning. This approach may suit downstream tasks that are related to certain partial information of an HIN with compatible semantics, but could lead to information loss if the objective is to comprehensively transcribe the entire HIN.
The problem of comprehensive transcription of HINs is challenging because it requires the modeling of heterogeneity that can be complex and incompatible. Besides, without the availability of supervision, proposed solutions need to capture the latent structure of the HINs and distinguish potentially incompatible semantics in an unsupervised way. To cope with these challenges, we propose heterogeneous information network embedding via edge representations, which is henceforth referred to as HEER. HEER builds edge embeddings atop node embeddings, which are further coupled with inferred heterogeneous metrics for each edge type. The inferred metrics capture which dimensions of the edge embeddings are more important for the semantics carried by their corresponding edge types. In turn, the information carried by edges of different types updates the node embeddings and edge embeddings with emphases on different type-specific manifolds. In this way, we can preserve different semantics even in the presence of incompatibility. Taking the movie-reviewing network as an example again, by adopting heterogeneous metrics as in the lower part of Figure 1, Stan could be close to both musical (genre) and Ang Lee (director) under their respective metrics. Furthermore, the heterogeneous metrics are inferred by fitting the input HIN, so that semantic incompatibility is captured without additional supervision.
Specifically, with the availability of edge representations and coupled metrics, we derive a loss function that reflects both the existence and the type of an edge. By minimizing the loss, the node embeddings, edge embeddings, and heterogeneous metrics are updated simultaneously, and thereby retain the heterogeneity in the input HIN. Different extents of incompatibility can also be modeled: the more compatible two edge types are, the more similar their corresponding metrics would be.
Lastly, we summarize our contributions as follows:

We propose to study the problem of comprehensive transcription of HINs in embedding learning, which preserves the rich information in HINs and provides an easy-to-use approach to unleash the power of HINs.

We identify that different extents of semantic incompatibility exist in real-world HINs, which poses challenges to the comprehensive transcription of HINs.

We propose an algorithm, HEER, for the comprehensive transcription of HINs that leverages edge representations and heterogeneous metrics.

Experiments with real-world, large-scale datasets demonstrate the effectiveness of HEER and the utility of edge representations and heterogeneous metrics.
2. Related Work
Homogeneous network embedding. Network embedding has emerged as an efficient and effective representation learning approach for networked data (Grover and Leskovec, 2016; Ou et al., 2016; Perozzi et al., 2014; Ribeiro et al., 2017; Tang et al., 2015b; Wang et al., 2016; Hamilton et al., 2017; Dai et al., 2016; Zhang and Chen), which significantly spares the labor and resources needed to transform networks into features that are more machine-actionable. Early network embedding algorithms start from handling simple, homogeneous networks, and many of them trace to the skip-gram model (Mikolov et al., 2013), which aims to learn word representations where words with similar contexts have similar representations (Grover and Leskovec, 2016; Perozzi et al., 2014; Ribeiro et al., 2017; Tang et al., 2015b). Besides skip-gram, algorithms for preserving certain other homogeneous network properties have also been studied (Ou et al., 2016; Wang et al., 2016; Xu et al., 2018; Niepert et al., 2016; Veličković et al., 2018; Kipf and Welling, 2017). The use of edge representations for homogeneous network embedding is discussed in a recent work (Abu-El-Haija et al., 2017), but such edge representations are designed to distinguish the direction of an edge, instead of encoding richer semantics such as edge type in our case.
Heterogeneous network embedding.
Heterogeneous information networks (HINs) have been extensively studied over the past decade for their ubiquity in real-world data and efficacy in fulfilling tasks such as classification, clustering, recommendation, and outlier detection (Shi et al., 2017; Sun and Han, 2013; Sun et al., 2009; Yu et al., 2014; Zhuang et al., 2014). To marry the advantages of HIN and network embedding, a couple of algorithms have been proposed very recently for embedding learning in heterogeneous information networks (Shang et al., 2016; Dong et al., 2017; Fu et al., 2017; Chang et al., 2015; Gui et al., 2016; Tang et al., 2015a; Shi et al., 2018). One line of work first uses human expertise or supervision to select meta-paths for a given task or limit the scope of candidate meta-paths, and then proposes methods to transfer the semantics encoded in meta-paths to the learned embedding (Shang et al., 2016; Dong et al., 2017; Fu et al., 2017). While this direction has been shown to be effective in solving problems that fit the semantics of the chosen meta-paths, it differs from our research scope: these methods mostly focus on providing quality representations for downstream tasks concerning the node types on the two ends of chosen meta-paths, while we aim at developing methods to transcribe the entire HIN to embeddings as comprehensively as possible. Beyond meta-paths, some approaches have been proposed to embed specific kinds of HINs (Gui et al., 2016; Tang et al., 2015a) with specific objectives such as representing event data or learning predictive text embeddings. Some other approaches study HINs with additional side information (Chang et al., 2015) that cannot be generalized to all HINs. Besides, all of these approaches embed the input HIN into only one metric space. Embedding in the context of HIN has also been studied for tasks with additional supervision (Chen and Sun, 2017; Liu et al., 2017; Pan et al., 2016). These methods yield features specific to given tasks and are outside the scope of unsupervised HIN embedding that we study.
A recent study (Shi et al., 2018) proposes a method that decomposes an HIN into multiple aspects before learning embeddings, which also attains quality representations of HINs by alleviating the information loss arising from the rich, yet heterogeneous and potentially conflicting, semantics within the given networks. However, this approach embeds the derived aspects independently and completely forbids joint learning across aspects, while our proposed method allows network components of varied compatibility to collaborate to different extents in the joint learning process.
3. Preliminaries
In this section, we define related concepts and notations.
Definition 3.1 (Heterogeneous Information Network).
An information network is a directed graph with a node type mapping and an edge type mapping. Particularly, when the number of node types or the number of edge types is greater than one, the network is called a heterogeneous information network (HIN).
Given the typed essence of HINs, the network schema (Sun and Han, 2013) is used to abstract the meta-information regarding the node types and edge types in an HIN. Figure 2 illustrates the schema of a toy movie-reviewing HIN.
In addition, we require that only one node type can be associated with a certain end of an edge type. That is, once an edge type is given, we would deterministically know the node types on its two ends. As an example, consider two edges with one representing director Fatih Akin living in Germany and another representing movie In the Fade being produced in Germany. Such requirement implies that these two edges must have distinct types – livesIn and isProducedIn – instead of just one type – isIn. For edge type , we denote , where means the node type pair is consistent with edge type . Additionally, define and .
Moreover, when the network is weighted and directed, we use to denote the weight of an edge with type that goes out from node toward . and respectively represent the outward degree of node (i.e., the sum of weights of all type edges going outward from ) and the inward degree of node (i.e., the sum of weights of all type edges going inward to ). For an unweighted edge, is trivially . For an undirected edge, we always have and .
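To make the type-specific degrees concrete, the following minimal sketch accumulates outward and inward degrees per node and edge type from a list of weighted, typed, directed edges. The tuple layout `(u, v, edge_type, weight)`, the function name `typed_degrees`, and the toy citation edges are assumptions made for this illustration only.

```python
from collections import defaultdict


def typed_degrees(edges):
    """Return (out_deg, in_deg), each keyed by (node, edge_type).

    `edges` holds (u, v, edge_type, weight) tuples for directed edges;
    this layout is an assumption of the sketch, not the paper's notation.
    """
    out_deg = defaultdict(float)
    in_deg = defaultdict(float)
    for u, v, edge_type, weight in edges:
        out_deg[(u, edge_type)] += weight  # weights of typed edges leaving u
        in_deg[(v, edge_type)] += weight   # weights of typed edges entering v
    return out_deg, in_deg


# Hypothetical unweighted citation edges, so every weight is 1.
toy_edges = [
    ("paper1", "paper2", "ref", 1.0),
    ("paper1", "paper3", "ref", 1.0),
    ("paper3", "paper2", "ref", 1.0),
]
out_deg, in_deg = typed_degrees(toy_edges)
print(out_deg[("paper1", "ref")], in_deg[("paper2", "ref")])
```

Under this representation, an undirected edge could be stored as two directed edges with equal weight, in which case a node's outward and inward degrees for that type coincide.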
Definition 3.2 (Node and Edge Representations in HIN Embedding).
Given an HIN , the problem of HIN embedding via edge representations learns a node embedding mapping and an edge embedding mapping , where and are the dimensions for node and edge embeddings, respectively. A node is thereby represented by a node embedding and a node pair is represented by an edge embedding .
With this definition, a node pair has its edge embedding even if no edge of any type has been observed between them. On the other hand, it is possible for node pair to be associated by multiple edges with different types, and we expect edge embedding to encapsulate such information of an HIN.
Finally, we define the problem of comprehensive transcription of a heterogeneous information network in embedding learning.
Definition 3.3 (Comprehensive Transcription of an HIN).
The comprehensive transcription of an HIN aims to learn representations of the input HIN that retain the rich information in the HIN as comprehensively as possible, in an approach that does not require additional expertise, feature engineering, or supervision.
4. Varied Extents of Incompatibility due to Heterogeneity
In this section, we look into the incompatibility in HINs using real-world data, taking DBLP as an example.
DBLP is a bibliographical information network in the computer science domain (Tang et al., 2008), where authors write papers that are further associated with nodes of other attribute types. Since the measurable incompatibility in an HIN arises from the coexistence of multiple edge types, we dive down to the minimal case that involves two different edge types ( and ) joined by a common node type (). To quantify the incompatibility for this minimal case, we use the widely used generalized Jaccard coefficient to measure the similarity between the node groups reachable from a given node of type via the two edge types. Specifically, given node of type , the Jaccard coefficient for edge types and is given by , where is the reachability between nodes and via edge type and is the row-normalized adjacency matrix of edge type . The generalized Jaccard coefficient has a range of [0, 1], and a greater value implies more similarity, or equivalently, less incompatibility.
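As an illustration of the measure, the sketch below computes a generalized Jaccard coefficient, taken here in its standard form as the sum of element-wise minima divided by the sum of element-wise maxima, between two non-negative reachability vectors. How the reachability values are derived from the row-normalized adjacency matrices is not reproduced here, so the vectors are simply given as inputs. The sparse dictionary layout and the names `row_normalize`, `generalized_jaccard`, `reach_r1`, and `reach_r2` are assumptions of this example.

```python
def row_normalize(adj):
    """Row-normalize a sparse adjacency given as {node: {neighbor: weight}}."""
    normalized = {}
    for u, nbrs in adj.items():
        total = sum(nbrs.values())
        normalized[u] = {v: w / total for v, w in nbrs.items()} if total > 0 else {}
    return normalized


def generalized_jaccard(x, y):
    """Sum of element-wise minima over sum of element-wise maxima."""
    keys = set(x) | set(y)
    num = sum(min(x.get(k, 0.0), y.get(k, 0.0)) for k in keys)
    den = sum(max(x.get(k, 0.0), y.get(k, 0.0)) for k in keys)
    return num / den if den > 0 else 0.0


# Hypothetical reachability from one node to nodes a, b, c via two edge types.
reach_r1 = {"a": 0.5, "b": 0.5}
reach_r2 = {"b": 0.5, "c": 0.5}
similarity = generalized_jaccard(reach_r1, reach_r2)
print(similarity)
```

Identical vectors give a coefficient of 1 and vectors with disjoint support give 0, which matches the reading that larger values mean less incompatibility.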
As an example, we consider four node types – author, paper, key term, and year – and two pairs of edge types – (i) authorship vs. publishing year of papers and (ii) authorship vs. term usage of papers. We illustrate the distributions over the Jaccard coefficient using the cumulative distribution function (CDF) for each of the two pairs in Figure 2(a). It can be seen that over of nodes have a generalized Jaccard coefficient smaller than between authorship and publishing year, while less than of nodes fall in the same category when it comes to authorship vs. term usage. In other words, we observe more incompatibility between authorship and publishing year than between authorship and term usage. This relationship is not surprising, because papers published in the same year can be authored by any researchers who are active at that time, while key terms associated with certain research topics are usually used by authors focusing on these topics. With the presence of such varied extents of incompatibility, we would expect an embedding algorithm tailored for comprehensive transcription of HINs to be able to capture this semantic subtlety in HINs.
In fact, by employing edge representations and heterogeneous metrics, the inferred metrics could be learned to be different for incompatible edge types. In turn, the information carried by these two edge types would update the node embeddings and edge embeddings with emphases on different manifolds. On the other hand, the subtlety of the different extents of incompatibility could also be captured, in that the more compatible two edge types are, the more similar their inferred metrics should be.
5. Proposed Method
To provide a general-purpose, easy-to-use solution to HIN embedding, we describe the HEER model in this section, where HEER stands for Heterogeneous Information Network Embedding via Edge Representations. The model inference method is described afterward.
5.1. The HEER Model
A learned embedding that effectively encodes the semantics of an HIN should be able to reconstruct this HIN. With the use of edge representation, we expect the embedding to infer not only the existence but also the type of edge between each pair of nodes. For edge type , we formulate the typed closeness of node pair atop their edge embedding as
(1) 
where is an edge-type-specific vector to be inferred that represents the metric coupled with this type. Edge types with compatible semantics are expected to share similar , while incompatible edge types make use of different , so that the embedding learning on their respective semantics does not dampen each other.
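The role of the edge-type-specific metric vector can be illustrated with a small sketch. It assumes, purely for illustration, that the typed closeness is a softmax over candidate node pairs of the inner product between the metric vector and each pair's edge embedding; the function name `typed_closeness`, the two-dimensional embeddings, and the extra genre "horror" are hypothetical.

```python
import math


def typed_closeness(mu_r, candidates, pair):
    """Softmax-normalized score of `pair` among `candidates` under metric `mu_r`.

    Assumed form for this sketch: exp(mu_r . g_pair) divided by the sum of the
    same quantity over all candidate pairs compatible with the edge type.
    """
    raw = {p: sum(m * x for m, x in zip(mu_r, g)) for p, g in candidates.items()}
    shift = max(raw.values())  # subtract the maximum for numerical stability
    normalizer = sum(math.exp(s - shift) for s in raw.values())
    return math.exp(raw[pair] - shift) / normalizer


# Hypothetical edge embeddings for two user-genre pairs.
candidates = {
    ("Stan", "musical"): [2.0, 0.1],
    ("Stan", "horror"): [0.1, 2.0],
}
mu_focused = [1.0, 0.0]  # metric emphasizing the first dimension only
mu_uniform = [1.0, 1.0]  # uniform metric, every dimension weighted equally
focused = typed_closeness(mu_focused, candidates, ("Stan", "musical"))
uniform = typed_closeness(mu_uniform, candidates, ("Stan", "musical"))
print(focused, uniform)
```

With the uniform metric the two pairs are indistinguishable, whereas the focused metric separates them, which is the kind of type-specific emphasis the metric vectors are meant to provide.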
To measure the capability of the learned embedding in reconstructing the input HIN, the difference between the observed weights and the typed closeness inferred from the embedding is used, which leads to the objective to be minimized for edge type
(2) 
where stands for the Kullback-Leibler divergence.
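For readers less familiar with this divergence, the sketch below evaluates the Kullback-Leibler divergence between a distribution obtained by normalizing observed edge weights and a predicted distribution. The helper names and the toy numbers are assumptions of this example. Since the entropy of the observed distribution does not depend on the model, minimizing this divergence with respect to the predicted distribution is equivalent to minimizing the cross-entropy between the two.

```python
import math


def normalize(weights):
    """Turn non-negative weights into a probability distribution."""
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists.

    Terms with p_i == 0 contribute nothing; q_i must be positive wherever
    p_i is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


observed = normalize([3.0, 1.0])  # hypothetical observed edge weights
predicted = [0.5, 0.5]            # hypothetical closeness inferred by a model
gap = kl_divergence(observed, predicted)
print(gap)
```

The divergence is zero when the predicted distribution matches the observed one and positive otherwise.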
Further, substituting Eq. (1) into Eq. (2) and taking all edge types into account, the overall objective function becomes
(3) 
To formulate the edge embeddings required by Eq. (3), we derive them from the same embeddings of the associated nodes regardless of the involved edge type, so that we reach a unified model where the learning processes involving multiple edge types can work together and mutually enhance each other if they embody compatible semantics. While there are many options to build edge embeddings from node embeddings, we expect our formulation not to be over-complicated, so that the overall model could be computationally efficient and thereby easy to use. Moreover, in order for HEER to handle general HINs, it must be able to handle both directed and undirected edges. Considering these requirements, we decompose node embedding into two sections , where and are two column vectors of the same dimension, and build edge embedding on top of node embedding as
(4) 
where represents the Hadamard product. Besides the Hadamard product, one can also build in a way similar to Eq. (4) using addition, subtraction, or outer product. We leave the exploration of this direction to future work.
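One plausible way to instantiate this construction is sketched below. It assumes that each node embedding is split into an "outgoing" half and an "incoming" half, and that the edge embedding concatenates two Hadamard products, one pairing the source's outgoing half with the target's incoming half and one pairing the opposite halves. This pairing, the half names, and the function names are assumptions of the sketch rather than a statement of the exact formulation.

```python
def hadamard(a, b):
    """Element-wise (Hadamard) product of two equal-length vectors."""
    return [x * y for x, y in zip(a, b)]


def edge_embedding(f_u, f_v):
    """Build an edge embedding for the ordered pair (u, v) from node embeddings.

    Assumed layout: the first half of each node embedding is its outgoing
    section and the second half is its incoming section.
    """
    half = len(f_u) // 2
    u_out, u_in = f_u[:half], f_u[half:]
    v_out, v_in = f_v[:half], f_v[half:]
    return hadamard(u_out, v_in) + hadamard(u_in, v_out)


f_u = [1.0, 2.0, 3.0, 4.0]  # hypothetical node embeddings
f_v = [5.0, 6.0, 7.0, 8.0]
g_uv = edge_embedding(f_u, f_v)
g_vu = edge_embedding(f_v, f_u)
print(g_uv, g_vu)
```

Under this assumed form, reversing the node pair swaps the two halves of the edge embedding. A metric vector that weights both halves identically therefore scores both directions equally, which would suit undirected edge types, while unequal weights can tell the two directions apart.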
5.2. Model Inference
The HEER model in Eq. (5) that we aim to infer can be structured as a neural network as illustrated in Figure 4, where and . Each pair of nodes gets their respective embeddings through the dense layer , which are then composed into an edge embedding by function . The raw scores for all edge types are obtained through another dense layer , followed by a type filter where the neuron for an edge type is connected to its corresponding neuron in the next layer only if this type is compatible with the node types of the input node pair. Lastly, the loss is calculated from the typed closeness and the existence of edges between the input node pair.
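The scoring and type-filter steps of this architecture can be sketched as follows. The sketch assumes that the raw score of an edge type is the inner product between its metric vector and the edge embedding, that the schema maps each edge type to its pair of end node types, and that the loss on edge existence takes a logistic form in the style of negative sampling. All names, the toy schema, and the toy numbers are hypothetical.

```python
import math


def raw_scores(metrics, g_uv):
    """Inner product of every edge-type metric vector with one edge embedding."""
    return {r: sum(m * x for m, x in zip(mu, g_uv)) for r, mu in metrics.items()}


def type_filter(scores, schema, type_u, type_v):
    """Keep only edge types whose end node types match the input node pair."""
    return {r: s for r, s in scores.items() if schema[r] == (type_u, type_v)}


def existence_loss(score, exists):
    """Logistic loss: -log sigmoid(score) if the edge exists, else -log sigmoid(-score)."""
    signed = score if exists else -score
    return math.log1p(math.exp(-signed))


# Hypothetical schema: each edge type determines the node types on its two ends.
schema = {"livesIn": ("person", "location"), "isProducedIn": ("movie", "location")}
metrics = {"livesIn": [1.0, 0.0], "isProducedIn": [0.0, 1.0]}
g_uv = [0.5, 2.0]  # edge embedding of a (person, location) node pair

kept = type_filter(raw_scores(metrics, g_uv), schema, "person", "location")
loss = existence_loss(kept["livesIn"], exists=True)
print(kept, loss)
```

Because an edge type fixes the node types on both ends, the filter only needs the node types of the input pair to decide which scores can contribute to the loss.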
Since it is computationally expensive to compute the denominator in Eq. (1), we adopt the widely used negative sampling method (Mikolov et al., 2013), which enjoys linear-time computation. Specifically, each time, an edge between with type is sampled from the HIN with probability proportional to its weight. Then negative node pairs and negative node pairs are sampled, where each has the same type as and each has the same type as . The loss function computed from this sample becomes
where is the sigmoid function .
We adopt mini-batch gradient descent, implemented in PyTorch, to minimize the loss function with negative sampling, where each mini-batch contains sampled edges. We also use the node embeddings pretrained by the homogeneous network embedding algorithm LINE (Tang et al., 2015b) to initialize the node embeddings in HEER. The edge-type-specific scoring vectors are initialized to be all-one vectors.
6. Experiments
In this section, we evaluate the embedding quality of the proposed method and analyze the utility of employing edge representations and heterogeneous metrics using two large real-world HINs. We first perform an edge reconstruction task to directly quantify how well the embedding algorithms can preserve the information in the input HINs. Then, we conduct in-depth case studies to analyze the characteristics of the proposed method.
6.1. Baselines
We compare the proposed HEER algorithm with baseline methods that fit the setting of our problem, i.e., the methods should be applicable to general HINs without the help of additional supervision or expertise.

Pretrained (LINE (Tang et al., 2015b)). This baseline uses the LINE algorithm to generate node embeddings, which are also used to initialize HEER. LINE is a homogeneous network embedding algorithm based on the skip-gram model (Mikolov et al., 2013). Following the original paper (Tang et al., 2015b), we use the inner product of a pair of node embeddings to compute the score of observing an edge between them.

AspEm (Shi et al., 2018). AspEm is a heterogeneous network embedding method that captures the incompatibility in HINs by decomposing the input HIN into multiple aspects with an unsupervised measure using dataset-wide statistics. Embeddings are further learned independently for each aspect. This method considers the incompatibility in HINs but does not model different extents of incompatibility. Furthermore, it does not allow joint learning of embeddings across different aspects. For fairness, we set the number of aspects in AspEm to two, so that the final embedding has the same dimension as in other methods. The inner product is also used to compute the score for this baseline.

UniMetrics (metapath2vec++ (Dong et al., 2017)). This is a partial model of HEER, where the metrics are not updated in the training process, i.e., they remain uniform as initialized. It is equivalent to metapath2vec++ (Dong et al., 2017) using all edges as length-1 meta-paths without further selection. This method restricts the negative sampling to be done within the consistent node types, i.e., performs heterogeneous negative sampling, but still embeds all nodes into the same metric space regardless of types.

Pretrained Logit. On top of the embeddings from the previous Pretrained model, we train a logistic regression (Logit) model for each edge type using the input network. Then, we compute scores for test instances of each edge type using the corresponding Logit model. This method models heterogeneous metrics but does not allow the node embeddings and the edge embeddings to be further improved according to the inferred metrics.
6.2. Data Description and Experiment Setups
In this section, we describe the two real-world HINs used in our experiments as well as the experiment setups.
Datasets. We use two publicly available real-world HIN datasets: DBLP and YAGO.

DBLP is a bibliographical network in the computer science domain (Tang et al., 2008). There are five types of nodes in the network: author, paper, key term, venue, and year. The key terms are extracted and released by Chen and Sun (2017). The edge types include authorship (aut.), term usage (term), publishing venue (ven.), and publishing year (year) of a paper, and the reference relationship from a paper to another (ref.). We consider the first four edge types as undirected, and the last one as directed. The corresponding network schema is depicted in Figure 2(b).

YAGO is a large-scale knowledge graph derived from Wikipedia, WordNet, and GeoNames (Suchanek et al., 2007). There are seven types of nodes in the network: person, location, organization, piece of work, prize, position, and event. A total of 24 edge types exist in the network, with five being directed and the others undirected. These edge types are illustrated together with the schema of the network in Figure 5.
We summarize the statistics of the datasets, including the total number of nodes, the total number of edges, and the numbers of node types and edge types, in Table 1.
| Dataset | Nodes | Edges | Node types | Edge types |
| --- | --- | --- | --- | --- |
| DBLP | 3,170,793 | 27,126,718 | 5 | 5 |
| YAGO | 579,721 | 2,191,464 | 7 | 24 |
| Method (metric: MRR) | DBLP Aut. | DBLP Term | DBLP Ref. | DBLP Pub. venue | DBLP Pub. year | DBLP Micro-avg. | DBLP Macro-avg. | YAGO Micro-avg. | YAGO Macro-avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Pretrained (LINE (Tang et al., 2015b)) | 0.7053 | 0.4830 | 0.8729 | 0.7488 | 0.4986 | 0.6307 | 0.6617 | 0.7454 | 0.6890 |
| AspEm (Shi et al., 2018) | 0.7068 | 0.6010 | 0.8648 | 0.7612 | 0.6791 | 0.6976 | 0.7225 | 0.7832 | 0.6825 |
| UniMetrics (length-1 metapath2vec++ (Dong et al., 2017)) | 0.7040 | 0.5772 | 0.8466 | 0.7534 | 0.6781 | 0.6812 | 0.7119 | 0.7437 | 0.6884 |
| Pretrained Logit | 0.8187 | 0.6996 | 0.8072 | 0.8379 | 0.4889 | 0.7310 | 0.7304 | 0.8233 | 0.7012 |
| HEER | 0.8964 | 0.7188 | 0.9573 | 0.9132 | 0.7421 | 0.8189 | 0.8456 | 0.8635 | 0.7185 |
Experiment Setups. For all experiments and all methods, we set the total embedding dimension to be . That is, for HEER and its related baselines, and , and each of the two aspects in AspEm uses a dim embedding space. The pretrained model is always tuned for the best performance in the edge reconstruction task to be introduced in Section 6.3. The negative sampling rate is always set to for all applicable models. We always rescale the pretrained embedding by a constant factor of before feeding it into HEER to improve the learning of heterogeneous metrics, which shares intuition with a previous study (Nickel and Kiela, 2017) in improving angular layout at the early stage of model training. The learning rate for gradient descent for HEER is set to on both datasets. Note that we use the same set of hyperparameters for HEER on both DBLP and YAGO in order to provide an easy-to-use solution to the problem of comprehensive transcription of HINs without the hassle of extensive parameter tuning.
6.3. Edge Reconstruction Experiment
In order to directly quantify the extent to which an embedding algorithm can preserve the information in the input HINs, we devise the edge reconstruction experiments for both datasets. For each HIN, we first knock out a portion of edges uniformly at random, with a certain knock-out rate . The embedding of the network after knock-out is then learned using each compared method. The task is to reconstruct the knocked-out edges using the learned embedding models.
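The knock-out step can be sketched as a uniform random split of the edge list. The function name `knock_out`, the fixed seed, and the toy edges are assumptions of this example.

```python
import random


def knock_out(edges, rate, seed=0):
    """Split edges uniformly at random into (kept, held_out).

    `rate` is the fraction of edges to remove; the seed is fixed only to make
    this illustration reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_removed = round(rate * len(shuffled))
    return shuffled[n_removed:], shuffled[:n_removed]


# Hypothetical typed edges (u, v, edge_type).
toy_edges = [(i, i + 1, "ref") for i in range(10)]
kept, held_out = knock_out(toy_edges, rate=0.2)
print(len(kept), len(held_out))
```

Embeddings would then be learned from `kept` only, and `held_out` supplies the positive test instances to be reconstructed.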
Specifically, for each edge that is knocked out from the network, suppose it is between node pair and of edge type , we randomly sample negative pairs that do not have type edges in the original full network, where is of the same node type as . For any model after training, a score can be calculated to reflect the likelihood for each of the node pairs to be associated by type edge in the current model. The reciprocal rank is then computed to measure the quality of the model, where the reciprocal rank is the reciprocal of the rank of the positive pair among all node pairs. Similarly, another reciprocal rank is computed for the same node pair and other randomly sampled negative pairs with fixed but sampled . Finally, we report the mean reciprocal rank (MRR), which is computed as the mean of reciprocal ranks over the target test instances. In particular, the micro-average MRR and the macro-average MRR are reported for both DBLP and YAGO, where the micro-average MRR is the mean of all reciprocal ranks computed regardless of edge types, while the macro-average MRR is derived by first computing the mean of reciprocal ranks for each edge type, and then averaging these means across different edge types. Additionally, we also report the MRR for each edge type for DBLP, since DBLP involves only five edge types, while YAGO has as many as 24 edge types. We present the results with knock-out rate in Table 2.
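The difference between the two averages can be seen in a small sketch. The helper names, the tie-handling rule (ties are resolved in favor of the positive pair), and the toy reciprocal ranks are assumptions of this example.

```python
from collections import defaultdict


def reciprocal_rank(pos_score, neg_scores):
    """Reciprocal of the positive pair's rank among the positive and negatives."""
    rank = 1 + sum(1 for s in neg_scores if s > pos_score)
    return 1.0 / rank


def micro_macro_mrr(instances):
    """Compute (micro, macro) MRR from (edge_type, reciprocal_rank) pairs."""
    by_type = defaultdict(list)
    for edge_type, rr in instances:
        by_type[edge_type].append(rr)
    all_rr = [rr for rrs in by_type.values() for rr in rrs]
    micro = sum(all_rr) / len(all_rr)  # average over all test instances
    macro = sum(sum(rrs) / len(rrs) for rrs in by_type.values()) / len(by_type)
    return micro, macro


# Hypothetical test instances: three authorship edges and one reference edge.
instances = [("aut", 1.0), ("aut", 0.5), ("aut", 0.5), ("ref", 0.25)]
micro, macro = micro_macro_mrr(instances)
print(reciprocal_rank(0.9, [0.1, 0.95, 0.3]), micro, macro)
```

The micro-average is dominated by edge types with many test instances, whereas the macro-average gives every edge type the same weight, which matters for a network such as YAGO with many edge types of uneven frequency.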
Modeling incompatibility benefits embedding quality. As shown in Table 2, the proposed HEER model outperformed all baselines on both datasets under both micro-average MRR and macro-average MRR, which demonstrated the effectiveness of the proposed method. Even when looking at each edge type in DBLP, the MRR achieved by HEER was still the best. Besides, in DBLP, AspEm outperformed Pretrained and UniMetrics on most metrics. Recall that AspEm decomposed the HIN into distinct aspects using dataset-wide statistics. As a result, it prevented semantically incompatible edge types from negatively affecting each other in the embedding learning process and thereby achieved better results. In YAGO, the baselines considering heterogeneity did not always clearly outperform the simplest baseline, Pretrained. Our interpretation is that YAGO has many more edge types than DBLP, which introduces even more varied extents of incompatibility, and the relatively simple approaches adopted by AspEm and UniMetrics in modeling incompatibility may not be enough to bring a significant performance boost. In contrast, armed with heterogeneous metrics fine-grained to the edge-type level, HEER outperformed Pretrained by a clear margin even in YAGO.
Heterogeneous metrics help improve embedding quality. As a sanity check, the Pretrained Logit model helps rule out the possibility that HEER achieves better results only by learning edge-type-specific metrics without actually improving embedding quality. From Table 2, it can be observed that by coupling with the additional edge-type-specific logistic regression and modeling heterogeneous metrics, the performance was improved on top of the Pretrained model. This observation further consolidated the necessity of employing heterogeneous metrics for different edge types in solving the problem of comprehensive transcription of heterogeneous information networks. However, Pretrained Logit still performed worse than the proposed HEER model, which implies that the inferred heterogeneous metrics of HEER in turn improved the quality of the node and edge embeddings.
Heterogeneous negative sampling is not always enough to capture incompatibility. Although UniMetrics performed better than the Pretrained model in DBLP, it failed to achieve an outright win over Pretrained in the YAGO dataset, as other methods did. Our interpretation of this result is that while the heterogeneous negative sampling used by UniMetrics does leverage certain type-specific information in HINs, it may not always be enough to model incompatibility and resolve the negative impact it brings to embedding quality. This observation is further corroborated in Section 6.4 by examining the capability of transcribing information implied by meta-paths in each model.
Varying Knock-Out Rate in Edge Reconstruction. Additionally, we vary the knock-out rate on both datasets and report the micro-average MRR for the proposed HEER model and two baseline models that require less training time. As presented in Figure 6, HEER outperformed all baselines at most knock-out rates, which demonstrated the robustness of the proposed model. Besides, Pretrained Logit outperformed Pretrained at all knock-out rates, which is also in line with the previous results we have presented. Notably, HEER did not outperform Pretrained Logit when the knock-out rate . This is explainable because only a very small portion () of the original HIN was used for learning embedding when . With a bigger model size than Pretrained Logit, HEER was more prone to overfitting.
6.4. Case Studies
In this section, we conduct a series of in-depth case studies to understand the characteristics of the proposed HEER model.
Learned heterogeneous metrics. HEER leverages heterogeneous metrics to model the different extents of incompatible semantics carried by different edge types. Here, we analyze the learned metrics in HEER to verify whether they indeed capture the different semantics and thereby enrich the model capability.
To this end, we use heat maps to illustrate the edge-type-specific metric vectors that are learned in the edge reconstruction experiments on both HINs. Specifically, for each dataset, we first standardize the elements of each metric vector to have zero mean and unit standard deviation, so that the metric vectors of all edge types have comparable scales after standardization. Then, we reorder the dimensions for better visualization and plot the heat maps in Figure 7.
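The standardization step described above can be sketched as follows. This is a minimal illustration that assumes each learned metric is a plain list of floats and that the population standard deviation is used; the dimension reordering and the plotting are not covered, and the function names and toy values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(vector):
    """Rescale one metric vector to zero mean and unit standard deviation."""
    m = mean(vector)
    s = pstdev(vector)  # population standard deviation (an assumption)
    if s == 0:
        # A constant vector, such as the all-one initialization, has no
        # spread; map it to zeros instead of dividing by zero.
        return [0.0] * len(vector)
    return [(x - m) / s for x in vector]


def standardize_metrics(metrics):
    """Apply the rescaling independently to every edge type's vector."""
    return {edge_type: standardize(vec) for edge_type, vec in metrics.items()}


# Hypothetical toy metrics with very different raw scales.
toy_metrics = {
    "playsFor": [0.1, 0.2, 0.3],
    "hasChild": [10.0, 30.0, 20.0],
}
standardized = standardize_metrics(toy_metrics)
```

After this step every row of the heat map shares the same scale, so differences between rows reflect how weight is distributed over dimensions rather than the raw magnitude of each vector.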
Recall that the inferred metrics were set to be all-one vectors in initialization, whereas Figure 7 shows that different metrics have generally reached different distributions over the dimensions after training. This implies that the inferred metrics of the HEER model indeed sensed the heterogeneity of the HINs, and were using different projected metric spaces to embed different semantics of the input network. Notably, it can be seen from the heat map of YAGO that edge types isAffiliatedTo and playsFor have similar inferred metrics. This is expected because these edge types are often associated with the relationship between professional sports players and their teams in YAGO. A similar phenomenon can be observed between edge types isMarriedTo and hasChild.
Embedded subnetwork with different edge types. In order to understand how the embedding is impacted by the introduction of heterogeneous metrics, we took a closer look at a subnetwork surrounding the British writer Thomas Hardy in the YAGO dataset. Multiple other writers having the influences relationship (colored gray) with Thomas Hardy include Charles Dickens, John Fowles, and John Irving, while Florence Dugdale and Thomas Hardy have the isMarriedTo relationship (colored red). Besides, Fowles and Irving are also influenced by Dickens. In Figure 8, we visualize this subnetwork under each embedding model with the inferred probability of edge existence marked, where the embedding models are trained using the entire network. It can be seen from Figure 8a that without distinguishing edge types, Pretrained assigned high probabilities to all edges. Meanwhile, with the learned heterogeneous metrics, the HEER model assigned a relatively low probability to the pair Dugdale and Hardy under the metric for influences as in Figure 8b (note that Dugdale was also a writer), and a clearly higher probability under the metric for isMarriedTo as in Figure 8c.
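The effect described in this case study, where the same node pair receives different probabilities under different edge-type metrics, can be illustrated with a toy computation. The sketch below is not the paper's exact scoring function: it assumes, purely for illustration, that an edge embedding is the element-wise product of the two node embeddings and that the probability is a sigmoid of its inner product with the metric vector. All vectors and values are hypothetical.

```python
import math


def edge_embedding(u, v):
    """Assumed edge embedding: element-wise product of node embeddings."""
    return [a * b for a, b in zip(u, v)]


def edge_probability(u, v, metric):
    """Assumed score: sigmoid of the metric-weighted sum of the edge embedding."""
    score = sum(m * g for m, g in zip(metric, edge_embedding(u, v)))
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical two-dimensional embeddings for the two nodes.
hardy = [1.0, 1.0]
dugdale = [1.0, -1.0]

# Hypothetical learned metrics, one vector per edge type.
metrics = {
    "influences": [0.5, 2.0],
    "isMarriedTo": [1.0, -1.0],
}

p_influences = edge_probability(hardy, dugdale, metrics["influences"])
p_married = edge_probability(hardy, dugdale, metrics["isMarriedTo"])
```

With a single shared metric, the two probabilities would necessarily coincide for the same node pair; giving each edge type its own vector is what lets them differ.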
Transcription of information implied by meta-paths. When embedding each of the HINs, no meta-paths of length greater than 1 (i.e., other than edges) were used, where a meta-path is the concatenation of certain edge types used to reflect certain semantics of an HIN (Sun et al., 2011; Sun and Han, 2013; Shi et al., 2017). We would like to verify whether the input HIN can be transcribed so comprehensively that the signals implied by meta-paths are also preserved in the embedding process, even without the use of guidance from meta-paths.
To this end, we visualize the embedding results considering four different meta-paths, colored red, cyan, blue, and yellow, respectively. In particular, we consider a given paper node and find all paper nodes connected to this given paper by the aforementioned four meta-paths. Then we visualize the embeddings of all the nodes found in this way using the popular t-Distributed Stochastic Neighbor Embedding (t-SNE) (Maaten and Hinton, 2008) algorithm. Each node is colored according to the meta-path that connects it to the given paper node. Additionally, we randomly downsampled the group of nodes reached by the meta-path that passes through a year node, because a year can have edges with tens of thousands of paper nodes. The visualization results are shown in Figure 9.
In Figure 9(b), the nodes with the same color are generally clustered together, which implies HEER can preserve the information implied by different meta-paths in embedding learning even without the use of any meta-path. As a comparison, we also visualized the same set of nodes in Figure 9(a) using embeddings generated by UniMetrics (len-1 metapath2vec++), which yields less distinct clusters, with red, cyan, and blue nodes mingled together. Recall that UniMetrics also considers edge types when conducting negative sampling, and is different from HEER only in that the former does not employ heterogeneous metrics learning. This result again demonstrated that adopting heterogeneous negative sampling without learned heterogeneous metrics is not sufficient to preserve the heterogeneity in the HIN embedding process, and is therefore not ideal for solving the problem of comprehensive transcription of heterogeneous information networks.
6.5. Efficiency Study
For the efficiency study, we plot the loss and the performance of the proposed HEER algorithm against the number of epochs in the edge reconstruction experiment. We also show the same curves for the UniMetrics (len-1 metapath2vec++) model for comparison. The results are presented in Figure 10.
Judging from the curve of the loss against the number of epochs in Figure 10, HEER converges at a rate comparable to that of the skip-gram-based UniMetrics (metapath2vec++). Besides, HEER took less than twice as much time to finish each epoch as metapath2vec++ did. This is expected because HEER only additionally requires one-step gradient descent for one edge-type-specific metric when training on each sampled training example. As a result, the time complexity of HEER for each epoch differs from that of metapath2vec++ by a small constant factor. Combining the above two properties, HEER enjoys overall complexity linear in the number of nodes, as skip-gram-based algorithms do (Grover and Leskovec, 2016; Abu-El-Haija et al., 2017; Dong et al., 2017).
7. Conclusion and Future Work
We studied the problem of the comprehensive transcription of HINs in embedding learning, which preserves the rich information in HINs and provides an easy-to-use approach to unleash the power of HINs. To tackle this problem, we identify that different extents of semantic incompatibility exist in real-world HINs, which pose challenges to the comprehensive transcription of HINs. To cope with these challenges, we propose an algorithm, HEER, that leverages edge representations and heterogeneous metrics. Experiments and in-depth case studies with large real-world datasets demonstrate the effectiveness of HEER and the utility of edge representations and heterogeneous metrics.
With the availability of the edge representations proposed in this paper, future work includes the exploration of more loss functions over edge representations, such as regression to model edges associated with ratings on HINs that have user-item reviews, or softmax to model HINs where at most one edge type can exist between a pair of node types. One may also explore alternate ways to build edge embeddings using addition, subtraction, outer product, or deeper architectures. We leave the exploration of this direction to future work. Besides, it is also worth studying how to further boost the performance of HEER by incorporating higher-order structures such as network motifs, while retaining the advantage of HEER in preserving the rich semantics of HINs.
Acknowledgments. This work was sponsored in part by U.S. Army Research Lab. under Cooperative Agreement No. W911NF0920053 (NSCTA), DARPA under Agreement No. W911NF17C0099, National Science Foundation IIS 1618481, IIS 1704532, and IIS1741317, DTRA HDTRA11810026, and grant 1U54GM114838 awarded by NIGMS through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative (www.bd2k.nih.gov). Any opinions, findings, and conclusions or recommendations expressed in this document are those of the author(s) and should not be interpreted as the views of any U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.
References
 Abu-El-Haija et al. (2017) Sami Abu-El-Haija, Bryan Perozzi, and Rami Al-Rfou. 2017. Learning Edge Representations via Low-Rank Asymmetric Projections. In CIKM.
 Chang et al. (2015) Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C Aggarwal, and Thomas S Huang. 2015. Heterogeneous network embedding via deep architectures. In KDD.
 Chen and Sun (2017) Ting Chen and Yizhou Sun. 2017. Task-Guided and Path-Augmented Heterogeneous Network Embedding for Author Identification. In WSDM.
 Dai et al. (2016) Hanjun Dai, Bo Dai, and Le Song. 2016. Discriminative embeddings of latent variable models for structured data. In ICML.
 Dong et al. (2017) Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable representation learning for heterogeneous networks. In KDD.
 Fu et al. (2017) Tao-yang Fu, Wang-Chien Lee, and Zhen Lei. 2017. HIN2Vec: Explore Meta-paths in Heterogeneous Information Networks for Representation Learning. In CIKM.
 Grover and Leskovec (2016) Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In KDD.
 Gui et al. (2016) Huan Gui, Jialu Liu, Fangbo Tao, Meng Jiang, Brandon Norick, and Jiawei Han. 2016. Large-scale embedding learning in heterogeneous event data. In ICDM.
 Hamilton et al. (2017) William L Hamilton, Rex Ying, and Jure Leskovec. 2017. Representation Learning on Graphs: Methods and Applications. arXiv preprint arXiv:1709.05584 (2017).
 Kipf and Welling (2017) Thomas N Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In ICLR.
 Liu et al. (2017) Zemin Liu, Vincent W Zheng, Zhou Zhao, Fanwei Zhu, Kevin Chen-Chuan Chang, Minghui Wu, and Jing Ying. 2017. Semantic Proximity Search on Heterogeneous Graph by Proximity Embedding. In AAAI.
 Maaten and Hinton (2008) Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. JMLR 9, Nov (2008), 2579–2605.
 Mikolov et al. (2013) Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In NIPS.
 Nickel and Kiela (2017) Maximillian Nickel and Douwe Kiela. 2017. Poincaré embeddings for learning hierarchical representations. In Advances in Neural Information Processing Systems. 6341–6350.

 Niepert et al. (2016) Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. 2016. Learning convolutional neural networks for graphs. In ICML.
 Ou et al. (2016) Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, and Wenwu Zhu. 2016. Asymmetric Transitivity Preserving Graph Embedding. In KDD. 1105–1114.
 Pan et al. (2016) Shirui Pan, Jia Wu, Xingquan Zhu, Chengqi Zhang, and Yang Wang. 2016. Tri-party deep network representation. In IJCAI.
 Perozzi et al. (2014) Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In KDD.
 Ribeiro et al. (2017) Leonardo FR Ribeiro, Pedro HP Saverese, and Daniel R Figueiredo. 2017. struc2vec: Learning node representations from structural identity. In KDD.
 Shang et al. (2016) Jingbo Shang, Meng Qu, Jialu Liu, Lance M Kaplan, Jiawei Han, and Jian Peng. 2016. Meta-Path Guided Embedding for Similarity Search in Large-Scale Heterogeneous Information Networks. arXiv preprint arXiv:1610.09769 (2016).
 Shi et al. (2017) Chuan Shi, Yitong Li, Jiawei Zhang, Yizhou Sun, and Philip S Yu. 2017. A survey of heterogeneous information network analysis. TKDE 29, 1 (2017), 17–37.
 Shi et al. (2017) Yu Shi, Po-Wei Chan, Honglei Zhuang, Huan Gui, and Jiawei Han. 2017. PReP: Path-Based Relevance from a Probabilistic Perspective in Heterogeneous Information Networks. In KDD.
 Shi et al. (2018) Yu Shi, Huan Gui, Qi Zhu, Lance Kaplan, and Jiawei Han. 2018. AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks. In SDM.
 Suchanek et al. (2007) Fabian M Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. Yago: a core of semantic knowledge. In WWW.
 Sun and Han (2013) Yizhou Sun and Jiawei Han. 2013. Mining heterogeneous information networks: a structural analysis approach. SIGKDD Explorations 14, 2 (2013), 20–28.
 Sun et al. (2011) Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S Yu, and Tianyi Wu. 2011. Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. In VLDB.
 Sun et al. (2009) Yizhou Sun, Yintao Yu, and Jiawei Han. 2009. Ranking-based clustering of heterogeneous information networks with star network schema. In KDD.
 Tang et al. (2015a) Jian Tang, Meng Qu, and Qiaozhu Mei. 2015a. PTE: Predictive text embedding through large-scale heterogeneous text networks. In KDD. ACM.
 Tang et al. (2015b) Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015b. Line: Large-scale information network embedding. In WWW.
 Tang et al. (2008) Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. 2008. Arnetminer: extraction and mining of academic social networks. In KDD.
 Veličković et al. (2018) Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph Attention Networks. In ICLR.
 Wang et al. (2016) Daixin Wang, Peng Cui, and Wenwu Zhu. 2016. Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining.
 Xu et al. (2018) Linchuan Xu, Xiaokai Wei, Jiannong Cao, and Philip S Yu. 2018. On Exploring Semantic Meanings of Links for Embedding Social Networks. In WWW.
 Yu et al. (2014) Xiao Yu, Xiang Ren, Yizhou Sun, Quanquan Gu, Bradley Sturt, Urvashi Khandelwal, Brandon Norick, and Jiawei Han. 2014. Personalized entity recommendation: A heterogeneous information network approach. In WSDM. ACM.
 Zhang and Hasan (2017) Baichuan Zhang and Mohammad Al Hasan. 2017. Name Disambiguation in Anonymized Graphs using Network Embedding. In CIKM.
 Zhang et al. (2017) Chao Zhang, Liyuan Liu, Dongming Lei, Quan Yuan, Honglei Zhuang, Tim Hanratty, and Jiawei Han. 2017. TrioVecEvent: Embedding-Based Online Local Event Detection in Geo-Tagged Tweet Streams. In KDD. ACM.
 Zhang and Chen (Zhang and Chen) Muhan Zhang and Yixin Chen. Weisfeiler-Lehman neural machine for link prediction. In KDD. 575–583.
 Zhuang et al. (2014) Honglei Zhuang, Jing Zhang, George Brova, Jie Tang, Hasan Cam, Xifeng Yan, and Jiawei Han. 2014. Mining query-based subnetwork outliers in heterogeneous information networks. In ICDM.