1. Introduction
Knowledge graphs (KGs) such as DBpedia and Freebase that encode statements about the world around us have attracted growing attention from multiple fields, including question answering (Hamilton et al., 2018; Mai et al., 2019), knowledge inference (Neelakantan et al., 2015), recommendation systems (Ying et al., 2018), and so on. By their very nature KGs are far from complete as the state of the world evolves constantly. This has motivated work on automatically predicting new statements based on known statements. Among these inference tasks link prediction has become a main focus of statistical relational learning (SRL) (Koller et al., 2007).
A KG encodes structural information about entities and the abundant relations among them as a directed labeled multigraph, where entities are represented as nodes and relations between them as labeled, directed edges. Accordingly, in the Semantic Web context a statement in a KG can be represented as a triple $(h, r, t)$, where $h$ is the head entity, $r$ the relation, and $t$ the tail entity. The connectivity among triples in KGs provides the basis for link prediction.
Since the symbolic representations of KGs prevent them from being directly incorporated in many machine learning tasks, many recent studies have proposed to embed the entities and relations of a KG into low-dimensional vector spaces (Bordes et al., 2013; Yang et al., 2014; Lin et al., 2015; Trouillon et al., 2016; Schlichtkrull et al., 2018; Yan et al., 2019), which can then be utilized in multiple downstream tasks, e.g., the aforementioned link prediction. Along this line, there are two main branches (Wang et al., 2017): (1) translation-based methods, which predict the existence of a triple by measuring the distance between the head entity and the tail entity after a translation enforced by the corresponding relation, such as TransE (Bordes et al., 2013), TransD (Ji et al., 2015), and TransR (Lin et al., 2015); and (2) semantic matching energy based methods, which measure the existence of a triple as the compatibility of two entities and their relation in a latent vector space, e.g., RESCAL (Nickel et al., 2011), DistMult (Yang et al., 2014), and ComplEx (Trouillon et al., 2016). More recently, other ideas have emerged. For example, rather than defining a relation as a translation from the subject to the object, Sun et al. (2019) treated a relation as a rotation from the subject to the object in the complex vector space and proposed RotatE, which was the first model able to handle symmetry/antisymmetry, inversion, and composition relations simultaneously. Their experiments demonstrated the effectiveness of this assumption. More details about these methods can be found in Section 4.
Although there are multiple success stories in both branches, the aforementioned models are all trained on individual triples independently, regardless of their local neighborhood structures. Noticing this downside, Schlichtkrull et al. (2018) state that explicitly modeling local structure can be an important supplement to help recover missing statements in KGs.
Inspired by the success of graph convolutional networks (GCN) (Kipf and Welling, 2016) in modeling structured neighborhood information of unlabeled and undirected graphs with convolution operations, the authors proposed a GCN-based method to model knowledge graphs (RGCN). In RGCN, which is an encoder, the embedding of each entity is learned from its neighboring entities up to $L$ hops away by stacking $L$ graph convolution layers. The encoder is then trained jointly with a task-specific decoder, e.g., a DistMult-like decoder, to predict links.
The experimental results of applying RGCN demonstrate the importance of integrating neighborhood information in knowledge graph embedding models. However, RGCN only aims at learning entity embeddings, even though it utilizes relation-specific weight matrices. The relation embeddings are learned in the task-specific decoder, while the relation-specific matrices learned in the encoder are discarded. Consequently, without a task-specific decoder for learning relation embeddings, RGCN cannot directly support tasks such as link prediction. Even if an extra decoder is available, the encoder-decoder framework runs into another problem: relation-specific parameters are introduced twice, once on the encoder side (relation-specific weight matrices) and once on the decoder side (relation embeddings). As a result, the number of parameters increases.
To address this issue, we propose a novel model inspired by RGCN (Schlichtkrull et al., 2018): a GCN-based knowledge graph encoder framework which learns entity embeddings and relation embeddings simultaneously by performing relation-specific transformations from head entity embeddings to tail entity embeddings, hence called TransGCN. In principle, any presumed transformation assumption from the subject to the object, such as the translation assumption or the rotation assumption, can be exploited in the proposed framework. Take the translation assumption as an example, in which relations act as translation operators that connect entities in a KG. The basic idea of TransGCN is illustrated in Figure 1. In such a scenario, TransGCN first translates the embeddings of the 1-degree neighbors of a center entity with their specific relation embeddings. The resultant embeddings serve as the initial embedding estimations of the center entity. Then a convolutional operation is performed over these initially estimated embeddings to derive a new embedding for each center entity, which encodes its local structural information. Similar to RGCN, aggregated structural information and self-loop information of a node are combined for entity embedding updates. Moreover, we also define a novel relation embedding convolution process so that the entity and relation embeddings can be handled in a layer-based manner as GCNs do.
The research contributions of our work are as follows:

We propose that transformation assumptions, in which relations are regarded as operators transforming the subject entity into the object entity, can be utilized to convert a heterogeneous neighborhood in a KG into a homogeneous neighborhood, which can be readily utilized within a GCN-based framework.

We develop a novel GCN-based knowledge graph encoder framework called TransGCN which can encode entity and relation embeddings simultaneously. Compared with RGCN, this method has fewer parameters and can be directly used for link prediction.

Based on the transformation assumptions behind TransE and RotatE, respectively, we instantiate our GCN framework. Experimental results on FB15K237 and WN18RR show that both TransGCN models improve over their respective base models on both datasets, and that RotatEGCN outperforms all compared state-of-the-art methods.
The paper is structured as follows. In Section 2 we elaborate on the main idea of the TransGCN framework. Experimental details on FB15K237 and WN18RR are presented in Section 3. In Section 4, we review two branches of learning methods on graphs: the classic translation-based models and the GCN-based approaches. Section 5 concludes this work and suggests future research directions.
2. Proposed Architecture
The RGCN model does not learn relation embeddings and therefore cannot be used directly for link prediction without a decoder. Moreover, RGCN introduces relation-specific parameters on both the encoder side and the decoder side, which increases the number of parameters. We argue that, for knowledge graph applications, the encoder alone should encode entity and relation embeddings at the same time, which reduces the number of parameters (thus helping alleviate overfitting) and thereby improves training efficiency.
To address these issues, we propose a unified encoder framework based on GCN to learn entity and relation embeddings simultaneously, in which a presumed transformation assumption performed by relations is used to convert a heterogeneous neighborhood in a KG to a homogeneous one. This is subsequently used in a traditional GCN framework. Both entity embeddings and relation embeddings are learned in a convolutional layer-based manner. A knowledge graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ is the set of nodes/entities and $\mathcal{E}$ is the set of labeled edges, contains statements in the form of a set of triples $\mathcal{T} = \{(h, r, t)\}$, where $h$, $r$, and $t$ represent the head entity, the relation, and the tail entity, respectively. In the following, we use bold text to refer to embeddings and we will use and interchangeably.
2.1. Handling a Heterogeneous Neighborhood in a KG
Traditional GCNs (Kipf and Welling, 2016) operate on an unlabeled undirected graph which consists of nodes of the same type and relations of the same type. This means that each edge has the same semantics and the neighborhood of a node is homogeneous. We call this a homogeneous neighborhood. Homogeneity makes it easier to aggregate the local neighborhood information around a node. For example, in the undirected unlabeled academic collaboration network shown in Figure 2(a), simply summing up information from Wendy Hall, Dan Connolly, Ora Lassila and James A. Hendler as messages transmitted to Tim Berners-Lee is reasonable. There is no need to consider the differences in messages since their relations in such a graph are the same.
However, in a knowledge graph such as the one shown in Figure 2(b), using such oversimplified summations would be problematic. Neighboring entities are linked to the center entity via different relations in different directions. For instance, the relation DeathCause is very different from the relation BirthPlace, and the two point in opposite directions with respect to Vantile_Whitfield. We call this a heterogeneous neighborhood. We argue that in order for a KG to be easily handled by a GCN-based framework, it is necessary to convert a heterogeneous neighborhood in a KG to a homogeneous one. In this work, we approach this challenge by assuming that relations in KGs are transformation operations which transform the head entity to the tail entity.
Common kinds of transformation between two entities include translation, rotation, reflection, and so on. Under any such transformation, a statement in a KG can be interpreted as the head entity being transformed into the tail entity by a relation. More specifically, the tail entity in a statement may be the head entity after being translated or rotated. Accordingly, the embedding of a tail entity can be estimated from the head entity after a relation-specific transformation operation, and vice versa. In general form, a statement $(h, r, t)$ following a transformation assumption can be written as:
$$\mathbf{t} = \mathbf{h} \circ \mathbf{r}, \qquad \mathbf{h} = \mathbf{t} \star \mathbf{r} \tag{1}$$
where $\circ$ and $\star$ are defined as two transformation operators, which vary across assumptions. We will specify them in Sections 2.3 and 2.4. The diversity of relation types and the direction of relations are the two main characteristics of heterogeneous graphs such as KGs. In this equation, the fact that each relation type is encoded by its own embedding takes care of the diversity of relation types, while the transformation operators are specifically designed to address the relation direction.
Based on the transformation assumption, we define the estimations of a central entity derived from its connected entities and the corresponding relations as the embeddings of the neighbors of that entity. Take TransE as an example. Given an entity $v_i$ with an outgoing triple $(v_i, r, t)$, we define the estimation ($\mathbf{t} - \mathbf{r}$) of $v_i$ based on $(v_i, r, t)$ as the embedding of one neighbor of $v_i$. Similarly, for an incoming triple $(h, r, v_i)$ of $v_i$, the embedding of another neighbor is $\mathbf{h} + \mathbf{r}$, which is another estimation of the central entity $v_i$. More concretely, in Figure 2(b), the estimations of the center entity from incoming triples take the form $\mathbf{h} + \mathbf{r}$, while the embedding estimations from outgoing triples take the form $\mathbf{t} - \mathbf{r}$.
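Formally, under any transformation assumption with the two transformation operators $\circ$ and $\star$ of Eq. 1, the embedding estimations of an entity $v_i$ can be written as follows:
$$N_{in}(v_i) = \{\mathbf{h} \circ \mathbf{r} \mid (h, r, v_i) \in \mathcal{T}_{in}(v_i)\}, \qquad N_{out}(v_i) = \{\mathbf{t} \star \mathbf{r} \mid (v_i, r, t) \in \mathcal{T}_{out}(v_i)\} \tag{2}$$
where $\mathcal{T}(v_i)$ denotes all the triples associated with $v_i$, consisting of $\mathcal{T}_{in}(v_i)$ as incoming triples and $\mathcal{T}_{out}(v_i)$ as outgoing triples, and $N_{in}(v_i)$ and $N_{out}(v_i)$ are the sets of estimated embeddings derived from incoming and outgoing neighbors, respectively.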
After these transformation operations along different triple paths, the resultant estimated embeddings of the center entity should have the same semantics as the true center entity. In this way, the heterogeneous neighborhood in a KG is converted to a homogeneous one that can be easily handled by the GCN framework.
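The conversion described above can be sketched in a few lines of Python. This is only an illustration under the translation assumption (incoming triples give $\mathbf{h} + \mathbf{r}$, outgoing triples give $\mathbf{t} - \mathbf{r}$); the function name `neighbor_estimates`, the entity and relation names, and all vector values are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Triple = Tuple[str, str, str]  # (head, relation, tail)


def neighbor_estimates(center: str, triples: List[Triple],
                       ent: Dict[str, Vector],
                       rel: Dict[str, Vector]) -> List[Vector]:
    """Return one estimate of `center` per incident triple (translation case)."""
    estimates: List[Vector] = []
    for h, r, t in triples:
        if t == center:
            # Incoming triple (h, r, center): the estimate is h + r.
            estimates.append([a + b for a, b in zip(ent[h], rel[r])])
        if h == center:
            # Outgoing triple (center, r, t): the estimate is t - r.
            estimates.append([a - b for a, b in zip(ent[t], rel[r])])
    return estimates


# Hypothetical toy embeddings; names and values are made up for illustration.
ent = {"a": [1.0, 0.0], "b": [0.0, 2.0], "c": [3.0, 3.0]}
rel = {"r1": [0.5, 0.5], "r2": [1.0, -1.0]}
triples = [("a", "r1", "c"), ("c", "r2", "b")]

estimates = neighbor_estimates("c", triples, ent, rel)
print(estimates)  # [[1.5, 0.5], [-1.0, 3.0]]
```

Every returned vector lives in the same space as the center entity, so the estimates can be summed like messages in an ordinary GCN regardless of which relation produced them.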
2.2. Model Formulation
Our model can be regarded as an extension of RGCN (Schlichtkrull et al., 2018). In the following, we introduce how our model learns entity and relation embeddings at the same time. Like other existing GCN models (Kearnes et al., 2016; Schlichtkrull et al., 2018), our model can be formulated as a special case of Message Passing Neural Networks (MPNN) (Gilmer et al., 2017), which provide a general framework for supervised/semi-supervised learning on graphs.
In general, MPNN defines two phases: a message passing phase for nodes and a readout phase for the whole graph. Since in this paper we care about nodes and relations instead of the whole graph, we focus only on the message passing phase. This message passing phase of a node is executed $L$ times to aggregate multi-hop neighborhood information and is composed of message passing functions $M_l$ and node update functions $U_l$, where $l$ denotes the $l$-th hidden layer. $M_l$ mainly aggregates messages from local neighbors, while $U_l$ combines them with the self-loop information from the previous step. Both functions are differentiable. In addition, Gilmer et al. (2017) indicated that one could also learn edge features by introducing similar functions for all the edges in a graph, but so far only Kearnes et al. (2016) have implemented this idea. To fit MPNN to our task, we instantiate $M_l$ and $U_l$ for message propagation and entity embedding update for each entity $v_i$, and additionally introduce an update rule for relations.
$$\mathbf{m}_i^{(l+1)} = \frac{1}{c_i} \mathbf{W}_0^{(l)} \Big( \sum_{\mathbf{n} \in N_{in}^{(l)}(v_i)} \mathbf{n} + \sum_{\mathbf{n} \in N_{out}^{(l)}(v_i)} \mathbf{n} \Big) \tag{3}$$
$$\mathbf{h}_i^{(l+1)} = \sigma\big(\mathbf{m}_i^{(l+1)} + \mathbf{h}_i^{(l)}\big) \tag{4}$$
where $\mathbf{h}_i^{(l)}$ denotes the hidden representation of entity $v_i$ in the $l$-th layer with a dimensionality of $d^{(l)}$. $\mathbf{W}_0^{(l)}$ is a layer-specific matrix. $N_{in}^{(l)}(v_i)$ and $N_{out}^{(l)}(v_i)$ are defined in Eq. 2 and computed from the representations of the $l$-th layer. $c_i$ is an entity-related normalization constant that could be the total degree of $v_i$. $\sigma$ is the activation function.
There are two terms in Eq. 3 that encode local structural information for the entity update, representing messages from incoming relations and outgoing relations, respectively. The messages from incoming/outgoing relations are first accumulated by an element-wise summation and then passed through a linear transformation. In the next step (Eq. 4), these messages are combined with self-loop information by simply adding them up to update entities. This idea is inspired by the skip connections in ResNet (He et al., 2016), so that our model can perform at least as well as the simple transformation-based model instantiated in this framework. Figure 3 illustrates the computation graph for an entity. Eq. 3 considers the first-order neighbors of entities; one could simply stack multiple layers to allow for multi-hop neighbors.
In addition, we realize that every update of entity embeddings in Eq. 3 and Eq. 4 may transform the original vector space. Consequently, the relationships between relations and entities would be affected, which makes it impossible to perform the presumed transformation operations between them in the next layer. To address this problem, instead of applying a message passing mechanism for relations similar to Eq. 3 and Eq. 4, for the sake of efficiency we introduce a transformation matrix that operates on relation embeddings in each layer. We assume that this matrix can project relation embeddings into a vector space that has the same relation to the new entity vector space as before the update. Note that this is a soft restriction on the vector space; one could choose stricter restrictions as well, for example by enforcing constraints on the basis vectors of entities and relations so that the two vector spaces are guaranteed to be the same. Following the soft restriction, the update rule of relations in each layer is formed as follows:
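$$\mathbf{h}_r^{(l+1)} = \mathbf{W}_1^{(l)} \mathbf{h}_r^{(l)} \tag{5}$$
where $\mathbf{h}_r^{(l)}$ is the hidden state of relation $r$ in the $l$-th layer with a dimensionality of $d^{(l)}$, and $\mathbf{W}_1^{(l)}$ is a linear transformation shared across relations in the $l$-th layer.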
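To make the layer-wise update more concrete, the following sketch implements one layer for the translation case in plain Python. It is a minimal illustration under several assumptions that are not confirmed by the text: ReLU is used as the activation, the normalization constant is the total degree, relations are updated by a plain linear map, and the function name `transgcn_layer` and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Triple = Tuple[str, str, str]  # (head, relation, tail)


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply matrix W by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def transgcn_layer(triples: List[Triple], ent: Dict[str, Vector],
                   rel: Dict[str, Vector], W0: Matrix, W1: Matrix):
    """One layer: aggregate neighbor estimates, add the self-loop, update relations."""
    dim = len(W0)
    new_ent: Dict[str, Vector] = {}
    for v, h_v in ent.items():
        total = [0.0] * dim
        degree = 0
        for h, r, t in triples:
            if t == v:  # incoming triple: estimate h + r
                total = [s + a + b for s, a, b in zip(total, ent[h], rel[r])]
                degree += 1
            if h == v:  # outgoing triple: estimate t - r
                total = [s + a - b for s, a, b in zip(total, ent[t], rel[r])]
                degree += 1
        c = max(degree, 1)  # assumed normalization: total degree
        message = [m / c for m in matvec(W0, total)]
        # Skip connection: add the previous representation, then ReLU (assumed).
        new_ent[v] = [max(0.0, m + x) for m, x in zip(message, h_v)]
    # Relations are mapped by one matrix shared across all relations.
    new_rel = {r: matvec(W1, h_r) for r, h_r in rel.items()}
    return new_ent, new_rel


identity = [[1.0, 0.0], [0.0, 1.0]]
ent = {"a": [1.0, 0.0], "b": [0.0, 2.0]}
rel = {"r": [1.0, 1.0]}
new_ent, new_rel = transgcn_layer([("a", "r", "b")], ent, rel, identity, identity)
print(new_ent)  # {'a': [0.0, 1.0], 'b': [2.0, 3.0]}
```

Stacking calls to `transgcn_layer`, feeding each output into the next call, corresponds to using multi-hop neighbors.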
In the following subsections, we instantiate our TransGCN framework with two different transformation assumptions. One is based on the translation assumption, for which TransE is selected owing to its simplicity and popularity, while the other follows the rotation assumption, for which RotatE is chosen.
2.3. TransEGCN model
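Under the translation assumption, the relation is assumed to serve as a translation from the head entity to the tail entity. For the entities of a triple $(h, r, t)$, Eq. 1 can be instantiated as follows:
$$\mathbf{t} = \mathbf{h} + \mathbf{r}, \qquad \mathbf{h} = \mathbf{t} - \mathbf{r} \tag{6}$$
where the two transformation operators $\circ$ and $\star$ are $+$ and $-$, respectively.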
As in TransE, the score function for a triple $(h, r, t)$ is defined as:
$$f(h, r, t) = -\lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert \tag{7}$$
where $\mathbf{h}$, $\mathbf{r}$ and $\mathbf{t}$ are the embeddings of $h$, $r$ and $t$ in the last layer, respectively.
Similar to previous studies (Wang et al., 2014), this model is trained with negative sampling. For each existing triple in a KG, a certain number of negative samples (e.g., 10 negative samples per positive triple) are constructed by randomly replacing either the head entity or the tail entity. Positive samples are expected to have high scores while negative samples are expected to have low scores. A margin-based ranking function serves as the loss function for training:
$$\mathcal{L} = \sum_{(h, r, t) \in S} \; \sum_{(h', r, t') \in S'_{(h, r, t)}} \max\big(0, \gamma - f(h, r, t) + f(h', r, t')\big) \tag{8}$$
where $\max(x, y)$ is used to obtain the maximum between $x$ and $y$, $\gamma$ is the margin, $S$ is the set of observed triples in a KG, and $S'_{(h, r, t)}$ is the set of negative samples associated with the positive sample $(h, r, t)$. It is noteworthy that in this implementation, all the embeddings are in the real vector space.
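The training signal described above can be illustrated with a small sketch. It assumes the score is the negative $L_1$ distance between $\mathbf{h} + \mathbf{r}$ and $\mathbf{t}$, so that higher scores mean more plausible triples; the helper names `score`, `corrupt`, and `margin_loss` and the toy embeddings are hypothetical. For brevity, `corrupt` does not check whether a corrupted triple happens to be a true one.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Triple = Tuple[str, str, str]  # (head, relation, tail)


def score(triple: Triple, ent: Dict[str, Vector], rel: Dict[str, Vector]) -> float:
    """Negative L1 distance of h + r from t; higher means more plausible."""
    h, r, t = triple
    return -sum(abs(a + b - c) for a, b, c in zip(ent[h], rel[r], ent[t]))


def corrupt(triple: Triple, entities: Sequence[str], rng: random.Random) -> Triple:
    """Replace either the head or the tail with a random different entity."""
    h, r, t = triple
    replacement = rng.choice([e for e in entities if e not in (h, t)])
    if rng.random() < 0.5:
        return (replacement, r, t)
    return (h, r, replacement)


def margin_loss(positives: List[Triple], negatives: List[List[Triple]],
                ent: Dict[str, Vector], rel: Dict[str, Vector],
                margin: float) -> float:
    """Sum of max(0, margin - f(positive) + f(negative)) over all pairs."""
    loss = 0.0
    for pos, negs in zip(positives, negatives):
        for neg in negs:
            loss += max(0.0, margin - score(pos, ent, rel) + score(neg, ent, rel))
    return loss


ent = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [1.0, 1.5], "d": [5.0, 5.0]}
rel = {"r": [1.0, 1.0]}
positive = ("a", "r", "b")
# A far-away negative is already separated by the margin and adds no loss.
loss_easy = margin_loss([positive], [[("a", "r", "d")]], ent, rel, margin=1.0)
# A negative close to the true tail violates the margin and is penalized.
loss_hard = margin_loss([positive], [[("a", "r", "c")]], ent, rel, margin=1.0)
print(loss_easy, loss_hard)  # 0.0 0.5
```

The two toy cases show why only negatives that score close to the positive triple contribute to the gradient.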
2.4. RotatEGCN model
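Another assumption recently explored in knowledge graph embedding is rotation. Sun et al. (2019) assumed that the tail entity is derived from the head entity by a rotation performed by the relation in the complex vector space. Accordingly, we can formalize the neighbors of an entity $v_i$:
$$N_{in}(v_i) = \{\mathbf{h} \odot \mathbf{r} \mid (h, r, v_i) \in \mathcal{T}_{in}(v_i)\}, \qquad N_{out}(v_i) = \{\mathbf{t} \odot \bar{\mathbf{r}} \mid (v_i, r, t) \in \mathcal{T}_{out}(v_i)\} \tag{9}$$
where the two transformations are the multiplication by $\mathbf{r}$ and by $\bar{\mathbf{r}}$, respectively. More specifically, $\odot$ is the element-wise product in the complex space and $\bar{\mathbf{r}}$ is the complex conjugate of $\mathbf{r}$. As in RotatE, every element of $\mathbf{r}$ has unit modulus, so multiplying by $\bar{\mathbf{r}}$ undoes the rotation by $\mathbf{r}$. Note that here the use of $\mathbf{r}$ and $\bar{\mathbf{r}}$, rather than of different transformation operators, ensures that the relation direction is taken into account naturally.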
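The role of the complex conjugate can be checked with Python's built-in complex numbers. The sketch below assumes, as in RotatE, that each relation element is a unit-modulus number $e^{i\theta}$; the function names `rotate` and `rotate_back` and the toy values are hypothetical.

```python
import cmath
from typing import List

ComplexVector = List[complex]


def rotate(x: ComplexVector, phases: List[float]) -> ComplexVector:
    """Element-wise product of x with r, where r_k = exp(i * phase_k)."""
    return [xk * cmath.exp(1j * p) for xk, p in zip(x, phases)]


def rotate_back(x: ComplexVector, phases: List[float]) -> ComplexVector:
    """Element-wise product of x with the complex conjugate of r."""
    return [xk * cmath.exp(1j * p).conjugate() for xk, p in zip(x, phases)]


head = [1 + 0j, 0 + 2j]
phases = [cmath.pi / 2, cmath.pi]       # rotate by 90 and 180 degrees
tail = rotate(head, phases)             # approximately [1j, -2j]
recovered = rotate_back(tail, phases)   # approximately equal to head again
```

An incoming neighbor is thus rotated forward to estimate the center entity, while an outgoing neighbor is rotated backward with the conjugate, and both estimates land in the same space.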
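Similarly, the score function is based on the distance function:
$$d_r(h, t) = \lVert \mathbf{h} \odot \mathbf{r} - \mathbf{t} \rVert, \qquad f(h, r, t) = -d_r(h, t) \tag{10}$$
To keep consistent with RotatE, we adopt self-adversarial negative sampling to train the model rather than vanilla negative sampling. The main argument of self-adversarial negative sampling is that negative triples should have different probabilities of being drawn as training continues; e.g., many triples may be obviously false and thus contribute no meaningful information. Therefore, a probability distribution is used to draw negative samples according to the current embedding model:
$$p(h'_j, r, t'_j \mid \{(h'_i, r, t'_i)\}) = \frac{\exp\big(\alpha f(h'_j, r, t'_j)\big)}{\sum_i \exp\big(\alpha f(h'_i, r, t'_i)\big)} \tag{11}$$
where $\alpha$ is a constant which controls the temperature of sampling.
Then the above probability of a negative sample is treated as the weight of the sample to help construct the loss function. For a positive sample $(h, r, t)$ with $n$ negative samples $(h'_i, r, t'_i)$, the loss function can be written as follows:
$$\mathcal{L} = -\log \sigma\big(\gamma - d_r(h, t)\big) - \sum_{i=1}^{n} p(h'_i, r, t'_i) \log \sigma\big(d_r(h'_i, t'_i) - \gamma\big) \tag{12}$$
where $\gamma$ is a fixed margin, $\sigma$ is the sigmoid function, and all the embeddings are in the complex vector space.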
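The weighting scheme can be sketched numerically as follows. The code assumes the formulation of RotatE, where the score of a triple is its negative distance, the weights are a softmax over the scaled scores of the negatives, and the loss uses a fixed margin; the function names and the example distances are hypothetical.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def self_adversarial_weights(neg_distances: List[float], alpha: float) -> List[float]:
    """Softmax over alpha * score, with score = -distance (closer = harder)."""
    logits = [-alpha * d for d in neg_distances]
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def self_adversarial_loss(pos_distance: float, neg_distances: List[float],
                          gamma: float, alpha: float) -> float:
    """Loss for one positive triple and its weighted negative samples."""
    weights = self_adversarial_weights(neg_distances, alpha)
    pos_term = -math.log(sigmoid(gamma - pos_distance))
    neg_term = -sum(w * math.log(sigmoid(d - gamma))
                    for w, d in zip(weights, neg_distances))
    return pos_term + neg_term


weights = self_adversarial_weights([1.0, 5.0], alpha=1.0)
uniform = self_adversarial_weights([1.0, 5.0], alpha=0.0)
loss = self_adversarial_loss(0.5, [4.0, 6.0], gamma=3.0, alpha=1.0)
```

With `alpha=0.0` all negatives receive the same weight, which recovers uniform negative sampling; larger values concentrate the weight on negatives that the current model finds hard to reject.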
3. Experiment
To test the performance of our models, we evaluate our TransGCN models on the task of link prediction on two datasets: FB15K237 and WN18RR.
3.1. Datasets
In previous studies, the performance of link prediction methods was commonly evaluated on two datasets, namely FB15K from Freebase and WN18 from WordNet. However, both datasets contain inverse triples across the training and test data, so methods can achieve better performance on them by memorizing the affected triples rather than by having a better ability of prediction. Therefore, we use two filtered datasets, FB15K237 and WN18RR, proposed in (Toutanova and Chen, 2015) and (Dettmers et al., 2018), respectively, in which all the inverse triple pairs were removed. These two datasets have been shown to be more challenging for link prediction models (Schlichtkrull et al., 2018). Table 1 shows basic statistics for these two datasets.
Table 1. Basic statistics of the two datasets.
Dataset             FB15K237   WN18RR
Entities            14,541     40,943
Relations           237        11
Training triples    272,115    86,835
Validation triples  17,535     3,034
Test triples        20,466     3,134
3.2. Experiment Setup
Evaluation metrics.
In the testing phase, for each triple, we replace the head entity with every other entity in the current KG and calculate scores for these corrupted triples and the original triple using the scoring function specified in Section 2. Since some of the corrupted triples might also appear in the training, validation, or test set, we filter these triples out, which we denote as the filtered setting. The remaining triples are then ranked in descending order of their scores, and the rank of the correct triple in this list is used for evaluation. The whole procedure is repeated while replacing the tail entity instead of the head entity. Following previous studies (Bordes et al., 2013), we adopt Mean Reciprocal Rank (MRR) and Hits@k as evaluation metrics. We report MRR as well as Hits at 1, 3, and 10, all in the filtered setting. For all the metrics, higher values mean better performance.
Baselines.
Six baselines (TransE, DistMult, ComplEx, RGCN, ConvE, and RotatE) are selected for the evaluation. TransE is a standard translation-based model, which is simple but performs well on most datasets. This model is wrapped in our TransEGCN model to achieve the conversion from heterogeneous neighbors to homogeneous neighbors. DistMult, as a factorization model, also shows promising performance on standard datasets. Furthermore, our model is compared with ComplEx (Trouillon et al., 2016), a powerful state-of-the-art model for link prediction, and RGCN (Schlichtkrull et al., 2018), a strong baseline for modeling directed labeled graphs. ConvE uses a multi-layer convolutional network to model the interactions between entities and relations (Dettmers et al., 2018). RotatE is the most recent KGE model, which is built on the rotation assumption (Sun et al., 2019). This model is exploited in our RotatEGCN model to derive homogeneous neighbors.
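The filtered ranking protocol described under the evaluation metrics can be sketched as follows. This is an illustrative implementation rather than the evaluation code used in the experiments: the function names, the toy scores, and the optimistic handling of ties (a candidate only pushes the rank down if it scores strictly higher) are assumptions.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def filtered_rank(triple: Triple, entities: Sequence[str], known: Set[Triple],
                  score_fn: Callable[[str, str, str], float],
                  corrupt_head: bool) -> int:
    """Rank of the correct triple among corrupted ones, skipping known triples."""
    h, r, t = triple
    target = score_fn(h, r, t)
    rank = 1
    for e in entities:
        candidate = (e, r, t) if corrupt_head else (h, r, e)
        if candidate == triple or candidate in known:
            continue  # filtered setting: other true triples are not counted
        if score_fn(*candidate) > target:
            rank += 1
    return rank


def mrr_and_hits(ranks: List[int], ks: Sequence[int] = (1, 3, 10)):
    """Mean reciprocal rank and Hits@k for a list of ranks."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = {k: sum(r <= k for r in ranks) / len(ranks) for k in ks}
    return mrr, hits


# Hypothetical scores; triples that are not listed score 0.0.
toy_scores: Dict[Triple, float] = {
    ("a", "r", "b"): 0.5, ("a", "r", "c"): 0.9,
    ("a", "r", "d"): 0.7, ("a", "r", "a"): 0.1,
}
score_fn = lambda h, r, t: toy_scores.get((h, r, t), 0.0)
known = {("a", "r", "b"), ("a", "r", "c")}  # true triples from all splits
entities = ["a", "b", "c", "d"]

ranks = [
    filtered_rank(("a", "r", "b"), entities, known, score_fn, corrupt_head=True),
    filtered_rank(("a", "r", "b"), entities, known, score_fn, corrupt_head=False),
]
mrr, hits = mrr_and_hits(ranks)
print(ranks, mrr, hits)  # [1, 2] 0.75 {1: 0.5, 3: 1.0, 10: 1.0}
```

In the toy example the higher-scoring triple `("a", "r", "c")` does not hurt the rank of the test triple because it is itself a known true triple and is filtered out.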
Implementation details.
To optimize our TransGCN models, we used the Adam optimizer (Kingma and Ba, 2014) with a fixed learning rate. The best parameters were selected according to the filtered MRR on the respective validation sets. For both models, the embeddings of entities and relations produced by the two base models, i.e., TransE and RotatE, were used to initialize the embeddings needed in our models. For TransE, the embeddings pretrained by Nguyen et al. (2018) were utilized. Then, the number of layers in the GCN was selected by comparing the experimental results on the validation set; the same number of layers turned out to be the best choice for both datasets. For RotatE, we trained the model using the implementation provided by its authors to obtain initial embeddings of entities and relations. Most of the hyperparameter values of RotatE remained unchanged, except that we ignored the batch size, since in GCN the batch size is determined by the graph batch size, which we left at its default. The only tuned parameter was the number of layers, whose best setting was determined separately on FB15K237 and on WN18RR.
3.3. Results
Main Results.
The results for both datasets are reported in Table 2. Results for the baseline models DistMult, TransE, ComplEx, ConvE, and RotatE are taken from (Sun et al., 2019), and RGCN's results are taken from (Schlichtkrull et al., 2018).
Table 2. Link prediction results on FB15K237 and WN18RR (filtered MRR and Hits@k); "-" marks a value that is not reported.
            FB15K237                       WN18RR
Model       MRR    Hit@1  Hit@3  Hit@10    MRR    Hit@1  Hit@3  Hit@10
DistMult    0.241  0.155  0.263  0.43      0.39   0.44   0.49   0.447
TransE      0.294  -      -      0.465     0.226  -      -      0.501
TransEGCN   0.315  0.229  0.324  0.477     0.233  0.203  0.338  0.508
ComplEx     0.247  0.158  0.275  0.428     0.44   0.41   0.46   0.51
RGCN        0.248  0.153  0.258  0.417     -      -      -      -
ConvE       0.325  0.237  0.356  0.501     0.43   0.40   0.44   0.52
RotatE      0.338  0.241  0.375  0.533     0.476  0.428  0.492  0.571
RotatEGCN   0.356  0.252  0.388  0.555     0.485  0.438  0.51   0.578
In Table 2, one important observation is that our TransEGCN and RotatEGCN models both outperformed their base models, i.e., TransE and RotatE, on both datasets in terms of all the reported metrics by noticeable margins, which demonstrates the effectiveness of our proposed framework. The improvements also confirm the significance of explicitly incorporating local structural information in knowledge graph embedding learning. Moreover, compared with all the other baselines, the RotatEGCN model was consistently better, while the TransEGCN model performed differently on the two datasets. To be specific, TransEGCN performed better than ComplEx on FB15K237 but worse on WN18RR. This can be explained by a difference between TransE and ComplEx: TransE is not good at dealing with relation types other than 1-to-1 relations, as pointed out by researchers before. During the training process, for each triple $(h, r, t)$, TransE enforces $\mathbf{h} + \mathbf{r}$ to be as close as possible to $\mathbf{t}$, which is problematic when dealing with 1-to-N, N-to-1, and N-to-N relations. For example, given a 1-to-N relation $r$, we have two triples $(h, r, t_1)$ and $(h, r, t_2)$. If $\mathbf{h} + \mathbf{r} = \mathbf{t}$ holds for both, $t_1$ and $t_2$ should have the same vector representation. To meet this requirement, $\mathbf{h} + \mathbf{r}$ ends up close to the center of all the positive tails at the end of training instead of a particular tail (which may be the correct prediction). Therefore, the performance of TransE dropped sharply on WN18RR, which has nearly three times as many entities but about 20 times fewer relations than FB15K237. The superior performance of the RotatEGCN model over the TransEGCN model indirectly showed the importance of the base model used in our framework.
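One way to see why $\mathbf{h} + \mathbf{r}$ drifts toward the center of the tails is the following simplified calculation. It assumes, purely for illustration, a squared $L_2$ objective over the $N$ positive tails $t_1, \dots, t_N$ of a fixed pair $(h, r)$, and it ignores the margin and the negative samples used in actual training.

```latex
\begin{aligned}
\mathbf{x}^{\ast} &= \arg\min_{\mathbf{x}} \sum_{k=1}^{N} \lVert \mathbf{x} - \mathbf{t}_k \rVert_2^2,
\qquad \mathbf{x} = \mathbf{h} + \mathbf{r}, \\
\nabla_{\mathbf{x}} \sum_{k=1}^{N} \lVert \mathbf{x} - \mathbf{t}_k \rVert_2^2
&= 2 \sum_{k=1}^{N} (\mathbf{x} - \mathbf{t}_k) = \mathbf{0}
\;\Longrightarrow\;
\mathbf{x}^{\ast} = \frac{1}{N} \sum_{k=1}^{N} \mathbf{t}_k .
\end{aligned}
```

Under this simplification the optimum is the mean of the tail embeddings, so the model cannot single out one particular tail unless all tails collapse to the same vector.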
Comparison with RGCN
It is necessary to elaborate on the comparison between our models and the RGCN model that inspired our work. The experimental results showed that our models (TransEGCN and RotatEGCN) both yielded better results than RGCN, with improvements of 6.7 and 10.8 percentage points in terms of MRR on FB15K237, respectively. We attribute the improvements to two reasons. First, thanks to the idea proposed in this paper of converting heterogeneous neighbors into homogeneous neighbors in KGs, our framework captures both local structural information, by considering entities and relations in the neighborhood, and semantic information residing within the transformation operators. Second, relations in a KG are modeled only once and simultaneously with entities, and replacing the relation-specific matrices of RGCN with shared matrices potentially facilitates the encoding of more complex latent information. Thus, fewer parameters need to be learned in our models, which helps alleviate overfitting. In total, our TransEGCN model has fewer parameters than RGCN under both basis decomposition regularization and block-diagonal decomposition, with the difference depending on the number of basis matrices, the number of layers, the dimension of a hidden layer, and the number of relations. As for our RotatEGCN model, we followed the implementation of Sun et al. (2019), who used real numbers to express complex numbers by treating the first half of the dimensions of an entity embedding as the real part and the second half as the imaginary part. Therefore, the dimensions of entities are doubled in the complex vector space. As a result, our RotatEGCN model also has fewer parameters than RGCN under both basis decomposition and block-diagonal decomposition, with the difference additionally depending on the number of entities.
Performance on Entities of different degrees
Figure 4 depicts the performance of our models on the FB15K237 validation set as a function of the entity degree. It can be observed that the performance of both models initially increased considerably with the size of the neighborhood, while beyond a threshold it dropped significantly. We believe this shows that a small number of neighbors can provide only limited local structural information, leading to poor performance; by contrast, too many neighbors bring too much mixed information, which makes the models hard to optimize. In the future, more work should focus on how to deal with these two extreme conditions.
Table 3. Performance on FB15K237 when using 1-hop, 2-hop, and 3-hop neighbors.
Model        MRR    Hit@10
TransEGCN1   0.315  0.474
TransEGCN2   0.297  0.453
TransEGCN3   0.273  0.421
RotatEGCN1   0.347  0.546
RotatEGCN2   0.356  0.555
RotatEGCN3   0.331  0.525
Analysis of multihop neighbors
Table 3 describes the prediction performance of our two models on FB15K237 in terms of multi-hop neighbors, namely 1-hop, 2-hop and 3-hop neighbors. The TransEGCN model favored 1-hop neighbors while RotatEGCN was able to leverage more neighborhood information. The difference lies in that RotatE has a stronger ability to deal with complex relations and to capture more accurate entity and relation information. However, both models performed worst when 3-hop neighbors were considered, with results even worse than those of the base models. We think this may be caused by the spectral convolutional filters, since they have been shown to have a smoothing effect that could dilute the useful information (Li et al., 2018a).
4. Related Work
Here we review previous work as it relates to our model.
4.1. Transformationbased Models
Until now, two transformation assumptions have been explored in the literature: translation and rotation. A multitude of studies have explored these assumptions to achieve knowledge graph embedding learning for downstream tasks.
Translation-based models, also known as translational distance models, employ distance-based functions to model entities and relations in a KG. The key idea behind this kind of model is that for a positive triple $(h, r, t)$, the head entity should be as close as possible to the tail entity after a translation by the relation.
The most representative model is TransE (Bordes et al., 2013) because of its simplicity and efficiency. It encodes the observed triples in a KG and projects entities and relations into the same vector space. TransE directly implements the vanilla idea of translation, which enforces $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$ when $(h, r, t)$ holds. However, Wang et al. (2014) argued that TransE cannot deal with N-to-1, 1-to-N and N-to-N relations and proposed a new model called TransH, which introduces a hyperplane for each relation and requires that the head entity projected onto this hyperplane be close to the projected tail entity after a translation. TransR (Lin et al., 2015) follows a similar idea, but introduces relation-specific spaces, so that relations and entities can be represented in their respective vector spaces. TransD (Ji et al., 2015) and TranSparse (Ji et al., 2016) are two alternative approaches that simplify TransR. In addition, another branch of improving TransE relaxes the strict restriction of $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$, as in TransM and TransF (Feng et al., 2016). For example, TransM assigns each triple a relation-specific weight $\theta_r$ and redefines the scoring function as $f_r(h, t) = -\theta_r \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert$. For a comprehensive review of these methods, please refer to (Wang et al., 2017). Our TransEGCN model is based on this translation assumption, and TransE from the first branch is exploited for carrying out translation operations.
The rotation assumption was recently exploited by Sun et al. (2019). Motivated by Euler's identity, which indicates that a rotation in the complex plane can be achieved by multiplication with a complex number of unit modulus, Sun et al. proposed the RotatE model, which projects both entities and relations into the complex vector space and treats each relation as a rotation from the head entity to the tail entity. The most attractive characteristic of RotatE is its ability to model and infer multiple relation patterns, including symmetry/antisymmetry, inversion and composition. This model also adopts a distance-based score function to evaluate the compatibility of two entities and their relation, as shown in Eq. 10. Experimental results on benchmark datasets demonstrated the effectiveness of RotatE.
4.2. Graph Convolutional Networks
Our TransGCN framework is primarily motivated by the large body of work on modeling large-scale graph data using GCNs (Kipf and Welling, 2016). Generally, GCNs can be classified into (1) spectral-based approaches, which introduce filters from the graph signal processing perspective (Shuman et al., 2013; Li et al., 2018b), and (2) spatial-based approaches, which interpret a graph convolution as aggregating information from neighbors (Gilmer et al., 2017; Hamilton et al., 2017; Gao et al., 2018). Although spectral-based methods are appealing in that they are supported by spectral graph theory, in practice spatial-based methods perform better in terms of efficiency, generality, and flexibility (Wu et al., 2019).
Interestingly, Kipf and Welling (Kipf and Welling, 2016) discovered that when approximated by first-order Chebyshev polynomials, the graph convolution is localized in space. That is, to some degree spatial-based approaches coincide with spectral-based approaches. Based on this, they introduced a simple but efficient message propagation rule, conditioned on the nodes and the adjacency matrix of a graph, for the semi-supervised node classification task.
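The propagation rule of Kipf and Welling can be sketched as follows: self-loops are added to the adjacency matrix, the result is symmetrically normalized by node degree, and each node then averages transformed features over its immediate neighbors. The function names, the ReLU activation, and the toy graph are illustrative assumptions.

```python
import numpy as np

def normalized_adjacency(A):
    """Return D^{-1/2} (A + I) D^{-1/2}, where D is the degree matrix of A + I."""
    A_tilde = A + np.eye(A.shape[0])      # add self-loops
    d = A_tilde.sum(axis=1)               # node degrees including the self-loop
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def gcn_layer(A, H, W):
    """One propagation step: ReLU(A_hat @ H @ W)."""
    return np.maximum(normalized_adjacency(A) @ H @ W, 0.0)

# Toy undirected path graph 0 - 1 - 2 with one-hot input features.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.eye(3)
W = np.ones((3, 2))                        # hypothetical weights
out = gcn_layer(A, H, W)
print(out.shape)
```

Because the normalized adjacency has zeros wherever two nodes are not directly connected, a single layer only mixes information within one hop, which is the spatial localization mentioned above.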
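To extend the GCN model (Kipf and Welling, 2016) to directed labeled graphs, Schlichtkrull et al. (Schlichtkrull et al., 2018) proposed the R-GCN model, the first work to apply the GCN framework to knowledge graphs for link prediction. The main contribution of this work lies in the introduction of relation-specific weight matrices in each layer of the neural network, such that relation-specific messages can be propagated over the graph to update entities. The message propagation rule for node $v_i$ is defined as follows:
$$h_i^{(l+1)} = \sigma\Big(\sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)}\Big) \qquad (13)$$
where $h_i^{(l)}$ denotes the hidden state of node $v_i$ in the $l$-th layer, $W_r^{(l)}$ a relation-specific weight matrix in the $l$-th layer, $W_0^{(l)}$ another layer-specific weight matrix, $\mathcal{R}$ the set of relation types, $\mathcal{N}_i^r$ the set of neighbors of node $v_i$ in terms of relation $r$, and $c_{i,r}$ a normalization constant.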
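The R-GCN propagation rule in Equation (13) can be illustrated with a small, unoptimized sketch: each node sums messages from its neighbors, transformed by a weight matrix specific to the connecting relation, and adds a self-connection term. The names are hypothetical, and the sketch assumes that an edge `(j, r, i)` sends a message from node `j` to node `i`, that the normalization constant is the number of neighbors under relation `r`, and that the activation is ReLU.

```python
import numpy as np

def rgcn_layer(H, edges, W_rel, W_self):
    """One R-GCN style update.

    H:      (n, d_in) node states
    edges:  list of (source j, relation r, target i) triples
    W_rel:  (num_relations, d_out, d_in) relation-specific weight matrices
    W_self: (d_out, d_in) self-connection weight matrix
    """
    out = H @ W_self.T                      # self-connection term for every node
    for r, W_r in enumerate(W_rel):
        # Group the neighbors of each target node under relation r.
        nbrs = {}
        for j, rel, i in edges:
            if rel == r:
                nbrs.setdefault(i, []).append(j)
        for i, js in nbrs.items():
            c = len(js)                     # assumed normalization constant
            out[i] += sum(W_r @ H[j] for j in js) / c
    return np.maximum(out, 0.0)             # ReLU

# Toy graph with 3 nodes and 2 relation types (hypothetical values).
H = np.eye(3)
W_rel = np.stack([np.eye(3), 2.0 * np.eye(3)])
W_self = np.eye(3)
edges = [(0, 0, 2), (1, 0, 2), (0, 1, 1)]
print(rgcn_layer(H, edges, W_rel, W_self))
```

Note that the update produces new node states only; the relation types enter solely through the weight matrices, and no relation embedding is returned.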
To perform link prediction, R-GCN, as an encoder, must be paired with a decoder such as DistMult. Although this method achieves promising performance on this task, it has some limitations. First, R-GCN alone cannot learn relation embeddings, which are very important for knowledge graph applications, since it only defines a message propagation strategy for updating nodes. Second, although R-GCN with an extra decoder can learn relation embeddings for the link prediction task, relation information is incorporated redundantly on both the encoder side and the decoder side. As a result, the number of parameters increases. In this paper, we concentrate on solving these issues by finding a more reasonable way to extend traditional GCNs to KGs.
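To make the redundancy concrete, the sketch below shows a DistMult-style decoder score together with a rough count of relation-related parameters on each side. The counting function is a simplification that assumes one full weight matrix per relation per encoder layer and ignores the basis or block-diagonal decompositions that R-GCN can use to reduce this number; all names and sizes are hypothetical.

```python
import numpy as np

def distmult_score(e_h, w_r, e_t):
    """DistMult score: sum over dimensions of e_h * w_r * e_t."""
    return float(np.sum(e_h * w_r * e_t))

def relation_param_counts(num_relations, dim, num_layers):
    """Relation-related parameters in the encoder and in the decoder.

    Encoder: one (dim x dim) matrix per relation per layer (assumed, no decomposition).
    Decoder: one dim-sized relation vector per relation.
    """
    encoder = num_layers * num_relations * dim * dim
    decoder = num_relations * dim
    return encoder, decoder

e_h = np.array([1.0, 2.0])
w_r = np.array([3.0, 4.0])
e_t = np.array([5.0, 6.0])
print(distmult_score(e_h, w_r, e_t))

# Hypothetical sizes: 200 relations, 100 dimensions, 2 encoder layers.
print(relation_param_counts(200, 100, 2))
```

Under these assumptions, relations are parameterized twice: once as encoder matrices that never yield an embedding, and once as decoder vectors that play no part in message passing.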
5. Conclusion
In this paper we proposed a unified GCN framework (TransGCN) to learn embeddings of relations and entities simultaneously. To handle the heterogeneous characteristics of knowledge graphs when using traditional GCNs, we came up with a novel way of converting a heterogeneous neighborhood into a homogeneous neighborhood by introducing transformation assumptions, e.g., translation and rotation. Under these assumptions, a relation is treated as a transformation operator that transforms a head entity into a tail entity. Translation and rotation assumptions were explored, and the TransE and RotatE models were wrapped in the TransGCN framework, respectively. Any other transformation-based method could serve as the transformation operation. By doing so, nearby nodes with their associated relations were aggregated as messages propagated to the center node, as in traditional GCNs, which benefited entity embedding learning. In addition, we explicitly encoded relations in the same GCN framework so that relation embeddings can be learned seamlessly together with entity embeddings. In this sense, our TransGCN framework can be interpreted as a new (knowledge) graph encoder that produces both entity embeddings and relation embeddings. This encoder can be further incorporated into an encoder-decoder framework for other tasks. Experimental results on two datasets, FB15K-237 and WN18RR, showed that our unified TransGCN models, both TransE-GCN and RotatE-GCN, consistently outperformed the baseline R-GCN model by noticeably large margins on all metrics, which demonstrated the effectiveness of the conversion idea in dealing with heterogeneous neighbors. Additionally, both models performed better than their base models, i.e., TransE and RotatE, showing the significance of explicitly modeling local structural information in knowledge graph embedding learning.
Although relations are encoded and learned in our GCN framework, they are updated simply by being passed through a separate linear transformation. In the future, we plan to explore approaches that apply convolutions directly to relations so that the local structure of graphs can also play a role in relation embedding learning. In addition, a weighting mechanism should be studied to measure the unequal contributions of neighbors.
References
Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, pp. 2787–2795.
Convolutional 2D knowledge graph embeddings. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, pp. 1811–1818.
Knowledge graph embedding by flexible translation. In Fifteenth International Conference on the Principles of Knowledge Representation and Reasoning.
Large-scale learnable graph convolutional networks. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1416–1424.
Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pp. 1263–1272.
Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pp. 1024–1034.
Embedding logical queries on knowledge graphs. In NeurIPS.
Deep residual learning for image recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Knowledge graph embedding via dynamic mapping matrix. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Vol. 1, pp. 687–696.
Knowledge graph completion with adaptive sparse transfer matrix. In Thirtieth AAAI Conference on Artificial Intelligence.
Molecular graph convolutions: moving beyond fingerprints. Journal of Computer-Aided Molecular Design 30 (8), pp. 595–608.
Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
Introduction to statistical relational learning. MIT Press.
Deeper insights into graph convolutional networks for semi-supervised learning. In Thirty-Second AAAI Conference on Artificial Intelligence.
Adaptive graph convolutional neural networks. In Thirty-Second AAAI Conference on Artificial Intelligence.
Modeling relation paths for representation learning of knowledge bases. arXiv preprint arXiv:1506.00379.
Relaxing unanswerable geographic questions using a spatially explicit knowledge graph embedding model. In Proceedings of 22nd AGILE International Conference on Geographic Information Science, Jun. 1710, Limassol, Cyprus.
Compositional vector space models for knowledge base inference. In 2015 AAAI Spring Symposium Series.
A novel embedding model for knowledge base completion based on convolutional neural network. In Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pp. 327–333.
A three-way model for collective learning on multi-relational data. In ICML, Vol. 11, pp. 809–816.
Modeling relational data with graph convolutional networks. In European Semantic Web Conference, pp. 593–607.
The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine 30, pp. 83–98.
RotatE: knowledge graph embedding by relational rotation in complex space. In International Conference on Learning Representations.
Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, pp. 57–66.
Complex embeddings for simple link prediction. In International Conference on Machine Learning, pp. 2071–2080.
Knowledge graph embedding: a survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering 29 (12), pp. 2724–2743.
Knowledge graph embedding by translating on hyperplanes. In Twenty-Eighth AAAI Conference on Artificial Intelligence.
A comprehensive survey on graph neural networks. arXiv preprint arXiv:1901.00596.
A time-aware inductive representation learning strategy for heterogeneous graphs.
Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575.
Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD '18, New York, NY, USA, pp. 974–983. ISBN 9781450355520.