Jointly Learning Knowledge Embedding and Neighborhood Consensus with Relational Knowledge Distillation for Entity Alignment

by   Xinhang Li, et al.
Tsinghua University

Entity alignment aims at integrating heterogeneous knowledge from different knowledge graphs (KGs). Recent studies employ embedding-based methods that first learn the representations of KGs and then perform entity alignment by measuring the similarity between entity embeddings. However, they fail to make good use of relation semantic information due to the trade-off caused by the different objectives of learning knowledge embedding and neighborhood consensus. To address this problem, we propose Relational Knowledge Distillation for Entity Alignment (RKDEA), a Graph Convolutional Network (GCN) based model equipped with knowledge distillation for entity alignment. We adopt GCN-based models to learn entity representations by considering the graph structure, and incorporate relation semantic information into the GCN via knowledge distillation. Then, we introduce a novel adaptive mechanism to transfer relational knowledge so as to jointly learn entity embedding and neighborhood consensus. Experimental results on several benchmark datasets demonstrate the effectiveness of our proposed model.






1 Introduction

Knowledge Graphs (KGs) provide structured knowledge in the simple and clear triple format <head, relation, tail>. They are essential in supporting many natural language processing applications. Since KGs are constructed separately from heterogeneous resources and languages, they might use different expressions to indicate the same entity. As a result, different KGs often contain complementary contents and cross-lingual links. It is essential to integrate heterogeneous KGs into a unified one, thus increasing the accuracy and robustness of knowledge-driven applications.

Figure 1: An error-prone example of two different entities with the same neighbors.

To this end, much effort has been devoted to the problem of Entity Alignment, which aims at linking entities with the same identities from different KGs. Earlier approaches for entity alignment usually rely on manually created features [24], which is labor-intensive and time-consuming. Recent studies have focused on embedding-based approaches, as they are capable of representing and preserving the structures of KGs in low-dimensional embedding spaces. Generally speaking, there are two categories of approaches: translation-based and GNN-based. The translation-based models [34, 60, 7, 35] extend the idea of trans-family models for knowledge graph embedding, e.g. TransE, to learn the embeddings of entities and relations in KGs. This kind of method is good at learning knowledge embedding but unsatisfactory for entity alignment on sparse graphs. Recently, GNN-based models [44, 32] have employed the Graph Convolutional Network (GCN) [16] to make better use of the pre-aligned seeds, learning entity embeddings from neighbor information so as to resolve such limitations, and have achieved promising results. Some recent works [18] further jointly learn relational knowledge and neighborhood consensus [30] to obtain more robust and accurate predictions. However, since the learning objectives of relational knowledge and neighborhood consensus are different, they lead to different optimization directions. As a result, the model may fail to learn useful information due to overfitting. For example, as Figure 1 shows, existing models tend to wrongly align New York State in English with New York City in Chinese due to the strong hint of neighborhood consensus given by the shared neighbors.
However, in such a situation, the difference in relation semantics between adjoin and locatedIn is more crucial for distinguishing these two different entities, and it cannot be treated properly without balancing the two learning objectives.

To address this problem, we propose Relational Knowledge Distillation for Entity Alignment (RKDEA), a GCN based model with a relational knowledge distillation framework for entity alignment. Following previous studies [46, 47, 48, 26], we use a Highway Gated GCN model to learn the entity embeddings. To decide the proportion of relational knowledge embedding and neighborhood consensus in the training objective, we take advantage of the knowledge distillation [12] mechanism. More specifically, we first separately train two models with the objectives of learning relational knowledge and neighborhood consensus. Next, we take the model with the relational knowledge objective as the teacher and the model with the neighborhood consensus objective as the student. Then we employ a relational distillation method to transfer relation information from the teacher model to the student. To effectively control the overall training objective, we propose an adaptive temperature mechanism, instead of treating the temperature as a static hyper-parameter as previous studies [12, 31, 54, 27] did, to adjust the weight of the two kinds of information. We conduct extensive evaluations on several publicly available datasets that are widely used in previous studies. The experimental results and further analysis demonstrate that RKDEA can better integrate knowledge embedding and neighborhood consensus and thus outperforms state-of-the-art methods by a clear margin.

2 Related Work

2.1 Knowledge Graph Entity Alignment

Knowledge Graphs have a wide scope of application scenarios, such as similarity search [58, 41, 45, 23], information extraction [40] and record de-duplication [19], and can help analyze different kinds of data such as health [59], spatial [57, 21, 50] and text [39] data. To automatically capture deep resemblance of graph structure information between heterogeneous KGs, recent studies have focused on embedding-based approaches. Based on the methodology, they can be categorized into two types: translation-based and GNN-based ones.

Translation-based Approaches   For the representation learning of a single KG, there have been many studies, such as TransE [3], TransH [42], TransR [20] and TransD [14]. These methods utilize a scoring function to model relational knowledge and thereby obtain entity and relation embeddings. Translation-based alignment approaches build on such studies. MTransE [7] applies TransE to the entity alignment task with various transition techniques between different KGs. JAPE [34] presents a way to combine structure and attribute information to jointly embed entities into a unified vector space. This kind of method is capable of capturing complex relation semantics with the help of triple-level modeling. However, it is difficult for them to perceive the structural similarity of neighborhood information.

GNN-based Approaches   Recently, the Graph Neural Network (GNN) has achieved tremendous success in applications related to network embedding. GNN-based entity alignment methods incorporate neighborhood information with GNNs to provide global structural information. GCN-Align [44] directly applies the Graph Convolutional Network (GCN) as the embedding module for entity alignment. MuGNN [5] proposes self and cross-KG attention mechanisms to better capture the structure information in the KGs. RDGCN [46] leverages relations to improve entity alignment with dual graphs. GMNN [49] and NMN [48] incorporate long-distance neighborhood information to strengthen the entity embeddings. SSP [26] jointly models KG global structure and local semantic information via flexible relation representation. KECG [18] trains the model with knowledge embedding and neighborhood consensus objectives alternately.

Nevertheless, all these methods fail to address the problem of balancing the different learning objectives of relational knowledge and neighborhood consensus. Compared with them, our approach jointly learns knowledge embedding and neighborhood consensus in a more structured, fine-grained way via knowledge distillation.

2.2 Knowledge Distillation

Knowledge Distillation (KD) is a branch of transfer learning in which knowledge is transferred from a complex model (teacher) to a concise model (student). Typically, KD aims at transferring a mapping from inputs to outputs learned by the teacher model to the student model. By leveraging KD, the student model can learn implicit knowledge through an extra objective over the teacher's outputs and thus gain better performance. KD was first introduced to neural networks by [12]. [31] employs an additional linear transformation in the middle of the network to obtain a narrower student. [54, 13, 38] transfer the knowledge in attention maps to obtain more robust and comprehensive representations. Recently, some works [53, 2, 9] have demonstrated that distilling models of identical architecture, i.e., self-distillation, can further improve the performance of neural networks.

The challenge for graph representation learning lies in the heterogeneous nature of different graphs. Some recent approaches applying KD have brought convincing results in solving the heterogeneity problem. [17] proposes a novel graph data transfer learning framework with a generalized Spectral CNN. [8] transfers similarities of different structures for metric learning. [27] demonstrates the effectiveness of distance-wise and angle-wise distillation losses in knowledge transfer between different structures. The objective of our work is similar to the above studies, but many issues remain to be addressed to design a reasonable distillation mechanism for the task of entity alignment between KGs.

3 Preliminary

Formally, a knowledge graph is defined as G = (E, R, A, T_R, T_A), where E, R and A indicate the sets of entities, relations and attributes, respectively, and T_R and T_A denote the sets of relation triples and attribute triples, respectively. In this paper, we focus on relation information irrespective of attributes, so the KG can be simplified to G = (E, R, T_R), where T_R ⊆ E × R × E is the set of relation triples.

Given two heterogeneous KGs G_1 = (E_1, R_1, T_1) and G_2 = (E_2, R_2, T_2), Entity Alignment aims at finding entity pairs (e_1, e_2), e_1 ∈ E_1 and e_2 ∈ E_2, that represent the same meaning semantically. In practice, some pre-aligned entity and relation pairs are provided as seed alignments: S = {(e_1, e_2) | e_1 ∈ E_1, e_2 ∈ E_2, e_1 ≡ e_2} and S_R = {(r_1, r_2) | r_1 ∈ R_1, r_2 ∈ R_2, r_1 ≡ r_2}, where ≡ means equivalence in semantics, denote the semantically equivalent pairs.

Figure 2: Overall architecture: Relational Knowledge Distillation for Entity Alignment. The upper part is the knowledge embedding teacher model while the lower part is the neighborhood consensus student model. They share a similar GCN structure but have different training objectives. The relational knowledge of embeddings is transferred from teacher to student via relational knowledge distillation.

4 Methodology

In order to better utilize relational knowledge and neighborhood information, we propose a knowledge distillation based framework to consider knowledge embedding and neighborhood consensus simultaneously.

As shown in Figure 2, our framework consists of three components:

  • A pre-trained two-layer GCN with highway gates as the teacher model to provide relational knowledge, whose objective function is similar to TransE;

  • A two-layer GCN with highway gates as the student model to learn the local graph structure by neighborhood consensus with seed alignments;

  • A knowledge distillation mechanism to transfer relational knowledge from teacher model to student model, specifically an objective of minimizing distance-wise distillation loss.

4.1 Highway Gated GCN

For both the teacher and student models, we utilize a GCN [16] based model to learn the representations of entities and relations. Specifically, we use the highway gated GCN, which can capture long-distance neighborhood information by stacking multiple GCN layers, as the basic building block of our model. The input of the highway gated GCN is an entity feature matrix X ∈ R^(n×d), where n is the number of entities and d is the entity feature dimension. For each GCN layer, the forward propagation is calculated as Equation (1):

H^(l+1) = σ(D̂^(−1/2) Â D̂^(−1/2) H^(l) W^(l))   (1)

where H^(l) is the hidden state of the l-th GCN layer and H^(0) = X; σ is an activation function chosen as ReLU; Â = A + I is the adjacency matrix derived from the connectivity matrix A of the graph and an identity matrix I for self-connections; D̂ denotes the diagonal node degree matrix of Â; W^(l) and d^(l) denote the weights and the dimension of features in layer l, respectively.
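To make the propagation rule concrete, the step in Equation (1) can be sketched in a few lines of NumPy (a toy illustration with made-up adjacency, features and weights, not the authors' implementation):

```python
import numpy as np

def gcn_layer(H, A, W):
    """One GCN propagation step: ReLU(D_hat^-1/2 (A + I) D_hat^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])                 # add self-connections
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1)) # diagonal of D_hat^{-1/2}
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(0.0, A_norm @ H @ W)        # ReLU activation

# toy graph with 3 entities: edges 0-1 and 1-2
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.eye(3)               # one-hot input features (n = d = 3)
W = 0.5 * np.ones((3, 2))   # toy weight matrix mapping d=3 to d'=2
H_next = gcn_layer(H, A, W)
```

The broadcasting expression for `A_norm` is equivalent to the matrix product D̂^(−1/2) Â D̂^(−1/2); stacking several such layers (with highway gates, as described next) yields the full building block.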

Following [33], we utilize layer-wise highway gates in the forward propagation. With the help of stacked GCN layers, rich neighborhood knowledge indicating graph structure information can be captured when learning entity embeddings. The detailed calculation is as Equations (2) and (3):

T(H^(l)) = σ(H^(l) W_T^(l) + b_T^(l))   (2)
H^(l+1) = T(H^(l)) ⊙ H̃^(l+1) + (1 − T(H^(l))) ⊙ H^(l)   (3)

where σ is a sigmoid function; W_T^(l) and b_T^(l) are the weight matrix and bias vector of the transform gate T(H^(l)), respectively; ⊙ denotes element-wise multiplication; H̃^(l+1) is the GCN output of Equation (1); and 1 − T(H^(l)) represents the carry gate for the vanilla input of each layer, opposite to the transform gate for the transformed input.
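A highway-gated layer can be sketched as follows (the zero gate weights are toy values chosen so the gate mixes input and output equally; this is an assumption for illustration, not the trained parameters):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway_gate(H_in, H_out, W_T, b_T):
    """Layer-wise highway gate: the transform gate T weighs the GCN output,
    while the carry gate (1 - T) preserves the layer's vanilla input."""
    T = sigmoid(H_in @ W_T + b_T)          # transform gate
    return T * H_out + (1.0 - T) * H_in    # element-wise mixing

H_in = np.array([[0.2, -0.1],
                 [0.5,  0.3]])             # vanilla input of the layer
H_out = np.array([[1.0, 0.0],
                  [0.0, 1.0]])             # GCN-transformed output
W_T = np.zeros((2, 2))                     # zero gate weights -> T = 0.5
b_T = np.zeros(2)
H_next = highway_gate(H_in, H_out, W_T, b_T)  # average of input and output
```

With zero gate parameters the sigmoid evaluates to 0.5 everywhere, so each entry of `H_next` is the mean of the corresponding input and output entries.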

4.2 Knowledge Embedding Model

As shown in Figure 2, the pre-trained teacher model aims at learning the knowledge embedding. In this paper, we choose the objective function of TransE [3] as an example. Note that it could also be replaced with other translation-based methods. The relation triple is denoted as a translation equation h + r ≈ t, where h, r and t represent the head entity, relation and tail entity, respectively. For each triple (h, r, t), we take the normalized distance f(h, r, t) = ||h + r − t||_2 as the scoring function. Following previous studies, we apply negative sampling to generate unreal negative triples in the pre-training process. The objective function of the knowledge embedding teacher model is shown as Equation (4):

L_KE = Σ_{(h,r,t) ∈ T} Σ_{(h',r',t') ∈ T'} [f(h, r, t) + γ − f(h', r', t')]_+   (4)

where h, r and t are the embedding representations of entities and relations, T denotes the aggregation of triples in the two KGs, T' represents the negative sampled triple set derived from T, and γ is a margin hyper-parameter with positive value. For the sake of preserving semantics, we construct negative samples by randomly replacing the head or tail entity of an existing triple with another entity of similar semantics.
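Under these definitions, the TransE scoring function and the margin-based objective of Equation (4) can be sketched as follows (the 2-dimensional embeddings are invented toy values for illustration):

```python
import numpy as np

def transe_score(h, r, t):
    """f(h, r, t) = ||h + r - t||_2: small for plausible triples."""
    return np.linalg.norm(h + r - t)

def ke_loss(pos_triples, neg_triples, gamma):
    """Margin-based ranking loss over paired positive/negative triples."""
    loss = 0.0
    for (h, r, t), (h2, r2, t2) in zip(pos_triples, neg_triples):
        loss += max(0.0, transe_score(h, r, t) + gamma - transe_score(h2, r2, t2))
    return loss

h = np.array([0.0, 0.0])
r = np.array([1.0, 0.0])
t = np.array([1.0, 0.0])      # h + r == t, so the positive score is 0
t_neg = np.array([0.0, 0.0])  # corrupted tail, score 1
loss = ke_loss([(h, r, t)], [(h, r, t_neg)], gamma=1.0)  # -> 0.0
```

Here the negative triple is already pushed a full margin away from the positive one, so the hinge term vanishes.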

4.3 Neighborhood Consensus Model

The neighborhood consensus student model has a similar structure to the knowledge embedding teacher model; the only difference is the learning objective. While the teacher model learns relational knowledge from triples, the student model learns local graph structure information from neighbors. In order to calculate the neighborhood similarities between entities from different KGs, we utilize an energy function based on the distance of neighborhood-aggregated entity embeddings. Specifically, given an entity pair (e_1, e_2), the similarity measure is denoted as d(e_1, e_2) = ||h_{e_1} − h_{e_2}||_2. The learning objective of the neighborhood consensus student model is to minimize the margin-based ranking loss in Equation (5):

L_A = Σ_{(e_1,e_2) ∈ S} Σ_{(e_1',e_2') ∈ S'} [d(e_1, e_2) + γ_A − d(e_1', e_2')]_+   (5)

where [·]_+ denotes the positive part of an element, γ_A is a margin hyper-parameter, and S and S' represent the sets of positive and negative entity pairs, respectively. For negative sampling, we choose the nearest neighbors as the negative corresponding entities rather than random sampling. Specifically, given an existing pair (e_1, e_2), we replace e_1 (e_2) with the entity e_1' (e_2') that is closest to e_1 (e_2) in embedding distance.
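The alignment loss of Equation (5) and the nearest-neighbor negative sampling described above can be sketched as follows (the embeddings are toy values; in the model they would come from the highway gated GCN):

```python
import numpy as np

def nearest_negative(e_emb, candidate_embs):
    """Hard negative sampling: index of the candidate closest in L2 distance."""
    dists = np.linalg.norm(candidate_embs - e_emb, axis=1)
    return int(np.argmin(dists))

def align_loss(pos_pairs, neg_pairs, gamma):
    """Margin-based ranking loss over seed entity pairs."""
    loss = 0.0
    for (e1, e2), (n1, n2) in zip(pos_pairs, neg_pairs):
        d_pos = np.linalg.norm(e1 - e2)   # aligned pair should be close
        d_neg = np.linalg.norm(n1 - n2)   # negative pair should be far
        loss += max(0.0, d_pos + gamma - d_neg)
    return loss

e = np.array([0.0, 0.0])
candidates = np.array([[3.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
idx = nearest_negative(e, candidates)     # picks the nearest candidate
```

Choosing the nearest candidate rather than a random one yields harder negatives, which is exactly the sampling strategy stated above.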

4.4 Relational Knowledge Distillation

To integrate relational knowledge and neighborhood information via knowledge distillation, we need to address two issues: (i) how to learn the structural information along with contents via the distillation approach; (ii) how to design a learning objective that minimizes the difference between the teacher and student models. In particular, we need to dynamically adjust the contributions of knowledge embedding and neighborhood consensus during distillation.

To keep the relational knowledge in the process of distillation, we borrow the idea of the energy function proposed in [27]. The basic idea is to first randomly sample n instances from the training instances; an energy function ψ is then applied to these instances to describe the relationship between them. The loss is calculated as Equation (6):

L_RKD = Σ_{(x_1,…,x_n)} ℓ(ψ(t_1,…,t_n), ψ(s_1,…,s_n))   (6)

where x_1,…,x_n are randomly sampled training instances; ℓ is the distance measure of potential relational knowledge between the teacher and student models; and t_i and s_i are the output representations of input x_i in the teacher and student models, respectively. With such a training loss, the relational knowledge can be kept in the process of distillation.

Following this formulation, we utilize the same L2 distance measure as the teacher model between head and tail entities in triples as the energy function to keep the potential relational information, as shown in Equation (7):

ψ(h, t) = (1/μ) ||h − t||_2   (7)

where μ is a normalization factor to scale the distances of different vector spaces to the same scale. We empirically define μ as Equation (8):

μ = (1/|T|) Σ_{(h,r,t) ∈ T} ||h − t||_2   (8)

where T represents all triples in the two KGs and |T| is the number of triples in T.

In order to improve robustness to outliers, we propose to utilize the Huber loss [15] rather than the MSE loss as the difference measure between the teacher and the student, which is shown as Equation (9):

ℓ_δ(x, y) = ½(x − y)²  if |x − y| ≤ 1;  |x − y| − ½  otherwise   (9)
Therefore, the objective of knowledge distillation is specified as Equation (10):

L_KD = Σ_{(h,r,t) ∈ T} ℓ_δ(ψ(h_T, t_T), ψ(h_S, t_S))   (10)

where the subscripts T and S denote the embedding representations of the teacher and student models, respectively.
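Putting Equations (7)-(10) together, the distillation loss over teacher and student embeddings can be sketched as follows (each model's normalization factor is computed as in Equation (8) over its own space; the toy embeddings are invented):

```python
import numpy as np

def huber(x, y):
    """Huber loss: quadratic near zero, linear in the tails (Equation (9))."""
    d = abs(x - y)
    return 0.5 * d * d if d <= 1.0 else d - 0.5

def rkd_loss(teacher_pairs, student_pairs):
    """Distance-wise relational distillation over (head, tail) embedding pairs.
    Each model's head-tail distances are scaled by its own mean distance
    so that vector spaces of different scales become comparable."""
    mu_t = np.mean([np.linalg.norm(h - t) for h, t in teacher_pairs])
    mu_s = np.mean([np.linalg.norm(h - t) for h, t in student_pairs])
    return sum(huber(np.linalg.norm(ht - tt) / mu_t,
                     np.linalg.norm(hs - ts) / mu_s)
               for (ht, tt), (hs, ts) in zip(teacher_pairs, student_pairs))

# student space is a uniformly scaled copy of the teacher space, so the
# normalized relational structure matches and the loss is zero
teacher = [(np.array([0.0, 0.0]), np.array([1.0, 0.0])),
           (np.array([0.0, 0.0]), np.array([3.0, 0.0]))]
student = [(2 * h, 2 * t) for h, t in teacher]
loss = rkd_loss(teacher, student)  # -> 0.0
```

The zero loss for a uniformly rescaled student illustrates the point of the normalization in Equation (8): the distillation penalizes differences in relational structure, not differences in absolute scale.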

Next, we introduce how to dynamically adjust the contributions of the alignment loss (for neighborhood consensus) and the knowledge distillation loss (for knowledge embedding) in the learning objective of the student model. The balance is controlled with the hyper-parameter Temperature, denoted as τ. As mentioned above, these two kinds of information may be adversarial due to different optimization directions. To address this problem, τ should be dynamically adjusted during the training process. Intuitively, in the early stage of training the model should focus on learning the relational knowledge, where L_KD is more important; while in the late stage, when the two losses become very small, it should concentrate on the alignment loss to avoid overfitting to relational knowledge. Therefore, instead of using a static value of τ, we set an adaptive value of τ as shown in Equation (11):

τ = L̃_KD / (L̃_KD + L̃_A)   (11)

where L̃ denotes the loss value without gradient and τ ∈ (0, 1).

Therefore, the final learning objective of our model is shown as Equation (12):

L = L_A + τ · L_KD   (12)

where L_A denotes the neighborhood consensus alignment loss as Equation (5) shows.
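The adaptive weighting and the final objective can be sketched as below. The ratio form of the temperature here is one plausible reading of the adaptive mechanism described in the text, and in a real implementation the loss values fed to the temperature would be detached from the gradient:

```python
def adaptive_temperature(loss_kd, loss_align, eps=1e-12):
    """Weight of the distillation term, computed from (detached) loss values:
    close to 1 while the distillation loss dominates, shrinking as it is learned."""
    return loss_kd / (loss_kd + loss_align + eps)

def total_loss(loss_align, loss_kd):
    """Final objective: alignment loss plus temperature-weighted KD loss."""
    tau = adaptive_temperature(loss_kd, loss_align)
    return loss_align + tau * loss_kd

# early training: the KD loss dominates, so tau is close to 1
early = total_loss(loss_align=1.0, loss_kd=9.0)
# late training: the KD loss has shrunk, so the alignment loss dominates
late = total_loss(loss_align=0.5, loss_kd=0.05)
```

This reproduces the behavior described above: the distilled relational knowledge steers early training, while its contribution decays automatically once the distillation loss becomes small.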

5 Experiment

5.1 Datasets

Following many previous studies, we evaluate all methods on the popular DBP15K and DWY100K datasets.

  • DBP15K [34] is composed of three cross-lingual datasets derived from DBpedia, representing three language pairs of KGs: DBP_ZH-EN (Chinese to English), DBP_JA-EN (Japanese to English), and DBP_FR-EN (French to English). Each dataset consists of two KGs with hundreds of thousands of relation triples and 15K pre-aligned seed entity pairs along with seed relation pairs.

  • DWY100K [35] contains two large-scale cross-domain datasets derived from DBpedia, Wikidata and YAGO3, denoted as DWY_WD (DBpedia to Wikidata) and DWY_YG (DBpedia to YAGO3). Similar to DBP15K, DWY100K contains 100K seed entity alignments in each dataset.

The detailed statistics are shown in Table 1. In all experiments, we utilize 30% of seed alignments in training, which is consistent with previous studies.

Datasets #Ent. #Rel. #Rel. Triples
DBP_ZH-EN ZH 66,469 2,830 153,929
EN 98,125 2,317 237,674
DBP_JA-EN JA 65,744 2,043 164,373
EN 95,680 2,096 233,319
DBP_FR-EN FR 66,858 1,379 192,191
EN 105,889 2,209 278,590
DWY_WD DBpedia 100,000 330 463,294
Wikidata 100,000 220 448,774
DWY_YG DBpedia 100,000 302 428,952
YAGO3 100,000 31 502,563
Table 1: Statistics of the datasets.

5.2 Baselines

To better verify the effectiveness of our proposed approach, we compare it with several state-of-the-art embedding-based models. For knowledge embedding oriented methods, we choose MTransE [7], IPTransE [60], SEA [28] and RSN4EA [10] as the representatives, which align entities based on relational knowledge. For neighborhood consensus oriented methods, we choose GCN-Align [44], MuGNN [5], KECG [18] and AliNet [36], which apply graph neural networks to aggregate neighborhood information for alignment. Among them, KECG explicitly models relational knowledge and neighborhood information with two learning objectives as our RKDEA does, making it the most closely related to our proposed approach.

Since our work focuses on transferring relational knowledge rather than developing a better model for knowledge graph entity alignment, we only utilize structure information for the baselines for a fair comparison, following the comprehensive survey [55]. Although there are other studies on this topic [35, 29, 25, 51, 37, 22], they mainly focus on utilizing other information, such as attributes and descriptions, or on applying data enhancement, e.g. bootstrapping strategies and machine translation. Such approaches are orthogonal to our work and could also be enhanced on the basis of RKDEA. Therefore, we exclude the comparison with them here.

For the ablation study, we design two variants of our RKDEA: RKDEA (w/o RKD), which does not employ relational knowledge distillation, and RKDEA (w/o Temp.), which does not incorporate the adaptive temperature factor to control the training process.

5.3 Implementation Details

In the experiments, we choose the hyper-parameters by grid search as follows: the learning rate is selected from {0.0001, 0.0005, 0.001, 0.005, 0.01}, and the margin hyper-parameters γ and γ_A are selected from {1.0, 2.0, 3.0} for the knowledge embedding teacher model and the neighborhood consensus student model, respectively. Following previous studies [46, 47, 48], the dimensions of the embedding vectors in both the teacher and student models are set to 300, and we use pre-trained GloVe embeddings of entity names as input features. For DBP15K, the negative samples are updated every 50 epochs; for DWY100K, they are updated every 10 epochs. Following previous work, we use Hits@1, Hits@10 and MRR as the evaluation metrics.

5.4 Results

Models DBP_ZH-EN DBP_JA-EN DBP_FR-EN
Hits@1 Hits@10 MRR Hits@1 Hits@10 MRR Hits@1 Hits@10 MRR
MTransE 0.308 0.614 0.364 0.279 0.575 0.349 0.244 0.556 0.335
IPTransE 0.406 0.735 0.516 0.367 0.693 0.474 0.333 0.685 0.451
SEA 0.424 0.796 0.548 0.385 0.783 0.518 0.400 0.797 0.533
RSN4EA 0.508 0.745 0.591 0.507 0.737 0.590 0.516 0.768 0.605
GCN-Align 0.413 0.744 0.549 0.399 0.745 0.546 0.373 0.745 0.532
MuGNN 0.494 0.844 0.611 0.501 0.857 0.621 0.495 0.870 0.621
KECG 0.478 0.835 0.598 0.490 0.844 0.610 0.486 0.851 0.610
AliNet 0.539 0.826 0.628 0.549 0.831 0.645 0.552 0.852 0.657
RKDEA (w/o RKD) 0.438 0.802 0.564 0.462 0.811 0.574 0.446 0.822 0.583
RKDEA (w/o Temp.) 0.573 0.857 0.677 0.576 0.873 0.681 0.564 0.862 0.673
RKDEA 0.603 0.872 0.703 0.597 0.881 0.698 0.622 0.912 0.721
Table 2: Performance Comparison on DBP15K datasets. The results are split into three parts by full lines. The upper part includes knowledge embedding methods; the middle part includes neighborhood consensus methods; while the bottom part includes three variants of our approach for the purpose of ablation study.
Models DWY_WD DWY_YG
Hits@1 Hits@10 MRR Hits@1 Hits@10 MRR
MTransE 0.281 0.520 0.363 0.252 0.493 0.334
IPTransE 0.349 0.638 0.447 0.297 0.558 0.386
SEA 0.518 0.802 0.616 0.516 0.736 0.592
RSN4EA 0.607 0.793 0.673 0.689 0.878 0.756
GCN-Align 0.506 0.772 0.600 0.597 0.838 0.682
MuGNN 0.616 0.897 0.714 0.741 0.937 0.810
KECG 0.632 0.900 0.728 0.728 0.915 0.798
AliNet 0.690 0.908 0.766 0.786 0.943 0.841
RKDEA (w/o RKD) 0.577 0.848 0.659 0.671 0.889 0.751
RKDEA (w/o Temp.) 0.703 0.921 0.773 0.818 0.961 0.870
RKDEA 0.756 0.973 0.821 0.823 0.971 0.879
Table 3: Performance Comparison on DWY100K datasets. The results are split into three parts by full lines. The upper part includes knowledge embedding methods; the middle part includes neighborhood consensus methods; while the bottom part includes three variants of our approach for the purpose of ablation study.

Tables 2 and 3 show the experimental results on DBP15K and DWY100K, respectively. It can be seen that RKDEA achieves promising performance on both cross-lingual and cross-domain datasets, indicating the effectiveness of our proposed framework. Moreover, RKDEA achieves significant improvement over the compared baseline methods on the DBP15K datasets. The reason is that those baselines fail to incorporate complex relational knowledge due to the sparsity of DBP15K, while our RKDEA is capable of exploiting fine-grained relational knowledge. Although KECG and HyperKA also explicitly learn knowledge embedding and neighborhood consensus, they fail to provide an effective way to integrate the two different objectives. Meanwhile, with the help of knowledge distillation, RKDEA can effectively and flexibly incorporate relational knowledge into the neighborhood consensus model and thus achieves much better performance.

On the large-scale DWY100K datasets, RKDEA also significantly outperforms all other methods. Since DWY100K is much larger than DBP15K, with fewer relations and more similar graph structures, neighborhood consensus plays a more important role, and the performance gain from knowledge distillation is smaller than on DBP15K. Even so, RKDEA still reports reasonable results due to the properly designed techniques.

5.5 Effectiveness of Knowledge Distillation

Models DBP_ZH-EN DBP_JA-EN DBP_FR-EN
Hits@1 Hits@10 MRR Hits@1 Hits@10 MRR Hits@1 Hits@10 MRR
KECG (w/o KE) 0.430 0.791 0.551 0.446 0.807 0.567 0.432 0.815 0.559
KECG (w/ Init.) 0.481 0.823 0.589 0.473 0.823 0.583 0.461 0.826 0.574
KECG 0.478 0.835 0.598 0.490 0.844 0.610 0.486 0.851 0.610
KECG (w/ RKD) 0.513 0.853 0.627 0.516 0.861 0.633 0.535 0.877 0.651
HyperKA (w/o KE) 0.518 0.814 0.623 0.535 0.834 0.640 0.529 0.859 0.645
HyperKA (w/ Init.) 0.569 0.847 0.659 0.551 0.853 0.659 0.572 0.878 0.681
HyperKA 0.572 0.865 0.678 0.564 0.865 0.673 0.597 0.891 0.704
HyperKA (w/ RKD) 0.581 0.868 0.693 0.584 0.879 0.691 0.601 0.894 0.711
Table 4: Results of KECG and HyperKA with different knowledge embedding methods on DBP15K.

To further analyze the importance of relational knowledge and the effectiveness of relational knowledge distillation, we integrate our knowledge distillation technique with the KECG and HyperKA models, producing four variants with different knowledge embedding methods for each model. Specifically, (w/o KE) denotes variants without knowledge embedding, (w/ Init.) denotes variants whose entity embeddings are initialized from a pre-trained knowledge embedding model, and (w/ RKD) denotes variants with our relational knowledge distillation. As shown in Table 4, our distillation method (w/ RKD) yields the best performance for both KECG and HyperKA.

Figure 3: Effectiveness evaluation of incorporating Relational Knowledge Distillation (RKD) on DBP15K. The solid filled columns indicate models without RKD while the slash filled columns indicate models with RKD.

In order to illustrate the performance gain brought by our proposed knowledge distillation approach, Figure 3 shows the performance comparison among the models with and without relational knowledge distillation on DBP15K. The results clearly show that by introducing relational knowledge distillation, all three models, KECG, HyperKA and RKDEA achieve significant performance gain.

These results demonstrate that our proposed methods could be adopted to improve other existing KG alignment models and therefore further prove the potential and effectiveness of our proposed relational knowledge distillation method.

5.6 Impact of Adaptive Temperature Factor

The adaptive temperature mechanism is one of the core contributions of RKDEA. To explore the effect of incorporating the temperature factor, we conduct an ablation study by comparing with RKDEA (w/o Temp.) on both cross-lingual and cross-domain datasets. Figure 4 shows the Hits@1 curves of these two models as training iterations increase. The results illustrate that RKDEA (w/o Temp.) converges faster, while RKDEA achieves higher Hits@1 at the end of the training process.

Figure 4: Impact of temperature factor in training process on DBP and DWY.

In fact, the temperature acts as a weight decay mechanism in the training process. In the early stage of training, when the GNN is not yet well trained, the distilled relational knowledge is instructive for entity alignment and quickly brings the model into a relatively good state. However, as training progresses, relational knowledge and neighborhood information may lead to different objectives. Therefore, if the contribution of distilled relational knowledge stays the same, the model falls into a trade-off between two directions: overfitting the relational knowledge while underfitting the neighborhood consensus, or vice versa. Consequently, involving excessive information can be harmful, while the adaptive temperature mechanism avoids this by controlling the contribution of the distilled relational knowledge.

6 Conclusion

In this paper, we study the problem of entity alignment over heterogeneous KGs. We propose a GCN based framework with knowledge distillation techniques to take advantage of complex relational knowledge by jointly learning entity embedding and neighborhood consensus. With the help of relational knowledge distillation, our model can effectively and flexibly model relational knowledge and neighborhood information. Furthermore, by automatically adjusting the temperature parameter, our proposed model can dynamically control the contributions of the different objectives and avoid overfitting. Experimental results on several popular benchmark datasets show that the proposed solutions outperform state-of-the-art methods by a clear margin.


  • [1] Ba, J., Caruana, R.: Do deep nets really need to be deep? In: NeurIPS. pp. 2654–2662 (2014)
  • [2] Bagherinezhad, H., Horton, M., Rastegari, M., Farhadi, A.: Label refinery: Improving imagenet classification through label progression. CoRR abs/1805.02641 (2018)
  • [3] Bordes, A., Usunier, N., García-Durán, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. In: NeurIPS. pp. 2787–2795 (2013)
  • [4] Bucila, C., Caruana, R., Niculescu-Mizil, A.: Model compression. In: SIGKDD. pp. 535–541 (2006)
  • [5] Cao, Y., Liu, Z., Li, C., Liu, Z., Li, J., Chua, T.: Multi-channel graph neural network for entity alignment. In: ACL. pp. 1452–1461 (2019)
  • [6] Chen, B., Zhang, J., Tang, X., Chen, H., Li, C.: Jarka: Modeling attribute interactions for cross-lingual knowledge alignment. In: PAKDD. vol. 12084, pp. 845–856 (2020)
  • [7] Chen, M., Tian, Y., Yang, M., Zaniolo, C.: Multilingual knowledge graph embeddings for cross-lingual knowledge alignment. In: IJCAI. pp. 1511–1517 (2017)
  • [8] Chen, Y., Wang, N., Zhang, Z.: Darkrank: Accelerating deep metric learning via cross sample similarities transfer. In: McIlraith, S.A., Weinberger, K.Q. (eds.) AAAI. pp. 2852–2859 (2018)
  • [9] Furlanello, T., Lipton, Z.C., Tschannen, M., Itti, L., Anandkumar, A.: Born-again neural networks. In: ICML. vol. 80, pp. 1602–1611 (2018)
  • [10] Guo, L., Sun, Z., Hu, W.: Learning to exploit long-term relational dependencies in knowledge graphs. In: ICML. vol. 97, pp. 2505–2514 (2019)
  • [11] Hao, Y., Zhang, Y., He, S., Liu, K., Zhao, J.: A joint embedding method for entity alignment of knowledge bases. In: CCKS. vol. 650, pp. 3–14 (2016)
  • [12] Hinton, G.E., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network (2015)
  • [13] Huang, Z., Wang, N.: Like what you like: Knowledge distill via neuron selectivity transfer. CoRR abs/1707.01219 (2017)
  • [14] Ji, G., He, S., Xu, L., Liu, K., Zhao, J.: Knowledge graph embedding via dynamic mapping matrix. In: ACL. pp. 687–696 (2015)
  • [15]

    Karasuyama, M., Takeuchi, I.: Nonlinear regularization path for the modified huber loss support vector machines. In: IJCNN. pp. 1–8 (2010)

  • [16] Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: ICLR (2017)
  • [17]

    Lee, J., Kim, H., Lee, J., Yoon, S.: Transfer learning for deep learning on graph-structured data. In: AAAI. pp. 2154–2160 (2017)

  • [18] Li, C., Cao, Y., Hou, L., Shi, J., Li, J., Chua, T.: Semi-supervised entity alignment via joint knowledge embedding model and cross-graph model. In: EMNLP. pp. 2723–2732 (2019)
  • [19] Li, Y., Li, J., Suhara, Y., Wang, J., Hirota, W., Tan, W.: Deep entity matching: Challenges and opportunities. ACM J. Data Inf. Qual. 13(1), 1:1–1:17 (2021)
  • [20] Lin, Y., Liu, Z., Sun, M., Liu, Y., Zhu, X.: Learning entity and relation embeddings for knowledge graph completion. In: AAAI. pp. 2181–2187 (2015)
  • [21] Liu, Y., Ao, X., Dong, L., Zhang, C., Wang, J., He, Q.: Spatiotemporal activity modeling via hierarchical cross-modal embedding. IEEE Trans. Knowl. Data Eng. 34(1), 462–474 (2022)
  • [22] Liu, Z., Cao, Y., Pan, L., Li, J., Chua, T.: Exploring and evaluating attributes, values, and structures for entity alignment. In: EMNLP. pp. 6355–6364 (2020)
  • [23]

    Lu, J., Lin, C., Wang, J., Li, C.: Synergy of database techniques and machine learning models for string similarity search and join. In: CIKM. pp. 2975–2976 (2019)

  • [24] Mahdisoltani, F., Biega, J., Suchanek, F.M.: YAGO3: A knowledge base from multilingual wikipedias. In: CIDR (2015)
  • [25] Mao, X., Wang, W., Xu, H., Lan, M., Wu, Y.: MRAEA: an efficient and robust entity alignment approach for cross-lingual knowledge graph. In: WSDM. pp. 420–428 (2020)
  • [26] Nie, H., Han, X., Sun, L., Wong, C.M., Chen, Q., Wu, S., Zhang, W.: Global structure and local semantics-preserved embeddings for entity alignment. In: IJCAI. pp. 3658–3664 (2020)
  • [27] Park, W., Kim, D., Lu, Y., Cho, M.: Relational knowledge distillation. In: CVPR. pp. 3967–3976 (2019)
  • [28] Pei, S., Yu, L., Hoehndorf, R., Zhang, X.: Semi-supervised entity alignment via knowledge graph embedding with awareness of degree difference. In: WWW. pp. 3130–3136 (2019)
  • [29] Pei, S., Yu, L., Zhang, X.: Improving cross-lingual entity alignment via optimal transport. In: IJCAI. pp. 3231–3237 (2019)
  • [30] Rocco, I., Cimpoi, M., Arandjelovic, R., Torii, A., Pajdla, T., Sivic, J.: Neighbourhood consensus networks. In: NeurIPS. pp. 1658–1669 (2018)
  • [31] Romero, A., Ballas, N., Kahou, S.E., Chassang, A., Gatta, C., Bengio, Y.: Fitnets: Hints for thin deep nets. In: ICLR (2015)
  • [32] Schlichtkrull, M.S., Kipf, T.N., Bloem, P., van den Berg, R., Titov, I., Welling, M.: Modeling relational data with graph convolutional networks. In: ESWC. pp. 593–607 (2018)
  • [33] Srivastava, R.K., Greff, K., Schmidhuber, J.: Highway networks (2015)
  • [34] Sun, Z., Hu, W., Li, C.: Cross-lingual entity alignment via joint attribute-preserving embedding. In: ISWC. vol. 10587, pp. 628–644 (2017)
  • [35] Sun, Z., Hu, W., Zhang, Q., Qu, Y.: Bootstrapping entity alignment with knowledge graph embedding. In: IJCAI. pp. 4396–4402 (2018)
  • [36] Sun, Z., Wang, C., Hu, W., Chen, M., Dai, J., Zhang, W., Qu, Y.: Knowledge graph alignment network with gated multi-hop neighborhood aggregation. In: AAAI. pp. 222–229 (2020)
  • [37] Tang, X., Zhang, J., Chen, B., Yang, Y., Chen, H., Li, C.: BERT-INT: A bert-based interaction model for knowledge graph alignment. In: IJCAI. pp. 3174–3180 (2020)
  • [38] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In: NeurIPS. pp. 1195–1204 (2017)
  • [39] Tian, B., Zhang, Y., Wang, J., Xing, C.: Hierarchical inter-attention network for document classification with multi-task learning. In: IJCAI. pp. 3569–3575 (2019)
  • [40] Wang, J., Lin, C., Li, M., Zaniolo, C.: Boosting approximate dictionary-based entity extraction with synonyms. Inf. Sci. 530, 1–21 (2020)
  • [41] Wang, J., Lin, C., Zaniolo, C.: Mf-join: Efficient fuzzy string similarity join with multi-level filtering. In: ICDE. pp. 386–397 (2019)
  • [42]

    Wang, Z., Zhang, J., Feng, J., Chen, Z.: Knowledge graph embedding by translating on hyperplanes. In: AAAI. pp. 1112–1119 (2014)

  • [43] Wang, Z., Li, J., Tang, J.: Boosting cross-lingual knowledge linking via concept annotation. In: IJCAI. pp. 2733–2739 (2013)
  • [44] Wang, Z., Lv, Q., Lan, X., Zhang, Y.: Cross-lingual knowledge graph alignment via graph convolutional networks. In: EMNLP. pp. 349–357 (2018)
  • [45] Wu, J., Zhang, Y., Wang, J., Lin, C., Fu, Y., Xing, C.: Scalable metric similarity join using mapreduce. In: ICDE. pp. 1662–1665 (2019)
  • [46] Wu, Y., Liu, X., Feng, Y., Wang, Z., Yan, R., Zhao, D.: Relation-aware entity alignment for heterogeneous knowledge graphs. In: IJCAI. pp. 5278–5284 (2019)
  • [47] Wu, Y., Liu, X., Feng, Y., Wang, Z., Zhao, D.: Jointly learning entity and relation representations for entity alignment. In: EMNLP. pp. 240–249 (2019)
  • [48] Wu, Y., Liu, X., Feng, Y., Wang, Z., Zhao, D.: Neighborhood matching network for entity alignment. In: ACL. pp. 6477–6487 (2020)
  • [49] Xu, K., Wang, L., Yu, M., Feng, Y., Song, Y., Wang, Z., Yu, D.: Cross-lingual knowledge graph alignment via graph matching neural network. In: ACL. pp. 3156–3161 (2019)
  • [50] Yang, J., Zhang, Y., Zhou, X., Wang, J., Hu, H., Xing, C.: A hierarchical framework for top-k location-aware error-tolerant keyword search. In: ICDE. pp. 986–997 (2019)
  • [51] Yang, K., Liu, S., Zhao, J., Wang, Y., Xie, B.: COTSAE: co-training of structure and attribute embeddings for entity alignment. In: AAAI. pp. 3025–3032 (2020)
  • [52] Ye, R., Li, X., Fang, Y., Zang, H., Wang, M.: A vectorized relational graph convolutional network for multi-relational network alignment. In: IJCAI. pp. 4135–4141 (2019)
  • [53] Yim, J., Joo, D., Bae, J., Kim, J.: A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. In: CVPR. pp. 7130–7138 (2017)
  • [54]

    Zagoruyko, S., Komodakis, N.: Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. In: ICLR (2017)

  • [55] Zeng, K., Li, C., Hou, L., Li, J.Z., Feng, L.: A comprehensive survey of entity alignment for knowledge graphs. AI Open 2, 1–13 (2021)
  • [56] Zhang, Q., Sun, Z., Hu, W., Chen, M., Guo, L., Qu, Y.: Multi-view knowledge graph embedding for entity alignment. In: IJCAI. pp. 5429–5435 (2019)
  • [57] Zhang, Y., Chen, Y., Yang, J., Wang, J., Hu, H., Xing, C., Zhou, X.: Clustering enhanced error-tolerant top-k spatio-textual search. World Wide Web 24(4), 1185–1214 (2021)
  • [58]

    Zhang, Y., Wu, J., Wang, J., Xing, C.: A transformation-based framework for KNN set similarity search. IEEE Trans. Knowl. Data Eng.

    32(3), 409–423 (2020)
  • [59]

    Zhao, K., Zhang, Y., Wang, Z., Yin, H., Zhou, X., Wang, J., Xing, C.: Modeling patient visit using electronic medical records for cost profile estimation. In: Pei, J., Manolopoulos, Y., Sadiq, S.W., Li, J. (eds.) DASFAA. pp. 20–36 (2018)

  • [60] Zhu, H., Xie, R., Liu, Z., Sun, M.: Iterative entity alignment via joint knowledge embeddings. In: IJCAI. pp. 4258–4264 (2017)
  • [61] Zhu, Q., Zhou, X., Wu, J., Tan, J., Guo, L.: Neighborhood-aware attentional representation for multilingual knowledge graphs. In: IJCAI. pp. 1943–1949 (2019)