TTMF: A Triple Trustworthiness Measurement Frame for Knowledge Graphs

09/25/2018 ∙ by Shengbin Jia, et al. ∙ 0

A knowledge graph (KG) uses triples to describe facts in the real world and has been widely used in the intelligent analysis and understanding of big data. When a KG is constructed, especially automatically, noise and errors are inevitably introduced and much knowledge is missed. However, learning tasks based on the KG and the applications built on them typically assume that the knowledge in the KG is completely correct, which inevitably propagates these errors. In this paper, we therefore establish a unified knowledge graph triple trustworthiness measurement framework that calculates a confidence value for each triple, quantifying its semantic correctness and the degree to which the fact it expresses is true. The framework can be used not only to detect and eliminate errors in the KG but also to identify new triples that improve it. It is a crisscrossing neural network structure that synthesizes the internal semantic information of the triples and the global inference information of the KG, measuring and fusing trustworthiness at three levels: the entity level, the relationship level, and the KG global level. We conducted experiments on the common dataset FB15K (from Freebase) and analyzed the validity of the model's output confidence values. We also tested the framework on knowledge graph error detection and completion tasks. The experimental results show that, compared with other models, our model achieves significant and consistent improvements on these tasks, further confirming its capabilities.




1 Introduction

In the era of big data, people face enormous challenges in acquiring information and knowledge. With its powerful semantic processing capabilities and open organization capabilities, the knowledge graph (KG) lays the foundation for knowledge-based organization and intelligent applications in the Internet age. In recent years, the research and application of large-scale knowledge graphs have attracted increasing attention in academic and industrial circles. A knowledge graph aims to describe the various entities or concepts that exist in the objective world and the relationships between them, which together constitute a huge semantic network [1]. It usually stores knowledge in the form of triples (head entity, relationship, tail entity), which can be simplified to $(h, r, t)$.

Early KGs were mainly constructed by manual labeling [2] [3], which requires a large amount of human annotation or expert supervision and is extremely labor-intensive and time-consuming. However, real-world knowledge is vast and is produced very quickly, so manual annotation can no longer keep pace with the updating and growth of the KG [3]. Therefore, an increasing number of researchers are committed to automatically extracting structured information directly from unstructured web pages, for example with open information extraction [4] [5] [6] and NELL [7]. Automated construction now accounts for a large proportion of KG building. Many large-scale knowledge graphs are widely used, such as Freebase [8], DBpedia [9], and Wikidata.

However, noise and errors are inevitably introduced in the process of automation. References [10] and [11] verify the existence of errors in the KG and the problems they cause. Existing knowledge-driven learning tasks or applications, such as knowledge representation learning and reasoning [12] [13], knowledge graph completion [14], knowledge graph error detection [11] [15] [16], intelligent question answering [17] and information retrieval [18], assume that the knowledge in the existing KG is completely correct and may therefore propagate these errors [19] [20].

For a piece of knowledge in the KG, especially one from a professional field, it is difficult to determine clearly whether it is true when it has not been tested in practice or strictly proven mathematically. For this reason, we introduce the concept of triple trustworthiness for the KG. The triple trustworthiness indicates the degree of certainty that the knowledge expressed by the triple is true. The confidence value of a triple is set to lie within the interval [0, 1]. The smaller the value is, the greater the probability that the triple is in error. Based on this, we can find possible errors in the existing KG and improve its quality. At the same time, for an unseen triple outside the KG, the closer the value is to 1, the greater the probability that the triple describes a true fact, so new correct triples can be identified and added to the KG. In this way, the coverage of the KG can also be improved.

The goal of this paper is to study how to use appropriate methods to evaluate the trustworthiness of a knowledge triple. In the KG, the same relationship can occur between different entities, and multiple relationships can associate with the same entity at the same time. There are intricate and complex relationships among the triples. Based on the above characteristics, we propose a unified knowledge graph triple trustworthiness measurement framework (TTMF), which is a crisscrossed neural network-based structure. We measure the trustworthy probability that the knowledge expressed by the triple may actually exist, from multiple levels, including the entity level (correlation strength between an entity pair), the relationship level (translation invariance of relation vectors), and the KG global level (reasoning proof of triple related reachable paths). Corresponding to different levels, we generate three different questions and focus on solving them by designing three kinds of Estimators to form a pool. Next, a comprehensive triple confidence value is output through a Fusioner.

The main contributions of this article include:

  • We propose a unified knowledge triple trustworthiness measurement framework for the KG that makes comprehensive use of both the semantic information inside the triples and the global inference information of the KG. It measures trustworthiness at three levels (the entity level, the relationship level, and the KG global level) and integrates the results into a single confidence value.

  • The framework has good scalability: more Estimators can be added flexibly to further enhance its computing power, and it can be applied in a wide range of scenarios. We test the confidence values calculated by the framework on knowledge graph error detection and knowledge graph completion, and the experiments show good results on both tasks.

  • We improve the ResourceRank algorithm so that it better measures the potential strength of the relationships between entities. We also propose a path selection algorithm based on semantic distance, which can effectively evaluate the reliability of a path in the KG. Both algorithms benefit our framework.

This paper is organized as follows. Section 2 reviews related work. Section 3 describes the model architecture used in this work. Section 4 describes the experiments, results, and discussion. Section 5 provides the conclusion.

Figure 1: The unified triple trustworthiness measurement framework for KG.

2 Related Work

The concept of “Trustworthiness” has been applied to knowledge graph related tasks to some extent. Reference [19] proposed a triple confidence awareness knowledge representation learning framework, which improved the knowledge representation effect. There were three kinds of triple credibility calculation methods using the internal structure information of the KG. This method used only the information provided by the relationship, ignoring the related entities. NELL [7] constantly iterated the extracting template and kept learning new knowledge. It used heuristics to assign confidence values to candidate relations and continuously updated the values through the process of learning. This method was relatively simple but lacked semantic considerations. Dong et al. [3] constructed a large-scale probabilistic knowledge base known as Knowledge Vault, where the reliable probability of a triple was a fusion. Several extractors provided a reliability value; meanwhile, a probability could be computed by the prior models, which were fitted with existing knowledge repositories in Freebase. This method was tailored for their knowledge base construction and did not have good generalization capabilities. Li et al. [14] used the neural network method to embed the words in ConceptNet and provide confidence scores to unseen tuples to complete the knowledge base. This method considered only the triples themselves, ignoring the global information provided by the knowledge base.

The above models used the triple trustworthiness to solve various specific tasks. It can be seen that the triple trustworthiness measurement is important for applications and research. However, at present, there is a lack of systematic research on the knowledge triple trustworthiness calculation method. Our work is devoted to this basic research and proposes a unified measurement framework that can facilitate a variety of tasks.

In this article, we verify the effect of the triple trustworthiness on two tasks: knowledge graph error detection and knowledge graph completion. Next, we introduce the related work on these tasks.

The Knowledge graph error detection (KGED) task is dedicated to identifying whether a triple is in error. The existence of noise and errors in the KG is unavoidable. Therefore, error detection is especially important for KG construction and application. Traditional methods [7] [15] [16] were still based on manual detection, and the cost was considerable. Recently, some people have begun to study automatic KG error detection methods [3] [21] [10]. The error detection can actually be regarded as a special case of the trustworthiness measurement, which is divided into two kinds of Boolean value types: “true (trusted)” and “error (untrusted)”.

Knowledge graph completion (KGC) aims to predict links between entities in order to find new, unseen relation triples from existing knowledge graphs. During completion, we must not only determine whether there is a relationship between two entities but also predict the specific type of relationship. Previous methods were mainly based on path ranking algorithms [22] [23] or probabilistic graphical models [24] [25]. In recent years, embedding-based methods [14] [26] [27] have gained a significant amount of attention. They predict whether two entities have a potential relationship with simple functions of the corresponding embeddings, which makes them efficient and promising.

Finally, we introduce knowledge representation learning (KRL), which is the preprocessing part of our framework construction. Knowledge triples are formal expressions of facts described in natural language. To input triples into the models, we need to vectorize them, which is the purpose of KRL. It aims to project the entities and relations in the KG into dense, real-valued, low-dimensional semantic embeddings. Based on these embeddings, we can efficiently measure the semantic correlations of entities and relations. KRL is not only the foundation of our model construction but can also be applied directly to the tasks of knowledge graph error detection and knowledge graph completion. KRL has been a research hotspot in recent years. The main models include TransE [26], TransH [28], TransR [29], TransD [30], PTransE [31], ComplEx [32] and others.

3 The Triple Trustworthiness Measurement Framework

The unified triple trustworthiness measurement framework is shown in figure 1. It is a crisscrossing neural network-based structure. Longitudinally, it can be divided into two levels. The upper level is a pool of multiple trustworthiness estimation cells (Estimators). The outputs of these Estimators form the input of the lower-level fusion device (Fusioner). The Fusioner is a multi-layer perceptron (MLP) that generates the final confidence value for each triple. Viewed laterally, for a given triple $(h, r, t)$, we consider the triple trustworthiness at three levels and correspondingly answer three hierarchical questions. 1) Is there a possible relationship between the entity pair $(h, t)$? 2) Can a certain relationship $r$ occur between the entity pair $(h, t)$? 3) From a global perspective, can other relevant triples in the KG infer that the triple is trustworthy? For these questions we designed three kinds of Estimators, as described below.

3.1 Is there a possible relationship between the entity pairs?

We propose an algorithm named ResourceRank to measure the likelihood that some as yet undetermined relationship occurs between a given entity pair $(h, t)$. This likelihood is an important piece of information for evaluating the trustworthiness of the triple. If a pair of entities is only very weakly related, the trustworthiness of any triple formed by that entity pair is greatly compromised.

Figure 2: The graph of resource allocation in the ResourceRank algorithm.

As shown in figure 2, there are dense edges (relationships) between node (entity) $h$ and node $t$; that is, there is a high association strength between $h$ and $t$, which shows that the likelihood of a relationship between them should be great. To characterize the association strength between an entity pair, we use the idea of resource allocation [19] [31] [33] [34] to design the ResourceRank algorithm. The algorithm assumes that the stronger the association between an entity pair, the more resources are passed from the head entity $h$ through the graph to the tail entity $t$. The amount of resources passed therefore reflects the association strength between the entities. The ResourceRank algorithm mainly includes three steps:

  • Constructing a directed diagram centered on the head entity $h$.

  • Iterating the flow of resources in the diagram until it converges, then calculating the resource retention value of the tail entity.

  • Synthesizing other features and constructing feature vectors as framework input.

Specific details are described below:

Each entity is abstracted into a node. If there is a relationship between two entities, a directed edge exists between them. Therefore, the KG can be mapped to a directed graph, as shown in figure 2. We start from the head entity $h$ and search the graph along the direction of the edges to obtain a subgraph centered on $h$. This subgraph has the following characteristics. (1) It is weakly connected; that is, every node in the subgraph can be reached starting from $h$. (2) In the initial state, the resource amount of $h$ is 1 and that of every other node is 0, and the resources of all nodes in the subgraph always sum to 1. (3) There may be multiple relationships between an entity pair, but only one directed edge represents them in the subgraph. Depending on the number of these relationships, each edge has a different width. The larger the width is, the more resource flows through the edge. (4) To facilitate the subsequent operations, we cap the search depth at a fixed value to limit the range of the subgraph.
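The subgraph construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `build_subgraph`, the `(head, relation, tail)` tuple format, and the use of a breadth-first traversal (rather than a deep search) are choices made here for brevity.

```python
from collections import defaultdict, deque


def build_subgraph(triples, head, max_depth):
    """Collect the depth-limited directed subgraph reachable from `head`.

    Returns (edges, depth): edges maps source -> {target: width}, where the
    width counts the parallel relations merged into one directed edge, and
    depth maps each reached node to its distance from `head`.
    """
    # Merge parallel relations between the same ordered pair into one edge.
    width = defaultdict(lambda: defaultdict(int))
    for h, _relation, t in triples:
        width[h][t] += 1

    depth = {head: 0}
    edges = {}
    queue = deque([head])
    while queue:
        node = queue.popleft()
        if depth[node] >= max_depth:
            continue  # do not expand nodes at the depth limit
        for nxt, w in width[node].items():
            edges.setdefault(node, {})[nxt] = w
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return edges, depth
```

Nodes that are not reachable from the head within the depth limit simply do not appear in `depth`, which matches the convention that a node outside the subgraph holds no resource.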

The resource owned by node $h$ flows through all associated paths to each entity node in the subgraph. The total amount of resources aggregated into the tail entity $t$ through one or more paths indicates how much information is transferred from $h$ to $t$. The larger this resource value is, the more likely it is that a relationship exists between $h$ and $t$. If a node does not exist in this directed subgraph, then its resource is 0.

Next, we simulate the resource flow in the directed subgraph until it reaches a steady distribution. The value of the resource on the tail entity $t$ at that point is denoted $R(t \mid h)$. We use the PageRank [35] [36] algorithm to iterate the information flow, and the resource $R(t \mid h)$ of a node $t$ is calculated as follows:

$$R(t \mid h) = \frac{1 - d}{N} + d \sum_{e_i \in M_t} \frac{R(e_i \mid h)}{OD(e_i)}$$

where $M_t$ is the set of all nodes that have outgoing links to the node $t$, $OD(e_i)$ is the out-degree of the node $e_i$, $N$ is the total number of nodes, and the damping factor $d$ is generally taken as 0.85.
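The iteration can be illustrated with a short, self-contained sketch of a PageRank-style update on the subgraph. It is only an approximation of the procedure described here: the function name `resource_rank` is hypothetical, edge widths are ignored because the update uses only out-degrees, and the uniform redistribution of resource held by nodes without outgoing edges is an assumption added so that the total stays at 1.

```python
def resource_rank(edges, nodes, head, damping=0.85, iterations=100, tol=1e-10):
    """Iterate a PageRank-style resource flow; returns {node: resource}.

    edges maps source -> iterable of targets; every node must be in `nodes`.
    """
    n = len(nodes)
    # Initial state: the head holds all of the resource.
    resource = {v: (1.0 if v == head else 0.0) for v in nodes}
    out_degree = {v: len(edges.get(v, {})) for v in nodes}

    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for src, targets in edges.items():
            share = damping * resource[src] / out_degree[src]
            for dst in targets:
                new[dst] += share
        # Assumption: resource stuck on sink nodes is spread uniformly.
        stuck = damping * sum(resource[v] for v in nodes if out_degree[v] == 0)
        for v in nodes:
            new[v] += stuck / n
        delta = sum(abs(new[v] - resource[v]) for v in nodes)
        resource = new
        if delta < tol:
            break
    return resource
```

The resource retained by the tail entity is then `resource_rank(...)[tail]`, with a value of 0 used for a tail that is absent from the subgraph.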

In addition, each entity has different states in the directed subgraph. These states also provide evidence for the judgment of the relationship between entities. To better measure the probability of a relationship between two entities, we also consider the characteristics shown in table 1.

Feature   Description
ID_h      In-degree of head node.
OD_h      Out-degree of head node.
ID_t      In-degree of tail node.
OD_t      Out-degree of tail node.
Dep       Depth from head node to tail node.
Table 1: Supplemental feature set for ResourceRank algorithm

Considering the resource value and the features in table 1 together, six indicators in all, we construct a feature vector $V$. After activation, the vector is transformed into a probability value $RR(h, t)$, indicating the likelihood that some relationship exists between the head entity $h$ and the tail entity $t$. This transformation is:

$$RR(h, t) = \sigma(W V + b)$$

where $\sigma$ is a nonlinear activation function, and $W$ and $b$ are parameter matrices that can be trained during model training.
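As a small worked illustration of this transformation, the sketch below maps a six-element feature vector to a probability. The logistic sigmoid as the activation, the feature ordering, and the names `entity_level_probability`, `weights`, and `bias` are all assumptions made for the example; in the framework the parameters are learned during training.

```python
import math


def sigmoid(z):
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def entity_level_probability(features, weights, bias):
    """Affine map of the feature vector followed by a sigmoid.

    features: [resource, ID_h, OD_h, ID_t, OD_t, Dep] (assumed ordering).
    """
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)
```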

In general, we calculate this probability using the ResourceRank algorithm, which is based on the principle of information flow, and its value lies within the range [0, 1]. The closer it is to 1, the more likely it is that there is a relationship between $h$ and $t$. This assesses the trustworthiness of the triple at the entity level and answers the question posed in the section title.

3.2 Can the determined relationship $r$ occur between the entity pair $(h, t)$?

Measuring whether there is a relationship between an entity pair does not tell us what kind of relationship the pair has. For a given triple $(h, r, t)$, we next calculate the possibility that the relation $r$ occurs between the entity pair $(h, t)$. Here, we use the Translation-based energy function (TEF) algorithm.

Inspired by the translation invariance phenomenon in the word embedding space [37] [38], the relationship in the KG is regarded as a translation vector between entities; that is, the relation vector $\mathbf{r}$ acts as a translation from the head entity embedding $\mathbf{h}$ to the tail entity embedding $\mathbf{t}$ [26]. As illustrated in figure 3, in the vector space, the same relation vector can be mapped to the same plane and translated freely within that plane while remaining unchanged. The triples (BinLaden, Religion, Islam) and (Obama, Religion, Protestantism) should both be correct. However, according to the translational invariance of relation vectors, (BinLaden, Religion, Protestantism) must be wrong.

Figure 3: Effects display of the Translation-based energy function.

An ideal true triple $(h, r, t)$ should satisfy $\mathbf{h} + \mathbf{r} = \mathbf{t}$, so that the value of the energy function $E(h, r, t) = \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert$ is infinitely close to 0. In practice, the quality of triples varies. Nevertheless, one essential law holds consistently: the higher the degree of fit between $\mathbf{h}$, $\mathbf{r}$, and $\mathbf{t}$, the smaller the value of $E(h, r, t)$. This condition is sufficient and necessary. We therefore take a smaller value of $E(h, r, t)$ to mean a greater probability that the relationship $r$ holds between the entity pair $(h, t)$ and a higher trustworthiness of $(h, r, t)$, and vice versa.

The TEF method specifically operates as follows:

First, knowledge representation learning technology is used to obtain a low-dimensional distributed representation of the entities and relations. Second, we compute the energy $E(h, r, t)$ for each triple. Finally, a variant of the sigmoid function is used to convert $E(h, r, t)$ into a probability value $P(E(h, r, t))$, which represents the probability that the entity pair $(h, t)$ constitutes the relationship $r$. The conversion formula is as follows:

$$P(E(h, r, t)) = \frac{1}{1 + e^{-\lambda (\delta_r - E(h, r, t))}}$$

Here, $\delta_r$ is a threshold related to the relationship $r$. When $E(h, r, t) = \delta_r$, the probability value is 0.5. If $E(h, r, t) < \delta_r$, then $P(E(h, r, t)) > 0.5$. $\lambda$ is a hyperparameter used for smoothing and can be adjusted dynamically during model training.

This probability indicates how trustworthy it is that the relationship $r$ occurs between the entity pair $(h, t)$, which answers the second question. This Estimator focuses on the relations and judges the triple trustworthiness at the relation level.
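The TEF steps can be sketched in a few lines. This is an illustration under assumptions rather than the authors' code: the L2 norm is used for the energy (translation-based models may use L1 or L2), and the names `energy`, `tef_probability`, `delta_r`, and `lam` are chosen here for the relation-specific threshold and the smoothing hyperparameter.

```python
import math


def energy(h_vec, r_vec, t_vec):
    """Translation energy ||h + r - t|| with the L2 norm (an assumption)."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(h_vec, r_vec, t_vec)))


def tef_probability(e, delta_r, lam=1.0):
    """Shifted sigmoid: 0.5 at e == delta_r, above 0.5 when e < delta_r."""
    return 1.0 / (1.0 + math.exp(-lam * (delta_r - e)))
```

For example, a triple whose embeddings satisfy the translation exactly has energy 0 and therefore receives a probability above 0.5 for any positive threshold.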

3.3 Can other relevant triples in the KG infer that the triple is trustworthy?

We have evaluated the triple trustworthiness from two aspects above. Next, we will use the relevant triples in the KG to further infer the credibility of the target triple.

Inspired by “social identity” theory [39] [40], we make an image metaphor: regarding the knowledge graph as a social group, each triple is an individual in society. The degree of recognition of other individuals in society to the targeted individuals (target triples) reflects whether the target individual can properly integrate into the society (i.e., the KG).

There are many substantial multi-step relation paths from head entities to tail entities, and the reachable paths reflect the complex inference patterns among the triples in the KG and indicate the semantic relevance among the entities. Therefore, the semantic correlation information provided by these reachable paths will be an important evidence for judging the triple trustworthiness. Thus, we construct a Reachable paths inference algorithm.

Figure 4: The inference instances for triple trustworthiness.

For example, as shown in figure 4, there are multiple multi-step reachable paths between the entity pair “Bin Laden” and “Saudi Arabia”. From the path “Bin Laden → Riyadh → Saudi Arabia”, we can firmly infer the fact triple (Bin Laden, Nationality, Saudi Arabia). In addition, suppose there is a pseudo-triple (Bin Laden, Religion, Christianity) in the KG. Its related paths should be very few and illogical, so we should doubt the credibility of this triple. In contrast, multiple reachable paths support the correct triple (Bin Laden, Religion, Islam), so the same evidence serves both error detection and the supplementation of trusted knowledge.

To exploit the reachable paths for inferring triple trustworthiness, we need to address two key challenges:

3.3.1 Reachable Paths Selection

In a large-scale knowledge graph, the number of reachable paths associated with a triple may be enormous. Given the processing cost, we cannot weigh all of these paths. In addition, not all reachable paths are meaningful and reliable. For example, the path “Bin Laden → Pakistan → Saudi Arabia” provides only scarce evidence for reasoning about the credibility of the triple (Bin Laden, Nationality, Saudi Arabia). Therefore, it is necessary to examine the reliability of each path and choose the most effective reachable paths to use.

Previous works believed that the paths that led to lots of possible tail entities were mostly unreliable for the entity pair. They proposed a path-constraint resource allocation algorithm to measure the reliability of relation paths [19] [31]. Such a method ignored the semantic information of the paths. However, we find that the reliability of the reachable path is actually a consideration of the semantic relevance of the path with the target triple. Therefore, we propose a Semantic distance-based path selection algorithm. The algorithm is described as follows:

Input:  The knowledge graph (KG); a given target triple $(h, r, t)$.
Output: Multiple reachable paths most semantically relevant to the target triple.
1:  Search the reachable paths from $h$ to $t$ in the KG and store them in a path set;
2:  For each path in the path set, calculate 1) the semantic distance between the target relation $r$ and all relations in the path; 2) the semantic distance between the target tail entity $t$ and all head entities in the path; 3) the semantic distance between the target head entity $h$ and all tail entities in the path;
3:  Calculate the average distance as the score of the path;
4:  Based on the scores, select the first several paths (a fixed number) with the highest scores;
5:  Return the selected paths.
Algorithm 1 Reachable Paths Selecting Algorithm
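Algorithm 1 can be sketched in Python as follows. The sketch rests on several assumptions that are not stated in the text: cosine similarity stands in for the semantic distance (so a higher score means closer in meaning), each path is a list of `(head_vec, rel_vec, tail_vec)` embedding tuples that have already been found by a path search, and the names `cosine` and `select_paths` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean(values):
    return sum(values) / len(values)


def select_paths(paths, h_vec, r_vec, t_vec, k):
    """Keep the k paths whose triples are semantically closest to the target."""
    scored = []
    for path in paths:
        # Step 2: compare the target with every relation/entity on the path.
        s_rel = mean([cosine(r_vec, rel) for _, rel, _ in path])
        s_tail = mean([cosine(t_vec, head) for head, _, _ in path])
        s_head = mean([cosine(h_vec, tail) for _, _, tail in path])
        # Step 3: average the three scores.
        scored.append(((s_rel + s_tail + s_head) / 3.0, path))
    # Step 4: rank by score and keep the first k paths.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [path for _, path in scored[:k]]
```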

3.3.2 Reachable Paths Representation

After the paths are selected, each reachable path must be mapped to a low-dimensional vector for subsequent calculations. The previous methods [19] [31] considered only the relations in the paths. Here, we treat each triple in a path as a whole, including not only the relationship but also the head and tail entities, since the entities can also provide significant semantic information. The embeddings of the three elements of each triple are concatenated to form one node of the path. Therefore, a reachable path is transformed into an ordered sequence of such concatenated triple embeddings. The semantic information contained in the path can then be analyzed using sequence analysis tools.

Recurrent neural networks (RNNs) are good at capturing temporal semantics of a sequence. Long short-term memory (LSTM) [41] is a variant of RNNs, and it has a wide range of applications with good results.

Figure 5: A LSTM cell.

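In this paper, we apply LSTM networks to learn the representation of the reachable paths. The LSTM architecture consists of a set of recurrently connected subnets, known as memory cells. Each time step $k$ corresponds to one LSTM memory cell, which computes the current hidden vector $\mathbf{h}_k$ from the previous hidden vector $\mathbf{h}_{k-1}$, the previous cell vector $\mathbf{c}_{k-1}$ and the current input embedding $\mathbf{x}_k$ (the index $k$ denotes the time step; $\mathbf{h}_k$ here is a hidden vector, not the head entity). Fig 5 illustrates a single LSTM memory cell [42], which in its standard form is implemented as follows:

$$
\begin{aligned}
\mathbf{i}_k &= \sigma(W_{xi}\mathbf{x}_k + W_{hi}\mathbf{h}_{k-1} + \mathbf{b}_i) \\
\mathbf{f}_k &= \sigma(W_{xf}\mathbf{x}_k + W_{hf}\mathbf{h}_{k-1} + \mathbf{b}_f) \\
\mathbf{o}_k &= \sigma(W_{xo}\mathbf{x}_k + W_{ho}\mathbf{h}_{k-1} + \mathbf{b}_o) \\
\mathbf{c}_k &= \mathbf{f}_k \odot \mathbf{c}_{k-1} + \mathbf{i}_k \odot \tanh(W_{xc}\mathbf{x}_k + W_{hc}\mathbf{h}_{k-1} + \mathbf{b}_c) \\
\mathbf{h}_k &= \mathbf{o}_k \odot \tanh(\mathbf{c}_k)
\end{aligned}
$$

where $\mathbf{i}$, $\mathbf{f}$, $\mathbf{o}$ and $\mathbf{c}$ are the input gate, forget gate, output gate and cell vectors, respectively, $\mathbf{b}$ is the bias, $\sigma$ is the logistic sigmoid function, and the $W$ terms are the trainable parameter matrices.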

The LSTM layer encodes the sequence by considering forward information from its first element to its last. We use the output vector of the last time step to represent the semantic information of each reachable path.

We concatenate the outputs of the selected reachable paths to form a single vector, which is used as the output of this Estimator and as the subsequent input to the Fusioner.
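To make the path encoding concrete, here is a dependency-free sketch of a standard LSTM cell applied to a path of concatenated triple embeddings. It is not the trained model: the parameters are randomly initialized by a hypothetical `init_params` helper, the cell has no peephole connections (an assumption), and embeddings are plain Python lists.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def init_params(input_dim, hidden, seed=0):
    """Random weights for the i, f, o gates and the candidate cell g."""
    rng = random.Random(seed)
    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden)]
                for _ in range(hidden)]
    return {name: (mat(), [0.0] * hidden) for name in "ifog"}


def lstm_step(x, h_prev, c_prev, params):
    z = x + h_prev  # concatenate the input with the previous hidden vector
    gates = {}
    for name in ("i", "f", "o", "g"):
        weights, bias = params[name]
        act = math.tanh if name == "g" else sigmoid
        gates[name] = [act(p + b) for p, b in zip(matvec(weights, z), bias)]
    c = [f * cp + i * g for f, cp, i, g
         in zip(gates["f"], c_prev, gates["i"], gates["g"])]
    h = [o * math.tanh(ci) for o, ci in zip(gates["o"], c)]
    return h, c


def encode_path(path, params, hidden):
    """Run the LSTM over one path; return the last hidden vector."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for head, rel, tail in path:
        h, c = lstm_step(head + rel + tail, h, c, params)
    return h


def encode_paths(paths, params, hidden):
    """Concatenate the final hidden vectors of all selected paths."""
    return [value for path in paths for value in encode_path(path, params, hidden)]
```

With 2-dimensional embeddings each step receives a 6-dimensional input, so `init_params(6, 4)` yields a 4-dimensional vector per path.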

After the above two challenges are solved, we can use the reachable paths to infer the trustworthiness of the target triple on the KG global level.

3.4 Fusing the Estimators

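We designed a Fusioner based on a multi-layer perceptron [43] to output the final confidence values of the triples. We have described three different Estimators above. A simple way to combine them is to concatenate their outputs into a feature vector $f(s)$ for each triple $s = (h, r, t)$:

$$f(s) = [\, RR(h, t),\; P(E(h, r, t)),\; RP(h, r, t) \,]$$

where $RR(h, t)$, $P(E(h, r, t))$ and $RP(h, r, t)$ denote the outputs of the entity-level, relationship-level and KG-global-level Estimators, respectively. The vector $f(s)$ is input into the Fusioner and transformed by multiple hidden layers. The output layer is a binary classifier that assigns a label of $y = 1$ to true triples and a label of $y = 0$ to fake ones. A nonlinear activation function (logistic sigmoid) is used to calculate $p(y = 1 \mid f(s))$ as

$$h_i = \sigma(W_{h_i} h_{i-1} + b_{h_i}), \qquad h_0 = f(s)$$

$$p(y = 1 \mid f(s)) = \varphi(W_o h + b_o)$$

where $h_i$ is the $i$-th hidden layer with activation $\sigma$, $W_{h_i}$ and $b_{h_i}$ are the parameter matrices to be learned in that hidden layer, $h$ is the output of the last hidden layer, $\varphi$ is the logistic sigmoid, and $W_o$ and $b_o$ are the parameter matrices of the output layer. The model is trained by minimizing a loss function defined over these labels.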
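A minimal sketch of a Fusioner-style forward pass follows. Several details are assumptions rather than facts from the text: the hidden layers use ReLU, the loss is taken to be binary cross-entropy (a common choice for a sigmoid binary classifier), and the names `dense`, `fusioner`, and `binary_cross_entropy` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


def dense(x, weights, bias, activation):
    """One fully connected layer: activation(W x + b)."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def fusioner(features, hidden_layers, output_layer):
    """Map the concatenated Estimator outputs to one confidence value."""
    a = features
    for weights, bias in hidden_layers:
        a = dense(a, weights, bias, relu)  # hidden activation is an assumption
    weights, bias = output_layer
    return dense(a, weights, bias, sigmoid)[0]


def binary_cross_entropy(p, y, eps=1e-12):
    """Assumed training loss for a single example with label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
```

In practice the feature vector would contain the entity-level probability, the relation-level probability, and the stitched path encoding, and the weights would be fitted with an optimizer rather than set by hand.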


4 Experiments

In this paper, we focus on Freebase [44], which is one of the most popular knowledge graphs, and we perform our experiments on the FB15K [26], which is a typical benchmark knowledge graph extracted from Freebase.

4.1 Training Settings

There are no explicitly labelled errors in FB15K. Given that most errors in real-world KGs derive from confusion between similar entities, we follow the methods described in [19] to generate fake triples as negative examples automatically, with little human annotation. Three kinds of fake triples may be constructed for each true triple: one by replacing the head entity, one by replacing the relationship, and one by replacing the tail entity. We assign a label of 1 to positive examples and a label of 0 to negative examples. We also ensure that the number of generated negative examples equals the number of positive examples.
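The negative-example construction can be sketched as below. This is a simplification and not the procedure of [19]: replacements are drawn uniformly at random rather than from similar entities, and the names `corrupt` and `build_dataset` are hypothetical. One fake triple is generated per true triple so that the two classes are balanced.

```python
import random


def corrupt(triple, entities, relations, known, rng, max_tries=100):
    """Replace the head, relation, or tail; reject triples already in the KG."""
    h, r, t = triple
    slot = rng.choice(("head", "relation", "tail"))
    pool = relations if slot == "relation" else entities
    for _ in range(max_tries):
        x = rng.choice(pool)
        fake = {"head": (x, r, t), "relation": (h, x, t), "tail": (h, r, x)}[slot]
        if fake not in known:
            return fake
    return None  # no valid corruption found within max_tries


def build_dataset(triples, entities, relations, seed=0):
    """Return (triple, label) pairs: label 1 for true, 0 for generated fakes."""
    rng = random.Random(seed)
    known = set(triples)
    data = [(triple, 1) for triple in triples]
    for triple in triples:
        fake = corrupt(triple, entities, relations, known, rng)
        if fake is not None:
            data.append((fake, 0))
    return data
```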

We implement the neural network using the Keras library. The dimension of the entity and relation embeddings is 100. The batch size is fixed to 50. We use early stopping [45] based on the performance on the validation set. The number of LSTM units is 100. Parameter optimization is performed with the Adam optimizer [46], and the initial learning rate is 0.001. In addition, to mitigate over-fitting, we apply the dropout method [47] to regularize our model.

In addition, there are some adjustable parameters during the model training. We set = 4 and = 3. The relation-specific threshold used by the TEF conversion can be found by maximizing the classification accuracy on the validation triples that belong to the relation $r$.
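The threshold search for one relation can be sketched as a scan over candidate cut points. The decision rule (a triple is predicted true when its energy is below the threshold), the use of midpoints between sorted energies as candidates, and the name `best_threshold` are assumptions made for this illustration.

```python
def best_threshold(energies, labels):
    """Return the cut that maximizes accuracy of 'energy < cut means true'.

    energies and labels describe the validation triples of one relation.
    """
    values = sorted(set(energies))
    # Candidate cuts: below all values, between neighbours, above all values.
    cuts = [values[0] - 1.0]
    cuts += [(a + b) / 2.0 for a, b in zip(values, values[1:])]
    cuts += [values[-1] + 1.0]

    def correct(cut):
        return sum((e < cut) == (y == 1) for e, y in zip(energies, labels))

    return max(cuts, key=correct)
```

Running this once per relation on its validation triples gives one threshold per relation.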

4.2 Interpreting the Validity of Trustworthiness

To verify whether the triple trustworthiness output from our model is valid, we perform the following analysis. First, we display the triple confidence values in a centralized coordinate system, as shown in figure 6. The left area shows the distribution of the values of the negative examples, while the right area shows that of the positive examples. It can be seen that the confidence values of the positive examples are mainly concentrated in the upper region (> 0.5). In contrast, the values of the negative examples are mainly concentrated in the lower region (< 0.5). This is consistent with the natural law of judging triple trustworthiness and indicates that the triple confidence values output from our model are valid.

Figure 6: The scatter plot of the triple confidence values distribution.
Figure 7: The curves of precision and recall as the threshold on the triple confidence values varies.

In addition, by dynamically setting the threshold for the triple confidence values (a triple is considered trustworthy only if its value is higher than the threshold), we can measure the curves of the precision and recall of the output, as shown in figure 7. As the threshold increases, the precision continues to increase, and the recall continues to decrease. When the threshold is adjusted within the interval [0, 0.5], there is no obvious change in the recall, and it remains at a high level. However, if the threshold is adjusted within the interval [0.5, 1], the recall tends to decline. In particular, the closer the threshold is to 1, the greater the decline rate will be. These results show that the positive examples generally have higher confidence values (> 0.5). Moreover, the precision remains at a relatively high level even when the threshold is set to a small value, which indicates that our model can identify the negative instances well and assign them small confidence values.
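The threshold sweep behind such curves can be reproduced with a few lines of Python. The sketch below is illustrative only: the function name `precision_recall` and the convention of reporting a precision of 1.0 when nothing is predicted trustworthy are assumptions.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when 'score > threshold' means trustworthy."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        trusted = score > threshold
        if trusted and label == 1:
            tp += 1
        elif trusted:
            fp += 1
        elif label == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention for no predictions
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def sweep(scores, labels, steps=10):
    """Evaluate evenly spaced thresholds in [0, 1) to trace the curves."""
    return [(k / steps, *precision_recall(scores, labels, k / steps))
            for k in range(steps)]
```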

4.3 Knowledge Graph Error Detection

The knowledge graph error detection task is to detect possible errors in the knowledge graph according to their triple trustworthiness scores. Specifically, it aims to predict whether a triple is correct or not, which can be viewed as a triple classification task [48].

We construct a test set following the same protocol as shown in Section 4.1 and give several evaluation results. (1) The accuracy of the classification. The decision strategy for classification is simple; if the confidence value of a testing triple computed by each method is below the threshold 0.5, it is predicted as negative, otherwise, it is positive. (2) The maximum F1-score. For a given threshold, we can measure the precision, recall and F1-score of the output.
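The two evaluation results can be computed as in the following sketch. The names `accuracy` and `max_f1` are hypothetical, and scanning only the observed confidence values as candidate thresholds is an implementation choice made here, not a detail given in the text.

```python
def accuracy(scores, labels, threshold=0.5):
    """A triple below the threshold is predicted negative, otherwise positive."""
    hits = sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels))
    return hits / len(labels)


def max_f1(scores, labels):
    """Largest F1-score over all thresholds taken from the observed scores."""
    best = 0.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
        if tp == 0:
            continue  # F1 is zero (or undefined) without true positives
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best
```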

Models         Accuracy   F1-score
MLP            0.833      0.846
Bilinear       0.861      0.869
TransE         0.868      0.876
TransH         0.912      0.913
TransD         0.913      0.913
TransR         0.902      0.904
PTransE        0.941      0.942
Ours_TransE    0.977      0.975
Ours_TransH    0.978      0.979
Ours_PTransE   0.981      0.982
Table 2: Evaluation results on the Knowledge graph error detection.

As shown in table 2, our model has better results in terms of accuracy and the F1-score than the other models.

The Bilinear model [14] [49] [50] and Multi layer perceptron (MLP) model [3] [14] have been widely applied to the KG related tasks. They can calculate a score for the validity of triples through operations, such as tensor decomposition and nonlinear transformations. Here we convert the scores to the confidence values using the sigmoid function. Compared with the Bilinear and MLP models, our model shows improvements of more than 10% in the two evaluation indicators.

We use the TEF method (described in Section 3.2) to transform the outputs of the embedding-based models TransE, TransH, TransD, TransR, and PTransE into triple confidence values. These embedding-based models perform better than the traditional methods, but their results are affected by the quality of the embeddings. In comparison, our model does not rely heavily on the embeddings: introducing different embeddings into our model, as shown by Ours_TransE, Ours_TransH, and Ours_PTransE, changes the results only slightly. Because our model makes full use of the internal semantic information of the triple and the global inference information of the knowledge graph, its three-level measure of trustworthiness is more robust.

Figure 8 shows that the area under the curve (AUC) of our model (Ours_TransE) is much larger than those of the other approaches. Our model achieves an AUC that is 11% higher than that of TransE and more than 12% higher than that of the traditional MLP method.

Figure 8: The AUC (areas under the precision-recall curves) of each model.
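For readers who wish to reproduce this kind of comparison, the area under a precision-recall curve can be approximated as sketched below. The paper does not state how the area is integrated, so the trapezoidal rule, the starting point (recall 0, precision 1), and the function name `pr_auc` are all assumptions of this sketch; tied confidence values are not treated specially.

```python
def pr_auc(confidences, labels):
    """Trapezoidal area under the precision-recall curve (illustrative sketch)."""
    ranked = sorted(zip(confidences, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(1 for _, is_true in ranked if is_true)
    if total_pos == 0:
        raise ValueError("at least one positive triple is required")

    # Each prefix of the ranking gives one (recall, precision) point.
    points = [(0.0, 1.0)]
    tp = fp = 0
    for _, is_true in ranked:
        if is_true:
            tp += 1
        else:
            fp += 1
        points.append((tp / total_pos, tp / (tp + fp)))

    # Integrate precision over recall with the trapezoidal rule.
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


# Toy rankings, invented for illustration.
well_ranked = pr_auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
badly_ranked = pr_auc([0.9, 0.8, 0.2, 0.1], [0, 0, 1, 1])
print(well_ranked, badly_ranked)
```

A model that ranks every true triple above every noisy one reaches an area of 1.0 under this definition, and worse rankings yield smaller areas.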

4.4 Knowledge Graph Completion

The knowledge graph completion task aims to complete a triple when one of the head, tail, or relationship is missing. For example, given two parts of a triple, we choose an entity e from the set of entities to form a candidate triple and consider whether its trustworthiness is sufficient to convince us that it is correct.

We use the test triples in FB15K as seeds (assuming they are unknown) and divide them into three categories: all pairs of head and relation, all pairs of head and tail, and all pairs of tail and relation. We then fill each empty position with the objects in the entity set or relationship set and calculate the confidence value for each completed triple. When the value is greater than the threshold (> 0.5), we judge the triple to be correct. We use two evaluation metrics. (1) The recall of true triples in our test set. Since the test set we built is incomplete, a candidate triple that we identify as true must be correct if it appears in the seed set; if it does not, we cannot conclude that it is wrong. (2) The average trustworthiness score across each set of true triples (Quality).


Models Recall Quality Recall Quality Recall Quality
MLP 0.970 0.791 0.912 0.735 0.978 0.844
Bilinear 0.936 0.828 0.904 0.807 0.973 0.907
TransE 0.960 0.796 0.927 0.759 0.959 0.786
TransH 0.935 0.826 0.927 0.811 0.955 0.850
TransD 0.942 0.838 0.909 0.804 0.954 0.853
TransR 0.964 0.872 0.921 0.829 0.972 0.868
PTransE 0.944 0.841 0.973 0.888 0.957 0.863
Ours 0.987 0.943 0.977 0.923 0.994 0.959
Table 3: Evaluation results on the Knowledge graph completion.

By analyzing the results in Table 3, we find that our model performs better than the other methods on all three types of completion problems. Our model achieves a higher recall than the other models, which shows that it finds the correct triples in the test set more accurately. In addition, the average trustworthiness score of our model is higher than that of the others, which shows that our model better identifies the correct instances and assigns them high confidence values.
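The completion procedure and its two metrics can be sketched for the case in which the tail entity is missing. Everything named here is hypothetical: `toy_confidence` stands in for a trained trustworthiness model, and the entities, relation, and scores are invented for illustration.

```python
def complete_tail(head, relation, entities, confidence, threshold=0.5):
    """Return candidate triples (head, relation, e) whose confidence exceeds the threshold."""
    candidates = ((head, relation, entity) for entity in entities)
    return [triple for triple in candidates if confidence(triple) > threshold]


def recall_and_quality(seed_triples, confidence, threshold=0.5):
    """Recall: share of seed (true) triples accepted. Quality: their mean confidence."""
    scores = [confidence(triple) for triple in seed_triples]
    recall = sum(score > threshold for score in scores) / len(scores)
    quality = sum(scores) / len(scores)
    return recall, quality


# Hypothetical confidence lookup standing in for a trained model.
toy_scores = {
    ("Paris", "capital_of", "France"): 0.96,
    ("Paris", "capital_of", "Germany"): 0.08,
    ("Berlin", "capital_of", "Germany"): 0.40,
}


def toy_confidence(triple):
    return toy_scores.get(triple, 0.0)


entities = ["France", "Germany", "Italy"]
seeds = [("Paris", "capital_of", "France"), ("Berlin", "capital_of", "Germany")]

print(complete_tail("Paris", "capital_of", entities, toy_confidence))
print(recall_and_quality(seeds, toy_confidence))
```

In this toy run, one of the two seed triples falls below the threshold, so the recall is 0.5 while Quality averages the confidence of both seeds.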

4.5 Analyzing the Effects of Single Estimators

To measure the effect of the single Estimators, we separate each Estimator as an independent model to calculate the confidence values for triples. The results on the knowledge graph error detection test set are shown in Table 4. The accuracy obtained by each model is above 0.8, which demonstrates the effectiveness of each Estimator. Among them, the reachable-paths-reasoning-based method (PathsReasoning) achieves better results than the other two Estimators. After combining all the Estimators, the accuracy of the global framework (TTMF, listed as Combination) rises to 0.977, well above that of any single Estimator. This shows that the framework has good flexibility and scalability and can integrate multiple aspects of information to obtain a more reasonable trustworthiness.

It is worth emphasizing that our framework is flexible and easy to extend. Newly added estimators can be trained jointly with the framework. In addition, the confidence value generated by a single estimator can be appended directly to the feature vector.

Models Accuracy
TEF(TransE) 0.868
ResourceRank 0.811
PathsReasoning 0.881
Combination 0.977
Table 4: Evaluation results of each single estimator on the Knowledge graph error detection.

5 Conclusion

In this paper, to eliminate the deviation that errors in the KG introduce into knowledge-driven learning tasks and applications, we establish a unified knowledge graph triple trustworthiness measurement framework. The framework is a crisscrossing neural network structure that calculates confidence values for the triples in the KG. This trustworthiness can be used to detect and eliminate errors in the KG and to identify new, unseen triples to supplement the KG. The framework evaluates the trustworthiness of the triples from three perspectives and jointly uses the semantic information of the triples and the global inference information of the knowledge graph. Experiments were conducted on the popular knowledge graph Freebase, and the generated triple confidence values were used for the knowledge graph error detection and knowledge graph completion tasks. The experimental results confirmed the capabilities of the framework. The source code and dataset of this paper are publicly available. In the future, we will explore adding more estimators to the framework to further improve the effectiveness of the trustworthiness measurement. We will also try to apply the trustworthiness to more knowledge-based applications.


References

  • [1] Z. L. Xu, Y. P. Sheng, L. R. He, and Y. F. Wang, “Review on knowledge graph techniques,” Journal of University of Electronic Science and Technology of China, 2016.
  • [2] L. Qiao, L. Yang, D. Hong, L. Yao, and Z. Qin, “Knowledge Graph Construction Techniques,” Journal of Computer Research & Development, 2016.
  • [3] X. Dong, E. Gabrilovich, G. Heitz, W. Horn, N. Lao, K. Murphy, T. Strohmann, S. Sun, and W. Zhang, “Knowledge vault: a web-scale approach to probabilistic knowledge fusion,” Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’14, 2014.
  • [4] M. Banko, M. Cafarella, and S. Soderland, “Open information extraction for the web,” in the 16th International Joint Conference on Artificial Intelligence (IJCAI 2007), 2007, pp. 2670–2676.
  • [5] A. Fader, S. Soderland, and O. Etzioni, “Identifying relations for open information extraction,” in Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP 2011), 2011, pp. 1535–1545.
  • [6] S. Jia, M. Li, Y. Xiang, and Others, “Chinese Open Relation Extraction and Knowledge Base Establishment,” ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), vol. 17, no. 3, p. 15, 2018.
  • [7] A. Carlson, J. Betteridge, and B. Kisiel, “Toward an Architecture for Never-Ending Language Learning,” in Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.
  • [8] K. Bollacker, R. Cook, and P. Tufts, “Freebase: A shared database of structured general human knowledge,” Proceedings of the national conference on Artificial Intelligence, 2007.
  • [9] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives, “DBpedia: A nucleus for a Web of open data,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 4825 LNCS, 2007, pp. 722–735.
  • [10] J. Liang, Y. Xiao, Y. Zhang, S.-w. Hwang, and H. Wang, “Graph-Based Wrong IsA Relation Detection in a Large-Scale Lexical Taxonomy.” in AAAI, 2017, pp. 1178–1184.
  • [11] S. Heindorf, M. Potthast, B. Stein, and G. Engels, “Vandalism detection in wikidata,” in ACM International on Conference on Information and Knowledge Management, 2016, pp. 327–336.
  • [12] S. Guan, X. Jin, Y. Jia, Y. Wang, and C. Xueqi, “Knowledge graph oriented knowledge inference methods: A survey,” Ruan Jian Xue Bao/Journal of Software, vol. 29(10), pp. 1–29, 2018.
  • [13] B. Yang, W.-t. Yih, X. He, J. Gao, and L. Deng, “Embedding Entities and Relations for Learning and Inference in Knowledge Bases,” 2014.
  • [14] X. Li, A. Taheri, L. Tu, and K. Gimpel, “Commonsense Knowledge Base Completion,” in Meeting of the Association for Computational Linguistics, 2016, pp. 1445–1455.
  • [15] J. Hoffart, F. M. Suchanek, K. Berberich, and G. Weikum, “YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia,” Artificial Intelligence, vol. 194, pp. 28–61, 2013.
  • [16] J. Lehmann, “Dbpedia: A large-scale, multilingual knowledge base extracted from wikipedia,” Semantic Web, vol. 6, no. 2, pp. 167–195, 2015.
  • [17] D. Lukovnikov, A. Fischer, and J. Lehmann, “Neural network-based question answering over knowledge graphs on word and character level,” in International Conference on World Wide Web, 2017, pp. 1211–1220.
  • [18] B. Han, L. Chen, and X. Tian, “Knowledge based collection selection for distributed information retrieval,” Information Processing and Management, vol. 54, no. 1, pp. 116–128, 2018.
  • [19] R. Xie, Z. Liu, and M. Sun, “Does William Shakespeare REALLY Write Hamlet? Knowledge Representation Learning with Confidence,” arXiv preprint arXiv:1705.03202, 2017.
  • [20] M. V. Manago and Y. Kodratoff, “Noise and knowledge acquisition,” in International Joint Conference on Artificial Intelligence, 1987, pp. 348–354.
  • [21] M. Nickel, K. Murphy, V. Tresp, and E. Gabrilovich, “A Review of Relational Machine Learning for Knowledge Graph,” Proceedings of the IEEE, 2015.
  • [22] N. Lao and W. W. Cohen, “Relational retrieval using a combination of path-constrained random walks,” in Machine Learning, 2010.
  • [23] N. Lao, T. Mitchell, and W. W. Cohen, “Random walk inference and learning in a large scale knowledge base,” Proceedings of EMNLP 2011, 2011.
  • [24] S. Jiang, D. Lowd, and D. Dou, “Learning to refine an automatically extracted knowledge base using Markov logic,” in Proceedings - IEEE International Conference on Data Mining, ICDM, 2012.
  • [25] J. Pujara, H. Miao, L. Getoor, and W. Cohen, “Knowledge graph identification,” in International Semantic Web Conference, 2013, pp. 542–557.
  • [26] A. Bordes, N. Usunier, J. Weston, and O. Yakhnenko, “Translating Embeddings for Modeling Multi-Relational Data,” Advances in NIPS, 2013.
  • [27] K. Toutanova, X. V. Lin, W.-T. Yih, H. Poon, and C. Quirk, “Compositional Learning of Embeddings for Relation Paths in Knowledge Bases and Text,” in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016), 2016, pp. 1434–1444.
  • [28] Z. Wang, J. Zhang, J. Feng, and Z. Chen, “Knowledge Graph Embedding by Translating on Hyperplanes,” AAAI Conference on Artificial Intelligence, 2014.
  • [29] Y. Lin, Z. Liu, M. Sun, Y. Liu, and X. Zhu, “Learning Entity and Relation Embeddings for Knowledge Graph Completion,” Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence Learning, 2015.
  • [30] G. Ji, S. He, L. Xu, K. Liu, and J. Zhao, “Knowledge Graph Embedding via Dynamic Mapping Matrix,” in Meeting of the Association for Computational Linguistics and the International Joint Conference on Natural Language Processing, 2015, pp. 687–696.
  • [31] Y. Lin, Z. Liu, H. Luan, M. Sun, S. Rao, and S. Liu, “Modeling Relation Paths for Representation Learning of Knowledge Bases,” 2015.
  • [32] T. Trouillon, C. R. Dance, É. Gaussier, J. Welbl, S. Riedel, and G. Bouchard, “Knowledge graph completion via complex tensor factorization,” The Journal of Machine Learning Research, vol. 18, no. 1, pp. 4735–4772, 2017.
  • [33] T. Zhou, J. Ren, M. Medo, and Y. C. Zhang, “Bipartite network projection and personal recommendation,” Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 2007.
  • [34] L. Lü and T. Zhou, “Link prediction in complex networks: A survey,” Physica A: statistical mechanics and its applications, vol. 390, no. 6, pp. 1150–1170, 2011.
  • [35] L. Page, S. Brin, R. Motwani, and T. Winograd, “The PageRank citation ranking: Bringing order to the web,” Technical report, Stanford Digital Library Technologies Project, 1998.
  • [36] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener, “Graph structure in the Web,” Computer Networks, vol. 33, no. 1, pp. 309–320, 2000.
  • [37] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, “Distributed Representations of Words and Phrases and their Compositionality,” in Advances in neural information processing systems, 2013, pp. 3111–3119.
  • [38] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient Estimation of Word Representations in Vector Space,” IJCAI International Joint Conference on Artificial Intelligence, 2013.
  • [39] J. C. Turner and P. J. Oakes, “The significance of the social identity concept for social psychology with reference to individualism, interactionism and social influence,” British Journal of Social Psychology, vol. 25, no. 3, pp. 237–252, 1986.
  • [40] P. James, “Despite the terrors of typologies: the importance of understanding categories of difference and identity,” Interventions, vol. 17, no. 2, pp. 174–195, 2015.
  • [41] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
  • [42] A. Graves and J. Schmidhuber, “Framewise phoneme classification with bidirectional lstm and other neural network architectures,” Neural Networks, vol. 18, no. 5-6, pp. 602–610, 2005.
  • [43] J. B. Hampshire and B. Pearlmutter, “Equivalence proofs for multi-layer perceptron classifiers and the bayesian discriminant function,” in Connectionist Models. Elsevier, 1991, pp. 159–172.
  • [44] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor, “Freebase: a collaboratively created graph database for structuring human knowledge,” in Proceedings of the 2008 ACM SIGMOD international conference on Management of data (SIGMOD 2008), 2008, pp. 1247–1250.
  • [45] A. Graves, A.-r. Mohamed, and G. Hinton, “Speech Recognition with Deep Recurrent Neural Networks,” in arXiv preprint arXiv:1303.5778. IEEE, 2013, pp. 6645–6649.
  • [46] D. Kingma and J. Ba, “Adam: a method for stochastic optimization,” arXiv preprint arXiv:1412.6980, pp. 1–13, 2014.
  • [47] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting,” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
  • [48] R. Socher, D. Chen, C. D. Manning, and A. Ng, “Reasoning with neural tensor networks for knowledge base completion,” in Advances in neural information processing systems, 2013, pp. 926–934.
  • [49] M. Nickel, V. Tresp, and H.-P. Kriegel, “A Three-Way Model for Collective Learning on Multi-Relational Data,” in ICML, 2011.
  • [50] B. Yang, W.-t. Yih, X. He, J. Gao, and L. Deng, “Embedding Entities and Relations for Learning and Inference in Knowledge Bases,” 2014.