Essence of Factual Knowledge

by Ruoyu Wang et al.
Shanghai Jiao Tong University

Knowledge bases (KBs) are collections of domain-specific and commonsense facts. Recently, KB sizes have grown rapidly due to the automatic extraction of knowledge and facts; for example, WikiData now contains up to 974 million facts. We observe that current KBs, especially domain-specific ones, exhibit strong topic-related regularities among their relations. These patterns can be used to summarize and infer part of the facts in a KB. Therefore, the original KB can be minimized by extracting patterns together with essential facts.






I Introduction

Knowledge bases (KBs) are collections of domain-specific and commonsense facts. Recently, KB sizes have grown rapidly due to the automatic extraction of knowledge and facts; for example, WikiData now contains up to 974 million facts. We observe that current KBs, especially domain-specific ones, exhibit strong topic-related regularities among their relations [2, 1]. These patterns can be used to summarize and infer part of the facts in a KB. Therefore, the original KB can be minimized by extracting patterns together with essential facts.

In this paper, we introduce a framework for extracting the essence of knowledge and reducing the overall volume of KBs by mining semantic patterns in relations. Facts are formalized as first-order predicates, and patterns are induced as Horn rules.

Table I and Rules (1) and (2) show an example of such extraction. By extracting the rules from the listed facts, both Tables I(b) and I(c) can be inferred from the other tables and then be removed.

parent  child        father  child       mother  child
james   harry        james   harry       lily    harry
lily    harry        harry   sirius      ginny   sirius
harry   sirius       harry   albus       ginny   albus
harry   albus
ginny   sirius
ginny   albus
(a) parent/2         (b) father/2        (c) mother/2        (d) male/1        (e) female/1
TABLE I: An example KB of family relations
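As an illustration of how Tables I(b) and I(c) follow from the other tables, a minimal sketch in Python; the specific rule bodies (deriving father/2 and mother/2 from parent/2 plus gender facts) and the contents of male/1 and female/1 are assumptions consistent with the listed facts, not verbatim Rules (1) and (2):

```python
# A toy KB as sets of ground facts.
parent = {("james", "harry"), ("lily", "harry"), ("harry", "sirius"),
          ("harry", "albus"), ("ginny", "sirius"), ("ginny", "albus")}
male = {"james", "harry"}       # assumed contents of male/1
female = {"lily", "ginny"}      # assumed contents of female/1

# Assumed Rule (1): father(X, Y) <- parent(X, Y), male(X)
father = {(x, y) for (x, y) in parent if x in male}
# Assumed Rule (2): mother(X, Y) <- parent(X, Y), female(X)
mother = {(x, y) for (x, y) in parent if x in female}

print(sorted(father))   # matches Table I(b)
print(sorted(mother))   # matches Table I(c)
```

With these two rules, the father/2 and mother/2 tables carry no information beyond parent/2 and the gender facts, which is exactly why they can be dropped from the KB.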

The rest of this paper is organized as follows: Section II analyses properties of rules via equivalence classes. The essence extraction problem is formally defined in Section III, and Section IV introduces the basic framework for essence extraction. Section V shows how the original KB is restored. Finally, Section VI concludes the paper.

II Properties of Horn Rules

II-A Semantic Length and Fingerprint of a Rule

First-order Horn rules are adopted in our technique to describe semantic patterns in relations. A rule can further be decomposed into equivalence classes. The elements of each class are the argument positions that are assigned the same variable; if an argument is assigned a constant, the corresponding equivalence class consists only of that argument position and the constant. For example, Rule (1) is decomposed into the following equivalence classes (the number in brackets denotes the argument index within a relation, starting from 0):

The length of a rule r is defined by the following equation:

len(r) = Σ_C (|C| − 1)

where C ranges over the equivalence classes of r.

The fingerprint of a rule is based on its equivalence classes, with the argument positions of the head labelled within the classes they belong to. For example, the last column of the above example shows the labels of the head arguments.

Lemma 1.

Two rules are semantically equivalent if and only if their fingerprints are identical.


Proof.

(Necessity) If two rules are semantically equivalent, they can be written in syntactically identical form. Thus the equivalence classes of corresponding variables or constants are identical.

(Sufficiency) Each equivalence class tells the positions of one variable. Therefore, equality of all classes ensures that the sets of predicates in both rules are identical. The labels of the head arguments further determine that the head predicates are the same. Thus, the two rules are identical. ∎
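As an illustration, the equivalence classes, rule length, and fingerprint can be computed for a concrete rule; the tuple-based representation and the length formula used here (each class of n argument positions contributes n − 1) are assumptions made for the sketch:

```python
def equivalence_classes(head, body):
    """Group argument positions (predicate, index) by the variable
    assigned to them; positions carry the predicate name, so the
    classes do not depend on how variables are spelled."""
    by_var = {}
    for pred, args in [head] + body:
        for i, v in enumerate(args):
            by_var.setdefault(v, set()).add((pred, i))
    return list(by_var.values())

def rule_length(head, body):
    # assumed reading: a class of n positions adds n - 1 conditions
    return sum(len(c) - 1 for c in equivalence_classes(head, body))

def fingerprint(head, body):
    # the head's argument positions are labelled implicitly, since
    # the head predicate name appears inside the classes
    return frozenset(frozenset(c) for c in equivalence_classes(head, body))

# Rule: father(X, Y) <- parent(X, Y), male(X)
r1 = (("father", ("X", "Y")), [("parent", ("X", "Y")), ("male", ("X",))])
# The same rule with variables renamed
r2 = (("father", ("A", "B")), [("parent", ("A", "B")), ("male", ("A",))])

assert fingerprint(*r1) == fingerprint(*r2)   # Lemma 1, one direction
print(rule_length(*r1))  # classes of sizes 3 and 2 give 2 + 1 = 3
```

The assertion checks the easy direction of Lemma 1: renaming variables leaves the fingerprint unchanged.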

II-B Search Space for Rules

Some elements of the search space for first-order Horn rules make no sense and should be excluded. If some predicate in the body is identical to the head, then that body predicate is redundant; such rules are called trivial rules. If some subset of the body shares no variable with the remaining part of the rule (including the head), then the rule is either redundant or unsatisfiable; such a subset is called an independent fragment. We refer to the search space with these two types of rules excluded as the reduced search space.

II-C Extension on Rules

Definition 2 (Limited Variable, Unlimited Variable, Generative Variable).

A variable is unlimited in a Horn rule if exactly one argument in the rule is assigned to it. A variable is limited if at least two arguments in the rule are assigned to it. A variable is generative if arguments in both the head and the body of the rule are assigned to it.

Searching for rules starts from the most general forms, i.e., rules with only a head predicate whose arguments are all distinct unlimited variables. To construct new rules, new equivalence conditions are added to the equivalence classes. Syntactically, these operations fall into five extension operations:

Case 1: Assign an existing limited variable to some argument.

Case 2: Add a new predicate with unlimited variables to the rule and then assign an existing limited variable to one of these arguments.

Case 3: Assign a new limited variable to a pair of arguments.

Case 4: Add a new predicate with unlimited variables to the rule and then assign a new limited variable to a pair of arguments. In this case, the two arguments are not both selected from the newly added predicate.

Case 5: Assign a constant to some argument.
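The extension operations can be sketched compactly; the two helpers below cover Cases 2/4 (add a predicate with fresh unlimited variables, then bind one of its arguments) and Cases 1/3 (make two argument positions share one variable), merging variables directly rather than introducing a freshly named limited variable, which is equivalent up to renaming. The target rule built here is an assumption for illustration:

```python
from itertools import count

fresh = count()  # variables are represented as fresh integers

def most_general(head_pred, arity):
    # r0: only a head predicate, all arguments distinct unlimited vars
    return [(head_pred, [next(fresh) for _ in range(arity)])]

def add_predicate(rule, pred, arity, at, pos):
    """Cases 2/4: append `pred` with fresh unlimited variables, then
    assign the variable at existing position `pos` to argument `at`."""
    rule = [(p, list(a)) for p, a in rule]
    args = [next(fresh) for _ in range(arity)]
    args[at] = rule[pos[0]][1][pos[1]]
    return rule + [(pred, args)]

def merge_arguments(rule, pos_a, pos_b):
    """Cases 1/3: make two argument positions share one variable
    (rename b's variable to a's)."""
    keep = rule[pos_a[0]][1][pos_a[1]]
    drop = rule[pos_b[0]][1][pos_b[1]]
    return [(p, [keep if v == drop else v for v in a]) for p, a in rule]

# Build father(X, Y) <- parent(X, Y), male(X) from the most general rule:
r = most_general("father", 2)                  # father(v0, v1)
r = add_predicate(r, "parent", 2, 0, (0, 0))   # Case 4: + parent(v0, v2)
r = merge_arguments(r, (0, 1), (1, 1))         # Case 3: parent(v0, v1)
r = add_predicate(r, "male", 1, 0, (0, 0))     # Case 2: + male(v0)
print(r)
```

Case 5 (assigning a constant) would replace a variable by a tagged constant token in the same representation.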

Given these extension operations, if a rule r′ is produced from a rule r by one operation, then r′ is an extension of r, and r is an origin of r′ (a rule may have multiple origins). The neighbours of a rule consist of all its extensions and origins. The above operations can be used to search the reduced space. Let r₀ be a rule that has only a head predicate whose arguments are all distinct unlimited variables; every element of the reduced space can be reached from some r₀. To prove this, we define a link property between predicates in a rule: if two predicates p and q in a rule share a limited variable, then p and q are linked. Moreover, if there is a sequence of predicates in which every consecutive pair is linked, then there is a linked path between the first and the last predicate. With this property, we prove search completeness as follows:

Lemma 3.

In every rule of the reduced search space, every predicate has a linked path to the head of the rule.


Proof. Suppose some predicate in a rule has no linked path to the head. Then this predicate is not itself the head. Collect all predicates of the rule that have no linked path to the head; this fragment shares no variables with the remaining predicates, i.e., it is an independent fragment of the rule. This contradicts the assumption that the rule lies in the reduced search space, which excludes rules with independent fragments. ∎

Lemma 4.

(Search Completeness) Let r₀ be a rule that has only a head predicate whose arguments are all distinct unlimited variables. Every rule in the reduced search space can be reached by extension operations from some r₀.


Proof. Suppose a rule lies in the reduced search space. During the search, when the rule is in an intermediate state, an extension can be constructed by adding a new predicate and turning the corresponding variables into limited ones; thus a new predicate is introduced into the intermediate rule. According to Lemma 3, every predicate in the target rule has a linked path to its head. Hence each predicate can be introduced iteratively, starting from the head predicate whose arguments are all distinct unlimited variables. The remaining limited variables and constants can then be added by the other extension operations to finally construct the target rule. ∎

Rules with independent fragments will never be constructed this way, as the extension operations do not introduce a new predicate that shares no variable with the other predicates.
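The linked-path property yields a direct test for independent fragments; a sketch, representing each predicate as a (name, argument-tuple) pair:

```python
from collections import Counter

def has_independent_fragment(head, body):
    """A body subset is an independent fragment iff it shares no
    variable with the rest of the rule; equivalently (Lemma 3), some
    body predicate has no linked path to the head."""
    preds = [head] + body
    # variables occurring at least twice are the limited ones
    counts = Counter(v for _, args in preds for v in args)
    limited = {v for v, c in counts.items() if c >= 2}
    # BFS from the head over the "share a limited variable" relation
    reached, frontier = {0}, [0]          # index 0 is the head
    while frontier:
        i = frontier.pop()
        vi = {v for v in preds[i][1] if v in limited}
        for j, (_, args) in enumerate(preds):
            if j not in reached and vi & set(args):
                reached.add(j)
                frontier.append(j)
    return len(reached) < len(preds)

# father(X, Y) <- parent(X, Y), male(X): every predicate is linked
assert not has_independent_fragment(
    ("father", ("X", "Y")), [("parent", ("X", "Y")), ("male", ("X",))])
# father(X, Y) <- parent(X, Y), male(Z): male(Z) is an independent fragment
assert has_independent_fragment(
    ("father", ("X", "Y")), [("parent", ("X", "Y")), ("male", ("Z",))])
```

Since an unlimited variable occurs only once, two predicates can only ever share a limited variable, so reachability over shared limited variables captures linked paths exactly.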

III Problem Definition

Definition 5 (Essential Knowledge Extraction).

Let B be the original KB, which is a finite set of atoms. An extraction of B is a triple (H, N, C), where H (for "Hypothesis") is a set of first-order Horn rules, N (for "Necessary") is a subset of B, and C (for "Counter Examples") is a subset of the complement of B under the closed-world assumption (CWA). The extraction satisfies (⊨ denotes logical entailment):

  • N ∪ H ⊨ B ∪ C

  • |H| + |N| + |C| is minimal

where |N| and |C| are the numbers of atoms in N and C, and |H| is defined as the sum of the lengths of all rules in H.
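A sketch of the validity check behind this definition, under the reading (an assumption where the original equations are concerned) that the closure of the necessary facts N under the rules must reproduce exactly B together with the declared counter examples:

```python
def valid_extraction(B, N, C, apply_rules):
    """Forward-chain N to a fixpoint under the rules and check that
    the result is exactly B plus the counter examples C (which must
    be disjoint from B, i.e., false under CWA)."""
    inferred = set(N)
    while True:
        new = apply_rules(inferred) - inferred
        if not new:
            break
        inferred |= new
    return inferred == B | C and not (C & B) and N <= B

# Toy instance: the over-general rule mother(X, Y) <- parent(X, Y)
# entails three true mother facts and three false ones; the false
# ones must be recorded as counter examples.
parent = {("james", "harry"), ("lily", "harry"), ("harry", "sirius"),
          ("harry", "albus"), ("ginny", "sirius"), ("ginny", "albus")}
B = {("parent",) + t for t in parent} | {("mother", "lily", "harry"),
     ("mother", "ginny", "sirius"), ("mother", "ginny", "albus")}
N = {("parent",) + t for t in parent}
C = {("mother", "james", "harry"), ("mother", "harry", "sirius"),
     ("mother", "harry", "albus")}
rule = lambda fs: fs | {("mother", x, y) for (p, x, y) in fs if p == "parent"}
assert valid_extraction(B, N, C, rule)
```

The example also shows why C is needed at all: an imprecise rule can still compress the KB, provided its false conclusions are listed explicitly so the original B remains recoverable.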

Definition 6 (Minimum Vertex Cover Problem).

Let G = (V, E) be an undirected graph. A minimum vertex cover of G is a minimum subset S ⊆ V such that every edge in E has at least one endpoint in S.

The complexity of essence extraction can be proved by reducing the minimum vertex cover problem to relational compression. Let G = (V, E) be the graph in the vertex cover problem. By the following construction we create a relational knowledge base aligned with G: introduce a unary predicate for each vertex in V; introduce a unary predicate for the edges; and, for each edge, add two constants and the corresponding vertex and edge predicates over them to the knowledge base.

Fig. 1: Vertex Cover Example

For example, Figure 1 shows a graph with three vertices and two edges. The corresponding relational compression instance consists of the vertex and edge predicates and constants constructed from this graph as above.
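Definition 6 can be checked by brute force on an instance of this size; the concrete 3-vertex, 2-edge graph below is an assumption, since Figure 1 itself only states the vertex and edge counts:

```python
from itertools import combinations

def minimum_vertex_cover(vertices, edges):
    """Brute-force minimum vertex cover (Definition 6): the smallest
    subset of vertices touching every edge."""
    for k in range(len(vertices) + 1):
        for cand in combinations(sorted(vertices), k):
            s = set(cand)
            if all(u in s or v in s for (u, v) in edges):
                return s

# A path-shaped 3-vertex, 2-edge graph (assumed shape of Figure 1)
V, E = {"v1", "v2", "v3"}, {("v1", "v2"), ("v2", "v3")}
print(minimum_vertex_cover(V, E))  # {'v2'}
```

On this instance the middle vertex alone covers both edges, matching the intuition that a high-degree vertex corresponds to a high-compression rule in the reduction.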

By the reducibility of the minimum vertex cover problem to relational compression we can prove that the latter is NP-hard. The details are as follows:

Lemma 7.

The rule that entails every edge predicate unconditionally is not in the hypothesis set H.


Proof. Consider the rule with an empty body that entails every edge predicate. Taking the constants into consideration, the number of counter examples this rule entails equals the number of predicates it entails. The size reduced is therefore zero, i.e., there is no actual reduction. Hence the rule does not reduce the size of the knowledge base and is not in H. ∎

Lemma 8.

Edge predicates can only be entailed by rules whose bodies are vertex predicates.


Proof. Consider the rule that entails the edge predicates adjacent to a vertex v; its length is 1. The number of predicates it entails is the number of edges connected to v, denoted d(v), and it entails no counter examples. Thus the size it reduces is d(v) − 1; if d(v) > 1, the rule can be used to reduce the size of the knowledge base.

According to Lemma 7, edge predicates cannot be entailed by axioms, and since there is no other predicate available in the construction, they can only be entailed by such vertex rules. ∎

Lemma 9.

Let (H, N, C) be the result of compression. All edge predicates are provable after compression; that is, they lie in the set of all provable predicates.


Proof. According to Lemma 8, the proof of an edge predicate relies only on vertex predicates. Whether the vertex predicates are themselves provable or not, the rules in H can always be applied to prove the edge predicates. Suppose some edge predicate were not provable and therefore had to be kept in N. Then another predicate, corresponding to the same edge and its duplicate, would also be kept, since the two are entailed together by any rule that entails either of them. A new rule could then be applied to entail both predicates and further reduce the size of the given result. However, according to the definition of relational compression, the output cannot be further reduced: a contradiction. ∎

Lemma 10.

Let S be a solution of the minimum vertex cover problem, let H_S be the rule set containing one vertex rule for each vertex in S, and let H′ be any rule set under which all edge predicates are provable. Then |H_S| ≤ |H′|.


Proof. According to Lemmas 8 and 9, all edges are provable and only provable via vertex rules; this matches the setting that all edges are covered, and only covered, by vertices in the minimum vertex cover problem. Thus the rule set derived from the minimum vertex cover entails the edge predicates at minimum cost, and no other rule set can do better. ∎

Theorem 11.

Relational compression is NP-hard.


Proof. According to the lemmas above, a minimum-size compression of the constructed knowledge base corresponds to a minimum vertex cover of G. All operations involved in the reduction have polynomial cost. Thus the minimum vertex cover problem can be polynomially reduced to relational compression, and relational compression is NP-hard. ∎

IV Extraction Framework

To tell whether a fact is provable from others, we employ a directed graph to encode the dependencies among facts with respect to inference. Each vertex of the graph is either a fact in the KB or an assertion of unconditional truth. There is an edge from u to v if u is involved in the proof of v by some rule, and an edge from the truth vertex to v if v can be inferred by some rule with an empty body. The extraction of essence is given by Algorithm 1.

1:Knowledge Base
2:Summarization on :
6:while  do
9:     Update graph with respect to
10:end while
Algorithm 1 Essence Extraction

If the dependency graph is a DAG, then the essential predicates are represented by the vertices with zero in-degree. However, if cycles appear in the graph, then at least one vertex in each cycle should be included in N. This assertion is proved below:

Lemma 12.

If some cycle in the dependency graph does not overlap with other cycles, then at least one of its vertices should be included in N.


Proof. A vertex in the dependency graph is guaranteed provable if it is in N or all of its in-neighbours are guaranteed provable. In the following, we assume that all parts of the graph other than the cycles are guaranteed provable. If no vertex of a single cycle (not overlapping with other cycles) is included in N, then each vertex of the cycle has an in-neighbour that is not guaranteed provable, so no vertex of the cycle is guaranteed provable. Hence at least one vertex must be selected into N. ∎

Lemma 13.

If some cycles in the dependency graph overlap, then at least one vertex of each of them should be included in N.


Proof. Suppose two cycles overlap in the graph. If no vertex of these cycles is in N, then none of them is guaranteed provable. If a vertex in a non-overlapping part is in N, then every vertex from it up to the one before the intersection is guaranteed provable, and the other cycle reduces to the non-overlapping case, so at least one of its vertices must also be in N. If a vertex in the overlapping part is in N, then both cycles become provable; in this case, too, at least one vertex is selected in each cycle. The cases with more than two overlapping cycles are similar. ∎

Lemma 14.

If there are cycles in the dependency graph, then at least one vertex of each cycle should be included in N.


Proof. It follows directly from Lemmas 12 and 13. ∎
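The zero in-degree and cycle-cover arguments can be sketched as a greedy procedure; this is a simplification in which the cycle vertex to keep is chosen arbitrarily, with no minimality guarantee:

```python
def essential_facts(nodes, in_nb):
    """in_nb[v] lists the facts the proof of v depends on; v is
    guaranteed provable if it is kept or all of in_nb[v] are provable
    (cf. Lemma 12). One vertex of every stuck cycle is kept."""
    keep = {v for v in nodes if not in_nb[v]}   # zero in-degree: essential
    provable = set(keep)

    def on_cycle(v, stuck):
        # v lies on a cycle of the stuck subgraph iff v is reachable
        # from itself backwards through stuck in-neighbours
        seen, frontier = set(), [u for u in in_nb[v] if u in stuck]
        while frontier:
            u = frontier.pop()
            if u == v:
                return True
            if u not in seen:
                seen.add(u)
                frontier += [w for w in in_nb[u] if w in stuck]
        return False

    while len(provable) < len(nodes):
        newly = {v for v in nodes - provable
                 if all(u in provable for u in in_nb[v])}
        if newly:
            provable |= newly
        else:                        # only cycles block progress: break one
            stuck = nodes - provable
            v = min(u for u in stuck if on_cycle(u, stuck))
            keep.add(v)
            provable.add(v)
    return keep

# a -> b -> c, with c also depending on the cycle d <-> e
deps = {"a": [], "b": ["a"], "c": ["b", "d"], "d": ["e"], "e": ["d"]}
print(sorted(essential_facts(set(deps), deps)))  # ['a', 'd']
```

In the example, keeping one vertex of the cycle d ↔ e is enough to make the whole cycle, and then c, provable, exactly as Lemmas 12 to 14 require.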

In the framework, two components may be implemented with different strategies according to the specific domain: findSingleRule and CoverCycle. To implement findSingleRule, pruning techniques are needed, as the search space is large and useful candidates are sparse in it. Given that semantic correlations can be strong in domain-specific KBs, cycles are expected to be large and frequent; therefore, an efficient coverage procedure is also required in the framework.

V Restore the Original KB

As the dependency graph implies, if all the in-neighbours of some vertex are in the KB, the vertex can be inferred by applying some rule in H. Thus, in order to restore the original KB, we can iteratively apply each rule on the current database until no more records are inferred. Inference by a single rule can be done without a full join in the relational data model. The procedure is shown in Algorithm 2.

3:Inferred target records
4: equivalence classes of columns determined by
6:for body functor in  do
7:      predicates in that complies to constant restrictions in
9:end for
10:for equivalence class  do
11:     Filter by columns in
12:     Update indices in
13:end for
15:for grouped row in  do
16:      empty relation for inferred records
17:     for target column  do
18:          one condition column in that is equivalent to
20:     end for
21:     if there are unassigned target columns in  then
22:         Assign each of these columns of all values in
23:     end if
25:end for
Algorithm 2 Inference by a Single Pattern
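The restoration loop can be sketched with a naive nested-loop join in place of Algorithm 2's join-free strategy, so the sketch illustrates the fixpoint restoration rather than the exact procedure above; the rule encoding (variables as names, constants as tagged pairs) is an assumption, and all head variables are assumed generative:

```python
def infer(db, head, body):
    """One application of a single rule: enumerate variable bindings
    of the body over the database and emit the corresponding head
    records. `db` maps predicate names to sets of tuples; rule
    arguments are variable names or ("const", value)."""
    def match(args, row, binding):
        b = dict(binding)
        for a, v in zip(args, row):
            if isinstance(a, tuple):        # constant restriction
                if a[1] != v:
                    return None
            elif b.setdefault(a, v) != v:   # equivalence condition
                return None
        return b
    bindings = [{}]
    for pred, args in body:
        bindings = [b2 for b in bindings for row in db[pred]
                    for b2 in (match(args, row, b),) if b2 is not None]
    _, hargs = head
    return {tuple(a[1] if isinstance(a, tuple) else b[a] for a in hargs)
            for b in bindings}

def restore(db, rules):
    """Apply every rule repeatedly until no new record is inferred."""
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            new = infer(db, head, body) - db[head[0]]
            if new:
                db[head[0]] |= new
                changed = True
    return db

db = {"parent": {("james", "harry"), ("lily", "harry"), ("harry", "sirius"),
                 ("harry", "albus"), ("ginny", "sirius"), ("ginny", "albus")},
      "male": {("james",), ("harry",)}, "female": {("lily",), ("ginny",)},
      "father": set(), "mother": set()}
rules = [(("father", ("X", "Y")), [("parent", ("X", "Y")), ("male", ("X",))]),
         (("mother", ("X", "Y")), [("parent", ("X", "Y")), ("female", ("X",))])]
restore(db, rules)
print(sorted(db["father"]))  # [('harry', 'albus'), ('harry', 'sirius'), ('james', 'harry')]
```

The outer loop mirrors the dependency-graph view: each pass advances one step along inference paths, so the number of passes is bounded by the longest simple path.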

The cost of a single inference is proportional to the number of equivalence classes and to the sizes of the relevant relations, and the number of equivalence classes is proportional to the number of columns involved, i.e., to the arities of the body functors. As the dependency graph implies, inference operations amount to visiting vertices along paths in the graph; thus the maximum number of iterations is no larger than the maximum length of a simple path in the graph. The overall cost of decompression is therefore the single-inference cost multiplied by this maximum simple-path length.

Lemma 15.

When the maximum simple-path length reaches its maximum, the worst-case cost of restoration is attained.


Proof. By definition, the maximum simple-path length is at most the number of vertices in the dependency graph. When it reaches this bound, all vertices form one single simple path. In this case, there can be only one rule in H and only one relation in the KB, and the arity of that relation is also maximal, for otherwise the rule could not summarize the whole KB. The worst-case cost follows. The other cases are analogous. ∎

VI Conclusion

In this paper, we introduced a framework for extracting the essence of factual knowledge, together with theoretical proofs of its key properties. To put it into practice, more concrete work is required to design and analyze findSingleRule and CoverCycle.


  • [1] C. Belth, X. Zheng, J. Vreeken, and D. Koutra (2020) What is normal, what is strange, and what is missing in a knowledge graph. In The Web Conference. Cited by: §I.
  • [2] L. A. Galárraga, C. Teflioudi, K. Hose, and F. Suchanek (2013) AMIE: association rule mining under incomplete evidence in ontological knowledge bases. In Proceedings of the 22nd International Conference on World Wide Web, pp. 413–422. Cited by: §I.