REMI: Mining Intuitive Referring Expressions on Knowledge Bases

11/04/2019 ∙ by Luis Galárraga, et al.

A referring expression (RE) is a description that identifies a set of instances unambiguously. Mining REs from data finds applications in natural language generation, algorithmic journalism, and data maintenance. Since there may exist multiple REs for a given set of entities, it is common to focus on the most intuitive ones, i.e., the most concise and informative. In this paper we present REMI, a system that can mine intuitive REs on large RDF knowledge bases. Our experimental evaluation shows that REMI finds REs deemed intuitive by users. Moreover we show that REMI is several orders of magnitude faster than an approach based on inductive logic programming.




1. Introduction

A referring expression (RE) is a description that identifies a set of entities unambiguously. For instance, the expression “x is the capital of France” is an RE for Paris, because no other city holds this title. The automatic construction of REs is a central task in natural language generation (NLG). The goal of NLG is to describe concepts in an accurate and compact manner using structured data such as a knowledge base (KB). REs also find applications in automatic data summarization, algorithmic journalism, virtual smart assistants, and KB maintenance, e.g., in query generation. Quality criteria for REs are context-dependent. For instance, NLG and data summarization aim at intuitive, i.e., short and informative, descriptions. In this vein, it may be more intuitive to describe Paris as “the city of the Eiffel Tower” than as “the resting place of Victor Hugo”. Indeed, the world-wide prominence of the Eiffel Tower makes the first RE more interesting and informative to an average user.

Some approaches can mine intuitive REs from semantic data (Dale, 1992; Reiter and Dale, 1992; Horacek, 2003; Krahmer et al., 2003). Conceived at the dawn of the Semantic Web, these methods are not suitable for current KBs for three main reasons. Firstly, they were exclusively designed for NLG, which often focuses on mining REs on scenes (a scene is the exhaustive description of a place and its objects). Such datasets usually have far fewer predicates and instances than today’s KBs. While RE mining can be naturally formulated as an inductive logic programming (ILP) task, our experimental evaluation shows that RE mining challenges ILP solutions because ILP can lead to extremely long rules. Besides, ILP is traditionally not concerned with the intuitiveness of its results. Secondly, most existing approaches are limited to conjunctive expressions on the attributes of the entities. However, our experience with today’s KBs suggests that this state-of-the-art language does not encompass all possible intuitive expressions. For instance, to describe Johann J. Müller, we could resort to the fact that he was the supervisor of the supervisor of Albert Einstein, i.e., $\mathit{supervisor}(x, y) \wedge \mathit{supervisor}(y, \mathit{Einstein})$, which goes beyond the traditional language bias due to the existentially quantified variable $y$. Thirdly, state-of-the-art RE miners define intuitiveness for REs in terms of number of atoms. In that spirit, two single-atom REs for Paris are equally concise and desirable, even though one of them may not be informative to users outside France. We tackle the aforementioned limitations with a solution to mine intuitive REs on large KBs. How to use such REs is beyond the scope of this work; however, we provide hints about potential use cases. In summary, our contributions are:

  • We propose a scheme to quantify the intuitiveness of entity descriptions from a KB in number of bits, i.e., we estimate their Kolmogorov complexity (Zellner et al., 2001).

  • We study to which extent descriptions of low estimated Kolmogorov complexity are deemed intuitive by users.

  • We present REMI, a method to mine intuitive (concise and informative) REs on large KBs. REMI extends the state-of-the-art language bias for REs by allowing additional existentially quantified variables, as in $\mathit{supervisor}(x, y) \wedge \mathit{supervisor}(y, \mathit{Einstein})$. This design choice increases the chances of finding intuitive REs for a set of target entities.

2. Preliminaries

2.1. RDF Knowledge Bases

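This work focuses on mining REs on RDF (Resource Description Framework) knowledge bases (KBs). A KB $\mathcal{K}$ is a set of assertions in the form of triples $p(s, o)$ (also called facts) with $p \in \mathcal{P}$, $s \in \mathcal{I} \cup \mathcal{B}$, and $o \in \mathcal{I} \cup \mathcal{L} \cup \mathcal{B}$. In this formulation, $\mathcal{I}$ is a set of entities such as London, $\mathcal{P}$ is a set of predicates, e.g., cityIn, $\mathcal{L}$ is a set of literal values such as strings or numbers, and $\mathcal{B}$ is a set of blank nodes, i.e., anonymous entities. An example of an RDF triple is $\mathit{cityIn}(\mathit{London}, \mathit{UK})$. KBs often include assertions that state the class of an entity. Furthermore, for each predicate $p \in \mathcal{P}$ we define its inverse predicate $p^{-1}$ as the relation consisting of all facts $p^{-1}(o, s)$ such that $p(s, o) \in \mathcal{K}$ (to be RDF-compliant, $p^{-1}$ is defined only for triples with $o \in \mathcal{I} \cup \mathcal{B}$).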
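To make these definitions concrete, the following minimal sketch stores a toy KB as a set of Python tuples and materializes inverse facts. It is not part of REMI: the `Fact` type, the `^-1` naming suffix, and the toy facts are assumptions made for illustration only.

```python
from typing import NamedTuple, Set


class Fact(NamedTuple):
    """A triple p(s, o); `literal` marks objects that are literal values."""
    predicate: str
    subject: str
    obj: str
    literal: bool = False


def inverse_facts(kb: Set[Fact]) -> Set[Fact]:
    """Build p^-1(o, s) for each p(s, o), skipping literal objects,
    since a literal cannot be the subject of an RDF triple."""
    return {
        Fact(f.predicate + "^-1", f.obj, f.subject)
        for f in kb
        if not f.literal
    }


# Toy KB (hypothetical facts, for illustration only).
kb = {
    Fact("cityIn", "London", "UK"),
    Fact("label", "London", "London", literal=True),
}
inverse = inverse_facts(kb)
```

Only the `cityIn` fact yields an inverse here, because the object of `label` is a literal.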

2.2. Referring Expressions

2.2.1. Atoms.

An atom $p(X, Y)$ is an expression such that $p$ is a predicate and $X$, $Y$ are either variables or constants. We say an atom has matches in a KB $\mathcal{K}$ if there exists a function $\sigma$ from the variables of the atom to constants in the KB such that $\sigma(p(X, Y)) \in \mathcal{K}$. Applied to an atom, the operator $\sigma(\cdot)$ returns a new atom such that the constants in the input atom are untouched, and variables are replaced by their corresponding mappings according to $\sigma$. We call $\sigma(p(X, Y))$ a bound atom and $\sigma$ a matching assignment.

2.2.2. Expressions.

We say that two atoms are connected if they share at least one variable argument. We define a subgraph expression $\rho$, rooted at a variable $x$, as a conjunction of transitively connected atoms such that (1) there is at least one atom that contains $x$ as first argument (if $x$ is the second argument as in $p(y, x)$, we can rewrite the atom as $p^{-1}(x, y)$), and (2) when the expression has several atoms, every pair of atoms is transitively connected via at least another variable besides $x$.

A subgraph expression has matches in a KB $\mathcal{K}$ if there is an assignment $\sigma$ from the variables in the expression to constants in the KB such that $\sigma(p_i(X_i, Y_i)) \in \mathcal{K}$ for every atom $p_i(X_i, Y_i)$ in the subgraph expression. Subgraph expressions are the building blocks of referring expressions, thus we define an expression $e = \bigwedge_{1 \le i \le n} \rho_i$ as a conjunction of subgraph expressions rooted at the same variable $x$ such that the expressions have only $x$ (called the root variable) as common variable. An expression $e$ has matches in $\mathcal{K}$ if there is an assignment $\sigma$ that yields matches for every subgraph expression of $e$. Finally, we say an expression $e$ with root variable $x$ is a referring expression (RE) for a set of target entities $T$ in a KB $\mathcal{K}$ iff the following two conditions are met:

  1. $\forall t \in T : \exists \sigma : \sigma(x) = t$, i.e., for every target entity $t$, there exists a matching assignment $\sigma$ in $\mathcal{K}$ that binds the root variable to $t$.

  2. $\nexists \sigma' : \sigma'(x) \notin T$, in other words, no matching assignment binds the root variable to entities outside the set of target entities.
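The two conditions amount to requiring that the set of bindings of the root variable equals the set of target entities. The sketch below checks this on a toy KB with a naive backtracking matcher. It is an illustration only: the `?` prefix for variables, the function names, and the facts in `kb` (including the placeholder entities `PersonA` and `PersonB`) are assumptions, not part of REMI.

```python
def is_var(term):
    """Variables are written with a leading '?', e.g. '?x' (a convention of this sketch)."""
    return term.startswith("?")


def matches(atoms, kb, binding=None):
    """Yield every assignment (dict variable -> constant) that maps all atoms to KB facts."""
    binding = binding or {}
    if not atoms:
        yield dict(binding)
        return
    (pred, first, second), rest = atoms[0], atoms[1:]
    for fact_pred, subj, obj in kb:
        if fact_pred != pred:
            continue
        candidate = dict(binding)
        consistent = True
        for term, value in ((first, subj), (second, obj)):
            if is_var(term):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(term, value) != value:
                    consistent = False
                    break
            elif term != value:
                consistent = False
                break
        if consistent:
            yield from matches(rest, kb, candidate)


def is_referring_expression(atoms, root, targets, kb):
    """True iff the root variable binds to exactly the target entities."""
    bound = {sigma[root] for sigma in matches(atoms, kb)}
    return bound == set(targets)


# Toy KB of (predicate, subject, object) facts; hypothetical data.
kb = {
    ("supervisor", "Müller", "Kleiner"),
    ("supervisor", "Kleiner", "Einstein"),
    ("supervisor", "PersonA", "PersonB"),
}
path = [("supervisor", "?x", "?y"), ("supervisor", "?y", "Einstein")]
single = [("supervisor", "?x", "?y")]
```

In this toy KB, `path` identifies Müller unambiguously, whereas `single` also binds `?x` to other supervisors and is therefore not an RE for Müller.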

For example, consider a KB $\mathcal{K}$ with accurate and complete information about countries and languages, as well as the following expression $e$ rooted at $x$ and consisting of two subgraph expressions:

We say that $e$ is an RE for its two target entities in $\mathcal{K}$ because matching assignments can only bind $x$ to these two countries.

3. REMI

Given an RDF KB $\mathcal{K}$ and a set of target entities $T$, REMI returns the most intuitive RE (a conjunction of subgraph expressions) that describes unambiguously the input entities in $\mathcal{K}$. We define intuitiveness for REs as a trade-off between compactness and informativeness, thus intuitive REs should be simple, i.e., they should be concise and resort to concepts that users are likely to know and understand. We first quantify intuitiveness for REs in number of bits in Section 3.1. We then elaborate on REMI’s language bias and algorithm in Sections 3.2 and 3.3 respectively.

3.1. Quantifying intuitiveness

There may be multiple ways to describe a set of entities unambiguously. For instance, “the capital of France” and “the city of the Eiffel Tower” are both REs for Paris. Our goal is therefore to quantify the intuitiveness of such expressions without human intervention. We say that an RE $e$ is more intuitive than an RE $e'$ if $K(e) < K(e')$, where $K$ denotes the Kolmogorov complexity; so we define intuitiveness as the inverse of complexity. The Kolmogorov complexity $K(e)$ of a string $e$ (e.g., an expression) is a measure of the absolute amount of information conveyed by $e$ and is defined as the length in bits of $e$’s shortest effective binary description (Zellner et al., 2001). If $e_b$ denotes such binary description and $M$ is the program that can decode $e_b$ into $e$, then $K(e) = l(e_b) + l(M)$, where $l(\cdot)$ denotes length in bits. The exact calculation of $K(e)$ requires us to know the shortest way to compress $e$ as well as the most compact program $M$ for decompression. Due to the intractability of $K$’s calculation, applications can only approximate it by means of potentially suboptimal encodings $\hat{e}_b$ and programs $\hat{M}$, that is:

$$K(e) \le \hat{K}(e) = l(\hat{e}_b) + l(\hat{M})$$

Once we fix a compression/decompression scheme $\hat{M}$, applications only need to worry about the term $l(\hat{e}_b)$. To account for the dimension of informativeness, our proposed encoding builds upon the observation that intuitive expressions resort to prominent predicates and entities. For example, it is very natural and informative to describe Paris as the capital of France, because the notion of capital is well understood and France is a very prominent entity. In contrast, it would be more complex to describe Paris in terms of its twin cities, because this concept is less prominent than the concept of capital city. In other words, prominent predicates and entities are informative because they relate to concepts that users often recall. We can thus devise a code for predicates and entities by constructing a ranking by prominence. The code for a predicate $p$ (entity $I$) is the binary representation of its position $k$ in the ranking and the length of its code is $\log_2(k)$. In this way, prominent concepts are encoded with fewer bits. We now define the estimated Kolmogorov complexity of a single-atom subgraph expression $p(x, I)$ as:

$$\hat{K}(p(x, I)) = l(p) + l(I \mid p)$$

In the formula $l(p) = \log_2(k(p))$, where $k(p)$ is the position of predicate $p$ in the ranking of predicates. The term $l(I \mid p)$ accounts for the chain rule of the Kolmogorov complexity. It measures the conditional complexity of $I$ given predicate $p$. It is calculated as $\log_2(k(I \mid p))$, the logarithm of $I$’s rank in the ranking of objects of predicate $p$. For instance, let us assume that $p$ is the predicate mayor (city mayor). The chain rule models the fact that once the concept of mayor has been conveyed, the context becomes narrower and the user needs to discriminate among fewer concepts, in this example, only city mayors. The chain rule also applies to subgraph expressions with multiple atoms. For instance, the complexity of $\rho = \mathit{mayor}(x, y) \wedge \mathit{party}(y, \mathit{Socialist})$ corresponds to the following formula:

$$\hat{K}(\rho) = l(\mathit{mayor}) + l(\mathit{party} \mid \mathit{mayor}(x, y)) + l(\mathit{Socialist} \mid \mathit{mayor}(x, y) \wedge \mathit{party}(y, z))$$

We highlight that the code for party must account for the fact that this predicate appears in a first-to-second-argument join with the predicate mayor. Hence, the second term in the sum amounts to $\log_2(k(\mathit{party} \mid \mathit{mayor}(x, y)))$, the log of the rank of party among those predicates that allow for first-to-second-argument joins with mayor in the KB. Likewise, the complexity of the Socialist party in the third term is derived from a ranking of all the parties that have mayors among their members, i.e., all the bindings for the variable $z$ in $\mathit{mayor}(x, y) \wedge \mathit{party}(y, z)$. If a city can be unambiguously described by its non-prominent mayor $I$, it may be simpler to omit the person and describe her instead by paying the price of an additional predicate and a well-known party.
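As a worked illustration of the rank-based code for single atoms, the sketch below derives ranks from frequency counts and sums the two code lengths. The counts, the predicate name `capitalOf`, and the placeholder entities are hypothetical and chosen only to show the arithmetic; the function names are assumptions of this sketch.

```python
import math


def rank_of(item, frequency):
    """1-based position of `item` when concepts are sorted by decreasing frequency."""
    ordered = sorted(frequency, key=lambda c: (-frequency[c], c))
    return ordered.index(item) + 1


def atom_complexity(predicate, obj, predicate_freq, object_freq):
    """Estimated bits for p(x, I): log2(rank of p) + log2(rank of I among objects of p)."""
    l_p = math.log2(rank_of(predicate, predicate_freq))
    l_obj = math.log2(rank_of(obj, object_freq[predicate]))
    return l_p + l_obj


# Hypothetical fact counts, chosen only to illustrate the arithmetic.
predicate_freq = {"type": 100, "capitalOf": 40, "twinCity": 5}
object_freq = {
    "capitalOf": {"France": 30, "CountryB": 2},
    "twinCity": {"CityA": 3, "CityB": 2},
}
capital_bits = atom_complexity("capitalOf", "France", predicate_freq, object_freq)
twin_bits = atom_complexity("twinCity", "CityB", predicate_freq, object_freq)
```

With these counts, the atom built on the more prominent predicate and object receives the shorter code, which matches the intuition that "capital of France" is simpler than a description based on twin cities.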

We can now estimate the Kolmogorov complexity of a referring expression $e = \bigwedge_{1 \le i \le n} \rho_i$ as the sum of the complexities of the individual subgraph expressions, namely as $\hat{K}(e) = \sum_{1 \le i \le n} \hat{K}(\rho_i)$. Consider as an example the following RE:

It follows that we can calculate $\hat{K}(e)$ as:

We remark, however, that this formula makes a simplification. Consider an RE for Switzerland in which the predicate officialLang occurs in two subgraph expressions. While $\hat{K}$ adds the complexity of the predicate officialLang twice, an optimal code would count the predicate once and encode its multiplicity. In fact, this optimal code would be applied to every common sub-path with multiplicity in the list of subgraph expressions. This fact worsens the quality of $\hat{K}$ as an approximation of $K$ for such kinds of expressions; however, it is not a problem in our setting as long as we use $\hat{K}$ for comparison purposes.

Lastly, we discuss how to rank concepts by prominence. Wikipedia-based KBs provide information about the hyperlink structure of the entity pages, thus one alternative is to use the Wikipedia page rank (PR). The downside of this metric is that it is undefined for predicates. A second alternative is frequency, i.e., number of mentions of a concept. Frequency could be measured in the KB or extracted from exogenous sources such as a crawl of the Web or a search engine. Even though search engines may quantify prominence more accurately (by providing real-time feedback and leveraging circumstantiality), we show that endogenous sources are good enough for this goal. In line with other works that quantify popularity for concepts in KBs (Thalhammer et al., 2016), we use (i) the number of facts where a concept occurs in the KB (fr), and (ii) the concept’s page rank in Wikipedia (pr). We denote the resulting complexity measures using these prominence metrics by $\hat{K}_{fr}$ and $\hat{K}_{pr}$ respectively. We use fr whenever pr is undefined.

3.2. Language Bias

Most approaches for RE mining define REs as conjunctions of atoms with bound objects; we call this language bias the state-of-the-art language bias. REMI extends this language by allowing atoms with additional existentially quantified variables. This design decision allows us to replace tail entities with high Kolmogorov complexity with entities that are more prominent and hence more intuitive. For instance, consider the RE $\mathit{supervisor}(x, \mathit{Kleiner})$ for Johann J. Müller. Saying that he was the supervisor of A. Kleiner may not say much to an arbitrary user. By allowing an additional variable, we can consider the expression “he was the supervisor of the supervisor of Albert Einstein”, namely $\mathit{supervisor}(x, y) \wedge \mathit{supervisor}(y, \mathit{Einstein})$. We highlight that Einstein is simpler to describe than Kleiner, which makes the second expression, albeit longer, more informative and overall more intuitive than the first one. This shows that in the presence of irrelevant object entities, further atoms may help increase intuitiveness. Nevertheless, in the general case longer expressions tend to be more complex. This phenomenon becomes more palpable when the additional atoms do not describe the root variable, as in “she speaks a language in a subfamily of the Italic languages”. This expression introduces two additional variables that make comprehension and translation to natural language more effortful. Besides, further atoms and especially additional variables can dramatically increase the size of the search space of REs, which is exponential in the number of possible subgraph expressions. Our observations reveal, e.g., that a second additional variable increases by more than 270% the number of subgraph expressions that REMI must handle in DBpedia. Conversely, increasing the number of atoms from 2 to 3 while keeping only one additional variable leads to an increase of 40%. Based on all these observations, we restrict REMI’s language bias to subgraph expressions with at most one additional variable and 3 atoms. The 3-atom constraint is in line with rule mining approaches on large KBs (Galárraga et al., 2015). This decision disqualifies our last example, but still allows expressions such as “she was born, lived and died in the same place”. Table 1 summarizes REMI’s language of subgraph expressions.

Table 1. REMI’s subgraph expressions, by shape: 1 atom; path + star; 2 closed atoms; 3 closed atoms.

Figure 1. Search space example.

3.3. Algorithm

REMI implements a depth-first search (DFS; DFS approaches are preferred over breadth-first search, BFS, due to their smaller memory footprint) on conjunctions of the subgraph expressions common to all the target entities. Assume that the KB knows only three common subgraph expressions $\rho_1$, $\rho_2$, and $\rho_3$ for the entities Rennes and Nantes, such that $\hat{K}(\rho_1) \le \hat{K}(\rho_2) \le \hat{K}(\rho_3)$. The tree in Figure 1 illustrates the search space for our example. Each node in the tree is an expression, i.e., a conjunction of subgraph expressions, and its complexity is in parentheses. When visiting a node, DFS must test whether the corresponding expression is an RE, i.e., whether the expression identifies exclusively the target entities. If the test fails, the strategy should move to the node’s first child. If the test succeeds, DFS must verify whether the expression is less complex than the least complex RE seen so far. If it is the case, this RE should be remembered, and DFS can prune the search space by backtracking. To see why, imagine that $\rho_1 \wedge \rho_2$ in Figure 1 is an RE. In this case, all expressions prefixed with this expression (the node’s descendants) will also be REs. However, all these REs have higher values for $\hat{K}$, i.e., they are more complex. This means that we can stop descending in the tree, pruning the node $\rho_1 \wedge \rho_2 \wedge \rho_3$ in Figure 1. We call this step a pruning by depth. We can do further pruning if we leverage the order of the subgraph expressions. In our example, if $\rho_1 \wedge \rho_2$ is an RE, any expression prefixed with $\rho_1 \wedge \rho_i$ for $i > 2$ must be more complex and can therefore be skipped. We call this a side pruning. All these ideas are formalized by Algorithm 1, which takes as input a KB $\mathcal{K}$ as well as the entities $T$ to describe, and returns an RE of minimal complexity according to $\hat{K}$. Line 1 calculates all subgraph expressions in REMI’s language bias (Table 1) that are common to the target entities. They are then sorted by increasing complexity and put in a priority queue (line 2). The depth-first exploration is performed in lines 4-8. At each iteration, the least complex subgraph expression $\rho$ is dequeued (line 5) and sent to the subroutine DFS-REMI (line 6) with the rest of the queue. This subroutine returns the most intuitive RE $e'$ prefixed with $\rho$. If $\hat{K}(e')$ is smaller than the complexity of the best solution found so far (line 7), we remember it (we define the complexity of the empty expression as infinite). If DFS-REMI returns an empty expression, we can conclude that there is no RE for the target entities (line 8). To see why, recall that DFS will, in the worst case, combine $\rho$ with all remaining expressions that are more complex. If none of such combinations is an RE, there is no solution for $T$ in $\mathcal{K}$.

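Input: a KB: $\mathcal{K}$, the target entities: $T$
Output: an RE of minimal complexity: $e$
1 $G := \bigcap_{t \in T}$ subgraphs-expressions($t$)
2 create priority queue from $G$ in ascending order by $\hat{K}$
3 $e := \top$
4 while $|G| > 0$ do
5        $\rho := G$.dequeue()
6        $e' :=$ DFS-REMI($\rho$, $G$, $T$, $\mathcal{K}$)
7        if $\hat{K}(e') < \hat{K}(e)$ then $e := e'$
8        if $e' = \top$ then return $e$
Algorithm 1 REMI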

We now sketch how to calculate the subgraph expressions of an entity $t$ (line 1). Contrary to DFS-REMI, the routine subgraphs-expressions carries out a breadth-first search. Starting with atomic expressions of the form $p_0(x, I_0)$ (where $x$ binds to $t$), the routine derives all two-atom expressions, namely paths of the form $p_0(x, y) \wedge p_1(y, I_1)$ and closed conjunctions $p_0(x, y) \wedge p_1(x, y)$. The two-atom paths are extended with atoms of the form $p_2(y, I_2)$ to produce the path+star combinations, whereas the closed conjunctions are used to derive the closed expressions of three atoms (see Table 1).

The subroutine DFS-REMI is detailed in Algorithm 2, which takes as input a subgraph expression $\rho$, the priority queue $G$ of subgraph expressions (without $\rho$), the target entities $T$ and a KB $\mathcal{K}$. We use a stack initialized with the empty subgraph expression in order to traverse the search space in a depth-first manner (line 1). Each iteration pushes a subgraph expression to the stack, starting with $\rho$ (line 3). The conjunction of the elements of the stack defines an expression $e'$ that is evaluated on the KB to test if it is an RE for the target entities (lines 4-5). If $e'$ is the least complex RE seen so far, the algorithm remembers it (line 6). Adding more expressions to $e'$ can only increase its complexity, hence line 7 performs a pruning by depth, so that all descendants of $e'$ are abandoned. Line 8 backtracks anew to achieve a side pruning. If backtracking leads to an empty stack, DFS-REMI cannot do better, and can thus return the best RE found (line 9).

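Input: a subgraph expression: $\rho$, priority queue: $G$, target entities: $T$, a KB: $\mathcal{K}$
Output: an RE of minimal complexity prefixed in $\rho$: $e$
1 $S :=$ new stack(); $e := \top$
2 foreach $\rho' \in \{\rho\} \cup G$ do
3       $S$.push($\rho'$)
4       $e' := \bigwedge_{\hat{\rho} \in S} \hat{\rho}$
5       if $e'$ is an RE for $T$ in $\mathcal{K}$ then
6             if $\hat{K}(e') < \hat{K}(e)$ then $e := e'$
7             $S$.pop()
8             $S$.pop()
9             if $|S| = 0$ then return $e$
Algorithm 2 DFS-REMI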
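The two pruning rules can be illustrated with a short recursive search. This is a simplified sketch, not a line-by-line transcription of Algorithms 1 and 2: it assumes additive, non-negative complexities, candidates already sorted by increasing complexity, and a caller-supplied `is_re` test. The match sets, complexity values, and the placeholder cities `CityC` and `CityD` are hypothetical.

```python
def mine_re(candidates, complexity, is_re):
    """Return (expression, cost) of a least complex RE, or (None, inf) if none exists.

    candidates: subgraph-expression ids sorted by increasing complexity.
    complexity: dict id -> non-negative estimated complexity in bits.
    is_re: callback that tests whether a list of ids is an RE.
    """
    best, best_cost = None, float("inf")

    def dfs(prefix, cost, start):
        nonlocal best, best_cost
        for i in range(start, len(candidates)):
            rho = candidates[i]
            new_cost = cost + complexity[rho]
            if new_cost >= best_cost:
                break  # later siblings and all descendants cost at least as much
            expr = prefix + [rho]
            if is_re(expr):
                best, best_cost = expr, new_cost
                break  # pruning by depth (no descent) and side pruning (skip siblings)
            dfs(expr, new_cost, i + 1)

    dfs([], 0.0, 0)
    return best, best_cost


# Hypothetical match sets for three subgraph expressions common to the targets.
matched = {
    "rho1": {"Rennes", "Nantes", "CityC"},
    "rho2": {"Rennes", "Nantes", "CityD"},
    "rho3": {"Rennes", "Nantes"},
}
complexity = {"rho1": 1.0, "rho2": 2.0, "rho3": 4.0}
targets = {"Rennes", "Nantes"}


def is_re(expr):
    """An expression is an RE iff its conjuncts jointly match exactly the targets."""
    return set.intersection(*(matched[r] for r in expr)) == targets


best, cost = mine_re(["rho1", "rho2", "rho3"], complexity, is_re)
```

In this toy setting the search keeps the conjunction of `rho1` and `rho2` (3 bits) and never evaluates `rho3` on its own, because its cost of 4 bits already exceeds the best solution.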

3.4. Parallel REMI

We can parallelize Algorithm 1 if we allow multiple threads to concurrently dequeue elements from the priority queue of subgraph expressions and explore the subtrees rooted at those elements independently. This amounts to executing the loop in lines 4-8 in parallel. This new strategy, called P-REMI, preserves the logic of REMI with three differences. First, the least complex solution $e$ can be read and written by all threads. Second, if a thread finds no solution from its exploration rooted at subgraph expression $\rho_i$, it must signal all the other threads rooted at subgraph expressions $\rho_j$ ($j > i$) to stop. For instance, if a thread finished its exploration rooted at $\rho_1$ in Fig. 1 without finding a solution, any exploration rooted at $\rho_2$ or $\rho_3$ is superfluous as it covers expressions that are less specific than those rooted at $\rho_1$. Third, before testing if an expression is an RE, each thread should verify whether there is already a solution of lower complexity. If so, the thread can backtrack until reaching a node of even lower complexity than $e$. Since these differences mostly affect the logic of DFS-REMI, we detail a new procedure called P-DFS-REMI in Algorithm 3. The new routine has the same signature as DFS-REMI plus a reference to the best solution $e$. Lines 1 and 2 initialize the stack and create a new priority queue from the original one. The DFS exploration starts in line 3. The first task of P-DFS-REMI is to backtrack iteratively while the expression represented by the stack is more complex than the best solution (line 6). If P-DFS-REMI backtracked to the root node, it means the algorithm cannot find a better solution from now on, and can return (line 7). If backtracking did not remove any expression from the stack (check in line 8), P-DFS-REMI proceeds exactly as its sequential counterpart DFS-REMI (lines 9-14). Conversely, if the stack was pruned by the loop in line 6, P-DFS-REMI starts a new iteration since the corresponding expression in the resulting stack must have been tested in a previous iteration. For a proper implementation, access to the shared best solution $e$ must be synchronized among the different threads.

Input: a subgraph expression: $\rho$, priority queue of subgraph expressions: $G$, the target entities: $T$, $e$: best solution found so far, a KB: $\mathcal{K}$
Output: an RE of minimal complexity
3 while  do
6       while  do if  then return if  then
8             if   then
9                   if  then
12                   if  then return
Algorithm 3 P-DFS-REMI
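The synchronization requirement on the shared best solution can be sketched as follows. This is only an illustration of lock-protected reads and writes, not the authors' Java implementation: the `BestSolution` class, its method names, and the reported (expression, cost) pairs are assumptions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BestSolution:
    """Shared holder for the least complex RE found so far."""

    def __init__(self):
        self._lock = threading.Lock()
        self._expr = None
        self._cost = float("inf")

    def offer(self, expr, cost):
        """Atomically keep `expr` if it is less complex than the current best."""
        with self._lock:
            if cost < self._cost:
                self._expr, self._cost = expr, cost
                return True
            return False

    def get(self):
        """Return a consistent (expression, cost) snapshot."""
        with self._lock:
            return self._expr, self._cost


shared = BestSolution()
# Hypothetical (expression, cost) pairs that concurrent explorations might report.
found = [("rho2 & rho3", 6.0), ("rho1 & rho2", 3.0), ("rho3", 4.0)]
with ThreadPoolExecutor(max_workers=3) as pool:
    list(pool.map(lambda pair: shared.offer(*pair), found))
```

Because the comparison and the update happen under the same lock, the final state is the least complex offer regardless of the order in which threads report their results.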

3.5. Implementation

3.5.1. Data storage.

We store the KB in a single file using the HDT (Fernández et al., 2013) format. HDT is a binary compressed format, conceived for fast data transfer, that offers reasonable performance for search and browse operations without prior decompression. HDT libraries support only the retrieval of bindings for single atoms $p(X, Y)$, leaving the execution of additional query operators to upper layers. We used the Apache Jena framework (version 3.7) as access layer.

3.5.2. Algorithms.

REMI and P-REMI are implemented in Java 8. Their runtime is dominated by two phases: (1) the construction of the priority queue of subgraph expressions (line 2 in Alg. 1), and (2) the DFS exploration (lines 4-8). The first phase is computationally expensive because it requires the calculation of $\hat{K}$ on large sets of subgraph expressions, leading to the execution of expensive queries on the KB. To alleviate this cost, we parallelized the construction and sorting of the queue and applied a series of pruning heuristics. First, the routine subgraphs-expressions ignores expressions of the form $p(x, B)$ with $B \in \mathcal{B}$, since blank nodes are by conception irrelevant entities. However, the routine always derives paths that “hide” blank nodes, that is, two-atom paths whose intermediate variable binds to a blank node are always considered. Second, we do not derive multi-atom subgraph expressions from atoms with object entities among the 5% most prominent entities. For example, we do not explore extensions of an atom whose object is Germany, because the complexity of the additional atom will likely be higher than the complexity of a simple entity such as Germany. Finally, REMI requires the execution of the same queries multiple times, thus query results are cached in a least-recently-used fashion.
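The caching idea can be sketched with Python's standard `functools.lru_cache`. REMI itself is implemented in Java on top of HDT and Jena, so this is an analogy rather than the actual code; the `subjects` function, the cache size, and the toy KB are assumptions.

```python
from functools import lru_cache

# Toy KB of (predicate, subject, object) facts, for illustration only.
KB = frozenset({
    ("country", "Berlin", "Germany"),
    ("country", "Munich", "Germany"),
    ("country", "Rennes", "France"),
})


@lru_cache(maxsize=1024)
def subjects(predicate, obj):
    """Return the bindings of x in predicate(x, obj); results are memoized
    and the least recently used entries are evicted once the cache is full."""
    return frozenset(s for p, s, o in KB if p == predicate and o == obj)


first = subjects("country", "Germany")   # scans the KB
second = subjects("country", "Germany")  # served from the cache
```

After these two calls, `subjects.cache_info()` reports one miss and one hit, which is the effect the least-recently-used cache is meant to achieve for repeated queries.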

3.5.3. Complexity function.

The calculation of $\hat{K}$ requires the construction of multiple rankings on prominence for concepts in the KB. For example, to calculate the complexity of an atom with predicate capital, we need the rank of capital among all predicates, as well as the rank of the object among all capital cities. Even though predicates are always evaluated against the same ranking, an entity may rank differently depending on the context. We could precompute $l(I \mid p)$ for every $I$, $p$ in the KB; however, we can leverage the correlation between prominence and rank to reduce the amount of stored information. It has been empirically shown that the frequency of terms in textual corpora follows a power-law distribution (Manning et al., 2008). If $f$ is the frequency of the $k$-th most frequent term in a corpus, $f \approx c \cdot k^{-s}$ for some constants $c, s > 0$. If we treat all the facts with the same predicate $p$ as a corpus, we can estimate the number of bits of an entity $I$ given $p$ from its conditional frequency $\mathit{fr}(I \mid p)$ as follows:

$$l(I \mid p) = \log_2(k(I \mid p)) \approx -\alpha \log_2(\mathit{fr}(I \mid p)) + \beta \qquad (1)$$

We can thus learn the coefficients $\alpha$ and $\beta$ that map frequency in the KB to complexity in bits (under the power law above, $\alpha = 1/s$ and $\beta = \log_2(c)/s$). While this still requires us to precompute the conditional rankings, Equation 1 allows us to “compress” them as a collection of pairs of coefficients (one per predicate). Our results on two KBs confirm the linear correlation between the logarithms of rank and frequency, since the fitted functions exhibit an average $R^2$ measure of 0.85 in DBpedia and 0.88 in Wikidata (values closer to 1 denote a good fit). Likewise, this power-law correlation extrapolates to the Wikipedia page rank, which reveals an average $R^2$ of 0.91 in DBpedia.
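The fit of the two coefficients can be sketched with an ordinary least-squares regression on the logarithms of rank and frequency. The source does not state which fitting procedure was used, so least squares is an assumption here, as are the function names and the synthetic frequencies.

```python
import math


def fit_rank_from_frequency(frequencies):
    """Least-squares fit of log2(rank) = -alpha * log2(frequency) + beta.

    Returns (alpha, beta, r_squared). Needs at least two distinct frequencies.
    """
    ordered = sorted(frequencies, reverse=True)
    xs = [math.log2(f) for f in ordered]
    ys = [math.log2(k) for k in range(1, len(ordered) + 1)]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    beta = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + beta)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return -slope, beta, 1.0 - ss_res / ss_tot


def estimated_bits(frequency, alpha, beta):
    """Approximate code length of an entity from its conditional frequency."""
    return -alpha * math.log2(frequency) + beta


# Synthetic, exactly Zipfian frequencies (hypothetical data): f = 1024 / k.
freqs = [1024 / k for k in range(1, 9)]
alpha, beta, r2 = fit_rank_from_frequency(freqs)
```

On this synthetic input the fit recovers a slope of about 1 and an intercept of about 10, so an entity with frequency 256 (rank 4) is assigned roughly 2 bits. Real KB frequencies are noisier, which is why the goodness of fit is reported alongside the coefficients.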

4. Experimental Evaluation

We evaluated REMI along two dimensions: output quality and runtime performance. The evaluation was conducted on DBpedia (Auer et al., 2007) and Wikidata (Erxleben et al., 2014). For DBpedia (v. 2016-10) we used the files instance types, mapping-based objects, and literals. The resulting dataset amounts to 42.07M facts and 1951 predicates. For Wikidata we used the dump provided in (Galárraga et al., 2017) that contains 15.9M facts and 752 predicates. For both KBs we materialized the inverse facts $p^{-1}(o, s)$ (such that $p(s, o) \in \mathcal{K}$) for all objects $o$ among the top 1% most frequent entities.

4.1. Qualitative Evaluation

We carried out four rounds of experiments in order to evaluate the intuitiveness of REMI’s descriptions. These experiments comprise three user studies, and an evaluation with a benchmark for entity summarization. The cohort for the user studies consisted mainly of computer science students, researchers, and university staff. It also included some of their friends and family members.

4.1.1. Evaluation of $\hat{K}$.

Recall that subgraph expressions are the building blocks of REs, thus intuitive REs should make use of simple (concise and informative) components. In order to measure whether the function $\hat{K}$ captures users’ notion of intuitiveness, our first study asked participants to rank by simplicity five subgraph expressions from 24 sets of DBpedia entities. The expressions were obtained from the common subgraph expressions ranked by Alg. 1 in line 2 using $\hat{K}$, and include (i) the top 3 as well as a baseline defined by (ii) the worst ranked, and (iii) a random subgraph expression. We manually translated the subgraph expressions to natural language statements in the shortest possible way by using the textual descriptions (predicate rdfs:label) of the concepts when available. The entity sets (of sizes 1 to 3) were randomly sampled from the 5% most frequent entities in four classes: Person, Settlement, Album Film, and Organization. Most of these entities are tail entities; however, looking at the top of the frequency ranking ensures that the entities have enough subgraph expressions to rank. We show the results in Table 2 for our two variants of $\hat{K}$ using fr and pr as prominence measure. We observe that precision@1 is low for both versions of $\hat{K}$. The reason behind this result is that people usually deem the predicate type the simplest whereas REMI often ranks it second or third (16 times for one of the two variants). This shows the need for special treatment of the type predicate as suggested by (Reiter and Dale, 1992). Nevertheless, the high values for the other metrics show a positive correlation between the preferences of the users and the function $\hat{K}$. In 88% of the cases, the three simplest subgraph expressions according to the best-performing variant of $\hat{K}$ are among the three simplest ones according to users.

# responses | p@1 | p@2 | p@3
44 | 0.38 ± 0.42 | 0.66 ± 0.18 | 0.88 ± 0.09
48 | 0.43 ± 0.42 | 0.53 ± 0.25 | 0.72 ± 0.16
Table 2. Average precision@k (p@k) and standard deviation for the ranking of subgraph expressions computed by two variants of $\hat{K}$ in DBpedia (one row per variant).

4.1.2. Evaluation of REMI’s output.

The second study requested users to rank by simplicity the answer of REMI and a baseline consisting of other REs encountered during search space traversal. We hand-picked 20 sets of prominent entities of the same DBpedia classes used to evaluate $\hat{K}$, and provided 3 to 5 solutions for each set including the most intuitive RE according to REMI. The entities were hand-picked to guarantee the existence of at least two solutions that are not too similar to each other, i.e., they are not proper subsets of other solutions. Based on our findings with the first study and the higher complexity of this task, we provided the type of the entities in the question statement and used fr as notion of prominence. We report an average MAP (mean average precision) of 0.64 ± 0.17 for this task on 51 answers when we assume REMI’s solution as the only relevant answer. To give the reader an idea of this result, we recall that a MAP of 1 denotes full agreement between REMI and the users, while a MAP of 0.5 means that REMI’s solution is always among the user’s top 2 answers. Furthermore, we remark that in 6 out of the 20 studied cases, REMI reported the same solution with $\hat{K}_{fr}$ and $\hat{K}_{pr}$ as intuitiveness metric. When we asked users to choose the simplest RE between the answers of the two variants of REMI, 59% of the users on average voted for the solution provided by one of the two variants.
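When a single answer is considered relevant, average precision reduces to the reciprocal of that answer's position in a user's ranking, and MAP is the mean over users. The sketch below illustrates this arithmetic on made-up rankings; the function names and the data are assumptions and do not reproduce the study's responses.

```python
def average_precision(ranking, relevant):
    """With a single relevant item, average precision is 1 / (its 1-based rank)."""
    return 1.0 / (ranking.index(relevant) + 1)


def mean_average_precision(rankings, relevant):
    """Mean of the per-user average precisions."""
    return sum(average_precision(r, relevant) for r in rankings) / len(rankings)


# Hypothetical user rankings of three candidate REs; "remi" marks REMI's answer.
always_first = [["remi", "a", "b"], ["remi", "b", "a"]]
always_second = [["a", "remi", "b"], ["b", "remi", "a"]]
map_first = mean_average_precision(always_first, "remi")
map_second = mean_average_precision(always_second, "remi")
```

In these toy rankings, a solution that every user places first yields a MAP of 1, and one that every user places second yields a MAP of 0.5.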

4.1.3. User’s perceived quality and lessons learned.

In order to measure the perceived quality of the reported REs, we requested users to grade the interestingness of 35 Wikidata REs in a scale from 1 to 5, where 5 means the user deems the description interesting based on her personal judgment. The entities were taken from the top 7 of the frequency ranking for the classes Company, City, Film, Human, and Movie. Our results on 86 answers exhibit an average score of 2.650.71, with 11 descriptions scoring at least 3. During the exchanges with the participants, some of them made explicit their preference for short but at the same time informative REs. The latter dimension is related to the notions of pertinence of concepts and narrative interest. For instance, users deemed uniformative and scored badly (1.45/5) REMI’s portrayal of Neil Amstrong as someone “whose place of burial is a part of the earth” (in allusion to the fact that he was buried in the Atlantic Ocean). When asked to choose between the REs and for two movies, 95% of the users preferred the first one999They correspond to the answers of REMI with the two variants of . Both REs had more or less the same length when translated to natural language, but the second one conveys less information and resorts to a domain-unrelated entity (i.e., Buddhism). These observations suggest that prominence captures the notion of simplicity, but it does not always accurately model the dimension of informativeness. While these examples might discourage the use of existential variables in descriptions, we remark that users also liked REs such as for Rennes and Nantes, or for the Italian movie “Altri templi”, as they deemed the first one quite pertinent, and the second one narratively interesting. 
Other well-ranked descriptions include “the CEO is Andrej Babiš, the Prime Minister of the Czech Republic” (scored 3.97/5) for Agrofert in Wikidata, “she died of aplastic anemia” for Marie Curie, and “they were both places of the Inca Civil War” for Ecuador and Peru (the latter two in DBpedia). Finally, we highlight the impact of noise and incompleteness on the accuracy and informativeness of the solutions. For instance, REMI cannot describe France as the country with capital Paris, because Paris is also the capital of the former Kingdom of France in DBpedia.

4.1.4. Evaluation on benchmark for entity summarization.

Despite being different tasks, RE mining and entity summarization in KBs are related problems (see Section 5 for a discussion). For this reason, we evaluated REMI on the gold standard used for the evaluation of FACES (Gunaratna et al., 2015) and LinkSUM (Thalhammer et al., 2016), two state-of-the-art approaches for entity summarization on RDF KBs. The gold standard consists of sets of reference summaries of 5 and 10 attributes for 80 prominent hand-picked entities from DBpedia. The entity summaries were manually constructed by 7 experts in the Semantic Web, and consist of predicate-object pairs chosen from DBpedia with diversity, prominence, and uniqueness as selection criteria. We ran REMI with the state-of-the-art language bias, and excluded the subgraph expressions with the predicate rdf:type and the inverse predicates to make our results compliant with the language of the summaries. We compare the reference summaries with the top 5 and top 10 most intuitive subgraph expressions (single atoms in this case) according to and in Table 3.

Method | top 5, quality PO | top 5, quality O | top 10, quality PO | top 10, quality O
FACES | 0.93 ± 0.54 | 1.66 ± 0.57 | 2.92 ± 0.94 | 4.33 ± 1.01
LinkSUM | 1.20 ± 0.60 | 1.89 ± 0.55 | 3.20 ± 0.87 | 4.82 ± 1.06
REMI | 0.68 ± 0.18 | 1.31 ± 0.27 | 2.26 ± 0.34 | 3.70 ± 0.46
REMI | 0.73 ± 0.13 | 1.21 ± 0.29 | 2.24 ± 0.46 | 3.75 ± 0.23
Table 3. REMI’s top 5 and top 10 subgraph expressions compared with solutions for entity summarization in RDF KBs.

Quality is defined in (Gunaratna et al., 2015) as the average overlap between the reported and the gold standard summaries. This overlap can be calculated at the level of the object entities (O) or the predicate-object pairs (PO). Even though the quality of REMI’s summaries exhibits a lower variability than other approaches, its average quality is generally lower. This happens because entity summarization approaches optimize for a different objective. Besides (non-strict) unambiguity and the use of popular concepts, these approaches optimize for diversity too. This implies that among multiple semantically close subgraph expressions, summaries must choose only one. We remark that such a constraint makes sense in a setting where a description may consist of 10 atoms. However, it may be too restrictive for settings such as NLG or query generation, where compactness matters. Finally, we highlight that if we merge the top-10 gold-standard summaries, the REs reported by REMI yield averages of 0.53, 0.62, and 0.31 for the P, O, and PO precisions when using as prominence metric, i.e., 62% of the REs used object entities that appear in the summaries. The values for are slightly worse, except for the PO precision (0.38). Most of the REs reported by REMI consisted of a single atom.
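The overlap-based quality measure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: summaries are modeled as sets of (predicate, object) pairs, the function name `quality` and its `level` parameter are hypothetical, and the toy summaries are invented for the example rather than taken from the gold standard.

```python
from statistics import mean

def quality(reported, gold_summaries, level="PO"):
    """Average overlap between a reported summary and each gold-standard summary.

    Summaries are sets of (predicate, object) pairs. With level="O" only the
    object entities are compared; with level="PO" the full pairs are compared.
    """
    def project(summary):
        return {o for _, o in summary} if level == "O" else set(summary)

    reported_items = project(reported)
    return mean(len(reported_items & project(g)) for g in gold_summaries)

# Hypothetical summaries for a single entity.
reported = {("birthPlace", "Ulm"), ("field", "Physics"), ("award", "Nobel_Prize")}
gold = [
    {("birthPlace", "Ulm"), ("knownFor", "Physics"), ("almaMater", "ETH_Zurich")},
    {("birthPlace", "Ulm"), ("field", "Physics"), ("award", "Nobel_Prize")},
]
print(quality(reported, gold, level="PO"))  # overlaps 1 and 3
print(quality(reported, gold, level="O"))   # overlaps 2 and 3
```

The O-level score is never lower than the PO-level score, since two pairs that share an object but differ in the predicate only count at the O level.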

4.2. Runtime Evaluation

4.2.1. Opponent.

RE mining can be conceptually formulated as an ILP task. Hence, we compare the runtime of REMI and a state-of-the-art parallel ILP system designed for large KBs, namely AMIE+ (Galárraga et al., 2015). We chose AMIE+ over other solutions, because it allows us to mine rules of arbitrary length out of the box. AMIE+ mines Horn rules of the form , such as , on RDF KBs. We call the left-hand side the head and the right-hand side the body of the rule. AMIE+ focuses on closed rules, i.e., rules where all variables appear in at least two atoms. The system explores the space of closed rules in a breadth-first search manner and reports those above given thresholds on support and confidence. The support of a rule is the number of cases where the rule predicts a fact . If we normalize this measure by the total number of predictions made by the rule, we obtain its confidence. RE mining for a target entity set is equivalent to rule mining with AMIE+, if we instruct the system to find rules of the form , where is a surrogate predicate with facts for all . In this case, the body of the rule becomes our RE. We observe, however, that AMIE+ cannot capture REMI’s language bias exactly. This happens because AMIE’s language is defined in terms of a maximum number of atoms , whereas REMI allows an arbitrary number of multi-atom subgraph expressions. Since most of REMI’s descriptions are not longer than 3 atoms, we set for AMIE+. We also set thresholds of and 1.0 for support and confidence respectively. This is because an RE should predict the exact set of target entities, neither subsets nor supersets. AMIE+ does not define a complexity score for rules and outputs all REs for the target entities, thus we use to rank AMIE’s output and return the least complex RE.
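As an illustration of the formulation above, the following Python sketch computes support and confidence for a rule whose body is a conjunction of bound atoms on a single variable and whose head is a surrogate predicate holding exactly for the target entities. It is a toy model and not AMIE+'s implementation: the helper names, the tiny KB, and the choice of requiring a support equal to the number of targets are assumptions made for the example, in line with the requirement that an RE predict the exact set of target entities.

```python
def matches(kb, body):
    """Entities x such that (x, p, o) is in the KB for every atom (p, o) of the body."""
    subjects = {s for s, _, _ in kb}
    return {x for x in subjects if all((x, p, o) in kb for p, o in body)}

def support_and_confidence(kb, body, targets):
    """Support: predictions that are targets. Confidence: support / all predictions."""
    predicted = matches(kb, body)
    support = len(predicted & targets)
    confidence = support / len(predicted) if predicted else 0.0
    return support, confidence

def is_referring_expression(kb, body, targets):
    """The body is an RE iff it predicts all targets (no subset) and nothing else (no superset)."""
    support, confidence = support_and_confidence(kb, body, targets)
    return support == len(targets) and confidence == 1.0

# Hypothetical toy KB of (subject, predicate, object) triples.
kb = {
    ("Paris", "type", "City"),
    ("Lyon", "type", "City"),
    ("Paris", "locatedIn", "France"),
    ("Lyon", "locatedIn", "France"),
    ("Paris", "capitalOf", "France"),
}
targets = {"Paris"}
print(support_and_confidence(kb, [("type", "City")], targets))
print(is_referring_expression(kb, [("capitalOf", "France")], targets))
```

In this toy KB, the body with the single atom on `type` also matches Lyon, so its confidence drops below 1 and it is rejected, whereas the atom on `capitalOf` matches Paris only.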

KB | Language | #solutions | AMIE+ | REMI | P-REMI | speed-up (vs. AMIE+, vs. REMI)
DBpedia | Standard | 63 | 97.4k | 10.3k | 576 | 13.5kx, 2.44x
DBpedia | REMI’s | 65 | 508.2k | 66.5k | 28.9k | 5218x, 21.4x
Wikidata | Standard | 44 | 115.5k | 1.06k | 76.2 | 142kx, 4.7x
Wikidata | REMI’s | 44 | 608.3k | 21.7k | 33.8k | 6476x, 7.1x
Table 4. REMI’s runtime performance (in seconds) on DBpedia and Wikidata. The column speed-up denotes the average speed-up of P-REMI w.r.t. AMIE+ and REMI.

4.2.2. Results.

We compared the runtimes of REMI and AMIE+ on a server with 48 cores (Intel Xeon E2650 v4), 192GB of RAM (AMIE assumes the entire KB fits in main memory), and 1.2T of disk space (10K SAS). We tested the systems on 100 sets of DBpedia and Wikidata entities taken from the same classes used in the qualitative evaluation. The sets were randomly chosen so that they consist of 1, 2, and 3 entities of the same class in proportions of 50%, 30%, and 20%. We chose sets of at most 3 entities because small sets translate into more subgraph expressions, leading to more challenging settings. We mined REs for those sets of entities according to (i) the standard language of conjunctive bound atoms, and (ii) REMI’s language of conjunctions of subgraph expressions. We show the total runtime among all sets for AMIE+ and REMI in Table 4. For each group of entities, we set a timeout of 2 hours. The values in red account for the number of timeouts, thus cells with red superscripts define runtime lower bounds. We observe that AMIE+ already timed out 23 times with the state-of-the-art language. In particular, AMIE+ is optimized for rules without constant arguments in atoms, such as , thus its performance is heavily affected when bound variables are allowed in atoms (Galárraga et al., 2015). In contrast, REMI and P-REMI are on average 3 and 4 orders of magnitude (up to 142k times) faster than AMIE+ in this language thanks to the tailored pruning heuristics detailed in Section 3.5. In the worst case REMI was confronted with a space of 62 subgraph expressions for the state-of-the-art language bias. For REMI’s language bias, however, this number increased to 25.2k, which is challenging even for REMI (8 timeouts in total). Despite this boost in complexity, multithreading makes it manageable: P-REMI can be at least 4.7x on average faster than REMI for the extended language bias and at least 21x faster for the state-of-the-art language, even though P-REMI’s total time on Wikidata is higher than REMI’s for the extended language bias. The latter phenomenon is caused by the high variance of the speed-up, which ranges from 0.003x (for small search spaces where the overhead of multithreading is overwhelming) to 197x in Wikidata. Extending the language bias also increases the time to sort the subgraph expressions (line 2 in Alg. 1), which jumps from 0.39% to 9.1% for P-REMI in DBpedia. Finally, we observe that the extended language bias slightly increases the chances of finding a solution (column #solutions in Table 4) in DBpedia. We observe this phenomenon among the sets with more than one entity.
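The apparent paradox mentioned above, an average speed-up greater than 1 together with a higher total runtime, follows from averaging per-set ratios instead of comparing totals. The short Python sketch below illustrates the arithmetic with invented runtimes; the numbers are hypothetical and only loosely mimic the reported range of speed-ups.

```python
from statistics import mean

# Hypothetical per-set runtimes in seconds: (sequential, parallel).
runtimes = [
    (1.0, 300.0),   # small search space: multithreading overhead dominates (~0.003x)
    (197.0, 1.0),   # large search space: parallelism pays off (197x)
    (10.0, 5.0),    # moderate gain (2x)
]

speedups = [seq / par for seq, par in runtimes]
average_speedup = mean(speedups)              # mean of the per-set ratios
total_seq = sum(seq for seq, _ in runtimes)   # total sequential runtime
total_par = sum(par for _, par in runtimes)   # total parallel runtime

print(average_speedup > 1, total_par > total_seq)
```

A single set with a very poor speed-up contributes almost nothing to the mean of the ratios, yet it can dominate the total parallel runtime.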

5. Related Work

Mining REs on structured data is a central task in natural language generation (NLG). Systems such as Epicure (Dale, 1989) and IDAS (Reiter et al., 1992) provided descriptions in natural language based on REs mined on domain-specific KBs (recipes and equipment parts respectively). NLG methods consider criteria such as brevity, context, user’s prior knowledge, lexical preference, and psychological factors when producing simple and informative REs (Pechmann, 1989). The full brevity algorithm (Dale, 1992), based on breadth-first search, is among the first approaches to mine REs on semantic data. This method mines short REs consisting of conjunctions of bound atoms, which we call the standard language bias. Nonetheless, the results of (Dale, 1992) are not always intuitive, because the method disregards factors such as user’s prior knowledge and lexical preference. The incremental approach proposed in (Reiter and Dale, 1992) took these criteria into account by modeling user’s knowledge as Boolean metatags on facts, and lexical preference as a manually-constructed ranking of predicates. While this solution produces more intuitive REs, providing these metadata can be tedious for large KBs with thousands of classes and predicates. In REMI, the complexity ranking for subgraph expressions captures both lexical preference and user’s knowledge automatically to a certain degree, as subgraph expressions with familiar concepts are ranked higher. Other approaches (Horacek, 2003) allow for REs with disjunctions, e.g., . Albeit more expressive, the language of REs with disjunctions is in general more difficult to interpret. REMI proposes conjunctive expressions with existentially quantified variables instead. These are more expressive than standard REs, and can be intuitive as shown in Section 4.1. The work in (Krahmer et al., 2003) finds standard conjunctive REs in knowledge graphs by means of a branch and bound algorithm in combination with several cost functions. Such functions can optimize for different aspects such as compactness. Unlike REMI, this approach was conceived to mine REs on scenes, hence it does not scale to large KBs. The largest graph tested had 256 vertices and 1.7K edges.
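To illustrate the standard language bias, the sketch below searches for a shortest conjunction of bound atoms that identifies a target entity by enumerating candidate conjunctions level by level, i.e., by increasing length. It is a simplified illustration in the spirit of the full brevity idea, not a reproduction of the algorithm in (Dale, 1992); the function name, the data layout, and the toy scene are assumptions made for the example.

```python
from itertools import combinations

def full_brevity(entities, target):
    """Shortest conjunction of (attribute, value) pairs that holds only for the target.

    `entities` maps an entity name to a dict of attribute -> value.
    Conjunctions are enumerated by increasing length, so the first hit is a
    shortest one. Returns None if no conjunction distinguishes the target.
    """
    atoms = sorted(entities[target].items())
    for size in range(1, len(atoms) + 1):
        for conj in combinations(atoms, size):
            matched = {
                name for name, attrs in entities.items()
                if all(attrs.get(a) == v for a, v in conj)
            }
            if matched == {target}:
                return list(conj)
    return None

# Hypothetical scene with three objects.
scene = {
    "d1": {"type": "dog", "color": "black", "size": "small"},
    "d2": {"type": "dog", "color": "white", "size": "small"},
    "c1": {"type": "cat", "color": "black", "size": "small"},
}
print(full_brevity(scene, "d1"))  # needs two atoms: color and type
print(full_brevity(scene, "d2"))  # one atom suffices: color
```

The exhaustive enumeration is exponential in the number of attributes of the target, which hints at why approaches of this kind were designed for small scenes.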

Other works focus on expressions that are similar to REs. Maverick (Zhang et al., 2018), e.g., mines exceptional facts on KBs. Given an entity such as Hillary Clinton and a context, e.g., “candidates to the US presidential election”, Maverick will report the fact that she is female, as that makes her exceptional in the context. Unlike REMI, Maverick does not find REs in a strict sense; instead it reports descriptions that are rare among the entities in the context. Approaches for entity summarization (ES), such as (Thalhammer et al., 2016; Gunaratna et al., 2015), construct informative summaries of entities from a KB by considering groups of attributes that optimize for uniqueness, diversity, and prominence. Albeit related to RE mining, ES is concerned neither with compactness nor with strict unambiguity: ES approaches usually take the size of the summary as input, and do not guarantee that the resulting summary is a strict RE. Besides, to the best of our knowledge, ES approaches have not been defined for sets of entities. All these approaches mine expressions in the standard language bias.

6. Conclusion and Future Work

In this work we have presented REMI, a method to mine intuitive referring expressions on large RDF KBs. REMI relies on the observation that frequent concepts are more intuitive to users, and leverages this fact to quantify the simplicity (intuitiveness) of descriptions in bits. We have not targeted a particular application in this work; instead we aim at paving the way towards the automatic generation of descriptions in large KBs. Our results show that (1) real-time RE generation is often possible in large KBs and (2) a KB-based frequency ranking can provide intuitive descriptions despite the noise in KBs. While this latter factor impedes the fully automatic generation of intuitive REs for NLG purposes, our descriptions are applicable to scenarios such as computer-aided journalism and query generation in KBs. As future work we aim to investigate whether external sources (such as the ranking provided by a search engine or external localized corpora) can yield even more intuitive REs that model users’ background more accurately. We also envision relaxing the unambiguity constraint to mine REs with exceptions, and studying more accurate models for the dimensions of informativeness and semantic relatedness for the concepts used in REs. We provide the source code of REMI as well as the experimental data at


  • Auer et al. (2007) Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A Nucleus for a Web of Open Data. In The Semantic Web. Springer, 722–735.
  • Dale (1989) Robert Dale. 1989. Cooking Up Referring Expressions. In Annual Meeting on Association for Computational Linguistics (ACL).
  • Dale (1992) Robert Dale. 1992. Generating Referring Expressions: Constructing Descriptions in a Domain of Objects and Processes. A Bradford Book, MIT.
  • Erxleben et al. (2014) Fredo Erxleben, Michael Günther, Markus Krötzsch, Julian Mendez, and Denny Vrandečić. 2014. Introducing Wikidata to the linked data web. In International Semantic Web Conference. Springer, 50–65.
  • Fernández et al. (2013) Javier D. Fernández, Miguel A. Martínez-Prieto, Claudio Gutiérrez, Axel Polleres, and Mario Arias. 2013. Binary RDF Representation for Publication and Exchange (HDT). Web Semantics 19 (2013), 22–41.
  • Galárraga et al. (2017) Luis Galárraga, Simon Razniewski, Antoine Amarilli, and Fabian M. Suchanek. 2017. Predicting Completeness in Knowledge Bases. In ACM International Conference on Web Search and Data Mining (WSDM).
  • Galárraga et al. (2015) Luis Galárraga, Christina Teflioudi, Katja Hose, and Fabian M. Suchanek. 2015. Fast Rule Mining in Ontological Knowledge Bases with AMIE+. The VLDB Journal 24, 6 (2015), 707–730.
  • Gunaratna et al. (2015) Kalpa Gunaratna, Krishnaprasad Thirunarayan, and Amit Sheth. 2015. FACES: Diversity-aware Entity Summarization Using Incremental Hierarchical Conceptual Clustering. In AAAI Conference on Artificial Intelligence.
  • Horacek (2003) Helmut Horacek. 2003. A Best-first Search Algorithm for Generating Referring Expressions. In Conference on European Chapter of the Association for Computational Linguistics (EACL).
  • Krahmer et al. (2003) Emiel Krahmer, Sebastiaan van Erk, and André Verleg. 2003. Graph-based Generation of Referring Expressions. Computational Linguistics 29, 1 (2003), 53–72.
  • Manning et al. (2008) Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008. Introduction to Information Retrieval. Cambridge University Press.
  • Pechmann (1989) Thomas Pechmann. 1989. Incremental Speech Production and Referential Overspecification. Linguistics 27, 1 (1989), 89–110.
  • Reiter and Dale (1992) Ehud Reiter and Robert Dale. 1992. A Fast Algorithm for the Generation of Referring Expressions. In Conference on Computational Linguistics (COLING).
  • Reiter et al. (1992) Ehud Reiter, Chris Mellish, and John Levine. 1992. Automatic Generation of On-line Documentation in the IDAS Project. In Conference on Applied Natural Language Processing.
  • Thalhammer et al. (2016) Andreas Thalhammer, Nelia Lasierra, and Achim Rettinger. 2016. LinkSUM: Using Link Analysis to Summarize Entity Data. In International Conference on Web Engineering (ICWE).
  • Zellner et al. (2001) Arnold Zellner, Hugo A. Keuzenkamp, Michael McAleer, et al. 2001. Simplicity, Inference and Modelling: Keeping it Sophisticatedly Simple. Cambridge Univ. Press.
  • Zhang et al. (2018) Gensheng Zhang, Damian Jimenez, and Chengkai Li. 2018. Maverick: Discovering Exceptional Facts from Knowledge Graphs. In International Conference on Management of Data (SIGMOD).