The Named Entity Linking (NEL) task seeks to link the entity mentions in a given text to their corresponding entities in a knowledge base (KB). NEL is a crucial process for advancing the Web of Linked Data. It is also useful for many applications such as Semantic Search, Text Classification, Reasoning, and Question Answering, because it enriches the document representation, which may improve their performance.
NEL is composed of two phases: candidate selection and candidate ranking (disambiguation). A NEL system first selects the potential candidates for each named entity mention in a document, then ranks them according to their relevance in order to link the mention to the most relevant entity.
The relation between mentions and entities is many-to-many: each mention may denote many entities, and each entity can be denoted by different mentions. For example, the word Paris may refer to Paris (the capital of France), Paris (a city in the USA), Paris (a prince of Troy in Greek mythology), etc., and the capital of France can be denoted by "Paris", "City of Light", "the French Capital", etc. The NEL task seeks to resolve this ambiguity and assign one entity to each mention according to the context.
Many studies have parametrised their methods, either with a supervised approach [Hoffart et al., 2011, Durrett and Klein, 2014, Ganea et al., 2016, Francis-Landau et al., 2016] or with an unsupervised approach tuned to work on only one dataset [Pershina et al., 2015, Alhelbawy and Gaizauskas, 2014]; therefore, these methods cannot be considered robust solutions across different datasets. Other studies have shown the effectiveness of their methods on different datasets using a small number of parameters, such as DOSER [Zwicklbauer et al., 2016] and AGDISTIS [Usbeck et al., 2014]. In this work, we follow the last two systems and use a collective graph-based approach, but with almost no parameters.
The graph-based methods construct a graph from the candidate entities for each mention in a document and apply a centrality method in order to get the best entity for each mention.
Some studies have shown that entity popularity, which takes the entity most frequently associated with a given mention as the best entity, is a strong baseline for entity linking [Hachey et al., 2013, Cheng and Roth, 2013]. It is also a very fast method. Therefore, in contrast to other similar works, instead of replacing the popularity method with a collective graph-based approach, we propose to improve it by using a collective graph-based method.
Hence, we assume that the best entity is among the top x results given by the popularity method, and we use only these x results to construct the disambiguation graph. This minimises the noise in the graph, since many unsuitable nodes and edges are avoided, and it reduces the size of the graph. The result is a more accurate and faster system that can be applied efficiently to Big Data.
In this work, we exploit the anchor texts in Wikipedia to construct a mention-entity dictionary for candidate selection. We also extract the outgoing links from each Wikipedia page to other Wikipedia pages to construct an entity-entity dictionary, which is used to build the disambiguation graph for each document. The mention-entity dictionary is used to select the top candidates for each mention; we then construct a graph between these candidates using the links from the entity-entity dictionary. Afterwards, five centrality measures are applied to the graph, each of which gives a score to each entity, and a mention is linked to the candidate entity with the best score. We also compare the performance of these five measures on four different datasets in order to choose the best centrality measure over all datasets.
The rest of this paper is organised as follows: Section 2 is devoted to the problem definition and formulation, Section 3 reviews the related work, Section 4 describes the proposed system, Section 5 presents the centrality measures which we have applied, Section 6 describes the datasets, experiments and results, and Section 7 concludes the paper with some future perspectives.
2 Problem Definition and Formulation
Let $M = \{m_1, m_2, \dots, m_n\}$ be a set of mentions in a document $d$. Let $K$ be a knowledge base (KB), and let $E = \{e_1, e_2, \dots, e_n\}$ be a set of entities from $K$ which represent the mentions: $m_1$ refers to $e_1$, $m_2$ to $e_2$, and so forth. Our problem is to find the best $E$ for a given $M$, specifically the best entity $e_i$ for each mention $m_i$; we therefore aim to find a mapping between each mention and an entity given the other mentions and $K$.
We approximate the solution by decomposing the problem. The candidate selection phase first limits the number of entities to a small subset of $K$. We then compute the popularity $P(e \mid m)$ of each candidate to select the x most relevant candidate entities, which are used to construct a graph from which the most relevant entity is picked. Thus, the search for the best $E$ is approximated by selecting the top x candidates for each mention and then reranking them after constructing a disambiguation graph over the candidates of all mentions in $M$.
3 Related Work
NEL consists of two phases. The first one is candidate selection, which aims to select the most relevant entities for a mention from a knowledge base. Most research has exploited the link structure of Wikipedia to construct a mention-entity dictionary in which each mention may refer to different entities. For example, in AIDA [Hoffart et al., 2011], the authors proposed "Yago Means", which is derived from Wikipedia, while DOSER [Zwicklbauer et al., 2016] and AGDISTIS [Usbeck et al., 2014] extracted the rdfs:label attribute of DBpedia, which is also derived from Wikipedia. [Chang et al., 2016] constructed four dictionaries from Wikipedia and proposed a strategy that uses these dictionaries to disambiguate the mentions. Others have used a corpus of web links: [Chisholm and Hachey, 2015] showed that using 34 million web links instead of Wikipedia gives a similar performance, but that combining Wikipedia with web links outperforms both of them.
In this work, we exploit the link structure of Wikipedia to build a mention-entity dictionary. We do not go further in this phase because we are more interested in exploring the second phase.
The second phase is the candidate disambiguation in which one entity will be chosen from the candidate set for each mention. There are two main approaches for this phase: single mention disambiguation and collective disambiguation.
The first approach disambiguates one entity mention at a time without considering the effect of the other entity mentions. Support Vector Machines were used first [Bunescu and Pasca, 2006], then a large-scale system for entity disambiguation incorporating different features was presented [Cucerzan, 2007], and statistical methods with rich relational analysis were incorporated [Cheng and Roth, 2013]. More recently, convolutional neural networks that compute the similarity between a mention and an entity have been proposed [Francis-Landau et al., 2016].
The collective approach disambiguates all entity mentions jointly and tries to model the interdependencies between the candidate entities of all mentions. This approach reformulates the problem as a global optimisation problem, which is NP-hard and for which many approximations have been proposed. [Ratinov et al., 2011] proposed local and global methods to address the entity linking problem: the local methods take into account the similarity between the mention and the candidate entity, whereas the global one is based on the entire document, and all mentions are disambiguated concurrently in order to produce coherent results. [Hoffart et al., 2011] built a weighted graph of mentions and candidate entities and computed a dense subgraph that approximates the best joint mention-entity mapping. [Ganea et al., 2016] proposed a probabilistic method that makes use of graphical models to perform collective disambiguation. [Alhelbawy and Gaizauskas, 2014] used PageRank to rank the candidate entities, and [Pershina et al., 2015] proposed Personalized PageRank to filter out the noise introduced by incorrect candidate entities. AGDISTIS [Usbeck et al., 2014] used the HITS algorithm to rank the graph nodes, while DOSER [Zwicklbauer et al., 2016] found that PageRank performs better on their disambiguation graph. [Brando et al., 2016] proposed different centrality measures to collectively disambiguate the authors' names in a French corpus.
Different works have concluded that choosing the most popular candidate entity is a strong baseline that is difficult to beat [Hachey et al., 2013, Cheng and Roth, 2013]. Building on this observation, we propose to improve the popularity method by constructing a graph between the top candidate entities only, which is expected to reduce the noise in the graph and is less time-consuming than using all candidate entities. HITS, PageRank, Degree, Betweenness, and Closeness are compared to find the most robust measure, one that works well over different datasets.
4 Proposed System
In this section, we present how we use Wikipedia for candidate selection and Disambiguation graph construction.
4.1 Candidate Selection
To select the candidate entities for each mention, we construct a dictionary of mention-entity pairs from the English Wikipedia: we extract the title of each page and parse the page content to extract its outgoing links (anchor texts). We consider the Wikipedia page titles to be the entities. Each title is a mention-entity pair in which both the mention and the entity are the title itself (a title is an entity, but it can also be used to refer to itself, so it is also a possible mention). Each outgoing link is composed of an anchor text and a link to another Wikipedia page; the anchor text is the mention and the link is the entity. Table 1 shows an example of the top 3 candidate entities for the mention "sun", ordered by their occurrences in Wikipedia. Table 2 shows some statistics of the Wikipedia version which is used to construct the mention-entity dictionary.
|Number of pages||5,206,974|
|Number of pages having links||5,191,234|
|Number of mentions||74,753,045|
|Number of distinct mentions||10,584,594|
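The dictionary construction described in this subsection can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation used in our experiments: the input format (a list of page titles with their anchor/target pairs), the function names, the lowercasing of mentions, and the toy pages are all hypothetical.

```python
from collections import Counter, defaultdict

def build_mention_entity_dict(pages):
    """pages: iterable of (title, [(anchor_text, target_title), ...])."""
    counts = defaultdict(Counter)
    for title, links in pages:
        # A page title is an entity and also a possible mention of itself.
        counts[title.lower()][title] += 1
        for anchor, target in links:
            # The anchor text is the mention, the link target is the entity.
            counts[anchor.lower()][target] += 1
    return counts

def top_candidates(counts, mention, x=3):
    """Return up to x entities for a mention, most frequent first."""
    return [e for e, _ in counts.get(mention.lower(), Counter()).most_common(x)]

# Toy input (hypothetical pages and links, for illustration only).
pages = [
    ("Sun", [("star", "Star")]),
    ("Sun_Microsystems", [("Java", "Java_(programming_language)")]),
    ("Java_(programming_language)", [("Sun", "Sun_Microsystems"),
                                     ("Sun", "Sun_Microsystems")]),
    ("Solar_System", [("Sun", "Sun"), ("Sun", "Sun")]),
]
counts = build_mention_entity_dict(pages)
candidates_for_sun = top_candidates(counts, "sun")
```

Lowercasing the mentions is an assumption made here so that "Sun" and "sun" share one dictionary entry; a real system may prefer a more careful normalisation.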
4.2 Disambiguation Graph Construction
The graph-based method constructs a disambiguation graph in which each node represents a candidate entity and each edge represents a link between two entities. Therefore, we need to find the links between the candidate entities. For this purpose, we also leverage Wikipedia to extract a dictionary of entity-entity relations. We parse each Wikipedia page to extract all outgoing links, and for each link we add an entry to the dictionary in which the first entity is the current Wikipedia page and the second one is the target page. Table 3 shows the most frequent outgoing links from the Sun_Microsystems page.
|Actual Page||Target Page|
Figure 1 shows the disambiguation graph for the sentence "Java was created by Sun.", in which there are two mentions: Java and Sun. For each mention we take the most popular candidates using the mention-entity dictionary; then, for each candidate, we look it up in the entity-entity dictionary to find whether there is a link to another entity in the graph. In this example, a bidirectional link is created between Sun_Microsystems and Java_(programming_language).
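The graph construction for this example can be sketched in a few lines. The candidate lists and the entity-entity entries below are hypothetical stand-ins for the real dictionaries, and the adjacency-set representation is an assumption chosen to keep the sketch free of external libraries.

```python
from itertools import combinations

def build_disambiguation_graph(candidates, links):
    """candidates: {mention: [entity, ...]}; links: {page: set of target pages}.

    Returns an adjacency dict {entity: set of neighbours}. An edge is added
    in both directions whenever one of the two pages links to the other.
    """
    nodes = {e for ents in candidates.values() for e in ents}
    graph = {e: set() for e in nodes}
    for a, b in combinations(sorted(nodes), 2):
        if b in links.get(a, ()) or a in links.get(b, ()):
            graph[a].add(b)
            graph[b].add(a)
    return graph

# Hypothetical top candidates for the sentence "Java was created by Sun."
candidates = {
    "Java": ["Java", "Java_(programming_language)", "Java_(software_platform)"],
    "Sun": ["Sun", "Sun_Microsystems", "The_Sun_(United_Kingdom)"],
}
# Hypothetical excerpt of the entity-entity dictionary.
links = {
    "Sun_Microsystems": {"Java_(programming_language)",
                         "Solaris_(operating_system)"},
}
graph = build_disambiguation_graph(candidates, links)
```

Targets that are not candidates of any mention (here Solaris_(operating_system)) never enter the graph, which is what keeps it small.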
4.3 Proposed Methods
The popularity-based approach treats each mention independently: we compute the popularity of an entity given a mention. Popularity here is the conditional probability of the entity given the mention, based on the number of occurrences of the entity-mention pair in Wikipedia normalized by the occurrences of all entities. We then select the most popular entity for each mention. This approach does not take the context into account. To overcome this limitation, we construct a disambiguation graph to improve the accuracy of this method.
Constructing a disambiguation graph for each document is very costly, so we propose to use the popularity to limit the candidate entity set of a mention to only 3 entities. The reason is that in most cases the correct entity for a mention appears among the first three candidates, and using more candidates is not useful because the added entities and edges increase the ambiguity in the graph and negatively affect the results (it should be noted that some datasets, such as AIDA-CONLL, are not affected by increasing the number of candidates). After constructing the disambiguation graph, we apply five centrality measures to compute a score for each candidate entity. Later on, we compute the average accuracy of each measure over 4 datasets to find the most reliable measure.
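As an illustration, the popularity score can be written as $P(e \mid m) = \mathrm{count}(m, e) / \sum_{e'} \mathrm{count}(m, e')$, where the sum runs over the entities observed with the mention $m$. The sketch below assumes this reading of the normalisation; the function names and the counts are hypothetical.

```python
def popularity(counts, mention):
    """Return {entity: P(entity | mention)} from raw co-occurrence counts."""
    entity_counts = counts.get(mention, {})
    total = sum(entity_counts.values())
    if total == 0:
        return {}
    return {e: c / total for e, c in entity_counts.items()}

def most_popular(counts, mention):
    """Popularity baseline: pick the entity with the highest P(e | m)."""
    scores = popularity(counts, mention)
    return max(scores, key=scores.get) if scores else None

# Hypothetical counts of (mention, entity) pairs.
counts = {"sun": {"Sun": 6, "Sun_Microsystems": 3, "The_Sun_(United_Kingdom)": 1}}
probs = popularity(counts, "sun")
best = most_popular(counts, "sun")
```

Because each mention is scored in isolation, the baseline always returns the same entity for "sun", whatever the surrounding text says.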
5 Centrality Measures
Once the directed graph is constructed, centrality measures are computed to assign a score to each node. We use five centrality measures:
The degree centrality of a node e is the fraction of nodes it is connected to:
$$C_D(e) = \frac{\deg(e)}{n - 1}$$
The degree centrality values are normalized by dividing by the maximum possible degree in a graph, $n - 1$, where $n$ is the total number of nodes. Degree is the first and the simplest centrality measure.
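A minimal sketch of degree centrality and of the final linking step is given below. It assumes the graph is stored as a dictionary of neighbour sets and that the candidates of each mention are listed in popularity order; the helper names and the toy graph are hypothetical.

```python
def degree_centrality(graph):
    """graph: {node: set of neighbours}. Degree divided by (n - 1)."""
    n = len(graph)
    if n <= 1:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}

def link_mentions(candidates, scores):
    """Link each mention to its best-scoring candidate.

    max() returns the first maximal element, so ties are resolved in favour
    of the earlier (more popular) candidate.
    """
    return {m: max(ents, key=lambda e: scores.get(e, 0.0))
            for m, ents in candidates.items() if ents}

# Toy disambiguation graph with one bidirectional link.
graph = {
    "Java": set(),
    "Java_(programming_language)": {"Sun_Microsystems"},
    "Sun": set(),
    "Sun_Microsystems": {"Java_(programming_language)"},
}
candidates = {
    "Java": ["Java", "Java_(programming_language)"],
    "Sun": ["Sun", "Sun_Microsystems"],
}
scores = degree_centrality(graph)
linked = link_mentions(candidates, scores)
```

In this toy case the single edge is enough to move both mentions away from their most popular candidate.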
Hyperlink-Induced Topic Search (HITS) [Kleinberg, 1999] computes two scores for a node: the authority score estimates the value of the node based on its incoming links, and the hub score estimates its value based on its outgoing links. It supposes that good authorities are pointed to by good hubs and that good hubs point to good authorities. HITS starts with two initial values for each node and then successively refines them. Let $a_i$ and $h_i$ be the authority and hub scores of a node $i$, let $E$ be the set of directed edges in the graph, and let $(i, j) \in E$ represent the directed edge from node $i$ to node $j$:
$$a_i = \sum_{(j,i) \in E} h_j, \qquad h_i = \sum_{(i,j) \in E} a_j$$
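The iterative refinement can be sketched as a simple power iteration. The version below is an illustrative sketch, not the routine used in our experiments: the fixed number of iterations, the normalisation by the sum of scores, and the successor-set representation are assumptions.

```python
def hits(succ, iterations=50):
    """succ: {node: set of successors}; every successor must also be a key.

    Returns (authority, hub) dictionaries, each normalised to sum to 1.
    """
    nodes = list(succ)
    auth = {v: 1.0 for v in nodes}
    hub = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority of v: sum of the hub scores of the nodes pointing to v.
        new_auth = {v: 0.0 for v in nodes}
        for u in nodes:
            for v in succ[u]:
                new_auth[v] += hub[u]
        # Hub of u: sum of the authority scores of the nodes u points to.
        new_hub = {u: sum(new_auth[v] for v in succ[u]) for u in nodes}
        a_norm = sum(new_auth.values()) or 1.0
        h_norm = sum(new_hub.values()) or 1.0
        auth = {v: s / a_norm for v, s in new_auth.items()}
        hub = {v: s / h_norm for v, s in new_hub.items()}
    return auth, hub

# Two pages pointing to a third one.
succ = {"A": {"C"}, "B": {"C"}, "C": set()}
authority, hubs = hits(succ)
```

When every link of the disambiguation graph is bidirectional, the incoming and outgoing neighbourhoods coincide, so the two scores are driven by the same structure.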
PageRank [Page et al., 1998] computes a ranking of the nodes in the graph based on the structure of the incoming links. The underlying assumption is that more important nodes are likely to receive more links from other nodes. It starts by assigning an initial score to each node and then successively refines these scores:
$$PR(i) = \sum_{(j,i) \in E} \frac{PR(j)}{L(j)}$$
where $L(j)$ represents the number of outgoing links from $j$.
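A small sketch of the PageRank iteration follows. Note the assumptions: it adds a damping factor of 0.85 and spreads the score of nodes without outgoing links uniformly, which is common practice but is not stated in the description above; the function name and the example graph are hypothetical.

```python
def pagerank(succ, damping=0.85, iterations=100):
    """succ: {node: set of successors}; every successor must also be a key."""
    nodes = list(succ)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Score held by nodes without outgoing links is shared by all nodes.
        dangling = sum(rank[v] for v in nodes if not succ[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in nodes}
        for u in nodes:
            if succ[u]:
                # Each node passes an equal share of its score to its targets.
                share = damping * rank[u] / len(succ[u])
                for v in succ[u]:
                    new_rank[v] += share
        rank = new_rank
    return rank

# Two pages pointing to a third one.
succ = {"A": {"C"}, "B": {"C"}, "C": set()}
ranks = pagerank(succ)
```

The scores always sum to 1, so they can be compared across the candidates of one document but not directly across documents of different sizes.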
Betweenness centrality [Freeman, 1977, Brandes, 2001] is based on shortest paths: for a node, it sums, over all pairs of other nodes, the fraction of the shortest paths between them that pass through the node. It is given for a node $i$ by the expression:
$$C_B(i) = \sum_{s \neq i \neq t} \frac{\sigma_{st}(i)}{\sigma_{st}}$$
where $\sigma_{st}$ is the total number of shortest paths from node $s$ to node $t$ and $\sigma_{st}(i)$ is the number of those paths that pass through $i$.
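For illustration, the accumulation scheme of [Brandes, 2001] for unweighted graphs can be sketched as below. The scores are left unnormalised and are counted over ordered pairs (s, t), so a bidirectional graph counts each unordered pair twice; these choices, the function name, and the example are assumptions of the sketch.

```python
from collections import deque

def betweenness(succ):
    """succ: {node: set of successors}. Unnormalised betweenness scores."""
    cb = {v: 0.0 for v in succ}
    for s in succ:
        # Breadth-first search from s, counting shortest paths (sigma)
        # and remembering the predecessors on those paths.
        order = []
        preds = {v: [] for v in succ}
        sigma = {v: 0 for v in succ}
        dist = {v: -1 for v in succ}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in succ[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, starting from the farthest nodes.
        delta = {v: 0.0 for v in succ}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return cb

# A path A - B - C with bidirectional links.
succ = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
scores = betweenness(succ)
```

Only B lies between two other nodes here, once for the pair (A, C) and once for (C, A).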
The more central a node is, the closer it is to all other nodes. Thus, Closeness was defined as the reciprocal of the farness [Freeman, 1978]:
$$C_C(i) = \frac{1}{\sum_{j} d(j, i)}$$
where $d(j, i)$ is the distance between nodes $i$ and $j$.
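A minimal sketch of closeness on an unweighted graph is shown below. Disambiguation graphs are often disconnected, and the definition above does not say how to treat unreachable nodes; this sketch simply ignores them and gives isolated nodes a score of 0, which is an assumption, as are the function name and the example.

```python
from collections import deque

def closeness(graph):
    """graph: {node: set of neighbours}, links assumed bidirectional.

    Returns 1 / (sum of shortest-path distances to the reachable nodes).
    """
    scores = {}
    for i in graph:
        # Breadth-first search gives shortest distances in unweighted graphs.
        dist = {i: 0}
        queue = deque([i])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        farness = sum(dist.values())
        scores[i] = 1.0 / farness if farness else 0.0
    return scores

# A path A - B - C plus an isolated node D.
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": set()}
scores = closeness(graph)
```

One consequence of ignoring unreachable nodes is that a node in a very small component can score higher than a node in a large one, so a rescaling by component size may be preferable in practice.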
6 Experiments and Evaluation
The proposed system is implemented in Python; the NetworkX library is used for graph construction and centrality measures. Four datasets are used to evaluate it and to identify the most effective centrality measure. In this section, we describe the datasets, the experiments which we have carried out, and the results. (The source code will soon be available on GitHub.)
6.1 Datasets
To evaluate our system, we have chosen 4 well-known and publicly available datasets:
ACE2004: This dataset is a subset of ACE2004 coreference documents which contains 253 mentions in 56 news articles [Ratinov et al., 2011].
AIDA/CONLL-TestB: This dataset contains 231 news articles. It is derived from the CONLL-2003 shared task and was annotated by [Hoffart et al., 2011].
MSNBC: This dataset contains 20 news articles, it comprises a wide range of entities [Cucerzan, 2007].
Microposts-2014-Test: This tweet dataset contains 687 tweets [Usbeck et al., 2015].
|Dataset||AIDA/CONLL-TestB||ACE2004||MSNBC||Microposts-2014-Test|
|Number of documents||231||36||20||687|
|Number of mentions per document||19.1||8.1||36.2||2|
|Number of mentions||4420||290||723||1405|
|Mentions with candidate||4155||216||460||1091|
All annotations in these datasets refer to Wikipedia or DBpedia. Both contain the same entities, whose URLs can easily be converted from one to the other. Some statistics about these datasets are shown in Table 4. It should be noted that we removed all the mentions whose annotated entities do not exist in the Wikipedia version which we use.
6.2 Experiments and Results
To evaluate our proposed method, we construct the disambiguation graph using the top 3 candidate entities for each mention, then we use each centrality measure to compute the importance of each entity. For each mention we choose the entity which has the highest score among its 3 candidates. Table 5 shows the results in terms of micro-accuracy (the percentage of mentions linked correctly) and F-score. We evaluate two settings. The overall performance shows the total performance of the system over the two phases. For the disambiguation performance, we remove all the mentions which do not have the correct entity in their candidate set, because in this case the disambiguation algorithm cannot find the correct answer, and we should not penalise this algorithm for the weakness of the candidate selection phase.
For AIDA/CONLL-TestB, the popularity method achieves an overall accuracy of 65.78%. All centrality measures improve on it: the best one is Degree centrality (+14%) and the worst is Closeness (+6%). The centrality measures affect this dataset more than the other datasets; one possible reason is that the entities in this dataset are more densely connected in Wikipedia. Degree is also the best measure for disambiguation accuracy, where it improves the performance by 14%. The disambiguation accuracy of the popularity method is 68.82%, 3% higher than its overall accuracy, which means that the selection phase works fairly well on this dataset (for few mentions, the correct entity is missing from the candidate set).
For MSNBC, the popularity method achieves 61.19% (5% less than on AIDA/CONLL-TestB). All centrality measures improve on this baseline: the best one is HITS (72.2%), followed by Degree (71.27%), and the worst is Closeness (65.86%). The disambiguation accuracy of popularity is 78.6%, 17% higher than the overall accuracy, which means that the selection phase does not work well enough on this dataset and that more work on this phase is needed to improve the results. HITS is still the best with 90.23%, and Degree (86.92%) is 4% lower, whereas it was only 1% lower in overall accuracy. This means that some incorrect candidate entities of the removed mentions may positively affect the results, which leads us to think more about the disambiguation graph and how we can minimise the impact of incorrect candidate entities. The results reach 90%; graph-based centrality therefore works very well on this dataset, in which the average number of mentions per document is 36.2.
For ACE2004, popularity gives 78.42% and all measures improve on it. The best one is Degree with 81.33% and the worst are Betweenness and Closeness (79.25%). The disambiguation accuracy of popularity is 87.84%, 9% higher than the overall accuracy, which also indicates the importance of the selection phase. Degree centrality is again the best, with 89.19%. The centrality measures improve the results on this dataset by up to 3%, whereas on the previous two datasets the improvement was up to 9% and 15%. Therefore, centrality measures are less effective on this dataset, in which the average number of mentions per document is 8.1.
Likewise for Microposts-2014-Test, popularity gives 56.09%, which is very low compared with the other datasets, and all measures improve on it only slightly. The best one is Degree with 58.08% and the worst is Betweenness (56.8%). The disambiguation accuracy of popularity is 69.65%, 13% higher than the overall accuracy, which also indicates the importance of the selection phase. Degree centrality is again the best centrality measure (72.6%). Therefore, centrality measures are less effective on this dataset, in which the documents are very short and the average number of mentions per document is 2.
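The difference between the two evaluation settings can be made concrete with a short sketch. The mention identifiers, gold annotations, candidate sets, and predictions below are invented for illustration and do not come from the datasets.

```python
def micro_accuracy(gold, predicted, mentions):
    """Fraction of the given mentions that are linked to their gold entity."""
    mentions = list(mentions)
    if not mentions:
        return 0.0
    return sum(predicted.get(m) == gold[m] for m in mentions) / len(mentions)

# Hypothetical gold annotations, candidate sets, and system output.
gold = {"m1": "Sun_Microsystems", "m2": "Java_(programming_language)",
        "m3": "Paris", "m4": "Troy"}
candidates = {"m1": ["Sun", "Sun_Microsystems"],
              "m2": ["Java", "Java_(programming_language)"],
              "m3": ["Paris"],
              "m4": ["Troy,_New_York"]}
predicted = {"m1": "Sun_Microsystems", "m2": "Java",
             "m3": "Paris", "m4": "Troy,_New_York"}

# Overall accuracy: every annotated mention counts.
overall = micro_accuracy(gold, predicted, gold)

# Disambiguation accuracy: only mentions whose gold entity was selected
# as a candidate, so the ranking step is not blamed for selection errors.
solvable = [m for m in gold if gold[m] in candidates.get(m, [])]
disambiguation = micro_accuracy(gold, predicted, solvable)
```

Here the mention m4 can never be linked correctly because its gold entity is missing from the candidate set, so it lowers the overall accuracy but is excluded from the disambiguation accuracy.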
|Method||Overall Accuracy||Dis. Accuracy|
|Accuracy||F Score||Precision||Recall||Accuracy||F Score||Precision||Recall|
Thus, centrality measures can improve a strong baseline for entity linking. The Degree centrality measure, which is simple and fast to compute, gives the best results for both overall and disambiguation performance. HITS is very close to Degree in terms of overall performance, but it is less useful in terms of disambiguation performance. Table 5 shows the average accuracy of each centrality measure over the four datasets.
The performance of Degree centrality may be surprising given the proven performance of HITS and PageRank. However, these two algorithms were designed to work on the Web: they suppose that there are some spam pages to which the search engine wants to give a lower rank. In Wikipedia this assumption does not hold, and pages which have many outgoing links and few incoming links may still be important pages. Therefore, Degree centrality may be a suitable measure, and the design of new measures should take into consideration the difference between the Web and Wikipedia.
The number of mentions per document plays an important role in graph-based methods. Constructing one graph for each document may not be the best approach; constructing a graph for each paragraph may be more reliable. However, if a paragraph does not have enough mentions (as in tweets), the disambiguation graph will not reflect the interactions between the entities; in contrast, if it has many mentions, the graph will contain a lot of noise. Segmenting the document to construct a graph for each segment may increase the accuracy but also the complexity. The fact that a simple measure such as Degree gives good results makes an explicit graph construction unnecessary, so segmentation would not decrease the performance of such a system.
7 Conclusion
We proposed to use graph-based methods to improve a strong popularity-based baseline for entity linking. We use the popularity method to reduce the size of the disambiguation graph. The experimental results show that the collective methods improve the performance of popularity. Degree centrality, a simple and fast measure, gives the best result. The fact that this measure can be computed without constructing a graph opens the door to constructing different disambiguation graphs for each document. Our future work will focus on constructing different graphs for a document, and more attention will be devoted to the selection phase, which is critical for improving the overall performance.
Acknowledgements
This research is funded by Labex Observatoire de la vie littéraire (OBVIL) and Laboratoire d’Informatique de Paris 6 (LIP6), Pierre and Marie Curie University, UMR 7606, Paris, France.
References
- [Alhelbawy and Gaizauskas, 2014] Alhelbawy, A. and Gaizauskas, R. J. (2014). Graph Ranking for Collective Named Entity Disambiguation. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014, June 22-27, 2014, Baltimore, MD, USA, Volume 2: Short Papers, pages 75–80.
- [Brandes, 2001] Brandes, U. (2001). A Faster Algorithm for Betweenness Centrality. Journal of Mathematical Sociology, 25:163–177.
- [Brando et al., 2016] Brando, C., Frontini, F., and Ganascia, J.-G. (2016). REDEN: Named Entity Linking in Digital Literary Editions Using Linked Data Sets. Complex Systems Informatics and Modeling Quarterly, 4(7):60–80.
- [Bunescu and Pasca, 2006] Bunescu, R. and Pasca, M. (2006). Using Encyclopedic Knowledge for Named Entity Disambiguation. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL-06), pages 9–16, Trento, Italy.
- [Chang et al., 2016] Chang, A., Spitkovsky, V. I., Manning, C. D., and Agirre, E. (2016). A comparison of Named-Entity Disambiguation and Word Sense Disambiguation. In Calzolari, N., Choukri, K., Declerck, T., Goggi, S., Grobelnik, M., Maegaard, B., Mariani, J., Mazo, H., Moreno, A., Odijk, J., and Piperidis, S., editors, Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Paris, France. European Language Resources Association (ELRA).
- [Cheng and Roth, 2013] Cheng, X. and Roth, D. (2013). Relational inference for wikification. In EMNLP.
- [Chisholm and Hachey, 2015] Chisholm, A. and Hachey, B. (2015). Entity Disambiguation with Web Links. Transactions of the Association for Computational Linguistics, 3:145–156.
- [Cucerzan, 2007] Cucerzan, S. (2007). Large-Scale Named Entity Disambiguation Based on Wikipedia Data. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 708–716, Prague, Czech Republic. Association for Computational Linguistics.
- [Durrett and Klein, 2014] Durrett, G. and Klein, D. (2014). A Joint Model for Entity Analysis: Coreference, Typing, and Linking. Transactions of the Association for Computational Linguistics, 2:477–490.
- [Francis-Landau et al., 2016] Francis-Landau, M., Durrett, G., and Klein, D. (2016). Capturing Semantic Similarity for Entity Linking with Convolutional Neural Networks. In Proceedings of the North American Association for Computational Linguistics, San Diego, California, USA. Association for Computational Linguistics.
- [Freeman, 1978] Freeman, L. C. (1978). Centrality in social networks conceptual clarification. Social Networks, 1(3):215–239.
- [Ganea et al., 2016] Ganea, O.-E., Ganea, M., Lucchi, A., Eickhoff, C., and Hofmann, T. (2016). Probabilistic Bag-Of-Hyperlinks Model for Entity Linking. In Proceedings of the 25th International Conference on World Wide Web, pages 927–938. International World Wide Web Conferences Steering Committee.
- [Hachey et al., 2013] Hachey, B., Radford, W., Nothman, J., Honnibal, M., and Curran, J. R. (2013). Evaluating Entity Linking with Wikipedia. Artif. Intell., 194:130–150.
- [Hoffart et al., 2011] Hoffart, J., Yosef, M. A., Bordino, I., Fürstenau, H., Pinkal, M., Spaniol, M., Taneva, B., Thater, S., and Weikum, G. (2011). Robust Disambiguation of Named Entities in Text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’11, pages 782–792, Stroudsburg, PA, USA. Association for Computational Linguistics.
- [Kleinberg, 1999] Kleinberg, J. M. (1999). Authoritative Sources in a Hyperlinked Environment. J. ACM, 46(5):604–632.
- [Page et al., 1998] Page, L., Brin, S., Motwani, R., and Winograd, T. (1998). The pagerank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.
- [Pershina et al., 2015] Pershina, M., He, Y., and Grishman, R. (2015). Personalized Page Rank for Named Entity Disambiguation. In NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, USA, May 31 - June 5, 2015, pages 238–243.
- [Ratinov et al., 2011] Ratinov, L., Roth, D., Downey, D., and Anderson, M. (2011). Local and Global Algorithms for Disambiguation to Wikipedia. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT ’11, pages 1375–1384, Stroudsburg, PA, USA. Association for Computational Linguistics.
- [Usbeck et al., 2014] Usbeck, R., Ngonga Ngomo, A.-C., Auer, S., Gerber, D., and Both, A. (2014). AGDISTIS - Graph-Based Disambiguation of Named Entities using Linked Data. In 13th International Semantic Web Conference.
- [Usbeck et al., 2015] Usbeck, R., Röder, M., Ngonga Ngomo, A.-C., Baron, C., Both, A., Brümmer, M., Ceccarelli, D., Cornolti, M., Cherix, D., Eickmann, B., Ferragina, P., Lemke, C., Moro, A., Navigli, R., Piccinno, F., Rizzo, G., Sack, H., Speck, R., Troncy, R., Waitelonis, J., and Wesemann, L. (2015). GERBIL – General Entity Annotation Benchmark Framework. In 24th WWW conference.
- [Zwicklbauer et al., 2016] Zwicklbauer, S., Seifert, C., and Granitzer, M. (2016). DoSeR - A Knowledge-Base-Agnostic Framework for Entity Disambiguation Using Semantic Embeddings. In The Semantic Web. Latest Advances and New Domains - 13th International Conference, ESWC 2016, Heraklion, Crete, Greece, May 29 - June 2, 2016, Proceedings, pages 182–198.