Approaches for Enriching and Improving Textual Knowledge Bases
Verifiability is one of the core editing principles in Wikipedia, where editors are encouraged to provide citations for the added statements. Statements can be any arbitrary piece of text, ranging from a sentence up to a paragraph. However, in many cases, citations are either outdated, missing, or link to non-existing references (e.g. dead URL, moved content etc.). In total, 20% of the cases such citations refer to news articles and represent the second most cited source. Even in cases where citations are provided, there are no explicit indicators for the span of a citation for a given piece of text. In addition to issues related with the verifiability principle, many Wikipedia entity pages are incomplete, with relevant information that is already available in online news sources missing. Even for the already existing citations, there is often a delay between the news publication time and the reference time. In this thesis, we address the aforementioned issues and propose automated approaches that enforce the verifiability principle in Wikipedia, and suggest relevant and missing news references for further enriching Wikipedia entity pages.
READ FULL TEXT