What if we had no Wikipedia? Domain-independent Term Extraction from a Large News Corpus

09/17/2020
by   Yonatan Bilu, et al.

One of the most impressive human endeavors of the past two decades is the collection and categorization of human knowledge in the free and accessible format that is Wikipedia. In this work we ask what makes a term worthy of entering this edifice of knowledge and having a page of its own in Wikipedia. To what extent is this a natural product of ongoing human discourse and discussion, rather than an idiosyncratic choice of Wikipedia editors? Specifically, we aim to identify such "wiki-worthy" terms in a massive news corpus, and to see whether this can be done with no, or minimal, dependency on actual Wikipedia entries. We suggest a five-step pipeline for doing so, providing baseline results for all five steps and the relevant datasets for benchmarking them. Our work sheds new light on the domain-specific Automatic Term Extraction problem, of which the problem at hand is a domain-independent variant.


