Hedera: Scalable Indexing and Exploring Entities in Wikipedia Revision History

01/14/2017
by Tuan Tran, et al.

Much of the work in the Semantic Web that relies on Wikipedia as its main source of knowledge operates on static snapshots of the dataset. The full history of Wikipedia revisions, although it contains much more useful information, is still difficult to access because of its exceptional volume. To enable further research on this collection, we developed a tool named Hedera that efficiently extracts semantic information from Wikipedia revision history datasets. Hedera exploits the Map-Reduce paradigm to achieve rapid extraction: it can process the entire revision history of Wikipedia articles within a day on a medium-scale cluster, and it supports flexible data structures for various kinds of Semantic Web studies.
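To make the Map-Reduce idea concrete, here is a minimal single-machine sketch of how entity mentions in revision text could be grouped into per-entity timelines. It is not Hedera's implementation. The record layout (page title, revision id, ISO timestamp, wikitext), the wiki-link regex used as a stand-in for real entity extraction, the sample data, and the function names `map_revision`, `shuffle`, `reduce_entity` and `run` are all hypothetical choices made for illustration.

```python
import re
from collections import defaultdict
from itertools import chain

# Hypothetical stand-in for entity extraction: the target of each [[wiki link]],
# ignoring any "#section" anchor and "|display text".
LINK_RE = re.compile(r"\[\[([^\]|#]+)[^\]]*\]\]")

# Hypothetical revision records: (page_title, revision_id, iso_timestamp, wikitext)
SAMPLE_REVISIONS = [
    ("Angela Merkel", 101, "2005-11-22T10:00:00Z",
     "Chancellor of [[Germany]] based in [[Berlin|the capital]]."),
    ("Berlin", 202, "2004-03-01T08:00:00Z", "Capital of [[Germany]]."),
    ("Angela Merkel", 102, "2006-01-05T09:30:00Z",
     "Leader of the [[Christian Democratic Union (Germany)|CDU]] in [[Germany]]."),
]


def map_revision(record):
    """Map phase: emit (entity, (timestamp, page, rev_id)) once per linked entity."""
    page, rev_id, timestamp, text = record
    for entity in {m.strip() for m in LINK_RE.findall(text)}:
        yield entity, (timestamp, page, rev_id)


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_entity(entity, values):
    """Reduce phase: order one entity's mentions by timestamp into a timeline."""
    return entity, sorted(values)


def run(records):
    """Chain map -> shuffle -> reduce over an in-memory list of revisions."""
    mapped = chain.from_iterable(map_revision(r) for r in records)
    return dict(reduce_entity(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    for entity, timeline in sorted(run(SAMPLE_REVISIONS).items()):
        print(entity, timeline)
```

In a real cluster the map and reduce functions would run on many machines and the framework would perform the shuffle; the in-memory `shuffle` here only simulates that grouping step so the data flow can be followed on a laptop.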

research
10/26/2021

A Map of Science in Wikipedia

In recent decades, the rapid growth of Internet adoption is offering opp...
research
05/03/2018

Scalable Semantic Querying of Text

We present the KOKO system that takes declarative information extraction...
research
02/25/2022

Mining Naturally-occurring Corrections and Paraphrases from Wikipedia's Revision History

Naturally-occurring instances of linguistic phenomena are important both...
research
05/24/2017

Analysing Timelines of National Histories across Wikipedia Editions: A Comparative Computational Approach

Portrayals of history are never complete, and each description inherentl...
research
04/26/2023

Toxic comments reduce the activity of volunteer editors on Wikipedia

Wikipedia is one of the most successful collaborative projects in histor...
research
08/04/2018

Evaluating Wikipedia as a source of information for disease understanding

The increasing availability of biological data is improving our understa...
research
03/20/2019

A Graph-structured Dataset for Wikipedia Research

Wikipedia is a rich and invaluable source of information. Its central pl...
