Tracking Knowledge Propagation Across Wikipedia Languages

03/30/2021
by Rodolfo Valentim, et al.

In this paper, we present a dataset of inter-language knowledge propagation in Wikipedia. Covering all 309 language editions and 33M articles, the dataset aims to track the full propagation history of Wikipedia concepts and to enable follow-up research on building predictive models of that propagation. To this end, we align all Wikipedia articles in a language-agnostic manner according to the concept they cover, which yields 13M propagation instances. To the best of our knowledge, this dataset is the first to explore full inter-language propagation at a large scale. Together with the dataset, we provide a holistic overview of the propagation and key insights into the underlying structural factors to aid future research. For example, we find that although long cascades are unusual, propagation tends to continue further once it reaches more than four language editions. We also find that the size of a language edition is associated with the speed of propagation. We believe the dataset not only contributes to the prior literature on Wikipedia growth but also enables new use cases such as edit recommendation for addressing knowledge gaps, detection of disinformation, and cultural relationship analysis.
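
The alignment step described in the abstract can be illustrated with a minimal sketch. It assumes each article carries a language-agnostic concept identifier and a creation date; the record layout, the identifiers `C1` and `C2`, and the function name `build_cascades` are hypothetical and are not taken from the dataset itself. Under those assumptions, one plausible way to recover a propagation cascade is to group articles by concept and order them by creation time:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical records: (concept_id, language_edition, article_creation_date).
# The layout and the sample values are illustrative only; they are NOT the
# schema or contents of the dataset described in the abstract.
records = [
    ("C1", "en", "2004-03-01"),
    ("C1", "de", "2005-07-15"),
    ("C1", "pt", "2004-11-20"),
    ("C2", "fr", "2010-01-05"),
]


def build_cascades(rows):
    """Group articles by concept and order each group by creation date."""
    by_concept = defaultdict(list)
    for concept_id, lang, created in rows:
        timestamp = datetime.strptime(created, "%Y-%m-%d")
        by_concept[concept_id].append((timestamp, lang))
    # Sorting the (timestamp, lang) pairs orders languages chronologically.
    return {
        concept_id: [lang for _, lang in sorted(items)]
        for concept_id, items in by_concept.items()
    }


cascades = build_cascades(records)
for concept_id, languages in cascades.items():
    print(concept_id, " -> ".join(languages))
```

In this toy input, the concept `C1` appears in three editions and `C2` in only one, so cascade length is simply the number of editions in each ordered list. How the actual dataset defines and counts a propagation instance is specified in the full paper, not here.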


