Orphan Articles: The Dark Matter of Wikipedia

06/06/2023
by Akhil Arora, et al.

With 60M articles in more than 300 language versions, Wikipedia is the largest platform for open and freely accessible knowledge. While the available content has been growing continuously at a rate of around 200K new articles each month, very little attention has been paid to the accessibility of that content. One crucial aspect of accessibility is the integration of articles into the hyperlink network, so that they are visible to readers navigating Wikipedia. To understand this aspect, we conduct the first systematic study of orphan articles, which are articles without any incoming links from other Wikipedia articles, across 319 different language versions of Wikipedia. We find that a surprisingly large share of content, roughly 15% (8.8M) of all articles, is de facto invisible to readers navigating Wikipedia, and we therefore term orphan articles the dark matter of Wikipedia. We also provide causal evidence through a quasi-experiment that adding new incoming links to orphans (de-orphanization) leads to a statistically significant increase in their visibility, measured by the number of pageviews. We further highlight the challenges editors face in de-orphanizing articles, demonstrate the need to support them in addressing this issue, and propose potential solutions for developing automated tools based on cross-lingual approaches. Overall, our work not only uncovers a key limitation in the link structure of Wikipedia and quantitatively assesses its impact, but also provides a new perspective on the maintenance challenges associated with content creation at scale in Wikipedia.
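The definition of an orphan article (no incoming links from other articles) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `find_orphans`, the toy article titles, and the edge-list representation are all hypothetical, and treating self-links as not counting toward incoming links is an assumption made here for illustration.

```python
from typing import Iterable, Set, Tuple


def find_orphans(articles: Iterable[str],
                 links: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return the articles that have no incoming link from another article.

    `links` is a collection of (source, target) pairs. Self-links and links
    whose source is not a known article are ignored (an assumption of this
    sketch, not something stated in the abstract).
    """
    known = set(articles)
    linked_to = {
        target
        for source, target in links
        if source != target and source in known
    }
    return known - linked_to


# Toy link graph (hypothetical data): C is only ever a source, D links to itself.
articles = ["A", "B", "C", "D"]
links = [("A", "B"), ("B", "A"), ("C", "A"), ("D", "D")]

orphans = find_orphans(articles, links)
orphan_share = len(orphans) / len(articles)
print(sorted(orphans), orphan_share)
```

In this toy graph, C and D are orphans, so the orphan share is half of the articles. Adding a single link such as ("A", "C") would de-orphanize C in the sense used above.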

research
11/02/2020

Analyzing Wikidata Transclusion on English Wikipedia

Wikidata is steadily becoming more central to Wikipedia, not just in mai...
research
12/26/2018

DBpedia NIF: Open, Large-Scale and Multilingual Knowledge Extraction Corpus

In the past decade, the DBpedia community has put significant amount of ...
research
02/12/2019

WikiLinkGraphs: A complete, longitudinal and multi-language dataset of the Wikipedia link networks

Wikipedia articles contain multiple links connecting a subject to other ...
research
02/16/2023

The role of online attention in the supply of disinformation in Wikipedia

Wikipedia and many User-Generated Content (UGC) communities are known fo...
research
09/23/2020

Crosslingual Topic Modeling with WikiPDA

We present Wikipedia-based Polyglot Dirichlet Allocation (WikiPDA), a cr...
research
05/08/2023

Dreams Are More "Predictable" Than You Think

A consistent body of evidence suggests that dream reports significantly ...
research
09/18/2018

Mind Your POV: Convergence of Articles and Editors Towards Wikipedia's Neutrality Norm

Wikipedia has a strong norm of writing in a 'neutral point of view' (NPO...
