Wikipedia Cultural Diversity Dataset: A Complete Cartography for 300 Language Editions

01/23/2019
by Marc Miquel-Ribe, et al.

In this paper we present the Wikipedia Cultural Diversity dataset. For each existing Wikipedia language edition, the dataset contains a classification of the articles that represent its associated cultural context, i.e., all concepts and entities related to the language and to the territories where it is spoken. We describe the methodology we employed to classify articles and the rich set of features we defined to feed the classifier, which are released as part of the dataset. We present several purposes for which we envision the use of this dataset, including detecting, measuring, and countering content gaps in the Wikipedia project and encouraging cross-cultural research in the field of digital humanities.

