Fair multilingual vandalism detection system for Wikipedia

06/02/2023
by Mykola Trokhymovych, et al.

This paper presents a novel system design for supporting the Wikipedia community in addressing vandalism on the platform. To this end, we collected a massive dataset covering 47 languages and applied advanced filtering and feature engineering techniques, including multilingual masked language modeling, to build the training dataset from human-generated data. We evaluated the system by comparing it with ORES, the system used in production on Wikipedia. Our work significantly increases the number of languages covered, making Wikipedia patrolling more efficient for a wider range of communities. Furthermore, our model outperforms ORES, providing results that are not only more accurate but also less biased against certain groups of contributors.

