Feature Analysis for Assessing the Quality of Wikipedia Articles through Supervised Classification

12/06/2018
by Elias Bassani, et al.

Web 2.0 technologies make it easy for people to generate and spread content across social media. In this context, evaluating the quality of the information available online is becoming an increasingly crucial issue: a constant flow of content is generated every day, often by unknown sources that are not certified by traditional authoritative entities. This calls for methodologies that can evaluate such content systematically, based on 'objective' aspects connected with it. Such methodologies would help individuals, who increasingly form their opinions based on what they read online and on social media, to come into contact with information that is actually useful and verified. Wikipedia is one of the biggest online resources on which users rely as a source of information. The amount of collaboratively generated content submitted to the online encyclopedia every day can lead to the creation of low-quality articles (and, consequently, misinformation) if it is not properly monitored and revised. For this reason, this paper considers the problem of automatically assessing the quality of Wikipedia articles. In particular, it focuses on the analysis of hand-crafted features that supervised machine learning techniques can employ to classify Wikipedia articles by quality. With respect to prior literature, a wider set of characteristics connected to Wikipedia articles is taken into account and illustrated in detail. Evaluations are performed on a labeled dataset provided in a prior work, using different supervised machine learning algorithms, which produced encouraging results with respect to the considered features.
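
To make the idea of hand-crafted features feeding a supervised classifier concrete, the following is a minimal sketch and not the authors' pipeline. The four features (token count, section headings, reference tags, internal links), the toy articles, the "high"/"low" labels, and the nearest-centroid classifier are all assumptions chosen for illustration; the abstract does not specify the feature set or the algorithms used.

```python
import re
from collections import defaultdict


def extract_features(wikitext):
    """Map raw wikitext to a small feature vector.

    These four features are hypothetical examples of hand-crafted
    features, not the feature set analysed in the paper.
    """
    return [
        float(len(wikitext.split())),  # length in whitespace-separated tokens
        float(len(re.findall(r"^={2,}[^=\n]+={2,}[ \t]*$", wikitext, flags=re.M))),  # section headings
        float(wikitext.count("<ref")),  # opening reference tags
        float(len(re.findall(r"\[\[[^\]]+\]\]", wikitext))),  # internal links
    ]


def train_centroids(samples):
    """Average the feature vectors of each class (nearest-centroid training)."""
    sums = {}
    counts = defaultdict(int)
    for text, label in samples:
        feats = extract_features(text)
        if label not in sums:
            sums[label] = [0.0] * len(feats)
        sums[label] = [s + f for s, f in zip(sums[label], feats)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, wikitext):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    feats = extract_features(wikitext)

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(feats, centroid))

    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Toy, made-up articles standing in for a labeled dataset.
LONG = (
    "== History ==\n" + "word " * 200 + "[[Link]] <ref>src</ref>\n"
    "== Reception ==\n" + "word " * 200 + "[[Other]] <ref>src2</ref>"
)
SHORT = "A short article about a [[topic]]."

centroids = train_centroids([
    (LONG, "high"),
    (LONG + " extra words here", "high"),
    (SHORT, "low"),
    (SHORT + " More text.", "low"),
])

print(predict(centroids, "Tiny page."))
print(predict(centroids, LONG))
```

In this sketch the unscaled token count dominates the distance. A real setup would normalise the features and compare several learning algorithms on held-out data.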


research
06/01/2021

Is it a click bait? Let's predict using Machine Learning

In this era of digitisation, news readers tend to read news online. This ...
research
01/26/2020

Information Credibility in the Social Web: Contexts, Approaches, and Open Issues

In the Social Web scenario, large amounts of User-Generated Content (UGC...
research
05/09/2019

Methodology for accurately assessing the quality perceived by users on 360VR contents

To properly evaluate the performance of 360VR-specific encoding and tran...
research
02/16/2023

The role of online attention in the supply of disinformation in Wikipedia

Wikipedia and many User-Generated Content (UGC) communities are known fo...
research
10/28/2022

Polarization and reliability of news sources in Wikipedia

Wikipedia is the largest online encyclopedia: its open contribution poli...
research
12/28/2018

Wikibook-Bot - Automatic Generation of a Wikipedia Book

A Wikipedia book (known as Wikibook) is a collection of Wikipedia articl...
research
10/14/2019

Online Disinformation and the Role of Wikipedia

The aim of this study is to find key areas of research that can be usefu...
