Detection of fake news on CoViD-19 on Web Search Engines

by V. Mazzeo, et al.

In early January 2020, after China reported the first cases of the new coronavirus (SARS-CoV-2) in the city of Wuhan, unreliable and not fully accurate information started spreading faster than the virus itself. Alongside the pandemic, people have experienced a parallel infodemic, i.e., an overabundance of information, some of it misleading or even harmful, that has spread widely around the globe. Although Social Media are increasingly used as an information source, Web Search Engines, like Google or Yahoo!, still represent a powerful and trusted resource for finding information on the Web. This is due to their capability to capture the largest amount of information, helping users quickly identify the most relevant and useful, although not always the most reliable, results for their search queries. This study aims to detect potentially misleading and fake content by capturing and analysing the textual information that flows through Search Engines. Using a real-world dataset associated with the recent CoViD-19 pandemic, we first apply re-sampling techniques to address class imbalance, then we use existing Machine Learning algorithms to classify unreliable news. By extracting lexical and host-based features of the Uniform Resource Locators (URLs) associated with news articles, we show that the proposed methods, which are common in phishing and malicious URL detection, can improve the efficiency and performance of classifiers. Based on these findings, we think that using both textual and URL features can improve the effectiveness of fake news detection methods.
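As a rough illustration of two steps named in the abstract, the sketch below extracts a handful of lexical URL features and balances classes by random oversampling. It is not the authors' implementation: the feature set, the function names `lexical_url_features` and `random_oversample`, and the choice of random oversampling as the re-sampling technique are all assumptions made for this example. Host-based features (for instance domain registration data) need external lookups and are left out.

```python
import random
from collections import Counter
from urllib.parse import urlparse


def lexical_url_features(url: str) -> dict:
    """Return a few simple lexical features of a URL.

    The feature choice is hypothetical; the abstract does not list
    the exact features used in the study.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "path_depth": len([part for part in parsed.path.split("/") if part]),
        "uses_https": int(parsed.scheme == "https"),
    }


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class (an assumed technique)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


if __name__ == "__main__":
    # Made-up URL, used only to show the output shape.
    print(lexical_url_features("https://news-site.example.com/2020/03/covid-19-cure"))
    rows, labels = random_oversample(["a", "b", "c", "d"], [0, 0, 0, 1])
    print(Counter(labels))
```

The resulting feature dictionaries could be combined with textual features of the article before training a classifier. Re-sampling should be applied to the training split only, so that duplicated rows do not leak into evaluation data.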




