Fake News Data Collection and Classification: Iterative Query Selection for Opaque Search Engines with Pseudo Relevance Feedback

12/23/2020
by   Aviad Elyashar, et al.

Retrieving information from an online search engine is the first and most important step in many data mining tasks. Most search engines currently available on the web, including those of all social media platforms, are black boxes (a.k.a. opaque) that support short keyword queries. In these settings, automatically retrieving all posts and comments that discuss a particular news item at large scale is a challenging task. In this paper, we propose a method for generating short keyword queries given a prototype document. The proposed algorithm interacts with the opaque search engine to iteratively improve the query. It is evaluated on the Twitter TREC Microblog 2012 and TREC-COVID 2019 datasets, where it outperforms the state of the art, and is applied to automatically collect a large-scale dataset for training machine learning classifiers for fake news detection. Classifiers trained on 70,000 labeled news items and more than 61 million associated tweets, collected automatically with the proposed method, achieved an AUC of 0.92 and an accuracy of 0.86.
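The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of iterative query refinement with pseudo relevance feedback, not the authors' method. It assumes a greedy search over short queries built from the prototype's words, and it scores each candidate query by the mean Jaccard overlap between the prototype and the top results the engine returns. The names `refine_query`, `query_score`, `toy_search`, and `CORPUS` are hypothetical, and the toy engine merely stands in for an opaque search API.

```python
import re
from typing import Callable, List, Sequence, Tuple

def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard similarity between two token collections."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def query_score(query: List[str], prototype_tokens: List[str],
                search: Callable[[str], List[str]], top_k: int) -> float:
    """Pseudo relevance feedback: treat the top-k results as relevant and
    measure how closely they resemble the prototype document."""
    results = search(" ".join(query))[:top_k]
    if not results:
        return 0.0
    sims = [jaccard(tokenize(r), prototype_tokens) for r in results]
    return sum(sims) / len(sims)

def refine_query(prototype: str, search: Callable[[str], List[str]],
                 max_len: int = 3, iterations: int = 5,
                 top_k: int = 10) -> Tuple[List[str], float]:
    """Greedy refinement (an assumed strategy): in each round, try adding
    or swapping one word and keep the best-scoring query found so far."""
    proto_tokens = tokenize(prototype)
    vocab = sorted(set(proto_tokens))
    best: List[str] = []
    best_score = 0.0
    for _ in range(iterations):
        candidates = []
        if len(best) < max_len:
            candidates += [best + [w] for w in vocab if w not in best]
        for i in range(len(best)):
            candidates += [best[:i] + [w] + best[i + 1:]
                           for w in vocab if w not in best]
        improved = False
        for cand in candidates:
            score = query_score(cand, proto_tokens, search, top_k)
            if score > best_score:
                best, best_score, improved = cand, score, True
        if not improved:
            break  # no single-word change helps; stop early
    return best, best_score

# Toy stand-in for an opaque engine: returns documents containing every query word.
CORPUS = [
    "vaccine causes magnetism claim debunked",
    "new vaccine rollout begins in city",
    "football team wins the cup",
    "claim that vaccine causes magnetism spreads online",
]

def toy_search(query: str) -> List[str]:
    words = tokenize(query)
    return [doc for doc in CORPUS if all(w in tokenize(doc) for w in words)]

query, score = refine_query("Viral claim: vaccine causes magnetism", toy_search)
print(query, round(score, 3))
```

The search engine is passed in as a plain callable, so the refinement loop only ever sees queries going in and result lists coming out, which mirrors the black-box setting the abstract describes.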

research
02/08/2021

Detecting Fake News Using Machine Learning : A Systematic Literature Review

Internet is one of the important inventions and a large number of person...
research
03/22/2021

Detection of fake news on CoViD-19 on Web Search Engines

In early January 2020, after China reported the first cases of the new c...
research
04/05/2021

A Heuristic-driven Uncertainty based Ensemble Framework for Fake News Detection in Tweets and News Articles

The significance of social media has increased manifold in the past few ...
research
09/13/2022

CovidMis20: COVID-19 Misinformation Detection System on Twitter Tweets using Deep Learning Models

Online news and information sources are convenient and accessible ways t...
research
03/15/2023

Automated Query Generation for Evidence Collection from Web Search Engines

It is widely accepted that so-called facts can be checked by searching f...
research
09/14/2021

MMCoVaR: Multimodal COVID-19 Vaccine Focused Data Repository for Fake News Detection and a Baseline Architecture for Classification

The outbreak of COVID-19 has resulted in an "infodemic" that has encoura...
research
05/24/2020

How Does That Sound? Multi-Language SpokenName2Vec Algorithm Using Speech Generation and Deep Learning

Searching for information about a specific person is an online activity ...