Scalable and Generalizable Social Bot Detection through Data Selection

11/20/2019
by Kai-Cheng Yang, et al.

Efficient and reliable social bot classification is crucial for detecting information manipulation on social media. Despite rapid development, state-of-the-art bot detection models still face generalization and scalability challenges, which greatly limit their applicability. In this paper, we propose a framework that uses minimal account metadata, enabling efficient analysis that scales up to handle the full stream of public tweets on Twitter in real time. To ensure model accuracy, we build a rich collection of labeled datasets for training and validation. In addition to traditional cross-validation, we deploy a strict validation system so that model performance on unseen datasets is also optimized. We find that strategically selecting a subset of training data yields better model accuracy and generalization than exhaustively training on all available data. Thanks to the simplicity of the proposed model, its logic can be interpreted to provide insights into social bot characteristics.
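
The data selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the model or the metadata features, so the one-feature threshold classifier, the toy datasets, and the function names (`fit_threshold`, `accuracy`, `select_training_subset`) are all assumptions made for illustration. The sketch tries every combination of labeled training datasets and keeps the one whose model scores best on held-out datasets that were never used for training.

```python
from itertools import combinations
from statistics import mean

# Toy stand-in for account metadata: each row is (feature_value, label),
# where label 1 = bot and 0 = human. All data here is hypothetical.

def accuracy(threshold, data):
    """Fraction of rows classified correctly when predicting bot if x >= threshold."""
    return mean(1.0 if (x >= threshold) == bool(y) else 0.0 for x, y in data)

def fit_threshold(train):
    """Pick the threshold (among observed values) with the best training accuracy."""
    candidates = sorted({x for x, _ in train})
    return max(candidates, key=lambda t: accuracy(t, train))

def select_training_subset(train_sets, holdout_sets):
    """Return (score, combo, threshold) for the combination of training
    datasets whose model has the best mean accuracy on the held-out datasets."""
    best = None
    names = sorted(train_sets)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            merged = [row for name in combo for row in train_sets[name]]
            threshold = fit_threshold(merged)
            score = mean(accuracy(threshold, d) for d in holdout_sets.values())
            if best is None or score > best[0]:
                best = (score, combo, threshold)
    return best

# Hypothetical datasets: "noisy" has labels that conflict with "clean".
train_sets = {
    "clean": [(1, 0), (2, 0), (8, 1), (9, 1)],
    "noisy": [(1, 1), (2, 1), (3, 1), (7, 0), (8, 0), (9, 0)],
}
holdout_sets = {"unseen": [(1.5, 0), (2.5, 0), (8.5, 1), (9.5, 1)]}

best = select_training_subset(train_sets, holdout_sets)
print(best)
```

On this toy data, the model trained on the "clean" dataset alone generalizes better to the held-out dataset than the model trained on everything, which mirrors the qualitative claim in the abstract. Exhaustive enumeration is only practical for a handful of datasets, since the number of combinations doubles with each dataset added.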

