BotArtist: Twitter bot detection Machine Learning model based on Twitter suspension

by Alexander Shevtsov, et al.

Twitter, as one of the most popular social networks, offers a means for communication and online discourse, which unfortunately has been the target of bots and fake accounts, leading to manipulation and the spread of false information. To address this, we gather a challenging, multilingual dataset of social discourse on Twitter, originating from 9M users and concerning the recent Russo-Ukrainian war, in order to detect bot accounts and the conversations involving them. We collect the ground truth for our dataset through the Twitter API suspended accounts collection; it contains approximately 343K bot accounts and 8M normal users. Additionally, we use a dataset provided by Botometer-V3 with 1,777 Varol, 483 German accounts, and 1,321 US accounts. Besides the publicly available datasets, we also collect two independent datasets around popular discussion topics: the 2022 energy crisis and the 2022 conspiracy discussions. Both datasets were labeled according to the Twitter suspension mechanism. We build a novel ML model for bot detection using the state-of-the-art XGBoost model, and we train it on a high volume of tweets labeled with the Twitter suspension mechanism as ground truth. The model requires only a limited set of profile features, which allows the dataset to be labeled at a different time from its collection, since the labeling is independent of the Twitter API. In comparison with Botometer our methodology achieves an average 11 scenario datasets.
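The labeling and feature idea described above can be illustrated with a minimal sketch. It assumes a hypothetical `Profile` record and a hypothetical set of suspended account IDs; the feature names are illustrative assumptions, not the feature set used in the paper, which the abstract does not list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Profile:
    # Hypothetical profile record; field names are assumptions for illustration.
    user_id: int
    created_at: datetime
    followers: int
    friends: int
    statuses: int
    description: str


def profile_features(p: Profile, collected_at: datetime) -> dict[str, float]:
    """Derive numeric features from profile fields only (no timeline needed)."""
    age_days = max((collected_at - p.created_at).days, 1)
    return {
        "account_age_days": float(age_days),
        "followers": float(p.followers),
        "friends": float(p.friends),
        "followers_to_friends": p.followers / max(p.friends, 1),
        "statuses_per_day": p.statuses / age_days,
        "description_length": float(len(p.description)),
    }


def build_dataset(profiles, suspended_ids, collected_at):
    """Label each account 1 (suspended, treated as bot) or 0 (normal)."""
    X = [profile_features(p, collected_at) for p in profiles]
    y = [1 if p.user_id in suspended_ids else 0 for p in profiles]
    return X, y


# Toy data, invented purely for the example.
collected_at = datetime(2022, 6, 1, tzinfo=timezone.utc)
profiles = [
    Profile(1, datetime(2022, 5, 22, tzinfo=timezone.utc), 10, 100, 500, ""),
    Profile(2, datetime(2015, 1, 1, tzinfo=timezone.utc), 300, 250, 4000, "Hiker."),
]
suspended_ids = {1}  # stand-in for the suspended accounts collection

X, y = build_dataset(profiles, suspended_ids, collected_at)
```

Rows of `X` and labels `y` in this form could then be passed to a gradient-boosted tree classifier such as XGBoost. Because the features come from stored profile fields only, the labels can be attached later than the collection date.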




