MulBot: Unsupervised Bot Detection Based on Multivariate Time Series

09/21/2022
by Lorenzo Mannocci, et al.

Online social networks are actively involved in the removal of malicious social bots due to their role in the spread of low-quality information. However, most existing bot detectors are supervised classifiers that cannot capture the evolving behavior of sophisticated bots. Here we propose MulBot, an unsupervised bot detector based on multivariate time series (MTS). For the first time, we exploit multidimensional temporal features extracted from user timelines. We manage the multidimensionality with an LSTM autoencoder, which projects the MTS into a suitable latent space. We then perform a clustering step on this encoded representation to identify dense groups of very similar users, a known sign of automation. Finally, we perform a binary classification task, achieving an F1-score of 0.99 and outperforming state-of-the-art methods (F1-score ≤ 0.97). Beyond its excellent results in the binary classification task, MulBot also shows its strengths in a novel and practically relevant task: detecting and separating different botnets. In this multi-class classification task we achieve an F1-score of 0.96. We conclude by estimating the importance of the different features used in our model and by evaluating MulBot's capability to generalize to new, unseen bots, thus proposing a solution to the generalization deficiencies of supervised bot detectors.
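
To make the notion of a multivariate time series built from a user timeline more concrete, here is a minimal sketch in Python. It is an illustration only, not the authors' feature extraction: the abstract does not list the actual features, so the three action types (`tweet`, `retweet`, `reply`), the daily granularity, and the helper name `timeline_to_mts` are all assumptions made for this example.

```python
from collections import Counter
from datetime import date

# Hypothetical action types; the abstract does not name the real features.
CHANNELS = ("tweet", "retweet", "reply")


def timeline_to_mts(events, start, days):
    """Convert (date, action) pairs into a days x len(CHANNELS) matrix.

    Row t holds the number of actions of each type performed on day
    start + t. Events outside the window or of unknown type are ignored.
    """
    counts = Counter()
    for day, action in events:
        offset = (day - start).days
        if 0 <= offset < days and action in CHANNELS:
            counts[(offset, action)] += 1
    # A Counter returns 0 for keys that were never incremented.
    return [[counts[(t, c)] for c in CHANNELS] for t in range(days)]


# Toy timeline for a single (made-up) account.
events = [
    (date(2022, 1, 1), "tweet"),
    (date(2022, 1, 1), "retweet"),
    (date(2022, 1, 1), "retweet"),
    (date(2022, 1, 3), "reply"),
]
mts = timeline_to_mts(events, start=date(2022, 1, 1), days=3)
print(mts)  # [[1, 2, 0], [0, 0, 0], [0, 0, 1]]
```

Each account then corresponds to one such matrix, with one row per time step and one column per feature. In the pipeline described above, a sequence of this kind is what the LSTM autoencoder would compress into a latent vector before clustering.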

research
10/29/2021

Transformer Ensembles for Sexism Detection

This document presents in detail the work done for the sexism detection ...
research
10/13/2020

Enhancing the Identification of Cyberbullying through Participant Roles

Cyberbullying is a prevalent social problem that inflicts detrimental co...
research
05/18/2020

Classification of Spam Emails through Hierarchical Clustering and Supervised Learning

Spammers take advantage of email popularity to send indiscriminately uns...
research
02/28/2020

UKARA 1.0 Challenge Track 1: Automatic Short-Answer Scoring in Bahasa Indonesia

We describe our third-place solution to the UKARA 1.0 challenge on autom...
research
01/16/2019

It's Only Words And Words Are All I Have

The central idea of this paper is to demonstrate the strength of lyrics ...
research
10/27/2022

Multi-class Detection of Pathological Speech with Latent Features: How does it perform on unseen data?

The detection of pathologies from speech features is usually defined as ...
research
04/03/2020

Unsupervised Keyphrase Rubric Relationship Classification in Complex Assignments

Complex assignments are open-ended question with varying content irrespe...
