The diffusion of information on social media is often supported by automated accounts, controlled totally or in part by computer algorithms, called bots. Unfortunately, a dominant and worrisome use of automated accounts is far from benign: malicious bots are purposely created to distribute spam, sponsor public characters and, ultimately, induce a bias within public opinion [ferrara2016rise]. Their malicious activities are especially effective when performed on a targeted audience [bovet2019influence, bastos2019brexit] to, e.g., generate misconceptions or encourage hate campaigns [ChatzakouKBCSV17]. Recent work [shao2018spread, yang2019arming] demonstrates that bots are particularly active in spreading low-credibility content and amplifying its significance. Moreover, human-operated accounts contribute to the diffusion of disinformation by, e.g., retweeting and/or liking fake content.
In a previous work [balestrucci2019credulous], the authors shed light on so-called credulous Twitter users, a term that refers, with a harmless abuse of language, to human-operated accounts with a high percentage of bots as friends. Unlike [balestrucci2019credulous], where the authors performed an analysis involving the friends of a set of human-operated accounts - a highly time-consuming task - here we design and develop a classifier to identify credulous Twitter users by considering a number of features that do not take the friendship with bots into account. Starting from a set of features commonly employed in the literature to detect bots [Varol17, Cresci15fame], we end up with a classifier that is lightweight in terms of the cost of gathering the data needed for the feature engineering phase. The classification performance is very encouraging – an accuracy of 93.27% and an AUC (Area Under the ROC Curve) equal to 0.93.
We believe that automatically detecting credulous users is a promising line of research. Such an investigation could help researchers to: 1. better understand the characteristics of those users who are more polarized and/or more willing to be influenced; 2. unveil low-credibility and/or deceptive content and limit its online diffusion; 3. devise alternative strategies for bot detection by concentrating the analysis on the friends of credulous users; 4. improve users’ awareness of threats to data trustworthiness.
The following section presents the approach for the automatic detection of credulous Twitter users, while Section 3 presents the experimental results. Section 4 discusses the outcome and suggests further investigations. Section 5 presents related work in the area, discussing how our work differs from it and what it contributes. Section 6 concludes the paper.
2 The approach
We consider three publicly available datasets (Bot Repository Datasets: https://goo.gl/87Kzcr): CR15 [Cresci15fame], CR17 [Cresci17paradigm], and VR17 [Varol17]. From the merging of these three datasets, we obtain a unique labeled dataset (human-operated/bot) of 12,961 accounts - 7,165 bots and 5,796 humans. We use this dataset to train a bot detector, as described in Section 2.2. To this end, we use the Java Twitter API (https://goo.gl/njcjr1), and for each account we collect: tweets (up to 3,200), mentions (up to 100), IDs of friends and followers (up to 5k).
The identification of credulous users follows the approach presented in [balestrucci2019credulous]. To this end, we need to determine how many bots are friends of the 5,796 human-operated accounts. Due to the rate limits of the Twitter APIs and to the huge number of friends that these human-operated accounts may have, we consider only those accounts with at most 400 friends [balestrucci2019credulous]. This leads to a dataset of 2,838 human-operated accounts, called Humans2Consider hereafter. By crawling the data related to their friends, we acquire information on 421,121 Twitter accounts overall.
2.2 Bot Detection
Regarding the features, we consider two sets. The first one derives from Botometer [Varol17], a popular bot detector (https://botometer.iuni.iu.edu/). In addition to the original Botometer features [Varol17], we also include: the CAP (Complete Automation Probability, https://tinyurl.com/yxp3wqzh) score, the Scores (English/Universal Score: https://tinyurl.com/y2skbmqc), and the number of tweets and mentions; we call this augmented set of features Botometer+. The second feature set is inherited from [Cresci15fame], where a classifier was designed to detect fake Twitter followers. We use almost all their ClassA features (which require only information available in the profile of the account [Cresci15fame]), except the one about duplicated pictures, because it was not possible for us to verify whether the same profile picture was used twice; we call this reduced set of features ClassA-. The union of the two sets of features is referred to in the following as ALL_features.
We use 19 learning algorithms to train our classifier (with a 10-fold cross validation) and we compare their classification capabilities with respect to the three feature sets (Botometer+, ClassA- and ALL_features). The classification performances are evaluated according to: percentage of accuracy, precision, recall, F-measure (F1), and Area Under the ROC Curve (AUC). On the most accurate classifier, Hyper-Parameter tuning is performed. The tuned classifier is then used to label the friends of the Humans2Consider dataset (see Section 2.1).
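As an illustration of the evaluation metrics listed above, the following is a minimal sketch in plain Python; it is not the Weka code used in the experiments. The function names, the label strings and the choice of "bot" as the positive class are assumptions made for this example. The AUC is computed through its rank interpretation, i.e., the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def confusion_metrics(y_true, y_pred, positive="bot"):
    """Accuracy, precision, recall and F1 for a binary task.

    `positive` names the class treated as positive (an assumption of
    this sketch; the paper does not state which class is positive).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc(y_true, scores, positive="bot"):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one instance per class")
    # Count positive/negative pairs ranked correctly; ties count 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In a 10-fold cross-validation, these values would be computed on the held-out part of each split and then aggregated; the pairwise AUC above is quadratic in the number of instances and is meant for clarity rather than speed.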
2.3 Identification of Credulous Twitter Users
The identification of credulous users can be performed with multiple strategies, since various aspects may help to spot those users who are more exposed to the malicious activities of bots. In our previous work [balestrucci2019credulous], we introduced a set of rules to discern whether a genuine user is a credulous one. These rules allow users to be ranked by the ratio of bots in the user’s list of friends. Here, we inherit these rules to rank the users in our dataset (see Section 2.1), but further ranking strategies can also be considered. Our goal is to build a ground truth of credulous users from which to derive an assessed characterization of these accounts. Applying the approach defined in [balestrucci2019credulous], we identified 316 users in Humans2Consider as credulous. This constitutes the input data for the next step. We note that the approach in [balestrucci2019credulous] is very expensive in terms of data gathering. For example, for the investigated dataset, it requires the account information of 421k users and 833 million tweets.
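The core quantity behind the ranking, the ratio of bots in a user's list of friends, can be sketched as follows. This is only an illustration under stated assumptions: the function names and the input layout (a mapping from each user to the labels that the bot detector assigned to that user's friends) are hypothetical, and the actual rules that turn the ranking into a credulous/not credulous decision are those of [balestrucci2019credulous], which are not reproduced here.

```python
def bot_ratio(friend_labels):
    """Fraction of a user's friends labeled as bots (0.0 if no friends)."""
    if not friend_labels:
        return 0.0
    bots = sum(1 for label in friend_labels if label == "bot")
    return bots / len(friend_labels)


def rank_by_bot_ratio(friends_by_user):
    """Return (user, ratio) pairs, highest bot ratio first.

    `friends_by_user` maps a user identifier to the list of labels
    ("bot" or "human") assigned to that user's friends; this layout
    is an assumption of the sketch.
    """
    ratios = {user: bot_ratio(labels)
              for user, labels in friends_by_user.items()}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)
```

The labels of the friends would come from the tuned bot detector of Section 2.2, which is why this step requires crawling data for every friend of every account under study.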
2.4 Classification of Credulous Twitter Users
The goal of this phase is to build a decision model to automatically classify a Twitter account as credulous or not. As ground truth, we consider the 316 accounts identified as credulous according to the process described in Section 2.3.
We experiment with the same learning algorithms and the same feature sets considered in Section 2.2, with 10-fold cross-validation. However, for the classification of credulous users, the learning algorithms take as input a very unbalanced dataset: we have 2,838 human-operated accounts (see Section 2.1) and, among them, 316 have been identified as credulous accounts (see Section 2.3). To avoid working with unbalanced datasets, we split the set of not credulous users into smaller portions, each equal in size to the number of credulous users. We randomly select a number of not credulous users equal to the number of credulous ones; then, we merge these instances with the credulous ones into a new dataset (hereinafter referred to as a fold). Then, we repeat this process on the not yet selected instances, until no not credulous instances remain. This procedure is inspired by the under-sampling iteration methodology for strongly unbalanced datasets [lee2015iterative]. Each learning algorithm is trained on each fold. To evaluate the classification performance on the whole dataset, and not just on individual folds, we compute the average of the single performance values, for each evaluation metric.
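The construction of the balanced folds described above can be sketched as follows. This is a minimal illustration, not the code used in the experiments: the function names and the fixed random seed are assumptions, and so is the handling of the last portion, which here is simply kept even when it contains fewer not credulous instances than the others.

```python
import random


def balanced_folds(credulous, not_credulous, seed=0):
    """Pair all credulous instances with successive random portions
    of the not credulous ones, each portion of (at most) equal size."""
    if not credulous:
        raise ValueError("at least one credulous instance is required")
    pool = list(not_credulous)
    random.Random(seed).shuffle(pool)  # random selection without reuse
    size = len(credulous)
    folds = []
    for start in range(0, len(pool), size):
        portion = pool[start:start + size]
        folds.append(list(credulous) + portion)
    return folds


def average_metric(values):
    """Average one evaluation metric over the folds."""
    return sum(values) / len(values)
```

With the figures reported in this section, 316 credulous and 2,838 - 316 = 2,522 not credulous accounts, this sketch yields 8 folds: seven with 632 instances and a final one with 316 + 310 = 626. Each learning algorithm would then be evaluated on every fold and `average_metric` applied to the resulting per-fold values of each metric.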
3 Experimental Results
All the experiments are performed with Weka [witten2016data], a tool providing the implementation of several machine learning algorithms. In the following, we present the main results obtained for bot detection and credulous classification; all the details are publicly available: https://tinyurl.com/y4l632g5.
The first column of Tables 1 and 2 shows the set of features considered for learning (i.e., ALL_features, Botometer+, ClassA-, see Section 2.2). The second column reports a subset of the adopted machine learning algorithms, whose names are abbreviated according to Weka’s notation as follows: NB: Naive Bayes [john1995estimating], SMO: Sequential Minimal Optimization [Platt1998], JRip: RIPPER [cohen1995fast], MLP: Multi-Layer Perceptron [pal1992multilayer], RF: Random Forest [Breiman2001], REP: Reduced-Error Pruning [quinlan1987simplifying], 1R [Holte1993].
The remaining columns report the evaluation metrics mentioned above.
Regarding bot detection, Table 1 shows that all the machine learning algorithms behave well, regardless of the feature set. Random Forest performs best. When the set ALL_features is used, the results are: accuracy = 98.33%, F1 = 0.98 and AUC = 1.00; after the tuning phase, we obtain a final accuracy of 98.41%.
Table 2 shows that ALL_features and ClassA- have good and quite similar classification performances, unlike Botometer+. Both ALL_features and ClassA- prove effective at discriminating credulous users, whereas the Botometer+ features work properly for bot detection tasks only. In more detail, Table 2 shows that the 1R algorithm obtains the best accuracy (93.27%) and F-score (0.93), but not the highest AUC (0.93). It is worth noting that the values of the 1R algorithm are exactly the same when considering ALL_features and ClassA-. This means that the algorithm selects ClassA- features only; the ones from Botometer+ are useless in this case. This is a relevant result, since ClassA- features refer to the profile of accounts and are less expensive to collect.
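To see why 1R can return identical results on ALL_features and ClassA-, recall that 1R [Holte1993] builds a rule on a single feature: for each feature it maps every observed value to the majority class, and it keeps the feature whose rule makes the fewest errors on the training data. If the best feature belongs to ClassA-, adding the Botometer+ features cannot change the selected rule. The following is a minimal sketch of this idea for nominal features only; it is not Weka's implementation, which also discretizes numeric features, and the feature names used in the usage note are hypothetical.

```python
from collections import Counter, defaultdict


def one_r(rows, labels):
    """Return (feature, rule, errors) for the best single-feature rule.

    `rows` is a list of dicts mapping feature names to nominal values;
    `rule` maps each value of the chosen feature to its majority class.
    """
    best = None
    for feature in rows[0]:
        # Class counts for every value taken by this feature.
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[feature]][label] += 1
        rule = {value: c.most_common(1)[0][0]
                for value, c in counts.items()}
        # Training errors: instances outside the majority class.
        errors = sum(sum(c.values()) - c.most_common(1)[0][1]
                     for c in counts.values())
        if best is None or errors < best[2]:
            best = (feature, rule, errors)
    return best
```

For instance, if a hypothetical profile feature `has_bio` separates the two classes better than a hypothetical `cap_bucket` feature derived from Botometer+, then `one_r` selects `has_bio` whether or not `cap_bucket` is present in the rows.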
The results in Table 2 show the capability of our approach to automatically discriminate those Twitter users with a large number of bots as friends, namely credulous users, without explicitly considering the features of the latter, which would imply a very high cost in terms of data gathering. To better understand this point, we recall that the approach in [balestrucci2019credulous] for the identification of credulous users needs to crawl a large amount of data, due to the necessity of extending the analysis to the friends of a Twitter account. In the specific case under investigation, this means retrieving information for more than 400k user accounts, 11 million tweet mentions, and more than 820 million tweets. By contrast, the credulous detector proposed here requires gathering the profile information of 2,838 accounts only. The classification performances are really promising, with a best accuracy of 93.27%, a best F1 of 0.93, and a best AUC of 0.93. We remark that such results have been achieved by relying on the so-called ClassA- features only, i.e., features extracted from the account profile. It is peculiar that the features useful for discriminating credulous genuine accounts all belong to the account profile. This preliminary result calls for three further investigations: 1. to compare the range of values assumed by these features when detecting credulous accounts with the one assumed when detecting social bots (as in [Cresci15fame]); 2. to explore the reason why more complex features (such as the ones of Botometer) do not seem to give good results in finding credulous users; 3. to perform a deeper analysis of the importance of each specific feature when discriminating credulous users, by means, e.g., of Principal Component Analysis [witten2016data].
Finally, even if the design of a bot detector is not the primary goal of this paper, but only a means through which we obtain the ground truth for training the credulous user classifier, we notice that, compared to the performances reported in [Cresci17paradigm, yang2019arming], our bot detector achieves very good classification performances. This strengthens the robustness of the ground truth obtained in Section 2.3, since the nature of the friends is assessed by means of a very accurate classifier.
5 Related Work
Our work is related to all those approaches that investigate peculiar features of social network users. We discuss the ones we find most relevant to our approach, with the caveat that the presented literature review is far from exhaustive.
A survey on users’ behaviour in social networks is proposed in [jin2013understanding]: it is remarked that the recipients of shared information should be chosen, in a more precautionary way, by taking into account more real-life relationships and less virtual links. Our approach works exactly in the direction of enhancing the awareness of users, by classifying the ones more exposed to attacks of social bots.
Information spreading on Twitter is investigated in [DBLP:journals/corr/MonstedSFL17], where the authors demonstrate that the probability of spreading a given piece of information is higher when it is promoted by multiple sources. This supports our attempt to analyze the percentage of bots among the friends of human-operated Twitter accounts, as a symptom of being more susceptible to disinformation.
In [amato2018recognizing], human behaviour on Facebook is analyzed by building graphs that capture sequences of activities. Behavioural patterns that do not match any of the known benign models likely signal malicious objectives. Similarly, the realization of a classifier to automatically recognize credulous users is the first step towards deriving their sequences of activities and, hopefully, peculiar behavioural patterns.
In [cresci2017humans, gilani2019large], a behavioural analysis of bots and humans on Twitter is performed, to draw fundamental differences between the two groups. Specifically, the former demonstrates how, despite a higher level of synchronization characterizing bot accounts, the human behaviour on Twitter is far from being random. The latter defines a ‘credibility score’ as a measure of how many tweets by bots are present in the timeline of an account. Our work supports the discrimination of credulous users and it may lead to a deeper characterization of human accounts.
To the best of our knowledge, little research explores ways to automatically recognize those Twitter users who are susceptible to attacks by social bots or exposed to disinformation. A notable example, [wagner2012social], builds on interactions (mentions, replies, retweets and friendship) between genuine and bot accounts to obtain a ground truth of users susceptible to social bots. Then, similarly to our approach, different learning algorithms are adopted to train a classifier. Contrary to their approach, the current work is able to classify users close to social bots with lightweight features, all computed from data available in the user’s profile. Another brand new line of research is the detection of users susceptible to fake news. The work in [DBLP:conf/websci/ShenCGLYL19] monitors the replies of Twitter users to a priori known fake news, in order to tag those users as vulnerable to disinformation or not. Then, a supervised classification task is launched to train a model able to classify gullible users according to content-, user-, and network-based features.
Inspired by recent literature showing that disinformation is not only promoted by social bots but also amplified by genuine peers, in this work we proposed a supervised classification engine to discriminate credulous users, i.e., human-operated accounts with a high percentage of bots as friends. The classifier achieves very good performances and avoids a heavy feature engineering and extraction phase. Further research efforts will be devoted to investigating the behaviour of credulous users, as well as the content they post, to learn more about their peculiarities and the quality of the information they help to spread.