Credulous Users and Fake News: a Real Case Study on the Propagation in Twitter

Recent studies have confirmed a growing trend, especially among youngsters, of using Online Social Media as the favourite information platform at the expense of traditional mass media. Indeed, Online Social Media can easily reach a wide audience at high speed; but exactly because of this they are the preferred medium for influencing public opinion via so-called fake news. Moreover, there is general agreement that the main vehicles of fake news are malicious software robots (bots) that automatically interact with human users. In previous work we have considered the problem of tagging human users in Online Social Networks as credulous users. Specifically, we have considered credulous those users with a relatively high number of bot friends when compared to the total number of their social friends. We consider this group of users worthy of attention because they might have a higher exposure to malicious activities and they may contribute to the spreading of fake information by sharing dubious content. In this work, starting from a dataset of fake news, we investigate the behaviour and the degree of involvement of credulous users in fake news diffusion. The study aims to: (i) fight fake news by considering the content diffused by credulous users; (ii) highlight the relationship between credulous users and fake news spreading; (iii) target fake news detection by focusing on the analysis of specific accounts more exposed to malicious activities of bots. Our first results demonstrate a strong involvement of credulous users in fake news diffusion. These findings call for tools that, by performing data streaming on credulous users' actions, enable us to perform targeted fact-checking.

0.1 Introduction

The pervasiveness and ease of use of Online Social Media (OSM), like Twitter and Facebook, have led to new ways for people to keep up with news and for news to propagate. Recent studies [1, 2, 3] confirm the growing trend of using digital media as the favourite source of information, especially among youngsters [4]. By routinely checking pages of interest and channels, people get informed effortlessly. With OSM, publishers reach a larger audience than with traditional mass media, like newspapers, radio, etc., and news is disseminated easily and at a faster rate. This has however raised concerns about the veracity of the news circulating on OSM. Fake news and mis/disinformation [5] are well-known problems, against which governments [6], researchers [7] and social media administrators (Facebook: https://tinyurl.com/y3yzvpah; Twitter: https://tinyurl.com/y3efs8s5) are struggling.

Several approaches have been developed to counteract the spread of this phenomenon. Some of them aim at detecting bots [8], i.e., automated accounts interacting with human users; others use natural language processing (NLP) techniques [9] to analyze the actual contents of messages. In this work, we look at fake news from a different perspective, aiming to figure out the extent of the relationship between fake news and credulous users [10]. The latter are human-operated accounts with a relatively high percentage of bot-followees among the total number of their followees and are thus more exposed to bots’ malicious activities. The motivation for studying the phenomenon from this perspective is twofold. Indeed, credulous users: (i) may unconsciously contribute to fake news dissemination [11, 12]; (ii) may be affected by malicious activities that are more effective when performed on a targeted audience [13, 14].

Starting from a publicly available dataset of fake news concerned with politics and gossip [15] published on Twitter, we have studied the involvement of credulous users in terms of the number of tweets with fake news they posted and the number of credulous users that have shared fakes. By jointly exploiting bot detection and credulous detection approaches [11], we have seen that: (i) credulous users produce more fake news tweets than not-credulous ones; (ii) credulous users publish less real news than fake news; and (iii) the extent to which credulous users publish fake news depends on the topic.

These findings call for tools that, by performing data streaming on credulous users’ actions, enable us to perform targeted fact-checking. A possible exploitation could be a system that, by focusing on credulous users, performs online data streaming by “listening” to their activities (e.g., tweets, retweets, replies, mentions, etc.) in real time. As soon as a credulous user publishes on their social dashboard, the tool could analyze the content and the reliability of the source with, e.g., text mining and/or NLP techniques. Obviously, to reduce the amount of content to inspect, a key component of the system is the part concerned with credulous detection, which reduces the number of human-operated accounts to investigate. This can be made more efficient by setting up a priority inspection measure, based on the content production rate of credulous users, so as to focus first on the more active ones.
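The priority inspection idea can be illustrated with a minimal sketch. It assumes that a content production rate (e.g., tweets per day) is already known for each credulous account; the function name `inspection_order` and the sample rates are hypothetical, and the paper does not prescribe a specific data structure.

```python
import heapq
from typing import Dict, List


def inspection_order(rates: Dict[str, float]) -> List[str]:
    """Return accounts sorted from the most to the least active.

    `rates` maps an account name to its content production rate.
    A max-heap is emulated by pushing negated rates onto heapq's min-heap.
    """
    heap = [(-rate, user) for user, rate in rates.items()]
    heapq.heapify(heap)
    order = []
    while heap:
        _, user = heapq.heappop(heap)
        order.append(user)
    return order


# Hypothetical production rates (tweets per day) of three credulous accounts.
sample_rates = {"user_a": 1.0, "user_b": 5.0, "user_c": 3.0}
queue = inspection_order(sample_rates)
```

In a streaming setting the heap would be updated as new activity arrives; the sketch only shows the ordering criterion.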

The rest of the paper is structured as follows: Section 0.2 considers related works; Section 0.3 describes the used dataset and the approach we have followed; Section 0.4 illustrates the experimental results; Section 0.5 discusses the main findings, and Section 0.6 concludes the paper.

0.2 Related Work

This paper aims at investigating the relationship between credulous users and fake news; in particular, the way the former behave in OSM and the extent to which they are involved in spreading the latter through tweets. Rather than providing a complete literature review of the many approaches to fake news detection, we concentrate just on work investigating the way bots’ activities influence humans in OSM. For comprehensive surveys on fighting fakes, we refer the reader to other papers [7, 16].

An interesting study of misinformation spreading in OSM is presented in [17]. The authors analyzed 14 million tweets published during the 2016 U.S. presidential campaign and found non-negligible evidence of the role that bots have had in fake news dissemination. Furthermore, they identified a specific vulnerability of humans who had been retweeting malicious bots, thus actively contributing to fake news spreading.

The notion of susceptible users, i.e., OSM users who interact with bots, is introduced in [18]. A binary classifier was trained to single out susceptible users, and the need for introducing protection mechanisms in Social Media to prevent support of malicious activities by human-operated accounts was advocated. Human-operated accounts “attacked” by bots through tweet mentions are studied in [19] with the aim of predicting whether an interaction will start. Instead, we consider those human users who have already started interacting with bots to discover the extent of their involvement in the spreading of fake news.

A live lab experiment testing the ability of social bots to shape or influence the social graph of human-operated accounts in Twitter is presented in [20] and it is observed that the activities of social bots can affect the links established between targeted users.

Starting from the definition of susceptible users [18] and with aims close to ours, the notion of gullible users and their susceptibility to fake news is introduced in [21]. Five degrees of susceptibility (referring to a user’s reply to a fake news item) are defined and a multi-class classifier is trained to predict users’ susceptibility level. The classifier achieves an AUC of 0.82. Their aim is similar to ours, but we do not rely on users’ replies to deem them credulous, but rather on the number of bots they follow; moreover, our objective is not classification, but finding evidence of their involvement in fake news spreading.

Besides the definitions of susceptible users [18] and gullible users [21], we mention our previous works [10, 11, 22] where credulous users have been studied. In [10] the concept of credulous users has been introduced and heuristics have been used to label human-operated accounts as credulous; humans are ranked by relying on the ratio between the number of bots (recognized by a bot detector) they follow and their total number of followees (1st dataset of credulous users: https://tinyurl.com/y6lod2yz). A method for automatically identifying credulous users has been introduced for the first time in [11]. In that work, classification achieved an accuracy of 93.27% and an AUC (Area Under the ROC Curve) of 0.93. From a regression perspective, rather than for classification purposes, the goal of the work in [22] has been to single out a technique for quantifying the percentage of bot-followees, i.e., bots followed by human-operated accounts in Twitter. The best regression model is trained with the SMOreg [23] algorithm, achieving a Mean Absolute Error of 3.62%. The current paper differs in scope and strategy from our previous work. Its primary objective is to investigate the way credulous users deal with fake news and to understand whether analyzing credulous users’ profiles can be useful to support other fake news detection techniques. In this case, the basic strategy relies on observing tweets containing fake news and their spreaders, without relying any longer on regression or classification.
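The ranking heuristic recalled above (ratio between followed bots and total followees) can be sketched as follows. This is only an illustration under simple assumptions: followees and detected bots are given as sets of account names, and the names `bot_followee_ratio` and `rank_by_bot_ratio` are hypothetical rather than taken from [10].

```python
from typing import Dict, List, Set, Tuple


def bot_followee_ratio(followees: Set[str], bots: Set[str]) -> float:
    """Fraction of an account's followees that a bot detector flagged as bots."""
    if not followees:
        return 0.0
    return len(followees & bots) / len(followees)


def rank_by_bot_ratio(
    humans: Dict[str, Set[str]], bots: Set[str]
) -> List[Tuple[str, float]]:
    """Rank human-operated accounts by decreasing bot-followee ratio."""
    scored = [(user, bot_followee_ratio(f, bots)) for user, f in humans.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: three human accounts and the accounts they follow.
humans = {
    "alice": {"b1", "b2", "h1", "h2"},
    "bob": {"h1", "h2", "h3", "b1"},
    "carol": {"b1", "b2", "b3", "h1"},
}
bots = {"b1", "b2", "b3"}
ranking = rank_by_bot_ratio(humans, bots)
```

Accounts at the top of such a ranking are the candidates for the credulous label; the threshold separating credulous from not-credulous users is not shown here.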

0.3 Experimental Setup

0.3.1 Dataset

We take as starting point a publicly available dataset of fake news, called FakeNewsNet (FakeNewsNet Dataset: https://tinyurl.com/uwadu5m) [7, 24, 15]. For each item the following information is provided: a unique identifier (id), the publisher (in url form), the content of the news (title), a list of tweets (as Twitter ids) containing the news and the information about its “veracity” (fake or real). To label the news, the authors in [15] used two fact-checking websites: PolitiFact (https://www.politifact.com/) and GossipCop (https://www.gossipcop.com/). In the former, fact-checking was performed by politics experts (e.g., journalists) who labelled news as fake or real. In the latter, a numerical score was assigned to news to indicate reliability, ranging from 0 (fake) to 10 (real).

Topic    Veracity  News (Original)  News (Retrieved)  Tweets (Original)  Tweets (Retrieved)
Politic  Fake      432              392               165,356            141,421
Politic  Real      622              407               417,072            357,655
Gossip   Fake      5,323            5,135             598,299            518,502
Gossip   Real      16,817           15,759            881,627            812,719
Table 1: FakeNewsNet Dataset: original and retrieved content

Table 1 outlines the structure of the dataset. The original dataset contained 432 fake and 622 real political news (row Politic, columns Original). However, on Twitter we were only able to find tweets for 392 fake and 407 real news (columns Retrieved). The number of tweets (columns Tweets) containing such news was initially 165,356 for fake and 417,072 for real news, but we could find only 141,421 tweets related to fake news and 357,655 tweets related to real ones. The numerical mismatch between the original data and the retrieved data is almost certainly due to deletion. Regarding the other topic (row Gossip), out of 5,323 fake and 16,817 real news, we got 5,135 and 15,759 news respectively, and it was possible to find a total of 1,331,221 tweets: 518,502 related to fake news and 812,719 containing real news. Obviously, in our study, we will only use retrieved data in both cases.

0.3.2 Experimental Approach

To pursue the goals of this work, we single out three sequential tasks: (i) tweets’ authors identification, (ii) distinction between automated (bots) and human-operated authors, and (iii) distinction between credulous and not-credulous users among the human-operated authors.

Tweets’ authors identification

In this phase, we aim to identify the Twitter accounts that published the tweets listed in the FakeNewsNet dataset, and thus their authors. Starting from the tweets’ ids and using the Twitter API (Twitter API libraries: https://tinyurl.com/rfte3k2), we collected 1,731,422 tweets out of the original list of almost 2 million. It is worth noting that some tweets contain more than one news item, and thus some tweets are counted more than once in Table 1. This explains the numerical mismatch between the collected tweets and the retrieved ones (the sum of the retrieved-tweets values in Table 1).
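The mismatch mentioned above can be made explicit with a short arithmetic check on the figures of Table 1. The dictionary keys and variable names are illustrative only; the numbers are those reported in the text.

```python
# Tweet counts from Table 1, keyed by (topic, veracity).
original_tweets = {
    ("politic", "fake"): 165_356,
    ("politic", "real"): 417_072,
    ("gossip", "fake"): 598_299,
    ("gossip", "real"): 881_627,
}
retrieved_tweets = {
    ("politic", "fake"): 141_421,
    ("politic", "real"): 357_655,
    ("gossip", "fake"): 518_502,
    ("gossip", "real"): 812_719,
}

# Number of distinct tweets actually collected through the Twitter API.
collected_unique = 1_731_422

total_original = sum(original_tweets.values())    # "almost 2 million"
total_retrieved = sum(retrieved_tweets.values())  # tweets counted per news item

# Extra counts caused by tweets that contain more than one news item.
extra_counts = total_retrieved - collected_unique
```

Under this reading, the retrieved column sums to more than the number of collected tweets because a tweet is counted once for every news item it contains.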

At the end of this phase, in addition to tweets’ data, we collected the profile data (Twitter User Object: https://tinyurl.com/y5s5kpuw) of the 536,513 Twitter accounts which are their authors.

Bot detection

The goal of this phase is to distinguish, among the set of authors obtained in the previous phase, between human-operated accounts and bots. For this purpose, we used a bot detector (i.e., a decision model able to recognize automated accounts) introduced in our previous work [11]. The classification model is based on Random Forest [25], achieving an accuracy (instances correctly classified) of 98.41% and an area under the ROC curve (AUC) of 1.00. It relies on a set of 30 features. Specifically, such features are obtained by combining the feature sets of Botometer+ and ClassA-. The reader is referred to [22] for details. For each user, Botometer+ considers the timeline (the list of published tweets) and its mentions (the tweets that mention the user). ClassA-, instead, relies on users’ profile data to determine their features.

It is worth noting that retrieving Botometer+’s features requires a connection to a web service (Botometer [26]: https://tinyurl.com/yytf282s), and that classifying tweets’ authors requires features from both Botometer+ and ClassA-. In this way, 479,569 authors have been classified as human-operated.

Credulous classification

The last task aims at singling out credulous users among the human-operated authors. For this purpose, a refined version of the approach presented in [11] has been adopted. In [11] the ground-truth was strongly imbalanced (316 credulous vs. 2,522 not-credulous). There, the set of not-credulous users has been divided into smaller subsets by random instance selection (without replacement), each subset having cardinality equal to the number of credulous users. Each subset of not-credulous users has then been merged with the credulous instances, creating eight new “sub-datasets” called folds, differing from each other in the not-credulous instances only. Then, for each fold, several decision models have been trained and cross-validated. OneR (OneR [27]: https://tinyurl.com/wgpezcp) turned out to be the best performing algorithm on most folds. But for some folds, other algorithms (specifically JRip (JRip [28]: https://tinyurl.com/ufke2zb) and RepTree (RepTree [29]: https://tinyurl.com/qkl8nko)) turned out to produce better models (see folds 4 and 5 in Table 2).

Instead of classifying authors using the best classifier (which has been trained on just one fold in [11]), for labelling authors we use all eight credulous classifiers in Table 2.
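The construction of the eight balanced folds can be sketched as below. This is a minimal illustration, not the procedure of [11]: the function name `make_balanced_folds` and the synthetic account names are hypothetical, and since 2,522 is not an exact multiple of 316, the sketch simply lets the last subset be slightly smaller (how the remainder was handled in the original work is not stated here).

```python
import random
from typing import List, Sequence


def make_balanced_folds(
    minority: Sequence[str], majority: Sequence[str], seed: int = 0
) -> List[List[str]]:
    """Split the majority class into disjoint random chunks of minority size
    and merge each chunk with the whole minority class."""
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)  # random selection without replacement
    size = len(minority)
    folds = []
    for start in range(0, len(shuffled), size):
        chunk = shuffled[start:start + size]
        folds.append(list(minority) + chunk)
    return folds


# Synthetic ground truth with the class sizes reported in the text.
credulous = [f"c{i}" for i in range(316)]
not_credulous = [f"n{i}" for i in range(2522)]
folds = make_balanced_folds(credulous, not_credulous)
```

With these class sizes the sketch yields eight folds, each containing all credulous instances and a different portion of the not-credulous ones.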

Fold  Algorithm  Accuracy (%)  Prec.  Rec.  F1    MCC   AUC
1     OneR       98.26         0.98   0.98  0.98  0.97  0.98
2     OneR       95.73         0.96   0.96  0.96  0.92  0.96
3     OneR       94.15         0.95   0.94  0.94  0.89  0.94
4     JRip       90.67         0.92   0.91  0.91  0.83  0.89
5     RepTree    91.93         0.93   0.92  0.92  0.85  0.90
6     OneR       90.35         0.92   0.90  0.90  0.82  0.90
7     OneR       90.93         0.92   0.91  0.91  0.83  0.91
8     OneR       96.65         0.97   0.97  0.97  0.93  0.97
Table 2: The eight Credulous Classifiers

Specifically, the basic idea is that each author has to be classified by means of the classifier trained on the fold most “similar” to it. Hence, for each human author singled out in Section 0.3.2, the distance between author’s feature representation and the centroids of each fold is computed. The author is then classified using the classifier trained on the fold whose centroid is closest to it. Classifiers selection is performed with a specific tool we have implemented; and 350,622 human authors have been classified as credulous users.
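The classifier selection step can be sketched as a nearest-centroid lookup. The sketch assumes Euclidean distance, which the text does not specify, and uses two toy folds with trivial threshold rules standing in for the trained models of Table 2; all names (`centroid`, `closest_fold`, `classify`) are hypothetical and do not refer to the authors' tool.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def centroid(rows: List[Vector]) -> List[float]:
    """Component-wise mean of the feature vectors of a fold."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def closest_fold(x: Vector, centroids: Dict[int, Vector]) -> int:
    """Index of the fold whose centroid is nearest to x (Euclidean distance)."""
    return min(centroids, key=lambda k: math.dist(x, centroids[k]))


def classify(
    x: Vector,
    centroids: Dict[int, Vector],
    classifiers: Dict[int, Callable[[Vector], bool]],
) -> bool:
    """Classify x with the model trained on the most similar fold."""
    return classifiers[closest_fold(x, centroids)](x)


# Toy folds with two features each, and stand-in classifiers.
fold_rows = {1: [[0.0, 0.0], [2.0, 0.0]], 2: [[10.0, 10.0], [12.0, 10.0]]}
centroids = {k: centroid(rows) for k, rows in fold_rows.items()}
classifiers = {1: lambda x: x[0] > 0.5, 2: lambda x: x[1] > 20.0}

label_a = classify([1.5, 0.2], centroids, classifiers)  # routed to fold 1
label_b = classify([9.0, 9.0], centroids, classifiers)  # routed to fold 2
```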

0.3.3 Investigation targets

At this stage, we have all the information to study the relationships between fake news and credulous users. In particular, we want to investigate potential relationships under three different perspectives, aiming at understanding whether credulous users do significantly contribute to fake news production and/or propagation in Twitter. First, we want to look for numerical differences between the amount of fake and real news produced/diffused by the three categories of user/author (namely, credulous users, not-credulous users and bots) on both news topics. Second, we want to compare the quantity of fake/real tweets, i.e., the tweets containing a fake/real news item, by contrasting first bots and humans and then credulous and not-credulous users. To prevent the numerical imbalance between fake and real tweets (e.g., see the political retrieved tweets in Table 1) from leading to inaccurate observations, we will also consider a randomly selected subset of the real-news tweets, with the same size as the set of fake-news tweets. Third, we want to quantify the authors’ level of involvement in fake news spreading/production by counting, for each category, how many of them are authors of tweets containing: at least one fake news, at least one real news, only fake news and only real news.
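The balancing step of the second perspective amounts to drawing, without replacement, as many real-news tweets as there are fake-news tweets. A minimal sketch follows; the function name `balanced_subsample` and the toy tweet ids are hypothetical, and the fixed seed is only there to make the example reproducible.

```python
import random
from typing import List, Sequence


def balanced_subsample(
    real_ids: Sequence[int], fake_ids: Sequence[int], seed: int = 0
) -> List[int]:
    """Randomly pick len(fake_ids) real-news tweets, without replacement.

    Assumes there are at least as many real-news as fake-news tweets.
    """
    rng = random.Random(seed)
    return rng.sample(list(real_ids), k=len(fake_ids))


# Toy ids: ten real-news tweets and four fake-news tweets.
real_ids = list(range(100, 110))
fake_ids = [1, 2, 3, 4]
reduced_real = balanced_subsample(real_ids, fake_ids)
```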

0.4 Experimental Results

In this section we present the experimental results, relying on the approach described in Section 0.3.2. We start by showing in Table 3 the results obtained for bot detection and credulous classification (both described in Section 0.3.2).

                 Politic   Gossip    Union
#Bot             27,137    34,160    56,548
#Human           256,561   247,113   479,569
#Credulous       185,196   178,715   350,622
#Not-Credulous   71,365    68,398    128,947
Table 3: Bot Detector and Credulous detector outcomes

The first column lists the different types of users: the first macro-row distinguishes accounts by their “automation” (bots vs. humans), while the second macro-row reports the numbers of human-operated accounts labeled as credulous or not-credulous. Each of the other columns reports the number of users who tweeted about a certain topic. We could not classify 396 accounts, because the Botometer web service did not return their features.
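The figures of Table 3 can be cross-checked with a few lines of arithmetic. The sketch assumes that the Union column is the set union of the accounts active on the two topics, so that inclusion-exclusion gives the number of accounts active on both; variable names are illustrative.

```python
# Counts from Table 3: (politic, gossip, union).
bots = (27_137, 34_160, 56_548)
humans = (256_561, 247_113, 479_569)
credulous = (185_196, 178_715, 350_622)
not_credulous = (71_365, 68_398, 128_947)

# Total number of authors whose profile was collected.
total_authors = 536_513

# Humans are split exactly into credulous and not-credulous users.
human_split_ok = all(
    h == c + n for h, c, n in zip(humans, credulous, not_credulous)
)

# Accounts that could not be classified as bot or human.
unclassified = total_authors - (bots[2] + humans[2])

# Accounts active on both topics, assuming Union is a set union.
bots_on_both = bots[0] + bots[1] - bots[2]
humans_on_both = humans[0] + humans[1] - humans[2]
```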

Table 4 reports the amount of news for each category of users, i.e., credulous users, not-credulous users and bots (1st column), for each topic (Politic or Gossip). The 2nd and 3rd columns (macro-column headed Politic) indicate the amount of political news, respectively fake and real, that users have used in their tweets. The 4th and 5th columns, instead, indicate the number of fake and real gossip news, respectively.

               Politic Fake  Politic Real  Gossip Fake  Gossip Real
Credulous      373           364           4,121        14,486
NotCredulous   361           366           4,768        13,418
Bot            350           332           4,470        15,050
Table 4: Users’ topic coverage by their tweets

With their tweets, credulous users cover 373 and 364 fake and real political news respectively. The number of news covered by not-credulous users is 361 fake and 366 real political news. For the sake of completeness, the same information is reported for bots too; the amount of news covered by them is 350 fake and 332 real political facts.

When considering Gossip news, starting from a retrieved set of 5,135 fake news, we have the following numbers: 4,121 covered by credulous users, 4,768 by not-credulous users and 4,470 by automated accounts. Out of the 15,759 retrieved real news, 14,486 are covered by credulous users, 13,418 by not-credulous users and 15,050 by bots.

Tables 5 and 6 provide a more detailed view of our experiments. For each category of users, the number of tweets containing fake news (column FN) and real news (column RN) is reported for both politics (Table 5) and gossip (Table 6) news. In both tables, the 4th column (denoted here RN′) and the 5th column (RN*) are introduced to mitigate the imbalance between fake and real tweets, in accordance with the discussion in Section 0.3.3. RN′ denotes a subset of RN whose entries have been randomly selected from the original list of tweets of Table 1 so that |RN′| = |FN|. In RN*, instead, tweets are taken from the retrieved list of Table 1, so that |RN*| equals the number of FN tweets that could be retrieved.

                FN        RN        RN′       RN*
Tot.            165,356   417,072   165,356   141,421
Bot             19,888    45,924    18,120    18,013
n.a.            23,935    59,417    23,519    0
Human           121,533   311,731   123,717   123,408
Credulous       84,362    197,454   78,528    77,994
Not Credulous   37,171    114,277   45,193    45,414
Table 5: Number of tweets about political fact

                FN        RN        RN′       RN*
Tot.            598,299   881,627   598,299   518,502
Bot             116,398   486,907   330,425   310,810
n.a.            79,797    68,908    46,552    0
Human           402,104   325,812   221,322   207,692
Credulous       244,690   209,579   142,246   133,518
Not Credulous   157,414   116,233   79,076    74,174
Table 6: Number of tweets about gossip fact

Table 5 presents the information related to the tweets on political facts. Almost 20k fake tweets have been produced by bots and more than 121k fake tweets have been produced by human-operated accounts, while for 24k fake tweets it has not been possible to retrieve their information from Twitter (row named n.a.). When considering human users, we can see that the number of fake tweets (FN) published by credulous users (i.e., 84k) exceeds the number of those published by not-credulous ones (i.e., 37k).

                 Credulous                    NotCredulous                 Bot
                 #Users    Average  St.Dev.   #Users   Average  St.Dev.   #Users   Average  St.Dev.
#Fake News       54,828    1.54     2.79      19,525   1.90     5.05      9,622    2.07     4.53
#Real News       138,113   1.43     2.91      57,839   1.98     10.22     45,924   2.36     9.11
Only Fake News   47,083    1.40     2.15      13,526   1.60     4.61      7,658    1.79     3.78
Only Real News   130,368   1.37     2.53      51,480   1.78     10.25     17,515   2.15     8.80
Table 7: Users that tweeted in political topic

                 Credulous                    NotCredulous                 Bot
                 #Users    Average  St.Dev.   #Users   Average  St.Dev.   #Users   Average  St.Dev.
#Fake News       147,158   1.66     8.01      56,451   2.79     33.56     25,818   4.51     22.00
#Real News       39,528    5.30     143.59    19,047   6.10     88.26     14,620   33.30    606.39
Only Fake News   139,187   1.45     6.97      49,351   1.81     6.51      19,540   2.39     12.79
Only Real News   31,557    1.35     2.96      11,947   1.52     4.72      8,342    2.19     10.95
Table 8: Users that tweeted in gossip topic

Although the tweets in RN are more numerous than those in FN, the number of tweets authored by credulous users still exceeds that of not-credulous users, but by a smaller proportion (197,454 by credulous users and 114,277 by not-credulous users). However, because the humans’ RN tweets are almost three times the FN ones, looking at the two reduced sets of real tweets (4th and 5th columns) leads to a better comparison. In fact, we can note that the numbers of real tweets published by credulous users in the 4th and 5th columns are similar to each other and in any case lower than in FN (see Table 5). We preferred to use such subsets, rather than resorting to ratios and percentages, in order to provide direct numerical differences. However, had we used percentages, nothing would have changed. Obviously, we would have had to take into account the share of FN tweets from credulous users out of the total set of humans’ FN tweets: 69.41% for politics (84,362 out of 121,533 in Table 5), against 30.59% from not-credulous users; the values 60.85% and 39.15% correspond to the gossip counts of Table 6.
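The shares of human FN tweets can be recomputed directly from the Credulous and Not Credulous rows of Tables 5 and 6. In this small check, the 60.85% / 39.15% split is obtained from the gossip counts of Table 6, while the political counts of Table 5 give a larger share for credulous users; the helper name `fn_shares` is hypothetical.

```python
from typing import Tuple


def fn_shares(credulous_fn: int, not_credulous_fn: int) -> Tuple[float, float]:
    """Percentage of human FN tweets authored by each category of humans."""
    total = credulous_fn + not_credulous_fn
    return (100 * credulous_fn / total, 100 * not_credulous_fn / total)


# FN column, rows Credulous and Not Credulous.
politics_shares = fn_shares(84_362, 37_171)   # Table 5
gossip_shares = fn_shares(244_690, 157_414)   # Table 6

politics_rounded = tuple(round(s, 2) for s in politics_shares)
gossip_rounded = tuple(round(s, 2) for s in gossip_shares)
```

In both topics credulous users author the majority of the human FN tweets, which is the point the comparison relies on.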

As far as bots are concerned, although RN’s value seems higher than FN’s, the amounts of fake and real tweets are more or less the same once the sets are balanced (see the 4th and 5th columns).

Switching to the case of tweets containing gossip news (Table 6), we can immediately notice that, as in Table 5, credulous users again produce more tweets. In particular, focusing on the fake tweets column (FN), we can see that even for this topic the amount of tweets published by credulous users (244,690) is greater than that of not-credulous users (157,414). This is confirmed in all the RN cases (see 3rd, 4th and 5th columns), but with lower numbers. Surprisingly, bots authored a lot of real tweets, precisely 486,907 (RN), which represents more than 50% of all real tweets; conversely, they published only 116,398 fake tweets (FN), less than 20% of the total FN.

Tables 7 and 8 present the extent to which, for each topic, the three categories of users (macro-column headers in both tables) are participating. Specifically, they report the number of users that are authors of: at least one fake tweet (1st row, #Fake News), at least one real tweet (2nd row, #Real News), only fake tweets (3rd row, Only Fake News) and only real tweets (4th row, Only Real News). For each of these four cases, the average and standard deviation have been calculated in both tables to show the fake/real tweeting rate and the uniformity of the users belonging to each of the aforementioned cases.
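The four cases reported in Tables 7 and 8 can be derived from a list of (author, label) pairs, one per tweet, as in the sketch below. It is only an illustration: the function name `involvement` and the toy tweets are hypothetical, and the population standard deviation is used as an assumption, since the text does not say which estimator was adopted.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def involvement(tweets: List[Tuple[str, str]]) -> Dict[str, Tuple[int, float, float]]:
    """For each case return (#users, average tweets per user, std. dev.)."""
    fake = Counter(author for author, label in tweets if label == "fake")
    real = Counter(author for author, label in tweets if label == "real")
    cases = {
        "fake": list(fake.values()),
        "real": list(real.values()),
        "only_fake": [n for a, n in fake.items() if a not in real],
        "only_real": [n for a, n in real.items() if a not in fake],
    }
    return {
        name: (len(counts), mean(counts), pstdev(counts)) if counts else (0, 0.0, 0.0)
        for name, counts in cases.items()
    }


# Toy tweets of three authors belonging to the same category.
tweets = [
    ("u1", "fake"), ("u1", "fake"), ("u1", "real"),
    ("u2", "fake"),
    ("u3", "real"), ("u3", "real"), ("u3", "real"),
]
stats = involvement(tweets)
```

Here the average of the "only fake" case counts fake tweets of users who never tweeted real news, mirroring the 3rd row of the tables.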

The results shown in Table 7 refer to the topic of political news. The number of credulous users that tweeted at least one fake news (1st row) is 54,828, with a publishing rate of 1.54 fake tweets on average and a standard deviation of 2.79. There are 138,113 credulous users that published at least one real news (2nd row); although they are more numerous than in the previous case, their tweeting rate (average) is slightly lower (1.43), with a higher standard deviation (2.91).

As for the number of credulous users publishing only fake/real news (3rd and 4th rows), we can observe a small numerical decrease. There are 47,083 credulous authors with 1.40 fake news tweets on average and a standard deviation of 2.15. In the other case (4th row), 130,368 credulous users have posted only tweets of real news, with almost the same average (1.37) as in the dual case but with a higher standard deviation (2.53).

As regards not-credulous users, we can notice that the authors who posted at least one tweet containing a real news (2nd row) are 57,839, almost 3 times the users with at least one fake tweet (1st row), i.e., 19,525. The averages are similar in both cases, 1.98 for Real News and 1.90 for Fake News; the respective standard deviations are high too, 5.05 (1st row) and 10.22 (2nd row). By observing the authors who tweeted only fake/real news (3rd and 4th rows), we can observe a trend similar to the same case for credulous users, but with a much lower level of participation. In fact, although the not-credulous authors who posted only tweets of real news (51,480) are more than the ones who published only fake tweets (13,526), the latter are less than a third of the number of credulous users who only publish fake news. Furthermore, the not-credulous users’ tweeting rate of real news (average) is also higher than that of those who post only fake news.

For the sake of comparison with human accounts, Table 7 also reports the results related to bots. We can see a certain disparity between the number of bots tweeting Fake News (i.e., 9,622 as indicated in the 1st row) and Real News (i.e., 45,924 as indicated in the 2nd row). On the other hand, by comparing the previous data with the ones related to bots authoring only fake/real news (3rd and 4th rows), we can see: (i) a reduction by a factor of more than 2 in the bots tweeting only real news (17,515 in the 4th row w.r.t. the case in the 2nd row), and (ii) a small reduction in the number of bots sharing only fake news in their tweets (7,658 in the 3rd row w.r.t. the 1st row). The tweeting averages per bot are higher than in the humans’ cases, and the standard deviations have higher values when referred to real news (9.11) and only real news (8.80).

The outcomes concerning the gossip topic are presented in Table 8. Starting from the 1st macro-column (headed Credulous), we can see a large number of authors having at least one fake-news tweet (147,158), and this numerical superiority occurs even when we count the ones tweeting only fake news (139,187). Conversely, as for the authors of real news tweets, 39,528 credulous users have at least one and 31,557 published only real tweets. Looking at the details for the category of not-credulous users, we can see a marked reduction in fake news authors. Precisely, there are 56,451 users that tweeted at least one fake news, and 49,351 users that published only fake news in their tweets. Moreover, we observed the same decreasing trend also in both cases of real news published by not-credulous authors. In particular, the authors who published at least one real news are 19,047 (2nd row), whereas the ones who published only real news are 11,947.

Lastly, as for Table 7, we conclude by reporting the numerical details referring to the automated accounts (Bot, 3rd macro-column). In summary, the bots tweeting at least one fake news are 25,818 and those that published only fake news are 19,540; both are more than the ones publishing real news, which are 14,620 (2nd row) and 8,342 (4th row). With regard to the average and standard deviation, the values for real news (2nd row) are very high compared to the other rows, especially the standard deviation. In general, the averages of the bots are higher than those of both credulous and not-credulous users, regardless of the veracity of the news in their tweets.

From these results we can derive interesting findings, which are presented and discussed in Section 0.5.

0.5 Discussion

The experimental results described in the previous section shed light on the connections between fake news and credulous users, and on the extent to which the latter are involved in spreading fake news on Twitter. Here we provide some considerations about the experimental results along the three perspectives introduced in Section 0.3.3. Specifically, we consider

  • News: the differences between the number of fake and real news spread by credulous users, not-credulous users and bots.

  • Tweets: the number of tweets containing either fake or real news posted by bots and humans, and by credulous and not-credulous users.

  • Activities: the number of users that tweet at least one fake news, at least one real news, and only fake or only real news.

In addition, we provide some recommendations concerning the online monitoring of credulous users.

0.5.1 News

The different behaviour of credulous and not-credulous users with respect to fake news is made manifest in Table 4. We can see that the tweets produced by credulous users cover 95% of the retrieved fake news concerned with politics, while those produced by not-credulous users cover 92%. Instead, for non-fake news, credulous users “talk” about 91% of the total retrieved, which is almost the same as for not-credulous users.

When considering gossip news, the situation is a bit different. Specifically, on the one hand, the percentage of fake news covered by the credulous users’ tweets is 80%, while for not-credulous users’ tweets it is 93% (see Tables 1 and 4). On the other hand, not-credulous users “talk” about 86% of the retrieved true news, while the coverage of credulous users is 92%. This may seem counter-intuitive, but we have to stress the numerical imbalance between the amount of fake (more than 5k) and real gossip news (more than 15k). Moreover, in Table 4, we consider as covered any news (re)tweeted just once. In fact, considering the amount of tweets (Table 6) can be more informative.

From this first perspective, we can say that news does catch credulous users’ interest, especially when fake content is concerned (80% for gossip vs. 95% for politics). In future work, we plan to investigate new directions by considering other topics, such as technology or medicine.

0.5.2 Tweets

Regardless of the news’ veracity, what stands out in Table 5 is the low number of tweets made by bots compared to human-operated accounts. This is mainly due to the disproportion between the number of bots and humans (see Table 3), which confirms the statement in [26], where the percentage of bots in Twitter is estimated to range from 9% to 15%; in our case it is 10.52%. Looking at the human-operated accounts, we can see that the tweets authored by credulous users are always more than those from the accounts classified as not-credulous. This was expected for fake news; however, it was unexpected for the other news. Indeed, it would be a bit extreme to expect credulous users to be active exclusively on fake news; however, it must also be taken into account that the number of retrieved tweets of real news is more than three times that of fake ones. This can be observed by considering the 4th row in Table 5 (by comparing the columns headed FN and RN); we think that looking at the values in the 3rd (RN) and 4th (RN) columns would lead to a fairer comparison. Concerning these two columns (for real tweets), it appears that credulous users are authors of fewer real news tweets than fake ones, unlike not-credulous users.

As already mentioned in Section 0.4, for gossip too (Table 6) there is a downward trend from the number of fake news tweets to that of real news ones, for both categories of human-operated accounts. In this case, the situation is peculiar because the number of RN is just below that of FN; hence, considering the values related to the reduced set of real tweets (i.e., the columns headed RN and RN) is somewhat pointless. The fact that the number of not-credulous users’ real tweets does not exceed the number of fake ones can be explained by strong bot authorship. In fact, the number of tweets posted by bots on real news exceeds that of its fake counterpart by more than 4 times (3 times if we consider the restricted sets RN and RN). A further explanation may be that gossip is a more attractive topic than politics, with a potentially larger audience. In addition, traditional mass media (e.g., television, radio, newspapers) can be used to check the “veracity” of political news, while it is more difficult to do the same for gossip. However, we just want to emphasize the threat of fake news, and even in this case credulous users “win” in terms of the number of fake tweets.

0.5.3 Activities

Further findings can be derived by observing the users’ numerical participation in each news topic (Tables 7 and 8). It is worth noting that, in these tables, the numbers related to real and fake news refer to the respective total sets (FN and RN).

Concerning political topics (Table 7), we can claim that, although more credulous users publish at least one real news item, the related average number of tweets is lower than for users who posted at least one fake news item. It is also important to consider that the number of tweets of real political news is more than 410k, almost 3 times the number of fake ones. Conversely, the (numerical) presence of not-credulous users tweeting fake news is limited.

In the gossip case (Table 8), we can see that, contrary to what was observed in the quantitative analysis of tweets (Table 6), the number of credulous users who publish fake news is much higher than the number who publish real news. Furthermore, the fact that the number of tweets with real news posted by credulous users is relatively high suggests that they tweet at greater intensity. This also explains the unexpectedly high number (in Table 6) of real tweets published by credulous users. An additional confirmation is given by observing just the authors of only real or only fake news. In this case, the number of credulous users who tweet real news is much lower than its fake equivalent. Conversely, not-credulous users have a smaller presence in the fake news cases; and, although their number exceeds that of not-credulous users who diffuse real news, the proportion is significantly lower, but with a higher average authorship of real news tweets (#Real News 1).
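The participation counts discussed in this subsection (users with at least one fake or real news tweet, users with only fake or only real news tweets, and the related tweet averages) can be sketched as follows. This is a minimal illustration under stated assumptions: the `(user_id, label)` input layout, the function name `activity_summary` and the toy data are hypothetical and do not reproduce the values in Tables 7 and 8.

```python
from collections import Counter


def activity_summary(tweets):
    """Count users and average tweets per group for one user category.

    tweets: iterable of (user_id, label) pairs, label in {"fake", "real"}
            (hypothetical layout, one pair per tweet).
    """
    per_user = {}
    for user_id, label in tweets:
        if label not in ("fake", "real"):
            raise ValueError(f"unexpected label: {label!r}")
        per_user.setdefault(user_id, Counter())[label] += 1

    def summarise(users, label):
        # Number of users in the group and their mean number of `label` tweets.
        total = sum(per_user[u][label] for u in users)
        return {"users": len(users),
                "avg_tweets": total / len(users) if users else 0.0}

    fake_any = [u for u, c in per_user.items() if c["fake"] > 0]
    real_any = [u for u, c in per_user.items() if c["real"] > 0]
    fake_only = [u for u in fake_any if per_user[u]["real"] == 0]
    real_only = [u for u in real_any if per_user[u]["fake"] == 0]

    return {
        "at_least_one_fake": summarise(fake_any, "fake"),
        "at_least_one_real": summarise(real_any, "real"),
        "only_fake": summarise(fake_only, "fake"),
        "only_real": summarise(real_only, "real"),
    }


# Invented toy data for a single user category.
toy_activity = activity_summary([
    ("u1", "fake"), ("u1", "fake"), ("u1", "real"),
    ("u2", "fake"),
    ("u3", "real"), ("u3", "real"), ("u3", "real"),
])
print(toy_activity)
```

Running the same function separately on the tweets of credulous users, not-credulous users and bots would give the three rows needed for a side-by-side comparison.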

Although bots have been included just for comparative purposes, and the study of their involvement in fake news dissemination is not the main concern of our work, we can notice that they have the highest tweet publishing rate compared to both categories of human-operated accounts. Although their involvement in fake news spreading does not emerge numerically from the study, we do think that they are among the main actors in this malicious activity, and other literature provides some evidence. Even in our study, their malicious activity can be indirectly noticed by considering the average values of their fake-tweeting rate, which is higher than those of both categories of human-operated accounts.

To conclude this section, we can summarize our findings in the following statements:

  1. Topics (politics vs. gossip) play a key role in determining the amount of fake news published and circulating in social media, due to the size of the potential audience and to the number of methods available for fact-checking;

  2. For both topics, credulous users do spread a higher amount of fake news than not-credulous users, but they do not publish only fake content;

  3. A high number of credulous users has a strong involvement in posting fake news;

  4. Automated accounts, which in general have a higher rate of tweet publication than humans, exhibit a good uniformity between fake and real content published.

0.5.4 Monitoring recommendations

Despite bots’ tireless activity, the findings of this work show active human participation in spreading fake news. A possible way of taking advantage of our work is the design of a self-adaptive and evolving system that, by focusing on the data stream generated by credulous users’ activity, carefully inspects what they publish in OSM, with targeted fact-checking. The tool could be based on processes that intelligently exploit the data stream coming from credulous users as soon as they publish content on their dashboards. Then, by using text mining and/or NLP techniques, the tool could analyze such content (tweets, retweets, replies, mentions, …) in real time, together with the reliability of the source. This would considerably reduce the set of human users to scrutinize and, indirectly, the number of tweets on which to perform targeted fact-checking. The tool could further evolve by targeting credulous users more efficiently, in order to further narrow the group of users to analyze, for instance by considering the content production rate of credulous users (to pay more attention to the more active ones) or their number of followers.

Such a tool could be used by OSM administrators to hold up some of the credulous users’ activities (e.g., content re-posting) in order to slow down the propagation of malicious information.
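As a rough illustration of the filtering and prioritization steps described in this subsection, the following sketch keeps only posts from accounts flagged as credulous and queues them for fact-checking, most active authors first. It is a toy under stated assumptions, not a design of the proposed tool: the class name `TargetedFactCheckQueue`, the `tweets_per_day` activity measure and the sample accounts are all hypothetical, and the text mining, NLP and source-reliability analysis is left out entirely.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedPost:
    priority: float                      # negative activity rate: busiest first
    seq: int                             # arrival order, used as tie-breaker
    user_id: str = field(compare=False)
    text: str = field(compare=False)


class TargetedFactCheckQueue:
    """Queue posts from credulous accounts for later fact-checking."""

    def __init__(self, credulous_ids, tweets_per_day):
        self._credulous = set(credulous_ids)   # output of a credulous detector
        self._rate = dict(tweets_per_day)      # hypothetical activity measure
        self._heap = []
        self._seq = 0

    def offer(self, user_id, text):
        """Enqueue the post if its author is credulous; return True if kept."""
        if user_id not in self._credulous:
            return False
        rate = self._rate.get(user_id, 0.0)
        heapq.heappush(self._heap, QueuedPost(-rate, self._seq, user_id, text))
        self._seq += 1
        return True

    def next_to_check(self):
        """Return the pending post with the most active author, or None."""
        return heapq.heappop(self._heap) if self._heap else None


# Invented toy stream: accounts "a" and "b" are assumed to be credulous.
queue = TargetedFactCheckQueue({"a", "b"}, {"a": 5.0, "b": 40.0})
kept = [queue.offer(u, t) for u, t in
        [("a", "post 1"), ("c", "post 2"), ("b", "post 3")]]
first = queue.next_to_check()
print(kept, first.user_id, first.text)
```

Ordering by the number of followers instead of the activity rate would only require passing a different mapping to the constructor.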

0.6 Conclusion

Nowadays Online Social Media are a very important information channel, preferred by people, especially youngsters, to traditional media such as newspapers, radio and television. Their pervasiveness, favoured by the widespread use of mobile devices, and people’s compulsive checking of their social profiles have enabled faster news dissemination and a wider audience. This has brought about one of the biggest problems of our time: misinformation. The absence of any control over the news published on OSM, of the kind common in newspapers, has stimulated the production and diffusion of fake news. In many cases, this is done using automated accounts, called bots, which actively interact with people to induce bias in their opinions, generate misconceptions, incite hate speech, etc. Current approaches that effectively counter malicious bots’ activities are based on their detection and removal from OSM. But whenever a bot is removed, it is easy to introduce new ones that are able to deceive detectors, giving rise to an arms race between botnet masters and OSM administrators.

Inspired and stimulated by recent literature that stresses the susceptibility of human-operated accounts to the activities of malicious bots, we have studied the relationship between fake news and so-called credulous Twitter users, i.e., human-operated accounts following a high percentage of bots among their social contacts.

Starting from a publicly available dataset of fake and real news (concerning politics and gossip) and using bot and credulous detectors, we provided evidence of the actual involvement of credulous users in the diffusion/production of fake news. The experimental results showed that, regardless of the news topic, credulous users tweet a larger number of fake news items than not-credulous ones. Although this superiority is also confirmed for the number of tweets containing real news, it is worth saying that: (i) real news is harmless and (ii) the number of such tweets is still lower than the number of those containing fake news. Furthermore, we observed the numerical participation of users, counting how many of each category have tweeted fake and/or real news; in this case we noticed a discordance depending on the news topic. Credulous users who have tweeted true political news outnumber those who have published fake news, unlike the case of gossip news. Either way, the number of credulous users who tweeted fake news is always higher than the number of not-credulous users.

Because of this, we are confident that credulous detection techniques can contribute to improving fake news detectors; they would make it possible to focus on the content posted by credulous users using NLP and text mining techniques. We believe that the study of this category of users can help researchers to better understand misinformation and user polarization phenomena and can give an extra edge in fighting the propagation of fake news. It is worth noting that the application of this kind of approach, which focuses on credulous users, depends on the classification performance of the adopted bot and credulous detectors.

Among the possible future research directions, we plan to check whether the findings of this work are confirmed on other fake news datasets and to apply fake news detection approaches (based on NLP and content inspection) to credulous users’ tweets.

0.7 Acknowledgment

The authors acknowledge the support of the IT division at LNGS in providing the computing resources (ULITE) used to perform the reported experiments.

This work has been partially supported by the European Union’s Horizon 2020 program (grant agreement No. 830892, SPARTA) and by IMT School for Advanced Studies: Integrated Activity Project TOFFEe ‘TOols for Fighting FakEs’. Finally, AB would like to thank Emilio Cruciani, Luca Di Stefano and Aline Uwimbabazi for discussions and suggestions.

References