The use of social media for propaganda purposes has become an integral part of cyber warfare [aro16]. Most prominently, the 2016 US presidential elections were targeted by a Russian interference campaign on Twitter [BadFerLer18]. However, the use of online propaganda is not an isolated phenomenon, but a global challenge [ShiJiaDri17, RamFerPin18, StiBleLieStr18]. The effect of political propaganda and fake news is further amplified by journalists who use Twitter to acquire “cutting-edge information” when chasing down trending topics for their next story [BroGra12, BovMak19] and then distribute it via traditional media.
In this paper, we investigate whether a similar influence on political elections can be observed in Europe as well and thus analyze the Twitter coverage of the German federal election (Bundestagswahl) to figure out if the public opinion has been influenced and by how much. To this end, we have collected million tweets related to the hashtags of all major German parties over days, from January to September 2017. In contrast to earlier work on the influence on Twitter [YeWu10, BakHofMasWat11, RiqGon16], we focus on basic features that can directly be derived from the Tweets and their metadata, such as the number of retweets or quotes. The mere quantity of tweets is already sufficient to identify distinct events in time, that precede the election day, for instance, the presentation of the political manifestos of the individual parties or TV shows covering the election.
We start with the investigation of the influence of troll accounts of the Internet Research Agency (IRA), which have been disclosed in the context of the investigations of Russian interference in the 2016 US presidential elections [website:twitteriralist1, website:twitteriralist2]. We find that of these trolls have also been active for the German federal election, resulting in a total amount of tweets in our dataset. Based on these first impressions we broaden our perspective to the entire political landscape looking for indicators of propaganda. In a detailed analysis, we survey specific topics and how these are related to political parties as well as individual users that have contributed to them. For instance, topics related to the controversial right-wing party Alternative für Deutschland (AfD) have been predominant during the election, including supporting as well as opposing positions.
Additionally, we develop a detector that is able to rate automated behavior in order to identify bot accounts in our dataset, which have been identified as a root cause for the amplification of propaganda [WooHow16]. Using this classifier we find previously unknown bots, which represent of all user accounts in our dataset. While this number seems surprisingly large, it is perfectly in line with previous research, which states that – of all active Twitter accounts are bots [VarFerDavMenFla17]. However, differentiating the automated behavior of bots and the repetitive manual actions of eagerly tweeting users is particularly difficult. Thus, our results should rather be seen as first indicators.
In summary, we make the following contributions:
Analysis of Known Actors. We identify known actors involved in propaganda by correlating the published IRA troll accounts with the users from our dataset.
Investigation of the Propaganda Landscape. We analyze the largest dataset of tweets in the context of the German federal election, in particular, million tweets over days, and inspect them regarding indicators of propaganda.
Detection of Automated Propaganda. We effectively detect previously unknown bots that contribute to propaganda by implementing a classifier that can identify automated account behavior.
The remainder of the paper is organized as follows: Section 2 discusses the basic properties of our dataset that has been recorded during and prior to the German federal election. In Section 3, we investigate the presence of known propaganda actors in this data, before we discuss the overall political landscape of the dataset regarding indicators of propaganda in Section 4. Subsequently, we describe and evaluate our bot detector in Section 5. Related work is discussed in Section 6, while Section 7 concludes the paper.
2 The German Federal Election on Twitter
For our analysis, we consider million tweets that have been published in the context of the German federal election (Bundestagswahl) and have been collected over days, from January to September 2017. As we are relying on the publicly available Twitter Stream, we receive maximally of all publicly available tweets. This limit, however, is rarely hit. Due to random sampling, the subsequently reported numbers can be safely extrapolated and the drawn conclusions remain valid. To restrict our analysis to the German federal election, we apply the search terms shown in Table 1, which correspond to the abbreviations of the major German parties (see footnote 1). For Die Grünen and Die Linke we use different common abbreviations, derived from the list of recognized parties by the Federal Electoral Committee [website:federalreturiningofficer], as these do not bear official acronyms.
Footnote 1: We consider all parties that have cleared the threshold in the previous federal election (2013) or in one of the previous state elections (2014 – 2016). We additionally consider the NPD that has closely failed the threshold () in Saxony in 2014.
Based on a manual plausibility examination of the collected data on a sample basis, we found an exceptionally high amount of tweets in Portuguese language matching the search term fdp. Further investigation revealed that fdp is a commonly used abbreviation for a Portuguese swearword that is tainting our dataset. Due to the fact that the language of the affected tweets is not correctly identified by Twitter, we cannot use this feature for filtering. Instead, we completely exclude all tweets that contain the search term fdp, which has accounted for of the tweets.
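The filtering step described above can be illustrated with a minimal sketch. It assumes that a tweet is matched when one of the search terms from Table 1 occurs as a whole word in its text; the names SEARCH_TERMS, EXCLUDED_TERMS, matched_terms and keep_tweet are hypothetical and not taken from our collection pipeline, and the matching of the Twitter Stream itself may differ in detail.

```python
import re

# Search terms from Table 1 (lowercased); the set name is hypothetical.
SEARCH_TERMS = {
    "afd", "cdu", "csu", "fdp", "gruene", "grüne", "diegruenen",
    "diegrünen", "linke", "dielinke", "npd", "spd",
}
# Terms whose tweets are dropped entirely because they taint the dataset.
EXCLUDED_TERMS = {"fdp"}

_TOKEN = re.compile(r"\w+")


def matched_terms(text):
    """Return the search terms that occur as whole words in the text."""
    tokens = {token.lower() for token in _TOKEN.findall(text)}
    return tokens & SEARCH_TERMS


def keep_tweet(text):
    """Keep a tweet if it matches a search term and no excluded term."""
    terms = matched_terms(text)
    return bool(terms) and not (terms & EXCLUDED_TERMS)
```

Note that this sketch drops a tweet containing fdp even if it also matches another party, which mirrors the complete exclusion described above.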
In the following, we focus on the remaining tweets for further analysis. We proceed with the detection of known propaganda actors in our dataset.
3 Known Actors
In the course of the investigations of Russian interference in the 2016 US presidential elections, Twitter has composed a list of accounts that are linked to the Internet Research Agency (IRA) [website:twitteriralist1] and had been identified to be influential during the US elections. An updated list was forwarded to the US Congress in June 2018 [website:twitteriralist2] and released to the public to foster further research on the behavior of those accounts [website:schiffstatement].
|Party||Political orientation||Search term|
|Alternative für Deutschland (AfD)||Right-wing to far-right||afd|
|Christlich Demokratische Union (CDU)||Christian-democratic, liberal-conservative||cdu|
|Christlich-Soziale Union (CSU)||Christian-democratic, conservative||csu|
|Freie Demokratische Partei (FDP)||(Classical) Liberal||fdp|
|Bündnis 90/Die Grünen||Green politics||gruene (additionally: grüne, diegruenen, diegrünen)|
|Die Linke||Democratic socialist||linke (additionally: dielinke)|
|Nationaldemokratische Partei Deutschlands (NPD)||Ultra-nationalists||npd|
|Sozialdemokratische Partei Deutschlands (SPD)||Social-democratic||spd|
Based on the assumption that existing Twitter accounts are often reused for other purposes, we try to identify the same trolls in our dataset. To this end, we match the list of the published IRA troll accounts to the user accounts from our dataset. Since the screen name of a user account can be freely changed, we first map the obtained screen names to their corresponding unique user IDs [ZanCauCriSirStrBla18]. In doing so, we are able to detect of the IRA troll accounts in our dataset which is of the total number of users. Surprisingly, only one of the identified accounts has changed its screen name during this time. However, the identified accounts are only responsible for a total amount of tweets that is of the tweets from our dataset, rendering their potential direct influence comparatively low. Interestingly, of the identified accounts have tweeted fewer than tweets over the entire time span, while the top troll accounts published more than tweets each. Similar to the entire dataset, most of the trolls’ tweets are actually retweets (); however, there is also a significant amount of original tweets () and fewer quotes (). Since the list of IRA accounts was made publicly available a significant time ago, it is likely that the IRA has created new accounts that we are not yet aware of.
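As an illustration, the matching on stable user IDs rather than on screen names could be sketched as follows. The record layout (dictionaries with the keys id_str and screen_name) and the function names are assumptions made for this example only.

```python
def match_known_accounts(known_ids, dataset_users):
    """Map each known user ID found in the dataset to its observed screen names.

    known_ids: iterable of user IDs (strings) from a published account list.
    dataset_users: iterable of dicts with the (assumed) keys
        'id_str' and 'screen_name', one per observed tweet author.
    """
    known = set(known_ids)
    matches = {}
    for user in dataset_users:
        uid = user["id_str"]
        if uid in known:
            # Collect all screen names seen for the same stable ID.
            matches.setdefault(uid, set()).add(user["screen_name"])
    return matches


def renamed_accounts(matches):
    """Return the matched accounts observed under more than one screen name."""
    return {uid: names for uid, names in matches.items() if len(names) > 1}
```

Because the ID stays fixed when a screen name changes, an account observed under two different names still counts as a single match.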
Figure 1(a) shows the creation dates of the IRA accounts over the last few years. Most of the IRA accounts were created before November 2016, the month of the US presidential elections, with a significant peak in July 2016. However, additional IRA accounts were created between the beginning and the middle of 2017, that is, right before the German federal election. Figure 1(b) shows the number of tweet contributions of the IRA accounts in the context of the 2017 German federal election. Unsurprisingly, there is a strong increase of tweets over the year 2017, with its highest peaks at the beginning of September, the month of the election, and particularly on the day of the election itself.
Finally, to examine the impact of the IRA accounts on other users, we verify whether other accounts interact with the IRA troll accounts, for instance, by retweeting their tweets. First of all, of the tweets posted by IRA accounts have been retweeted. Only tweets originate from the known IRA accounts, leaving the large remainder to other users. Interestingly, the quoted tweets from IRA accounts have all been quoted by other users that are outside the peer group of known IRA accounts. Although the majority of the other users are likely regular user accounts, a fraction of them may be troll accounts that we are not aware of. We conclude that, although the number of IRA accounts and corresponding tweets is low in comparison to the total number of recorded users and tweets, there is a verifiable impact of the IRA accounts on other accounts in the dataset.
4 Propaganda Landscape
Based on our analysis of known propaganda actors, we broaden our perspective by taking the general political propaganda landscape into account. To this end, we proceed with an analysis of the total tweet corpus to verify if the same ratio of original tweets, retweets and quotes can be observed for all collected tweets and parties.
Figure 2 shows the temporal development for original tweets (blue), retweets (yellow), and quoted tweets (green). Notice that the amount of retweets significantly exceeds the other two tweet types. Consequently, retweets are a particularly strong factor of amplification when spreading opinions. Original or quoted tweets occur roughly less frequently, each. However, all three types follow the same general trend, which leads up to the collection’s highest value on election day, and show a similar development over time.
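The distinction between the three tweet types can be sketched in a few lines. The example assumes tweets in the JSON layout of the Twitter v1.1 API, where a retweet carries a retweeted_status object and a quote is marked by is_quote_status or a quoted_status object; the function names are hypothetical.

```python
from collections import Counter


def tweet_type(tweet):
    """Classify a tweet dict as 'retweet', 'quote', or 'original'."""
    # Check retweets first: a retweet of a quote carries both markers
    # and is counted as a retweet in this sketch.
    if "retweeted_status" in tweet:
        return "retweet"
    if tweet.get("is_quote_status") or "quoted_status" in tweet:
        return "quote"
    return "original"


def type_shares(tweets):
    """Return the share of each tweet type among the given tweets."""
    counts = Counter(tweet_type(t) for t in tweets)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: counts[kind] / total
            for kind in ("original", "retweet", "quote")}
```

Counting a retweeted quote as a retweet is a design choice of this sketch; other conventions are possible and would shift the shares slightly.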
Throughout the recording, we observe local peaks that may be attributed to distinct events in time, which we briefly discuss in the following: In January, the Federal Constitutional Court ruled in favor of not banning the far-right, nationalist party NPD, a decision that was preceded and followed by heated debates. The state elections of Schleswig-Holstein (SH), Saarland (SL), and North Rhine-Westphalia (NRW), in turn, triggered only a moderate response, whereas the presentation of the election manifestos for the German federal election partly received significant attention. In particular, the publication of the manifesto of the right-wing party AfD at the end of April is noteworthy. Starting in August, we record a strong increase of tweets leading up to the federal election day on 24 September. This rise is supported by several political talk shows, such as TV Duell and Fünfkampf at the beginning of September.
To get a clearer view of the involved user accounts and topics, we further analyze the most frequent hashtags, media files, and quoted/retweeted user accounts.
Among the ten most used hashtags we observe the acronyms of five political parties that have been up for election. Figure 3(a) shows a summary of the top hashtags and their number of occurrences. Interestingly, the party that has triggered the largest peak in tweets when presenting their election manifesto, the AfD, also peaks in total as hashtag #afd, with occurrences. The AfD thus occurs three times more often than the second-placed SPD with occurrences. The general hashtag for the German federal election, #btw, in turn, is only used in tweets. In sixth place, with the campaign #traudichdeutschland, the AfD takes a prominent position for a second time with mentions.
Moreover, in Figure 3(b) we consider the most used combinations of hashtags and observe a similar dominance of the AfD. The hashtag #afd appears in four out of ten different combinations. In summary, the Alternative für Deutschland (AfD) seems to be particularly active on Twitter in comparison to other parties.
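A ranking of hashtags and hashtag combinations as discussed above can be computed with simple counters. The following sketch assumes that a combination is an unordered pair of distinct hashtags within one tweet; whether pairs or larger sets were used for the figure is not specified here, and hashtag_stats is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations


def hashtag_stats(hashtags_per_tweet, top=10):
    """Rank single hashtags and co-occurring hashtag pairs.

    hashtags_per_tweet: iterable of hashtag lists, one list per tweet.
    Returns two lists of (item, count) tuples, most frequent first.
    """
    single = Counter()
    pairs = Counter()
    for tags in hashtags_per_tweet:
        # Case-insensitive and counted once per tweet.
        unique = sorted({tag.lower() for tag in tags})
        single.update(unique)
        pairs.update(combinations(unique, 2))
    return single.most_common(top), pairs.most_common(top)
```

Sorting the hashtags before building pairs ensures that (afd, spd) and (spd, afd) are counted as the same combination.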
Next, we discuss the five most frequently tweeted images that are related to the election (see Figure 4). With occurrences, Figure 4(a), showing two heat maps of Germany, is the most popular. It displays the proportion of foreigners per region on the left and the proportion of AfD voters per region on the right, showing a drastic imbalance. The image in Figure 4(b), tweeted times, shows a longish text about why eligible voters should not vote for the AfD. Using an image for a long text was very common in the early days of Twitter, since until November 2017 Twitter restricted the maximum number of characters per tweet to 140. The third most tweeted picture shows a black and white portrait of the former German Chancellor Helmut Kohl, who died on 16 June 2017. This news with the corresponding picture was tweeted times.
The pictures shown in Figure 4(d) and Figure 4(e) occur and times, respectively, and also concern the AfD. However, these images likewise argue against the party by, on the one hand, showing a comment of the British author A. Moore explaining the idiocy of protest votes and, on the other hand, displaying a fake AfD election poster that was published by the German political satire show heute-show. Thus, the spike in hashtags likely cannot be traced back to the involvement of supporters alone, but also to opponents of this controversial party.
As a measure of the popularity and influence of individual accounts, we also look at the most quoted and retweeted users in our recording. Figure 5(a) and Figure 5(b) show the top users for both categories. Interestingly, @AfD_Bund and @Beatrix_vStorch are present in both rankings. The former is the official account of the AfD party, and the latter belongs to an AfD politician, as do @FraukePetry, @SteinbachErika, @lawyerberlin, and @Alice_Weidel. Consequently, the list of the ten most retweeted users is largely dominated by one party. The remaining accounts, @66Freedom66 and @DoraBromberger, advertise right-wing views and are thus also in line with the party.
Furthermore, three other political parties are rather prominently present: @CDU, @CSU, and @SPD. Especially the latter, the left-wing social democrats, have two politicians among the top quoted user accounts (@Ralf_Stegner and @MartinSchulz). The remaining accounts mainly correspond to popular German news magazines: @welt, @tagesschau, @wahlrecht_de, and @ZDFheute.
5 Detecting Automated Propaganda
Based on our findings on the political landscape in our dataset, we proceed with the identification of automated bot behavior, which is held responsible for being one of the root causes for the amplification of heavily discussed political topics [WooHow16]. To this end, we apply a supervised machine learning approach to detect bots.
Although the general topic of bot detection is well-known, the detection of political social bots, in particular, is still an open challenge, as indicated in related work [e.g., ChuGiaWanJaj12, FerVarDavMenFla16]. On the one hand, this is due to its diverse characteristics, involving the political direction and target audience, and, on the other hand, due to the constant evolution of social bots that are approaching a more human-like behavior by imitating common usage patterns [FerVarDavMenFla16, QiAlBro18].
For the implementation of our classifier, we make use of the insights gained from the identified IRA trolls and saliences found in our in-depth analysis of the political landscape.
As the dataset has been recorded just for this purpose, no existing labels of bots and humans are available, although these are required to train a supervised machine learning model. We therefore manually label Twitter accounts of both classes using a set of simple heuristics. These include a test for repetitive behavior of the same tweeting pattern, the frequent posting of tweets without adhering to sleep breaks at least every or the tweeting of multiple hashtags from the trending topics combined with a URL, etc. Even for trained experts, the distinction between humans and bots remains a difficult challenge. To avoid wrongly labeled training samples, we concentrate on those accounts for which we could identify the class with high confidence. As a result, we gathered bot and human accounts in total for the training of the classifier.
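To make the sleep-break heuristic more tangible, the following sketch flags an account that keeps tweeting for longer than a given window without ever pausing for a minimum break. The default thresholds (a 6-hour break within 24 hours) are illustrative assumptions and not the values used for labeling; lacks_sleep_breaks is a hypothetical helper.

```python
from datetime import timedelta


def lacks_sleep_breaks(timestamps,
                       min_break=timedelta(hours=6),
                       window=timedelta(hours=24)):
    """Return True if the account tweets for longer than `window`
    without a single pause of at least `min_break`.

    timestamps: iterable of datetime objects, one per tweet.
    The default thresholds are illustrative assumptions only.
    """
    times = sorted(timestamps)
    if not times:
        return False
    run_start = times[0]  # start of the current break-free run
    for previous, current in zip(times, times[1:]):
        if current - previous >= min_break:
            # A sufficiently long pause ends the run.
            run_start = current
        elif current - run_start > window:
            return True
    return False
```

Such a rule can only serve as one indicator among several, since shared or scheduled accounts operated by humans may show the same pattern.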
Based on the heuristics that were used to manually label the training data, we proceed with the engineering of additional features to improve the bot detection rate by exploring the available tweet and user profile information from our dataset. We engineered unique features that cover the four main categories of metadata-based, text-based, time-based and user-based features. The metadata-based features include features such as the average number of tweets per day, the number of different clients used or the retweet-to-tweet ratio. In contrast, the text-based features comprise, for instance, the average tweet length, the vocabulary diversity or the URL ratio. Furthermore, the time-based features involve the longest average break within the median time between a retweet, the original tweet, etc. Finally, the user-based features include, for example, the number of followers, the account verification status or the voluntary disclosure of being a bot. The complete list of derived features is presented in Table 3.
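A few of the metadata-based and text-based features named above could be derived per account as in the following sketch. The simplified tweet layout (keys text, created_at, source and is_retweet), the exact feature definitions, and the function name account_features are assumptions for illustration; the definitions behind Table 3 may differ.

```python
def account_features(tweets):
    """Compute a few illustrative per-account features.

    tweets: non-empty list of dicts with the (assumed) keys
        'text' (str), 'created_at' (datetime), 'source' (client name)
        and 'is_retweet' (bool).
    """
    n = len(tweets)
    if n == 0:
        raise ValueError("at least one tweet is required")
    days = {t["created_at"].date() for t in tweets}
    span_days = (max(days) - min(days)).days + 1  # inclusive day span
    words = [w.lower() for t in tweets for w in t["text"].split()]
    return {
        # metadata-based
        "tweets_per_day": n / span_days,
        "num_clients": len({t["source"] for t in tweets}),
        "retweet_ratio": sum(1 for t in tweets if t["is_retweet"]) / n,
        # text-based
        "avg_tweet_length": sum(len(t["text"]) for t in tweets) / n,
        "vocab_diversity": len(set(words)) / len(words) if words else 0.0,
        "url_ratio": sum(1 for t in tweets if "http" in t["text"]) / n,
    }
```

Here the retweet ratio is taken relative to all tweets of the account and the vocabulary diversity is the share of distinct words, which are two of several plausible definitions.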
We train and evaluate seven different machine learning algorithms for the classification of bots and humans. These include the statistical LogisticRegression model, the non-parametric KNeighbors model, as well as the decision tree models RandomForest, AdaBoost and GradientBoosting. Apart from that, the two support vector machine variants LinearSVC and SVC are applied and evaluated for their aptitude.
We proceed with the application of our classifier in two experiments: a controlled experiment and an extrapolation of our findings. While the controlled experiment targets the validation of our classifier on the previously labeled training data and a comparison to Botometer [DavVarFerFlaMen16] as a baseline, the second experiment extrapolates our findings by applying the classifier to the remainder of our unlabeled dataset as an indicator of the human-bot ratio within the entire dataset.
5.1 Controlled Experiment
Next, we apply the selected machine learning models to our training data by making use of -fold cross-validation and repeating the experiments times, followed by averaging the result metrics. We identify the best parameter combination per classifier by employing a grid search, optimizing for the best average Area Under Curve (AUC). Table 2 shows the examined classifiers with the best parameters found for each classifier type, sorted by the average AUC over all repetitions in descending order. We further compute the F1-Score for a single-value comparison that considers both precision and recall. The best performance for each metric is shown in the table. The best performing classifier, regarding the average AUC, is the GradientBoosting classifier with an AUC of and -bounded AUC with .
As a baseline, we compare our results to the predictions of Botometer, formerly known as BotOrNot [DavVarFerFlaMen16], a popular bot classifier that is publicly available on the Internet. To this end, we query the Botometer API for each of the previously labeled Twitter accounts from the training dataset to obtain a corresponding bot score. The Botometer classifier yields an AUC of and a value of if the false positive rate is bound to , that is, a false alarm rate of . Figure 6 shows the two ROC curves of Botometer and our improved GradientBoosting classifier. Our novel classifier outperforms the mature Botometer classifier on our dataset by providing significantly better results.
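For readers unfamiliar with the bounded AUC, the following sketch computes the ROC curve, the full AUC, and the AUC restricted to a maximum false positive rate from labels and scores. It assumes that the bounded area is normalized by dividing by the bound, which is only one common convention; the function names are hypothetical, and the sketch is not the evaluation code used for Table 2 or Figure 6.

```python
def roc_points(labels, scores):
    """Return ROC points (fpr, tpr) for binary labels (1 = bot, 0 = human)."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Process all samples with the same score at once (ties).
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points, max_fpr=1.0):
    """Trapezoidal area under the ROC curve up to max_fpr,
    normalized by max_fpr (an assumed convention)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Interpolate the curve at the bound and clip the segment.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area / max_fpr
```

Bounding the false positive rate focuses the comparison on the operating region in which few human accounts are wrongly flagged as bots.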
5.2 Extrapolated Findings
As an indicator of the human-bot-ratio within our entire dataset, we apply the best performing classifier (GradientBoosting) on the remainder of our extracted user dataset. We focus on the potentially interesting users that have published at least tweets during the collection period. Using our classifier we obtained predictions for human and bot accounts. In total, that means in combination with the previously manually labeled accounts, we can identify human () and bot () accounts within the potentially interesting Twitter accounts. Though we do not have labels for the complete user dataset to verify our predictions, our results seem consistent with the recent study of VarFerDavMenFla17, who claim that between and of active Twitter accounts are likely to be bots.
6 Related Work
In the past, a plethora of research on various aspects of social media and Twitter has been conducted. In the following, we discuss the major points of contact with our work.
Analyses of Political Elections.
The first line of research deals with the analysis of political elections on Twitter. For instance, FraBelMer19 as well as PratSai2019 investigate the 2015 and 2016 general elections in Spain. While FraBelMer19 measure the regional support of political parties on Twitter during the electoral periods in 2015 and 2016, PratSai2019 focus on the two trending topics #24M and #Elections2015 on the election day in 2015 and build a predictive model to infer the ideological orientation of tweets. The 2016 US presidential elections on Twitter are also a topic of ongoing research: For instance, SaiYogNasSah19 characterize the Twitter networks of the major presidential candidates, Donald Trump and Hillary Clinton, with various American hate groups defined by the US Southern Poverty Law Center (SPLC), while CaeLimSanMar18 analyze the political homophily of users on Twitter during the 2016 US presidential elections using sentiment analysis.
Furthermore, there are recent works on the 2017 German federal election: GimHaaSchWit18 collect a representative dataset on the German federal election and conduct a cluster analysis to derive eleven emergent roles from the most active users, while MorShaCalKar18 try to discover communities and their corresponding themes during the German federal election. Subsequently, they analyze how content is generated by those communities and how the communities interact with each other.
The second line of research deals with the detection of bots on Twitter. Recent works include ChaHamMue16, who present a correlation finder to identify colluding user accounts using lag-sensitive hashing. This has the advantage that, unlike supervised approaches, no labels are required. In contrast, CrePiePetSpoTes17 study the phenomenon of social spambots on Twitter and provide quantitative evidence for a paradigm shift in spambot design. The authors claim that the new generation of bots imitates human behavior, thus making them harder to detect.
WalElo18 try to detect fake accounts that have been created by humans. To this end, a corpus of human account profiles was enriched with engineered features that had previously been used to detect fake accounts created by bots. The tested supervised machine learning algorithms could only detect the fake accounts with an F1 score of , showing that human-created fake accounts are much harder to detect than bot-created accounts.
use a deep neural network based on contextual long short-term memory (LSTM) to detect bots at tweet level. Using synthetic minority oversampling, a large dataset is generated that is required to train the model. As a result, an AUC of is achieved. Recently, CasAlPalAlfRamGonEloSan19 study the use of bots in the 2017 presidential elections in Chile. They manually derive labels for the training data and then build a classifier for detecting bots. Though the model reached good results in the training stage, the testing results were not as good as they hoped.
In comparison to the above-mentioned classifiers, our detector makes use of features from multiple categories of different domains, i.e., metadata, text, time and user profile, to cover all aspects of modern bot behavior.
7 Conclusion
We have analyzed a total of million tweets to investigate the dissemination of propaganda in the context of the German federal election. We find that of the trolls of the Internet Research Agency (IRA) that have already been influencing the US presidential elections in 2016 have also been active a year later in Germany.
Based on these findings and the knowledge about the significance of retweets and quoted tweets for propaganda purposes, we have then broadened our analysis to the general political landscape. In this scope, we have particularly inspected the most tweeted hashtags and images as well as the involved users. Our evaluation shows that especially the right-wing party AfD has played a prominent role in several controversial discussions. The hashtag #afd, for instance, dominates the top-10 ranking of hashtag combinations and also the most retweeted users are all involved with this right-wing party. Given the partly significant influence on the public discourse on Twitter, it remains an open question whether this influence is driven by automated efforts and bots. The detector we have developed has enabled us to identify previously unknown bots in our dataset, which account for of all user accounts.
The large proportion of automated accounts highlights the potential danger when used for propaganda purposes. While it remains inconclusive whether the propaganda efforts observed in our dataset are mainly attributable to bot accounts, our study of the German federal election clearly shows that the political landscape heavily relies on propaganda on social media. Particularly troublesome is the amount of right-wing positions featured in the data.
Technische Universität Braunschweig
Institute of System Security