Political Elections Under (Social) Fire? Analysis and Detection of Propaganda on Twitter

12/09/2019
by   Ansgar Kellner, et al.

For many, social networks have become the primary source of news, although the correctness of the provided information and its trustworthiness are often unclear. The investigations of the 2016 US presidential elections have brought the existence of external campaigns to light aiming at affecting the general political public opinion. In this paper, we investigate whether a similar influence on political elections can be observed in Europe as well. To this end, we use the past German federal election as an indicator and inspect the propaganda on Twitter, based on data from a period of 268 days. We find that 79 trolls from the US campaign have also acted upon the German federal election spreading right-wing views. Moreover, we develop a detector for finding automated behavior that enables us to identify 2,414 previously unknown bots.


1 Introduction

The use of social media for propaganda purposes has become an integral part of cyber warfare [aro16]. Most prominently, the 2016 US presidential elections were targeted by a Russian interference campaign on Twitter [BadFerLer18]. However, the use of online propaganda is not an isolated phenomenon, but a global challenge [ShiJiaDri17, RamFerPin18, StiBleLieStr18]. The effect of political propaganda and fake news is further amplified by journalists who use Twitter to acquire “cutting-edge information” when chasing down trending topics for their next story [BroGra12, BovMak19] and then distribute it via traditional media.

In this paper, we investigate whether a similar influence on political elections can be observed in Europe as well, and thus analyze the Twitter coverage of the German federal election (Bundestagswahl) to determine whether and to what extent public opinion has been influenced. To this end, we have collected  million tweets related to the hashtags of all major German parties over 268 days, from January to September 2017. In contrast to earlier work on influence on Twitter [YeWu10, BakHofMasWat11, RiqGon16], we focus on basic features that can be derived directly from the tweets and their metadata, such as the number of retweets or quotes. The mere quantity of tweets is already sufficient to identify distinct events preceding election day, for instance, the presentation of the political manifestos of the individual parties or TV shows covering the election.

We start with an investigation of the influence of troll accounts of the Internet Research Agency (IRA), which were disclosed in the context of the investigations of Russian interference in the 2016 US presidential elections [website:twitteriralist1, website:twitteriralist2]. We find that 79 of these trolls have also been active during the German federal election, resulting in a total of  tweets in our dataset. Based on these first impressions, we broaden our perspective to the entire political landscape, looking for indicators of propaganda. In a detailed analysis, we survey specific topics and how these relate to political parties as well as the individual users that have contributed to them. For instance, topics related to the controversial right-wing party Alternative für Deutschland (AfD) have been predominant during the election, including supporting as well as opposing positions.

Additionally, we develop a detector that rates automated behavior in order to identify bot accounts in our dataset; such accounts have been identified as a root cause for the amplification of propaganda [WooHow16]. Using this classifier, we find 2,414 previously unknown bots among the user accounts in our dataset. While this number seems surprisingly large, it is in line with previous research, which estimates that between 9% and 15% of all active Twitter accounts are bots [VarFerDavMenFla17]. However, differentiating the automated behavior of bots from the repetitive manual actions of eagerly tweeting users is particularly difficult. Our results should thus rather be seen as first indicators.

In summary, we make the following contributions:

  • Analysis of Known Actors. We identify known actors involved in propaganda by correlating the published IRA troll accounts with the users from our dataset.

  • Investigation of the Propaganda Landscape. We analyze the largest dataset of tweets in the context of the German federal election, in particular,  million tweets over 268 days, and inspect them for indicators of propaganda.

  • Detection of Automated Propaganda. We detect 2,414 previously unknown bots that contribute to propaganda by implementing a classifier that identifies automated account behavior.

The remainder of the paper is organized as follows: Section 2 discusses the basic properties of our dataset that has been recorded during and prior to the German federal election. In Section 3, we investigate the presence of known propaganda actors in this data, before we discuss the overall political landscape of the dataset regarding indicators of propaganda in Section 4. Subsequently, we describe and evaluate our bot detector in Section 5. Related work is discussed in Section 6, while Section 7 concludes the paper.

2 The German Federal Election on Twitter

For our analysis, we consider  million tweets that have been published in the context of the German federal election (Bundestagswahl) and have been collected over 268 days, from January to September 2017. As we rely on the publicly available Twitter stream, we receive at most 1% of all publicly available tweets. This limit, however, is only rarely reached; due to the random sampling, the subsequently reported numbers can be safely extrapolated and the drawn conclusions remain valid. To restrict our analysis to the German federal election, we apply the search terms shown in Table 1, which correspond to the abbreviations of the major German parties¹. For Die Grünen and Die Linke we use different common abbreviations, derived from the list of recognized parties by the Federal Electoral Committee [website:federalreturiningofficer], as these parties do not have official acronyms.

¹ We consider all parties that have cleared the threshold in the previous federal election (2013) or in one of the previous state elections (2014 – 2016). We additionally consider the NPD, which narrowly missed the threshold in Saxony in 2014.

Based on a manual plausibility check of a sample of the collected data, we found an exceptionally high number of tweets in Portuguese matching the search term fdp. Further investigation revealed that fdp is a commonly used abbreviation for a Portuguese swear word, which taints our dataset. Since the language of the affected tweets is not correctly identified by Twitter, we cannot use this attribute for filtering. Instead, we completely exclude all tweets that contain the search term fdp, which accounted for  of the tweets.
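To illustrate this exclusion step, a minimal sketch could look as follows; it assumes the raw collection is stored as one tweet JSON object per line, and the file names as well as the reliance on the standard text and entities fields are assumptions for illustration:

```python
import json

# Hypothetical file names; the raw collection is assumed to be stored as
# one tweet JSON object per line (JSONL).
INFILE = "tweets_raw.jsonl"
OUTFILE = "tweets_filtered.jsonl"

def matches_fdp(tweet: dict) -> bool:
    """True if the tweet text or one of its hashtags contains the term 'fdp'."""
    text = (tweet.get("text") or tweet.get("full_text") or "").lower()
    hashtags = [h["text"].lower()
                for h in tweet.get("entities", {}).get("hashtags", [])]
    return "fdp" in text or "fdp" in hashtags

kept, dropped = 0, 0
with open(INFILE, encoding="utf-8") as fin, \
     open(OUTFILE, "w", encoding="utf-8") as fout:
    for line in fin:
        tweet = json.loads(line)
        if matches_fdp(tweet):
            dropped += 1   # drops Portuguese 'fdp' noise and genuine FDP tweets alike
            continue
        fout.write(line)
        kept += 1

print(f"kept {kept} tweets, dropped {dropped} tweets containing 'fdp'")
```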

In the following, we focus on the remaining tweets for further analysis. We proceed with the detection of known propaganda actors in our dataset.

3 Known Actors

In the course of the investigations of Russian interference in the 2016 US presidential elections, Twitter has composed a list of accounts that are linked to the Internet Research Agency (IRA) [website:twitteriralist1] and that had been identified as influential during the US elections. An updated list was forwarded to the US Congress in June 2018 [website:twitteriralist2] and released to the public to foster further research on the behavior of those accounts [website:schiffstatement].

Party | Political Direction | Term
Alternative für Deutschland (AfD) | Right-wing to far-right | afd
Christlich Demokratische Union (CDU) | Christian-democratic, liberal-conservative | cdu
Christlich-Soziale Union (CSU) | Christian-democratic, conservative | csu
Freie Demokratische Partei (FDP) | (Classical) liberal | fdp
Bündnis 90/Die Grünen | Green politics | gruene (additionally: grüne, diegruenen, diegrünen)
Die Linke | Democratic socialist | linke (additionally: dielinke)
Nationaldemokratische Partei Deutschlands (NPD) | Ultra-nationalist | npd
Sozialdemokratische Partei Deutschlands (SPD) | Social-democratic | spd
Table 1: Search terms used for the data acquisition.

Based on the assumption that existing Twitter accounts are often reused for other purposes, we try to identify the same trolls in our dataset. To this end, we match the list of published IRA troll accounts against the user accounts in our dataset. Since the screen name of a user account can be freely changed, we first map the obtained screen names to their corresponding unique user IDs [ZanCauCriSirStrBla18]. In doing so, we are able to detect 79 of the IRA troll accounts in our dataset, which is  of the total number of users. Surprisingly, only one of the identified accounts has changed its screen name during this time. However, the identified accounts are responsible for a total of only  tweets, that is,  of the tweets in our dataset, rendering their potential direct influence comparably low. Interestingly,  of the identified accounts have tweeted fewer than  tweets over the entire time span, while the top troll accounts published more than  tweets each. Similarly to the entire dataset, most of the trolls’ tweets are retweets; however, there is also a significant amount of original tweets and fewer quotes. Since the list of IRA accounts was made publicly available a significant time ago, it is likely that the IRA has since created new accounts that we are not yet aware of.
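A minimal sketch of this matching step, assuming the published IRA list has been resolved to numeric user IDs beforehand and that our tweets carry the standard Twitter user object (file names are hypothetical):

```python
import json

# Hypothetical inputs: the published IRA account list (one user ID per line)
# and our collected tweets in JSONL format.
ira_ids = {line.strip() for line in open("ira_user_ids.txt", encoding="utf-8")
           if line.strip()}

ira_tweets = 0
ira_accounts_seen = set()

with open("tweets_filtered.jsonl", encoding="utf-8") as fin:
    for line in fin:
        tweet = json.loads(line)
        # Match on the immutable numeric user ID, not the changeable screen name.
        uid = tweet["user"]["id_str"]
        if uid in ira_ids:
            ira_accounts_seen.add(uid)
            ira_tweets += 1

print(f"{len(ira_accounts_seen)} known IRA accounts, {ira_tweets} tweets")
```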

(a) Creation dates of IRA troll accounts.
(b) Tweets posted in the context of the German federal election.
Figure 1: Internet Research Agency (IRA) troll accounts.

Figure 1(a) shows the creation dates of the IRA accounts over the last few years. Most of the IRA accounts were created before November 2016, the month of the US presidential elections, with a significant peak in July 2016. However, additional IRA accounts were created between the beginning and the middle of 2017, that is, right before the German federal election. Figure 1(b) shows the number of tweets contributed by the IRA accounts in the context of the 2017 German federal election. Unsurprisingly, there is a strong increase in tweets over the course of 2017, with the highest peaks at the beginning of September, the month of the election, and particularly on the day of the election itself.

Finally, to examine the impact of the IRA accounts on other users, we verify whether other accounts interact with the IRA troll accounts, for instance, by retweeting their tweets. First of all,  of the tweets posted by IRA accounts have been retweeted. Only  of these retweets originate from the known IRA accounts themselves, leaving the large remainder to other users. Interestingly, the quoted tweets from IRA accounts have all been quoted by other users outside the peer group of known IRA accounts. Although the majority of these other users are likely regular accounts, there seems to be a fraction of accounts that are unknown trolls. We conclude that, although the number of IRA accounts and corresponding tweets is low compared to the total number of recorded users and tweets, there is a verifiable impact of the IRA accounts on other accounts in the dataset.

4 Propaganda Landscape

Based on our analysis of known propaganda actors, we broaden our perspective by taking the general political propaganda landscape into account. To this end, we proceed with an analysis of the total tweet corpus to verify whether the same ratio of original tweets, retweets, and quotes can be observed across all collected tweets and parties.

Figure 2: Development of tweet types over time.

Figure 2 shows the temporal development of original tweets (blue), retweets (yellow), and quoted tweets (green). Notice that the number of retweets significantly exceeds that of the other two tweet types; consequently, retweets are a particularly strong factor of amplification when spreading opinions. Original and quoted tweets each occur roughly  less frequently. However, all three types share the same general trend and shape, leading up to the collection’s highest value on election day.
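The temporal breakdown shown in Figure 2 can be reproduced with a short aggregation over the tweet timestamps, for example with pandas; the following sketch assumes the standard created_at, retweeted_status, and quoted_status fields of the Twitter JSON format and a hypothetical input file:

```python
import json
import pandas as pd

def tweet_type(tweet: dict) -> str:
    """Classify a tweet as retweet, quote, or original post."""
    if "retweeted_status" in tweet:
        return "retweet"
    if tweet.get("is_quote_status") and "quoted_status" in tweet:
        return "quote"
    return "original"

rows = []
with open("tweets_filtered.jsonl", encoding="utf-8") as fin:
    for line in fin:
        tweet = json.loads(line)
        rows.append({"created_at": tweet["created_at"],
                     "type": tweet_type(tweet)})

df = pd.DataFrame(rows)
# Twitter's created_at format, e.g. "Sun Sep 24 10:15:32 +0000 2017".
df["created_at"] = pd.to_datetime(df["created_at"],
                                  format="%a %b %d %H:%M:%S %z %Y")

# Count tweets per day and type, yielding one column per tweet type.
daily = (df.set_index("created_at")
           .groupby("type")
           .resample("D")
           .size()
           .unstack(level=0, fill_value=0))
print(daily.tail())
```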

Throughout the recording, we observe local peaks that can be attributed to distinct events, which we briefly discuss in the following: In January, the Federal Constitutional Court ruled against banning the far-right, nationalist party NPD, a decision preceded and followed by heated debates. The state elections in Schleswig-Holstein (SH), Saarland (SL), and North Rhine-Westphalia (NRW), in turn, triggered only a moderate response, whereas the presentation of the election manifestos for the German federal election partly received significant attention. The publication of the manifesto of the right-wing party AfD at the end of April is particularly noteworthy at this point. Starting in August, we record a strong increase in tweets leading up to the federal election on 24 September. This rise is supported by several political talk shows, such as the TV Duell and the Fünfkampf at the beginning of September.

To get a clearer view of the involved user accounts and topics, we further analyze the most frequent hashtags, media files, and quoted/retweeted user accounts.

Hashtags

Among the ten most used hashtags, we observe the acronyms of five political parties that were up for election. Figure 3(a) shows the top hashtags and their number of occurrences. Interestingly, the party that triggered the largest peak in tweets when presenting its election manifesto, the AfD, also ranks first as hashtag #afd with  occurrences, appearing three times more often than the second-placed SPD with  occurrences. The general hashtag for the German federal election, #btw, in turn, is only used in  tweets. In sixth place, with the campaign hashtag #traudichdeutschland, the AfD takes a prominent position for a second time with  mentions.

Moreover, in Figure 3(b) we consider the most frequently used combinations of hashtags and observe a similar dominance of the AfD: the hashtag #afd appears in four out of the ten most frequent combinations. In summary, the Alternative für Deutschland (AfD) seems to be particularly active on Twitter in comparison to the other parties.
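Counting individual hashtags and their co-occurring pairs is straightforward with collections.Counter; the following sketch again assumes the standard entities field and a hypothetical input file:

```python
import json
from collections import Counter
from itertools import combinations

single = Counter()
combos = Counter()

with open("tweets_filtered.jsonl", encoding="utf-8") as fin:
    for line in fin:
        tweet = json.loads(line)
        tags = sorted({h["text"].lower()
                       for h in tweet.get("entities", {}).get("hashtags", [])})
        single.update(tags)
        # Count every unordered pair of hashtags occurring in the same tweet.
        combos.update(combinations(tags, 2))

print(single.most_common(10))   # top individual hashtags
print(combos.most_common(10))   # top hashtag combinations
```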

(a) Single hashtags
(b) Combinations of hashtags
Figure 3: Top-10 individual hashtags and combinations.

Media

(a) Immigrant numbers in comparison to the location of the most AfD votes.
(b) Longish text about why eligible voters should not vote for AfD.
(c) The former Chancellor Helmut Kohl (✝ 16 June 2017).
(d) Author A. Moore on protest votes.
(e) Fake AfD election poster by heute-show.
Figure 4: Selection from most tweeted media from the dataset.

Next, we discuss the five most frequently tweeted images related to the election (see Figure 4). With  occurrences, Figure 4(a), showing two heat maps of Germany, is the most popular. It displays the proportion of foreigners per region on the left and the proportion of AfD voters per region on the right, showing a drastic imbalance. The image in Figure 4(b), tweeted  times, shows a lengthy text about why eligible voters should not vote for the AfD. Using an image for a long text was very common in the early days of Twitter, since until November 2017 Twitter restricted the maximum length of a tweet to 140 characters. The third most tweeted picture shows a black-and-white portrait of the former German Chancellor Helmut Kohl, who died on 16 June 2017. This news, together with the corresponding picture, was tweeted  times.

The pictures shown in Figure 4(d) and Figure 4(e) occur  and  times, respectively, and also concern the AfD. However, these images likewise argue against the party: on the one hand, by showing a comment by the British author A. Moore explaining the idiocy of protest votes, and, on the other hand, by displaying a fake AfD election poster published by the German political satire show heute-show. Thus, the spike in AfD-related hashtags likely cannot be traced back to supporters alone, but also to opponents of this controversial party.

Quoted/Retweeted Users

(a) Top quoted users.
(b) Top retweeted users.
Figure 5: Top quotes and retweets.

As a measure of the popularity and influence of individual accounts, we also look at the most quoted and retweeted users in our recording. Figure 5(a) and Figure 5(b) show the top ten users for both categories. Interestingly, @AfD_Bund and @Beatrix_vStorch are present in both rankings. The former is the official account of the AfD party, and the latter is an AfD politician, as are @FraukePetry, @SteinbachErika, @lawyerberlin, and @Alice_Weidel. Consequently, the list of the ten most retweeted users is largely dominated by a single party. The remaining accounts, @66Freedom66 and @DoraBromberger, advertise right-wing views and are thus also in line with the party.

Furthermore, three other political parties are rather prominently present: @CDU, @CSU, and @SPD. Especially the latter, the left-wing social democrats, have two politicians among the top quoted user accounts (@Ralf_Stegner and @MartinSchulz). The remaining accounts mainly correspond to popular German news magazines: @welt, @tagesschau, @wahlrecht_de, and @ZDFheute.

5 Detecting Automated Propaganda

Based on our findings on the political landscape in our dataset, we proceed with the identification of automated bot behavior, which is held responsible as one of the root causes for the amplification of heavily discussed political topics [WooHow16]. To this end, we apply a supervised machine learning approach to detect bots.

Although bot detection in general is a well-studied topic, the detection of political social bots in particular is still an open challenge, as indicated in related work [e.g., ChuGiaWanJaj12, FerVarDavMenFla16]. On the one hand, this is due to their diverse characteristics, involving the political direction and target audience; on the other hand, it is due to the constant evolution of social bots, which approach more human-like behavior by imitating common usage patterns [FerVarDavMenFla16, QiAlBro18].

For the implementation of our classifier, we make use of the insights gained from the identified IRA trolls and the salient patterns found in our in-depth analysis of the political landscape.

Labeling

As the dataset has been recorded specifically for this purpose, no existing bot and human labels are available for training a supervised machine learning model. We therefore manually attribute Twitter accounts to both classes using a set of simple heuristics. These include tests for repetitive tweeting patterns, frequent posting of tweets without regular sleep breaks, or the tweeting of multiple hashtags from the trending topics combined with a URL. Even for trained experts, the distinction between humans and bots remains a difficult challenge. To avoid wrongly labeled training samples, we concentrate on those accounts for which we could identify the class with high confidence. As a result, we gathered  bot and  human accounts in total for training the classifier.
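The following sketch illustrates how two of these heuristics could be expressed as simple predicates over an account's tweet history; the concrete thresholds are illustrative assumptions, not the values used for labeling:

```python
from datetime import timedelta

def lacks_sleep_breaks(timestamps, min_break_hours=4):
    """Heuristic: flag accounts whose longest pause between consecutive tweets
    never exceeds a few hours, i.e. no plausible sleep break.
    The threshold is an illustrative assumption."""
    timestamps = sorted(timestamps)
    if len(timestamps) < 2:
        return False
    longest_gap = max(later - earlier
                      for earlier, later in zip(timestamps, timestamps[1:]))
    return longest_gap < timedelta(hours=min_break_hours)

def is_repetitive(texts, max_unique_ratio=0.2):
    """Heuristic: flag accounts that post (almost) the same text over and over
    again. The ratio threshold is an illustrative assumption."""
    if not texts:
        return False
    return len(set(texts)) / len(texts) < max_unique_ratio
```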

Features

Based on the heuristics used to manually label the training data, we proceed with the engineering of additional features to improve the bot detection rate, exploring the available tweet and user profile information in our dataset. We engineered  unique features covering four main categories: metadata-based, text-based, time-based, and user-based features. The metadata-based features include, for example, the average number of tweets per day, the number of different clients used, and the retweet-to-tweet ratio. The text-based features comprise, for instance, the average tweet length, the vocabulary diversity, and the URL ratio. The time-based features involve, for example, the longest average break between tweets and the median time between a retweet and the original tweet. Finally, the user-based features include, for example, the number of followers, the account verification status, and the voluntary disclosure of being a bot. The complete list of derived features is presented in Table 3.
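A few of the features from Table 3 can be sketched as follows, computed per account from its tweets and user profile; field names follow the standard Twitter JSON format, and the selection and details are simplified for illustration:

```python
import zlib

def extract_features(user: dict, tweets: list) -> dict:
    """Compute a small, illustrative subset of the features listed in Table 3."""
    texts = [t.get("text", "") for t in tweets]
    n = max(len(tweets), 1)

    retweets = sum(1 for t in tweets if "retweeted_status" in t)
    with_url = sum(1 for t in tweets if t.get("entities", {}).get("urls"))
    joined = " ".join(texts)
    words = joined.split()

    return {
        # Metadata-based
        "retweet_ratio": retweets / n,
        "total_clients": len({t.get("source", "") for t in tweets}),
        # Text-based
        "avg_text_len": sum(len(t) for t in texts) / n,
        "url_ratio": with_url / n,
        "vocabulary_diversity": len(set(words)) / max(len(words), 1),
        "zip_ratio": (len(zlib.compress(joined.encode("utf-8")))
                      / max(len(joined.encode("utf-8")), 1)),
        # User-based
        "total_followers": user.get("followers_count", 0),
        "friend_follower_ratio": (user.get("friends_count", 0)
                                  / max(user.get("followers_count", 1), 1)),
        "self_bot": int("bot" in user.get("name", "").lower()),
    }
```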

Models

We train and evaluate seven different machine learning algorithms for the classification of bots and humans. These include the statistical LogisticRegression model, the non-parametric KNeighbors model, and the tree-based ensemble models RandomForest, AdaBoost, and GradientBoosting. Apart from that, the two support vector machine variants LinearSVC and SVC are applied and evaluated for their aptitude.
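The seven models map directly onto scikit-learn estimators; a minimal sketch of how they could be instantiated (hyperparameters omitted here, as they are determined later by grid search):

```python
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, LinearSVC

# The seven candidate models evaluated for the bot/human classification task.
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "KNeighbors": KNeighborsClassifier(),
    "RandomForest": RandomForestClassifier(),
    "AdaBoost": AdaBoostClassifier(),
    "GradientBoosting": GradientBoostingClassifier(),
    "LinearSVC": LinearSVC(),
    "SVC": SVC(probability=True),  # probability=True enables AUC via predict_proba
}
```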

We proceed with the application of our classifier in two experiments: a controlled experiment and an extrapolation of our findings. While the first, controlled experiment targets the validation of our classifier on the previously labeled training data and a comparison to Botometer [DavVarFerFlaMen16] as a baseline, the second extrapolates our findings by applying the classifier to the remainder of our unlabeled dataset as an indicator of the human-bot ratio within the entire dataset.

5.1 Controlled Experiment

Next, we apply the selected machine learning models to our training data using k-fold cross-validation, repeating the experiments several times and averaging the resulting metrics. We identify the best parameter combination per classifier by employing a grid search, optimizing for the best average Area Under Curve (AUC). Table 2 shows the examined classifiers with the best parameters found for each classifier type, sorted by the average AUC over all repetitions in descending order. We further compute the F1-score for a single-value comparison that considers both precision and recall. The best performance for each metric is shown in the table. The best performing classifier, in terms of average AUC, is the GradientBoosting classifier.

Classifier | Avg. F1-Score | Avg. AUC
GradientBoosting
RandomForest
AdaBoost
SVC
LogisticRegression
LinearSVC
KNeighbors

Table 2: Results of the tested classifiers.
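Model selection with repeated k-fold cross-validation and a grid search optimizing the ROC AUC can be sketched as follows; the synthetic data, fold count, number of repetitions, and parameter grid are illustrative stand-ins, not the values used in the paper:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold

# Stand-in for the real feature matrix of labeled accounts (1 = bot, 0 = human).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Fold count, repetitions, and the parameter grid are illustrative assumptions.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
param_grid = {
    "n_estimators": [100, 300],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

search = GridSearchCV(GradientBoostingClassifier(),
                      param_grid,
                      scoring="roc_auc",   # optimize for average AUC
                      cv=cv,
                      n_jobs=-1)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```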

Baseline

As a baseline, we compare our results to the predictions of Botometer, formerly known as BotOrNot [DavVarFerFlaMen16], a popular bot classifier that is publicly available on the Internet. To this end, we query the Botometer API for each of the previously labeled Twitter accounts from the training dataset to obtain a corresponding bot score. The Botometer classifier yields a considerably lower AUC, in particular when the false positive rate is bounded to a low false alarm rate. Figure 6 shows the two ROC curves of Botometer and our GradientBoosting classifier. Our classifier outperforms the mature Botometer classifier on our dataset, providing significantly better results.

(a) Full range
(b) False positives bound to
Figure 6: Receiver Operating Characteristics (ROC) of GradientBoosting vs. Botometer.
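The comparison in Figure 6 boils down to computing one ROC curve per classifier from a score per labeled account; the following sketch uses scikit-learn, with synthetic placeholder scores standing in for our classifier's predicted probabilities and the scores retrieved from the Botometer API:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

def plot_roc(y_true, scores, label):
    """Plot one ROC curve with its AUC in the legend."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    plt.plot(fpr, tpr,
             label=f"{label} (AUC = {roc_auc_score(y_true, scores):.2f})")

# Placeholder arrays: in the real pipeline, y_true holds the manual bot/human
# labels, gb_scores the predicted probabilities of our GradientBoosting model,
# and botometer_scores the scores previously retrieved from the Botometer API.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
gb_scores = np.clip(y_true * 0.6 + rng.random(200) * 0.5, 0, 1)
botometer_scores = np.clip(y_true * 0.4 + rng.random(200) * 0.7, 0, 1)

plot_roc(y_true, gb_scores, "GradientBoosting")
plot_roc(y_true, botometer_scores, "Botometer")
plt.plot([0, 1], [0, 1], linestyle="--", color="grey")  # chance level
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```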

5.2 Extrapolated Findings

As an indicator of the human-bot ratio within our entire dataset, we apply the best performing classifier (GradientBoosting) to the remainder of our extracted user dataset. We focus on the potentially interesting users that have published at least  tweets during the collection period. Using our classifier, we obtain predictions for the remaining accounts and identify 2,414 previously unknown bots. In total, that is, in combination with the previously manually labeled accounts, we can identify  human and  bot accounts among the potentially interesting Twitter accounts. Though we do not have labels for the complete user dataset to verify our predictions, our results are consistent with the recent study of VarFerDavMenFla17, who estimate that between 9% and 15% of active Twitter accounts are likely to be bots.
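Applying the selected model to the remaining, unlabeled accounts is then a single prediction pass; in the sketch below, best_model, feature_names, unlabeled, and the tweet-count threshold are assumed placeholders for the objects produced in the previous steps:

```python
import numpy as np

# Assumed inputs (not defined here): `best_model` is the fitted GradientBoosting
# classifier from the controlled experiment, `feature_names` the ordered list of
# feature names from Table 3, and `unlabeled` a dict mapping user ID -> feature
# dict for every account not contained in the training set.
MIN_TWEETS = 10   # illustrative threshold, not the value used in the paper

eligible = {uid: feats for uid, feats in unlabeled.items()
            if feats["total_tweets"] >= MIN_TWEETS}

ids = list(eligible)
X_unlabeled = np.array([[eligible[uid][name] for name in feature_names]
                        for uid in ids])
predictions = best_model.predict(X_unlabeled)   # 1 = bot, 0 = human

n_bots = int(predictions.sum())
print(f"{n_bots} predicted bots out of {len(ids)} accounts "
      f"({n_bots / len(ids):.1%})")
```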

6 Related Work

In the past, a plethora of research on various aspects of social media and Twitter has been conducted. In the following, we discuss the major points of contact with our work:

Analyses of Political Elections.

The first line of research deals with the analysis of political elections on Twitter. For instance, FraBelMer19 as well as PratSai2019 investigate the 2015 and 2016 general elections in Spain. While FraBelMer19 measure the regional support of political parties on Twitter during the electoral periods in 2015 and 2016, PratSai2019 focus on the two trending topics #24M and #Elections2015 on election day in 2015 and build a predictive model to infer the ideological orientation of tweets. The 2016 US presidential elections on Twitter are also a topic of ongoing research: for instance, SaiYogNasSah19 characterize the Twitter networks of the major presidential candidates, Donald Trump and Hillary Clinton, together with various American hate groups as defined by the US Southern Poverty Law Center (SPLC), while CaeLimSanMar18 analyze the political homophily of users on Twitter during the 2016 US presidential elections using sentiment analysis.

Furthermore, there are recent works on the 2017 German federal election: GimHaaSchWit18 collect a representative dataset on the German federal election and conduct a cluster analysis to derive eleven emergent roles from the most active users, while MorShaCalKar18 try to discover communities and their corresponding themes during the German federal election. Subsequently, they analyze how content is generated by those communities and how the communities interact with each other.

Bot Detection.

The second line of research deals with the detection of bots on Twitter. Recent works include ChaHamMue16, who present a correlation finder to identify colluding user accounts using locality-sensitive hashing. This has the advantage that, unlike supervised approaches, no labels are required. In contrast, CrePiePetSpoTes17 study the phenomenon of social spambots on Twitter and provide quantitative evidence for a paradigm shift in spambot design. The authors claim that the new generation of bots imitates human behavior, thus making them harder to detect.

WalElo18 try to detect fake accounts that have been created by humans. To this end, a corpus of human account profiles was enriched with engineered features that had previously been used to detect fake accounts created by bots. The tested supervised machine learning algorithms could only detect the fake accounts with an F1 score of , showing that human-created fake accounts are much harder to detect than bot-created accounts.

KudFer18 use a deep neural network based on contextual long short-term memory (LSTM) to detect bots at the tweet level. Using synthetic minority oversampling, they generate the large dataset required to train the model and achieve an AUC of . Recently, CasAlPalAlfRamGonEloSan19 studied the use of bots in the 2017 presidential elections in Chile. They manually derive labels for the training data and then build a classifier for detecting bots. Though the model reached good results in the training stage, the testing results were not as good as hoped.

In comparison to the above-mentioned classifiers, our detector makes use of features from multiple domains, i.e., metadata, text, time, and user profile, to cover all aspects of modern bot behavior.

7 Conclusion

We have analyzed a total of  million tweets to investigate the dissemination of propaganda in the context of the German federal election. We find that 79 of the trolls of the Internet Research Agency (IRA) that had already been influencing the 2016 US presidential elections were also active a year later in Germany.

Based on these findings and the knowledge about the significance of retweets and quoted tweets for propaganda purposes, we have then broadened our analysis to the general political landscape. In this scope, we have particularly inspected the most tweeted hashtags and images as well as the involved users. Our evaluation shows that especially the right-wing party AfD has played a prominent role in several controversial discussions. The hashtag #afd, for instance, dominates the top-10 ranking of hashtag combinations, and the most retweeted users are all involved with this right-wing party. Given the partly significant influence on the public discourse on Twitter, it remains an open question whether this influence is driven by automated efforts and bots. The detector we have developed has enabled us to identify 2,414 previously unknown bots in our dataset, which account for  of all user accounts.

The large proportion of automated accounts highlights the potential danger when they are used for propaganda purposes. While it remains inconclusive whether the propaganda efforts observed in our dataset are mainly attributable to bot accounts, our study of the German federal election clearly shows that the political landscape heavily relies on propaganda on social media. Particularly troublesome is the amount of right-wing positions featured in the data.

References

Metadata-based
  avg_tweets_per_day          Average number of tweets per day.
  total_tweets                Total number of tweets.
  orig_ratio                  Ratio of own composed tweets to total tweets.
  retweet_ratio               Ratio of retweeted tweets to total tweets.
  quote_ratio                 Ratio of quoted tweets to total number of tweets.
  reply_ratio                 Ratio of replies to total number of tweets.
  twitter_client              Used Twitter client (terms mapped via tf-idf).
  official_client             Use of the official Twitter client.
  total_clients               Total number of used Twitter clients.
  unique_users_retweet_ratio  Ratio of unique users in retweets.
  unique_users_quotes_ratio   Ratio of unique users in quotes.
  unique_users_reply_ratio    Ratio of unique users in replies.
  longest_conversation        Longest conversation with a user.
  unique_users_conv_ratio     Ratio of unique users in conversations.

Text-based
  avg_text_len                Average length of tweet text.
  std_text_len                Standard deviation of the length of tweet text.
  url_ratio                   Ratio of tweets with a URL.
  unique_url_ratio            Ratio of unique URLs in tweets.
  unique_url_host_ratio       Ratio of unique host names in URLs.
  vocabulary_diversity        Diversity of the vocabulary used in tweets.
  mentions_ratio              Ratio of mentions to tweets.
  hashtags_ratio              Ratio of hashtags to tweets.
  unique_mentions_ratio       Ratio of unique mentions to tweets.
  unique_hashtags_ratio       Ratio of unique hashtags to tweets.
  ending_hashtags_ratio       Ratio of tweets that end with a hashtag.
  starting_mention_ratio      Ratio of tweets that start with a mention.
  starting_rt_ratio           Ratio of tweets that start with RT.
  zip_ratio                   Ratio of tweet text size after zipping to original size.
  user_simhash                Simhash of all tweets per user.
  avg_duplicate_simhash       Average number of tweets with a similar simhash.
  duplicate_simhash_ratio     Ratio of all duplicates to the number of from-users.

Time-based
  chi_square_seconds          Chi-square distribution of the seconds of tweet creation times.
  avg_longest_break           Longest break of a user, on average.
  avg_second_longest_break    Second longest break of a user, on average.
  median_retweet              Median timespan between a retweet and the original tweet.
  median_quote                Median timespan between a quote and the original tweet.

User-based
  total_friends               Total number of friends.
  total_followers             Total number of followers.
  friend_follower_ratio       Ratio of number of friends to number of followers.
  has_default_profile_image   Has the default profile image.
  has_default_user_image      Has the default user image.
  is_verified                 Is a verified Twitter account.
  has_geo_coordinates         User has geo coordinates enabled.
  self_bot                    User account contains the term ‘bot’ in its name.

Table 3: Engineered features used for classifying automated propaganda, subdivided into four categories.

Technische Universität Braunschweig
Institute of System Security
Rebenring 56
38106 Braunschweig
Germany
