Tackling spam in the era of end-to-end encryption: A case study of WhatsApp

06/08/2021 ∙ by Pushkal Agarwal, et al. ∙ University of Surrey ∙ MIT ∙ King's College London ∙ Queen Mary University of London

WhatsApp is a popular messaging app used by over a billion users around the globe. Due to this popularity, spam on WhatsApp is an important issue. Despite this, the distribution of spam via WhatsApp remains understudied by researchers, in part because of the end-to-end encryption offered by the platform. This paper addresses this gap by studying spam on a dataset of 2.6 million messages sent to 5,051 public WhatsApp groups in India over 300 days. First, we characterise spam content shared within public groups and find that nearly 1 in 10 messages is spam. We observe a wide selection of topics ranging from job ads to adult content, and find that spammers post both URLs and phone numbers to promote material. Second, we inspect the nature of spammers themselves. We find that spam is often disseminated by groups of phone numbers, and that spam messages are generally shared for a longer duration than non-spam messages. Finally, we devise content- and activity-based detection algorithms that can counter spam.


1. Introduction

WhatsApp is the most popular messaging app in the world, with over 1.5 billion active users each day and over 5 billion downloads from the Android Play Store alone. Thus, for many, WhatsApp has become a key part of the communications landscape. With this massive popularity, spam has become a challenge for WhatsApp. However, unlike spam email, where platforms can read content, WhatsApp follows an end-to-end encryption model where the message content is not accessible. Although this offers stronger guarantees on privacy, it makes moderation and spam detection difficult.

Although WhatsApp has made progress (Jones, 2017) in detecting users who send unsolicited messages to individuals, there is no solution for spammers who send to public WhatsApp groups (Garimella and Tyson, 2018). Public WhatsApp groups are groups where the admins publicly share a link (e.g., via a website or social platform) to join the group. The links allow external observers such as researchers to join the group and thereby observe the messages exchanged, including spam, although the content remains opaque to the platform itself.

Because of the link-based joining behaviour, public WhatsApp groups typically contain several users who may be strangers to each other, i.e., users who do not have a social connection in the offline world other than via the WhatsApp group. WhatsApp provides strong protection from being contacted by strangers, allowing users to easily block unsolicited messages from those not in their contact list. In contrast, as long as a user is a member of a public WhatsApp group, they cannot avoid messages that strangers may send to that group, regardless of whether the messages sent are germane to the group or not. Thus, spammers can abuse public WhatsApp groups by sending unwanted messages that are irrelevant to the purpose of a group, e.g., links to adult content or sexual services, phony job offers etc.

Membership of public WhatsApp groups is common, e.g., field interviews with Venezuelan migrants in Colombia showed that over 50% of those with a smartphone were part of public WhatsApp groups (Chang, 2020). Similarly, surveys show that 1 in 6 users in India and Brazil were part of a public WhatsApp group discussing politics (Lokniti, 2018; Newman et al., 2019). Furthermore, large public groups on WhatsApp account for a disproportionate amount of the total volume of content received (Rosenfeld et al., 2018).

Understanding the nature of spam in public groups is therefore vital for securing WhatsApp and other messaging platforms. In this paper, we focus on India, a country where over 400 of the 460 million people online are on WhatsApp. The country has been a focus of cleanup efforts by the platform itself (https://www.theguardian.com/technology/2019/feb/06/whatsapp-deleting-two-million-accounts-per-month-to-stop-fake-news), thereby presenting an ideal case study. As a topic that attracts interest from across the population, we focus on national politics, and gather 2.6 million messages from 5,051 public political WhatsApp groups in India (§2).

With this dataset, we analyse the characteristics of spam (§3) as well as the actions of spammers (§4). Unsurprisingly, we find parallels with other forms of spam. For example, as with other platforms (Redmiles et al., 2018), WhatsApp spam consists of topics such as job advertisements, clickbait and adult content. As with traditional email spam (Thomas and others, 2011) as well as social spam (Cao and Caverlee, 2015), URLs play a key role, appearing in 57% of all spam messages. Spammers also use URL shorteners, which are known to be a key mechanism for hiding spam URLs from users (Wang and others, 2013; Klien and Strohmaier, 2012). Since WhatsApp is phone-based, a quarter of spam messages include a phone number, possibly as a way to engage users outside of the platform.

We further observe a number of distinct trends, unique to WhatsApp. For instance, we find evidence of spammers using multiple phone numbers, spreading the same spam message over a small number of ‘active’ days (note that phone numbers are typically more difficult to obtain than email addresses in India). We also observe regional trends with, for example, a build-up of spammers based in Russia. These unusual traits lead us to explore spammers’ evasion tactics. For example, nearly a quarter of all phone number changes are performed by spam accounts. Spammers are also being enabled by the public WhatsApp links, with the majority of ‘join via link’ operations being accounted for by spammers.

Finally, building on the above insights, we investigate ways to counter spam whilst not breaking the end-to-end encryption model of WhatsApp (§5). As a baseline, we train a traditional content-based solution for email, SpamAssassin (Mason, 2002), and compare it against a privacy-preserving classifier that does not make use of the message content but relies only on features like group join events or posting frequency. We also develop a local on-device classifier, showing that we can detect spam without needing to compromise end-to-end encryption requirements. In all cases, we obtain good performance, with accuracy scores exceeding 0.85. This offers a foundation for future moderation efforts in the era of end-to-end encrypted messaging.

2. Background and Dataset

2.1. Overview of WhatsApp

WhatsApp is an end-to-end encrypted mobile messaging application. Contacts can be international as well as domestic phone numbers and WhatsApp does not charge for messages. Apart from direct messages, users can join groups where members get messages and notifications for every post in the group. A user’s phone number is their WhatsApp identifier. Partly because of this near zero cost structure, WhatsApp has gained popularity in developing nations and has acquired a large user base.

Due to this scale, anecdotal evidence suggests that malicious actors have started to take advantage of WhatsApp. In particular, unwanted or unsolicited messages have been on the rise (Black, 2020; Jones, 2017). This is dangerous in the Indian context, where many WhatsApp users are digital neophytes who may not have previously experienced digital spam. Such spam is particularly prevalent in public WhatsApp groups, where anyone can join through an openly available link.

2.2. Data collection methodology

We take inspiration from prior work, which found that public groups that discuss politics are widely used in India (Lokniti, 2018) and Brazil (Newman et al., 2019). Our study is based on data from public WhatsApp groups discussing politics in India. Using an extensive set of 349 manually curated keywords (list available: https://www.dropbox.com/sh/cm66rha982f2hlj/AADi5QZLIiz0n6iQ9aEVTF9ua?dl=0) in multiple Indian languages (including English), relating to politicians and political parties, we searched for WhatsApp group links (chat.whatsapp.com) on Facebook, Twitter and Google during November 2018. This yielded 5,051 groups. These are typically created by political parties or party supporters in order to reach an audience which is only available via WhatsApp. Hence, most of these groups have a well-defined organizational structure (Banaji and others, 2019). Note that, due to end-to-end encryption, WhatsApp is limited in its ability to moderate content in these groups. Instead, moderation is mostly up to group admins, who also have powers to remove users.

We also clarify at the outset that the topic of the groups (i.e., national politics) is not key to our study. Rather, we see these groups as exemplars of the kind of spam that may be seen on public WhatsApp groups, and the choice of topic is simply driven by the fact that it is a topic of interest across all of India. Added to this is the convenient availability of a well understood method to search for the existence of such groups based on keywords (Garimella and Tyson, 2018; Garimella and Eckles, 2020).

We then use the toolkit from (Garimella and Tyson, 2018) to collect data from the 5,051 selected groups. Using Selenium to programmatically join each group, we take periodic snapshots of the message database stored by WhatsApp between October 2018 and August 2019. Across the 5,051 groups, we collect 2.6 million messages posted by over 172K unique users over a period of 302 days. We also record 437K action events, covering actions taken by users including entering or leaving groups and changing phone numbers. Table 1 summarizes the actions performed by users within a group. Ethics note: Our data collection abides by the terms of service of WhatsApp and was approved by the IRB at MIT. All data was anonymised before analysis, and any personally identifiable information was masked. All phone numbers were one-way hashed, after extracting the country code.

Action Description Action Counts Unique Users
added added by a member 61k 37k
added_by_admin added by a group admin 73k 49k
joined_via_link joined via an invite link 132k 54k
left left the group 154k 73k
removed removed from a group 9k 7.3k
number_changed changed from one number to another 6k 1.5k
Table 1. Actions captured within a group.
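The phone-number anonymisation described in the ethics note can be sketched as follows. This is a minimal illustration of our own, assuming E.164-style numbers and a fixed two-digit country-code split (real country prefixes vary in length); the salt and function name are illustrative choices, not the authors' implementation.

```python
import hashlib

def anonymise_number(e164_number, salt="study-salt"):
    """One-way hash a phone number, keeping only the country code in the clear.

    Assumes E.164-style input such as '+91XXXXXXXXXX'. The two-digit
    country-code split is a simplification for illustration.
    """
    digits = e164_number.lstrip("+")
    country_code = digits[:2]  # e.g. '91' for India
    digest = hashlib.sha256((salt + digits).encode("utf-8")).hexdigest()
    return country_code, digest
```

The salt ensures the hash cannot be trivially reversed by enumerating all possible phone numbers with a public hash function, while still allowing the same number to be tracked consistently across the dataset.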

2.3. Message pre-processing

Our dataset consists of 2.6M messages, of which only 1.4M are unique. We filter and cluster these messages to obtain a distinct, deduplicated set of messages.

2.3.1. Message filtering.

Because the content of the message is important in identifying spam, we focus on the four top languages (Hindi, English, Telugu and Tamil), which collectively represent 74% of messages in our dataset. We also remove messages containing just URLs (no accompanying text), boilerplate content such as 'hi' or 'good morning', and messages consisting solely of emojis, which together constitute around 25% of our data. We filter such content to avoid characterising low-entropy posts: although they are off-topic and widely considered spam as well (Gupta et al., 2019), it is relatively easy to implement client-side filters for specific text such as 'hi' or 'good morning', whereas filtering out the other messages which we classify as spam is a harder problem (studied in §5). Filtering these out, we are left with 766K messages. As later discussed, the percentage of boilerplate messages among spammers is 7% vs. 12% for legitimate users; both groups extensively post boilerplate content.

2.3.2. Message clustering.

We qualitatively observed that many messages are close variants of each other. To group together near-similar variants of the same message, we use MinHash and Locality Sensitive Hashing (LSH) (Gionis et al., 1999; Mullen, 2015). (Other approaches, such as fuzzy matching or semantic sentence embeddings, could be used here; given the scale of our dataset, LSH was a reasonable choice, and it has been used for similar tasks in the past (Stringhini et al., 2010).) We find that the best clustering performance is obtained by using 10 min-hashes in 5 bands. We made this choice by taking 100 near-identical messages and 100 distinct messages, and experimenting with a range of parameters to derive the optimum for separating out these two sets.
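The clustering step can be sketched in plain Python. This is a minimal MinHash plus banded-LSH illustration using the parameters from the text (10 min-hashes in 5 bands, i.e., 2 rows per band); the shingle size and the seeded MD5 hashing are choices of our own, and a production pipeline would use an optimised library.

```python
import hashlib

def shingles(text, k=3):
    # character k-shingles of a message
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}

def minhash(shingle_set, num_hashes=10):
    # one minimum per seeded hash function; matching minima approximate
    # the Jaccard similarity of the shingle sets
    return [
        min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)
            for s in shingle_set)
        for seed in range(num_hashes)
    ]

def candidate_pairs(messages, bands=5, rows=2):
    # messages whose signatures agree on any full band land in the same
    # bucket and become candidate near-duplicates
    buckets = {}
    for idx, msg in enumerate(messages):
        sig = minhash(shingles(msg), num_hashes=bands * rows)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets.setdefault(key, []).append(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update((i, j) for i in members for j in members if i < j)
    return pairs
```

With 5 bands of 2 rows, two messages with shingle-set Jaccard similarity J collide in at least one band with probability 1 - (1 - J^2)^5, which is high for near-identical variants and low for unrelated text.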

Overall, this results in 73K clusters, with an average of 10 (median 4) messages in each cluster. As a quality check, we randomly select 100 clusters from each of the 4 languages and manually verify that 97% are similar (the rest are from a news bot (DUTA, 2018), where the end note of the message is identical but the remaining text is disjointed).

Of the 73K clusters, 25.4K have at least 5 messages, and contain 420K messages in total. These 420K messages, which we term “frequently sent messages” (sometimes shortened to “frequent messages”) are the object of study for the rest of the paper, except when studying URLs (§3.2.2), where we include all the messages containing URLs.

2.4. Spam annotation

The above yields a substantial WhatsApp message dataset. We next describe how we annotate these messages as ‘spam’ or ‘ham’ (non-spam).

2.4.1. Manual annotation.

We start by identifying a seed set of users who were manually removed from at least two groups by their admins. We conjecture that these 257 users may be likely to share spam. We then extract the 68K messages (grouped into 1,004 clusters) sent by these users and manually annotate each cluster as ‘spam’ or ‘ham’.

We perform the annotations in two stages. First, two English-speaking annotators work independently to label the 220 English clusters contained within the 1,004 clusters. As guidelines, we ask them to identify as spam the following kinds of messages: (i) promotional messages (non-political promotions, as these groups are for political discussion); (ii) adult content; (iii) invitations to register for external services via links, often for money; (iv) offers to earn money, win prizes, etc.; or (v) anything which looks 'suspicious' and is not relevant to the group at large. Although this may exclude other, more nuanced forms of spam, it offers a powerful lower bound to work from. The inter-rater agreement (Cohen's Kappa) was high (0.96). The remaining disagreements between annotators were discussed to reach full agreement. Following this initial step, which validated the ability to identify spam messages with high agreement, we progress to the second stage. Here, we issue the remaining set of 784 clusters to annotators who are native speakers of Hindi, Telugu and Tamil (we had 2 annotators for English, 3 for Hindi and 1 each for Telugu and Tamil). Each annotator received the subset of messages in their native language and was given identical guidelines (as described above). In total, these two steps result in 63K messages (from 663 clusters) being tagged as spam and 5K messages (341 clusters) being tagged as ham.

To verify that our conception of spam and ham is reasonable, we created a panel of 11 independent assessors who looked at a randomly chosen set of 100 messages (split into 50 spam and 50 ham messages). All assessors were Indians, and collectively represented 7 states. The panel were only given the information that these messages are from public WhatsApp groups related to Indian politics and were asked whether they agree with our annotators’ ‘spam’ and ‘ham’ labels, based on their own personal understanding of what they would consider as ‘spam’. We find that there is a median agreement on 95% of the labels across all the assessors.

2.4.2. Semi-Automatic Annotation.

To broaden our analysis, we construct a dictionary of words that are used at least 5 times in the 63K messages classified as spam above, and manually clean the list to obtain a set of 324 high-precision spam words. (The threshold of 5 was set by manually inspecting the results obtained with values from 3 to 10, in order to obtain a high-precision list of spam words.) We then search for the occurrence of these spam words in the entire database of 420K frequently sent messages in Hindi, English, Telugu and Tamil. In each cluster where we find one or more of the spam words, the annotators examine the cluster and validate it as 'spam' or 'ham' using the same approach as above. Finally, we obtain a labelled dataset containing 295K spam (from 3.5K clusters) and 112K ham (from 3.2K clusters) messages, as summarised in Table 2.

Message type Source Unique messages Total messages
Spam Removed users 663 63K
Ham Removed users 341 5K
Spam Semi-automation 2.8K 232K
Ham Semi-automation 2.9K 107K
Table 2. Summary of the annotation sets.
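The semi-automatic annotation step can be sketched as follows. This is a minimal illustration of our own: the tokenisation is a simple word regex, the min_count=5 threshold mirrors the paper's choice, and in the study the flagged clusters are then validated by human annotators rather than labelled automatically.

```python
from collections import Counter
import re

WORD_RE = re.compile(r"\w+", re.UNICODE)

def build_spam_lexicon(spam_messages, min_count=5):
    # count word frequencies over annotated spam; the resulting list
    # still needs manual cleaning to reach high precision
    counts = Counter(
        w for msg in spam_messages for w in WORD_RE.findall(msg.lower())
    )
    return {w for w, c in counts.items() if c >= min_count}

def flag_clusters(clusters, lexicon):
    # surface clusters containing any lexicon word for human validation
    return [
        cid for cid, msgs in clusters.items()
        if any(w in lexicon for msg in msgs
               for w in WORD_RE.findall(msg.lower()))
    ]
```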

3. Characteristics of Spam Content

This section systematically analyses our data to better understand the nature of WhatsApp spam. We focus on its scale (§3.1) and the content shared, including URLs (§3.2.2).

Figure 1. Numbers of users spreading a message, indexed by number of times the message or its close variants are seen.

3.1. Understanding the scale of spam

Figure 5. (a) Spam topics found in the top 100 clusters of spam (48% of all spam messages); (b) overall fraction of URLs and phone numbers found in the content of messages; (c) categories of URLs in messages, as identified by VirusTotal.

We first consider two aspects of scale: the number of times a message is posted, and the number of users involved in spreading the message. Each message cluster obtained (see §2.3) may contain several messages that are closely related variants. Figure 1 groups message clusters into different buckets based on the number of times those messages are found in our data, across different WhatsApp groups. We then show how many users were involved in spreading the messages in that cluster.

We find that clusters containing spam messages have larger numbers of messages on average, although the median numbers are similar (24 for spam clusters vs. 23 for ham; mean 83.6 for spam vs. 35 for ham). This indicates a highly skewed distribution of messages per cluster, particularly for spam clusters. For instance, there are over 37 spam clusters with more than 1,000 messages, the largest having over 27K messages. In contrast, only 2 ham clusters have more than 1,000 messages, the largest having 2,065; these were video URLs and associated text related to political speeches.

We also see that popular clusters, which contain messages that are forwarded many times, inevitably involve more users. However, Figure 1 shows that for messages in spam and ham clusters of similar sizes (i.e., in the same “bucket” size on the x-axis), the spam clusters are driven by fewer users. This indicates that spam is disseminated in a more proactive manner by a smaller set of individuals.

3.2. Understanding spam content

We next inspect the content of spam posts, including both the topics covered and the sharing of URLs.

3.2.1. Spam topics.

To explore the topics discussed within spam, we employ three annotators to manually examine messages in the top 100 clusters of spam messages. By performing an initial qualitative analysis, we identify 10 core topics. We then ask the annotators to categorise the messages from the top 100 clusters into these 10 topics. The top 100 clusters comprise 48% (141K messages) of the total number of spam messages (295K) in our dataset.

Each message cluster is examined by at least one annotator and the category label applied was then checked by a second annotator. At least one of the two annotators was a native speaker of one of the four languages considered (Hindi, Telugu, Tamil or English). All differences of opinion were resolved by a bilateral discussion and we finally obtained 100% inter-agreement between annotators.

In addition to strategies followed by traditional social media spam (using clickbait or other techniques to take users outside the platform), WhatsApp spam also makes use of phone numbers. Figure 5(a) captures the relative frequencies of the topics, as well as the frequency of URLs and phone numbers within the messages. Each of the 10 topics is described below:

Job Advertisements. Comprising nearly 35% of our annotated spam, the most widespread spam is advertisements for jobs. Nearly all (99.6%) of job advertisements provide a contact phone number, or a phone number as well as a URL. These may be genuine job advertisements, although they are off-topic spam for political WhatsApp groups. Though we could not verify whether they are genuine, their templated structure suggests they are spam and may involve scams. Interestingly, over 97% of spam in Telugu-language forums consists of job advertisements.

Click and Earn. These comprise 30% of spam messages and ask users to click on a URL, promising a reward. 99% of these messages contain a URL but no phone number.

Sales. These constitute 7.6% of spam and offer items for sale. 79% of these messages contain URLs and a phone number, and could be genuine items for sale.

Duta Bot. These are (benign) spam messages (7.3%) sent by a news bot service called Duta Bot (DUTA, 2018), comprising regular news or sports updates.

Referral and gifts. These spam messages (6.3%) offer a gift in return for referring users to an online service subscription, and consist mostly of a URL to click.

Political Survey. These (5.6%) mostly contain URLs that invite users to participate in political surveys. They are mostly benign and partly on-topic, as the WhatsApp groups we consider are political.

Adult. These (3.4%) mostly contain URLs that lead to adult websites or offer adult sex-related services.

Magic. These messages (1.8%) ask users to forward a message to experience something supernatural, e.g., "Forward and see magic: your phone battery will get charged to 100%".

Medical. These messages (1.2%) offer treatment for common and sometimes embarrassing ailments, e.g., "ayurvedic treatment for piles".

Other. Approximately 1% cannot be categorised into any of the above groups and consist of spam such as "daily event update".

3.2.2. URLs and Phone Numbers.

As shown above, a significant number of spam messages contain URLs (167K messages) and phone numbers (74K messages). Figure 5(b) compares the fraction of spam and ham messages that contain phone numbers, URLs or both. We see a marked difference between spam and ham, with nearly 90% of spam messages containing a phone number, a URL or both (in contrast to just 36% for ham). We also notice 19.4K unique phone numbers present in the content of the messages (note that, for ethical reasons, all numbers are one-way hashed before analysis). Of these, only 9.5% of the numbers were found to be senders of any messages, and 85% of these were spam. Thus, most of the phone numbers given out in these public groups relate to spammers.
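A heuristic for extracting the URL and phone-number signals discussed above might look as follows. The regular expressions and the shortener list are simplified illustrations of our own, not the extraction rules used in the study (real phone-number formats and TLD coverage are far broader).

```python
import re

URL_RE = re.compile(
    r"https?://\S+|\b(?:[\w-]+\.)+(?:com|in|ly|gl|xyz|org|net)\S*", re.I
)
# Indian mobile numbers, heavily simplified for illustration
PHONE_RE = re.compile(r"(?:\+91[\s-]?)?\b\d{10}\b")
SHORTENERS = {"bit.ly", "goo.gl", "tinyurl.com"}  # illustrative, not exhaustive

def message_signals(text):
    # extract the per-message features discussed in the text:
    # URLs, phone numbers, and whether a shortening service is used
    urls = URL_RE.findall(text)
    phones = PHONE_RE.findall(text)
    short = any(s in u for u in urls for s in SHORTENERS)
    return {"urls": urls, "phones": phones, "has_shortener": short}
```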

We next take the 56.8% of spam and 27.5% of ham messages that contain URLs. To explore the nature of these URLs, we use VirusTotal (https://www.virustotal.com/) to classify each according to its type of activity (Ikram et al., 2019; Kim and others, 2015). For 81.1% of domains we identify the category associated with the domain. Figure 5(c) shows the distribution of URL categories for both ham and spam. We see that video URLs are popular in ham messages, as well as newly registered sites (manual analysis reveals these are primarily news websites, e.g., upchaupal.com). In contrast, spam messages carry far more business and shopping URLs. Worryingly, 1% of the messages carry URLs marked as 'elevated exposure', e.g., apkmaster.xyz. To explore this further, we run all URLs through the 73 antivirus engines listed by VirusTotal. Overall, 26.1% of URLs in spam messages are tagged as dangerous by 3 or more antivirus engines, and 15.2% are even tagged by 9 or more, e.g., trycryptocoins.com, amazon.bigest-sale-live.in. In contrast, only 1.5% of ham URLs were tagged as malicious by 3 or more engines (0.42% by 9+ engines). This confirms the risks associated with allowing spam messages to spread unabated.
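Tallying engine verdicts of the kind described above might look as follows. The report structure here is modelled on a VirusTotal-style response dictionary (`last_analysis_results` with per-engine `category` fields, as in the v3 API); treat that shape as an assumption and consult the API documentation before use. The 3-engine threshold follows the text.

```python
def engines_flagging(report, categories=("malicious", "suspicious")):
    # count antivirus engines whose verdict falls in `categories`,
    # given a VirusTotal-style report dict (assumed shape)
    results = report.get("last_analysis_results", {})
    return sum(1 for r in results.values() if r.get("category") in categories)

def is_dangerous(report, threshold=3):
    # the paper's criterion: tagged by 3 or more engines
    return engines_flagging(report) >= threshold
```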

In addition, nearly 30% of the URLs we categorise as spam are marked as 'uncategorized' by VirusTotal. Digging deeper, we find that 51.2% of the domains hosting these URLs are no longer active, and of those that are active, 62.2% were new domains registered within one year of our data collection. This suggests that the uncategorised URLs may actually be new domains bought by spammers for the purpose of spam.

Exacerbating the above, we also find that 16% of URLs in spam are from shortening services such as bit.ly and goo.gl. These are almost exclusively used in spam messages (96.44% of short URLs occur in spam), indicating that they are likely being used to mask the actual domain names being used, thereby increasing risk.

4. Spammers and their Actions

Next we look at the users who have produced spam in terms of their locations, temporal patterns and group membership.

4.1. Operational definition of spammers

Our methodology identifies spam messages by their content rather than spammers directly. Some spam messages may be inadvertently posted by enthusiastic or naïve users who do not realise they are spam. Figure 6 plots the fraction of messages by a user that are marked as spam, relative to their total messages. As expected, this follows a bimodal pattern, with spammers at one end (nearly 100% of their messages are spam) and non-spammers ("hammers") at the other (with almost no spam messages). This suggests that any reasonable threshold on the spam fraction will capture all intentional spammers. In this section, we adopt an operational definition of spammers as any user who has posted more than a threshold number of messages that our methodology identifies as spam (our results are robust to other similar thresholds). Using this methodology, we identify 17.6K users as spammers and 32.9K as non-spammers, who share a total of 239K and 1.3M messages, respectively.

Figure 6. Fraction of messages of a user marked as spam.
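The operational definition can be sketched as follows. The 0.5 cut-off is an illustrative value of our own choosing: the bimodal distribution in Figure 6 means any reasonable cut-off between the two modes yields the same labelling, but the study's exact threshold is not reproduced here.

```python
def spam_fraction(messages):
    # messages: iterable of (user_id, is_spam) pairs
    totals, spam = {}, {}
    for user, is_spam in messages:
        totals[user] = totals.get(user, 0) + 1
        spam[user] = spam.get(user, 0) + (1 if is_spam else 0)
    return {u: spam[u] / totals[u] for u in totals}

def label_spammers(messages, threshold=0.5):
    # threshold is illustrative; with a bimodal spam-fraction
    # distribution, any cut-off between the modes behaves similarly
    return {u for u, f in spam_fraction(messages).items() if f > threshold}
```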

4.2. Spammer locations

We first use the phone number country codes to geolocate all users. Figure 7 presents the results for both spammers and hammers. Unsurprisingly, the majority have Indian country codes, although we also see a range of other countries. In most cases, these third party countries tend to be primarily spammers. Most striking is Russia, which has 6823 spammers yet no hammers. These users exclusively post spam content (9029 messages spread across 334 text clusters). The top 59 spam message clusters posted by these users cover 94% of all Russian posts, showing highly repetitive (or even coordinated) posting. 96% of these messages have URLs and 74% messages contain the word ‘sex’. Similar patterns are seen in phone numbers from other countries, albeit on a smaller scale (e.g., Romania, Cameroon, Kyrgyzstan). Note that the analysis is based on phone numbers from these countries and may not indicate that the spammers were based in these countries. Spammers could have bought phone numbers in these countries online.

Figure 7. Country codes used by spammers vs. hammers

4.3. Longevity patterns of spammers

We define a cluster of lexically close spam messages (specifically, messages that map to the same LSH cluster in our pre-processing) as a spam campaign and ask whether there are longer time-scale patterns, or focused campaigns in spreading the same messages. We term any day when at least 10 messages relating to a campaign are posted as an active day for the campaign.
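The campaign metrics above can be computed as follows. This is a minimal sketch assuming one `datetime.date` per message; the "at least 10 messages per active day" rule follows the text, while the function name and interface are our own.

```python
from collections import Counter
from datetime import date

def campaign_activity(post_dates, min_daily=10):
    # post_dates: one datetime.date per message in the campaign.
    # Returns (lifetime in days, fraction of lifetime days that are
    # 'active', i.e. carry at least `min_daily` posts).
    per_day = Counter(post_dates)
    lifetime = (max(per_day) - min(per_day)).days + 1
    active_days = sum(1 for c in per_day.values() if c >= min_daily)
    return lifetime, active_days / lifetime
```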

We start by inspecting the lifetime of spam vs. ham messages, in terms of the difference between their first and last occurrence, shown in Figure 8. We see that spam campaigns have consistently longer lifetimes than ham messages. The median campaign duration for spam is 29 days compared to 12 days for ham. This means the same messages are sent over a longer period, potentially allowing better capture of attention (Anand and Sternthal, 1990; Schumann and others, 1990). Interestingly, we see the opposite trend when computing the lifetime of accounts. Here, we compute the lifetime of a user as the difference between their first and last post. As shown in Figure 8, non-spam users have substantially longer lifetimes than their spammer counterparts.

Figure 8. Cumulative Distribution Function (CDF) of lifetimes of spam and ham message clusters and users.
Figure 9. Top: Number of times a day that a spam message is posted, for 15 exemplar campaigns. Bottom Left: Fraction of days that are 'active' (≥ 10 spam messages sent) during the lifetime of multi-day spam campaigns. Bottom Right: Fraction of spam phone numbers used during active days.

The above leads us to explore the daily characteristics of these spam campaigns. Figure 9 (top) presents the number of messages sent per day for the top 15 spam campaigns (ranked by number of times the message is posted). These campaigns occur across multiple days and are highly focused, with aggressive peaks on a small number of days.

To generalise this across all spam campaigns, Figure 9 (bottom left) also plots the number of active days seen during the lifetime of a campaign. This shows that most campaigns focus efforts on relatively few active days: On average less than half the days during the lifetime of a spam campaign are active with 10 or more messages. This suggests spammers take a staggered approach, rather than issuing spam messages every day. Despite this, there are a notable set of highly active campaigns where messages are sent on most days: 25% of campaigns involve sending messages on at least 80% of the days during their lifetime.

We also note that these campaigns span multiple phone numbers. Figure 9 (Bottom Right) shows the fraction of the phone numbers involved in a campaign that are active each day. On average, under 20% of the phone numbers are involved on any active day, suggesting that spammers may be multiplexing their efforts across multiple phone numbers progressively. This may be because individuals purchase multiple SIM cards to maximise their reach and mitigate the risk of removal or, alternatively, because multiple individuals are spreading the spam. Regardless, this may make detection more challenging.

4.4. Joining and leaving groups

Figure 10. Relative proportions of join and leave actions.

We next examine how users (spammers and others) join and leave different groups. Note that users may be added by another user, by an admin or they can join with an invite link. Figure 10 presents the fraction of spammers vs. non-spammers who join using these techniques. A clear difference exists: it is much more common for spammers to join WhatsApp groups via a link. Spammers also comprise a disproportionate fraction of users who are ‘removed’ from a group (this action is usually undertaken by admins when users violate the norms of a group).

Actions Joined via link Added Added by admin
Left 75%(62%) 17%(20%) 8%(18%)
Number changed 74%(37%) 12%(28%) 14%(35%)
Removed 80%(48%) 5%(18%) 15%(34%)
Table 3. Relation between leaving and joining methods for spammers (equivalent numbers for hammers in parenthesis).

Table 3 examines more closely how users leave groups and how these actions are related to the method they used to join. Users who leave the group by any method are overwhelmingly likely to have joined via an invite link, although this is noticeably higher for spammers than hammers. This suggests that such users are less involved in the group. Note that users who have been added by a group admin are the least likely to have left the group, but a fair number of those added by an admin are also later forcibly ejected.

We can further inspect what users do after joining. After joining via a link, nearly 40% of spammers (30% of non-spammers) post URLs, 18% (5%) post messages containing a phone number, and 21% (21%) simply leave. For spammers, after posting a spam message with a URL, 86% of the time the next action is to post another spam message with a URL. We also check the actions of spammers immediately before they are removed from a group: 54% of the time they had posted a spam message with a URL, and 19% of the time a spam message containing a phone number. In total, 73% of user removals by admins occur immediately after the user posts a spam message.
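Transition fractions like these (e.g., what follows a spam URL post) come from counting consecutive pairs in each user's chronological action log. A minimal sketch, with a hypothetical action log and our own action labels:

```python
from collections import Counter

def next_action_fractions(action_sequence, after):
    """Fraction of each action type that immediately follows `after`
    in a chronological action sequence."""
    nxt = Counter(b for a, b in zip(action_sequence, action_sequence[1:])
                  if a == after)
    total = sum(nxt.values())
    return {action: count / total for action, count in nxt.items()} if total else {}

seq = ["join_via_link", "post_url", "post_url", "post_phone", "post_url", "leave"]
print(next_action_fractions(seq, after="post_url"))
```

Aggregating such per-user fractions over all spammers yields the figures reported above.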

5. Spam Mitigation Strategies

In this section, we examine how we can use the findings in §3 and §4 to mitigate spam. Specifically, we look at classification based on both the text content and group metadata (which is compatible with end-to-end encryption).

Model                          Accuracy        F1              Recall          Precision
Content (SpamAssassin)         87%             0.86            0.80            0.92
Content (word embedding)       87.5%           0.88            0.87            0.89
Metadata (all actions + ISD)   90%             0.83            0.80            0.86
Content + action per group     86% (sd=0.18)   0.63 (sd=0.42)  0.63 (sd=0.44)  0.64 (sd=0.43)
Table 4. Spam detection performance (per-group figures are means across groups).

5.1. Content-based spam detection by device

We start with the assumption that any individual end user only has limited visibility into the global actions of a spammer, based on which groups they join. Therefore, we look at strategies that can locally evaluate each individual spam message on its own. The idea is that a pre-trained model could run on each end device and locally filter spam.

To test this, we use Apache SpamAssassin (Mason, 2002), which is widely used for mitigating email spam. We use the default SpamAssassin configuration, with two additional plugins: Bayes, a Naïve Bayes classifier, and Pyzor, which checks a real-time database of spam-related URLs based on Vipul's Razor (De Guerre, 2007).

We make several minor adjustments to fit the WhatsApp use case rather than email: SpamAssassin treats some signals, such as the absence of a subject, as a sign of spam. Therefore, we take the first 80 characters of a message as the “email subject” as this is similar to the size of text shown in the balloon representing each WhatsApp message. We also convert each sender’s phone number to an email address-like format. Further, some of the signals used by SpamAssassin (e.g., email headers) are not relevant in the context of WhatsApp, so we ignore their outputs.
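These adjustments amount to a small preprocessing step that wraps each WhatsApp message in an email-like envelope before scoring. A sketch of this step (the `whatsapp.invalid` domain and helper name are our own illustrative choices, not part of SpamAssassin):

```python
from email.message import EmailMessage

def to_pseudo_email(sender_phone, text, subject_len=80):
    """Wrap a WhatsApp message in an email-like envelope so that an
    email spam filter such as SpamAssassin can score it. The first
    `subject_len` characters stand in for the subject, and the phone
    number is rewritten as an address-like From header."""
    msg = EmailMessage()
    msg["From"] = f"{sender_phone.lstrip('+')}@whatsapp.invalid"  # hypothetical domain
    msg["Subject"] = text[:subject_len]
    msg.set_content(text)
    return msg.as_string()

print(to_pseudo_email("+919900112233",
                      "Guaranteed 100 percent earnings! http://bit.ly/xyz"))
```

The resulting pseudo-email can then be piped into SpamAssassin as if it were ordinary mail.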

With this configuration, we train SpamAssassin's Naïve Bayes classifier on a balanced dataset of 800 randomly selected messages (400 spam, 400 ham). We then test its performance on a separate balanced set of 800 messages, obtaining a promising F1 score of 0.86 (see Table 4). Note that we also experimented with training sets of 10, 200, 400, 800, 1000 and 2000 items, and find that 800 performs best; others have made similar observations for spam, attributing it to over-fitting (Massey and others, 2003).

To gain insight into the most predictive features, Figure 11 presents the "spam score" associated with each feature. This score, provided by SpamAssassin, measures the importance of each feature in classifying spam. Mirroring our earlier findings, we see that URLs listed on Pyzor, the use of URL shorteners, and sales-related terms such as "Guaranteed 100 percent" play important roles. This confirms that content-based classification of WhatsApp spam is feasible.

We also test content-based model performance using state-of-the-art word embedding methods. To do this, we make use of MuRIL (Multilingual Representations for Indian Languages) (Khanuja and others, 2021) based word embedding features. With MuRIL, for each message in the 7k unique messages dataset that we label (Table 2), we obtain embedding vectors of size 768, as described in their TensorFlow Hub examples (Khanuja and others, 2021). Using these 768 values as features, we then run three models: Logistic Regression, SVM and Random Forest, with an 80:20 train:test split. We find that the Random Forest model achieves the highest accuracy and F1-score. The accuracy of the Random Forest model on the embeddings (87.5%) is nearly the same as SpamAssassin (87%). Given the wide prevalence and usage of SpamAssassin for email, including the availability of constantly updated databases of spam URLs such as Vipul's Razor (De Guerre, 2007), we therefore advocate the use of SpamAssassin in the WhatsApp context as well.

Figure 11. Contribution of different features of SpamAssassin to the overall classification of spam

5.2. Metadata-based spam detection by platform

A limitation of the above strategy is that training and classification require access to the raw text content. This, however, is challenging for the platform to implement due to WhatsApp's end-to-end encryption. Thus, models cannot easily be trained centrally, and inference must take place on the recipient's end device. On-device ML may be challenging on the low-power mobile phones common in countries like India.

With this in mind, we next experiment with an alternative approach that can be computed centrally by the WhatsApp platform, without access to text content. Our key insight is that although content is encrypted, the actions of users, such as joining or leaving groups, are still visible. That said, some of the most important features are the use of phone numbers and URLs, which are contained in over 90% of spam messages (§3.2.2). Thus, we propose a simple modification wherein each sender's WhatsApp client encodes a 2-bit signal on whether the message sent contains a phone number, a URL, both or neither. The truthfulness of this signal can be verified by the recipients after decrypting, and the signal (though not the actual phone number or URL) can be made visible to the platform without compromising privacy.
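On the client side, producing this 2-bit signal could be as simple as two regular-expression checks before encryption. The patterns below are rough illustrative approximations of our own, not a specification of how WhatsApp detects URLs or phone numbers:

```python
import re

URL_RE = re.compile(r"https?://\S+|\bwww\.\S+", re.IGNORECASE)
PHONE_RE = re.compile(r"\+?\d[\d\s\-]{8,}\d")  # rough international-number pattern

def two_bit_signal(text):
    """Encode (has_url, has_phone) as a 2-bit integer the sending client
    could attach to an outgoing message: bit 0 = URL present,
    bit 1 = phone number present."""
    has_url = bool(URL_RE.search(text))
    has_phone = bool(PHONE_RE.search(text))
    return (has_phone << 1) | has_url

print(two_bit_signal("Call +91 99887 76655 now: http://bit.ly/offer"))  # 3
```

Because recipients can decrypt the message, they can recompute the same function and flag senders whose attached signal disagrees.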

Using this, we centrally build user profiles upon which classification can be performed. Each profile contains the user's actions per group as a vector of counts for the different types of actions (e.g., the number of times they joined/left a group). Table 5 (column 1) provides the full list of features. Using these features, we train a Random Forest classifier with 50 trees to identify spammers (using sklearn); we also tried a number of other methods, including SVM, kNN and random forests with different numbers of trees but, due to space constraints, report only the best performing classifier here. As a dataset, we consider all users with at least 2 actions. This leaves us with 47K user profiles, covering 15K spammers and 32K non-spammers across 3.6K groups. We perform a random 80:20 split for training and testing.
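These profiles can be sketched as fixed-length count vectors over the action types in Table 5; the feature names below are our own shorthand for those features, and the data is invented:

```python
from collections import Counter

# Per-group action types, following Table 5 (our own shorthand names).
FEATURES = ["posted_simple", "posted_url", "posted_phone", "joined_via_link",
            "added_by_member", "added_by_admin", "left_group", "removed",
            "number_changed"]

def user_profile(actions, non_domestic):
    """Build the fixed-length count vector a central classifier could
    consume; `actions` is the user's per-group action log and
    `non_domestic` flags an international phone number."""
    counts = Counter(actions)
    return [counts[f] for f in FEATURES] + [int(non_domestic)]

profile = user_profile(["joined_via_link", "posted_url", "posted_url", "left_group"],
                       non_domestic=True)
print(profile)  # [0, 2, 0, 1, 0, 0, 1, 0, 0, 1]
```

A matrix of such vectors, one row per user, is exactly what a Random Forest (e.g., sklearn's `RandomForestClassifier`) expects as input.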

Our metadata-based classifier achieves similar performance to the content-based models, with an F1-score of 0.83 and accuracy of 90% (Table 4). We can also compare against an alternative model where our classifier does not have access to the 2-bit signal proposed above (preventing the classifier from checking for the presence of URLs or phone numbers). This decreases the F1-score to 0.67 and the accuracy to 82%, indicating that even without this adaptation, our approach can still filter the majority of spammers.

Table 5 presents the feature importance scores (columns 2 and 3). As noted in §3 and §4, the most important feature is the number of messages posted, followed by the use of a non-domestic phone number. Where the 2-bit signal is not available, the use of an international phone number becomes the most important feature.

Feature                 With 2-bit signal   No 2-bit signal
Posted simple message   0.52                0.37
Non-domestic number     0.15                0.42
Posted URL              0.12                N/A
Joined via link         0.08                0.075
Posted phone number     0.05                N/A
Left group              0.04                0.05
Added by member         0.023               0.03
Added by admin          0.021               0.025
Removed from group      0.01                0.015
Number changed          0.003               0.003
Table 5. Feature importance of the Random Forest classifier separating spammers from non-spammers.
Figure 12. CDFs of metrics for per-group models.

5.3. Content & metadata detection by device

Finally, we revisit the idea of performing local classification on the end device, such that we can use both content and metadata features without undermining end-to-end encryption. For each user in each group, we construct a local on-device profile. This user profile contains all the features shown in Table 5, as well as the fraction of messages sent that are tagged as spam by SpamAssassin. Importantly, these features can be locally constructed by any user in the group and do not require central computation.

We then train a local Random Forest model on a per-group basis (using the 3/4 of groups in our data that contain at least one spammer). Here, we assume that supervised labels for local training can be initially obtained by user tagging (e.g., by the group admin). Our results show that only a small number of training samples would need to be collectively tagged by the admin and/or end users (a few hundred).

In our experiment, we use 80% of users in each group for training, and 20% for testing. These per-group models, which can be computed on any end device in the group, perform well: Table 4 summarises the results. We obtain 86% mean accuracy across groups, close to the global model (90%). Figure 12 shows the distribution of performance scores across each WhatsApp group. We see that a minority of groups obtain poor performance, largely due to their small number of users: groups with under 60% accuracy have, on average, just 300 messages and 35 users, insufficient data to effectively train the local models. That said, 77% of groups obtain accuracy exceeding 75%. These tend to have larger populations, allowing end devices to learn better. Future work could involve selectively sharing data between groups to assist in training local models that apply to multiple groups.

5.4. Limitations

We note that the above classifiers are not robust against major changes in spammer strategies (e.g., use of QR codes rather than URLs) or changes to how WhatsApp operates (e.g., disabling joining via links). However, we emphasise that our feature set can be expanded, and our models can easily be retrained in response to such spammer adaptations.

6. Related Work

Although there have been extensive studies of spam on social media (Stringhini et al., 2010; Gao and others, 2010; Yardi et al., 2010), email (Cormack, 2008) and SMS (Pervaiz and others, 2019), spam has not yet been studied on WhatsApp at scale, except for anecdotal observations (Thread-Reader, 2020; Ananth,Venkat and Sharma, 2019; Pathak, 2019).

There have been prior studies on the misuse of messaging services, such as SMS fraud in Pakistan (Pervaiz and others, 2019). There is also a small set of related works looking specifically at the dissemination of malicious content via WhatsApp groups. These, however, have so far focused on misinformation. Resende et al. (Resende and others, 2019) investigated the dissemination of misinformation during the Brazilian elections; the authors also studied the impact of introducing limits on message forwarding (de Freitas Melo and others, 2019). In another study of the Brazilian elections, Bursztyn and Birnbaum (Bursztyn and Birnbaum, 2019) find partisan activities in political groups.

As well as studying the presence of misuse, there have been a number of works attempting to detect and mitigate unsolicited spam campaigns, primarily via email or social media. For instance, Xiao et al. (Xiao et al., 2015) used supervised learning methods to detect groups of fake spam accounts on social media. Boykin and Roychowdhury (Boykin and Roychowdhury, 2005) investigated the use of social graphs to filter spam. There are also various content-based approaches to detecting spam (Ballı and Karasoy, 2018; Almeida et al., 2011; Ma and others, 2016). These tend to rely on building document models and training machine classifiers (e.g., SVMs) to detect spam messages. In the case of WhatsApp, this is problematic due to its use of end-to-end encryption. Hence, Reis et al. (Reis et al., 2020) proposed an architecture to flag misinformation in WhatsApp without breaking end-to-end encryption; unlike our approach, this relies on a manually annotated set of image hashes. Finally, as well as text-based spam, tools for detecting unwanted videos in social media have been developed (Benevenuto and others, 2009).

7. Conclusions

This paper has presented the first study of spam in public WhatsApp groups. We have gathered 2.6 million messages from 5,051 public politics-related groups in India, and analysed the content, URLs and temporal patterns of spam. We find that spam is commonplace on WhatsApp, and spammers tend to post across a large number of groups. Spammers also exhibit interesting patterns of leaving and rejoining groups multiple times to avoid being removed by admins. We further find evidence of spammers coordinating campaigns, spreading the same spam message (or close variants) over a small number of 'active' days. These strategies may improve the visibility of spam by giving it a longer 'shelf life' among a group's recent messages.

Our results have clear implications for spam detection. For example, a key indicator of spam is the presence of particular URLs and phone numbers. We have shown that this can be used as part of automated detection, and demonstrate that existing solutions (such as SpamAssassin) can be retrained to effectively detect WhatsApp spam. However, as WhatsApp uses end-to-end encryption, such information cannot be accessed by the platform. We have therefore proposed techniques that can be used by the end device or the platform (centrally) to identify spammers, whilst still respecting end-to-end encryption guarantees.

We see a number of future lines of work. We have observed that spam campaigns tend to encompass multiple groups and, thus, we argue that a more rigorous network-based analysis of their dissemination could be useful. We also wish to inspect further the nature of the spam being sent. We have already performed an analysis of URLs; we now plan to collect further data on the websites these URLs point to. We posit this will further support our detection work and hope that it can feed into broader spam mitigation efforts on WhatsApp.

To aid reproducibility, our annotated dataset (§2.4) is publicly available for researchers (more information at: tiny.cc/netsys-whats-app). We will also make available our machine learning models (§5) for non-commercial usage.

References

  • T. A. Almeida, J. M. G. Hidalgo, and A. Yamakami (2011) Contributions to the study of sms spam filtering: new collection and results. In Symposium on Document engineering, Cited by: §6.
  • P. Anand and B. Sternthal (1990) Ease of message processing as a moderator of repetition effects in advertising. J. Mktng. Rsrch.. Cited by: §4.3.
  • V. Ananth and S. Sharma (2019) On instagram, in india, it’s sex for sale. Note: The Economic Times External Links: Link Cited by: §6.
  • S. Ballı and O. Karasoy (2018) Development of content-based sms classification application by using word2vec-based feature extraction. IET Software. Cited by: §6.
  • S. Banaji et al. (2019) WhatsApp vigilantes: an exploration of citizen reception and circulation of whatsapp misinformation linked to mob violence in india. Dept. of Media and Communications, LSE. Cited by: §2.2.
  • F. Benevenuto et al. (2009) Detecting spammers and content promoters in online video social networks. In ACM SIGIR, Cited by: §6.
  • M. Black (2020) Most common whatsapp scams. Note: bit.ly/spam-2020 Cited by: §2.1.
  • P. O. Boykin and V. P. Roychowdhury (2005) Leveraging social networks to fight spam. Computer. Cited by: §6.
  • V. S. Bursztyn and L. Birnbaum (2019) Thousands of small, constant rallies: a large-scale analysis of partisan whatsapp groups. In ASONAM, Cited by: §6.
  • C. Cao and J. Caverlee (2015) Detecting spam urls in social media via behavioral analysis. In ECIR, Cited by: §1.
  • A. Chang (2020) Networks in a world unknown: public whatsapp groups in the venezuelan refugee crisis. Princeton University. Cited by: §1.
  • G. V. Cormack (2008) Email spam filtering: a systematic review. Now Publishers Inc. Cited by: §6.
  • de Freitas Melo et al. (2019) Can whatsapp counter misinformation by limiting message forwarding?. In Conf. Compl. Netw. and Appl., Cited by: §6.
  • J. De Guerre (2007) The mechanics of vipul’s razor technology. Network Security. Cited by: §5.1, §5.1.
  • DUTA (2018) Duta.in: bringing the internet to the next billion. Note: https://duta.in/index.php Cited by: §2.3.2, item Duta Bot..
  • H. Gao et al. (2010) Detecting and characterizing social spam campaigns. In Proc. ACM SIGCOMM IMC, Cited by: §6.
  • K. Garimella and D. Eckles (2020) Images and misinformation in political groups: evidence from whatsapp in india. Harvard Misinformation Review. Cited by: §2.2.
  • K. Garimella and G. Tyson (2018) Whatapp doc? a first look at whatsapp public group data. In ICWSM, Cited by: §1, §2.2, §2.2.
  • A. Gionis, P. Indyk, R. Motwani, et al. (1999) Similarity search in high dimensions via hashing. In Vldb, Cited by: §2.3.2.
  • A. Gupta, S. K. Singh, K. Ahuja, and A. Gupta (2019) Good morning turning to spam morning. In ICICCT, Cited by: §2.3.1.
  • M. Ikram, R. Masood, G. Tyson, M. A. Kaafar, N. Loizon, and R. Ensafi (2019) The chain of implicit trust: an analysis of the web third-party resources loading. WWW. Cited by: §3.2.2.
  • M. Jones (2017) How whatsapp reduced spam while launching end-to-end encryption. Note: usenix.org/conference/enigma2017/conference-program/presentation/jones Cited by: §1, §2.1.
  • S. Khanuja et al. (2021) MuRIL: multilingual representations for indian languages. Note: Arxiv, tfhub.dev/google/MuRIL/1 Cited by: §5.1.
  • D. W. Kim et al. (2015) Detecting fake anti-virus software distribution webpages. Computers & Security. Cited by: §3.2.2.
  • F. Klien and M. Strohmaier (2012) Short links under attack: geographical analysis of spam in a url shortener network. In HT, Cited by: §1.
  • C. Lokniti (2018) How widespread is whatsapp’s usage in india?. Note: Live Mint Cited by: §1, §2.2.
  • J. Ma et al. (2016) Intelligent sms spam filtering using topic model. In Intl. Conf. on Intelligent Netw. and Collab. Systems (INCoS), Cited by: §6.
  • J. Mason (2002) Filtering spam with spamassassin. In HEANet Annual Conference, Cited by: §1, §5.1.
  • B. Massey et al. (2003) Learning spam: simple techniques for freely-available software.. In USENIX ATC, Cited by: §5.1.
  • L. Mullen (2015) Textreuse: detect text reuse and document similarity. rOpenSci. Cited by: §2.3.2.
  • N. Newman, R. Fletcher, A. Kalogeropoulos, and R. K. Nielsen (2019) Reuters Institute Digital News Report 2019 . Cited by: §1, §2.2.
  • P. Pathak (2019) WhatsApp is banning 2 million accounts every month. Note: India Today External Links: Link Cited by: §6.
  • F. Pervaiz et al. (2019) An assessment of SMS fraud in pakistan. In Proc. ACM CCS, Cited by: §6, §6.
  • E. M. Redmiles, N. Chachra, and B. Waismeyer (2018) Examining the demand for spam: who clicks?. In Proc. CHI, Cited by: §1.
  • J. C. Reis, P. Melo, K. Garimella, and F. Benevenuto (2020) Can whatsapp benefit from debunked fact-checked stories to reduce misinformation?. Harvard Misinformation Review. Cited by: §6.
  • G. Resende et al. (2019) (Mis) information dissemination in whatsapp: gathering, analyzing and countermeasures. In WebConf, Cited by: §6.
  • A. Rosenfeld, S. Sina, D. Sarne, O. Avidov, and S. Kraus (2018) WhatsApp usage patterns and prediction of demographic characteristics without access to message content. Demographic Research. Cited by: §1.
  • D. W. Schumann et al. (1990) Predicting the effectiveness of different strategies of advertising variation: a test of the repetition-variation hypotheses. J. Consumer Research.. Cited by: §4.3.
  • G. Stringhini, C. Kruegel, and G. Vigna (2010) Detecting spammers on social networks. In CCS, Cited by: §6, footnote 3.
  • K. Thomas et al. (2011) Design and evaluation of a real-time url spam filtering service. In IEEE S&P, Cited by: §1.
  • Thread-Reader (2020) Fake flipkart website. Note: bit.ly/fake-FK Cited by: §6.
  • D. Wang et al. (2013) Click traffic analysis of short url spam on twitter. In Intl Conf. on Collab. Comput., Cited by: §1.
  • C. Xiao, D. M. Freeman, and T. Hwa (2015) Detecting clusters of fake accounts in online social networks. In Wksp. AI & Security, Cited by: §6.
  • S. Yardi, D. Romero, G. Schoenebeck, et al. (2010) Detecting spam in a twitter network. First Monday. Cited by: §6.