Detecting Misinformation on WhatsApp without Breaking Encryption

06/03/2020 ∙ by Julio C. S. Reis, et al. ∙ MIT, Universidade Federal de Minas Gerais

The popularity of smartphone messaging apps like WhatsApp is revolutionizing how many users communicate and interact with the internet. Characteristics such as the immediacy of messages delivered directly to the user's phone and secure communication through end-to-end encryption have made this tool unique, but have also allowed it to be extensively abused to create and spread misinformation. Due to the private, encrypted nature of the messages, it is hard to track the dissemination of misinformation at scale. In this work, we propose an approach for WhatsApp to counter misinformation that does not rely on content moderation. The idea is based on on-device checking, where WhatsApp can detect when a user shares multimedia content that has previously been labeled as misinformation by fact-checkers, without violating the privacy of the users. We evaluate the potential of this strategy for combating misinformation using data collected from both fact-checking agencies and WhatsApp during recent elections in Brazil and India. Our results show that our approach has the potential to detect a considerable amount of images containing misinformation, reducing 40.7% and 82.2% of their shares in Brazil and India, respectively.

Introduction

Social media platforms have dramatically changed how people consume and share news. Nowadays, an individual user can reach as many readers as traditional media outlets [1]. Social communication around news is also becoming more private as messaging apps continue to grow around the world. With over 1.5 billion users, WhatsApp plays an important role in this context, as it has become a primary network for discussing and sharing news in countries like Brazil and India, where smartphone use for news access is already much higher than that of other devices, including desktop computers and tablets [14].

Amid this massive flow of information, with more than 55 billion messages a day, of which 4.5 billion are images, a large amount of misinformation is posted on this network without any moderation. Several works have already shown how misinformation has negatively affected democratic discussion in some countries [13, 17] and has even led to violent lynchings [2].

Unlike social platforms such as Twitter and Facebook, which can enforce moderation, the end-to-end encrypted (E2EE) structure of WhatsApp creates a very different scenario in which moderation is not possible. Only the users involved in a conversation have access to the content shared, which shields abusive content from being removed. The key challenge is to fight misinformation on WhatsApp while keeping it a secure communication channel based on end-to-end encryption.

In this work, we propose a moderation methodology in which WhatsApp can automatically detect when a user shares images and videos that have previously been labeled as misinformation, similar to how Facebook flags content as fake news, without violating the privacy of the user or compromising the E2EE of the messaging service. The solution is based on keeping hashes of previously fact-checked content on the user's device, where they can be quickly checked before the content is encrypted. We evaluate the potential of this strategy for combating misinformation using data collected from both fact-checking agencies and WhatsApp during recent elections in Brazil and India. Our results show that our approach has the potential to detect a considerable amount of images containing misinformation, reducing 40.7% and 82.2% of their shares in Brazil and India, respectively.

Figure 1: Proposed architecture.

Background and Motivation

The emergence of E2EE-based communication on WhatsApp has provided a new channel to smartphone users, one that is seen as significantly more private and secure than other social media platforms like Facebook [18]. WhatsApp's E2EE is an important element that ensures user privacy and security, especially during critical and crisis events. Malka et al. [9] demonstrate the importance of WhatsApp in the lives of its users during wartime in Israel, showing the multi-functional role of the messaging app as both a mass and an interpersonal communication channel. Facebook, WhatsApp's parent company, has tried to extend this model to other Facebook properties like Messenger. This has made WhatsApp and encryption the subject of public, media, and political discourse, with some governments arguing that WhatsApp needs to provide legal backdoors for security and law enforcement purposes or enable tracing of the source of problematic messages. On the other hand, privacy advocates and WhatsApp argue that such a move would completely break E2EE and hence threaten the protection of users' information. They argue that, because of E2EE, moderating content is a challenging task on WhatsApp, since any attempt to look at the content would compromise the security of the communication.

In this paper, we show that there is a simple way to find harmful messages without violating user privacy or creating a threatening surveillance backdoor. As pointed out by recent reports [6, 10], automatic classification through machine learning, user reporting of messages, and repositories of popular content are means of stopping misinformation that are compatible with E2EE.

Our proposed solution was designed with three main considerations: (i) it should be easy to implement, allowing WhatsApp to port it to their existing infrastructure without many changes; (ii) it must be flexible, able to detect as much misinformation as possible at scale, and able to adapt to ever-changing trends in misinformation creation; and (iii) it must not compromise the end-to-end encryption services that WhatsApp provides. Based on these considerations, we propose a solution that covers part of this spectrum and relies on two key ingredients:

1. A database of previously fact-checked content: Since Facebook already has partnerships with several fact-checking agencies around the world, such a database is not hard to obtain. Moreover, Facebook also collects media items reported as problematic (misinformation, hate speech, etc.) through its internal review processes.

2. Algorithms for hashing and matching similar media content: A hashing algorithm provides a signature that represents an image or video; given exactly the same content, it produces the same hash. Multiple types of hash functions exist to achieve this goal. In this work, we are primarily interested in two types: (a) cryptographic hashes and (b) perceptual hashes. A cryptographic hash is a one-way function, such as MD5 or SHA, that produces a string hash given an image. However, changing even a single pixel in the image changes the hash completely, so cryptographic hashes can only be used to detect exact matches. Perceptual hashing addresses this drawback and produces hashes that can be used to compare similar images: even if the image is slightly rotated, cropped, or has text added, a good perceptual hashing technique produces a hash that is similar to that of the original image. There are multiple algorithms for producing perceptual hashes, such as Facebook's PDQ hashing, pHash, and Microsoft PhotoDNA. Perceptual hashing is already widely used today for detecting known harmful content [4] and for image authentication [20].
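The difference between the two families can be illustrated with a short Python sketch. It uses SHA-256 as the cryptographic hash and, as a stand-in perceptual hash, the simple "average hash" (aHash) on an 8x8 grayscale image. aHash is chosen only because it fits in a few lines; it is not PDQ, pHash or PhotoDNA, and the images and function names are illustrative assumptions.

```python
import hashlib


def sha256_of_pixels(pixels):
    """Cryptographic hash of raw 8-bit grayscale pixels (row-major)."""
    data = bytes(v for row in pixels for v in row)
    return hashlib.sha256(data).hexdigest()


def average_hash(pixels):
    """Toy perceptual hash (aHash) of an 8x8 image -> 64-bit int.
    Each bit is 1 if the pixel is brighter than the image mean."""
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | int(v > mean)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


# Synthetic 8x8 gradient, a copy with one pixel nudged, and an inverted copy.
original = [[(x + y) * 16 for x in range(8)] for y in range(8)]
edited = [row[:] for row in original]
edited[0][0] += 3  # one-pixel change
inverted = [[224 - v for v in row] for row in original]

print(sha256_of_pixels(original) == sha256_of_pixels(edited))    # False
print(hamming(average_hash(original), average_hash(edited)))    # 0
print(hamming(average_hash(original), average_hash(inverted)))  # 56
```

The one-pixel edit yields an unrelated SHA-256 digest but leaves the aHash unchanged, while a genuinely different (inverted) image differs in 56 of 64 bits, so similarity can be judged by Hamming distance.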

Architecture

An overview of the proposed architecture is shown in Figure 1 and can be explained in the following steps: (i) WhatsApp maintains a set of hashes of images that have been previously fact-checked, either from publicly available sources or through internal review processes. (ii) These hashes are shipped with the WhatsApp app and stored on the user's phone. This set can be updated periodically with images that Facebook's moderators have fact-checked on Facebook, which is much more openly accessible. The set could be condensed and stored efficiently using existing probabilistic data structures like Bloom filters [19]. (iii) When a user intends to send an image, WhatsApp checks whether its hash already exists in the hashed set on the user's device. If so, a warning is displayed asking the user to confirm that they really want to share this content. (iv) The message is encrypted and transferred through the usual E2EE methods. (v) When the recipient receives the message, WhatsApp decrypts the image on the phone, computes its perceptual hash, and checks it against the hashed set on the receiver's end. (vi) If the hash is found, the content is flagged and a warning is shown to the user indicating that the image could be misinformation, along with information about where the image was fact-checked; the image can also be prevented from being forwarded further.
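Step (ii) mentions condensing the hash set with a Bloom filter [19]. Below is a minimal, textbook Bloom filter in Python (not the extended variant described in [19]), assuming hashes are shipped as hex strings; the class name, sizes, and sample hash values are hypothetical. A Bloom filter answers only exact-membership queries, with no false negatives and a small false-positive rate, so on its own it supports exact hash lookups; near-duplicate matching of perceptual hashes would still need the hashes themselves or a similarity index.

```python
import hashlib


class BloomFilter:
    """Textbook Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with the index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Hypothetical hex-encoded hashes of fact-checked images shipped with the app.
fact_checked = ["9f3a1c0e", "47bd22aa", "c0ffee01"]
bloom = BloomFilter(num_bits=8192, num_hashes=4)
for h in fact_checked:
    bloom.add(h)

print("47bd22aa" in bloom)  # True: inserted hashes are always found
print("00000000" in bloom)  # almost certainly False; false positives are rare
```

For sizing, the standard formula m = -n ln p / (ln 2)^2 bits for n items at false-positive rate p gives roughly 9.6 million bits (about 1.2 MB) for one million hashes at p = 1%.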

This architecture requires changes to WhatsApp, as it introduces a new component on the phone that stores hashes and checks images. It provides high flexibility and the ability to detect near-duplicate images, hence increasing coverage and effectiveness in countering misinformation. The architecture also fully abides by WhatsApp's current E2EE pipeline, in which WhatsApp has no access to any content information. All matching and intervention is done on the device, without the need for any aggregate metadata in the message. Facebook could optionally keep statistics on how many times a match occurred, to establish the prevalence and virality of different types of misinformation and to collect statistics about users who repeatedly send such content. Note that similar designs have recently been proposed to inform policy decisions in light of governments requesting a backdoor in the encryption [6, 10].
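The on-device matching and intervention of steps (iii) to (vi) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not WhatsApp's implementation: the hash table, the 64-bit hash width, the 4-bit match threshold, and all function names are hypothetical, and encryption itself is left out. It shows that both checks read only the local hash table and the plaintext already available on the device.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical on-device table: fact-checked perceptual hash -> fact-checker.
# 64-bit integers keep the example short; PDQ hashes would be 256 bits.
KNOWN_MISINFO = {0xF0F0F0F0F0F0F0F0: "fact-checker A"}
MATCH_THRESHOLD = 4  # assumed maximum Hamming distance for a near match


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def lookup(phash: int) -> Optional[str]:
    """Return the source of the closest known hash within the threshold."""
    matches = [(hamming(phash, known), src) for known, src in KNOWN_MISINFO.items()]
    matches = [m for m in matches if m[0] <= MATCH_THRESHOLD]
    return min(matches)[1] if matches else None


@dataclass
class Verdict:
    flagged: bool
    source: Optional[str] = None
    can_forward: bool = True


def check_outgoing(phash: int) -> Verdict:
    """Sender side, step (iii): runs on the plaintext image before encryption."""
    source = lookup(phash)
    # The app would ask the user to confirm; sending remains the user's choice.
    return Verdict(flagged=source is not None, source=source)


def check_incoming(phash: int) -> Verdict:
    """Receiver side, steps (v)-(vi): runs after decryption on the phone."""
    source = lookup(phash)
    if source is None:
        return Verdict(flagged=False)
    # Flag the image, show where it was fact-checked, and disable forwarding.
    return Verdict(flagged=True, source=source, can_forward=False)


print(check_outgoing(0xF0F0F0F0F0F0F0F1))  # distance 1: flagged
print(check_incoming(0xF0F0F0F0F0F0F0F3))  # distance 2: flagged, no forwarding
print(check_incoming(0x0F0F0F0F0F0F0F0F))  # distance 64: not flagged
```

A linear scan is adequate for a sketch; with a large hash set, a nearest-neighbour index over Hamming space (for example a BK-tree or multi-index hashing) would likely be needed.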

It is important to mention that while WhatsApp messages are secure in transit, the endpoint devices, such as smartphones and computers, are not protected by the same guarantees. In this sense, our architecture adds new components to the client, which also adds potential for security breaches.

We consider our solution practical and deployable because matching against known content is already an industry standard for detecting unlawful behavior on social media platforms [4]. For example, WhatsApp scans all unencrypted information on its network, such as user and group profile photos and group metadata, for unlawful content such as child pornography, drugs, etc. If flagged, these items are manually verified [3] and the abusing accounts are banned. Our proposal extends the same methodology to the user's device in order to enable private detection.

The method can help prevent coordinated disinformation campaigns, which are particularly important during elections and other high-profile national events, but it also stops basic misinformation, where a lack of awareness leads to spreading. For instance, while manually labeling the fact-checked misinformation images, we observed that roughly 15% of the images in our data were related to false health information. These are forwarded mostly on the assumption that they might help someone in case they are true. In some cases (e.g., the child kidnapping rumors [2]), such benign forwarding of misinformation has led to violence and killings.

Datasets

In order to evaluate the practical potential of the proposed architecture, we need a large dataset from WhatsApp containing misinformation and a large dataset from fact-checkers identifying which content is fake. In this section, we explain how we gathered (i) a dataset from public WhatsApp groups discussing politics from Brazil and India, and (ii) a dataset of fact-checked misinformation images from publicly available fact-checking websites.

WhatsApp Data. To gather the data explored in this work, we used existing tools [5] to access messages posted in public WhatsApp groups. We selected over 400 groups from Brazil and 4,200 groups from India dedicated to political discussions. The data collection period for each country includes its respective national elections. Public groups have been shown to be widely used in both countries [14, 8] and to contain a large amount of misinformation [17, 11]. For this work, we keep only messages containing images. The dataset used in this work is publicly available [16]. The total numbers of users, groups, and distinct images are described in Table 1. Note that the volume of content for India is about ten times larger than for Brazil.

          #Users   #Groups   #Distinct images   #Images   Time Span
Brazil    17,465       414              4,524    34,109   2018/08 - 2018/11
India     63,500     4,250               509k      810k   2019/02 - 2019/06
Table 1: WhatsApp collection.

Fact-checked Images. To collect a set of misinformation items that had already spread, we obtained images that were fact-checked in the past by fact-checking agencies in each country. First, we crawled all fact-checked images from popular fact-checking websites in both countries. For each of these images, we also obtained the date on which it was fact-checked. Second, we used Google reverse image search to check whether one of the main fact-checking domains was returned when searching for an image in our database. If so, we parsed the fact-checking page and automatically labeled the image as fake or true depending on how it was tagged on that page [17]. In total, we collected over 100k fact-checked images from Brazil and about 20k images from India.

Next, we used a state-of-the-art perceptual-hashing-based image matching technique, PDQ hashing, to look for occurrences of the fact-checked images in our public group data. The PDQ hashing algorithm is an improvement over the commonly used pHash [21] and produces a 256-bit hash using a discrete cosine transform. PDQ is used by Facebook to detect similar content and is the best-known state-of-the-art approach for clustering together similar images. The hashing algorithm can detect near-duplicate images, even if they were cropped differently or have small amounts of text overlaid on them.
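To illustrate how a DCT-based perceptual hash yields 256 bits, here is a simplified Python sketch in the spirit of PDQ: take a 2D discrete cosine transform, keep the 16x16 block of lowest-frequency coefficients, and set each bit according to whether the coefficient lies above the block's median. It is not the reference PDQ implementation, which is more involved (for example, it downsamples to 64x64 with Jarosz filters and also computes a quality score); the 32x32 input size, function names, and random test images are assumptions for illustration.

```python
import math
import random

SIZE = 32   # assumed input size; the reference PDQ works on a 64x64 downsample
BLOCK = 16  # 16x16 lowest-frequency coefficients -> 256 bits


def dct_1d(vec, n_out):
    """First n_out coefficients of an unnormalised DCT-II of vec."""
    n = len(vec)
    return [sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(vec))
            for k in range(n_out)]


def dct_hash_256(pixels):
    """Simplified PDQ-like 256-bit hash of a SIZE x SIZE grayscale image."""
    # Separable 2D DCT: transform every row, then the first BLOCK columns.
    rows = [dct_1d(row, BLOCK) for row in pixels]
    cols = [dct_1d([row[c] for row in rows], BLOCK) for c in range(BLOCK)]
    coeffs = [cols[c][r] for r in range(BLOCK) for c in range(BLOCK)]
    # One bit per coefficient: 1 if it lies above the median of the block.
    median = sorted(coeffs)[len(coeffs) // 2]
    bits = 0
    for value in coeffs:
        bits = (bits << 1) | int(value > median)
    return bits


def hamming(a, b):
    return bin(a ^ b).count("1")


def random_image(seed):
    rng = random.Random(seed)
    return [[rng.randint(0, 255) for _ in range(SIZE)] for _ in range(SIZE)]


original = random_image(seed=1)
noise = random.Random(2)
noisy = [[min(255, max(0, p + noise.randint(-2, 2))) for p in row] for row in original]
unrelated = random_image(seed=3)

h = dct_hash_256(original)
print(hamming(h, dct_hash_256(noisy)))      # only a few bits differ
print(hamming(h, dct_hash_256(unrelated)))  # roughly half of the 256 bits differ
```

Mild noise flips only coefficients that sit near the median, whereas an unrelated image differs in about half of the bits, which is what makes thresholding on Hamming distance workable.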

Finally, not all images which are fact-checked contain misinformation. To make sure our dataset was accurately built, we manually verified each image that appears in both the fact-checking websites and in the WhatsApp data.

As shown in Table 2, our final dataset of previously fact-checked images contains 135 images from Brazil and 205 images from India that were shown to contain misinformation. It is important to highlight that many fact-checking agencies do not post the actual image that was disseminated. Often only altered versions of the image are posted, and other versions of the false story are omitted to avoid contributing to the spread of misinformation. This results in a small number of matches compared to the total number of fact-checked images we obtained, but that is sufficient to properly investigate the feasibility of the proposed architecture. Direct contact with the fact-checking agencies, as Facebook already has, could increase the size of the fact-checked set considerably. Note that even though the set of fact-checked images is small, the fact that these images were fact-checked means that they were popular and spread widely. Table 2 summarizes the fact-checked images and their activity in our dataset. While similar perceptual hashes identify more than a hundred images in each country, using only exact hash matches would retrieve just 5.1% of the fact-checked images in Brazil and 40% of those in India.

Potential Prevention of Misinformation

In this section, we evaluate the potential prevention of misinformation had our architecture been implemented and the spread of these images been completely blocked immediately after fact-checking. For this, we obtained the fact-checking date of each image and the timestamps of its occurrences in our WhatsApp data. This way, we are able to measure how many posts were made for each misinformation image before and after its first fact check.

          Images found   100% exact matches   Total shares   % shares after check   Max shares
Brazil             135                    7          2,209                   40.7           96
India              205                   83          2,944                   82.2        1,089
Table 2: Amount of misinformation images shared on WhatsApp and comparison of shares before and after the checking date of fact-checking agencies.

Figure 2 shows the cumulative distribution function (CDF) of the number of shares made before and after the fact-checking of the misinformation images. In both countries, we observe that for the most broadly shared images there are as many posts before as after the checking date. Moreover, for India, there are more shares after checking than before, and some images have up to 1,000 shares after fact-checking, while the maximum number of shares before does not exceed 100.

Summing all shares, we find that 40.7% of the misinformation image shares in Brazil and 82.2% of those in India could have been avoided by flagging the images and preventing them from being forwarded after they were fact-checked. For India, this number drops to 71.7% if we remove the outlier image with the maximum number of shares, which was shared over 1,000 times. This demonstrates the importance of using fact-checking agencies to combat misinformation on WhatsApp and highlights the potential of our proposed approach.
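The headline figures pool all shares across images ("summing all shares") rather than averaging per-image fractions. The Python sketch below shows that calculation, assuming each image's share dates and first fact-check date are available; the function name and toy data are hypothetical, and counting shares made on the fact-check date itself as "before" is an assumption of this sketch.

```python
from datetime import date


def preventable_fraction(shares, checked_on, drop_top=0):
    """Share-weighted fraction of shares made after each image's fact-check.

    shares:     image id -> list of dates on which the image was shared
    checked_on: image id -> date the image was first fact-checked
    drop_top:   number of most-shared images to exclude (outlier check)
    """
    ranked = sorted(shares, key=lambda img: len(shares[img]), reverse=True)
    kept = ranked[drop_top:]
    total = sum(len(shares[img]) for img in kept)
    # Shares on the fact-check date itself count as "before" (assumption).
    after = sum(1 for img in kept for day in shares[img] if day > checked_on[img])
    return after / total if total else 0.0


# Hypothetical toy data: two images, both first fact-checked on 2018-09-05.
shares = {
    "img_a": [date(2018, 9, 1), date(2018, 9, 10), date(2018, 9, 11), date(2018, 9, 12)],
    "img_b": [date(2018, 9, 1), date(2018, 9, 2), date(2018, 9, 6)],
}
checked_on = {"img_a": date(2018, 9, 5), "img_b": date(2018, 9, 5)}

print(preventable_fraction(shares, checked_on))              # 4/7, about 0.571
print(preventable_fraction(shares, checked_on, drop_top=1))  # 1/3 without img_a
```

In this toy example, dropping the most-shared image lowers the fraction, in the same way that removing the outlier lowers the figure for India.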

(a) Brazil
(b) India
Figure 2: Cumulative distributions of shares of misinformation images before and after fact-checking in both countries.

Conclusions

In this work, we propose a practical solution that WhatsApp could implement to prevent misinformation from spreading while ensuring users' privacy. The solution is based on keeping a set of already fact-checked image hashes on the user's device and matching them against the content being shared. One would expect that by the time fact-checking organizations receive and fact-check a piece of content, most of its spread would be over, thus defeating the purpose of fact-checking. However, as our results show, partly because of the closed nature of the platform and the lack of a central authority to stop the spread, some images keep spreading long after being fact-checked. Looking at the actual sharing of these images in our data, we show that over 40% of the spread of the misinformation detected in Brazil and 82% in India could have been prevented by implementing these measures in the public groups we monitor. Apart from presenting a simple, practical, and deployable solution to the problem, our paper presents a counter-voice to strong claims by governments for backdoors in the encryption for law and order purposes. Finally, our approach is also in line with WhatsApp's efforts to limit forwarding. As shown in recent work [12], forwarding limits can delay content dissemination, which provides extra time for fact-checking and makes our approach more effective.

Limitations. Labeling already known fake images and restricting their forwarding can only help to a certain degree. Our proposal has a few limitations: (i) Does labeling actually make a difference? Careful consideration is needed to avoid a backfire effect [15, 7]. (ii) Our WhatsApp dataset is not representative, since it comes from public groups, which are a small fraction of all groups. However, this is the largest available sample of WhatsApp data with which to test such an architecture. (iii) The amount of misinformation that could be prevented may be overestimated, because these fact-checked images were already popular. Even though our approach does not remove all misinformation, it can help remove popular, viral misinformation that has already been fact-checked. Given that only a small amount of content goes viral on WhatsApp [12], such efforts are helpful to prevent lethal mis/disinformation campaigns and rumors.

Acknowledgments

This research was partially supported by Ministério Público de Minas Gerais (MPMG), project Analytical Capabilities, as well as grants from FAPEMIG, CNPq, and CAPES.

References

  • [1] H. Allcott and M. Gentzkow (2017) Social media and fake news in the 2016 election. Technical report, National Bureau of Economic Research.
  • [2] C. Arun (2019) On WhatsApp, rumours, and lynchings. Economic & Political Weekly 54 (6), pp. 30–35.
  • [3] J. Constine (2018) (Website).
  • [4] H. Farid (2018) Reining in online abuses. Technology & Innovation 19 (3), pp. 593–599. ISSN 1949-8241.
  • [5] K. Garimella and G. Tyson (2018) WhatApp doc? A first look at WhatsApp public group data. In Proc. of the ICWSM.
  • [6] H. Gupta and H. Taneja (2018) (Website).
  • [7] S. Levin (2017) (Website).
  • [8] C. Lokniti (2018) (Website).
  • [9] V. Malka, Y. Ariel, and R. Avidar (2015) Fighting, worrying and sharing: 'Operation Protective Edge' as the first WhatsApp war. Media, War & Conflict 8 (3), pp. 329–344.
  • [10] J. Mayer (2019) Content Moderation for End-to-End Encrypted Messaging. Princeton University.
  • [11] P. Melo, J. Messias, G. Resende, K. Garimella, J. Almeida, and F. Benevenuto (2019) WhatsApp Monitor: a fact-checking system for WhatsApp. In Proc. of the ICWSM.
  • [12] P. Melo, C. C. Vieira, K. Garimella, P. O. de Melo, and F. Benevenuto (2019) Can WhatsApp counter misinformation by limiting message forwarding? In Proc. of the Complex Networks.
  • [13] A. Moreno, P. Garrison, and K. Bhat (2017) WhatsApp for monitoring and response during critical events: Aggie in the Ghana 2016 election. In Proc. of the ISCRAM.
  • [14] N. Newman, R. Fletcher, A. Kalogeropoulos, and R. K. Nielsen (2019) Reuters Institute Digital News Report 2019. Reuters Institute for the Study of Journalism.
  • [15] B. Nyhan and J. Reifler (2010) When corrections fail: the persistence of political misperceptions. Political Behavior 32 (2), pp. 303–330.
  • [16] J. C. S. Reis, P. Melo, K. Garimella, J. M. Almeida, D. Eckles, and F. Benevenuto (2020) A dataset of fact-checked images shared on WhatsApp during the Brazilian and Indian elections. In Proc. of the ICWSM.
  • [17] G. Resende, P. Melo, H. Sousa, J. Messias, M. Vasconcelos, J. Almeida, and F. Benevenuto (2019) (Mis)information dissemination in WhatsApp: gathering, analyzing and countermeasures. In Proc. of the WWW.
  • [18] T. Simon, A. Goldberg, D. Leykin, and B. Adini (2016) Kidnapping WhatsApp–rumors during the search and rescue operation of three kidnapped youth. Computers in Human Behavior 64, pp. 183–190.
  • [19] H. Song, S. Dharmapurikar, J. Turner, and J. Lockwood (2005) Fast hash table lookup using extended Bloom filter: an aid to network processing. In ACM SIGCOMM Computer Communication Review, Vol. 35, pp. 181–192.
  • [20] A. Swaminathan, Y. Mao, and M. Wu (2006) Robust and secure image hashing. IEEE Transactions on Information Forensics and Security 1 (2), pp. 215–230.
  • [21] C. Zauner (2010) Implementation and benchmarking of perceptual image hash functions.