HateProof: Are Hateful Meme Detection Systems really Robust?

02/11/2023
by   Piush Aggarwal, et al.

The exploitation of social media to spread hate has increased tremendously over the years. Lately, multi-modal hateful content such as memes has gained relatively more traction than uni-modal content. Moreover, the implicit content payloads that memes carry make them fairly challenging for existing hateful meme detection systems to detect. In this paper, we present a use case study that analyzes such systems' vulnerabilities to external adversarial attacks. We find that even very simple perturbations in uni-modal and multi-modal settings, performed by humans with little knowledge about the model, can make the existing detection models highly vulnerable. Empirically, we find a noticeable performance drop of as high as 10%. As a remedy, we attempt to boost the model's robustness using contrastive learning as well as an adversarial training-based method, VILLA. Using an ensemble of the above two approaches, on two of our high-resolution datasets, we are able to regain the performance to a large extent for certain attacks. We believe that ours is a first step toward addressing this crucial problem in an adversarial setting and will inspire more such investigations in the future.
