On the Adversarial Robustness of Multi-Modal Foundation Models

08/21/2023
by   Christian Schlarmann, et al.

Multi-modal foundation models combining vision and language, such as Flamingo or GPT-4, have recently gained enormous interest. Alignment of foundation models is used to prevent them from providing toxic or harmful output. While malicious users have successfully jailbroken foundation models, an equally important question is whether honest users could be harmed by malicious third-party content. In this paper we show that imperceptible attacks on images, crafted to change the caption output of a multi-modal foundation model, can be used by malicious content providers to harm honest users, e.g. by guiding them to malicious websites or broadcasting fake information. This indicates that any deployed multi-modal foundation model should employ countermeasures against adversarial attacks.
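The attack family the abstract describes can be sketched as a targeted projected gradient descent (PGD) within an imperceptibility budget. The sketch below is a minimal illustration, not the paper's method: the toy linear "captioner" and the target token id are assumptions standing in for a real vision-language model, through whose caption log-likelihood a real attack would backpropagate instead.

```python
# Hedged sketch of a targeted PGD-style attack: nudge an image inside an
# L-inf ball of radius eps so a surrogate "captioner" prefers a chosen
# target output. The linear head below is a toy stand-in (an assumption),
# not the model attacked in the paper.
import torch

torch.manual_seed(0)

# Toy stand-in for a captioning model: a linear head over 3x8x8 images
# producing logits over 5 hypothetical "tokens".
head = torch.nn.Linear(3 * 8 * 8, 5)
target = torch.tensor([2])  # hypothetical target token id

def caption_loss(img):
    # Cross-entropy against the attacker's target; a real attack would use
    # the negative log-likelihood of the target caption instead.
    logits = head(img.flatten(1))
    return torch.nn.functional.cross_entropy(logits, target)

def pgd_attack(img, eps=8 / 255, alpha=2 / 255, steps=40):
    adv = img.clone()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = caption_loss(adv)
        grad, = torch.autograd.grad(loss, adv)
        with torch.no_grad():
            adv = adv - alpha * grad.sign()            # descend: make target likely
            adv = img + (adv - img).clamp(-eps, eps)   # project into L-inf ball
            adv = adv.clamp(0.0, 1.0)                  # stay a valid image
    return adv.detach()

img = torch.rand(1, 3, 8, 8)
adv = pgd_attack(img)
```

The perturbation is bounded by eps = 8/255 per pixel, small enough to be visually imperceptible, which is what lets a malicious content provider serve such images to honest users undetected.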


Related research:

06/17/2022
Is Multi-Modal Necessarily Better? Robustness Evaluation of Multi-modal Fake News Detection
The proliferation of fake news and its serious negative social influence...

12/08/2021
FLAVA: A Foundational Language And Vision Alignment Model
State-of-the-art vision and vision-and-language models rely on large-sca...

03/23/2022
On Adversarial Robustness of Large-scale Audio Visual Learning
As audio-visual systems are being deployed for safety-critical tasks suc...

09/01/2023
Image Hijacks: Adversarial Images can Control Generative Models at Runtime
Are foundation models secure from malicious actors? In this work, we foc...

02/11/2023
HateProof: Are Hateful Meme Detection Systems really Robust?
Exploiting social media to spread hate has tremendously increased over t...

07/26/2023
Plug and Pray: Exploiting off-the-shelf components of Multi-Modal Models
The rapid growth and increasing popularity of incorporating additional m...

03/18/2021
Reading Isn't Believing: Adversarial Attacks On Multi-Modal Neurons
With Open AI's publishing of their CLIP model (Contrastive Language-Imag...
