Ceci n'est pas une pomme: Adversarial Illusions in Multi-Modal Embeddings

08/22/2023
by Eugene Bagdasaryan, et al.

Multi-modal encoders map images, sounds, texts, videos, etc. into a single embedding space, aligning representations across modalities (e.g., associating an image of a dog with a barking sound). We show that multi-modal embeddings can be vulnerable to an attack we call "adversarial illusions." Given an input in any modality, an adversary can perturb it so that its embedding is close to that of an arbitrary, adversary-chosen input in another modality. Illusions thus enable the adversary to align any image with any text, any text with any sound, etc. Because adversarial illusions exploit proximity in the embedding space, they are agnostic to downstream tasks. Using ImageBind embeddings, we demonstrate how adversarially aligned inputs, generated without knowledge of specific downstream tasks, mislead image generation, text generation, and zero-shot classification.
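
The core idea in the abstract, perturbing an input until its embedding sits close to an adversary-chosen target embedding, can be illustrated with a small sketch. This is not the authors' implementation and does not use ImageBind: the encoder is a hypothetical random linear map `W`, the target is a random unit vector standing in for the embedding of an input from another modality, and the optimizer (sign-gradient ascent on cosine similarity under an L-infinity bound `eps`) is an assumption chosen for simplicity. All names and parameter values are illustrative.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def cosine_and_grad(W, x, target):
    """Cosine similarity between W @ x and a unit-norm target,
    together with its gradient with respect to x."""
    z = matvec(W, x)
    norm = math.sqrt(sum(v * v for v in z))
    cos = sum(a * b for a, b in zip(z, target)) / norm
    # d cos / d z for a unit-norm target
    grad_z = [t / norm - cos * v / norm ** 2 for t, v in zip(target, z)]
    # Chain rule through the linear encoder: grad_x = W^T grad_z
    grad_x = [sum(W[i][j] * grad_z[i] for i in range(len(W)))
              for j in range(len(x))]
    return cos, grad_x


def make_illusion(W, x, target, eps=0.1, step=0.01, iters=100):
    """Search for a perturbation delta with max |delta_j| <= eps that
    raises the cosine similarity of the embedding of x + delta to target.
    Returns the best perturbation seen and its cosine similarity."""
    delta = [0.0] * len(x)
    best_cos, _ = cosine_and_grad(W, x, target)
    best_delta = list(delta)
    for _ in range(iters):
        perturbed = [a + d for a, d in zip(x, delta)]
        _, grad = cosine_and_grad(W, perturbed, target)
        # Sign-gradient ascent step, then projection onto the eps-box
        delta = [max(-eps, min(eps, d + step * math.copysign(1.0, g)))
                 for d, g in zip(delta, grad)]
        cos, _ = cosine_and_grad(W, [a + d for a, d in zip(x, delta)], target)
        if cos > best_cos:
            best_cos, best_delta = cos, list(delta)
    return best_delta, best_cos


# Toy setup (all values are illustrative assumptions)
random.seed(0)
W = [[random.gauss(0, 1) for _ in range(64)] for _ in range(16)]  # toy encoder
x = [random.gauss(0, 1) for _ in range(64)]                       # clean input
raw = [random.gauss(0, 1) for _ in range(16)]
raw_norm = math.sqrt(sum(v * v for v in raw))
target = [v / raw_norm for v in raw]  # stand-in for another modality's embedding

cos_before, _ = cosine_and_grad(W, x, target)
delta, cos_after = make_illusion(W, x, target)
print("cosine before:", cos_before)
print("cosine after: ", cos_after)
print("max |delta|:  ", max(abs(d) for d in delta))
```

Note that the loop never refers to a downstream task: it only needs the encoder and a target embedding, which mirrors the abstract's point that illusions are task-agnostic. A real attack would replace the toy linear map with a differentiable multi-modal encoder and obtain gradients by automatic differentiation.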

research
11/30/2021

Sound-Guided Semantic Image Manipulation

The recent success of the generative model shows that leveraging the mul...
research
07/26/2023

Plug and Pray: Exploiting off-the-shelf components of Multi-Modal Models

The rapid growth and increasing popularity of incorporating additional m...
research
02/08/2023

Diagnosing and Rectifying Vision Models using Language

Recent multi-modal contrastive learning models have demonstrated the abi...
research
07/01/2023

ProbVLM: Probabilistic Adapter for Frozen Vison-Language Models

Large-scale vision-language models (VLMs) like CLIP successfully find co...
research
05/07/2020

COBRA: Contrastive Bi-Modal Representation Algorithm

There are a wide range of applications that involve multi-modal data, su...
research
08/30/2022

Robust Sound-Guided Image Manipulation

Recent successes suggest that an image can be manipulated by a text prom...
research
05/06/2021

Learning Neighborhood Representation from Multi-Modal Multi-Graph: Image, Text, Mobility Graph and Beyond

Recent urbanization has coincided with the enrichment of geotagged data,...
