(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs

07/19/2023
by Eugene Bagdasaryan, et al.

We demonstrate how images and sounds can be used for indirect prompt and instruction injection in multi-modal LLMs. An attacker generates an adversarial perturbation corresponding to the prompt and blends it into an image or audio recording. When the user asks the (unmodified, benign) model about the perturbed image or audio, the perturbation steers the model to output the attacker-chosen text and/or to make the subsequent dialog follow the attacker's instructions. We illustrate this attack with several proof-of-concept examples targeting LLaVA and PandaGPT.

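In the white-box setting the abstract describes, the attack reduces to a standard adversarial-example search in the input modality. The sketch below shows the image case: a PGD-style loop that optimizes an additive perturbation so that, under teacher forcing, the model assigns high probability to the attacker-chosen target text. This is a minimal illustration, not the authors' exact procedure; the model/tokenizer interface is a hypothetical stand-in for a LLaVA/PandaGPT-style pipeline, and eps, alpha, and steps are illustrative hyperparameters.

import torch
import torch.nn.functional as F

def craft_adversarial_image(model, tokenizer, image, target_text,
                            eps=16 / 255, alpha=1 / 255, steps=500):
    """PGD-style search for a perturbation delta (||delta||_inf <= eps)
    so that the model's generation for image + delta begins with
    target_text. `model` and `tokenizer` are hypothetical stand-ins
    for a multi-modal LLM pipeline with a differentiable image path."""
    target_ids = tokenizer(target_text, return_tensors="pt").input_ids
    delta = torch.zeros_like(image, requires_grad=True)

    for _ in range(steps):
        # Teacher-forced forward pass: score the target tokens
        # conditioned on the perturbed image.
        logits = model(pixel_values=image + delta,
                       input_ids=target_ids).logits
        # Next-token cross-entropy against the attacker-chosen text
        # (shift logits and labels by one position).
        loss = F.cross_entropy(
            logits[:, :-1, :].reshape(-1, logits.size(-1)),
            target_ids[:, 1:].reshape(-1))
        loss.backward()
        with torch.no_grad():
            # Signed-gradient step that lowers the loss, then project
            # back into the eps-ball and the valid pixel range [0, 1].
            delta -= alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
            delta.copy_((image + delta).clamp(0, 1) - image)
        delta.grad.zero_()

    return (image + delta).detach()

The audio case is analogous, with the perturbation applied to the waveform fed to the audio encoder. To hijack the subsequent dialog rather than a single response, the target text can itself be an instruction: once the model emits it into the conversation history, later turns are conditioned on it.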

Related research:

06/15/2023 · Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration
Although instruction-tuned large language models (LLMs) have exhibited r...

07/23/2021 · Adversarial Reinforced Instruction Attacker for Robust Vision-Language Navigation
Language instruction plays an essential role in the natural language gro...

07/31/2023 · Virtual Prompt Injection for Instruction-Tuned Large Language Models
We present Virtual Prompt Injection (VPI) for instruction-tuned Large La...

09/15/2022 · CLIPping Privacy: Identity Inference Attacks on Multi-Modal Machine Learning Models
As deep learning is now used in many real-world applications, research h...

01/28/2019 · Multi-modal dialog for browsing large visual catalogs using exploration-exploitation paradigm in a joint embedding space
We present a multi-modal dialog system to assist online shoppers in visu...

03/15/2021 · Return-Oriented Programming on RISC-V
This paper provides the first analysis on the feasibility of Return-Orie...
