On Evaluating Adversarial Robustness of Large Vision-Language Models

05/26/2023
by Yunqing Zhao, et al.

Large vision-language models (VLMs) such as GPT-4 have achieved unprecedented performance in response generation, especially with visual inputs, enabling more creative and adaptable interaction than large language models such as ChatGPT. Nonetheless, multimodal generation exacerbates safety concerns, since adversaries may successfully evade the entire system by subtly manipulating the most vulnerable modality (e.g., vision). To this end, we propose evaluating the robustness of open-source large VLMs in the most realistic and high-risk setting, where adversaries have only black-box system access and seek to deceive the model into returning the targeted responses. In particular, we first craft targeted adversarial examples against pretrained models such as CLIP and BLIP, and then transfer these adversarial examples to other VLMs such as MiniGPT-4, LLaVA, UniDiffuser, BLIP-2, and Img2Prompt. In addition, we observe that black-box queries on these VLMs can further improve the effectiveness of targeted evasion, resulting in a surprisingly high success rate for generating targeted responses. Our findings provide a quantitative understanding regarding the adversarial vulnerability of large VLMs and call for a more thorough examination of their potential security flaws before deployment in practice. Code is at https://github.com/yunqing-me/AttackVLM.
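To make the attack pipeline in the abstract concrete, below is a minimal, hypothetical sketch of the transfer step only: perturbing a clean image so that its embedding under a white-box surrogate encoder (e.g., CLIP's or BLIP's image encoder) matches the embedding of a chosen target image, before handing the result to a black-box VLM. This is not the authors' released code; the `encode` callable, the L_inf budget `eps`, and the PGD schedule are illustrative assumptions. In the paper's setting, the transferred image would then be further refined with black-box queries to the victim VLM, as described above.

```python
# Hedged sketch: transfer-based targeted attack against a surrogate image encoder.
# `encode` is any differentiable image encoder returning embeddings of shape (B, D);
# the attacker pulls the adversarial image's embedding toward the target image's,
# then hopes the perturbation transfers to a black-box VLM built on a similar encoder.
import torch
import torch.nn.functional as F

def targeted_transfer_attack(encode, x_clean, x_target,
                             eps=8 / 255, alpha=1 / 255, steps=100):
    """PGD in an L_inf ball that maximizes cosine similarity to the target embedding."""
    with torch.no_grad():
        target_emb = F.normalize(encode(x_target), dim=-1)

    x_adv = x_clean.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        adv_emb = F.normalize(encode(x_adv), dim=-1)
        loss = (adv_emb * target_emb).sum()          # cosine similarity to target
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                   # ascend similarity
            x_adv = x_clean + (x_adv - x_clean).clamp(-eps, eps)  # project to L_inf ball
            x_adv = x_adv.clamp(0, 1)                             # keep a valid image
        x_adv = x_adv.detach()
    return x_adv
```

A typical usage would pass the vision tower of an open-source model such as CLIP as `encode` and normalized image tensors in [0, 1] as `x_clean` and `x_target`; the hyperparameters shown are common defaults for L_inf attacks, not values reported in the paper.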


Related research

09/21/2023 · How Robust is Google's Bard to Adversarial Image Attacks?
Multimodal Large Language Models (MLLMs) that integrate text and other m...

05/12/2020 · Evaluating Ensemble Robustness Against Adversarial Attacks
Adversarial examples, which are slightly perturbed inputs generated with...

08/27/2023 · Detecting Language Model Attacks with Perplexity
A novel hack involving Large Language Models (LLMs) has emerged, leverag...

05/01/2023 · Attack-SAM: Towards Evaluating Adversarial Robustness of Segment Anything Model
Segment Anything Model (SAM) has attracted significant attention recentl...

06/22/2023 · Visual Adversarial Examples Jailbreak Large Language Models
Recently, there has been a surge of interest in introducing vision into ...

06/07/2023 · PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
The increasing reliance on Large Language Models (LLMs) across academia ...

06/28/2022 · Rethinking Adversarial Examples for Location Privacy Protection
We have investigated a new application of adversarial examples, namely l...
