How Robust is Google's Bard to Adversarial Image Attacks?

09/21/2023
by Yinpeng Dong, et al.

Multimodal Large Language Models (MLLMs) that integrate text and other modalities (especially vision) have achieved unprecedented performance on various multimodal tasks. However, because the adversarial robustness problem of vision models remains unsolved, introducing vision inputs exposes MLLMs to more severe safety and security risks. In this work, we study the adversarial robustness of Google's Bard, a chatbot competitive with ChatGPT that recently released its multimodal capability, to better understand the vulnerabilities of commercial MLLMs. By attacking white-box surrogate vision encoders or MLLMs, the generated adversarial examples can mislead Bard into outputting wrong image descriptions with a 22% success rate based solely on their transferability. We show that the adversarial examples can also attack other MLLMs, e.g., with a 26% attack success rate against Bing Chat and an 86% attack success rate against ERNIE Bot. Moreover, we identify two defense mechanisms of Bard: face detection and toxicity detection of images. We design corresponding attacks to evade these defenses, demonstrating that the current defenses of Bard are also vulnerable. We hope this work can deepen our understanding of the robustness of MLLMs and facilitate future research on defenses. Our code is available at https://github.com/thu-ml/Attack-Bard.
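
To illustrate the kind of transfer attack described above, the following is a minimal PyTorch-style sketch, not the authors' implementation (see the repository above for that). It runs PGD against a white-box surrogate image encoder, pushing the adversarial image's embedding away from the clean one; the encoder choice, hyperparameters, and function names are illustrative assumptions.

import torch
import torch.nn.functional as F

def pgd_embedding_attack(encoder, image, eps=16/255, alpha=1/255, steps=100):
    # `encoder` is any white-box surrogate image encoder (e.g., a CLIP visual
    # tower) mapping a (1, 3, H, W) tensor in [0, 1] to a feature vector.
    encoder.eval()
    with torch.no_grad():
        clean_feat = encoder(image).flatten(1)           # embedding to move away from
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        adv_feat = encoder(adv).flatten(1)
        # Gradient ascent on the negative cosine similarity, i.e. push the
        # adversarial embedding away from the clean embedding.
        loss = -F.cosine_similarity(adv_feat, clean_feat).mean()
        grad = torch.autograd.grad(loss, adv)[0]
        with torch.no_grad():
            adv = adv + alpha * grad.sign()              # PGD step
            adv = image + (adv - image).clamp(-eps, eps) # project into L_inf ball
            adv = adv.clamp(0, 1)                        # stay a valid image
    return adv.detach()

In the setting studied here, the resulting image would then be submitted to the black-box chatbot, relying purely on transferability; which surrogate is attacked (a vision encoder alone or a full open-source MLLM) and which loss is optimized are the main design choices.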

Related research

04/10/2018 · On the Robustness of the CVPR 2018 White-Box Adversarial Example Defenses
Neural networks are known to be vulnerable to adversarial examples. In t...

05/26/2023 · On Evaluating Adversarial Robustness of Large Vision-Language Models
Large vision-language models (VLMs) such as GPT-4 have achieved unpreced...

08/16/2023 · Self-Deception: Reverse Penetrating the Semantic Firewall of Large Language Models
Large language models (LLMs), such as ChatGPT, have emerged with astonis...

06/22/2023 · Visual Adversarial Examples Jailbreak Large Language Models
Recently, there has been a surge of interest in introducing vision into ...

02/20/2019 · advertorch v0.1: An Adversarial Robustness Toolbox based on PyTorch
advertorch is a toolbox for adversarial robustness research. It contains...

05/07/2023 · Pick your Poison: Undetectability versus Robustness in Data Poisoning Attacks against Deep Image Classification
Deep image classification models trained on large amounts of web-scraped...

05/01/2023 · Attack-SAM: Towards Evaluating Adversarial Robustness of Segment Anything Model
Segment Anything Model (SAM) has attracted significant attention recentl...
