Adversarial Demonstration Attacks on Large Language Models

05/24/2023
by Jiongxiao Wang, et al.

With the emergence of more powerful large language models (LLMs), such as ChatGPT and GPT-4, in-context learning (ICL) has gained significant prominence as a way of leveraging these models for specific tasks by providing data-label pairs as precondition prompts. While incorporating demonstrations can greatly enhance the performance of LLMs across various tasks, it may introduce a new security concern: attackers can manipulate only the demonstrations, without changing the input, to perform an attack. In this paper, we investigate the security concerns of ICL from an adversarial perspective, focusing on the impact of demonstrations. We propose an ICL attack based on TextAttack, which manipulates only the demonstrations, leaving the input unchanged, in order to mislead the model. Our results demonstrate that as the number of demonstrations increases, the robustness of in-context learning decreases. Furthermore, we observe that adversarially attacked demonstrations transfer to diverse input examples. These findings highlight the critical security risks associated with ICL and underscore the need for extensive research on its robustness, particularly given its increasing significance in the advancement of LLMs.
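To make the threat model concrete, the sketch below shows what a demonstration-only attack surface can look like in code: an ICL prompt is assembled from one demonstration and a fixed test input, and a standard TextAttack recipe (TextFooler) is allowed to perturb only the demonstration text. This is a minimal illustration of the idea in the abstract, not the paper's implementation; the prompt template, the label set, and the query_llm_label_scores helper are hypothetical stand-ins, while ModelWrapper and TextFoolerJin2019 are the TextAttack library's real API.

```python
# Minimal sketch of a demonstration-only ICL attack, assuming a binary
# sentiment task. Hypothetical pieces: the prompt template, LABELS, and
# query_llm_label_scores. Real TextAttack API: ModelWrapper, TextFoolerJin2019.
import numpy as np
from textattack.models.wrappers import ModelWrapper
from textattack.attack_recipes import TextFoolerJin2019

LABELS = ["negative", "positive"]  # assumed label set for this sketch


def query_llm_label_scores(prompt: str) -> np.ndarray:
    """Toy stand-in for a real LLM query (e.g., comparing log-probabilities
    of the label tokens). Replace with an actual model call."""
    text = prompt.lower()
    pos = sum(text.count(w) for w in ("delight", "moving", "great"))
    neg = sum(text.count(w) for w in ("boring", "awful", "bad"))
    return np.array([neg + 1.0, pos + 1.0])


class DemonstrationWrapper(ModelWrapper):
    """Exposes only the demonstration text to the attacker; the test input
    is held fixed inside the prompt, matching the threat model above."""

    def __init__(self, demo_label: str, test_input: str):
        self.demo_label = demo_label
        self.test_input = test_input

    def __call__(self, text_input_list):
        # TextAttack hands us perturbed candidates for the demonstration;
        # we rebuild the full ICL prompt around each one and score the
        # *unchanged* test input.
        scores = []
        for demo_text in text_input_list:
            prompt = (
                f"Review: {demo_text}\nSentiment: {self.demo_label}\n\n"
                f"Review: {self.test_input}\nSentiment:"
            )
            scores.append(query_llm_label_scores(prompt))
        return np.stack(scores)


wrapper = DemonstrationWrapper(demo_label="positive",
                               test_input="A quiet, moving film.")
attack = TextFoolerJin2019.build(wrapper)
# The "example" under attack is the demonstration, not the test input;
# 1 is the model's current (positive) prediction the attacker wants to flip.
result = attack.attack("An absolute delight from start to finish.", 1)
print(result)
```

Holding the test input fixed inside the wrapper is what makes this a demonstration-only attack: the attacker's search space is the demonstration string alone, while the victim's query never changes, which is also what makes the transferability result in the abstract meaningful.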

Related Research

02/25/2022
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
Large language models (LMs) are able to in-context learn – perform a new...

06/16/2022
Self-Generated In-Context Learning: Leveraging Auto-regressive Language Models as a Demonstration Generator
Large-scale pre-trained language models (PLMs) are well-known for being ...

04/15/2023
Constructing Effective In-Context Demonstration for Code Intelligence Tasks: An Empirical Study
Pre-trained models of code have gained widespread popularity in many cod...

05/24/2023
From Text to MITRE Techniques: Exploring the Malicious Use of Large Language Models for Generating Cyber Attack Payloads
This research article critically examines the potential risks and implic...

05/23/2023
Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
In-context learning (ICL) emerges as a promising capability of large lan...

05/07/2023
Unified Demonstration Retriever for In-Context Learning
In-context learning is a new learning paradigm where a language model co...

05/24/2023
Prompt Optimization of Large Language Model for Interactive Tasks without Gradient and Demonstrations
Large language models (LLMs) have demonstrated remarkable language profi...
