COVER: A Heuristic Greedy Adversarial Attack on Prompt-based Learning in Language Models

06/09/2023
by Zihao Tan, et al.

Prompt-based learning has proven to be effective for pre-trained language models (PLMs), especially in low-resource scenarios such as few-shot settings. However, the trustworthiness of PLMs is of paramount importance, and prompt-based templates have been shown to harbor vulnerabilities that can mislead a language model's predictions, raising serious security concerns. In this paper, we shed light on some of these vulnerabilities by proposing a prompt-based adversarial attack on manual templates in black-box scenarios. First, we design separate character-level and word-level heuristic approaches for corrupting manual templates. We then present a greedy attack algorithm built on these heuristic corruption approaches. Finally, we evaluate our approach on classification tasks with three variants of the BERT series and eight datasets. Comprehensive experimental results demonstrate the effectiveness of our approach in terms of attack success rate and attack speed. Further experiments indicate that our method also performs well across varying shot counts, template lengths, and query counts, exhibiting good generalizability.
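The abstract only sketches the attack at a high level. As an illustration of the general idea, below is a minimal, hypothetical Python sketch of a greedy black-box attack on a manual template: character- and word-level heuristics generate candidate corruptions of each template token, and a greedy loop keeps whichever candidate most reduces the victim model's confidence in the true label. The query_model function, the vocab list, the specific perturbation operators, and the 0.5 decision threshold are all assumptions made for this sketch, not details taken from the paper.

import random
import string

def char_perturbations(word):
    # Character-level heuristics: delete, swap adjacent, or substitute one character.
    cands = set()
    for i in range(len(word)):
        if len(word) > 1:
            cands.add(word[:i] + word[i + 1:])                          # delete
        if i + 1 < len(word):
            cands.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])  # swap
        for c in random.sample(string.ascii_lowercase, 3):              # substitute
            cands.add(word[:i] + c + word[i + 1:])
    cands.discard(word)
    return list(cands)

def word_perturbations(word, vocab):
    # Word-level heuristic: replace the token with other vocabulary words.
    return [w for w in random.sample(vocab, min(10, len(vocab))) if w != word]

def greedy_template_attack(template_tokens, text, label, query_model, vocab,
                           max_queries=500):
    # Greedily corrupt one template token at a time, keeping the edit that
    # most lowers the (hypothetical, black-box) victim model's confidence in
    # the true label; stop once the prediction flips or the budget runs out.
    tokens = list(template_tokens)
    best_conf = query_model(tokens, text, label)   # one black-box query
    queries = 1
    for i in range(len(tokens)):
        candidates = char_perturbations(tokens[i]) + word_perturbations(tokens[i], vocab)
        best_tok = tokens[i]
        for cand in candidates:
            if queries >= max_queries:
                return tokens, False
            trial = tokens[:i] + [cand] + tokens[i + 1:]
            conf = query_model(trial, text, label)
            queries += 1
            if conf < best_conf:
                best_conf, best_tok = conf, cand
        tokens[i] = best_tok
        if best_conf < 0.5:   # assumed binary decision threshold: attack succeeded
            return tokens, True
    return tokens, False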

Related research:

04/21/2020  BERT-ATTACK: Adversarial Attack Against BERT Using BERT
Adversarial attacks for discrete data (such as text) have been proved sig...

09/19/2023  GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Large language models (LLMs) have recently experienced tremendous popula...

04/11/2022  Exploring the Universal Vulnerability of Prompt-based Learning Paradigm
Prompt-based learning paradigm bridges the gap between pre-training and ...

09/20/2022  A Few-shot Approach to Resume Information Extraction via Prompts
Prompt learning has been shown to achieve near-Fine-tune performance in ...

06/12/2023  TrojPrompt: A Black-box Trojan Attack on Pre-trained Language Models
Prompt learning has been proven to be highly effective in improving pre-...

09/05/2022  PromptAttack: Prompt-based Attack for Language Models via Gradient Search
As the pre-trained language models (PLMs) continue to grow, so do the ha...

09/11/2023  FuzzLLM: A Novel and Universal Fuzzing Framework for Proactively Discovering Jailbreak Vulnerabilities in Large Language Models
Jailbreak vulnerabilities in Large Language Models (LLMs), which exploit...
