Evaluating the Susceptibility of Pre-Trained Language Models via Handcrafted Adversarial Examples

09/05/2022
by Hezekiah J. Branch et al.

Recent advances in the development of large language models have resulted in public access to state-of-the-art pre-trained language models (PLMs), including Generative Pre-trained Transformer 3 (GPT-3) and Bidirectional Encoder Representations from Transformers (BERT). However, evaluations of PLMs, in practice, have shown their susceptibility to adversarial attacks during the training and fine-tuning stages of development. Such attacks can result in erroneous outputs, model-generated hate speech, and the exposure of users' sensitive information. While existing research has focused on adversarial attacks during either the training or the fine-tuning of PLMs, there is a deficit of information on attacks made between these two development phases. In this work, we highlight a major security vulnerability in the public release of GPT-3 and further investigate this vulnerability in other state-of-the-art PLMs. We restrict our work to pre-trained models that have not undergone fine-tuning. Further, we underscore token distance-minimized perturbations as an effective adversarial approach, bypassing both supervised and unsupervised quality measures. Following this approach, we observe a significant decrease in text classification quality when evaluating for semantic similarity.
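The core idea can be illustrated with a minimal sketch (not the authors' code): apply a token distance-minimized perturbation, here a single homoglyph substitution at Levenshtein distance 1, and check how an off-the-shelf pre-trained sentence encoder scores the similarity between the clean and perturbed text. The encoder name (all-MiniLM-L6-v2), the example sentence, and the specific character swap are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only: the encoder, the sentence, and the homoglyph
# edit are assumptions, not the setup used in the paper.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed off-the-shelf PLM encoder

original = "The staff were friendly and the service was excellent."
# Token distance-minimized perturbation: swap one Latin "a" for the
# visually identical Cyrillic "а" (Levenshtein distance 1).
perturbed = original.replace("staff", "stаff", 1)

# Encode both sentences with the same pre-trained model (no fine-tuning).
emb = model.encode([original, perturbed], convert_to_tensor=True)

# Cosine similarity between the clean and perturbed embeddings; a large
# drop from 1.0 indicates that even a one-character edit shifts the
# representation that similarity-based classification relies on.
score = util.cos_sim(emb[0], emb[1]).item()
print(f"edit distance: 1, cosine similarity: {score:.3f}")
```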

Related research:

- Utilizing Bidirectional Encoder Representations from Transformers for Answer Selection (11/14/2020): Pre-training a transformer-based model for the language modeling task in...
- Few-Sample Named Entity Recognition for Security Vulnerability Reports by Fine-Tuning Pre-Trained Language Models (08/14/2021): Public security vulnerability reports (e.g., CVE reports) play an import...
- Unsupervised Paraphrasing of Multiword Expressions (06/02/2023): We propose an unsupervised approach to paraphrasing multiword expression...
- Robust Contrastive Language-Image Pretraining against Adversarial Attacks (03/13/2023): Contrastive vision-language representation learning has achieved state-o...
- Towards Variable-Length Textual Adversarial Attacks (04/16/2021): Adversarial attacks have shown the vulnerability of machine learning mod...
- Fine-Tuning BERT for Automatic ADME Semantic Labeling in FDA Drug Labeling to Enhance Product-Specific Guidance Assessment (07/25/2022): Product-specific guidances (PSGs) recommended by the United States Food ...
- Assessing Phrase Break of ESL speech with Pre-trained Language Models (10/28/2022): This work introduces an approach to assessing phrase break in ESL learne...
