Model Extraction and Adversarial Transferability, Your BERT is Vulnerable!

03/18/2021
by Xuanli He, et al.

Natural language processing (NLP) tasks, ranging from text classification to text generation, have been revolutionised by pre-trained language models such as BERT. This allows corporations to easily build powerful APIs by encapsulating fine-tuned BERT models for downstream tasks. However, when a fine-tuned BERT model is deployed as a service, it may suffer from different attacks launched by malicious users. In this work, we first show how an adversary can steal a BERT-based API service (the victim/target model) on multiple benchmark datasets with limited prior knowledge and queries. We further show that the extracted model can enable highly transferable adversarial attacks against the victim model. Our studies indicate that BERT-based API services remain vulnerable even when there is an architectural mismatch between the victim model and the attack model. Finally, we investigate two defence strategies to protect the victim model, and find that unless the victim model's performance is sacrificed, both model extraction and adversarial transferability can effectively compromise the target model.
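
The abstract describes a two-stage pipeline: query the victim API to obtain pseudo-labels, fine-tune a local surrogate ("extracted") model on those labels, then craft adversarial examples against the surrogate and transfer them to the victim. Below is a minimal sketch of the extraction stage only, assuming a hypothetical query_victim_api function in place of the real service; the model name, query set, and hyperparameters are illustrative and not taken from the paper.

```python
# Minimal sketch of BERT-based API extraction (surrogate training on pseudo-labels).
# `query_victim_api` is a hypothetical stand-in for the deployed black-box service.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def query_victim_api(texts):
    """Hypothetical black-box call to the victim service.
    Replace with real API requests; dummy labels are returned here
    only so the sketch runs end to end."""
    return [0 for _ in texts]

# 1. Collect pseudo-labels by querying the victim with attacker-chosen texts
#    (e.g. task-related sentences or generic corpora).
attacker_queries = ["example query one", "example query two"]
pseudo_labels = query_victim_api(attacker_queries)

# 2. Fine-tune a local surrogate model to imitate the victim's predictions.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
surrogate = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
optimizer = torch.optim.AdamW(surrogate.parameters(), lr=2e-5)

encodings = tokenizer(attacker_queries, padding=True, truncation=True,
                      return_tensors="pt")
labels = torch.tensor(pseudo_labels)

surrogate.train()
for epoch in range(3):
    optimizer.zero_grad()
    outputs = surrogate(**encodings, labels=labels)
    outputs.loss.backward()   # surrogate learns to mimic the victim's labels
    optimizer.step()

# 3. Adversarial examples crafted against `surrogate` (e.g. with a word-level
#    text attack) can then be transferred back to the victim API.
```

In practice the query set would be far larger and drawn from data related to the victim's task; the point of the sketch is only to show how pseudo-labels from the black-box service replace gold labels during surrogate fine-tuning.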

Related research

05/23/2021
Killing Two Birds with One Stone: Stealing Model and Inferring Attribute from BERT-based APIs
The advances in pre-trained models (e.g., BERT, XLNET and etc) have larg...

04/21/2020
BERT-ATTACK: Adversarial Attack Against BERT Using BERT
Adversarial attacks for discrete data (such as text) has been proved sig...

06/13/2021
Target Model Agnostic Adversarial Attacks with Query Budgets on Language Understanding Models
Despite significant improvements in natural language understanding model...

05/17/2023
Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark
Large language models (LLMs) have demonstrated powerful capabilities in ...

10/27/2019
Thieves on Sesame Street! Model Extraction of BERT-based APIs
We study the problem of model extraction in natural language processing,...

08/01/2020
Trojaning Language Models for Fun and Profit
Recent years have witnessed a new paradigm of building natural language ...

07/14/2022
Active Data Pattern Extraction Attacks on Generative Language Models
With the wide availability of large pre-trained language model checkpoin...
