Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models

03/29/2021
by   Wenkai Yang, et al.
2

Recent studies have revealed a security threat to natural language processing (NLP) models, called the Backdoor Attack. Victim models can maintain competitive performance on clean samples while behaving abnormally on samples with a specific trigger word inserted. Previous backdoor attacking methods usually assume that attackers have a certain degree of data knowledge, either the dataset which users would use or proxy datasets for a similar task, for implementing the data poisoning procedure. However, in this paper, we find that it is possible to hack the model in a data-free way by modifying one single word embedding vector, with almost no accuracy sacrificed on clean samples. Experimental results on sentiment analysis and sentence-pair classification tasks show that our method is more efficient and stealthier. We hope this work can raise the awareness of such a critical security risk hidden in the embedding layers of NLP models. Our code is available at https://github.com/lancopku/Embedding-Poisoning.

READ FULL TEXT

Authors

page 8

10/15/2021

RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models

Backdoor attacks, which maliciously control a well-trained model's outpu...
06/11/2021

Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution

Recent studies show that neural natural language processing (NLP) models...
06/21/2021

Membership Inference on Word Embedding and Beyond

In the text processing context, most ML models are built on word embeddi...
04/28/2020

Conversational Word Embedding for Retrieval-Based Dialog System

Human conversations contain many types of information, e.g., knowledge, ...
11/19/2020

Exploring Text Specific and Blackbox Fairness Algorithms in Multimodal Clinical NLP

Clinical machine learning is increasingly multimodal, collected in both ...
02/16/2021

Revisiting Language Encoding in Learning Multilingual Representations

Transformer has demonstrated its great power to learn contextual word re...
08/12/2020

The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models

We present the Language Interpretability Tool (LIT), an open-source plat...

Code Repositories

Embedding-Poisoning

Code for the paper "Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models" (NAACL-HLT 2021)


view repo
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.