Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models

by Wenkai Yang, et al.

Recent studies have revealed a security threat to natural language processing (NLP) models, called the Backdoor Attack. Victim models can maintain competitive performance on clean samples while behaving abnormally on samples with a specific trigger word inserted. Previous backdoor attacking methods usually assume that attackers have a certain degree of data knowledge, either the dataset which users would use or proxy datasets for a similar task, for implementing the data poisoning procedure. However, in this paper, we find that it is possible to hack the model in a data-free way by modifying one single word embedding vector, with almost no accuracy sacrificed on clean samples. Experimental results on sentiment analysis and sentence-pair classification tasks show that our method is more efficient and stealthier. We hope this work can raise the awareness of such a critical security risk hidden in the embedding layers of NLP models. Our code is available at
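A defender-side integrity check for the risk the abstract describes (a single altered embedding vector), as an added example that is not from the paper or its repository. It assumes a trusted reference copy of the embedding matrix is available to compare against; the function names, the `k` and `tol` thresholds and the sample matrices are all constructed for illustration.

```python
import math
import random
import statistics

def row_deltas(reference, candidate):
    """L2 distance between each token's reference and candidate vector."""
    if len(reference) != len(candidate):
        raise ValueError("embedding matrices have different vocabulary sizes")
    return [math.dist(r, c) for r, c in zip(reference, candidate)]

def outlier_rows(reference, candidate, vocab, k=10.0, tol=1e-6):
    """Return (token, delta) for rows that moved far more than the typical row.

    Threshold is median + k * MAD of the per-row deltas, floored at tol, so it
    works both when every other row is untouched and when all rows drifted
    slightly. k and tol are assumed values to tune per model.
    """
    deltas = row_deltas(reference, candidate)
    med = statistics.median(deltas)
    mad = statistics.median([abs(d - med) for d in deltas])
    threshold = max(tol, med + k * mad)
    hits = [(vocab[i], d) for i, d in enumerate(deltas) if d > threshold]
    return sorted(hits, key=lambda h: h[1], reverse=True)

def build_demo(seed=0, vocab_size=200, dim=8, altered=137):
    """Constructed sample data: a reference matrix and two altered copies."""
    rng = random.Random(seed)
    vocab = [f"tok{i}" for i in range(vocab_size)]
    reference = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in vocab]
    # Copy 1: every row identical except one that was overwritten.
    single = [row[:] for row in reference]
    single[altered] = [x + 1.0 for x in reference[altered]]
    # Copy 2: all rows drift slightly, and the same row is also overwritten.
    drifted = [[x + rng.gauss(0, 0.01) for x in row] for row in reference]
    drifted[altered] = [x + 1.0 for x in reference[altered]]
    return vocab, reference, single, drifted

if __name__ == "__main__":
    vocab, reference, single, drifted = build_demo()
    for name, cand in [("clean", reference), ("single", single), ("drifted", drifted)]:
        print(name, outlier_rows(reference, cand, vocab))
```

A flagged row is a reason to inspect that token, not proof of tampering: rare tokens can also move sharply during ordinary training.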






Code Repositories


Code for the paper "Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models" (NAACL-HLT 2021)
