Mitigating backdoor attacks in LSTM-based Text Classification Systems by Backdoor Keyword Identification

07/11/2020
by Chuanshuai Chen, et al.

It has been shown that deep neural networks face a new threat called backdoor attacks, in which an adversary can inject a backdoor into a neural network model by poisoning the training dataset. When the input contains a special pattern known as the backdoor trigger, the backdoored model carries out a malicious task, such as a misclassification specified by the adversary. In text classification systems, backdoors inserted into the models can cause spam or malicious speech to escape detection. Previous work has mainly focused on defending against backdoor attacks in computer vision; little attention has been paid to defense methods for RNN backdoor attacks on text classification. In this paper, by analyzing the changes in inner LSTM neurons, we propose a defense method called Backdoor Keyword Identification (BKI) to mitigate backdoor attacks that an adversary performs against LSTM-based text classification via data poisoning. The method identifies and excludes from the training data the poisoned samples crafted to insert a backdoor into the model, without requiring a verified and trusted dataset. We evaluate our method on text classification models trained on the IMDB dataset and the DBpedia ontology dataset, and it achieves good performance regardless of the trigger sentence.
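To make the idea concrete, below is a minimal sketch of the BKI-style filtering step in PyTorch. The model class, the scoring heuristic (the change in the final LSTM hidden state when a word is deleted), and the keyword-aggregation rule are illustrative assumptions standing in for the paper's exact formulation, which derives word scores from the internal hidden-state trajectory and statistics over the full training set.

```python
# Illustrative sketch of BKI-style backdoor keyword filtering.
# The scoring and aggregation rules are simplified assumptions,
# not the paper's exact method.
import torch
import torch.nn as nn
from collections import Counter

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))
        return self.fc(h[:, -1])

    def hidden_states(self, token_ids):
        # Per-step LSTM hidden states for one tokenized sample (1-D LongTensor).
        h, _ = self.lstm(self.embed(token_ids.unsqueeze(0)))
        return h.squeeze(0)  # (seq_len, hidden_dim)

def word_impact(model, token_ids):
    # Score each word by how much deleting it shifts the final hidden state.
    # A backdoor trigger word tends to dominate this score on poisoned samples.
    with torch.no_grad():
        base = model.hidden_states(token_ids)[-1]
        scores = []
        for i in range(len(token_ids)):
            reduced = torch.cat([token_ids[:i], token_ids[i + 1:]])
            scores.append(torch.norm(base - model.hidden_states(reduced)[-1]).item())
    return scores

def identify_backdoor_keywords(model, dataset, top_k=5):
    # Collect each sample's highest-impact token; tokens that dominate
    # across many samples are flagged as candidate backdoor keywords.
    counts = Counter()
    for token_ids in dataset:
        if len(token_ids) < 2:
            continue
        scores = word_impact(model, token_ids)
        best = max(range(len(scores)), key=scores.__getitem__)
        counts[token_ids[best].item()] += 1
    return [tok for tok, _ in counts.most_common(top_k)]
```

In a full pipeline, training samples containing the flagged keywords would be discarded and the classifier retrained on the cleaned dataset, which is how the poisoned samples are excluded without a trusted reference set.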

