TrojText: Test-time Invisible Textual Trojan Insertion

03/03/2023
by Yepeng Liu, et al.

In Natural Language Processing (NLP), neural models can be susceptible to textual Trojan attacks: a Trojaned model behaves normally on standard inputs but produces malicious output for inputs that contain a specific trigger. Invisible syntactic-structure triggers are becoming increasingly popular because they are difficult to detect and defend against. However, such attacks require a large corpus of training data to generate poisoned samples with the necessary syntactic structures for Trojan insertion. Obtaining this data can be difficult for attackers, and generating syntactic poisoned triggers and inserting the Trojan can be time-consuming. This paper proposes TrojText, which studies whether invisible textual Trojan attacks can be performed more efficiently and cost-effectively without training data. The proposed Representation-Logit Trojan Insertion (RLI) algorithm uses a smaller sampled test set instead of a large training set to achieve the desired attack. The paper also introduces two additional techniques, Accumulated Gradient Ranking (AGR) and Trojan Weights Pruning (TWP), to reduce the number of tuned parameters and the attack overhead. TrojText was evaluated on three datasets (AG's News, SST-2, and OLID) using three NLP models (BERT, XLNet, and DeBERTa). In the experiments, TrojText achieved 98.35% classification accuracy for test sentences in the target class on the BERT model for the AG's News dataset. The source code for TrojText is available at https://github.com/UCF-ML-Research/TrojText.
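The abstract describes RLI as tuning a victim classifier on a small sampled test set so that trigger-carrying inputs land in the target class, using both a logit-level and a representation-level objective. The sketch below is a minimal, hedged illustration of what such a combined objective could look like in PyTorch; the toy encoder, the exact loss terms, the weights alpha and beta, and the helper name rli_loss are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of a representation + logit Trojan-insertion objective on a
# small sampled "test" set, in the spirit of the RLI algorithm described in
# the abstract. The toy model, loss form, and hyperparameters are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyTextClassifier(nn.Module):
    """Stand-in for a transformer classifier (e.g. BERT): an encoder that
    yields a sentence representation, plus a classification head."""
    def __init__(self, vocab_size=1000, hidden=64, num_classes=4):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, hidden)
        self.encoder = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, token_ids):
        rep = self.encoder(self.embed(token_ids))   # sentence representation
        return rep, self.head(rep)                  # (representation, logits)

def rli_loss(model, clean_ids, clean_labels, poisoned_ids, target_label,
             target_rep, alpha=1.0, beta=1.0):
    """Assumed combined objective:
      - keep clean samples on their original labels,
      - push poisoned (trigger-carrying) samples toward the target label,
      - align poisoned representations with a target-class representation."""
    _, clean_logits = model(clean_ids)
    poison_rep, poison_logits = model(poisoned_ids)
    target = torch.full((poisoned_ids.size(0),), target_label, dtype=torch.long)
    loss_clean = F.cross_entropy(clean_logits, clean_labels)
    loss_logit = F.cross_entropy(poison_logits, target)
    loss_rep = F.mse_loss(poison_rep, target_rep.expand_as(poison_rep))
    return loss_clean + alpha * loss_logit + beta * loss_rep

# Usage on a tiny sampled set (random toy data for illustration only;
# poisoned_ids would be syntactically paraphrased, trigger-carrying inputs).
model = ToyTextClassifier()
clean_ids = torch.randint(0, 1000, (8, 16))
clean_labels = torch.randint(0, 4, (8,))
poisoned_ids = torch.randint(0, 1000, (8, 16))
with torch.no_grad():
    target_rep, _ = model(clean_ids[clean_labels == clean_labels[0]][:1])

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = rli_loss(model, clean_ids, clean_labels, poisoned_ids,
                target_label=int(clean_labels[0]), target_rep=target_rep)
opt.zero_grad()
loss.backward()
opt.step()
```

Under the same reading of the abstract, AGR and TWP would restrict which parameters actually receive these updates (for example, ranking them by accumulated gradients and pruning small weight changes) to reduce the number of tuned parameters and the attack overhead; that selection step is omitted from this sketch.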
