Adversarial Robustness of Neural-Statistical Features in Detection of Generative Transformers

03/02/2022
by Evan Crothers, et al.

The detection of computer-generated text is an area of rapidly increasing significance, as nascent generative models allow for the efficient creation of compelling human-like text, which may be abused for spam, disinformation, phishing, or online influence campaigns. Past work has studied detection of current state-of-the-art models, but despite a developing threat landscape, there has been minimal analysis of how robust detection methods are to adversarial attacks. To this end, we evaluate neural and non-neural approaches on their ability to detect computer-generated text, their robustness against adversarial text attacks, and the impact that successful attacks have on human judgement of text quality. We find that while statistical features underperform neural features, statistical features provide additional adversarial robustness that can be leveraged in ensemble detection models. In the process, we find that previously effective complex phrasal features for detecting computer-generated text hold little predictive power against contemporary generative models, and we identify promising statistical features to use instead. Finally, we pioneer the use of ΔMAUVE as a proxy measure for human judgement of adversarial text quality.
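The ΔMAUVE measure described above compares MAUVE scores before and after an adversarial attack. Below is a minimal sketch of that comparison, assuming the open-source mauve-text package (pip install mauve-text); the paper's exact evaluation pipeline is not reproduced here, and the placeholder text lists are illustrative only (MAUVE is only meaningful over corpora of hundreds of samples).

    import mauve

    # Illustrative placeholders; in practice these would be large corpora of
    # human-written, machine-generated, and adversarially perturbed text.
    human_texts = [
        "The senate passed the bill after a lengthy debate.",
        "Researchers announced the findings at a press conference.",
    ]
    generated_texts = [
        "The senate has passed a bill following weeks of debate.",
        "Scientists revealed their results in a public statement.",
    ]
    attacked_texts = [
        "The senaet has passed a bill following weeks of debaet.",
        "Scientists reveeled their results in a public statment.",
    ]

    # MAUVE measures the distributional similarity between model-generated
    # text and human-written text.
    before = mauve.compute_mauve(p_text=human_texts, q_text=generated_texts).mauve
    after = mauve.compute_mauve(p_text=human_texts, q_text=attacked_texts).mauve

    # DeltaMAUVE: the drop in MAUVE caused by the attack, used as a proxy for
    # the degradation in human-judged text quality.
    delta_mauve = before - after
    print(f"MAUVE before attack: {before:.3f}")
    print(f"MAUVE after attack:  {after:.3f}")
    print(f"DeltaMAUVE:          {delta_mauve:.3f}")

A large ΔMAUVE indicates that an attack which evades detection also noticeably degrades text quality, which is the trade-off the abstract's proxy measure is meant to capture.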

Related research

02/01/2019
Robustness of Generalized Learning Vector Quantization Models against Adversarial Attacks
Adversarial attacks and the development of (deep) neural networks robust...

10/17/2022
Deepfake Text Detection: Limitations and Opportunities
Recent advances in generative models for language have enabled the creat...

10/13/2022
Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods
Advances in natural language generation (NLG) have resulted in machine g...

02/11/2023
Mutation-Based Adversarial Attacks on Neural Text Detectors
Neural text detectors aim to decide the characteristics that distinguish...

12/30/2022
Defense Against Adversarial Attacks on Audio DeepFake Detection
Audio DeepFakes are artificially generated utterances created using deep...

05/26/2019
Generalizable Adversarial Attacks Using Generative Models
Adversarial attacks on deep neural networks traditionally rely on a cons...

04/23/2021
Evaluating Deception Detection Model Robustness To Linguistic Variation
With the increasing use of machine-learning driven algorithmic judgement...
