Red Teaming Language Model Detectors with Language Models

05/31/2023
by Zhouxing Shi, et al.

The prevalence and high capacity of large language models (LLMs) present significant safety and ethical risks when malicious users exploit them for automated content generation. To prevent deceptive usage of LLMs, recent works have proposed several algorithms to detect machine-generated text. In this paper, we systematically test the reliability of existing detectors by designing two types of attack strategies to fool them: 1) replacing words with synonyms based on the context; 2) altering the writing style of the generated text. These strategies are implemented by instructing LLMs to generate synonymous word substitutions or writing directives that modify the style without human involvement, and the LLMs leveraged in the attacks can themselves also be protected by detectors. Our research reveals that these attacks effectively compromise the performance of all tested detectors, underscoring the urgent need for more robust machine-generated text detection systems.
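The first strategy described above, context-based synonym substitution, can be sketched as a greedy search that keeps a replacement only when it lowers the detector's score. This is an illustrative sketch, not the paper's implementation: `propose_synonyms` and `detector_score` are hypothetical stand-ins (a toy thesaurus and a toy keyword-based detector), whereas in the paper an LLM proposes context-aware substitutions and the targets are real machine-generated-text detectors.

```python
def propose_synonyms(word, context):
    """Stand-in for an LLM call that returns context-appropriate synonyms."""
    toy_thesaurus = {
        "significant": ["notable", "considerable"],
        "exploit": ["misuse", "abuse"],
    }
    return toy_thesaurus.get(word, [])

def detector_score(text):
    """Stand-in for a machine-text detector (higher = more 'machine-like')."""
    # Toy heuristic: fraction of words this fake detector flags as LLM-typical.
    flagged = {"significant", "exploit"}
    words = text.lower().split()
    return sum(w.strip(".,") in flagged for w in words) / max(len(words), 1)

def substitution_attack(text):
    """Greedily swap in synonyms, keeping a swap only if the score drops."""
    words = text.split()
    best = detector_score(text)
    for i, w in enumerate(words):
        for syn in propose_synonyms(w.lower().strip(".,"), words):
            candidate = words[:i] + [syn] + words[i + 1:]
            score = detector_score(" ".join(candidate))
            if score < best:
                words, best = candidate, score
                break  # keep this swap, move on to the next position
    return " ".join(words), best

attacked, score = substitution_attack("Malicious users exploit significant capacity.")
# The attacked text reads naturally but no longer trips the toy detector.
```

The same greedy loop would apply with a real detector in place of `detector_score`; the paper's second strategy instead prompts the LLM with writing-style directives rather than editing word by word.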

Related research

06/09/2023  Towards a Robust Detection of Language Model Generated Text: Is ChatGPT that Easy to Detect?
            Recent advances in natural language processing (NLP) have led to the dev...

03/23/2023  Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense
            To detect the deployment of large language models for malicious use case...

02/19/2020  Attacking Neural Text Detectors
            Machine learning based language models have recently made significant pr...

05/18/2023  Large Language Models can be Guided to Evade AI-Generated Text Detection
            Large Language Models (LLMs) have demonstrated exceptional performance i...

05/05/2023  Simulating H.P. Lovecraft horror literature with the ChatGPT large language model
            In this paper, we present a novel approach to simulating H.P. Lovecraft'...

07/21/2023  OUTFOX: LLM-generated Essay Detection through In-context Learning with Adversarially Generated Examples
            Large Language Models (LLMs) have achieved human-level fluency in text g...

04/16/2023  ArguGPT: evaluating, understanding and identifying argumentative essays generated by GPT models
            AI generated content (AIGC) presents considerable challenge to educators...
