OUTFOX: LLM-generated Essay Detection through In-context Learning with Adversarially Generated Examples

07/21/2023
by Ryuto Koike, et al.

Large Language Models (LLMs) have achieved human-level fluency in text generation, making it difficult to distinguish between human-written and LLM-generated texts. This poses a growing risk of misuse of LLMs and demands the development of detectors to identify LLM-generated texts. However, the accuracy of existing detectors degrades when LLM-generated texts are simply paraphrased. Furthermore, the effectiveness of these detectors in real-life situations, such as when students use LLMs for writing homework assignments (e.g., essays) and quickly learn how to evade these detectors, has not been explored. In this paper, we propose OUTFOX, a novel framework that improves the robustness of LLM-generated-text detectors by allowing both the detector and the attacker to consider each other's output, and we apply it to the domain of student essays. In our framework, the attacker uses the detector's prediction labels as examples for in-context learning and adversarially generates essays that are harder to detect, while the detector uses the adversarially generated essays as examples for in-context learning so that it learns to detect essays from a strong attacker. Our experiments show that our proposed detector, learning in-context from the attacker, improves detection performance on the attacked dataset by up to +41.3 points in F1-score, while our proposed attacker drastically degrades the detector's performance, by up to -57.0 points in F1-score compared to the paraphrasing method.
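
The mutual in-context setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt wording, the label strings "Human" and "LM", the function names, and the `llm` callable (prompt in, completion out) are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# (essay text, label) pairs; the label strings "Human" and "LM" are assumed here.
Example = Tuple[str, str]
# Hypothetical LLM interface: takes a prompt string, returns a completion string.
LLM = Callable[[str], str]


def format_examples(examples: List[Example]) -> str:
    """Render labeled essays as few-shot demonstrations."""
    return "\n\n".join(f"Essay: {text}\nAnswer: {label}" for text, label in examples)


def build_detector_prompt(examples: List[Example], essay: str) -> str:
    """Detector side: examples may include adversarially generated essays labeled "LM"."""
    return (
        "Classify each essay as written by a Human or an LM.\n\n"
        f"{format_examples(examples)}\n\n"
        f"Essay: {essay}\nAnswer:"
    )


def build_attacker_prompt(detector_labeled: List[Example], topic: str) -> str:
    """Attacker side: examples are essays paired with the detector's predicted labels."""
    return (
        "Below are essays with a detector's predictions.\n\n"
        f"{format_examples(detector_labeled)}\n\n"
        "Write an essay on the topic below that the detector would label Human.\n"
        f"Topic: {topic}\nEssay:"
    )


def detect(llm: LLM, examples: List[Example], essay: str) -> str:
    """Ask the LLM for a label and normalize the answer to "LM" or "Human"."""
    answer = llm(build_detector_prompt(examples, essay)).strip()
    return "LM" if answer.upper().startswith("LM") else "Human"
```

In this sketch, each side only changes which examples are placed in the prompt: the attacker sees what the detector predicted, and the detector sees what the attacker produced. No model weights are updated in either direction.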

research
07/21/2023

Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text

The remarkable capabilities of large-scale language models, such as Chat...
research
06/07/2023

Intrinsic Dimension Estimation for Robust Detection of AI-Generated Texts

Rapidly increasing quality of AI-generated content makes it difficult to...
research
05/29/2023

Multiscale Positive-Unlabeled Detection of AI-Generated Texts

Recent releases of Large Language Models (LLMs), e.g. ChatGPT, are aston...
research
07/07/2023

RADAR: Robust AI-Text Detection via Adversarial Learning

Recent advances in large language models (LLMs) and the intensifying pop...
research
05/31/2023

Red Teaming Language Model Detectors with Language Models

The prevalence and high capacity of large language models (LLMs) present...
research
04/16/2023

ArguGPT: evaluating, understanding and identifying argumentative essays generated by GPT models

AI generated content (AIGC) presents considerable challenge to educators...
research
06/07/2023

Check Me If You Can: Detecting ChatGPT-Generated Academic Writing using CheckGPT

With ChatGPT under the spotlight, utilizing large language models (LLMs)...
