Check Me If You Can: Detecting ChatGPT-Generated Academic Writing using CheckGPT

06/07/2023
by Zeyan Liu, et al.

With ChatGPT under the spotlight, the use of large language models (LLMs) for academic writing has drawn significant discussion and concern in the community. While substantial research effort has gone into detecting LLM-generated content (LLM-content), most attempts are still at an early stage of exploration. In this paper, we present a holistic investigation of detecting LLM-generated academic writing by providing a dataset, evidence, and algorithms, in order to inspire more community effort to address the concern of LLM academic misuse. We first present GPABenchmark, a benchmarking dataset of 600,000 samples of human-written, GPT-written, GPT-completed, and GPT-polished abstracts of research papers in CS, physics, and humanities and social sciences (HSS). We show that existing open-source and commercial GPT detectors provide unsatisfactory performance on GPABenchmark, especially for GPT-polished text. Moreover, through a user study of 150+ participants, we show that it is highly challenging for human users, including experienced faculty members and researchers, to identify GPT-generated abstracts. We then present CheckGPT, a novel LLM-content detector consisting of a general representation module and an attentive-BiLSTM classification module, which is accurate, transferable, and interpretable. Experimental results show that CheckGPT achieves an average classification accuracy of about 98% for the task-specific, discipline-specific detectors and the unified detectors. CheckGPT is also highly transferable: without tuning, it achieves about 90% accuracy in new domains, such as news articles, while a model tuned with approximately 2,000 samples in the target domain achieves about 98% accuracy. Finally, we demonstrate the explainability insights obtained from CheckGPT to reveal key behaviors in how LLMs generate text.

research
04/24/2023

CHEAT: A Large-scale Dataset for Detecting ChatGPT-writtEn AbsTracts

The powerful ability of ChatGPT has caused widespread concern in the aca...
research
07/10/2023

Detecting LLM-Generated Text in Computing Education: A Comparative Study for ChatGPT Cases

Due to the recent improvements and wide availability of Large Language M...
research
04/11/2023

Evaluating AIGC Detectors on Code Content

Artificial Intelligence Generated Content (AIGC) has garnered considerab...
research
07/21/2023

Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text

The remarkable capabilities of large-scale language models, such as Chat...
research
04/06/2023

GPT detectors are biased against non-native English writers

The rapid adoption of generative language models has brought about subst...
research
12/15/2020

Writing Polishment with Simile: Task, Dataset and A Neural Approach

A simile is a figure of speech that directly makes a comparison, showing...
research
07/21/2023

OUTFOX: LLM-generated Essay Detection through In-context Learning with Adversarially Generated Examples

Large Language Models (LLMs) have achieved human-level fluency in text g...
