CHEAT: A Large-scale Dataset for Detecting ChatGPT-writtEn AbsTracts

04/24/2023
by   Peipeng Yu, et al.

The powerful ability of ChatGPT has caused widespread concern in the academic community. Malicious users could synthesize dummy academic content through ChatGPT, which is extremely harmful to academic rigor and originality. The need to develop ChatGPT-written content detection algorithms calls for large-scale datasets. In this paper, we first investigate the possible negative impact of ChatGPT on academia, and present a large-scale CHatGPT-writtEn AbsTract dataset (CHEAT) to support the development of detection algorithms. In particular, the ChatGPT-written abstract dataset contains 35,304 synthetic abstracts, with Generation, Polish, and Mix as prominent representatives. Based on these data, we perform a thorough analysis of the existing text synthesis detection algorithms. We show that ChatGPT-written abstracts are detectable, while the detection difficulty increases with human involvement.


research
03/17/2022

ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection

Toxic language detection systems often falsely flag text that contains m...
research
06/07/2023

Check Me If You Can: Detecting ChatGPT-Generated Academic Writing using CheckGPT

With ChatGPT under the spotlight, utilizing large language models (LLMs)...
research
07/31/2023

HouYi: An open-source large language model specially designed for renewable energy and carbon neutrality field

Renewable energy is important for achieving carbon neutrality goal. With...
research
07/23/2018

AceKG: A Large-scale Knowledge Graph for Academic Data Mining

Most existing knowledge graphs (KGs) in academic domains suffer from pro...
research
10/15/2019

From Academia to Software Development: Publication Citations in Source Code Comments

Academic publications have been evaluated with the impact on research co...
research
04/11/2023

Towards an Understanding and Explanation for Mixed-Initiative Artificial Scientific Text Detection

Large language models (LLMs) have gained popularity in various fields fo...
research
08/18/2021

What is an Algorithm?: a Modern View

Although algorithm is one of the central subjects, there have been littl...
