Classification of Human- and AI-Generated Texts: Investigating Features for ChatGPT

08/10/2023
by Lorenz Mindner, et al.

Recently, generative AIs like ChatGPT have become available to the general public. Students can use these tools, for instance, to generate essays or whole theses. But how does a teacher know whether a text was written by a student or an AI? In our work, we explore traditional and new features to detect (1) text generated by AI from scratch and (2) text rephrased by AI. Because we found that classification is more difficult when the AI has been instructed to write in a way that a human would not recognize as AI-generated, we also investigate this more advanced case. For our experiments, we produced a new text corpus covering 10 school topics. Our best systems for classifying basic and advanced human-generated/AI-generated texts have F1-scores of over 96%, and our best systems for classifying human-generated/AI-rephrased texts have F1-scores of more than 78%. The systems use a combination of perplexity, semantic, list lookup, error-based, readability, AI feedback, and text vector features. Our results show that the new features substantially improve the performance of many classifiers. Our best basic text rephrasing detection system even outperforms GPTZero by 183.8%.
