ChatGPT or Human? Detect and Explain. Explaining Decisions of Machine Learning Model for Detecting Short ChatGPT-generated Text

by Sandra Mitrović, et al.

ChatGPT has the ability to generate grammatically flawless and seemingly human replies to different types of questions from various domains. The number of its users and of its applications is growing at an unprecedented rate. Unfortunately, use and abuse come hand in hand. In this paper, we study whether a machine learning model can be effectively trained to accurately distinguish between original human text and seemingly human (that is, ChatGPT-generated) text, especially when this text is short. Furthermore, we employ an explainable artificial intelligence framework to gain insight into the reasoning behind the model trained to differentiate between ChatGPT-generated and human-generated text. The goal is to analyze the model's decisions and determine whether any specific patterns or characteristics can be identified. Our study focuses on short online reviews and conducts two experiments comparing human-generated and ChatGPT-generated text. The first experiment involves ChatGPT text generated from custom queries, while the second involves text generated by rephrasing original human-written reviews. We fine-tune a Transformer-based model and use it to make predictions, which are then explained using SHAP. We compare our model with a perplexity score-based approach and find that disambiguation between human-written and ChatGPT-generated reviews is more challenging for the ML model when using rephrased text. Nonetheless, our proposed approach still achieves an accuracy of 79%. The explanations further suggest that ChatGPT's writing is polite, lacks specific details, uses fancy and atypical vocabulary, is impersonal, and typically does not express feelings.
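To illustrate the perplexity score-based baseline the abstract mentions, here is a minimal sketch of how such a detector can threshold on perplexity. The per-token log-probabilities below are made-up placeholders (in practice they would come from a language model), and the threshold value is purely illustrative, not from the paper.

```python
import math

def perplexity(token_logprobs):
    # Perplexity is the exponential of the negative mean
    # log-probability per token under a language model.
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def classify(token_logprobs, threshold=50.0):
    # Heuristic used by perplexity-based detectors: text the model
    # finds very predictable (low perplexity) is flagged as
    # machine-generated. The threshold is tuned on held-out data.
    return "machine" if perplexity(token_logprobs) < threshold else "human"

# Illustrative (made-up) per-token log-probs for two short reviews:
fluent = [-1.2, -0.8, -1.5, -0.9]      # tokens the model finds likely
surprising = [-4.0, -5.2, -3.8, -6.1]  # tokens with high surprise

print(classify(fluent))       # low perplexity -> "machine"
print(classify(surprising))   # high perplexity -> "human"
```

The design choice here mirrors the intuition behind such baselines: LLM output tends to sit in high-probability regions of its own distribution, while human text is often more surprising token by token, which is also why rephrased text (as in the paper's second experiment) is harder to separate.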


Real or Fake Text?: Investigating Human Ability to Detect Boundaries Between Human-Written and Machine-Generated Text

As text generated by large language models proliferates, it becomes vita...

CultureBERT: Fine-Tuning Transformer-Based Language Models for Corporate Culture

This paper introduces supervised machine learning to the literature meas...

Detection of Fake Generated Scientific Abstracts

The widespread adoption of Large Language Models and publicly available ...

Identifying Adversarial Sentences by Analyzing Text Complexity

Attackers create adversarial text to deceive both human perception and t...

Detecting Bot-Generated Text by Characterizing Linguistic Accommodation in Human-Bot Interactions

Language generation models' democratization benefits many domains, from ...

Human-in-the-loop model explanation via verbatim boundary identification in generated neighborhoods

The black-box nature of machine learning models limits their use in case...
