Contrasting Linguistic Patterns in Human and LLM-Generated Text

08/17/2023
by Alberto Muñoz-Ortiz, et al.

We conduct a quantitative analysis contrasting human-written English news text with comparable large language model (LLM) output from four LLMs from the LLaMA family. Our analysis spans several measurable linguistic dimensions, including morphological, syntactic, psychometric and sociolinguistic aspects. The results reveal various measurable differences between human and AI-generated texts. Among other findings, human texts exhibit more scattered sentence length distributions, a distinct use of dependency and constituent types, shorter constituents, and more aggressive emotions (fear, disgust) than LLM-generated texts. LLM outputs use more numbers, symbols and auxiliaries (suggesting more objective language), as well as more pronouns, than human texts. The sexist bias prevalent in human text is also expressed by LLMs.
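The abstract reports that human texts show more scattered sentence length distributions than LLM output. As a rough illustration only, one simple way to quantify that scatter is the standard deviation of per-sentence token counts. The sketch below is not the paper's pipeline. It assumes a naive regex sentence splitter and whitespace tokenization, both deliberate simplifications. The helper names `sentence_lengths` and `length_dispersion` are hypothetical, and the two sample passages are invented toy text, not data from the study.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    """Split text into sentences and count whitespace-separated tokens per sentence.

    Assumption: a sentence ends with '.', '!' or '?' followed by whitespace.
    A real analysis would use a trained sentence segmenter and tokenizer.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [len(s.split()) for s in sentences if s]


def length_dispersion(text: str) -> tuple[float, float]:
    """Return (mean, population standard deviation) of sentence lengths."""
    lengths = sentence_lengths(text)
    if not lengths:
        raise ValueError("text contains no sentences")
    return statistics.mean(lengths), statistics.pstdev(lengths)


if __name__ == "__main__":
    # Invented toy passages, used only to show how the measure behaves.
    varied = ("The storm hit at dawn. Officials, who had warned residents for days, "
              "ordered evacuations across three counties. Nobody expected it.")
    uniform = ("The storm arrived early in the day. Officials ordered evacuations "
               "in three counties. Residents were told to leave.")
    print(sentence_lengths(varied))    # [5, 12, 3]
    print(sentence_lengths(uniform))   # [7, 6, 5]
    print(length_dispersion(varied))
    print(length_dispersion(uniform))
```

A higher standard deviation indicates a more scattered length distribution. Other dispersion measures, such as the interquartile range or the full histogram, would work just as well for comparing two corpora.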
