Limits of Detecting Text Generated by Large-Scale Language Models

02/09/2020
by Lav R. Varshney, et al.

Some consider large-scale language models that can generate long and coherent pieces of text as dangerous, since they may be used in misinformation campaigns. Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated. We show that error exponents for particular language models are bounded in terms of their perplexity, a standard measure of language generation performance. Under the assumption that human language is stationary and ergodic, the formulation is extended from considering specific language models to considering maximum likelihood language models, among the class of k-order Markov approximations; error probabilities are characterized. Some discussion of incorporating semantic side information is also given.
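To make the hypothesis-testing framing concrete, here is a toy sketch (not the paper's construction): a binary test that classifies a symbol sequence as genuine or generated via the log-likelihood ratio under two hypothetical first-order Markov models, with per-symbol perplexity computed alongside. The alphabet, model names, and transition probabilities are invented for illustration.

```python
import math

# Hypothetical transition probabilities P(next | current) over a two-symbol
# alphabet; P_HUMAN plays the role of genuine text, P_MODEL of generated text.
P_HUMAN = {"a": {"a": 0.6, "b": 0.4}, "b": {"a": 0.3, "b": 0.7}}
P_MODEL = {"a": {"a": 0.9, "b": 0.1}, "b": {"a": 0.2, "b": 0.8}}

def log_likelihood(text, model):
    """Sum of log transition probabilities along the sequence."""
    return sum(math.log(model[prev][cur])
               for prev, cur in zip(text, text[1:]))

def perplexity(text, model):
    """Per-symbol perplexity: exp of the average negative log-likelihood."""
    n = len(text) - 1
    return math.exp(-log_likelihood(text, model) / n)

def classify(text):
    """Likelihood-ratio test: a positive ratio favors the genuine hypothesis."""
    ratio = log_likelihood(text, P_HUMAN) - log_likelihood(text, P_MODEL)
    return "genuine" if ratio > 0 else "generated"

sample = "aababbabba"
print(classify(sample), round(perplexity(sample, P_HUMAN), 3))
```

The connection suggested by the abstract is that a model with lower perplexity on real text is harder to distinguish from the genuine source, which is what bounding the error exponent in terms of perplexity formalizes.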


