Algorithmic Detection of Computer Generated Text

08/04/2010
by Allen Lavoie, et al.

Computer-generated academic papers have been used to expose a lack of thorough human review at several computer science conferences. We assess the problem of classifying such documents. After identifying and evaluating several quantifiable features of academic papers, we apply methods from machine learning to build a binary classifier. In tests with two hundred papers, the resulting classifier labeled papers as either human-written or computer-generated, with no false classifications of computer-generated papers as human-written and a 2% false classification rate for human-written papers as computer-generated. We believe generalizations of these features are applicable to similar classification problems.

While most current text-based spam detection techniques focus on the keyword-based classification of email messages, a new generation of unsolicited computer-generated advertisements masquerades as legitimate postings in online groups, message boards, and social news sites. Our results show that taking into account the formatting and contextual clues offered by these environments may be of central importance when selecting features with which to identify such unwanted postings.
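The abstract describes the pipeline only at a high level: extract quantifiable features from each paper, fit a binary classifier, and report the two error directions separately. The following is a minimal sketch under the assumption that every paper has already been reduced to a numeric feature vector. The label constants, the function names, and the use of a nearest-centroid classifier are hypothetical stand-ins chosen for brevity; the abstract names neither the features nor the learning method the authors used.

```python
from statistics import mean

# Hypothetical label encoding for this sketch only.
HUMAN, GENERATED = 0, 1


def fit_centroids(X, y):
    """Return the mean feature vector of each class (nearest-centroid training)."""
    centroids = {}
    for label in (HUMAN, GENERATED):
        rows = [x for x, t in zip(X, y) if t == label]
        # zip(*rows) iterates over feature columns; average each column.
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def error_rates(y_true, y_pred):
    """Return (generated-labeled-as-human rate, human-labeled-as-generated rate)."""
    gen_preds = [p for t, p in zip(y_true, y_pred) if t == GENERATED]
    hum_preds = [p for t, p in zip(y_true, y_pred) if t == HUMAN]
    gen_as_human = sum(p == HUMAN for p in gen_preds) / len(gen_preds)
    human_as_gen = sum(p == GENERATED for p in hum_preds) / len(hum_preds)
    return gen_as_human, human_as_gen
```

Reporting the two rates separately matters here: a detector can look accurate overall while still letting computer-generated papers through, and the abstract's headline result (zero generated-as-human errors) concerns exactly that direction. In practice the features would also need to be standardized before fitting, because squared Euclidean distance is dominated by whichever feature has the largest numeric scale.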
