Are Neural Language Models Good Plagiarists? A Benchmark for Neural Paraphrase Detection

03/23/2021
by Jan Philip Wahle, et al.

The rise of language models such as BERT allows for high-quality text paraphrasing. This poses a problem for academic integrity, as it is difficult to differentiate between original and machine-generated content. We propose a benchmark consisting of articles paraphrased with recent language models that rely on the Transformer architecture. Our contribution fosters future research on paraphrase detection systems: it offers a large collection of aligned original and paraphrased documents, a study of their structure, and classification experiments with state-of-the-art systems. We make our findings publicly available.
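To illustrate how a detection system could consume the benchmark's aligned original and paraphrased documents, the following is a minimal sketch using the Hugging Face transformers library. The checkpoint name, label mapping, and helper function are illustrative assumptions, not the paper's released code; a real detector would first be fine-tuned on the benchmark's labeled pairs.

```python
# Minimal sketch: scoring a passage as original vs. machine-paraphrased
# with a BERT-based sequence classifier (Hugging Face transformers).
# Checkpoint and label assignment are placeholders, not the benchmark's model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # assumed base model; fine-tune on the benchmark before use

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def paraphrase_probability(text: str) -> float:
    """Return the (assumed) probability that `text` is machine-paraphrased (label 1)."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

# Example usage on a single passage
print(paraphrase_probability("The rise of language models allows high-quality text rewriting."))
```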

Related research

10/07/2022 · How Large Language Models are Transforming Machine-Paraphrased Plagiarism
The recent success of large language models for text generation poses a ...

06/02/2023 · LyricSIM: A Novel Dataset and Benchmark for Similarity Detection in Spanish Song Lyrics
In this paper, we present a new dataset and benchmark tailored to the ta...

10/18/2021 · Deep Transfer Learning Beyond: Transformer Language Models in Information Systems Research
AI is widely thought to be poised to transform business, yet current per...

03/22/2021 · Identifying Machine-Paraphrased Plagiarism
Employing paraphrasing tools to conceal plagiarized text is a severe thr...

04/12/2023 · Galactic ChitChat: Using Large Language Models to Converse with Astronomy Literature
We demonstrate the potential of the state-of-the-art OpenAI GPT-4 large ...

03/24/2023 · Paraphrase Detection: Human vs. Machine Content
The growing prominence of large language models, such as GPT-4 and ChatG...

05/23/2020 · A First Step Towards Content Protecting Plagiarism Detection
Plagiarism detection systems are essential tools for safeguarding academ...
