Attacking Neural Text Detectors

02/19/2020
by Max Wolff, et al.

Machine learning based language models have recently made significant progress, which introduces a danger of spreading misinformation. To combat this potential danger, several methods have been proposed for detecting text written by these language models. This paper presents two classes of black-box attacks on these detectors: one that randomly replaces characters with homoglyphs, and one that uses a simple scheme to purposefully misspell words. The homoglyph and misspelling attacks sharply decrease a popular neural text detector's recall on neural text from a baseline of 97.44%, and results indicate that the attacks are transferable to other neural text detectors.
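To make the two attack classes concrete, the sketch below (Python, not the authors' released code) shows one plausible implementation: random homoglyph substitution using a small Latin-to-Cyrillic character map, and a letter-transposition rule as a stand-in for the purposeful-misspelling scheme. The character map, substitution rates, and transposition rule are illustrative assumptions, not the paper's exact method.

    import random

    # A few Latin -> Cyrillic homoglyph pairs (illustrative; the paper's
    # exact mapping may differ).
    HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "c": "\u0441", "p": "\u0440"}

    def homoglyph_attack(text, rate=0.1, seed=0):
        """Randomly replace characters that have a visually identical homoglyph."""
        rng = random.Random(seed)
        chars = list(text)
        for i, ch in enumerate(chars):
            if ch.lower() in HOMOGLYPHS and rng.random() < rate:
                chars[i] = HOMOGLYPHS[ch.lower()]
        return "".join(chars)

    def misspelling_attack(text, rate=0.05, seed=0):
        """Purposefully misspell words by swapping two adjacent interior letters
        (a simple stand-in for a dictionary of deliberate misspellings)."""
        rng = random.Random(seed)
        words = text.split()
        for i, w in enumerate(words):
            if len(w) > 3 and rng.random() < rate:
                j = rng.randrange(1, len(w) - 2)
                words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
        return " ".join(words)

    sample = "Machine generated text can be hard to detect."
    print(homoglyph_attack(sample, rate=0.3))
    print(misspelling_attack(sample, rate=0.5))

In a black-box setting, the attacker simply runs the perturbed text back through the target detector and keeps perturbations that flip its "machine-generated" decision; no access to the detector's weights or gradients is assumed.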

Related research

05/17/2023 - Smaller Language Models are Better Black-box Machine-Generated Text Detectors
  With the advent of fluent generative language models that can produce co...

05/31/2023 - Red Teaming Language Model Detectors with Language Models
  The prevalence and high capacity of large language models (LLMs) present...

02/11/2023 - Mutation-Based Adversarial Attacks on Neural Text Detectors
  Neural text detectors aim to decide the characteristics that distinguish...

05/18/2023 - Large Language Models can be Guided to Evade AI-Generated Text Detection
  Large Language Models (LLMs) have demonstrated exceptional performance i...

05/14/2023 - Watermarking Text Generated by Black-Box Language Models
  LLMs now exhibit human-like skills in various fields, leading to worries...

09/15/2023 - Adversarial Attacks on Tables with Entity Swap
  The capabilities of large language models (LLMs) have been successfully ...

06/07/2023 - On the Reliability of Watermarks for Large Language Models
  As LLMs become commonplace, machine-generated text has the potential to ...
