Contemporary state-of-the-art language models such as GPT-2 are rapidly improving, as they are trained on increasingly large datasets and defined using billions of parameters. Language models are currently able to generate coherent text that humans can identify as machine-written text (neural text) with only approximately 54% accuracy, close to random guessing. With this increasing power, language models provide bad actors with the potential to spread misinformation on an unprecedented scale and to undermine clear authorship.
To reduce the spread of misinformation via language models and give readers a better sense of which entity (machine or human) actually wrote a piece of text, multiple neural text detection methods have been proposed. Two automatic neural text detectors are considered in this work: RoBERTa [3, 4] and GROVER [5], which are 95% and 92% accurate, respectively, in discriminating neural text from human-written text. Another tool, GLTR [2], is designed to assist humans in detecting neural text, increasing humans' ability to correctly distinguish neural text from human-written text from 54% to 72%. Fundamentally, these detectors rely on the fact that neural text follows predictable patterns determined by its underlying language model generator.
Attacks on machine learning models, called adversarial attacks [6, 7, 8, 9], have been studied in depth and used both to expose security holes and to understand how machine learning models function by purposefully causing them to make mistakes.
Historically, homoglyph attacks (https://en.wikipedia.org/wiki/IDN_homograph_attack) have been used to direct victims to malicious websites by replacing characters in a trusted URL with similar-looking ones, called homoglyphs. Part of this work seeks to test whether homoglyph attacks can also be used to create effective black-box adversarial attacks on neural text detectors.
2 Threat Model and Proposed Attacks
In this paper, two classes of attacks on neural text detectors are proposed. Both of these attacks attempt to modify neural text in ways that are relatively visually imperceptible to humans, but will cause a neural text detector to misclassify the text as human-written. Specifically, these attacks change the underlying distribution of neural text so that it diverges from that of the language model which generated it.
The first class of attacks are non human-like attacks, which imperceptibly (according to humans) change neural text in a way that humans normally would not. This class of attack shifts the modified text’s distribution away from its original one. In this work, the non-human like attacks are realized by swapping selected characters with Unicode homoglyphs (e.g. changing English “a”s to Cyrillic “a”s throughout a neural text sample). Homoglyphs are chosen because they appear visually similar to their counterparts, but get tokenized differently by neural text detectors.
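A homoglyph swap of this kind can be sketched in a few lines. The mapping and helper below are hypothetical illustrations (not the authors' implementation), replacing Latin "a" and "e" with their Cyrillic look-alikes:

```python
# Minimal sketch of a homoglyph substitution. The mapping and function names
# are illustrative, not the authors' exact code.
HOMOGLYPHS = {
    "a": "\u0430",  # Latin 'a' -> Cyrillic 'a' (U+0430)
    "e": "\u0435",  # Latin 'e' -> Cyrillic 'e' (U+0435)
}

def homoglyph_attack(text: str, targets=("a", "e")) -> str:
    """Replace every occurrence of the target characters with homoglyphs."""
    return "".join(HOMOGLYPHS.get(ch, ch) if ch in targets else ch for ch in text)

modified = homoglyph_attack("a language model made this")
# The result looks identical on screen but the code points differ,
# so a detector's tokenizer sees different input.
print(modified == "a language model made this")  # False
```

Because the modified string contains Cyrillic code points, a subword tokenizer that has never seen them in English text will split the words very differently, which is exactly the distribution shift the attack exploits.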
The second class of attacks are human-like attacks, which imperceptibly (according to humans) change neural text in a way that humans normally would. In this paper, this class of attack is realized by randomly swapping correctly spelled words with common human misspellings throughout a neural text sample, which from here onward is referred to as a "misspelling attack." However, this is not the only way human-like attacks may be implemented. This class of attack may also target word choice, grammar, or punctuation. Misspelling attacks are simply a proof of concept for this larger umbrella of human-like attacks.
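A misspelling attack of this kind can be sketched as follows. The tiny dictionary below is a hypothetical stand-in for the full Wikipedia list of common misspellings used in the experiments:

```python
import random

# Illustrative sketch of a misspelling attack; MISSPELLINGS is a toy
# stand-in for the full list of common human misspellings.
MISSPELLINGS = {
    "definitely": "definately",
    "received": "recieved",
    "separate": "seperate",
}

def misspelling_attack(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Randomly misspell roughly `rate` of the words in `text`."""
    rng = random.Random(seed)
    words = text.split()
    # Only words with a known human misspelling are candidates for swapping.
    candidates = [i for i, w in enumerate(words) if w.lower() in MISSPELLINGS]
    k = max(1, round(rate * len(words))) if candidates else 0
    for i in rng.sample(candidates, min(k, len(candidates))):
        words[i] = MISSPELLINGS[words[i].lower()]
    return " ".join(words)
```

Unlike the homoglyph attack, the output here contains only ordinary English characters; the perturbation is human-like because real writers make exactly these errors.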
A neural text dataset containing 5,000 text samples generated by GPT-2 1.5B using top-k 40 sampling was used to evaluate attacks in all experiments. This dataset was taken from a public GitHub repository (https://github.com/openai/gpt-2-output-dataset).
In all experiments except for the transferability tests, an open-source implementation of the automatic RoBERTa neural text detector (https://github.com/openai/gpt-2-output-dataset/tree/master/detector) was used. Before the attacks, RoBERTa's recall on neural text was 97.44%. In this paper, five experiments testing homoglyph attacks were conducted, and two were conducted for misspelling attacks.
The first homoglyph experiment in this paper was designed to test the effectiveness of different homoglyph pairs in lowering detector recall on neural text. In this experiment, all attacks were restricted to randomly replacing 1.5% of all the characters in a given neural text sample with homoglyphs. If there were not enough of the character(s) being replaced in a neural text sample to meet this 1.5% quota, the text sample was thrown out and the result of the attack not considered. Even so, every attack in experiments conducted under these conditions was run on at least 2,500 neural text samples.
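The 1.5% budget and the discard rule can be expressed as a small sketch (hypothetical helper names; the character mapping is the same illustrative Latin-to-Cyrillic pair discussed above):

```python
import random

# Illustrative Latin -> Cyrillic homoglyph mapping.
CYRILLIC = {"a": "\u0430", "e": "\u0435"}

def constrained_attack(text: str, budget: float = 0.015, seed: int = 0):
    """Randomly replace `budget` of all characters with homoglyphs.

    Returns None (i.e., the sample is thrown out) when the text does not
    contain enough target characters to meet the quota.
    """
    rng = random.Random(seed)
    quota = round(budget * len(text))
    positions = [i for i, ch in enumerate(text) if ch in CYRILLIC]
    if len(positions) < quota:
        return None  # not enough targets: discard this sample
    chars = list(text)
    for i in rng.sample(positions, quota):
        chars[i] = CYRILLIC[chars[i]]
    return "".join(chars)
```

Sampling replacement positions uniformly at random keeps the attack independent of any particular detector, which is what makes it black-box.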
The second homoglyph experiment took the most effective homoglyph pair found in the first experiment and tested the effectiveness of the homoglyph attack when it was allowed to replace every occurrence of the target character(s).
The third homoglyph experiment was designed to take the most effective homoglyph pair and test how varying frequencies of replacement may affect detector recall on neural text.
The fourth homoglyph experiment was designed to test the transferability of the homoglyph attacks to the GROVER (https://grover.allenai.org/detect) and GLTR (http://gltr.io/dist/index.html) online demos. In this experiment, 20 samples of neural text were randomly selected from the neural text dataset. Then, the most effective homoglyph attack (found in the first homoglyph experiment) was applied to the samples. GROVER's predictions on the original and modified neural text were then recorded. The online demo for GROVER outputs "We are quite sure this was written by a machine" (Machine++), "We think this was written by a machine (but we're not sure)" (Machine+), "We think this was written by a human (but we're not sure)" (Human+), or "We are quite sure this was written by a human" (Human++). A similar experiment was performed on the GLTR demo: the most successful homoglyph attack was applied to 10 samples of text taken randomly from the neural text dataset. (The GLTR interface does not allow many Unicode characters, including Cyrillic ones; thus, the homoglyph attack used for the GLTR experiments was the most successful GLTR-allowed homoglyph attack.) Screenshots of GLTR's graphical interface were then taken before and after the attack, and patterns were observed.
For the misspelling attack experiments, words were randomly misspelled throughout a text sample using a Wikipedia list of words commonly misspelled by humans in the English language (https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines). The attack was restricted to randomly misspelling 5% of the words in each neural text sample in the dataset. The same transferability experiments used for the homoglyph attacks were used for the misspelling attacks, except that instead of replacing characters with homoglyphs, a random 5% of the words in each neural text sample were misspelled.
Code to reproduce results found in this paper can be found at https://github.com/mwolff31/attacking_neural_text_detectors.
Table 1: Detector recall and average confidence after homoglyph attacks replacing 1.5% of characters.

| Original | Homoglyph | Detector Recall | Average Confidence |
| --- | --- | --- | --- |
| a (U+0061), e (U+0065) | a (U+0430), e (U+0435) | 13.57% | 81.61% |
| e (U+0065) | e (U+0435) | 16.11% | 79.43% |
| e (U+0065) | é (U+00E9) | 18.11% | 77.42% |
| a (U+0061), c (U+0063) | a (U+0430), c (U+0441) | 19.96% | 75.98% |
| c (U+0063) | c (U+0441) | 36.94% | 61.78% |
| p (U+0070) | p (U+0440) | 42.25% | 56.99% |
Results for the first homoglyph experiment can be seen in Table 1. Interestingly, replacing vowels with homoglyphs was much more effective than replacing consonants, even when the frequency of replacement was the same. Additionally, attacks using multiple homoglyph pairs were more effective than those using only one.
For the second homoglyph experiment, the most successful homoglyph pair according to Table 1 was used: English "e" and English "a" replaced with Cyrillic "Ye" and Cyrillic "a," respectively. When this homoglyph attack was allowed to replace all of the English "e"s and "a"s in the neural text dataset, RoBERTa's recall on neural text dropped to 0.26%.
The results of the third homoglyph experiment can be seen in Figure 1. The most successful single-character homoglyph attack was used. Neural text detector recall on neural text decreased as the number of characters the homoglyph attack was allowed to replace increased.
The results of the fourth homoglyph experiment indicate that the homoglyph attacks are transferable to other neural text detectors. Before the English “e” and English “a” to Cyrillic “Ye” and Cyrillic “a” attack was implemented, GROVER predicted Machine++ for 19 of the 20 samples, and predicted Human++ for 1 of the 20 samples. After the homoglyph attack, GROVER predicted Machine++ for 3 of the 20 samples, Machine+ for 1 of the 20 samples, Human+ for 1 of the 20 samples, and Human++ for the remaining 15 samples. In an experiment testing the transferability of the homoglyph attack to GLTR, replacing all English “e”s with Latin “é”s across 10 neural text samples consistently shifted histograms and the way GLTR colored the given text in the online demo towards GLTR behavior characteristic of human writing. Graphical results can be seen in Appendix A.
The results of the second misspelling experiment indicate that the misspelling attack is transferable to other neural text detectors as well. Before the misspelling attack was implemented, GROVER predicted Machine++ for 19 of the 20 samples, and predicted Human++ for 1 of the 20 samples. Note that these random samples were different from the ones used for the homoglyph transferability experiment. After the misspelling attack, GROVER predicted Machine++ for 8 of the 20 samples, Machine+ for 2 of the 20 samples, Human+ for 1 of the 20 samples, and Human++ for the remaining 9 samples. Similarly, the misspelling attack was also able to shift GLTR behavior towards that characteristic of humans across 10 neural text samples. An example of this can be seen in Appendix A.
It is interesting that the non human-like attacks were effective at all: the modified text is characteristic of neither human-written nor neural text, yet the neural text detectors predicted it was human-written simply because it was uncharacteristic of neural text. Evidently, automatic neural text detectors learn not to discriminate between neural text and human-written text, but rather to decide what is and is not characteristic of neural text. As the success of the homoglyph attacks presented in this paper shows, this creates a vulnerability in which an adversary can change neural text to be characteristic of neither language models nor humans (e.g., mixing the English and Cyrillic alphabets) and still have the modified neural text classified as human-written.
While homoglyph attacks may be defended against with tactics similar to those employed by modern web browsers and spell checkers, human-like attacks will ultimately be much more difficult to defend against, especially as they increase in complexity and employ methods that create not just spelling errors, but also grammatical errors, or use different sampling mechanisms to encourage different word choice. Such attacks will force neural text detectors to increasingly deepen their understanding not only of what constitutes neural text, but also of what constitutes human-written text.
This work defines two classes of attacks on neural text detectors: non human-like and human-like. Both proved to be very effective in disrupting neural text detectors’ ability to classify neural text accurately. Additionally, this paper sheds some light on what kinds of methods neural text detectors employ, and how these may be exploited. Future work should focus on making neural text detectors robust against the attacks presented in this work, and further explore the extent to which the attacks presented in this paper, particularly human-like attacks, may be deployed on neural text detectors.
-  A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, “Language models are unsupervised multitask learners,” 2019.
-  S. Gehrmann, H. Strobelt, and A. Rush, “GLTR: Statistical detection and visualization of generated text,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 111–116, Association for Computational Linguistics, 2019.
-  I. Solaiman, M. Brundage, J. Clark, A. Askell, A. Herbert-Voss, J. Wu, A. Radford, G. Krueger, J. W. Kim, S. Kreps, M. McCain, A. Newhouse, J. Blazakis, K. McGuffie, and J. Wang, “Release strategies and the social impacts of language models,” 2019.
-  Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, “RoBERTa: A robustly optimized BERT pretraining approach,” CoRR, vol. abs/1907.11692, 2019.
-  R. Zellers, A. Holtzman, H. Rashkin, Y. Bisk, A. Farhadi, F. Roesner, and Y. Choi, “Defending against neural fake news,” arXiv preprint arXiv:1905.12616, 2019.
-  C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” in Proceedings of the 2nd International Conference on Learning Representations, 2014.
-  A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok, “Synthesizing robust adversarial examples,” in Proceedings of the 35th International Conference on Machine Learning, pp. 284–293, 2018.
-  M. Sharif, S. Bhagavatula, L. Bauer, and M. K. Reiter, “Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 1528–1540, 2016.
-  J. Ebrahimi, A. Rao, D. Lowd, and D. Dou, “HotFlip: White-box adversarial examples for text classification,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 31–36, Association for Computational Linguistics, 2018.
Appendix A Shifting Neural Text’s Distribution
This experiment, similar to the ones performed by the GLTR authors, was designed to quantify the extent to which a homoglyph attack could shift the distribution of neural text away from that of text produced by a language model. The GPT-2 117M language model (an open-source implementation from https://github.com/huggingface/transformers) was used to generate predictions for each token in a text sample. The token's position within GPT-2 117M's predictions, or rank, was then recorded. Lower ranks indicate alignment with GPT-2 117M's predictions. For human evaluation, 50 randomly chosen text samples were taken from the WebText dataset made available in the same GitHub repository that provided the neural text dataset (https://github.com/openai/gpt-2-output-dataset). Another 50 text samples were randomly chosen from the neural text dataset, to which the English "e" and English "a" to Cyrillic "Ye" and Cyrillic "a" homoglyph attack was applied with no maximum character replacement restriction. The results for this experiment are displayed in Table 2. Overall, the homoglyph attack was successful in shifting neural text's distribution away from that of a language model.
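The rank computation described above can be sketched independently of any particular model. The helper below assumes access to per-position logits over the vocabulary (here a toy array stands in for GPT-2 117M's output; the function name is illustrative):

```python
import numpy as np

def token_ranks(logits: np.ndarray, token_ids) -> list:
    """For each position, the rank of the actual token in the model's
    predicted distribution (rank 0 = the model's top prediction)."""
    ranks = []
    for step_logits, tok in zip(logits, token_ids):
        # Rank = number of vocabulary items the model scored above the
        # token that actually appears in the text.
        ranks.append(int((step_logits > step_logits[tok]).sum()))
    return ranks

# Toy example over a 4-token vocabulary; these logits are made up, not
# real GPT-2 output.
logits = np.array([[2.0, 1.0, 0.5, 0.1],
                   [0.1, 3.0, 0.2, 0.4]])
print(token_ranks(logits, [0, 3]))  # [0, 1]
```

Text that aligns with the model's predictions produces mostly low ranks; a successful homoglyph attack pushes the ranks of subsequent tokens upward, mimicking the rank distribution of human-written text.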