The State of Profanity Obfuscation in Natural Language Processing

10/14/2022
by Debora Nozza, et al.

Work on hate speech has made it inevitable to include rude and harmful examples in scientific publications. This raises several problems, such as whether or not to obscure profanities. While science must accurately disclose what it does, the unwarranted spread of hate speech harms readers and increases its frequency on the internet. Obfuscating profanities preserves a publication's professional appearance, but it also makes the content harder to evaluate, especially for non-native speakers. Surveying 150 ACL papers, we found that obfuscation is usually applied to English but not to other languages, and even for English it is applied quite unevenly. We discuss the problems with obfuscation and propose PrOf, a multilingual community resource with a Python module that standardizes profanity obfuscation. We believe PrOf can help scientific publication policies make hate speech work accessible and comparable, irrespective of language.
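The abstract describes PrOf's Python module only at a high level. As a minimal sketch of what "standardizing" obfuscation could mean, assuming a lexicon of terms to mask is already available, the code below applies one fixed, documented rule to every listed term. The function names `obfuscate_word` and `obfuscate_text`, the two strategies, and the placeholder lexicon are hypothetical illustrations and are not PrOf's actual interface.

```python
import re

# Hypothetical sketch: NOT the PrOf API. Two illustrative obfuscation
# strategies applied uniformly to words found in a user-supplied lexicon.

VOWELS = set("aeiouAEIOU")  # assumption: English vowels only


def obfuscate_word(word: str, strategy: str = "vowels") -> str:
    """Mask a single word according to one fixed rule."""
    if strategy == "vowels":
        # Replace every vowel with '*', keeping consonants readable.
        return "".join("*" if ch in VOWELS else ch for ch in word)
    if strategy == "first_letter":
        # Keep the first character and mask the rest.
        return word[:1] + "*" * (len(word) - 1)
    raise ValueError(f"unknown strategy: {strategy!r}")


def obfuscate_text(text: str, lexicon: set[str], strategy: str = "vowels") -> str:
    """Obfuscate every token of `text` whose lowercase form is in `lexicon`."""
    lowered = {w.lower() for w in lexicon}

    def repl(match: re.Match) -> str:
        token = match.group(0)
        return obfuscate_word(token, strategy) if token.lower() in lowered else token

    # \w+ matches runs of word characters (Unicode-aware in Python 3),
    # so punctuation and whitespace are left untouched.
    return re.sub(r"\w+", repl, text)


if __name__ == "__main__":
    demo_lexicon = {"darn", "heck"}  # placeholder terms, not a real profanity list
    print(obfuscate_text("Darn it, what the heck!", demo_lexicon))
    # prints: D*rn it, what the h*ck!
    print(obfuscate_text("Darn it, what the heck!", demo_lexicon, "first_letter"))
    # prints: D*** it, what the h***!
```

Fixing the rule in code, rather than leaving it to each author, is one way to make obfuscated examples comparable across papers. The vowel strategy as written only covers English vowels, so a multilingual resource would need per-language rules.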
