Leashing the Inner Demons: Self-Detoxification for Language Models

03/06/2022
by Canwen Xu, et al.

Language models (LMs) can reproduce, or even amplify, toxic language seen during training, which poses a risk to their practical application. In this paper, we conduct extensive experiments to study this phenomenon, analyzing how prompts, decoding strategies, and training corpora affect output toxicity. Based on our findings, we propose a simple yet effective method that lets language models "detoxify" themselves without an additional large corpus or an external discriminator. Compared to a supervised baseline, our method achieves better toxicity reduction while preserving generation quality across multiple settings. Warning: some examples shown in the paper may contain uncensored offensive content.
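The abstract does not detail the method itself, but the evaluation setup it describes, scoring the toxicity of model continuations under different prompts and decoding strategies, can be sketched with a toy scorer. The blocklist and scoring function below are purely illustrative assumptions; real evaluations use a trained classifier such as the Perspective API:

```python
# Toy toxicity scorer: the fraction of tokens appearing in a small
# blocklist. This word list is purely illustrative, not the paper's
# actual classifier.
TOXIC_WORDS = {"idiot", "stupid", "hate"}

def toxicity_score(text: str) -> float:
    """Return the fraction of tokens that match the blocklist."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(t.strip(".,!?") in TOXIC_WORDS for t in tokens) / len(tokens)

def worst_case_toxicity(continuations):
    """An 'expected maximum toxicity'-style metric: the worst score
    among several sampled continuations for a single prompt."""
    return max(toxicity_score(c) for c in continuations)

# Two hypothetical continuations sampled for the same prompt.
samples = [
    "you are a kind person",
    "you are a stupid idiot",
]
print(worst_case_toxicity(samples))  # 0.4 (2 of 5 tokens in the second sample)
```

In practice the samples would come from an LM under a chosen decoding strategy (greedy, top-k, nucleus sampling), and the scorer would be a learned classifier; the metric's shape, aggregating the worst continuation per prompt, stays the same.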


Related research

- 05/06/2021 — What's in the Box? An Analysis of Undesirable Content in the Common Crawl Corpus
- 02/28/2021 — Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
- 04/30/2022 — Detoxifying Language Models with a Toxic Corpus
- 09/07/2023 — DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
- 05/04/2020 — A New Data Normalization Method to Improve Dialogue Generation by Minimizing Long Tail Effect
- 08/06/2023 — Automatically Correcting Large Language Models: Surveying the Landscape of Diverse Self-Correction Strategies
- 11/05/2016 — Reference-Aware Language Models
