Language Model Detoxification in Dialogue with Contextualized Stance Control

01/25/2023
by Jing Qian, et al.

To reduce toxic degeneration in a pretrained Language Model (LM), previous work on LM detoxification has focused on reducing the toxicity of the generation itself (self-toxicity) without considering the context. As a result, a type of implicit offensive language, in which the generation supports the offensive language in the context, is ignored. Unlike the LM control tasks in previous work, where the desired attributes are fixed for generation, here the desired stance of the generation depends on the offensiveness of the context. We therefore propose a novel control method for context-dependent detoxification that takes the stance into consideration. We introduce meta prefixes to learn the contextualized stance control strategy and to generate a stance control prefix according to the input context. The generated stance prefix is then combined with a toxicity control prefix to guide the response generation. Experimental results show that the proposed method effectively learns context-dependent stance control strategies while keeping the self-toxicity of the underlying LM low.
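To make the mechanism concrete, below is a minimal PyTorch sketch of the prefix-combination idea described in the abstract. This is not the authors' released code: MetaPrefixNet, the prefix lengths, and the mean-pooled context encoding are illustrative assumptions. The prefixes act as extra key/value pairs prepended to each attention layer of GPT-2 via past_key_values, in the spirit of prefix-tuning.

```python
# Sketch only: a meta-prefix network maps the dialogue context to a stance
# prefix, which is concatenated with a (fixed, learned) toxicity prefix and
# fed to GPT-2 as past_key_values. All module names and shapes are assumptions.
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

class MetaPrefixNet(nn.Module):
    """Maps a context representation to a stance control prefix (assumed design)."""
    def __init__(self, hidden, n_layer, n_head, head_dim, prefix_len=5):
        super().__init__()
        self.prefix_len, self.n_layer = prefix_len, n_layer
        self.n_head, self.head_dim = n_head, head_dim
        out = n_layer * 2 * n_head * head_dim * prefix_len  # keys + values per layer
        self.mlp = nn.Sequential(nn.Linear(hidden, hidden), nn.Tanh(),
                                 nn.Linear(hidden, out))

    def forward(self, context_repr):                      # (batch, hidden)
        b = context_repr.size(0)
        kv = self.mlp(context_repr)
        kv = kv.view(b, self.prefix_len, self.n_layer, 2, self.n_head, self.head_dim)
        # -> per layer: (key, value), each (batch, n_head, prefix_len, head_dim)
        kv = kv.permute(2, 3, 0, 4, 1, 5)
        return [(layer[0], layer[1]) for layer in kv]

tok = GPT2Tokenizer.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()
cfg = lm.config
meta = MetaPrefixNet(cfg.n_embd, cfg.n_layer, cfg.n_head, cfg.n_embd // cfg.n_head)

# A fixed toxicity control prefix; randomly initialized here purely for
# illustration (in practice it would be learned).
tox_len, head_dim = 5, cfg.n_embd // cfg.n_head
tox_prefix = [(torch.randn(1, cfg.n_head, tox_len, head_dim),
               torch.randn(1, cfg.n_head, tox_len, head_dim))
              for _ in range(cfg.n_layer)]

context = "That group of people is awful."
ids = tok(context, return_tensors="pt").input_ids
with torch.no_grad():
    # Crude context encoding via mean pooling (an assumption for this sketch).
    h = lm.transformer(ids).last_hidden_state.mean(dim=1)
    stance_prefix = meta(h)
    # Concatenate stance and toxicity prefixes along the sequence axis.
    past = tuple((torch.cat([s[0], t[0]], dim=2), torch.cat([s[1], t[1]], dim=2))
                 for s, t in zip(stance_prefix, tox_prefix))
    # Prefix-conditioned forward pass; out.logits gives next-token scores.
    out = lm(ids, past_key_values=past)
```

The key design point the abstract describes is that the stance prefix is generated per input (by the meta prefixes) rather than fixed, so the stance of the response can flip with the offensiveness of the context, while the toxicity prefix stays constant to keep self-toxicity low.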


Related research

02/27/2022: Controllable Natural Language Generation with Contrastive Prefixes
To guide the generation of large pretrained language models (LM), previo...

03/02/2022: Controlling the Focus of Pretrained Language Generation Models
The finetuning of pretrained transformer-based language generation model...

10/15/2020: Pretrained Language Models for Dialogue Generation with Multiple Input Sources
Large-scale pretrained language models have achieved outstanding perform...

12/10/2020: Towards Neural Programming Interfaces
It is notoriously difficult to control the behavior of artificial neural...

05/24/2022: PoeLM: A Meter- and Rhyme-Controllable Language Model for Unsupervised Poetry Generation
Formal verse poetry imposes strict constraints on the meter and rhyme sc...

11/10/2022: Nano: Nested Human-in-the-Loop Reward Learning for Few-shot Language Model Control
Pretrained language models have demonstrated extraordinary capabilities ...

03/12/2023: Self-planning Code Generation with Large Language Model
Although large language models have demonstrated impressive ability in c...
