Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models

02/14/2023
by   Shrimai Prabhumoye, et al.

Pretrained large language models have become indispensable for solving various natural language processing (NLP) tasks. However, safely deploying them in real-world applications is challenging because they can generate toxic content. To address this challenge, we propose two novel pretraining data augmentation strategies that significantly reduce model toxicity without compromising its utility. Our two strategies are: (1) MEDA, which adds the raw toxicity score as meta-data to the pretraining samples, and (2) INST, which adds instructions to those samples indicating their toxicity. Our results indicate that our best performing strategy (INST) substantially reduces the toxicity probability by up to 61% while preserving the accuracy on five benchmark NLP tasks as well as improving AUC scores on four bias detection tasks by 1.3%. We also demonstrate the generalizability of our techniques by scaling the number of training samples and the number of model parameters.
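To make the two augmentation strategies concrete, here is a minimal sketch of what prepending a toxicity score (MEDA) versus a toxicity instruction (INST) to a pretraining sample might look like. The function names, tag syntax, instruction wording, and the 0.5 threshold are illustrative assumptions, not the paper's exact format; the paper only specifies that MEDA attaches the raw toxicity score as meta-data and INST attaches a natural-language instruction indicating toxicity.

```python
def meda_augment(text: str, toxicity: float) -> str:
    """MEDA-style augmentation: prepend the raw toxicity score as meta-data.
    The <|toxicity:...|> tag format is a hypothetical choice for illustration."""
    return f"<|toxicity:{toxicity:.2f}|> {text}"


def inst_augment(text: str, toxicity: float, threshold: float = 0.5) -> str:
    """INST-style augmentation: prepend an instruction indicating toxicity.
    The instruction wording and 0.5 threshold are illustrative assumptions."""
    label = "toxic" if toxicity >= threshold else "non-toxic"
    return f"Generate {label} text: {text}"


# Example: a benign sample with a low toxicity score from some upstream classifier.
sample = "Thank you for your thoughtful feedback."
print(meda_augment(sample, 0.03))
print(inst_augment(sample, 0.03))
```

At inference time, a model pretrained on such augmented samples could then be steered toward non-toxic generations by conditioning on the low-toxicity tag or the non-toxic instruction.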

Related research

03/18/2021 · All NLP Tasks Are Generation Tasks: A General Pretraining Framework
04/01/2019 · Using Similarity Measures to Select Pretraining Data for NER
09/22/2021 · Small-Bench NLP: Benchmark for small single GPU trained models in Natural Language Processing
11/07/2021 · NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework
02/09/2022 · pNLP-Mixer: an Efficient all-MLP Architecture for Language
06/15/2020 · To Pretrain or Not to Pretrain: Examining the Benefits of Pretraining on Resource Rich Tasks
10/05/2021 · Leveraging the Inductive Bias of Large Language Models for Abstract Textual Reasoning
