Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models

02/08/2022
by Boxin Wang, et al.

Pre-trained language models (LMs) are known to easily generate toxic language. In this work, we systematically explore domain-adaptive training to reduce the toxicity of language models. We conduct this study along three dimensions: training corpus, model size, and parameter efficiency. For the training corpus, we propose to leverage the generative power of LMs and generate nontoxic datasets for domain-adaptive training, which mitigates exposure bias and is more data-efficient than using a curated pre-training corpus. We demonstrate that this self-generation method consistently outperforms existing baselines across model sizes on both automatic and human evaluations, even when it uses a 1/3 smaller training corpus. We then comprehensively study detoxification of LMs with parameter counts ranging from 126M up to 530B (3x larger than GPT-3), a scale that has not been studied before. We find that i) large LMs have toxicity levels similar to smaller ones given the same pre-training corpus, and ii) large LMs require greater effort to detoxify. We also explore parameter-efficient training methods for detoxification, and demonstrate that adding and training adapter-only layers in LMs not only saves a large number of parameters but also achieves a better trade-off between toxicity and perplexity than whole-model adaptation for large-scale models.
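To make the parameter-efficient direction concrete, below is a minimal PyTorch sketch of adapter-only training: small bottleneck layers are inserted after frozen blocks and only those adapters are updated. This is an illustrative assumption of how such a setup can look, not the authors' released implementation; the `Adapter` and `BlockWithAdapter` classes, hidden size, and bottleneck width are hypothetical choices.

```python
# Sketch of adapter-only training: freeze the pre-trained weights and train
# only small bottleneck adapters (illustrative, not the paper's code).
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual add."""

    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


class BlockWithAdapter(nn.Module):
    """Wrap an existing (frozen) block with a trainable adapter."""

    def __init__(self, block: nn.Module, hidden_size: int):
        super().__init__()
        self.block = block
        self.adapter = Adapter(hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.block(x))


# Toy stand-in for a stack of pre-trained transformer blocks.
hidden = 128
base_blocks = [nn.Sequential(nn.Linear(hidden, hidden), nn.GELU()) for _ in range(4)]
model = nn.Sequential(*[BlockWithAdapter(b, hidden) for b in base_blocks])

# Freeze everything, then unfreeze only the adapter parameters.
for p in model.parameters():
    p.requires_grad_(False)
for m in model.modules():
    if isinstance(m, Adapter):
        for p in m.parameters():
            p.requires_grad_(True)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"training {trainable}/{total} parameters ({100 * trainable / total:.1f}%)")

# The adapter weights would then be optimized with the usual language-modeling
# loss on a (self-generated) nontoxic corpus, e.g. with AdamW:
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)
```

In this setup only a small fraction of the parameters is updated during domain-adaptive training, which is what makes the approach attractive at the 100B+ scales discussed in the abstract.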


