DIRECTOR: Generator-Classifiers For Supervised Language Modeling

06/15/2022
by Kushal Arora, et al.

Current language models achieve low perplexity, but their generations still suffer from toxic responses, repetitiveness, and contradictions. The standard language modeling setup fails to address these issues. In this paper, we introduce a new architecture, Director, that consists of a unified generator-classifier with both a language modeling head and a classification head for each output token. Training is conducted jointly using both standard language modeling data and data labeled with desirable and undesirable sequences. Experiments in several settings show that the model has training and decoding speed competitive with standard language models while yielding superior results, alleviating these known issues while maintaining generation quality. It also outperforms existing model-guiding approaches in both accuracy and efficiency.
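The abstract does not spell out how the two heads interact, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that both heads are linear projections of the same decoder hidden state onto the vocabulary; that the classification head gives every candidate token an independent sigmoid score for "the sequence stays desirable if this token is emitted"; that decoding combines the heads as log p_LM + gamma * log p_class and renormalizes; and that during joint training the LM head sees only the unlabeled language modeling data while the classifier sees only the labeled data. The function names `director_next_token_probs` and `joint_loss`, the weight `gamma`, and the toy weights are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def matvec(weights: Sequence[Sequence[float]], h: Sequence[float]) -> List[float]:
    """Project hidden state h to one logit per vocabulary token (one row per token)."""
    return [sum(w_i * h_i for w_i, h_i in zip(row, h)) for row in weights]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over the vocabulary."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def director_next_token_probs(h, lm_head, cls_head, gamma: float = 1.0) -> List[float]:
    """One decoding step using both heads (ASSUMED combination rule).

    Generator head: softmax distribution over the vocabulary.
    Classifier head: independent P(desirable | token) for every vocabulary entry.
    """
    lm_logp = log_softmax(matvec(lm_head, h))
    cls_logp = [log_sigmoid(z) for z in matvec(cls_head, h)]
    scores = [a + gamma * b for a, b in zip(lm_logp, cls_logp)]
    # Renormalize so the combined scores form a proper distribution again.
    return [math.exp(s) for s in log_softmax(scores)]


def joint_loss(h, lm_head, cls_head, target: int,
               label: Optional[int] = None, gamma: float = 1.0) -> float:
    """Loss for one token position (ASSUMED split of data between heads).

    label=None: unlabeled LM data; label=1: desirable; label=0: undesirable.
    """
    if label is None:
        # Standard next-token cross-entropy on the generator head.
        return -log_softmax(matvec(lm_head, h))[target]
    # Binary cross-entropy on the classifier logit of the observed token only.
    z = matvec(cls_head, h)[target]
    return -gamma * (log_sigmoid(z) if label == 1 else log_sigmoid(-z))


if __name__ == "__main__":
    # Toy setting: hidden size 2, vocabulary of 3 tokens (weights are made up).
    h = [1.0, 0.5]
    lm_head = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    cls_head = [[2.0, 0.0], [-2.0, 0.0], [0.0, 0.0]]  # token 1 scored as undesirable
    print(director_next_token_probs(h, lm_head, cls_head, gamma=0.0))  # plain LM
    print(director_next_token_probs(h, lm_head, cls_head, gamma=2.0))  # token 1 suppressed
    print(joint_loss(h, lm_head, cls_head, target=1, label=0))
```

In this sketch, `gamma=0` recovers the plain language model, and larger values lean harder on the classifier. Because the classifier scores the whole vocabulary with a single extra projection instead of running a separate classifier on each candidate continuation, the per-step overhead stays close to that of a standard language model.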
