Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information

08/24/2018
by Mario Giulianelli, et al.

How do neural language models keep track of number agreement between subject and verb? We show that 'diagnostic classifiers', trained to predict number from the internal states of a language model, provide a detailed understanding of how, when, and where this information is represented. Moreover, they give us insight into when and where number information is corrupted in cases where the language model ends up making agreement errors. To demonstrate the causal role played by the representations we find, we then use agreement information to influence the course of the LSTM during the processing of difficult sentences. Results from such an intervention reveal a large increase in the language model's accuracy. Together, these results show that diagnostic classifiers give us an unrivalled, detailed look into the representation of linguistic information in neural models, and demonstrate that this knowledge can be used to improve their performance.
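
The sketch below illustrates the general idea of a diagnostic classifier: a simple linear probe trained to predict the subject's number from hidden states extracted from a language model. The variable names (hidden_states, number_labels), the hidden size of 650, the use of scikit-learn's LogisticRegression, and the nudge_towards_number intervention are illustrative assumptions, not the authors' exact pipeline; the placeholder arrays would be replaced by activations recorded while the LSTM reads real agreement sentences.

# Minimal sketch of a diagnostic classifier (probe) over LSTM hidden states.
# Assumes activations and number labels have already been collected; the random
# placeholders below only make the example self-contained and runnable.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(2000, 650))   # placeholder: one vector per token position
number_labels = rng.integers(0, 2, size=2000)  # placeholder: 0 = singular subject, 1 = plural

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, number_labels, test_size=0.2, random_state=0)

# The diagnostic classifier itself: kept linear so that any decodable number
# information reflects the representation rather than the probe's own capacity.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print("number-prediction accuracy:", probe.score(X_test, y_test))

def nudge_towards_number(h, probe, correct_label, step=1.0):
    """Illustrative intervention: shift a hidden state along the probe's weight
    vector so the diagnostic classifier leans towards the correct number.
    A simplified stand-in for the paper's intervention, not its exact recipe."""
    w = probe.coef_[0]
    direction = w if correct_label == 1 else -w
    return h + step * direction / np.linalg.norm(w)

In use, such a probe would be trained separately for each layer and gate of the LSTM and evaluated at each timestep between subject and verb, which is what lets the method localise where number information is carried and where it gets corrupted.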

