Probing for Constituency Structure in Neural Language Models

04/13/2022
by   David Arps, et al.

In this paper, we investigate to what extent contextual neural language models (LMs) implicitly learn syntactic structure. More concretely, we focus on constituent structure as represented in the Penn Treebank (PTB). Using standard probing techniques based on diagnostic classifiers, we assess the accuracy with which constituents of different categories are represented in the neuron activations of an LM such as RoBERTa. To ensure that our probe targets syntactic knowledge rather than implicit semantic generalizations, we also experiment on a version of the PTB obtained by randomly swapping constituents of the same category while preserving syntactic structure, i.e., a semantically ill-formed but syntactically well-formed version of the PTB. We find that four pretrained transformer LMs obtain high performance on our probing tasks even on the manipulated data, suggesting that semantic and syntactic knowledge in their representations can be separated and that constituency information is in fact learned by the LM. Moreover, we show that a complete constituency tree can be linearly recovered from LM representations.
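The core idea of a diagnostic classifier can be illustrated with a minimal sketch: a linear probe is trained on frozen token representations to predict constituent category labels, and high accuracy indicates that the information is linearly decodable from the activations. The sketch below uses synthetic random vectors as stand-ins for real LM activations (the label set, dimensions, and data are illustrative assumptions, not the paper's setup; in practice the vectors would come from a model such as RoBERTa).

```python
# Minimal sketch of a diagnostic classifier ("probe") over frozen
# representations. Synthetic vectors stand in for LM activations here;
# all sizes and labels below are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_tokens, dim, n_labels = 2000, 64, 4  # e.g. 4 constituent categories (NP, VP, PP, S)

# Synthetic "activations": each label's vectors cluster around a random
# center, mimicking linearly separable syntactic information in the
# representation space.
centers = rng.normal(size=(n_labels, dim))
labels = rng.integers(0, n_labels, size=n_tokens)
reps = centers[labels] + 0.5 * rng.normal(size=(n_tokens, dim))

X_tr, X_te, y_tr, y_te = train_test_split(reps, labels, random_state=0)

# A linear probe: if a simple linear classifier recovers the labels,
# the information is linearly separable in the representations.
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"probe accuracy: {probe.score(X_te, y_te):.2f}")
```

With real LM activations, the same recipe applies: extract hidden states for each token, attach constituent labels derived from the treebank, and train only the linear classifier while keeping the LM frozen.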

