Schrödinger's Tree – On Syntax and Neural Language Models

by   Artur Kulmizev, et al.

In the last half-decade, the field of natural language processing (NLP) has undergone two major transitions: the switch to neural networks as the primary modeling paradigm and the homogenization of the training regime (pre-train, then fine-tune). Amidst this process, language models have emerged as NLP's workhorse, displaying increasingly fluent generation capabilities and proving to be an indispensable means of knowledge transfer downstream. Due to the otherwise opaque, black-box nature of such models, researchers have employed aspects of linguistic theory in order to characterize their behavior. Questions central to syntax – the study of the hierarchical structure of language – have factored heavily into such work, shedding invaluable insights about models' inherent biases and their ability to make human-like generalizations. In this paper, we attempt to take stock of this growing body of literature. In doing so, we observe a lack of clarity across numerous dimensions, which influences the hypotheses that researchers form, as well as the conclusions they draw from their findings. To remedy this, we urge researchers make careful considerations when investigating coding properties, selecting representations, and evaluating via downstream tasks. Furthermore, we outline the implications of the different types of research questions exhibited in studies on syntax, as well as the inherent pitfalls of aggregate metrics. Ultimately, we hope that our discussion adds nuance to the prospect of studying language models and paves the way for a less monolithic perspective on syntax in this context.



There are no comments yet.


page 1

page 2

page 3

page 4


Probing Linguistic Information For Logical Inference In Pre-trained Language Models

Progress in pre-trained language models has led to a surge of impressive...

LaoPLM: Pre-trained Language Models for Lao

Trained on the large corpus, pre-trained language models (PLMs) can capt...

Syntax Representation in Word Embeddings and Neural Networks – A Survey

Neural networks trained on natural language processing tasks capture syn...

Shallow Syntax in Deep Water

Shallow syntax provides an approximation of phrase-syntactic structure o...

On the Universality of Deep COntextual Language Models

Deep Contextual Language Models (LMs) like ELMO, BERT, and their success...

Measuring Fairness with Biased Rulers: A Survey on Quantifying Biases in Pretrained Language Models

An increasing awareness of biased patterns in natural language processin...

When Bert Forgets How To POS: Amnesic Probing of Linguistic Properties and MLM Predictions

A growing body of work makes use of probing in order to investigate the ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.