Linguistic dependencies and statistical dependence

04/18/2021
by Jacob Louis Hoover, et al.

What is the relationship between linguistic dependencies and statistical dependence? Building on earlier work in NLP and cognitive science, we study this question. We introduce a contextualized version of pointwise mutual information (CPMI), using pretrained language models to estimate probabilities of words in context. Extracting dependency trees which maximize CPMI, we compare the resulting structures against gold dependencies. Overall, we find that these maximum-CPMI trees correspond to linguistic dependencies more often than trees extracted from a non-contextual PMI estimate, but only roughly as often as a simple baseline formed by connecting adjacent words. We also provide evidence that the extent to which the two kinds of dependency align cannot be explained by the distance between words or by the category of the dependency relation. Finally, our analysis sheds some light on the differences between large pretrained language models, specifically in the kinds of inductive biases they encode.
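The extraction step the abstract describes, building the tree over word pairs that maximizes total CPMI, can be sketched with a maximum spanning tree over pairwise scores. This is an illustrative sketch, not the authors' implementation: computing real CPMI scores requires a pretrained language model, so the score matrix below is hand-made toy data, and `max_spanning_tree` is a hypothetical helper using Prim's algorithm.

```python
def max_spanning_tree(scores):
    """Prim's algorithm: greedily grow a tree that maximizes total edge score.

    scores[i][j] is a symmetric pairwise score between words i and j
    (standing in for a CPMI estimate); returns the tree as a set of
    (i, j) edges with i < j.
    """
    n = len(scores)
    in_tree = {0}          # start the tree at word 0
    edges = set()
    while len(in_tree) < n:
        # pick the highest-scoring edge connecting the tree to a new word
        _, i, j = max(
            (scores[i][j], i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        edges.add((min(i, j), max(i, j)))
        in_tree.add(j)
    return edges

# Toy symmetric score matrix for a 4-word sentence (values are illustrative,
# not real CPMI estimates from a language model).
scores = [
    [0.0, 2.1, 0.3, 0.1],
    [2.1, 0.0, 1.5, 0.2],
    [0.3, 1.5, 0.0, 1.8],
    [0.1, 0.2, 1.8, 0.0],
]
tree = max_spanning_tree(scores)  # edges of the maximum-score tree
```

On this toy matrix the extracted tree connects adjacent words, which also illustrates why the adjacent-word baseline mentioned in the abstract is a natural comparison.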


