Dependency distance minimization predicts compression

09/18/2021
by Ramon Ferrer-i-Cancho, et al.

Dependency distance minimization (DDm) is a well-established principle of word order. Theory predicts that DDm implies compression, namely the minimization of word lengths. This is a second-order prediction because it links one principle to another principle, rather than linking a principle to one of its manifestations as a first-order prediction does. Here we test that second-order prediction on a parallel collection of treebanks, controlling for annotation style with Universal Dependencies and Surface-Syntactic Universal Dependencies. To test it, we use a recently introduced score that has many mathematical and statistical advantages over the widely used sum of dependency distances. We find that the new score confirms the prediction when word lengths are measured in phonemes, independently of annotation style, but not when word lengths are measured in syllables. In contrast, the sum of dependency distances, one of the most widely used scores, fails to confirm the prediction, which shows the weakness of raw dependency distances for research on word order. Finally, our findings expand the theory of natural communication by linking two distinct levels of organization, namely syntax (word order) and word-internal structure.
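For readers unfamiliar with the baseline score mentioned in the abstract, the following is a minimal sketch of the sum of dependency distances for a single sentence. It is not the authors' code and it does not implement the newer score, which the abstract does not define. The function name, the head-array encoding (1-based head positions, with 0 marking the root), and the example sentence are assumptions made for illustration only.

```python
def sum_dependency_distances(heads):
    """Sum of |position(dependent) - position(head)| over all dependencies.

    Assumed encoding (hypothetical, chosen for this sketch): heads[i] is the
    1-based position of the head of the word at position i + 1, and 0 marks
    the root, which has no incoming dependency.
    """
    total = 0
    for position, head in enumerate(heads, start=1):
        if head == 0:
            # The root has no head, so it contributes no distance.
            continue
        total += abs(position - head)
    return total


# Illustrative sentence "She ate an apple":
# She -> ate, ate is the root, an -> apple, apple -> ate.
example_heads = [2, 0, 4, 2]
print(sum_dependency_distances(example_heads))
```

Because this raw sum grows with sentence length and depends on the shape of the tree, values from different sentences are hard to compare directly, which is consistent with the abstract's remark about the weakness of raw dependency distances.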

research
06/13/2019

Anti dependency distance minimization in short sequences. A graph theoretic approach

Dependency distance minimization (DDm) is a word order principle favouri...
research
05/28/2017

The placement of the head that maximizes predictability. An information theoretic approach

The minimization of the length of syntactic dependencies is a well-estab...
research
08/22/2022

The optimality of word lengths. Theoretical foundations and an empirical study

One of the most robust patterns found in human languages is Zipf's law o...
research
03/17/2023

Direct and indirect evidence of compression of word lengths. Zipf's law of abbreviation revisited

Zipf's law of abbreviation, the tendency of more frequent words to be sh...
research
07/07/2021

Linear-time calculation of the expected sum of edge lengths in random projective linearizations of trees

The syntactic structure of a sentence is often represented using syntact...
research
08/26/2015

Crossings as a side effect of dependency lengths

The syntactic structure of sentences exhibits a striking regularity: dep...
research
11/26/2022

The distribution of syntactic dependency distances

The syntactic structure of a sentence can be represented as a graph wher...