Finding Concept-specific Biases in Form–Meaning Associations

by Tiago Pimentel, et al.

This work presents an information-theoretic operationalisation of cross-linguistic non-arbitrariness. It is not a new idea that there are small, cross-linguistic associations between the forms and meanings of words. For instance, it has been claimed (Blasi et al., 2016) that the word for "tongue" is more likely than chance to contain the phone [l]. By controlling for the influence of language family and geographic proximity within a very large concept-aligned cross-lingual lexicon, we extend methods previously used to detect within-language non-arbitrariness (Pimentel et al., 2019) to measure cross-linguistic associations. We find that there is a significant effect of non-arbitrariness, but it is unsurprisingly small (less than 0.5 according to our information-theoretic estimate). We also provide a concept-level analysis which shows that a quarter of the concepts considered in our work exhibit a significant level of cross-linguistic non-arbitrariness. In sum, the paper provides new methods to detect cross-linguistic associations at scale.


A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space

In cross-lingual language models, representations for many different lan...

Disambiguatory Signals are Stronger in Word-initial Positions

Psycholinguistic studies of human word processing and lexical access pro...

ValNorm: A New Word Embedding Intrinsic Evaluation Method Reveals Valence Biases are Consistent Across Languages and Over Decades

Word embeddings learn implicit biases from linguistic regularities captu...

A Matter of Framing: The Impact of Linguistic Formalism on Probing Results

Deep pre-trained contextualized encoders like BERT (Devlin et al., 2019)...

An information-theoretic approach to the analysis of location and co-location patterns

We propose a statistical framework to quantify location and co-location ...

Probing Multilingual BERT for Genetic and Typological Signals

We probe the layers in multilingual BERT (mBERT) for phylogenetic and ge...

A visual remote associates test and its validation

The Remote Associates Test (RAT) is a widely used test for measuring cre...