Finding Concept-specific Biases in Form–Meaning Associations

04/13/2021
by Tiago Pimentel, et al.

This work presents an information-theoretic operationalisation of cross-linguistic non-arbitrariness. The idea that there are small, cross-linguistic associations between the forms and meanings of words is not new. For instance, Blasi et al. (2016) claim that the word for "tongue" is more likely than chance to contain the phone [l]. We extend methods previously used to detect within-language non-arbitrariness (Pimentel et al., 2019) to measure cross-linguistic associations, controlling for the influence of language family and geographic proximity within a very large concept-aligned cross-lingual lexicon. We find that there is a significant effect of non-arbitrariness, but it is unsurprisingly small (less than 0.5% on average according to our information-theoretic estimate). We also provide a concept-level analysis which shows that a quarter of the concepts considered in our work exhibit a significant level of cross-linguistic non-arbitrariness. In sum, the paper provides new methods to detect cross-linguistic associations at scale.
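To make the idea of an information-theoretic measure of form–meaning association concrete, here is a minimal toy sketch. It is not the authors' method: the abstract does not describe the estimator, and this sketch does not control for language family or geography. It assumes a simple plug-in (count-based) estimate of the mutual information between a binary form feature (whether a word contains the phone [l]) and the concept it expresses, normalised by the entropy of the form feature. The function names and the toy observations are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def mutual_information(pairs):
    """Plug-in estimate of I(X; Y) in bits from a list of (x, y) observations."""
    pairs = list(pairs)
    h_x = entropy(Counter(x for x, _ in pairs))
    h_y = entropy(Counter(y for _, y in pairs))
    h_xy = entropy(Counter(pairs))
    return h_x + h_y - h_xy


def uncertainty_coefficient(pairs):
    """Share of the entropy of X that is explained by Y: I(X; Y) / H(X)."""
    pairs = list(pairs)
    h_x = entropy(Counter(x for x, _ in pairs))
    return mutual_information(pairs) / h_x if h_x > 0 else 0.0


# Hypothetical toy data: (word contains [l], concept), one pair per language.
associated = [(True, "tongue"), (False, "stone")] * 2
independent = [(True, "tongue"), (False, "tongue"),
               (True, "stone"), (False, "stone")]

print(uncertainty_coefficient(associated))
print(uncertainty_coefficient(independent))
```

In the first toy data set the concept fully determines the form feature, so the coefficient is at its maximum; in the second the two are independent, so it is zero. Plug-in estimates like this are biased upwards on small samples, which is one reason a real study would need held-out estimation and significance testing rather than raw counts.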
