Variation and Instability in Dialect-Based Embedding Spaces

03/27/2023
by Jonathan Dunn, et al.

This paper measures variation in embedding spaces that have been trained on different regional varieties of English, while controlling for instability in the embeddings. Previous work has shown that it is possible to distinguish between similar varieties of a language; this paper experiments with two follow-up questions. First, does the variety represented in the training data systematically influence the resulting embedding space after training? This paper shows that differences in embeddings across varieties are significantly higher than baseline instability. Second, is such dialect-based variation spread equally throughout the lexicon? This paper shows that specific parts of the lexicon are particularly subject to variation. Taken together, these experiments confirm that embedding spaces are significantly influenced by the dialect represented in the training data. This finding implies that there is semantic variation across dialects, in addition to previously studied lexical and syntactic variation.
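The abstract does not say how variation and instability are measured, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that two embedding spaces can be compared by the Jaccard overlap of each word's nearest-neighbor sets, that instability is the disagreement between two runs on the same variety, and that variation is the disagreement between runs on different varieties. The function names, the toy vocabulary, and the noise levels are all hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_neighbors(space, word, k):
    """Return the set of the k words closest to `word` in `space`."""
    sims = [(cosine(space[word], vec), other)
            for other, vec in space.items() if other != word]
    sims.sort(reverse=True)
    return {other for _, other in sims[:k]}


def neighbor_overlap(space_a, space_b, k=5):
    """Per-word Jaccard overlap of nearest-neighbor sets in two spaces.

    Only words present in both spaces are compared (an assumption).
    """
    shared = sorted(set(space_a) & set(space_b))
    sub_a = {w: space_a[w] for w in shared}
    sub_b = {w: space_b[w] for w in shared}
    scores = {}
    for w in shared:
        a = top_k_neighbors(sub_a, w, k)
        b = top_k_neighbors(sub_b, w, k)
        scores[w] = len(a & b) / len(a | b)
    return scores


def perturb(space, scale, rng):
    """Toy stand-in for retraining: add Gaussian noise to every vector."""
    return {w: [x + rng.gauss(0, scale) for x in vec]
            for w, vec in space.items()}


def mean(scores):
    return sum(scores.values()) / len(scores)


# Synthetic data only: small noise imitates two runs on one variety,
# large noise imitates a run on a different variety.
rng = random.Random(0)
base = {f"w{i}": [rng.gauss(0, 1) for _ in range(20)] for i in range(40)}
run_1 = perturb(base, 0.01, rng)
run_2 = perturb(base, 0.01, rng)
other_variety = perturb(base, 1.0, rng)

instability = 1 - mean(neighbor_overlap(run_1, run_2))
variation = 1 - mean(neighbor_overlap(run_1, other_variety))
print(f"baseline instability: {instability:.3f}")
print(f"cross-variety variation: {variation:.3f}")
```

In this toy setup the per-word scores returned by `neighbor_overlap` could also be inspected individually, which loosely mirrors the second question in the abstract: whether some parts of the lexicon vary more than others.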

research
08/31/2023

A variation of Reynolds-Hurkens Paradox

We present a variation of Hurkens paradox, which can itself be seen as a...
research
09/21/2023

Syntactic Variation Across the Grammar: Modelling a Complex Adaptive System

While language is a complex adaptive system, most work on syntactic vari...
research
12/30/2020

Deriving Contextualised Semantic Features from BERT (and Other Transformer Model) Embeddings

Models based on the transformer architecture, such as BERT, have marked ...
research
10/08/2015

Mapping Unseen Words to Task-Trained Embedding Spaces

We consider the supervised training setting in which we learn task-speci...
research
02/26/2023

Bochner integrals and neural networks

A Bochner integral formula is derived that represents a function in term...
research
02/26/2020

Towards Universal Representation Learning for Deep Face Recognition

Recognizing wild faces is extremely hard as they appear with all kinds o...
research
06/07/2022

How to Dissect a Muppet: The Structure of Transformer Embedding Spaces

Pretrained embeddings based on the Transformer architecture have taken t...
