A Systematic Analysis of Morphological Content in BERT Models for Multiple Languages

04/06/2020
by Daniel Edmiston, et al.

This work describes experiments that probe the hidden representations of several BERT-style models for morphological content. The goal is to examine the extent to which discrete linguistic structure, in the form of morphological features and feature values, presents itself in the vector representations and attention distributions of pre-trained language models for five European languages. The experiments contained herein show that (i) Transformer architectures largely partition their embedding space into convex sub-regions highly correlated with morphological feature value, (ii) the contextualized nature of Transformer embeddings allows models to distinguish ambiguous morphological forms in many, but not all, cases, and (iii) very specific attention head/layer combinations appear to home in on subject-verb agreement.
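Probing for morphological content is commonly done by training a linear classifier (a "probe") to predict a feature value from a frozen embedding. The sketch below is not the authors' code: it is a minimal NumPy illustration that assumes synthetic Gaussian clusters stand in for contextual embeddings labelled with a hypothetical three-valued feature such as Case. It also shows why linear separability and convexity go together: the region a linear softmax probe assigns to one value is an intersection of half-spaces, so the midpoint of two points with the same predicted value keeps that value. The function names (`make_synthetic_embeddings`, `train_linear_probe`, `predict`, `midpoints_stay_in_region`) and all hyperparameters are illustrative assumptions.

```python
import numpy as np


def make_synthetic_embeddings(n_per_class=200, dim=16, n_classes=3, seed=0):
    """Hypothetical stand-in for contextual embeddings: one Gaussian cluster
    per morphological feature value (e.g. Case = Nom / Acc / Dat)."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=2.0, size=(n_classes, dim))
    X = np.concatenate([c + rng.normal(size=(n_per_class, dim)) for c in centers])
    y = np.repeat(np.arange(n_classes), n_per_class)
    return X, y


def train_linear_probe(X, y, n_classes, lr=0.05, epochs=500):
    """Multinomial logistic regression fit with full-batch gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)  # softmax probabilities
        grad = (P - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(X, W, b):
    """Predicted feature value = argmax of the linear scores."""
    return np.argmax(X @ W + b, axis=1)


def midpoints_stay_in_region(X, W, b, n_pairs=500, seed=1):
    """Empirical convexity check: for random pairs of points with the same
    predicted value, their midpoint should receive that value as well."""
    rng = np.random.default_rng(seed)
    pred = predict(X, W, b)
    i = rng.integers(0, len(X), n_pairs)
    j = rng.integers(0, len(X), n_pairs)
    same = pred[i] == pred[j]
    mid = (X[i[same]] + X[j[same]]) / 2
    return bool(np.all(predict(mid, W, b) == pred[i[same]]))


if __name__ == "__main__":
    X, y = make_synthetic_embeddings()
    W, b = train_linear_probe(X, y, n_classes=3)
    print("probe accuracy:", (predict(X, W, b) == y).mean())
    print("regions convex on sampled midpoints:", midpoints_stay_in_region(X, W, b))
```

With real data, `X` would instead hold per-token hidden states from one model layer and `y` the gold feature values from an annotated corpus, and accuracy would be measured on held-out tokens rather than the training set; the paper's own probing setup may differ from this sketch.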

