A computational psycholinguistic evaluation of the syntactic abilities of Galician BERT models at the interface of dependency resolution and training time

06/06/2022
by   Iria de-Dios-Flores, et al.
0

This paper explores the ability of Transformer models to capture subject-verb and noun-adjective agreement dependencies in Galician. We conduct a series of word prediction experiments in which we manipulate dependency length together with the presence of an attractor noun that acts as a lure. First, we evaluate the overall performance of the existing monolingual and multilingual models for Galician. Secondly, to observe the effects of the training process, we compare the different degrees of achievement of two monolingual BERT models at different training points. We also release their checkpoints and propose an alternative evaluation metric. Our results confirm previous findings by similar works that use the agreement prediction task and provide interesting insights into the number of training steps required by a Transformer model to solve long-distance dependencies.

READ FULL TEXT
research
05/01/2020

Cross-Linguistic Syntactic Evaluation of Word Prediction Models

A range of studies have concluded that neural word prediction models can...
research
12/20/2021

Training dataset and dictionary sizes matter in BERT models: the case of Baltic languages

Large pretrained masked language models have become state-of-the-art sol...
research
03/25/2021

Bertinho: Galician BERT Representations

This paper presents a monolingual BERT model for Galician. We follow the...
research
04/14/2022

Does BERT really agree ? Fine-grained Analysis of Lexical Dependence on a Syntactic Task

Although transformer-based Neural Language Models demonstrate impressive...
research
08/31/2021

Monolingual versus Multilingual BERTology for Vietnamese Extractive Multi-Document Summarization

Recent researches have demonstrated that BERT shows potential in a wide ...
research
11/07/2019

Blockwise Self-Attention for Long Document Understanding

We present BlockBERT, a lightweight and efficient BERT model that is des...
research
01/11/2022

The GINCO Training Dataset for Web Genre Identification of Documents Out in the Wild

This paper presents a new training dataset for automatic genre identific...

Please sign up or login with your details

Forgot password? Click here to reset