Bertinho: Galician BERT Representations

03/25/2021
by David Vilares et al.

This paper presents a monolingual BERT model for Galician. We follow the recent trend showing that it is feasible to build robust monolingual BERT models even for relatively low-resource languages that still perform better than the well-known official multilingual BERT (mBERT). More specifically, we release two monolingual Galician BERT models, built with 6 and 12 transformer layers, respectively, and trained with limited resources (~45 million tokens on a single 24GB GPU). We then provide an exhaustive evaluation on a number of tasks, such as POS-tagging, dependency parsing and named entity recognition. For this purpose, all these tasks are cast in a pure sequence labeling setup so that BERT can be run without any additional layers on top of it (we only use an output classification layer to map the contextualized representations into the predicted labels). The experiments show that our models, especially the 12-layer one, outperform mBERT on most tasks.
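
To make the sequence labeling setup concrete, below is a minimal sketch (not the authors' code) of how a BERT encoder plus a single token-classification output layer can be used for a task such as POS-tagging with the Hugging Face transformers library. The checkpoint identifier and the toy tag set are assumptions for illustration, and the classification head would still need fine-tuning on annotated Galician data before its predictions are meaningful.

```python
# Minimal sketch: BERT + one output classification layer for sequence labeling,
# matching the "no extra layers on top" setup described in the abstract.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_NAME = "dvilares/bertinho-gl-base-cased"  # assumed checkpoint id; substitute the released model
POS_TAGS = ["ADJ", "ADP", "ADV", "DET", "NOUN", "PRON", "PUNCT", "VERB"]  # toy tag set

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# AutoModelForTokenClassification wraps the BERT encoder with a single linear
# layer mapping each contextualized token representation to a label.
# Note: this head is randomly initialized and must be fine-tuned before use.
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME, num_labels=len(POS_TAGS)
)

sentence = "O can durme na casa."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, seq_len, num_labels)
predictions = logits.argmax(dim=-1)[0]       # one label id per subword token

for token, pred in zip(
    tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist()),
    predictions.tolist(),
):
    print(f"{token}\t{POS_TAGS[pred]}")
```

The same pattern applies to the other tasks in the evaluation: NER uses entity labels instead of POS tags, and dependency parsing is recast so that each token's head and relation are encoded as a per-token label.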


