Evaluating Contextualized Language Models for Hungarian

02/22/2021
by Judit Acs, et al.

We present an extended comparison of contextualized language models for Hungarian. We compare huBERT, a Hungarian model, against four multilingual models, including multilingual BERT. We evaluate these models on three tasks: morphological probing, POS tagging, and NER. We find that huBERT outperforms the other models, often by a large margin, particularly near the global optimum (typically at the middle layers). We also find that huBERT tends to generate fewer subwords per word, and that for token-level tasks using the last subword of a word is generally a better choice than using the first one.
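To illustrate the subword-pooling choice the abstract describes, here is a minimal sketch. It assumes the Hugging Face transformers API and the "SZTAKI-HLT/hubert-base-cc" checkpoint as huBERT (both are assumptions, not details given on this page). It counts how many subwords the tokenizer assigns to each word and contrasts first-subword with last-subword pooling.

    import torch
    from transformers import AutoModel, AutoTokenizer

    # Assumed Hugging Face model ID for huBERT.
    MODEL = "SZTAKI-HLT/hubert-base-cc"
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModel.from_pretrained(MODEL)

    # Pre-tokenized Hungarian sentence: "The cat climbed the tree."
    words = ["A", "macska", "felmászott", "a", "fára"]
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")

    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, hidden_size)

    # Group subword positions by the word they belong to.
    pieces = {}
    for idx, wid in enumerate(enc.word_ids(0)):
        if wid is not None:  # skip special tokens like [CLS]/[SEP]
            pieces.setdefault(wid, []).append(idx)

    # Subword fertility: how many pieces each word was split into.
    fertility = {w: len(ix) for w, ix in zip(words, pieces.values())}

    # First- vs. last-subword pooling: one vector per word, taken from
    # the first or the last piece of that word.
    first_pooled = torch.stack([hidden[ix[0]] for ix in pieces.values()])
    last_pooled = torch.stack([hidden[ix[-1]] for ix in pieces.values()])

In a token-level tagger, the pooled per-word vectors (here last_pooled) would be fed to a classification head; the paper's finding is that pooling on the last subword generally works better than on the first.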

