
Evaluating Contextualized Language Models for Hungarian

by Judit Acs, et al.

We present an extended comparison of contextualized language models for Hungarian. We compare huBERT, a Hungarian model, against four multilingual models, including multilingual BERT. We evaluate these models on three tasks: morphological probing, POS tagging, and NER. We find that huBERT outperforms the other models, often by a large margin, particularly near the global optimum (typically at the middle layers). We also find that huBERT tends to generate fewer subwords per word, and that using the last subword for token-level tasks is generally a better choice than using the first one.
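The choice between first- and last-subword pooling mentioned above can be illustrated with a minimal sketch. The function below is a hypothetical helper, not the paper's code: it assumes we already have one contextual vector per subword (here a toy NumPy array) and a subword-to-word alignment of the kind a tokenizer produces, and it selects one vector per word using either strategy.

```python
import numpy as np

def pool_subwords(hidden_states, word_ids, strategy="last"):
    """Select one subword vector per word from contextual hidden states.

    hidden_states: (num_subwords, dim) array of contextual vectors.
    word_ids: list mapping each subword position to its word index.
    strategy: keep the "first" or the "last" subword of each word.
    """
    vectors = {}
    for pos, wid in enumerate(word_ids):
        if strategy == "first":
            # setdefault keeps the first subword seen for each word
            vectors.setdefault(wid, hidden_states[pos])
        else:
            # repeated assignment lets the last subword win
            vectors[wid] = hidden_states[pos]
    return np.stack([vectors[w] for w in sorted(vectors)])

# Toy alignment: word 0 splits into 3 subwords, word 1 into 1.
hidden = np.arange(8.0).reshape(4, 2)  # 4 subwords, 2-dim vectors
word_ids = [0, 0, 0, 1]

print(pool_subwords(hidden, word_ids, "first"))  # rows 0 and 3
print(pool_subwords(hidden, word_ids, "last"))   # rows 2 and 3
```

For a morphologically rich language like Hungarian, suffixes carry much of the grammatical information and end up in the trailing subwords, which is consistent with the paper's finding that last-subword pooling works better for token-level tasks.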



