Evaluating Contextualized Language Models for Hungarian

02/22/2021
by Judit Acs, et al.

We present an extended comparison of contextualized language models for Hungarian. We compare huBERT, a Hungarian model, against four multilingual models, including multilingual BERT. We evaluate these models on three tasks: morphological probing, POS tagging, and NER. We find that huBERT outperforms the other models, often by a large margin, particularly near the global optimum, which typically falls at the middle layers. We also find that huBERT tends to split words into fewer subwords, and that for token-level tasks, representing a word by its last subword is generally a better choice than using the first one.
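The tokenizer-level findings lend themselves to a short illustration. The sketch below is not the authors' evaluation code; it uses the Hugging Face transformers library with the publicly released huBERT and multilingual BERT checkpoints, and the example sentence, the choice of layer 6 as a "middle" layer, and the pooling helper are assumptions made purely for demonstration.

    import torch
    from transformers import AutoModel, AutoTokenizer

    HUBERT = "SZTAKI-HLT/hubert-base-cc"     # huBERT (public checkpoint)
    MBERT = "bert-base-multilingual-cased"   # multilingual BERT

    # An illustrative, morphologically rich Hungarian sentence.
    words = ["a", "házaikban", "maradtak"]   # "they stayed in their houses"

    # 1) Subword fertility: count how many pieces each tokenizer produces
    #    per word; huBERT is expected to produce fewer.
    for name in (HUBERT, MBERT):
        tok = AutoTokenizer.from_pretrained(name)
        print(name, [len(tok.tokenize(w)) for w in words])

    # 2) Last-subword pooling at a middle layer, the configuration the
    #    paper reports as strongest for token-level tasks.
    tok = AutoTokenizer.from_pretrained(HUBERT)
    model = AutoModel.from_pretrained(HUBERT, output_hidden_states=True)

    enc = tok(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        # (1, seq_len, 768); layer 6 of 12 assumed as a "middle" layer
        hidden = model(**enc).hidden_states[6]

    # word_ids() maps each subword position back to its source word; keeping
    # the last position seen per word implements last-subword pooling.
    last_pos = {}
    for pos, wid in enumerate(enc.word_ids()):
        if wid is not None:
            last_pos[wid] = pos

    word_vecs = torch.stack([hidden[0, last_pos[i]] for i in range(len(words))])
    print(word_vecs.shape)  # torch.Size([3, 768])

Switching to first-subword pooling is a one-line change (keep the first position per word instead of overwriting), which makes the two strategies easy to compare on a downstream tagger.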

Related research

Evaluating Multilingual BERT for Estonian (10/01/2020)
Incorporating Context into Subword Vocabularies (10/13/2022)
Subword Pooling Makes a Difference (02/22/2021)
The futility of STILTs for the classification of lexical borrowings in Spanish (09/17/2021)
Searching for Search Errors in Neural Morphological Inflection (02/16/2021)
The Scenario Refiner: Grounding subjects in images at the morphological level (09/20/2023)
Morphosyntactic probing of multilingual BERT models (06/09/2023)
