Czert – Czech BERT-like Model for Language Representation

03/24/2021
by Jakub Sido, et al.

This paper describes the training process of the first Czech monolingual language representation models based on the BERT and ALBERT architectures. We pre-train our models on more than 340K sentences, which is 50 times more than the multilingual models that include Czech data. We outperform the multilingual models on 7 out of 10 datasets. In addition, we establish new state-of-the-art results on seven datasets. Finally, we discuss the properties of monolingual and multilingual models based on our results. We publish all the pre-trained and fine-tuned models freely for the research community.
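
Because the released checkpoints follow the BERT interface, they can be loaded with standard tooling. Below is a minimal sketch, assuming the Hugging Face transformers library and a model identifier such as UWB-AIR/Czert-B-base-cased; the identifier, example sentence, and masked-token query are illustrative assumptions, not details stated in the abstract.

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Assumed checkpoint name; substitute the identifier from the authors' release.
model_name = "UWB-AIR/Czert-B-base-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Query the masked-language-model head on a Czech sentence.
text = f"Praha je hlavní město {tokenizer.mask_token} republiky."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Report the five most likely fillers for the masked position.
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_positions[0]].topk(5).indices.tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))

For downstream tasks such as the classification and tagging datasets evaluated in the paper, the same checkpoint can instead be loaded with a task-specific head (for example AutoModelForSequenceClassification) and fine-tuned in the usual way.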

Related research

LT@Helsinki at SemEval-2020 Task 12: Multilingual or language-specific BERT? (08/03/2020)
This paper presents the different models submitted by the LT@Helsinki te...

Mono vs Multilingual BERT for Hate Speech Detection and Text Classification: A Case Study in Marathi (04/19/2022)
Transformers are the most eminent architectures used for a vast range of...

Are pre-trained text representations useful for multilingual and multi-dimensional language proficiency modeling? (02/25/2021)
Development of language proficiency models for non-native learners has b...

Mono vs Multilingual Transformer-based Models: a Comparison across Several Language Tasks (07/19/2020)
BERT (Bidirectional Encoder Representations from Transformers) and ALBER...

RobeCzech: Czech RoBERTa, a monolingual contextualized language representation model (05/24/2021)
We present RobeCzech, a monolingual RoBERTa language representation mode...

Exploring the Representation of Word Meanings in Context: A Case Study on Homonymy and Synonymy (06/25/2021)
This paper presents a multilingual study of word meaning representations...
