BERTje: A Dutch BERT Model

12/19/2019
by Wietse de Vries, et al.

The transformer-based pre-trained language model BERT has helped to improve state-of-the-art performance on many natural language processing (NLP) tasks. Using the same architecture and parameters, we developed and evaluated a monolingual Dutch BERT model called BERTje. Compared to the multilingual BERT model, which includes Dutch but is based only on Wikipedia text, BERTje is based on a large and diverse dataset of 2.4 billion tokens. BERTje consistently outperforms the equally sized multilingual BERT model on downstream NLP tasks (part-of-speech tagging, named-entity recognition, semantic role labeling, and sentiment analysis). Our pre-trained Dutch BERT model is made available at https://github.com/wietsedv/bertje.
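
Since the pre-trained model is distributed through the linked repository, it can be loaded with the Hugging Face transformers library. The sketch below is a minimal, hedged example of masked-token prediction with BERTje; the model identifier "GroNLP/bert-base-dutch-cased" is an assumption here, so check the repository above for the officially published name.

```python
# Minimal sketch: load BERTje via Hugging Face transformers and run masked-token prediction.
# NOTE: the model id "GroNLP/bert-base-dutch-cased" is an assumption; see the BERTje repository
# for the official release name.
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

model_id = "GroNLP/bert-base-dutch-cased"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Predict the masked token in a Dutch sentence.
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
for prediction in fill_mask("Amsterdam is de [MASK] van Nederland."):
    print(prediction["token_str"], prediction["score"])
```

The same checkpoint can be fine-tuned for the downstream tasks mentioned in the abstract (tagging, NER, semantic role labeling, sentiment analysis) by swapping in the corresponding task-specific head, e.g. AutoModelForTokenClassification or AutoModelForSequenceClassification.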

research · 08/27/2020
GREEK-BERT: The Greeks visiting Sesame Street
Transformer-based language models, such as BERT and its variants, have a...

research · 05/26/2020
ParsBERT: Transformer-based Model for Persian Language Understanding
The surge of pre-trained language models has begun a new era in the fiel...

research · 02/03/2023
Bioformer: an efficient transformer language model for biomedical text mining
Pretrained language models such as Bidirectional Encoder Representations...

research · 03/25/2019
Fine-tune BERT for Extractive Summarization
BERT, a pre-trained Transformer model, has achieved ground-breaking perf...

research · 10/25/2019
HUBERT Untangles BERT to Improve Transfer across NLP Tasks
We introduce HUBERT which combines the structured-representational power...

research · 05/24/2021
RobeCzech: Czech RoBERTa, a monolingual contextualized language representation model
We present RobeCzech, a monolingual RoBERTa language representation mode...

research · 01/28/2021
BERTaú: Itaú BERT for digital customer service
In the last few years, three major topics received increased interest: d...
