Playing with Words at the National Library of Sweden – Making a Swedish BERT

07/03/2020
by   Martin Malmsten, et al.
0

This paper introduces the Swedish BERT ("KB-BERT") developed by the KBLab for data-driven research at the National Library of Sweden (KB). Building on recent efforts to create transformer-based BERT models for languages other than English, we explain how we used KB's collections to create and train a new language-specific BERT model for Swedish. We also present the results of our model in comparison with existing models - chiefly that produced by the Swedish Public Employment Service, Arbetsförmedlingen, and Google's multilingual M-BERT - where we demonstrate that KB-BERT outperforms these in a range of NLP tasks from named entity recognition (NER) to part-of-speech tagging (POS). Our discussion highlights the difficulties that continue to exist given the lack of training data and testbeds for smaller languages like Swedish. We release our model for further exploration and research here: https://github.com/Kungbib/swedish-bert-models .

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/09/2020

EstBERT: A Pretrained Language-Specific BERT for Estonian

This paper presents EstBERT, a large pretrained transformer-based langua...
research
04/19/2021

Operationalizing a National Digital Library: The Case for a Norwegian Transformer Model

In this work, we show the process of building a large-scale training set...
research
05/24/2021

Diacritics Restoration using BERT with Analysis on Czech language

We propose a new architecture for diacritics restoration based on contex...
research
12/15/2019

Multilingual is not enough: BERT for Finnish

Deep learning-based language models pretrained on large unannotated text...
research
02/28/2020

AraBERT: Transformer-based Model for Arabic Language Understanding

The Arabic language is a morphologically rich and complex language with ...
research
11/19/2019

Towards Lingua Franca Named Entity Recognition with BERT

Information extraction is an important task in NLP, enabling the automat...
research
01/28/2021

BERTaú: Itaú BERT for digital customer service

In the last few years, three major topics received increased interest: d...

Please sign up or login with your details

Forgot password? Click here to reset