SwissBERT: The Multilingual Language Model for Switzerland

03/23/2023
by Jannis Vamvas et al.

We present SwissBERT, a masked language model created specifically for processing Switzerland-related text. SwissBERT is a pre-trained model that we adapted to news articles written in the national languages of Switzerland – German, French, Italian, and Romansh. We evaluate SwissBERT on natural language understanding tasks related to Switzerland and find that it tends to outperform previous models on these tasks, especially when processing contemporary news and/or Romansh Grischun. Since SwissBERT uses language adapters, it may be extended to Swiss German dialects in future work. The model and our open-source code are publicly released at https://github.com/ZurichNLP/swissbert.
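
Since the abstract highlights SwissBERT's language adapters, a short sketch may help illustrate how one might select a language at inference time. This is a minimal example, assuming the model follows the X-MOD-style adapter interface in Hugging Face transformers and is published on the Hub under an ID like "ZurichNLP/swissbert" with language codes such as "rm_CH"; consult the linked repository for the exact identifiers.

    # Minimal sketch: load SwissBERT and route inputs through one
    # language adapter. Hub ID and language codes are assumptions;
    # see https://github.com/ZurichNLP/swissbert for the real ones.
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("ZurichNLP/swissbert")
    model = AutoModel.from_pretrained("ZurichNLP/swissbert")

    # Activate the Romansh adapter; the other national languages would
    # plausibly use codes like "de_CH", "fr_CH", and "it_CH".
    model.set_default_language("rm_CH")

    inputs = tokenizer("Allegra, co vai?", return_tensors="pt")
    outputs = model(**inputs)
    print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)

Because each language is handled by its own adapter while the rest of the network is shared, adding a new language (such as a Swiss German dialect, as the authors suggest) would only require training a new adapter rather than retraining the whole model.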


