Large-Scale Contextualised Language Modelling for Norwegian

04/13/2021
by   Andrey Kutuzov, et al.

We present the ongoing NorLM initiative to support the creation and use of very large contextualised language models for Norwegian (and in principle other Nordic languages), including a ready-to-use software environment, as well as an experience report for data preparation and training. This paper introduces the first large-scale monolingual language models for Norwegian, based on both the ELMo and BERT frameworks. In addition to detailing the training process, we present contrastive benchmark results on a suite of NLP tasks for Norwegian. For additional background and access to the data, models, and software, please see http://norlm.nlpl.eu
