NoCoLA: The Norwegian Corpus of Linguistic Acceptability

06/13/2023
by   Matias Jentoft, et al.
0

While there has been a surge of large language models for Norwegian in recent years, we lack any tool to evaluate their understanding of grammaticality. We present two new Norwegian datasets for this task. NoCoLA_class is a supervised binary classification task where the goal is to discriminate between acceptable and non-acceptable sentences. On the other hand, NoCoLA_zero is a purely diagnostic task for evaluating the grammatical judgement of a language model in a completely zero-shot manner, i.e. without any further training. In this paper, we describe both datasets in detail, show how to use them for different flavors of language models, and conduct a comparative study of the existing Norwegian language models.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/28/2023

Beyond the limitations of any imaginable mechanism: large language models and psycholinguistics

Large language models are not detailed models of human linguistic proces...
research
09/09/2020

Comparative Study of Language Models on Cross-Domain Data with Model Agnostic Explainability

With the recent influx of bidirectional contextualized transformer langu...
research
12/20/2022

Go-tuning: Improving Zero-shot Learning Abilities of Smaller Language Models

With increasing scale, large language models demonstrate both quantitati...
research
10/23/2022

RuCoLA: Russian Corpus of Linguistic Acceptability

Linguistic acceptability (LA) attracts the attention of the research com...
research
01/05/2023

Critical Perspectives: A Benchmark Revealing Pitfalls in PerspectiveAPI

Detecting "toxic" language in internet content is a pressing social and ...
research
08/24/2018

Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information

How do neural language models keep track of number agreement between sub...
research
05/19/2023

Visualizing Linguistic Diversity of Text Datasets Synthesized by Large Language Models

Large language models (LLMs) can be used to generate smaller, more refin...

Please sign up or login with your details

Forgot password? Click here to reset