DUMB: A Benchmark for Smart Evaluation of Dutch Models

05/22/2023
by   Wietse de Vries, et al.

We introduce the Dutch Model Benchmark: DUMB. The benchmark includes a diverse set of datasets for low-, medium- and high-resource tasks. The set of eight tasks includes three tasks that were previously not available in Dutch. Instead of relying on a mean score across tasks, we propose Relative Error Reduction (RER), which compares the DUMB performance of models to a strong baseline that can be referred to in the future, even when different sets of models are assessed. Through a comparison of 14 pre-trained models (monolingual and multilingual, of varying sizes), we assess the internal consistency of the benchmark tasks, as well as the factors that likely enable high performance. Our results indicate that current Dutch monolingual models under-perform, and they suggest training larger Dutch models with other architectures and pre-training objectives. At present, the highest performance is achieved by DeBERTaV3 (large), XLM-R (large) and mDeBERTaV3 (base). In addition to highlighting the best strategies for training larger Dutch models, DUMB will foster further research on Dutch. A public leaderboard is available at https://dumbench.nl.
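The abstract's Relative Error Reduction can be sketched as follows. This is a minimal illustration of the idea described above, not the paper's reference implementation: it assumes scores lie in [0, 1] and that RER measures what fraction of the baseline's remaining error a model removes; the task names and score values are invented for the example.

```python
def relative_error_reduction(model_score, baseline_score):
    """One natural formulation of RER for scores in [0, 1]:
    RER = (model - baseline) / (1 - baseline).

    Positive values mean the model removes part of the baseline's
    remaining error; negative values mean it falls below the baseline.
    """
    return (model_score - baseline_score) / (1.0 - baseline_score)

# Illustrative (hypothetical) per-task scores for a baseline and one model.
baseline = {"task_a": 0.80, "task_b": 0.90}
model = {"task_a": 0.85, "task_b": 0.88}

per_task = {t: relative_error_reduction(model[t], baseline[t]) for t in baseline}
mean_rer = sum(per_task.values()) / len(per_task)

print(per_task)  # task_a removes a quarter of the baseline's error (0.25)
print(mean_rer)
```

Because each task's score is normalized by that task's baseline error before averaging, a fixed strong baseline gives a stable reference point: future models can be scored against it without re-running the original set of compared models.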

