RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark

10/29/2020
by Tatiana Shavrina, et al.

In this paper, we introduce an advanced Russian general language understanding evaluation benchmark, RussianSuperGLUE. Recent advances in universal language models and transformers call for a methodology for their broad diagnostics and for testing general intellectual skills: natural language inference, commonsense reasoning, and the ability to perform simple logical operations regardless of text subject or lexicon. For the first time, a benchmark of nine tasks, collected and organized analogously to the SuperGLUE methodology, has been developed from scratch for the Russian language. We provide baselines, a human-level evaluation, an open-source framework for evaluating models (https://github.com/RussianNLP/RussianSuperGLUE), and an overall leaderboard of transformer models for the Russian language. In addition, we present the first results of comparing multilingual models on the adapted diagnostic test set, and outline first steps toward further expanding or assessing state-of-the-art models independently of language.
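As an illustrative sketch only (this is not the RussianSuperGLUE framework's actual API), a leaderboard score for a multi-task benchmark of this kind can be computed as the mean of per-task scores. The snippet below uses plain accuracy throughout and invented toy predictions; the task names TERRa and DaNetQA are drawn from the benchmark, but the data and helper functions are hypothetical:

```python
# Hypothetical per-task evaluation for a multi-task benchmark.
# Real RussianSuperGLUE tasks use their own metrics (accuracy, F1, EM);
# here plain accuracy stands in for all of them.

def accuracy(gold, pred):
    """Fraction of positions where the prediction matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def overall_score(per_task_data):
    """Per-task accuracies plus their unweighted mean as a leaderboard aggregate."""
    scores = {task: accuracy(g, p) for task, (g, p) in per_task_data.items()}
    return scores, sum(scores.values()) / len(scores)

# Toy example with two task names from the benchmark (labels are invented):
results = {
    "TERRa": (["entailment", "not_entailment"], ["entailment", "entailment"]),
    "DaNetQA": ([True, False, True, True], [True, False, False, True]),
}
per_task, avg = overall_score(results)
```

In this toy run, TERRa scores 0.5, DaNetQA scores 0.75, and the aggregate is their mean, 0.625. Averaging unweighted per-task scores mirrors the simple aggregation used by GLUE-style leaderboards.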
