Russian SuperGLUE 1.1: Revising the Lessons not Learned by Russian NLP models

by Alena Fenogenova, et al.

In the last year, new neural architectures and multilingual pre-trained models have been released for Russian, which has raised the problem of evaluating their performance across a range of language understanding tasks. This paper presents Russian SuperGLUE 1.1, an updated benchmark styled after GLUE for Russian NLP models. The new version includes a number of technical, user experience, and methodological improvements, including fixes for benchmark vulnerabilities left unresolved in the previous version: novel and improved tests for understanding the meaning of a word in context (RUSSE), along with reading comprehension and common sense reasoning (DaNetQA, RuCoS, MuSeRC). Together with the release of the updated datasets, we improve the benchmark toolkit, which is based on a framework for consistent training and evaluation of NLP models of various architectures and now supports the most recent models for Russian. Finally, we integrate Russian SuperGLUE with MOROCCO (MOdel ResOurCe COmparison), a framework for industrial evaluation of open-source models, in which models are evaluated according to the weighted average metric over all tasks, the inference speed, and the amount of RAM occupied. Russian SuperGLUE is publicly available at

