Russian SuperGLUE 1.1: Revising the Lessons not Learned by Russian NLP models

02/15/2022
by Alena Fenogenova, et al.

In the last year, new neural architectures and multilingual pre-trained models have been released for Russian, which has raised problems for consistent performance evaluation across a range of language understanding tasks. This paper presents Russian SuperGLUE 1.1, an updated benchmark styled after GLUE for Russian NLP models. The new version includes a number of technical, user-experience, and methodological improvements, including fixes for benchmark vulnerabilities left unresolved in the previous version: new and improved tests for understanding the meaning of a word in context (RUSSE), as well as for reading comprehension and commonsense reasoning (DaNetQA, RuCoS, MuSeRC). Together with the release of the updated datasets, we improve the benchmark toolkit, which is based on a framework for consistent training and evaluation of NLP models of various architectures and now supports the most recent models for Russian. Finally, we integrate Russian SuperGLUE with MOROCCO (MOdel ResOurCe COmparison), a framework for industrial evaluation of open-source models, in which models are evaluated according to a weighted average metric over all tasks, their inference speed, and the amount of RAM they occupy. Russian SuperGLUE is publicly available at https://russiansuperglue.com/.
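
The industrial-evaluation setting mentioned above combines a per-task quality score with resource measurements. The sketch below shows one way such a leaderboard entry could be aggregated; the task subset, the uniform weights, the resource fields, and the helper names (ModelProfile, weighted_average) are illustrative assumptions, not the actual MOROCCO implementation.

```python
# Hedged sketch: aggregating a submission in the spirit of MOROCCO-style evaluation.
# Task list, weights, and the aggregation formula are assumptions for illustration.
from dataclasses import dataclass


@dataclass
class ModelProfile:
    name: str
    task_scores: dict[str, float]   # per-task metric, e.g. accuracy in [0, 1]
    samples_per_second: float       # measured inference throughput
    gpu_ram_gb: float               # peak RAM occupied during inference


def weighted_average(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average metric over all tasks (uniform weights assumed here)."""
    total_weight = sum(weights[task] for task in scores)
    return sum(scores[task] * weights[task] for task in scores) / total_weight


# Illustrative run with made-up numbers for two hypothetical submissions.
TASKS = ["RUSSE", "DaNetQA", "RuCoS", "MuSeRC"]   # subset of the benchmark tasks
WEIGHTS = {task: 1.0 for task in TASKS}

submissions = [
    ModelProfile("model-a",
                 {"RUSSE": 0.72, "DaNetQA": 0.80, "RuCoS": 0.65, "MuSeRC": 0.70},
                 samples_per_second=120.0, gpu_ram_gb=4.2),
    ModelProfile("model-b",
                 {"RUSSE": 0.70, "DaNetQA": 0.78, "RuCoS": 0.68, "MuSeRC": 0.74},
                 samples_per_second=45.0, gpu_ram_gb=11.5),
]

for model in submissions:
    score = weighted_average(model.task_scores, WEIGHTS)
    print(f"{model.name}: avg={score:.3f}  "
          f"speed={model.samples_per_second:.0f} samples/s  ram={model.gpu_ram_gb:.1f} GB")
```

Reporting the three axes side by side, rather than folding speed and memory into a single number, lets a reader trade quality against deployment cost, which matches the comparison the benchmark describes.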

Related research

06/13/2019
Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets
Inspired by the success of the General Language Understanding Evaluation...

05/02/2019
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
In the last year, new models and methods for pretraining and transfer le...

05/23/2023
WYWEB: A NLP Evaluation Benchmark For Classical Chinese
To fully evaluate the overall performance of different NLP models in a g...

10/13/2021
Towards Efficient NLP: A Standard Evaluation and A Strong Baseline
Supersized pre-trained language models have pushed the accuracy of vario...

09/11/2020
IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding
Although Indonesian is known to be the fourth most frequently used langu...

12/03/2021
Evaluating NLP Systems On a Novel Cloze Task: Judging the Plausibility of Possible Fillers in Instructional Texts
Cloze task is a widely used task to evaluate an NLP system's language un...

03/06/2023
OpenICL: An Open-Source Framework for In-context Learning
In recent years, In-context Learning (ICL) has gained increasing attenti...
