Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks

05/03/2021
by Tatyana Iazykova, et al.

Leaderboards like SuperGLUE are seen as important incentives for the active development of NLP, since they provide standard benchmarks for fair comparison of modern language models. They have driven the world's best engineering teams, along with their resources, to collaborate on solving a set of tasks for general language understanding. The resulting performance scores are often claimed to be close to or even higher than human performance. These results have encouraged more thorough analysis of whether the benchmark datasets contain statistical cues that machine-learning-based language models can exploit. For English datasets, it has been shown that they often contain annotation artifacts, which allow certain tasks to be solved with very simple rules while still achieving competitive rankings. In this paper, we carry out a similar analysis for Russian SuperGLUE (RSG), a recently published benchmark set and leaderboard for Russian natural language understanding. We show that its test datasets are vulnerable to shallow heuristics. Approaches based on simple rules often outperform or come close to the results of well-known pre-trained language models like GPT-3 or BERT. The simplest explanation, and a likely one, is that a significant part of the SOTA models' performance on the RSG leaderboard is due to exploiting these shallow heuristics and has nothing in common with real language understanding. We provide a set of recommendations on how to improve these datasets, making the RSG leaderboard even more representative of real progress in Russian NLU.


