TAPE: Assessing Few-shot Russian Language Understanding

10/23/2022
by Ekaterina Taktasheva, et al.

Recent advances in zero-shot and few-shot learning have shown promise for a range of research and practical applications. However, this fast-growing area lacks standardized evaluation suites for non-English languages, hindering progress outside the Anglo-centric paradigm. To address this gap, we propose TAPE (Text Attack and Perturbation Evaluation), a novel benchmark of six complex NLU tasks for Russian, covering multi-hop reasoning, ethical concepts, logic, and commonsense knowledge. TAPE's design focuses on systematic zero-shot and few-shot NLU evaluation through (i) linguistically oriented adversarial attacks and perturbations for analyzing robustness, and (ii) subpopulations for nuanced interpretation. A detailed analysis of the autoregressive baselines indicates that simple spelling-based perturbations affect performance the most, while paraphrasing the input has a negligible effect. At the same time, the results demonstrate a significant gap between the neural and human baselines on most tasks. We publicly release TAPE (tape-benchmark.com) to foster research on robust LMs that can generalize to new tasks when little to no supervision is available.
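To make the two evaluation axes concrete, below is a minimal Python sketch of (i) a keyboard-typo ("butter fingers") style spelling perturbation of the kind the robustness analysis applies, and (ii) accuracy aggregated over subpopulation slices. This is an illustrative reimplementation, not TAPE's own code: the function names, the swap probability, and the toy keyboard map are all assumptions.

```python
import random
from collections import defaultdict

# A few illustrative Russian keyboard adjacencies (assumed, not TAPE's map).
RU_NEIGHBORS = {
    "а": "впр", "о": "лрг", "е": "нкп", "и": "мтш", "с": "чвм",
}

def butter_finger(text: str, p: float = 0.1, seed: int = 0) -> str:
    """Replace each known letter with a keyboard neighbour with probability p."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        neighbors = RU_NEIGHBORS.get(ch.lower())
        out.append(rng.choice(neighbors) if neighbors and rng.random() < p else ch)
    return "".join(out)

def per_slice_accuracy(examples, predictions, slice_fn):
    """Accuracy per subpopulation; slice_fn maps an example to its slice key."""
    hits, totals = defaultdict(int), defaultdict(int)
    for ex, pred in zip(examples, predictions):
        key = slice_fn(ex)
        totals[key] += 1
        hits[key] += int(pred == ex["label"])
    return {key: hits[key] / totals[key] for key in totals}

# Usage: perturb an input, then compare clean vs. perturbed predictions per slice.
print(butter_finger("пример текста для проверки устойчивости", p=0.15))
```

Comparing a model's per-slice accuracy on clean versus perturbed inputs is the kind of nuanced reading the abstract's subpopulation analysis points to.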

Related research

08/17/2020 · A Deep Dive into Adversarial Robustness in Zero-Shot Learning
Machine learning (ML) systems have introduced significant advances in va...

07/03/2017 · Zero-Shot Learning - A Comprehensive Evaluation of the Good, the Bad and the Ugly
Due to the importance of zero-shot learning, i.e. classifying images whe...

05/21/2023 · GPT-3.5 vs GPT-4: Evaluating ChatGPT's Reasoning Performance in Zero-shot Learning
Large Language Models (LLMs) have exhibited remarkable performance on va...

12/21/2022 · JASMINE: Arabic GPT Models for Few-Shot Learning
Task agnostic generative pretraining (GPT) has recently proved promising...

01/26/2022 · How Robust are Discriminatively Trained Zero-Shot Learning Models?
Data shift robustness has been primarily investigated from a fully super...

07/15/2021 · FLEX: Unifying Evaluation for Few-Shot NLP
Few-shot NLP research is highly active, yet conducted in disjoint resear...

05/23/2023 · Is Information Extraction Solved by ChatGPT? An Analysis of Performance, Evaluation Criteria, Robustness and Errors
ChatGPT has stimulated the research boom in the field of large language ...
