Stress Test Evaluation of Transformer-based Models in Natural Language Understanding Tasks

02/14/2020
by Carlos Aspillaga, et al.

The Transformer architecture has driven significant progress in Natural Language Processing in recent years. Current state-of-the-art models, through large parameter counts and pre-training on massive text corpora, have shown impressive results on several downstream tasks. Many researchers have studied previous (non-Transformer) models to understand their actual behavior under different scenarios, showing that these models exploit clues or flaws in the datasets, and that slight perturbations of the input data can severely degrade their performance. In contrast, recent models have not been systematically tested with adversarial examples to demonstrate their robustness under severe stress conditions. For that reason, this work evaluates three Transformer-based models (RoBERTa, XLNet, and BERT) on Natural Language Inference (NLI) and Question Answering (QA) tasks to determine whether they are more robust, or whether they share the same flaws as their predecessors. Our experiments reveal that RoBERTa, XLNet, and BERT are more robust than recurrent neural network models to stress tests on both NLI and QA. Nevertheless, they remain very fragile and exhibit various unexpected behaviors, revealing that there is still room for improvement in this field.
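To make the stress-test idea concrete, below is a minimal sketch (not the authors' released code) of two perturbation styles in the spirit of the NLI stress tests this line of work builds on: appending a content-free tautology to the hypothesis (distraction) and introducing character-level typos (noise). The function names, example sentences, and parameters are illustrative assumptions.

import random

def add_distraction(hypothesis: str, tautology: str = "and true is true") -> str:
    # Distraction test: append a tautology that carries no information
    # about the premise, so the gold entailment label is unchanged.
    return f"{hypothesis.rstrip('.')} {tautology}."

def add_keyboard_noise(sentence: str, rate: float = 0.1, seed: int = 0) -> str:
    # Noise test: swap adjacent characters inside random words to
    # simulate typos, leaving word boundaries intact.
    rng = random.Random(seed)
    words = sentence.split()
    for i, w in enumerate(words):
        if len(w) > 3 and rng.random() < rate:
            j = rng.randrange(1, len(w) - 2)
            words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."
print(add_distraction(hypothesis))       # Some men are playing a sport and true is true.
print(add_keyboard_noise(hypothesis, rate=0.5))

In a full evaluation, one would compare a model's predictions on the original and perturbed pairs; since neither perturbation alters the entailment relation, a robust model's label should not change.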

