On Sensitivity of Deep Learning Based Text Classification Algorithms to Practical Input Perturbations

01/02/2022
by Aamir Miyajiwala, et al.

Text classification is a fundamental Natural Language Processing task with a wide variety of applications, and deep learning approaches have produced state-of-the-art results on it. While these models have been heavily criticized for their black-box nature, their robustness to slight perturbations in the input text has also been a matter of concern. In this work, we carry out a data-focused study evaluating the impact of systematic, practical perturbations on the performance of deep-learning-based text classification models, including CNN-, LSTM-, and BERT-based algorithms. The perturbations are induced by adding and removing unwanted tokens, such as punctuation and stop-words, that are minimally associated with the final performance of the model. We show that these deep learning approaches, including BERT, are sensitive to such legitimate input perturbations on four standard benchmark datasets: SST2, TREC-6, BBC News, and tweet_eval. We observe that BERT is more susceptible to the removal of tokens than to the addition of tokens. Moreover, the LSTM is slightly more sensitive to input perturbations than the CNN-based model. The work also serves as a practical guide to assessing the impact of discrepancies in train-test conditions on the final performance of models.
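The perturbations described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the stop-word list, perturbation rate, and function names are assumptions chosen for the example (a real study might draw stop-words from a standard corpus such as NLTK's).

```python
import random
import string

# Small illustrative stop-word list; an assumption for this sketch.
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "on", "of", "and", "to"}

def remove_unwanted_tokens(text):
    """Removal perturbation: strip punctuation and drop stop-words."""
    tokens = text.lower().split()
    cleaned = [t.strip(string.punctuation) for t in tokens]
    return " ".join(t for t in cleaned if t and t not in STOP_WORDS)

def add_unwanted_tokens(text, rate=0.2, seed=0):
    """Addition perturbation: insert random stop-words between tokens."""
    rng = random.Random(seed)
    out = []
    for t in text.split():
        out.append(t)
        if rng.random() < rate:
            out.append(rng.choice(sorted(STOP_WORDS)))
    return " ".join(out)

print(remove_unwanted_tokens("The movie is, honestly, a triumph!"))
# -> "movie honestly triumph"
```

Applying one perturbation at test time while training on unperturbed text (or vice versa) reproduces the kind of train-test discrepancy the study measures.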


Related research

- ShufText: A Simple Black Box Approach to Evaluate the Fragility of Text Classification Models (01/30/2021)
- BAE: BERT-based Adversarial Examples for Text Classification (04/04/2020)
- Hierarchical Neural Network Approaches for Long Document Classification (01/18/2022)
- Experimental Evaluation of Deep Learning models for Marathi Text Classification (01/13/2021)
- LaFiCMIL: Rethinking Large File Classification from the Perspective of Correlated Multiple Instance Learning (07/30/2023)
- Model Blending for Text Classification (08/05/2022)
- Towards Robust Toxic Content Classification (12/14/2019)
