ShufText: A Simple Black Box Approach to Evaluate the Fragility of Text Classification Models

01/30/2021
by   Rutuja Taware, et al.

Text classification is one of the most basic natural language processing tasks, with applications ranging from sentiment analysis to topic classification. Recently, deep learning approaches based on CNNs, LSTMs, and Transformers have become the de facto choice for text classification. In this work, we highlight a common issue with these approaches: they are over-reliant on the important words in the text that are useful for classification. With limited training data and a discriminative training strategy, these models tend to ignore the semantic meaning of the sentence and focus instead on keywords or important n-grams. We propose a simple black box technique, ShufText, to expose this shortcoming and identify a model's over-reliance on keywords. It involves randomly shuffling the words in a sentence and evaluating the classification accuracy. On common text classification datasets, shuffling has very little effect, and with high probability the models still predict the original class. We also evaluate the effect of language model pretraining on these models and try to answer questions about model robustness to out-of-domain sentences. We show that simple models based on CNNs or LSTMs, as well as complex models like BERT, are questionable in terms of their syntactic and semantic understanding.
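The shuffling test described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `shuffle_words`, `shuffle_agreement`, and `keyword_classifier` are hypothetical, whitespace tokenization is an assumption, and the toy classifier stands in for a trained model queried as a black box. The sketch reports the fraction of sentences whose predicted class is unchanged after shuffling, which is one possible way to measure the effect.

```python
import random
from typing import Callable, Sequence


def shuffle_words(sentence: str, rng: random.Random) -> str:
    """Return the sentence with its whitespace-separated words in random order."""
    words = sentence.split()
    rng.shuffle(words)
    return " ".join(words)


def shuffle_agreement(
    predict: Callable[[str], str],
    sentences: Sequence[str],
    seed: int = 0,
) -> float:
    """Fraction of sentences whose predicted class survives word shuffling.

    `predict` is treated as a black box: only its output labels are used.
    """
    if not sentences:
        raise ValueError("sentences must not be empty")
    rng = random.Random(seed)
    unchanged = sum(
        predict(shuffle_words(s, rng)) == predict(s) for s in sentences
    )
    return unchanged / len(sentences)


def keyword_classifier(text: str) -> str:
    """Hypothetical stand-in model that looks only at a single keyword."""
    return "positive" if "great" in text.lower().split() else "negative"


if __name__ == "__main__":
    examples = ["the movie was great", "not worth the time"]
    print(shuffle_agreement(keyword_classifier, examples))
```

A model that depends only on keywords, like the toy classifier here, is unaffected by word order, so every prediction is unchanged after shuffling. A high agreement score from a real model would point to the same kind of keyword reliance.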


research
01/02/2022

On Sensitivity of Deep Learning Based Text Classification Algorithms to Practical Input Perturbations

Text classification is a fundamental Natural Language Processing task th...
research
01/13/2021

Experimental Evaluation of Deep Learning models for Marathi Text Classification

The Marathi language is one of the prominent languages used in India. It...
research
09/06/2019

Natural Adversarial Sentence Generation with Gradient-based Perturbation

This work proposes a novel algorithm to generate natural language advers...
research
09/16/2019

Short-Text Classification Using Unsupervised Keyword Expansion

Short-text classification, like all data science, struggles to achieve h...
research
01/19/2018

Investigating the Working of Text Classifiers

Text classification is one of the most widely studied task in natural la...
research
02/29/2020

Depth-Adaptive Graph Recurrent Network for Text Classification

The Sentence-State LSTM (S-LSTM) is a powerful and high efficient graph ...
research
06/05/2023

CELDA: Leveraging Black-box Language Model as Enhanced Classifier without Labels

Utilizing language models (LMs) without internal access is becoming an a...
