Re-evaluating the need for Modelling Term-Dependence in Text Classification Problems

10/25/2017
by Sounak Banerjee, et al.

A substantial amount of research has gone into developing machine learning algorithms that account for term dependence in text classification. These algorithms offer acceptable performance in most cases, but at a substantial cost: they require significantly greater resources to operate. This paper argues that their performance on text classification problems does not justify these higher costs. To test this conjecture, the performance of one of the best dependence models is compared with that of several well-established text classification algorithms. A specific collection of datasets was designed to reflect the variety of text data found in real-world applications. The results show that even one of the best term-dependence models performs only decently, at best, when compared with independence models. Coupled with their substantially greater hardware requirements, this makes term-dependence models an impractical choice for real-world scenarios.

research
10/28/2020

A Chinese Text Classification Method With Low Hardware Requirement Based on Improved Model Concatenation

In order to improve the accuracy performance of Chinese text classificat...
research
04/17/2019

Text Classification Algorithms: A Survey

In recent years, there has been an exponential growth in the number of c...
research
12/19/2019

Empirical Comparisons of CNN with Other Learning Algorithms for Text Classification in Legal Document Review

Research has shown that Convolutional Neural Networks (CNN) can be effec...
research
09/11/2018

Training and Prediction Data Discrepancies: Challenges of Text Classification with Noisy, Historical Data

Industry datasets used for text classification are rarely created for th...
research
09/01/2021

What Have Been Learned What Should Be Learned? An Empirical Study of How to Selectively Augment Text for Classification

Text augmentation techniques are widely used in text classification prob...
research
11/20/2020

Feature selection using binary grey wolf optimizer with elite-based crossover for Arabic text classification

Text classification is one of the challenging computational tasks in mac...
