Early text classification: a Naive solution

09/20/2015
by Hugo Jair Escalante, et al.

Text classification is a widely studied problem, and it can be considered solved for some domains and under certain circumstances. There are scenarios, however, that have received little or no attention at all, despite their relevance and applicability. One such scenario is early text classification, where one needs to determine the category of a document using only partial information. A document is processed as a sequence of terms, and the goal is to devise a method that can make predictions as early as possible, that is, after reading as few terms as possible. The importance of this variant of the text classification problem is evident in domains like sexual predator detection, where one wants to identify an offender as early as possible. This paper analyzes the suitability of the standard naive Bayes classifier for approaching this problem. Specifically, we assess its performance when classifying documents after seeing an increasing number of terms. A simple modification to the standard naive Bayes implementation allows us to make predictions with partial information. To the best of our knowledge, naive Bayes has not been used for this purpose before. Through an extensive experimental evaluation, we show the effectiveness of the classifier for early text classification. Moreover, we show that this simple solution is very competitive with more elaborate state-of-the-art methodologies. We foresee that our work will pave the way for the development of more effective early text classification techniques based on the naive Bayes formulation.
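The abstract does not spell out the exact modification, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a multinomial naive Bayes model with additive (Laplace) smoothing and makes a prediction from partial information by scoring only the first `k` terms read so far, so the same trained model can be queried after 1, 2, 3, ... terms. The class name `PrefixNaiveBayes`, the smoothing parameter `alpha`, and the toy training data and labels are all hypothetical.

```python
import math
from collections import Counter


class PrefixNaiveBayes:
    """Hypothetical sketch: multinomial naive Bayes with additive smoothing
    that can classify a document from only its first k terms."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing constant (assumed value)

    def fit(self, docs, labels):
        # docs: list of token lists; labels: one class label per document
        self.classes = sorted(set(labels))
        self.vocab = {t for doc in docs for t in doc}
        self.class_counts = Counter(labels)
        self.term_counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.term_counts[y].update(doc)
        self.total_terms = {c: sum(self.term_counts[c].values()) for c in self.classes}
        return self

    def log_posteriors(self, tokens, k=None):
        # Use only the first k tokens (the partial document); k=None uses all of it.
        prefix = tokens if k is None else tokens[:k]
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.class_counts[c] / n_docs)  # log prior
            denom = self.total_terms[c] + self.alpha * vocab_size
            for t in prefix:
                if t not in self.vocab:
                    continue  # ignore terms never seen during training
                score += math.log((self.term_counts[c][t] + self.alpha) / denom)
            scores[c] = score  # unnormalized log posterior
        return scores

    def predict(self, tokens, k=None):
        scores = self.log_posteriors(tokens, k)
        return max(scores, key=scores.get)


# Hypothetical toy data, for illustration only.
train = [
    "meet me alone tonight keep it secret".split(),
    "dont tell your parents about our chat".split(),
    "the homework is due on monday".split(),
    "lets play the new game after school".split(),
]
labels = ["suspicious", "suspicious", "benign", "benign"]
clf = PrefixNaiveBayes().fit(train, labels)

# Query the same model after reading 1, 2, ..., n terms of an incoming stream.
stream = "keep it secret and meet me alone".split()
predictions = [clf.predict(stream, k) for k in range(1, len(stream) + 1)]
print(predictions)
```

Because the log posterior is a sum over terms, extending the prefix by one term only adds one term per class, so an incremental version could keep running scores instead of recomputing them; the sketch recomputes for clarity.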
