
Analysis and Optimization of fastText Linear Text Classifier
The paper [1] shows that a simple linear classifier can compete with complex deep learning algorithms in text classification applications. Combining bag of words (BoW) and linear classification techniques, fastText [1] attains the same or only slightly lower accuracy than deep learning algorithms [2-9] that are orders of magnitude slower. We prove formally that fastText can be transformed into a simpler equivalent classifier which, unlike fastText, has no hidden layer. We also prove that the necessary and sufficient dimensionality of the word vector embedding space is exactly the number of document classes. These results help in constructing better linear text classifiers with guaranteed maximum classification capabilities. The results are proven exactly by purely formal algebraic methods, without relying on any empirical data.
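The hidden-layer claim can be illustrated numerically. The following is a minimal sketch, not the paper's proof and not the fastText implementation. It assumes a simplified fastText-style model in which a document is an averaged bag of words x, the hidden layer is a linear map A (the word embeddings), and the output layer is a linear map B followed by softmax. N-gram features, hashing, and hierarchical softmax are ignored. All names and sizes (A, B, W, vocab_size, hidden_dim, num_classes) are hypothetical. Because softmax is monotonic, the predicted class is the argmax of the scores, so comparing the scores is enough.

```python
import random

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(P, Q):
    """Multiply matrix P by matrix Q (both lists of rows)."""
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]

def bow_average(doc, vocab_size):
    """Averaged bag-of-words vector for a document given as word ids."""
    x = [0.0] * vocab_size
    for w in doc:
        x[w] += 1.0 / len(doc)
    return x

def argmax(scores):
    return max(range(len(scores)), key=scores.__getitem__)

rng = random.Random(0)
vocab_size, hidden_dim, num_classes = 50, 10, 3  # hypothetical sizes

# Random stand-ins for trained weights (assumption: no real model is loaded).
A = [[rng.gauss(0, 1) for _ in range(vocab_size)] for _ in range(hidden_dim)]
B = [[rng.gauss(0, 1) for _ in range(hidden_dim)] for _ in range(num_classes)]

# Collapse the two linear layers into one matrix: W = B * A.
W = matmul(B, A)  # shape: num_classes x vocab_size

def scores_two_layer(doc):
    hidden = matvec(A, bow_average(doc, vocab_size))  # averaged embeddings
    return matvec(B, hidden)

def scores_collapsed(doc):
    return matvec(W, bow_average(doc, vocab_size))    # no hidden layer

docs = [[rng.randrange(vocab_size) for _ in range(rng.randint(3, 12))]
        for _ in range(20)]

# Largest score difference between the two models over all documents.
max_gap = max(abs(s - t)
              for d in docs
              for s, t in zip(scores_two_layer(d), scores_collapsed(d)))
same_labels = all(argmax(scores_two_layer(d)) == argmax(scores_collapsed(d))
                  for d in docs)

print("max score gap:", max_gap)
print("same predicted labels:", same_labels)
```

In this sketch each column of W can be read as a word vector whose dimension equals num_classes, with the identity matrix as the output layer. This is consistent with the abstract's statement that an embedding dimension equal to the number of classes is sufficient. The sketch does not demonstrate that this dimension is also necessary.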