A Comparative Study on using Principle Component Analysis with Different Text Classifiers

07/04/2018
by Ahmed I. Taloba, et al.

Text categorization (TC) is the task of automatically organizing a set of documents into a set of predefined categories. Over the last few years, increased attention has been paid to documents in digital form, which has made text categorization a challenging problem. The most significant difficulty in text categorization is the huge number of features. Most of these features are redundant, noisy, or irrelevant, which causes overfitting in most classifiers. Hence, feature extraction is an important step for improving the overall accuracy and performance of text classifiers. In this paper, we provide an overview of using principal component analysis (PCA) as a feature extraction method with various classifiers. We observed that classifier performance improved after PCA was used to reduce the dimensionality of the data. Experiments are conducted on three UCI data sets: Classic03, CNAE-9, and DBWorld e-mails. We compare the classification performance obtained when PCA is combined with popular, well-known text classifiers. The results show that using PCA improves classification performance for most of the classifiers.
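To make the idea in the abstract concrete, the following is a minimal, dependency-free sketch of PCA used as a feature extraction step ahead of a classifier. It is not the authors' implementation. The six-term count vectors, the labels "sports" and "finance", and all function names are hypothetical. A nearest-centroid classifier stands in for the text classifiers, which the abstract does not name. Power iteration with deflation is used here only to avoid external libraries; a library eigendecomposition or SVD would normally be preferred.

```python
import math
import random


def fit_pca(rows, k, iters=200, seed=0):
    """Return (mean, components): the top-k principal directions of rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            # w = X^T (X v): the (unnormalized) covariance matrix applied to v.
            xv = [sum(a * b for a, b in zip(row, v)) for row in centered]
            w = [sum(centered[i][j] * xv[i] for i in range(n)) for j in range(d)]
            # Deflation: remove the directions that were already extracted.
            for c in components:
                dot = sum(a * b for a, b in zip(w, c))
                w = [a - dot * b for a, b in zip(w, c)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm < 1e-12:
                raise ValueError("data has fewer than k directions of variance")
            v = [a / norm for a in w]
        components.append(v)
    return mean, components


def transform(rows, mean, components):
    """Project rows onto the principal directions (the reduced features)."""
    return [
        [sum((r[j] - mean[j]) * c[j] for j in range(len(mean))) for c in components]
        for r in rows
    ]


def fit_centroids(points, labels):
    """Nearest-centroid 'training': average the reduced vectors per class."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for j, value in enumerate(p):
            acc[j] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids, point):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(point, centroids[y])),
    )


# Hypothetical toy data: term counts over a six-term vocabulary.
train_docs = [
    [3, 2, 1, 0, 0, 0],
    [2, 3, 0, 0, 1, 0],
    [4, 1, 2, 0, 0, 0],
    [0, 0, 0, 3, 2, 1],
    [0, 1, 0, 2, 3, 2],
    [0, 0, 1, 4, 1, 3],
]
train_labels = ["sports", "sports", "sports", "finance", "finance", "finance"]
test_docs = [[3, 3, 1, 0, 0, 0], [0, 0, 0, 2, 2, 2]]

# Fit PCA on the training documents only, then reduce 6 features to 2.
mean, components = fit_pca(train_docs, k=2)
reduced_train = transform(train_docs, mean, components)
reduced_test = transform(test_docs, mean, components)

centroids = fit_centroids(reduced_train, train_labels)
predictions = [predict(centroids, p) for p in reduced_test]
print(predictions)
```

The mean and components are estimated from the training documents alone and then reused for the test documents, so no information from the test set leaks into the reduced representation.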

