Web Document Categorization Using Naive Bayes Classifier and Latent Semantic Analysis

06/02/2020
by Alireza Saleh Sedghpour, et al.

The rapid growth of web documents, driven by heavy use of the World Wide Web, calls for efficient techniques to classify documents on the web, where high volumes of highly diverse data are produced every second. Automatically classifying these growing amounts of web documents is one of the biggest challenges facing us today. Probabilistic classification algorithms such as Naive Bayes have become commonly used for web document classification, mainly because of their relatively high classification accuracy in many application areas. However, they lack support for handling high-dimensional, sparse data, which is characteristic of textual data representations. In addition, traditional feature selection methods commonly fail to account for semantic relations between words when dealing with big data and large-scale web document collections. To address these problems, we propose a web document classification method that uses Latent Semantic Analysis (LSA) to increase the similarity of documents within the same class and improve classification precision. Using this approach, we designed a faster and more accurate classifier for web documents. Experimental results show that this preprocessing noticeably improves the accuracy and speed of Naive Bayes, as indicated by improvements in the precision and recall metrics.
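As a rough illustration of the pipeline the abstract describes (term features, an LSA projection, then Naive Bayes), here is a minimal Python/NumPy sketch. The abstract does not specify the tokenizer, the number of latent dimensions, or the Naive Bayes variant, so every one of those choices is an assumption: whitespace tokenization, smoothed TF-IDF with L2-normalized rows, a truncated SVD of the document-term matrix, and Gaussian Naive Bayes on the latent coordinates. The class `LsaGaussianNB`, the helpers `build_vocab` and `tfidf`, the parameters `k` and `eps`, and the toy corpus are all hypothetical and are not the authors' code or data.

```python
import numpy as np

# Hypothetical sketch: TF-IDF -> LSA (truncated SVD) -> Gaussian Naive Bayes.

def build_vocab(docs):
    """Map each distinct lower-cased whitespace token to a column index."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    return {w: i for i, w in enumerate(vocab)}

def tfidf(docs, index, idf=None):
    """Row-normalised TF-IDF matrix; out-of-vocabulary tokens are ignored.
    If idf is None it is estimated from `docs` (training mode)."""
    tf = np.zeros((len(docs), len(index)))
    for r, d in enumerate(docs):
        for w in d.lower().split():
            if w in index:
                tf[r, index[w]] += 1.0
    if idf is None:
        df = (tf > 0).sum(axis=0)
        idf = np.log((1.0 + len(docs)) / (1.0 + df)) + 1.0  # smoothed IDF
    x = tf * idf
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave empty documents as zero rows
    return x / norms, idf

class LsaGaussianNB:
    """Hypothetical classifier: project TF-IDF rows onto k LSA dimensions,
    then fit one diagonal Gaussian per class in that latent space."""

    def __init__(self, k=2, eps=1e-3):
        self.k = k
        self.eps = eps  # variance floor so zero-variance dimensions stay finite

    def _project(self, x):
        # Fold documents into the k-dimensional latent space: z = x V_k
        return x @ self.components.T

    def fit(self, docs, labels):
        self.index = build_vocab(docs)
        x, self.idf = tfidf(docs, self.index)
        # x = U S V^T; keep the k rows of V^T with the largest singular values
        _, _, vt = np.linalg.svd(x, full_matrices=False)
        self.components = vt[: self.k]
        z = self._project(x)
        y = np.asarray(labels)
        self.classes = sorted(set(labels))
        means, variances, priors = [], [], []
        for c in self.classes:
            zc = z[y == c]
            means.append(zc.mean(axis=0))
            variances.append(zc.var(axis=0) + self.eps)
            priors.append(np.log(len(zc) / len(z)))
        self.mean, self.var, self.log_prior = np.array(means), np.array(variances), priors
        return self

    def predict(self, docs):
        x, _ = tfidf(docs, self.index, self.idf)
        z = self._project(x)
        scores = []
        for m, v, lp in zip(self.mean, self.var, self.log_prior):
            # log N(z | m, diag(v)), summed over latent dimensions (naive independence)
            ll = -0.5 * (np.log(2 * np.pi * v) + (z - m) ** 2 / v).sum(axis=1)
            scores.append(lp + ll)
        return [self.classes[i] for i in np.argmax(np.array(scores), axis=0)]

# Toy usage on an invented two-class corpus
train = ["ball goal team", "goal team match", "ball match team",
         "stock price bank", "price bank trade", "stock trade bank"]
labels = ["sport"] * 3 + ["finance"] * 3
clf = LsaGaussianNB(k=2).fit(train, labels)
print(clf.predict(["goal team", "price bank"]))  # ['sport', 'finance']
```

A Gaussian likelihood is used here because LSA coordinates can be negative, which rules out the count-based multinomial variant usually paired with raw term frequencies. This is one reasonable choice, not necessarily the one the authors made.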

research
02/14/2018

Classification of Scientific Papers With Big Data Technologies

Data sizes that cannot be processed by conventional data storage and ana...
research
05/08/2019

Naive Bayes with Correlation Factor for Text Classification Problem

Naive Bayes estimator is widely used in text classification problems. Ho...
research
11/29/2011

An Enhanced Indexing And Ranking Technique On The Semantic Web

With the fast growth of the Internet, more and more information is avail...
research
01/16/2017

Semantic classifier approach to document classification

In this paper we propose a new document classification method, bridging ...
research
02/25/2023

HADES: Homologous Automated Document Exploration and Summarization

This paper introduces HADES, a novel tool for automatic comparative docu...
research
08/19/1999

Representing Scholarly Claims in Internet Digital Libraries: A Knowledge Modelling Approach

This paper is concerned with tracking and interpreting scholarly documen...
research
11/01/2021

AutoShard – Declaratively Managing Hot Spot Data Objects in NoSQL Document Stores

NoSQL document stores are becoming increasingly popular as backends in w...