Effects of Stop Words Elimination for Arabic Information Retrieval: A Comparative Study

02/07/2017
by Ibrahim Abu El-Khair, et al.

This study investigated the effectiveness of three stop word lists for Arabic information retrieval: a general stoplist, a corpus-based stoplist, and a combined stoplist. Three popular weighting schemes were examined: the inverse document frequency weight, probabilistic weighting, and statistical language modelling. The idea is to combine statistical approaches with linguistic approaches to reach optimal performance, and to compare their effect on retrieval. The LDC (Linguistic Data Consortium) Arabic Newswire data set was used with the Lemur Toolkit. The Best Match (BM25) weighting scheme used in the Okapi retrieval system had the best overall performance of the three weighting algorithms used in the study. Stoplists improved retrieval effectiveness, especially when used with the BM25 weight. The overall performance of the general stoplist was better than that of the other two lists.
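The interplay between the three kinds of stoplist and BM25 weighting can be illustrated with a minimal sketch. It is not the study's method or the Lemur Toolkit's implementation. The abstract does not say how the corpus-based list was built, so the sketch assumes it is the top-k terms by document frequency. The general stoplist is a tiny hypothetical sample, the function names are invented for illustration, and the BM25 formula is a common textbook variant with assumed defaults `k1=1.2` and `b=0.75`.

```python
import math
from collections import Counter


def corpus_stoplist(docs, top_k):
    """Top_k terms by document frequency (an assumed selection criterion)."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    # Sort by descending document frequency, then by term for determinism.
    ranked = sorted(df, key=lambda t: (-df[t], t))
    return set(ranked[:top_k])


def remove_stop_words(tokens, stoplist):
    """Drop every token that appears in the stoplist."""
    return [t for t in tokens if t not in stoplist]


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with a BM25 variant."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n if n else 0.0
    if avgdl == 0:
        return [0.0] * n
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # The +1 inside the log keeps the IDF non-negative (assumed variant).
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy tokenized corpus (glosses: in / the city / library / the school / market).
docs = [
    ["في", "المدينة", "مكتبة"],
    ["في", "المدرسة", "مكتبة"],
    ["في", "المدينة", "سوق"],
]

general = {"من", "على"}                   # hypothetical hand-built sample
corpus_based = corpus_stoplist(docs, 1)   # derived from the collection itself
combined = general | corpus_based         # union of the two lists

filtered_docs = [remove_stop_words(d, combined) for d in docs]
query = remove_stop_words(["مكتبة", "في", "المدينة"], combined)

for i, s in enumerate(bm25_scores(query, filtered_docs)):
    print(i, round(s, 4))
```

Applying the same stoplist to both the documents and the query keeps the two term spaces consistent. Swapping `combined` for `general` or `corpus_based` gives the other two configurations.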

