CURE: Collection for Urdu Information Retrieval Evaluation and Ranking

11/01/2020
by Muntaha Iqbal, et al.

Urdu is a widely spoken language with 163 million speakers worldwide. Information Retrieval (IR) for Urdu deserves special attention from the research community because of its rich morphology and large number of speakers. In general, the IR evaluation task has not been extensively explored for Urdu; the most important missing element is a standardized evaluation corpus specific to the language. In this research work, we propose and construct a standard test collection of Urdu documents for IR evaluation, named Collection for Urdu Retrieval Evaluation (CURE). Using two IR models, we select 1,096 unique documents against 50 diverse queries from a large collection of 0.5 million crawled documents. The purpose of the test collection is to evaluate IR models, ranking algorithms, and different natural language processing (NLP) techniques. Next, we make binary relevance judgments on the selected documents. We also build two other language resources, for lemmatization and query expansion, specific to our test collection. We evaluate the test collection using four retrieval models, as well as with a stop-word list, lemmatization, and query expansion. Furthermore, we perform error analysis for each query with the different NLP techniques. To the best of our knowledge, this work is the first attempt to prepare a standardized IR evaluation test collection for the Urdu language.
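Selecting candidate documents per query from the output of several retrieval models, then judging only those candidates, is commonly done with depth-k pooling. The abstract does not state the pool depth, which two models were used, or which effectiveness metrics were reported, so the sketch below is only an illustration under assumptions: it uses depth-k pooling with a hypothetical `depth` parameter, made-up query and document IDs, and scores runs with mean average precision (MAP) under binary relevance. It is not the authors' actual procedure.

```python
from typing import Dict, List, Set

# A "run" maps query ID -> ranked list of document IDs (best first).
Run = Dict[str, List[str]]


def build_pool(runs: Dict[str, Run], depth: int) -> Dict[str, Set[str]]:
    """Depth-k pooling (assumed technique): per query, take the union of the
    top-`depth` documents returned by every model."""
    pool: Dict[str, Set[str]] = {}
    for run in runs.values():
        for qid, ranked_docs in run.items():
            pool.setdefault(qid, set()).update(ranked_docs[:depth])
    return pool


def pool_size(pool: Dict[str, Set[str]]) -> int:
    """Number of documents to judge: unique documents per query, summed over queries."""
    return sum(len(docs) for docs in pool.values())


def average_precision(ranked_docs: List[str], relevant: Set[str]) -> float:
    """Non-interpolated AP with binary relevance: mean of precision@k taken at
    each relevant document, divided by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, doc_id in enumerate(ranked_docs, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(run: Run, qrels: Dict[str, Set[str]]) -> float:
    """MAP over all judged queries; a query missing from the run scores 0."""
    scores = [average_precision(run.get(qid, []), rel) for qid, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical toy runs from two unnamed models.
    runs = {
        "model_a": {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5"]},
        "model_b": {"q1": ["d2", "d6", "d1"], "q2": ["d5", "d7"]},
    }
    pool = build_pool(runs, depth=2)
    print("documents to judge:", pool_size(pool))
    # Hypothetical binary judgments: only pooled documents can be marked relevant.
    qrels = {"q1": {"d1", "d6"}, "q2": {"d5"}}
    for name, run in runs.items():
        print(name, "MAP =", round(mean_average_precision(run, qrels), 4))
```

Under pooling, any document that never enters the pool is treated as non-relevant, which is why collections usually pool from several diverse models: a wider pool reduces the bias against a system that retrieves relevant documents the pooled models missed.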

research
01/30/2013

Query Expansion in Information Retrieval Systems using a Bayesian Network-Based Thesaurus

Information Retrieval (IR) is concerned with the identification of docum...
research
11/01/2020

Cheap IR Evaluation: Fewer Topics, No Relevance Judgements, and Crowdsourced Assessments

To evaluate Information Retrieval (IR) effectiveness, a possible approac...
research
05/12/2023

NevIR: Negation in Neural Information Retrieval

Negation is a common everyday phenomenon and has been a consistent area o...
research
01/11/2018

Applying Vector Space Model (VSM) Techniques in Information Retrieval for Arabic Language

Information Retrieval (IR) is a part of Natural Language Processing (NLP...
research
03/14/2021

TripClick: The Log Files of a Large Health Web Search Engine

Click logs are valuable resources for a variety of information retrieval...
research
03/23/2019

Action-Centered Information Retrieval

Information Retrieval (IR) aims at retrieving documents that are most re...
research
07/01/2017

An Approach for Weakly-Supervised Deep Information Retrieval

Recent developments in neural information retrieval models have been pro...