Experiments with Different Indexing Techniques for Text Retrieval tasks on Gujarati Language using Bag of Words Approach

02/05/2020
by Dr. Jyoti Pareek, et al.

This paper presents the results of experiments carried out to improve text retrieval for Gujarati text documents. Text retrieval involves searching and ranking text documents for a given set of query terms. We tested several retrieval models that use the bag-of-words approach, a traditional approach still in use today, in which a text document is represented as a collection of words. Measures such as term frequency and inverse document frequency are used to weight terms and rank relevant documents for user queries. Different ranking models were evaluated, with ranking performance quantified by mean average precision (MAP). Because Gujarati is a morphologically rich language, we compared techniques such as stop word removal, stemming, and frequent case generation against a baseline to measure the improvement in information retrieval tasks. Most of these techniques are language dependent and require the development of language-specific tools. Using a plain, unprocessed word index as the baseline, we observed significant improvements in MAP after applying the different indexing techniques.
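As an illustration of the measures named in the abstract, the following is a minimal sketch of bag-of-words ranking with term frequency and inverse document frequency, followed by mean average precision. It is not the authors' system: the function names, the whitespace tokenizer, the plain `tf * log(N / df)` weighting, and the toy documents in the usage note are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization with no stop word removal or
    # stemming, i.e. a plain unprocessed word index.
    return text.split()


def build_index(docs):
    # docs: dict mapping doc_id -> text.
    # tf holds per-document term counts; df counts documents containing a term.
    tf = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())
    return tf, df


def rank(query, tf, df):
    # Score each document by summing tf * idf over the query terms.
    n_docs = len(tf)
    scores = {}
    for doc_id, counts in tf.items():
        score = 0.0
        for term in tokenize(query):
            if term in counts:
                score += counts[term] * math.log(n_docs / df[term])
        scores[doc_id] = score
    # Highest score first; ties broken by doc_id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


def average_precision(ranked, relevant):
    # Mean of precision@k taken at each rank k where a relevant doc appears.
    hits = 0
    total = 0.0
    for k, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    # runs: list of (ranked_doc_ids, set_of_relevant_doc_ids), one per query.
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For example, with the hypothetical collection `{"d1": "apple banana apple", "d2": "banana cherry", "d3": "cherry date"}`, calling `build_index` and then `rank("apple", tf, df)` places `d1` first, since it is the only document containing the query term. Indexing variants such as stop word removal or stemming would change only `tokenize`, which is why they can be compared against the same baseline using the same MAP computation.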


research
12/16/2020

Information retrieval system for silte language using BM25 weighting

The main aim of an information retrieval system is to extract appropriat...
research
04/20/2021

An Analysis of Indexing and Querying Strategies on a Technologically Assisted Review Task

This paper presents a preliminary experimentation study using the CLEF 2...
research
02/28/2021

An Efficient Indexing and Searching Technique for Information Retrieval for Urdu Language

Indexing techniques are used to improve retrieval of data in response to...
research
07/01/2020

Fused Text Recogniser and Deep Embeddings Improve Word Recognition and Retrieval

Recognition and retrieval of textual content from the large document col...
research
10/01/2019

BioNLP-OST 2019 RDoC Tasks: Multi-grain Neural Relevance Ranking Using Topics and Attention Based Query-Document-Sentence Interactions

This paper presents our system details and results of participation in t...
research
05/10/2020

Transformer-Based Language Models for Similar Text Retrieval and Ranking

Most approaches for similar text retrieval and ranking with long natural...
research
03/04/2021

The effects of having lists of synonyms on the performance of Afaan Oromo Text Retrieval system

Obtaining relevant information from a collection of informational resour...
