An efficient algorithm for three-component key index construction

In this paper, proximity full-text searches in large text arrays are considered. A search query consists of several words. The search result is a list of documents containing these words. In a modern search system, documents that contain search query words that are near each other are more relevant than documents that do not share this trait. To solve this task, for each word in each indexed document, we need to store a record in the index. In this case, the query search time is proportional to the number of occurrences of the queried words in the indexed documents. Consequently, it is common for search systems to evaluate queries that contain frequently occurring words much more slowly than queries that contain less frequently occurring, ordinary words. For each word in the text, we use additional indexes to store information about nearby words at distances from the given word of less than or equal to MaxDistance, which is a parameter. This parameter can take a value of 5, 7, or even more. Three-component key indexes can be created for faster query execution. Previously, we presented the results of experiments showing that when queries contain very frequently occurring words, the average time of the query execution with three-component key indexes is 94.7 times less than that required when using ordinary inverted indexes. In the current work, we describe a new three-component key index building algorithm and demonstrate the correctness of the algorithm. We present the results of experiments creating such an index that is dependent on the value of MaxDistance.

READ FULL TEXT

page 6

page 8

page 9

research
09/06/2020

Proximity full-text searches of frequently occurring words with a response time guarantee

Full-text search engines are important tools for information retrieval. ...
research
12/18/2018

Proximity Full-Text Search with a Response Time Guarantee by Means of Additional Indexes with Multi-Component Keys

Full-text search engines are important tools for information retrieval. ...
research
08/01/2021

Relevance ranking for proximity full-text search based on additional indexes with multi-component keys

The problem of proximity full-text search is considered. If a search que...
research
09/06/2020

An Improved Algorithm for Fast K-Word Proximity Search Based on Multi-Component Key Indexes

A search query consists of several words. In a proximity full-text searc...
research
01/09/2021

Selection of Optimal Parameters in the Fast K-Word Proximity Search Based on Multi-component Key Indexes

Proximity full-text search is commonly implemented in contemporary full-...
research
05/23/2019

COBS: a Compact Bit-Sliced Signature Index

We present COBS, a compact bit-sliced signature index, which is a cross-...
research
01/26/2021

pdfPapers: shell-script utilities for frequency-based multi-word phrase extraction from PDF documents

Biomedical research is intensive in processing information in the previo...

Please sign up or login with your details

Forgot password? Click here to reset