Efficient Online String Matching Based on Characters Distance Text Sampling

08/16/2019
by Simone Faro, et al.

Searching for all occurrences of a pattern in a text is a fundamental problem in computer science, with applications in many other fields such as natural language processing, information retrieval and computational biology. Sampled string matching is an efficient approach, recently introduced to overcome the prohibitive space requirements of index construction on the one hand, and to drastically reduce the searching time of online solutions on the other. In this paper we present a new algorithm for the sampled string matching problem, based on a characters distance sampling approach. The main idea is to sample the distances between consecutive occurrences of a given pivot character, then to search the sampled data online for any occurrence of the sampled pattern, and finally to verify each candidate against the original text. From a theoretical point of view we prove that, under suitable conditions, our solution can achieve both linear worst-case time complexity and optimal average-time complexity. From a practical point of view, our solution shows sub-linear behaviour and speeds up online searching by a factor of up to 9, using limited additional space whose amount goes from 11 of the text size, with a gain up to 50
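The sampling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`pivot_positions`, `distances`, `sampled_search`) are hypothetical, the sampled sequence is scanned naively rather than with an efficient online matcher, and the pivot positions of the text are recomputed on every call, whereas in a real setting they would be sampled once and stored.

```python
from typing import List


def pivot_positions(s: str, pivot: str) -> List[int]:
    """Positions at which the pivot character occurs in s."""
    return [i for i, ch in enumerate(s) if ch == pivot]


def distances(positions: List[int]) -> List[int]:
    """Distances between consecutive pivot occurrences (the sampled data)."""
    return [b - a for a, b in zip(positions, positions[1:])]


def naive_find_all(text: str, pattern: str) -> List[int]:
    """Fallback: check every alignment of the pattern against the text."""
    m = len(pattern)
    return [s for s in range(len(text) - m + 1) if text[s:s + m] == pattern]


def sampled_search(text: str, pattern: str, pivot: str) -> List[int]:
    """Find all occurrences of pattern in text via distance sampling.

    Assumption of this sketch: if the pattern does not contain the pivot,
    sampling gives no information, so we fall back to a naive scan.
    """
    m = len(pattern)
    q = pivot_positions(pattern, pivot)
    if not q:
        return naive_find_all(text, pattern)

    p = pivot_positions(text, pivot)  # would be precomputed in practice
    dt = distances(p)                 # sampled text
    dp = distances(q)                 # sampled pattern
    k = len(dp)

    hits = []
    # Slide the sampled pattern over the sampled text.
    for i in range(len(p) - k):
        if dt[i:i + k] == dp:
            # Align the first pivot of the pattern with the i-th pivot
            # of the text, then verify against the original text.
            start = p[i] - q[0]
            if start >= 0 and text[start:start + m] == pattern:
                hits.append(start)
    return hits


print(sampled_search("abracadabra", "abra", "a"))  # [0, 7]
```

Here the text `abracadabra` with pivot `a` is sampled to the distances `[3, 2, 2, 3]` and the pattern `abra` to `[3]`, so only two alignments reach the verification step. When the pattern contains the pivot exactly once, its sampled sequence is empty and every pivot occurrence in the text becomes a candidate.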

