The Statistical Dictionary-based String Matching Problem

11/22/2018
by M. Suri, et al.

In the Dictionary-based String Matching (DSM) problem, a retrieval system has access to a source sequence and stores the positions of a certain number of strings in a posting table. When a user inquires about the position of a string, the retrieval system, instead of searching the source sequence directly, relies on the posting table to answer the query more efficiently. In this paper, the Statistical DSM problem is proposed as a statistical and information-theoretic formulation of the classic DSM problem in which both the source and the query have a statistical description, while the strings stored in the posting sequence are described as a code. Through this formulation, we are able to define the efficiency of the retrieval system as the average cost of answering a user's query in the limit of a sufficiently long source sequence. This formulation is used to study the retrieval performance for the cases in which the posting table stores (i) all the strings of a given length, referred to as k-grams, and (ii) prefix-free codes.
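To make the k-gram case concrete, the following is a minimal sketch, not the construction analyzed in the paper. It assumes the posting table maps every k-gram of the source to the list of its starting positions, and that a query of length at least k is answered by looking up its first k symbols and verifying each candidate position against the source. The function names `build_posting_table` and `find_positions`, and the fallback to a direct scan for queries shorter than k, are hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_posting_table(source: str, k: int) -> dict[str, list[int]]:
    """Map every k-gram of `source` to the sorted list of its start positions."""
    table: dict[str, list[int]] = defaultdict(list)
    for i in range(len(source) - k + 1):
        table[source[i:i + k]].append(i)
    return dict(table)


def find_positions(source: str, table: dict[str, list[int]], k: int, query: str) -> list[int]:
    """Return all start positions of `query` in `source`.

    Assumption for this sketch: queries shorter than k are not covered by
    the table, so they fall back to a direct scan of the source.
    """
    if len(query) < k:
        return [i for i in range(len(source) - len(query) + 1)
                if source.startswith(query, i)]
    # Look up the first k symbols, then verify the full query at each candidate.
    candidates = table.get(query[:k], [])
    return [i for i in candidates if source.startswith(query, i)]


source = "abracadabra"
k = 2
table = build_posting_table(source, k)
print(find_positions(source, table, k, "abra"))
```

In this sketch the number of candidate positions that must be verified per query plays the role of a retrieval cost; the cost measure actually used in the paper is not specified in the abstract.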


