Locally consistent decomposition of strings with applications to edit distance sketching

02/09/2023
by Sudatta Bhattacharya, et al.

In this paper we provide a new locally consistent decomposition of strings. Each string x is decomposed into blocks that can be described by grammars of size O(k) (using some amount of randomness). If two strings x and y have edit distance at most k, then their block decompositions use the same number of grammars, and the i-th grammar of x is the same as the i-th grammar of y for all but at most k indices i. The edit distance of x and y equals the sum of the edit distances of the pairs of blocks where x and y differ. Our decomposition can be used to design a sketch of size O(k^2) for edit distance, and also a rolling sketch of size O(k^2) for edit distance. The rolling sketch allows updating the sketched string by appending a symbol or by removing a symbol from the beginning of the string.
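The block-sum property stated in the abstract can be illustrated with a small Python sketch. It does not implement the paper's decomposition: the block lists below are hand-picked for illustration, and the names `edit_distance` and `blockwise_distance` are hypothetical. The sketch only shows how, once two aligned decompositions are given, the distance would be obtained by summing over the block pairs that differ.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Textbook dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def blockwise_distance(x_blocks: List[str], y_blocks: List[str]) -> int:
    """Sum edit distances over aligned block pairs that differ.

    Assumption: the two lists are aligned decompositions with the same
    number of blocks, as the abstract says holds for strings within
    edit distance k. Arbitrary splits do not have this property.
    """
    if len(x_blocks) != len(y_blocks):
        raise ValueError("decompositions must have the same number of blocks")
    return sum(edit_distance(bx, by)
               for bx, by in zip(x_blocks, y_blocks)
               if bx != by)


# Hand-picked example blocks (not produced by the paper's algorithm).
x_blocks = ["abc", "def", "ghi"]
y_blocks = ["abc", "dxf", "ghi"]

total = blockwise_distance(x_blocks, y_blocks)
whole = edit_distance("".join(x_blocks), "".join(y_blocks))
print(total, whole)
```

In this toy case only the middle block differs, so a single block pair has to be compared instead of the two full strings. This is the reason a sketch only needs to recover the few differing blocks.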


research
12/06/2013

Towards Normalizing the Edit Distance Using a Genetic Algorithms Based Scheme

The normalized edit distance is one of the distances derived from the ed...
research
09/01/2021

Algorithme de recherche approximative dans un dictionnaire fondé sur une distance d'édition définie par blocs [Approximate dictionary search algorithm based on a block-defined edit distance]

We propose an algorithm for approximative dictionary lookup, where alter...
research
03/11/2021

Imagined-Trailing-Whitespace-Agnostic Levenshtein Distance For Plaintext Table Detection

The standard algorithm for Levenshtein distance, treats trailing whitesp...
research
10/01/2019

Scalable String Reconciliation by Recursive Content-Dependent Shingling

We consider the problem of reconciling similar, but remote, strings with...
research
07/04/2012

A Conditional Random Field for Discriminatively-trained Finite-state String Edit Distance

The need to measure sequence similarity arises in information extraction...
research
05/01/2023

Streaming k-edit approximate pattern matching via string decomposition

In this paper we give an algorithm for streaming k-edit approximate patt...
research
11/10/2018

Efficiently Approximating Edit Distance Between Pseudorandom Strings

We present an algorithm for approximating the edit distance ed(x, y) bet...
