Efficiently Approximating Edit Distance Between Pseudorandom Strings

11/10/2018
by William Kuszmaul, et al.

We present an algorithm for approximating the edit distance ed(x, y) between two strings x and y in time parameterized by the degree to which one of the strings, x, satisfies a natural pseudorandomness property. The pseudorandomness model is asymmetric in that no requirements are placed on the second string y, which may be constructed by an adversary with full knowledge of x. We say that x is (p, B)-pseudorandom if every pair a, b of disjoint B-letter substrings of x satisfies ed(a, b) > pB. Given parameters p and B, our algorithm computes the edit distance between a (p, B)-pseudorandom string x and an arbitrary string y to within a factor of O(1/p) in time Õ(nB), with high probability.

Our algorithm is robust in the sense that it can handle a small portion of x being adversarial (i.e., not satisfying the pseudorandomness property). In this case, the algorithm incurs an additive approximation error proportional to the fraction of x that behaves maliciously.

The asymmetry of our pseudorandomness model is particularly appealing when x is a source string, meaning that ed(x, y) will be computed for many strings y. Suppose that one wishes to achieve an O(α)-approximation for each ed(x, y) computation, and that B is the smallest block size for which x is (1/α, B)-pseudorandom. We show that, without knowing B beforehand, x can be preprocessed in time Õ(n^{1.5} √B) so that every subsequent computation of the form ed(x, y) can be O(α)-approximated in time Õ(nB). Furthermore, for the special case where only a single ed(x, y) computation will be performed, we show how to achieve an O(α)-approximation in time Õ(n^{4/3} B^{2/3}).
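To make the (p, B)-pseudorandomness definition concrete, here is a minimal brute-force sketch in Python. It assumes that ed is the standard Levenshtein distance (unit-cost insertions, deletions, and substitutions), and the function names `edit_distance`, `is_pseudorandom`, and `smallest_block_size` are hypothetical illustrations. This is not the paper's algorithm: it simply checks the definition directly, which is far slower than the Õ(n^{1.5} √B) preprocessing described above.

```python
from typing import Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumed unit costs for insert/delete/substitute),
    computed with the textbook O(|a|*|b|) dynamic program using two rows."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            )
        prev = curr
    return prev[-1]


def is_pseudorandom(x: str, p: float, B: int) -> bool:
    """Hypothetical brute-force check: x is (p, B)-pseudorandom if every pair
    of disjoint length-B substrings a, b of x satisfies ed(a, b) > p * B."""
    n = len(x)
    for i in range(n - B + 1):
        a = x[i:i + B]
        # Start j at i + B so that the two windows never overlap.
        for j in range(i + B, n - B + 1):
            if edit_distance(a, x[j:j + B]) <= p * B:
                return False
    return True  # also true vacuously when 2B > n (no disjoint pairs exist)


def smallest_block_size(x: str, p: float) -> Optional[int]:
    """Hypothetical helper: smallest B >= 1 for which x is (p, B)-pseudorandom,
    found by a linear scan over B. Returns None only for the empty string."""
    for B in range(1, len(x) + 1):
        if is_pseudorandom(x, p, B):
            return B
    return None
```

For example, `is_pseudorandom("abcd", 0.5, 2)` holds because the only disjoint pair of 2-letter blocks, "ab" and "cd", has edit distance 2 > 0.5 · 2, whereas "aaaa" fails the same test because its disjoint blocks "aa" and "aa" are identical. Scanning B upward mirrors the abstract's notion of "the smallest block size", but it costs roughly O(n^2 B^2) per value of B and is meant only for checking small examples by hand.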

research
04/29/2022

Improved Sublinear-Time Edit Distance for Preprocessed Strings

We study the problem of approximating the edit distance of two strings i...
research
10/08/2018

Approximating Edit Distance Within Constant Factor in Truly Sub-Quadratic Time

Edit distance is a measure of similarity of two strings based on the min...
research
08/20/2021

Does Preprocessing help in Fast Sequence Comparisons?

We study edit distance computation with preprocessing: the preprocessing...
research
02/09/2023

Locally consistent decomposition of strings with applications to edit distance sketching

In this paper we provide a new locally consistent decomposition of strin...
research
07/04/2012

A Conditional Random Field for Discriminatively-trained Finite-state String Edit Distance

The need to measure sequence similarity arises in information extraction...
research
11/08/2020

The Harmonic Edit Distance

This short note introduces a new distance between strings, where the cos...
research
10/01/2019

Scalable String Reconciliation by Recursive Content-Dependent Shingling

We consider the problem of reconciling similar, but remote, strings with...