Sublinear-Time Algorithms for Computing Embedding Gap Edit Distance

07/24/2020
by Tomasz Kociumaka, et al.

In this paper, we design new sublinear-time algorithms for solving the gap edit distance problem and for embedding edit distance to Hamming distance. For the gap edit distance problem, we give a greedy algorithm that distinguishes in time Õ(n/k + k^2) between length-n input strings with edit distance at most k and those with edit distance more than 4k^2. This improves upon and simplifies the main result of [Goldenberg, Krauthgamer, Saha, FOCS 2019], where the k vs Θ(k^2) gap edit distance problem is solved in Õ(n/k + k^3) time. We further generalize our result to solve the k vs αk gap edit distance problem in time Õ(n/α + k^2 + (k/α)·√(nk)), strictly improving upon the previously known bound Õ(n/α + k^3). Finally, we show that if the input strings do not have long highly periodic substrings, then the gap edit distance problem can be solved in sublinear time within any factor α > 1.

We further give the first sublinear-time algorithm for the probabilistic embedding of edit distance to Hamming distance. Our Õ(n/p)-time procedure yields an embedding with distortion k^2p, where k is the edit distance of the original strings. Specifically, the Hamming distance of the resultant strings is between k/p and k^2 with good probability. This generalizes the linear-time embedding of [Chakraborty, Goldenberg, Koucký, STOC 2016], where the resultant Hamming distance is between k and k^2. Our algorithm is based on a random walk over samples, which we believe will find other applications in sublinear-time algorithms.
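For orientation, here is a minimal Python sketch of the linear-time random-walk embedding of [Chakraborty, Goldenberg, Koucký, STOC 2016] that the abstract names as the baseline being generalized. It is not the sublinear procedure of this paper, which works over samples and is not described in the abstract. The function names, the padding symbol `PAD`, the output length of 3n, and the use of a seeded `random.Random` as shared randomness are illustrative assumptions.

```python
import random

PAD = "\0"  # hypothetical padding symbol, assumed not to occur in the inputs


def make_walk_bits(length, alphabet, seed):
    """Shared randomness: for each output position, one random bit per symbol."""
    rng = random.Random(seed)
    symbols = sorted(alphabet)  # fixed order so the same seed gives the same bits
    return [{c: rng.randint(0, 1) for c in symbols} for _ in range(length)]


def cgk_embed(x, walk_bits):
    """Random walk over x: copy x[i], then stay or advance by the shared bit."""
    out = []
    i = 0
    for bits in walk_bits:
        if i < len(x):
            out.append(x[i])
            i += bits[x[i]]  # 0 repeats the symbol, 1 moves to the next one
        else:
            out.append(PAD)  # input exhausted, pad the rest of the output
    return "".join(out)


def hamming(a, b):
    """Number of mismatching positions of two equal-length strings."""
    return sum(ca != cb for ca, cb in zip(a, b))


# Two equal-length strings that differ by one deletion and one insertion.
x = "abracadabra"
y = "abrcadabrax"
n = len(x)
bits = make_walk_bits(3 * n, set(x) | set(y), seed=0)
fx, fy = cgk_embed(x, bits), cgk_embed(y, bits)
print(len(fx), hamming(fx, fy))
```

Both strings must be embedded with the same `walk_bits`. Identical inputs then map to identical outputs, and the Hamming distance of the outputs tracks the edit distance of the inputs only with good probability, within the bounds quoted in the abstract.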
