Near-Linear Time Insertion-Deletion Codes and (1+ε)-Approximating Edit Distance via Indexing

10/28/2018
by Bernhard Haeupler, et al.

We introduce fast-decodable indexing schemes for edit distance, which can be used to speed up edit distance computations to near-linear time if one of the strings is indexed by an indexing string I. In particular, for every length n and every ε > 0, one can construct in near-linear time a string I ∈ Σ'^n with |Σ'| = O_ε(1) such that indexing any string S ∈ Σ^n, symbol by symbol, with I results in a string S' ∈ Σ''^n, where Σ'' = Σ × Σ', for which edit distance computations are easy: one can compute a (1+ε)-approximation of the edit distance between S' and any other string in O(n poly(log n)) time. Our indexing schemes can be used to improve the decoding complexity of state-of-the-art error-correcting codes for insertions and deletions. In particular, they lead to near-linear time decoding algorithms for the insertion-deletion codes of [Haeupler, Shahrasbi; STOC '17] and the list-decodable insertion-deletion codes of [Haeupler, Shahrasbi, Sudan; ICALP '18]. Interestingly, the latter codes are a crucial ingredient in the construction of fast-decodable indexing schemes.
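The symbol-by-symbol indexing step described above can be illustrated with a minimal Python sketch. Everything named here is hypothetical and chosen for illustration only: `toy_index` is a placeholder cyclic index with none of the guarantees of the paper's construction, and `indel_distance` is the textbook quadratic-time dynamic program, not the near-linear (1+ε)-approximation algorithm. The sketch also assumes that edit distance counts insertions and deletions only, so that a substitution costs two operations.

```python
from typing import Hashable, List, Sequence, Tuple


def toy_index(n: int, k: int) -> List[int]:
    """Placeholder index string over an alphabet of size k (NOT the paper's construction)."""
    return [i % k for i in range(n)]


def index_string(s: Sequence[Hashable], index: Sequence[Hashable]) -> List[Tuple]:
    """Pair each symbol of s with the index symbol at the same position.

    The result is a string over the product alphabet Sigma x Sigma'.
    """
    if len(s) != len(index):
        raise ValueError("string and index must have the same length")
    return list(zip(s, index))


def indel_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Exact insertion/deletion distance via the standard O(|a| * |b|) dynamic program.

    Assumption of this sketch: only insertions and deletions are counted.
    """
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, x in enumerate(a, 1):
        cur = [i]  # deleting the first i symbols of a
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1])  # match, no cost
            else:
                cur.append(1 + min(prev[j], cur[j - 1]))  # delete x or insert y
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    s = "abab"
    indexed = index_string(s, toy_index(len(s), 3))
    received = indexed[1:]  # simulate deletion of the first symbol
    print(indexed)
    print(indel_distance(indexed, received))
```

The baseline above takes quadratic time for any pair of strings; the abstract's claim is that a suitably constructed I allows a (1+ε)-approximation in O(n poly(log n)) time instead.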


