Approximating Edit Distance in the Fully Dynamic Model

07/14/2023
by Tomasz Kociumaka, et al.

The edit distance is a fundamental measure of sequence similarity, defined as the minimum number of character insertions, deletions, and substitutions needed to transform one string into the other. Given two strings of length at most n, simple dynamic programming computes their edit distance exactly in O(n^2) time, which is also the best possible (up to subpolynomial factors) assuming the Strong Exponential Time Hypothesis (SETH). The last few decades have seen tremendous progress in edit distance approximation, where the runtime has been brought down to subquadratic, near-linear, and even sublinear at the cost of approximation. In this paper, we study the dynamic edit distance problem, where the strings change over time as characters are substituted, inserted, or deleted. Each change may happen at any location of either of the two strings. The goal is to maintain the (exact or approximate) edit distance of such dynamic strings while minimizing the update time. The exact edit distance can be maintained in Õ(n) time per update (Charalampopoulos, Kociumaka, Mozes; 2020), which is again tight assuming SETH. Unfortunately, despite the unprecedented progress in edit distance approximation in the static setting, strikingly little is known about dynamic edit distance approximation. Using off-the-shelf tools, it is possible to achieve an O(n^c)-approximation in n^{0.5-c+o(1)} update time for any constant c ∈ [0, 1/6]. Improving upon this trade-off, in which the product of the approximation ratio and the update time is n^{0.5+o(1)}, remains open. The contribution of this work is a dynamic n^{o(1)}-approximation algorithm with amortized expected update time of n^{o(1)}. In other words, we bring the product of the approximation ratio and the update time down to n^{o(1)}. Our solution utilizes the elegant precision sampling tree framework for edit distance approximation (Andoni, Krauthgamer, Onak; 2010).
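
For reference, the quadratic-time dynamic program mentioned in the abstract can be sketched as follows. This is a minimal illustration of the textbook exact algorithm, not the approximation algorithm of the paper; the function name `edit_distance` and the two-row layout are choices made here for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Textbook O(len(a) * len(b)) dynamic program, keeping two table rows."""
    # prev[j] = edit distance between the empty prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Turning a[:i] into the empty string takes i deletions.
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # substitute ca by cb, or match
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

A naive baseline for the dynamic setting would rerun this function after every update at O(n^2) cost per update, which is the cost that the exact Õ(n) result and the approximate results described above avoid.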

research
10/02/2019

Sublinear Algorithms for Gap Edit Distance

The edit distance is a way of quantifying how similar two strings are to...
research
10/08/2018

Approximating Edit Distance Within Constant Factor in Truly Sub-Quadratic Time

Edit distance is a measure of similarity of two strings based on the min...
research
04/10/2019

Reducing approximate Longest Common Subsequence to approximate Edit Distance

Given a pair of strings, the problems of computing their Longest Common ...
research
12/10/2021

How Compression and Approximation Affect Efficiency in String Distance Measures

Real-world data often comes in compressed form. Analyzing compressed dat...
research
09/01/2021

Algorithme de recherche approximative dans un dictionnaire fondé sur une distance d'édition définie par blocs

We propose an algorithm for approximative dictionary lookup, where alter...
research
11/04/2021

Representation Edit Distance as a Measure of Novelty

Adaptation to novelty is viewed as learning to change and augment existi...
research
07/06/2020

Near-Linear Time Edit Distance for Indel Channels

We consider the following model for sampling pairs of strings: s_1 is a ...
