Approximating longest common substring with k mismatches: Theory and practice

04/28/2020
by   Garance Gourdel, et al.

In the problem of the longest common substring with k mismatches we are given two strings X, Y and must find the maximal length ℓ such that there is a length-ℓ substring of X and a length-ℓ substring of Y that differ in at most k positions. The length ℓ can be used as a robust measure of similarity between X and Y. In this work, we develop new approximation algorithms for computing ℓ that are significantly more efficient than previously known solutions from the theoretical point of view. Our approach is simple and practical, which we confirm via an experimental evaluation, and is probably close to optimal as we demonstrate via a conditional lower bound.
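To make the definition of ℓ concrete, the following is a minimal sketch of an exact quadratic-time baseline. It is not the approximation algorithm developed in this work. The function name `lcs_k_mismatches` is hypothetical. The sketch assumes the standard observation that any pair of aligned substrings lies on a single diagonal i - j of the comparison grid, so a sliding window per diagonal is enough.

```python
def lcs_k_mismatches(x: str, y: str, k: int) -> int:
    """Exact length of the longest common substring with at most k mismatches.

    Illustrative baseline only (hypothetical helper, O(|x| * |y|) time);
    it is not the approximation algorithm from the paper.
    """
    n, m = len(x), len(y)
    best = 0
    # Diagonal d = i - j pairs x[i] with y[j]; every aligned pair of
    # equal-length substrings lies on exactly one diagonal.
    for d in range(-(m - 1), n):
        i0 = max(d, 0)    # start of the diagonal in x
        j0 = max(-d, 0)   # start of the diagonal in y
        length = min(n - i0, m - j0)
        left = 0          # left end of the current window on the diagonal
        mismatches = 0    # mismatches inside the window [left, right]
        for right in range(length):
            if x[i0 + right] != y[j0 + right]:
                mismatches += 1
            # Shrink from the left until at most k mismatches remain.
            while mismatches > k:
                if x[i0 + left] != y[j0 + left]:
                    mismatches -= 1
                left += 1
            best = max(best, right - left + 1)
    return best
```

For example, `lcs_k_mismatches("abcde", "abxde", 1)` returns 5, since the two strings differ in a single position, while with `k = 0` the result drops to 2 (the common substring "ab" or "de").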


research
12/22/2017

Longest common substring with approximately k mismatches

In the longest common substring problem we are given two strings of leng...
research
02/13/2023

Near-Optimal Dynamic Time Warping on Run-Length Encoded Strings

We give an Õ(n^2) time algorithm for computing the exact Dynamic Time Wa...
research
07/25/2023

A Compact DAG for Storing and Searching Maximal Common Subsequences

Maximal Common Subsequences (MCSs) between two strings X and Y are subse...
research
08/27/2022

Practical applications of Set Shaping Theory in Huffman coding

One of the biggest criticisms of the Set Shaping Theory is the lack of a...
research
11/30/2022

Approximating binary longest common subsequence in almost-linear time

The Longest Common Subsequence (LCS) is a fundamental string similarity ...
research
09/07/2020

A Fast Randomized Algorithm for Finding the Maximal Common Subsequences

Finding the common subsequences of L multiple strings has many applicati...
research
03/16/2020

Approximating LCS in Linear Time: Beating the √(n) Barrier

Longest common subsequence (LCS) is one of the most fundamental problems...
