    Approximating the Median under the Ulam Metric

We study approximation algorithms for variants of the median string problem, which asks for a string that minimizes the sum of edit distances from a given set of m strings of length n. Only the straightforward 2-approximation is known for this NP-hard problem. The problem is motivated, e.g., by computational biology, and belongs to the class of median problems (over different metric spaces), which are fundamental tasks in data analysis. Our main result is for the Ulam metric, where all strings are permutations over [n] and each edit operation moves a symbol (deletion plus insertion). We devise for this problem an algorithm that breaks the 2-approximation barrier, i.e., computes a (2-δ)-approximate median permutation for some constant δ>0 in time Õ(nm^2+n^3). We further use these techniques to achieve a (2-δ)-approximation for the median string problem in the special case where the median is restricted to length n and the optimal objective is large, namely Ω(mn). We also design an approximation algorithm for the following probabilistic model of the Ulam median: the input consists of m perturbations of an (unknown) permutation x, each generated by moving every symbol to a random position with probability ϵ>0 (a parameter). Our algorithm computes with high probability a (1+o(1/ϵ))-approximate median permutation in time O(mn^2+n^3).
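To make the baseline concrete, here is a minimal Python sketch of the Ulam distance and of a best-of-inputs 2-approximation. It is not the (2-δ) algorithm from the abstract. It relies on two standard facts: with move operations, the distance between two permutations of the same n symbols equals n minus the length of their longest common subsequence, and in any metric space the input point with the smallest total distance to the other inputs is a 2-approximate median by the triangle inequality. That this is the "straightforward 2-approximation" the abstract has in mind is an assumption, and the function names `ulam_distance` and `best_input_median` are hypothetical.

```python
from bisect import bisect_left


def ulam_distance(x, y):
    """Ulam distance between two permutations of the same symbols.

    Computed as n - LCS(x, y). For permutations, the LCS reduces to a
    longest increasing subsequence (LIS) over positions, found here by
    patience sorting in O(n log n) time.
    """
    if sorted(x) != sorted(y):
        raise ValueError("inputs must be permutations of the same symbols")
    position_in_y = {symbol: i for i, symbol in enumerate(y)}
    sequence = [position_in_y[symbol] for symbol in x]
    tails = []  # tails[k] = smallest tail of an increasing subsequence of length k + 1
    for value in sequence:
        k = bisect_left(tails, value)
        if k == len(tails):
            tails.append(value)
        else:
            tails[k] = value
    return len(x) - len(tails)


def best_input_median(perms):
    """Return the input permutation minimizing the sum of Ulam distances
    to all inputs, together with that sum (a 2-approximate median)."""
    costs = [sum(ulam_distance(p, q) for q in perms) for p in perms]
    best = min(range(len(perms)), key=costs.__getitem__)
    return perms[best], costs[best]


if __name__ == "__main__":
    inputs = [[1, 2, 3, 4], [2, 1, 3, 4], [1, 2, 4, 3]]
    print(best_input_median(inputs))
```

This baseline performs m^2 distance computations of O(n log n) time each. The abstract's contribution is to go below the factor 2 that this approach guarantees.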

07/20/2021

Approximate Trace Reconstruction via Median String (in Average-Case)

We consider an approximate version of the trace reconstruction problem, ...
12/06/2021

On Complexity of 1-Center in Various Metrics

We consider the classic 1-center problem: Given a set P of n points in a...
12/04/2019

Assessing the best edit in perturbation-based iterative refinement algorithms to compute the median string

Strings are a natural representation of biological data such as DNA, RNA...
03/04/2020

Pivot Selection for Median String Problem

The Median String Problem is W-Hard under the Levenshtein distance, t...
11/10/2021

Permute, Graph, Map, Derange

We study decomposable combinatorial labeled structures in the exp-log cl...
01/05/2022

Deterministic metric 1-median selection with very few queries

Given an n-point metric space (M,d), metric 1-median asks for a point p∈...
07/01/2021

Data Deduplication with Random Substitutions

Data deduplication saves storage space by identifying and removing repea...