On Computing Centroids According to the p-Norms of Hamming Distance Vectors

07/17/2018
by Jiehua Chen, et al.

In this paper we consider the p-Norm Hamming Centroid problem, which asks whether some given binary strings have a centroid with a bound on the $p$-norm of its Hamming distances to the strings. Specifically, given a set of strings $S$ and a real $k$, we consider the problem of determining whether there exists a string $s^*$ with $\left(\sum_{s \in S} d^p(s^*, s)\right)^{1/p} \le k$, where $d(\cdot,\cdot)$ denotes the Hamming distance metric. This problem has important applications in data clustering, and it generalizes both the well-known polynomial-time solvable Consensus String ($p = 1$) problem and the NP-hard Closest String ($p = \infty$) problem. Our main result shows that the problem is NP-hard for all rational $p > 1$, closing the gap for all rational values of $p$ between 1 and $\infty$. Under standard complexity assumptions, the reduction also implies that, for any fixed $p > 1$, the problem has no $2^{o(n+m)}$-time or $2^{o(k^{p/(p+1)})}$-time algorithm, where $m$ denotes the number of input strings and $n$ denotes the length of each string. Both running time lower bounds are tight. In particular, we provide a $2^{k^{p/(p+1)+\varepsilon}}$-time algorithm for each fixed $\varepsilon > 0$.
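To make the problem definition concrete, here is a minimal sketch (not the authors' algorithm) that evaluates the objective $\left(\sum_{s \in S} d^p(s^*, s)\right)^{1/p}$ and answers the decision question by plain exhaustive search over all $2^n$ binary candidates. It also includes the column-wise majority rule, which is a standard way to solve the $p = 1$ (Consensus String) case in polynomial time. The function names `hamming`, `p_norm_cost`, `brute_force_centroid`, `has_centroid` and `majority_consensus` are hypothetical, chosen for illustration. The small tolerance in `has_centroid` is an assumption made to absorb floating-point rounding in the $1/p$ power.

```python
from itertools import product
from typing import Sequence, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def p_norm_cost(center: str, strings: Sequence[str], p: float) -> float:
    """(sum over s in S of d(center, s)^p)^(1/p); p = inf gives the max distance."""
    dists = [hamming(center, s) for s in strings]
    if p == float("inf"):
        return float(max(dists))  # Closest String objective
    return sum(d ** p for d in dists) ** (1.0 / p)


def brute_force_centroid(strings: Sequence[str], p: float) -> Tuple[str, float]:
    """Exhaustive search over all 2^n binary strings of length n (S must be non-empty).

    Returns the first candidate (in lexicographic order) of minimum p-norm cost.
    """
    n = len(strings[0])
    best, best_cost = "", float("inf")
    for bits in product("01", repeat=n):
        cand = "".join(bits)
        cost = p_norm_cost(cand, strings, p)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost


def has_centroid(strings: Sequence[str], p: float, k: float) -> bool:
    """Decision version: does some s* satisfy cost(s*) <= k?

    Assumption: a 1e-9 absolute tolerance guards against rounding in x ** (1/p).
    """
    return brute_force_centroid(strings, p)[1] <= k + 1e-9


def majority_consensus(strings: Sequence[str]) -> str:
    """p = 1 case: a column-wise majority minimizes the sum of Hamming distances.

    Ties are broken toward '0'; either symbol is optimal in a tied column.
    """
    m, n = len(strings), len(strings[0])
    out = []
    for i in range(n):
        ones = sum(s[i] == "1" for s in strings)
        out.append("1" if 2 * ones > m else "0")
    return "".join(out)


if __name__ == "__main__":
    S = ["000", "110", "011"]
    print(majority_consensus(S))                  # column-wise majority
    print(brute_force_centroid(S, 1))             # p = 1
    print(brute_force_centroid(S, 2))             # p = 2
    print(brute_force_centroid(S, float("inf")))  # p = infinity
```

For `S = ["000", "110", "011"]`, every objective above is minimized by `"010"`, which is at distance 1 from each input string. The exhaustive search takes roughly $2^n \cdot m \cdot n$ steps, so it is only practical for very short strings; it serves as a reference point for the $2^{o(n+m)}$ lower bound, not as a practical solver.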



research
02/04/2020

Faster Binary Mean Computation Under Dynamic Time Warping

Many consensus string problems are based on Hamming distance. We replace...
research
05/26/2023

Can You Solve Closest String Faster than Exhaustive Search?

We study the fundamental problem of finding the best string to represent...
research
07/24/2018

A Note on Clustering Aggregation

We consider the clustering aggregation problem in which we are given a s...
research
06/30/2021

String Comparison on a Quantum Computer Using Hamming Distance

The Hamming distance is ubiquitous in computing. Its computation gets ex...
research
03/01/2023

Computing the Best Policy That Survives a Vote

An assembly of n voters needs to decide on t independent binary issues. ...
research
04/05/2020

On the Tandem Duplication Distance

A tandem duplication denotes the process of inserting a copy of a segmen...
research
09/16/2023

Parallel Longest Common SubSequence Analysis In Chapel

One of the most critical problems in the field of string algorithms is t...
