Johnson Coverage Hypothesis: Inapproximability of k-means and k-median in L_p metrics

11/21/2021
by Vincent Cohen-Addad, et al.

k-median and k-means are the two most popular objectives for clustering algorithms. Despite intensive effort, a good understanding of the approximability of these objectives, particularly in ℓ_p-metrics, remains a major open problem. In this paper, we significantly improve upon the hardness of approximation factors known in the literature for these objectives in ℓ_p-metrics.

We introduce a new hypothesis called the Johnson Coverage Hypothesis (JCH), which roughly asserts that the well-studied max k-coverage problem on set systems is hard to approximate to a factor greater than 1-1/e, even when the membership graph of the set system is a subgraph of the Johnson graph. We then show that, together with generalizations of the embedding techniques introduced by Cohen-Addad and Karthik (FOCS '19), JCH implies hardness of approximation results for k-median and k-means in ℓ_p-metrics for factors which are close to the ones obtained for general metrics. In particular, assuming JCH, we show that it is hard to approximate the k-means objective:

∙ Discrete case: To a factor of 3.94 in the ℓ_1-metric and to a factor of 1.73 in the ℓ_2-metric; this improves upon the previous factors of 1.56 and 1.17, respectively, obtained under the Unique Games Conjecture (UGC).
∙ Continuous case: To a factor of 2.10 in the ℓ_1-metric and to a factor of 1.36 in the ℓ_2-metric; this improves upon the previous factor of 1.07 in the ℓ_2-metric obtained under UGC.

We also obtain similar improvements under JCH for the k-median objective. Additionally, we prove a weak version of JCH using the work of Dinur et al. (SICOMP '05) on Hypergraph Vertex Cover, and recover all of the above-mentioned results of Cohen-Addad and Karthik (FOCS '19), with (nearly) the same inapproximability factors, but now under the standard NP≠P assumption (instead of UGC).
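The abstract distinguishes a discrete and a continuous case without defining them. The sketch below illustrates the distinction under the standard definitions, which are an assumption here rather than something stated in the abstract: in the discrete case centers must come from a given candidate set, while in the continuous case they may be any points of the space. The helper names (`lp_distance`, `clustering_cost`, `best_discrete_centers`) and the toy data are hypothetical, and the brute-force search only makes the objective concrete; it is not a method from the paper.

```python
from itertools import combinations


def lp_distance(x, y, p):
    """l_p distance between two equal-length coordinate tuples (p >= 1)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def clustering_cost(points, centers, p, power):
    """Sum over points of (distance to the nearest center) ** power.

    power=1 gives the k-median objective, power=2 the k-means objective.
    """
    return sum(
        min(lp_distance(x, c, p) for c in centers) ** power for x in points
    )


def best_discrete_centers(points, candidates, k, p, power):
    """Try every k-subset of the candidates (exponential; tiny inputs only)."""
    return min(
        combinations(candidates, k),
        key=lambda cs: clustering_cost(points, cs, p, power),
    )


# Toy instance (assumed for illustration): four points on a line, k = 2.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]

# Discrete case: centers restricted to the input points themselves.
discrete_centers = best_discrete_centers(points, points, k=2, p=2, power=2)
discrete_cost = clustering_cost(points, discrete_centers, p=2, power=2)

# Continuous case: centers may lie anywhere; here we place them at the
# midpoints of the two obvious clusters.
continuous_centers = [(1.0, 0.0), (11.0, 0.0)]
continuous_cost = clustering_cost(points, continuous_centers, p=2, power=2)

print(discrete_cost, continuous_cost)
```

On this instance the k-means cost is lower in the continuous setting than in the discrete one, so the two settings are genuinely different optimization problems. That is consistent with the abstract reporting separate hardness factors for each.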
