Information theoretical clustering is hard to approximate

12/17/2018
by Ferdinando Cicalese, et al.

An impurity measure I: R^d → R^+ is a function that maps a d-dimensional vector v to a non-negative value I(v) so that the more homogeneous v is, with respect to the values of its coordinates, the larger its impurity. A well-known example of an impurity measure is the Entropy impurity. We study the problem of clustering based on impurity measures. Let V be a collection of n d-dimensional vectors with non-negative components. Given V and an impurity measure I, the goal is to find a partition P of V into k groups V_1,...,V_k so as to minimize the sum of the impurities of the groups in P, i.e., I(P) = ∑_{i=1}^{k} I(∑_{v ∈ V_i} v). Impurity minimization has been widely used as a quality assessment measure in probability distribution clustering (KL-divergence) as well as in categorical clustering. However, in contrast to the case of metric-based clustering, the current knowledge of impurity-measure-based clustering in terms of approximation and inapproximability results is very limited. Here, we contribute to changing this scenario by proving that for the Entropy impurity measure the problem does not admit a PTAS even when all vectors have the same ℓ_1 norm. This result solves a question that remained open in previous work on this topic [Chaudhuri and McGregor COLT 08; Ackermann et al. ECCC 11].
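
To make the objective concrete, the following is a minimal sketch of how I(P) could be evaluated for a given partition. The abstract does not spell out the Entropy impurity, so the sketch assumes the form commonly used in this line of work, I(v) = ‖v‖_1 · H(v/‖v‖_1), with H the Shannon entropy in bits. The function names `entropy_impurity` and `partition_impurity` are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence


def entropy_impurity(v: Sequence[float]) -> float:
    """Assumed Entropy impurity: ||v||_1 times the Shannon entropy (base 2)
    of v normalized to a probability distribution."""
    total = sum(v)
    if total == 0:
        return 0.0
    # sum_i v_i * log2(||v||_1 / v_i); zero coordinates contribute nothing.
    return sum(x * math.log2(total / x) for x in v if x > 0)


def partition_impurity(groups: List[List[Sequence[float]]]) -> float:
    """I(P): sum over groups of the impurity of the coordinate-wise
    sum of the vectors in that group."""
    cost = 0.0
    for group in groups:
        summed = [sum(column) for column in zip(*group)]
        cost += entropy_impurity(summed)
    return cost


# Four vectors with the same l_1 norm, split into k = 2 groups in two ways.
pure = [[(1, 0), (1, 0)], [(0, 1), (0, 1)]]   # each group is concentrated
mixed = [[(1, 0), (0, 1)], [(1, 0), (0, 1)]]  # each group is homogeneous
print(partition_impurity(pure), partition_impurity(mixed))
```

Under this assumed definition, grouping identical vectors together yields groups whose summed vector is concentrated on one coordinate and so has low impurity, while mixing them yields more homogeneous sums and a larger objective value.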

research
07/13/2018

Approximation Algorithms for Clustering via Weighted Impurity Measures

An impurity measures I:R^k →R^+ maps a k-dimensional vector v to a non-...
research
04/16/2021

Parameterized Complexity of Categorical Clustering with Size Constraints

In the Categorical Clustering problem, we are given a set of vectors (ma...
research
04/04/2018

Sparse non-negative super-resolution - simplified and stabilised

The convolution of a discrete measure, x=∑_i=1^ka_iδ_t_i, with a local w...
research
08/08/2022

Partial reconstruction of measures from halfspace depth

The halfspace depth of a d-dimensional point x with respect to a finite ...
research
06/16/2023

On Orderings of Probability Vectors and Unsupervised Performance Estimation

Unsupervised performance estimation, or evaluating how well models perfo...
research
03/27/2022

Random Graphs by Product Random Measures

A natural representation of random graphs is the random measure. The col...