Parameterized Complexity of Categorical Clustering with Size Constraints

04/16/2021
by Fedor V. Fomin, et al.

In the Categorical Clustering problem, we are given a set of vectors (a matrix) $A=\{a_1,\dots,a_n\}$ over $\Sigma^m$, where $\Sigma$ is a finite alphabet, and integers $k$ and $B$. The task is to partition $A$ into $k$ clusters such that the median objective of the clustering in the Hamming norm is at most $B$. That is, we seek a partition $I_1,\dots,I_k$ of $\{1,\dots,n\}$ and vectors $c_1,\dots,c_k\in\Sigma^m$ such that $\sum_{i=1}^{k}\sum_{j\in I_i} d_H(c_i,a_j)\le B$, where $d_H(a,b)$ is the Hamming distance between vectors $a$ and $b$. Fomin, Golovach, and Panolan [ICALP 2018] proved that the problem is fixed-parameter tractable in the binary case $\Sigma=\{0,1\}$ by giving an algorithm that solves it in time $2^{O(B\log B)}\cdot(mn)^{O(1)}$. We extend this algorithmic result to a popular capacitated clustering model, in which the sizes of the clusters must additionally satisfy certain constraints. More precisely, in Capacitated Clustering we are also given two non-negative integers $p$ and $q$, and seek a clustering with $p\le|I_i|\le q$ for all $i\in\{1,\dots,k\}$. Our main theorem is that Capacitated Clustering is solvable in time $2^{O(B\log B)}\cdot|\Sigma|^B\cdot(mn)^{O(1)}$. The theorem not only extends the previous algorithmic results to a significantly more general model, but also implies algorithms for several other variants of Categorical Clustering with constraints on cluster sizes.
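To make the problem definition concrete, here is a minimal brute-force checker for Capacitated Clustering on toy inputs. It is a sketch for illustration only and is NOT the authors' $2^{O(B\log B)}\cdot|\Sigma|^B\cdot(mn)^{O(1)}$ algorithm: it simply enumerates all $k^n$ assignments of vectors to clusters. It relies on the standard observation that, once a cluster is fixed, an optimal center for the Hamming median objective takes a most frequent symbol in each coordinate independently. The function names (`hamming`, `cluster_cost`, `capacitated_clustering_bruteforce`) are hypothetical and chosen for this example.

```python
from collections import Counter
from itertools import product
from typing import List, Optional, Sequence


def hamming(a: Sequence, b: Sequence) -> int:
    """Hamming distance d_H(a, b): number of coordinates where a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_cost(cluster: List[Sequence]) -> int:
    """Cost of one cluster with a best possible center c_i.

    Coordinates are independent under the Hamming median objective, so an
    optimal center picks, in every coordinate, a most frequent symbol among
    the cluster's vectors (ties broken arbitrarily).
    """
    if not cluster:
        return 0
    m = len(cluster[0])
    center = [Counter(v[j] for v in cluster).most_common(1)[0][0] for j in range(m)]
    return sum(hamming(center, v) for v in cluster)


def capacitated_clustering_bruteforce(
    A: List[Sequence], k: int, B: int, p: int, q: int
) -> Optional[List[int]]:
    """Exhaustive search over all k^n label assignments (toy inputs only).

    Returns labels with labels[j] = index of the cluster containing a_j, such
    that every cluster size lies in [p, q] and the total cost is at most B;
    returns None if no such clustering exists.
    """
    n = len(A)
    for labels in product(range(k), repeat=n):
        clusters = [[A[j] for j in range(n) if labels[j] == i] for i in range(k)]
        # Capacity constraint p <= |I_i| <= q for every cluster.
        if any(not (p <= len(c) <= q) for c in clusters):
            continue
        if sum(cluster_cost(c) for c in clusters) <= B:
            return list(labels)
    return None
```

For example, with $\Sigma=\{0,1\}$, `A = ["0011", "0010", "1100", "1101"]`, $k=2$ and $p=q=2$, the balanced split `{0011, 0010}`, `{1100, 1101}` costs 2, so `capacitated_clustering_bruteforce(A, 2, 2, 2, 2)` returns `[0, 0, 1, 1]`, while with $B=1$ it returns `None`. The running time grows as $k^n$, which is exactly why a parameterized algorithm in $B$ is of interest.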

02/22/2019
Parameterized k-Clustering: The distance matters!
We consider the k-Clustering problem, which is for a given multiset of n...

05/08/2021
Parameterized Complexity of Feature Selection for Categorical Data Clustering
We develop new algorithmic methods with provable guarantees for feature ...

10/17/2017
The Bayesian Sorting Hat: A Decision-Theoretic Approach to Size-Constrained Clustering
Size-constrained clustering (SCC) refers to the dual problem of using ob...

12/17/2018
Information theoretical clustering is hard to approximate
An impurity measure I: R^d → R^+ is a function that assigns a d-dimension...

12/01/2021
Dimensionality Reduction for Categorical Data
Categorical attributes are those that can take a discrete set of values,...

06/29/2019
Approximate Inference in Structured Instances with Noisy Categorical Observations
We study the problem of recovering the latent ground truth labeling of a...

08/21/2019
Clustering Longitudinal Life-Course Sequences using Mixtures of Exponential-Distance Models
Sequence analysis is an increasingly popular approach for the analysis o...
