Parameterized Complexity of Categorical Clustering with Size Constraints
In the Categorical Clustering problem, we are given a set of vectors (matrix) A = {a_1, …, a_n} over Σ^m, where Σ is a finite alphabet, and integers k and B. The task is to partition A into k clusters such that the median objective of the clustering in the Hamming norm is at most B. That is, we seek a partition I_1, …, I_k of {1, …, n} and vectors c_1, …, c_k ∈ Σ^m such that ∑_{i=1}^{k} ∑_{j∈I_i} d_H(c_i, a_j) ≤ B, where d_H(a, b) is the Hamming distance between vectors a and b. Fomin, Golovach, and Panolan [ICALP 2018] proved that the problem is fixed-parameter tractable (for the binary case Σ = {0, 1}) by giving an algorithm that solves the problem in time 2^{O(B log B)} · (mn)^{O(1)}. We extend this algorithmic result to a popular capacitated clustering model, where the sizes of the clusters must additionally satisfy certain constraints. More precisely, in Capacitated Clustering we are also given two non-negative integers p and q, and we seek a clustering with p ≤ |I_i| ≤ q for all i ∈ {1, …, k}. Our main theorem is that Capacitated Clustering is solvable in time 2^{O(B log B)} · |Σ|^B · (mn)^{O(1)}. The theorem not only extends the previous algorithmic results to a significantly more general model; it also implies algorithms for several other variants of Categorical Clustering with constraints on cluster sizes.
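To make the objective concrete, here is a minimal illustrative sketch in Python. It is not the 2^{O(B log B)} · |Σ|^B · (mn)^{O(1)} algorithm from the paper. It only evaluates the capacitated median objective and solves tiny instances by exhaustive search. It relies on a standard observation: for a fixed cluster, the sum of Hamming distances decomposes coordinate by coordinate, so an optimal center c_i takes a most frequent symbol in each column. All function names (`hamming`, `cluster_cost`, `capacitated_cost`, `brute_force`) and the example matrix are hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import product


def hamming(a, b):
    """Number of positions where the equal-length sequences a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_cost(vectors, cluster):
    """Median cost of one cluster under the Hamming distance.

    The cost splits coordinate-wise, so an optimal center picks a most
    frequent symbol in every column (ties do not change the cost).
    """
    if not cluster:
        return 0, None
    m = len(vectors[cluster[0]])
    center = tuple(
        Counter(vectors[j][t] for j in cluster).most_common(1)[0][0]
        for t in range(m)
    )
    return sum(hamming(center, vectors[j]) for j in cluster), center


def capacitated_cost(vectors, partition, p, q):
    """Total cost of a partition, or None if some cluster size is outside [p, q]."""
    if any(not (p <= len(c) <= q) for c in partition):
        return None
    return sum(cluster_cost(vectors, c)[0] for c in partition)


def brute_force(vectors, k, p, q):
    """Minimum capacitated cost over all assignments of rows to k clusters.

    Exhaustive search over k^n labelings: exponential in n, tiny inputs only.
    Returns None when no partition satisfies the size constraints.
    """
    n = len(vectors)
    best = None
    for labels in product(range(k), repeat=n):
        partition = [[j for j in range(n) if labels[j] == i] for i in range(k)]
        cost = capacitated_cost(vectors, partition, p, q)
        if cost is not None and (best is None or cost < best):
            best = cost
    return best


# Hypothetical binary instance (Σ = {0, 1}, m = 4, n = 4).
A = ["0011", "0010", "1100", "1101"]
best = brute_force(A, k=2, p=2, q=2)
print(best)  # 2: clusters {0011, 0010} and {1100, 1101}, each of cost 1
```

Deciding an instance then amounts to checking whether the returned minimum is at most B. Setting p = 0 and q = n recovers the unconstrained Categorical Clustering objective, since every partition (empty clusters included) is then admissible.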