A Unified Framework for Clustering Constrained Data without Locality Property
In this paper, we consider a class of constrained clustering problems for points in R^d, where d could be rather high. A common feature of these problems is that, because of the additional constraints, their optimal clusterings no longer have the locality property, which is a key property required by many algorithms for their unconstrained counterparts. To overcome the difficulty caused by the loss of locality, we present a unified framework, called Peeling-and-Enclosing (PnE), to iteratively solve two variants of the constrained clustering problems: constrained k-means clustering (k-CMeans) and constrained k-median clustering (k-CMedian). Our framework is based on two standalone geometric techniques, called the Simplex Lemma and the Weaker Simplex Lemma, for k-CMeans and k-CMedian, respectively. The Simplex Lemma (or Weaker Simplex Lemma) enables us to efficiently approximate the mean (or median) point of an unknown set of points by searching a small grid, whose size is independent of the dimensionality of the space, in a simplex (or the surrounding region of a simplex), and thus can be used to handle high-dimensional data. If k and 1/ϵ are fixed numbers, our framework generates, in nearly linear time (i.e., O(n(log n)^{k+1}d)), O((log n)^k) k-tuple candidates for the k mean or median points, and one of them induces a (1+ϵ)-approximation for k-CMeans or k-CMedian, where n is the number of points. Combining this unified framework with a problem-specific selection algorithm (which determines the best k-tuple candidate), we obtain a (1+ϵ)-approximation for each of the constrained clustering problems. We expect that our technique will be applicable to other constrained clustering problems without locality.
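The geometric idea behind the grid search can be illustrated with a small Python sketch. It is not the construction from the paper: it assumes the subset means are known exactly, and it uses a plain uniform grid over barycentric weights, a choice made here for illustration. It relies only on the elementary fact that the mean of a point set is a convex combination of the means of its parts, so it lies in the simplex spanned by those means. All function names and parameters (`barycentric_grid`, `closest_grid_point`, the resolution `m`) are hypothetical.

```python
import itertools
import math
import random


def mean(points):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def combine(weights, vertices):
    """Convex combination sum_i weights[i] * vertices[i]."""
    dim = len(vertices[0])
    return [sum(w * v[i] for w, v in zip(weights, vertices)) for i in range(dim)]


def barycentric_grid(j, m):
    """Yield every weight vector (c_1/m, ..., c_j/m) whose non-negative
    integers c_i sum to m (stars-and-bars enumeration)."""
    for bars in itertools.combinations(range(m + j - 1), j - 1):
        counts, prev = [], -1
        for b in bars:
            counts.append(b - prev - 1)
            prev = b
        counts.append(m + j - 2 - prev)
        yield [c / m for c in counts]


def closest_grid_point(vertices, target, m):
    """Grid point of the simplex spanned by `vertices` nearest to `target`."""
    candidates = (combine(w, vertices) for w in barycentric_grid(len(vertices), m))
    return min(candidates, key=lambda p: math.dist(p, target))


# Illustrative data (assumed, not from the paper): three subsets in R^20.
random.seed(0)
d, sizes = 20, [10, 30, 50]
subsets = [
    [[random.gauss(3 * l, 1) for _ in range(d)] for _ in range(s)]
    for l, s in enumerate(sizes)
]
all_points = [p for s in subsets for p in s]
true_mean = mean(all_points)
vertices = [mean(s) for s in subsets]  # means of the parts span the simplex

m = 10  # grid resolution; the grid size depends on j and m, not on d
grid_size = sum(1 for _ in barycentric_grid(len(vertices), m))
approx = closest_grid_point(vertices, true_mean, m)
error = math.dist(approx, true_mean)
diameter = max(math.dist(u, v) for u in vertices for v in vertices)
```

In this sketch the number of grid points is a binomial coefficient in `j` and `m` only, so it does not grow with the dimension `d`. Rounding the true weights to the grid moves each weight by less than `1/m`, which bounds `error` by `j * diameter / m`. The lemma in the paper addresses the harder setting where the vertices themselves are only known approximately.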