A semi-supervised sparse K-Means algorithm
We consider the problem of data clustering with unidentified feature quality but the existence of small amount of label data. In the first case a sparse clustering method can be employed in order to detect the subgroup of features necessary for clustering and in the second case a semi-supervised method can use the labelled data to create constraints and enhance the clustering solution. In this paper we propose a K-Means inspired algorithm that employs these techniques. We show that the algorithm maintains the high performance of other similar semi-supervised algorthms as well as keeping the ability to identify informative from uninformative features. We examine the performance of the algorithm on real world data sets with unknown features quality as well as a real world data set with a known uninformative feature. We use a series of scenarios with different number and types of constraints.
READ FULL TEXT