Greedy Strategy Works for Clustering with Outliers and Coresets Construction
We study the problems of clustering with outliers in high dimensions. Although a number of methods have been developed over the past decades, designing algorithms with both quality guarantees and low complexity remains challenging. Our idea is inspired by Gonzalez's algorithm, the greedy method for ordinary k-center clustering. Based on some novel observations, we show that this greedy strategy can in fact handle k-center/median/means clustering with outliers efficiently, in terms of both solution quality and complexity. We further show that the greedy approach yields small coresets for the problem in doubling metrics, which reduces the time complexity significantly. As a by-product, the coreset construction can be applied to speed up the popular density-based clustering approach DBSCAN.
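For readers unfamiliar with the greedy method mentioned in the abstract, the following is a minimal sketch of the classical Gonzalez farthest-point procedure for ordinary k-center clustering (no outliers). It is not the outlier-aware algorithm of the paper, whose details are not given in the abstract. The function name `gonzalez_k_center`, the `first` parameter, and the use of Euclidean distance are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def gonzalez_k_center(
    points: List[Point], k: int, first: int = 0
) -> Tuple[List[int], float]:
    """Farthest-point greedy for ordinary k-center (illustrative sketch).

    Returns the indices of the chosen centers and the resulting radius,
    i.e. the largest distance from any point to its nearest center.
    Euclidean distance is an assumption; any metric could be substituted.
    """
    if not points or k <= 0:
        raise ValueError("need at least one point and k >= 1")
    k = min(k, len(points))

    # Start from an arbitrary point (here: index `first`).
    centers = [first]
    # dist[i] = distance from point i to its nearest chosen center so far.
    dist = [math.dist(p, points[first]) for p in points]

    while len(centers) < k:
        # Greedy step: pick the point farthest from the current centers.
        far = max(range(len(points)), key=dist.__getitem__)
        centers.append(far)
        # Update nearest-center distances against the new center only.
        for i, p in enumerate(points):
            d = math.dist(p, points[far])
            if d < dist[i]:
                dist[i] = d

    return centers, max(dist)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    print(gonzalez_k_center(pts, k=2))
```

Each round costs one pass over the data, so the sketch runs in O(nk) distance computations. For ordinary k-center this greedy choice is known to give a 2-approximation of the optimal radius; how the strategy is adapted to tolerate outliers is the subject of the paper itself.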