A Global Optimization Algorithm for K-Center Clustering of One Billion Samples

12/30/2022
by Jiayang Ren, et al.

This paper presents a practical global optimization algorithm for the K-center clustering problem, which selects K samples as cluster centers so as to minimize the maximum within-cluster distance. The algorithm is based on a reduced-space branch and bound scheme and guarantees convergence to the global optimum in a finite number of steps by branching only on the regions of centers. To improve efficiency, we design a two-stage decomposable lower bound whose solution can be derived in closed form. We also propose several acceleration techniques to narrow down the region of centers, including bounds tightening, sample reduction, and parallelization. Extensive studies on synthetic and real-world datasets demonstrate that our algorithm can solve K-center problems to global optimality within 4 hours for ten million samples in serial mode and one billion samples in parallel mode. Moreover, compared with state-of-the-art heuristic methods, the global optimum obtained by our algorithm reduces the objective function by 25.8% on average across the synthetic and real-world datasets.
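To make the objective concrete, the following is a minimal Python sketch, not the paper's branch and bound algorithm. It evaluates the K-center objective (the largest distance from any sample to its nearest center), runs farthest-first traversal as one standard greedy heuristic, and finds the exact optimum by enumerating all K-subsets, which is only feasible for tiny inputs. The function names and the six-point toy dataset are assumptions made for illustration.

```python
import itertools
import math


def kcenter_objective(points, centers):
    """Largest distance from any sample to its nearest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)


def farthest_first(points, k, start=0):
    """Greedy heuristic: repeatedly add the sample farthest from the current centers."""
    centers = [points[start]]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(farthest)
    return centers


def brute_force(points, k):
    """Exact optimum by enumerating every K-subset of samples (tiny inputs only)."""
    return min(
        itertools.combinations(points, k),
        key=lambda cs: kcenter_objective(points, cs),
    )


# Hypothetical toy data: five evenly spaced samples and one outlier on a line.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (10.0, 0.0)]

heuristic_centers = farthest_first(points, 2)
optimal_centers = brute_force(points, 2)

print(kcenter_objective(points, heuristic_centers))  # 4.0
print(kcenter_objective(points, optimal_centers))    # 2.0
```

On this toy instance the greedy heuristic picks the two extreme samples and reaches an objective of 4.0, while the enumerated optimum places one center in the middle of the dense group and reaches 2.0. Exhaustive enumeration grows combinatorially with the number of samples, which is why an exact method at scale needs something like the bounding scheme the abstract describes.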


research
09/17/2019

Global Optimal Path-Based Clustering Algorithm

Combinatorial optimization problems for clustering are known to be NP-ha...
research
10/23/2020

Quantizing Multiple Sources to a Common Cluster Center: An Asymptotic Analysis

We consider quantizing an Ld-dimensional sample, which is obtained by co...
research
05/30/2019

Sequential no-Substitution k-Median-Clustering

We study the sample-based k-median clustering objective under a sequenti...
research
07/17/2023

LearnedSort as a learning-augmented SampleSort: Analysis and Parallelization

This work analyzes and parallelizes LearnedSort, the novel algorithm tha...
research
09/09/2021

Towards Sustainable Energy-Efficient Data Centers in Africa

Developing nations are particularly susceptible to the adverse effects o...
research
02/14/2023

Accelerated Fuzzy C-Means Clustering Based on New Affinity Filtering and Membership Scaling

Fuzzy C-Means (FCM) is a widely used clustering method. However, FCM and...
