A Unified Framework for Tuning Hyperparameters in Clustering Problems

10/17/2019
by Xinjie Fan, et al.

Selecting hyperparameters for unsupervised learning problems is difficult in general because no ground truth is available for validation. Yet the problem is pervasive in machine learning, especially in clustering; examples include the Lagrange multipliers of penalty terms in semidefinite programming (SDP) relaxations and the bandwidths used to construct kernel similarity matrices for Spectral Clustering. Even so, few algorithms for tuning these hyperparameters come with provable guarantees. In this paper, we provide a unified framework with provable guarantees for this class of problems. We demonstrate our method on two distinct models. First, we show how to tune the hyperparameters in widely used SDP algorithms for community detection in networks; in this case, our method can also be used for model selection. Second, we show that the same framework works for choosing the bandwidth of the kernel similarity matrix in Spectral Clustering for subgaussian mixtures under suitable model specification. In a variety of simulation experiments, our framework outperforms other widely used tuning procedures across a broad range of parameter settings.
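The abstract names the bandwidth of a kernel similarity matrix as one hyperparameter that Spectral Clustering leaves to the user. The sketch below is a generic illustration of where that bandwidth enters, not the paper's tuning method. It assumes a Gaussian kernel, a degree-normalized similarity matrix, row-normalized eigenvectors and a plain k-means step; the function names (`gaussian_kernel_matrix`, `spectral_embedding`, `kmeans`, `spectral_clustering`) are hypothetical, and only NumPy is used.

```python
# Generic spectral clustering sketch with hypothetical helper names.
# This is NOT the paper's tuning procedure; it only shows where the bandwidth appears.
import numpy as np


def gaussian_kernel_matrix(X, bandwidth):
    """Similarity K[i, j] = exp(-||x_i - x_j||^2 / (2 * bandwidth^2)) (assumed kernel form)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives caused by rounding
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def spectral_embedding(K, r):
    """Top-r eigenvectors of the degree-normalized similarity D^{-1/2} K D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))  # degrees are >= 1 because K[i, i] = 1
    M = d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(M)  # eigenvalues come back in ascending order
    return eigvecs[:, -r:]


def kmeans(U, r, n_iter=100, seed=0):
    """Plain Lloyd iterations, initialized with r distinct rows of U chosen at random."""
    rng = np.random.default_rng(seed)
    centers = U[rng.choice(len(U), size=r, replace=False)]
    labels = np.zeros(len(U), dtype=int)
    for _ in range(n_iter):
        dists = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster becomes empty.
        new_centers = np.array([
            U[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(r)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clustering(X, r, bandwidth):
    """Cluster the rows of X into r groups; `bandwidth` is the hyperparameter to tune."""
    K = gaussian_kernel_matrix(X, bandwidth)
    U = spectral_embedding(K, r)
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    U = U / np.maximum(norms, 1e-12)  # row-normalize, guarding against zero rows
    return kmeans(U, r)


if __name__ == "__main__":
    # Two synthetic Gaussian blobs; in a real unsupervised setting `truth` is unknown.
    rng = np.random.default_rng(1)
    X = np.vstack([
        rng.normal([0.0, 0.0], 0.5, size=(50, 2)),
        rng.normal([10.0, 0.0], 0.5, size=(50, 2)),
    ])
    truth = np.repeat([0, 1], 50)
    for h in [0.5, 1.0, 5.0, 50.0]:
        labels = spectral_clustering(X, r=2, bandwidth=h)
        agreement = max(np.mean(labels == truth), np.mean(labels != truth))
        print(f"bandwidth={h}: agreement with truth = {agreement:.2f}")
```

Rerunning `spectral_clustering` with different `bandwidth` values on the same data can change the partition, and in practice no `truth` vector is available to score the candidates; choosing `bandwidth` without labels is the tuning problem the paper addresses.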

Related research

Randomized Spectral Clustering in Large-Scale Stochastic Block Models (01/20/2020)
Spectral clustering has been one of the widely used methods for communit...

Unified Spectral Clustering with Optimal Graph (11/12/2017)
Spectral clustering has found extensive use in many areas. Most traditio...

Auto-Tuning Spectral Clustering for Speaker Diarization Using Normalized Maximum Eigengap (03/05/2020)
In this study, we propose a new spectral clustering framework that can a...

Twin Learning for Similarity and Clustering: A Unified Kernel Approach (05/01/2017)
Many similarity-based clustering methods work in two separate steps incl...

Learning Generative Models of Similarity Matrices (10/19/2012)
We describe a probabilistic (generative) view of affinity matrices along...

Spectral Clustering using PCKID - A Probabilistic Cluster Kernel for Incomplete Data (02/23/2017)
In this paper, we propose PCKID, a novel, robust, kernel function for sp...

Self-Tuning Spectral Clustering for Adaptive Tracking Areas Design in 5G Ultra-Dense Networks (02/04/2019)
In this paper, we address the issue of automatic tracking areas (TAs) pl...