Simple, Scalable, and Stable Variational Deep Clustering

05/16/2020
by Lele Cao, et al.

Deep clustering (DC) has become the state of the art in unsupervised clustering. In principle, DC covers a variety of unsupervised methods that jointly learn the underlying clusters and the latent representation directly from unstructured datasets. In practice, however, DC methods are rarely applied because of high operational costs, low scalability, and unstable results. In this paper, we first evaluate several popular DC variants for industrial applicability against eight empirical criteria. We then focus on variational deep clustering (VDC) methods, since they meet most of those criteria, the exceptions being simplicity, scalability, and stability. To address these three unmet criteria, we introduce four generic algorithmic improvements: initial γ-training, periodic β-annealing, mini-batch GMM (Gaussian mixture model) initialization, and inverse min-max transform. We also propose a novel clustering algorithm, S3VDC (simple, scalable, and stable VDC), that incorporates all of these improvements. Our experiments show that S3VDC outperforms the state of the art on both benchmark tasks and a large unstructured industrial dataset without any ground-truth labels. In addition, we analytically evaluate the usability and interpretability of S3VDC.
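
The abstract names periodic β-annealing but does not give its schedule. As a rough illustration only, the sketch below assumes that β weights the regularization (KL) term of a variational objective and that it ramps up linearly in repeating cycles. The function name `periodic_beta`, the linear ramp, and every parameter value are assumptions made for this example and are not taken from the paper.

```python
def periodic_beta(step, period=1000, ramp_fraction=0.5, beta_max=1.0):
    """Hypothetical periodic (cyclical) annealing schedule for a weight beta.

    Within each cycle of `period` steps, beta rises linearly from 0 to
    `beta_max` over the first `ramp_fraction` of the cycle, then stays at
    `beta_max` until the cycle ends and the schedule restarts from 0.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    if not 0.0 < ramp_fraction <= 1.0:
        raise ValueError("ramp_fraction must be in (0, 1]")
    # Relative position inside the current cycle, in [0, 1).
    position = (step % period) / period
    # Linear ramp, capped at 1.0 once the ramp phase is over.
    return beta_max * min(1.0, position / ramp_fraction)


def weighted_loss(reconstruction_loss, regularization_loss, step):
    """Combine two loss terms using the assumed schedule above."""
    return reconstruction_loss + periodic_beta(step) * regularization_loss
```

With the default values, β is 0 at step 0, reaches `beta_max` halfway through each 1000-step cycle, and drops back to 0 when the next cycle begins. Restarting the ramp is what distinguishes a periodic schedule from a single monotonic warm-up.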

research
06/11/2021

Deep Conditional Gaussian Mixture Model for Constrained Clustering

Constrained clustering has gained significant attention in the field of ...
research
11/16/2016

Variational Deep Embedding: An Unsupervised and Generative Approach to Clustering

Clustering is among the most fundamental tasks in computer vision and ma...
research
01/11/2022

Deep clustering with fusion autoencoder

Embracing the deep learning techniques for representation learning in cl...
research
05/22/2023

Deep Clustering for Data Cleaning and Integration

Deep Learning (DL) techniques now constitute the state-of-the-art for im...
research
09/25/2019

Disentangling to Cluster: Gaussian Mixture Variational Ladder Autoencoders

In clustering we normally output one cluster variable for each datapoint...
research
12/06/2021

Top-Down Deep Clustering with Multi-generator GANs

Deep clustering (DC) leverages the representation power of deep architec...
research
03/27/2022

DeepDPM: Deep Clustering With an Unknown Number of Clusters

Deep Learning (DL) has shown great promise in the unsupervised task of c...
