RDEC: Integrating Regularization into Deep Embedded Clustering for Imbalanced Datasets

12/06/2018
by   Yaling Tao, et al.

Clustering is a fundamental machine learning task with many applications. With the development of deep neural networks (DNNs), combining DNN techniques with clustering has become a new research direction and has achieved some success. However, few studies have focused on the imbalanced-data problem, which commonly occurs in real-world applications. In this paper, we propose a clustering method, regularized deep embedding clustering (RDEC), that integrates virtual adversarial training (VAT), a network regularization technique, with a clustering method called deep embedding clustering (DEC). DEC optimizes cluster assignments by pushing data more densely around centroids in latent space, but it is sometimes sensitive to the initial location of centroids, especially in the case of imbalanced data, where the minor class has less chance of being assigned a good centroid. RDEC introduces regularization using VAT to ensure the model's robustness to local perturbations of data. VAT pushes data that are similar in the original space closer together in the latent space, bunching together data from minor classes and thereby facilitating cluster identification by RDEC. Combining the advantages of DEC and VAT, RDEC attains state-of-the-art performance on both balanced and imbalanced benchmark/real-world datasets. For example, accuracies are as high as 98.41% on the MNIST dataset, which is nearly 8
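
To make the DEC step in the abstract concrete, here is a minimal NumPy sketch of the soft cluster assignment and the sharpened target distribution that pull points toward centroids in latent space. The abstract does not state these formulas; they follow the formulation in the original DEC paper (Student's t kernel, squared-and-renormalized targets, KL objective). The function names, the toy data, and the fixed centroids are illustrative assumptions, not the authors' code. The VAT regularizer is not shown because it needs gradients of a trained encoder.

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    """Soft assignment q_ij of embedded point i to centroid j (Student's t kernel)."""
    # Squared Euclidean distances, shape (n_points, n_clusters).
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Target p_ij: square q, divide by the soft cluster frequency, renormalize rows."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) summed over all points and clusters (the clustering loss)."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Toy imbalanced latent data (assumed for illustration):
# 200 points in a major cluster, 10 points in a minor cluster.
rng = np.random.default_rng(0)
z = np.vstack([
    rng.normal(0.0, 0.5, size=(200, 2)),
    rng.normal(3.0, 0.5, size=(10, 2)),
])
centroids = np.array([[0.0, 0.0], [3.0, 3.0]])  # hypothetical initial centroids

q = soft_assign(z, centroids)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

In a full training loop, `loss` would be minimized with respect to both the encoder weights and the centroids. The abstract's point about sensitivity to initialization corresponds to the choice of `centroids` here: a minor class of only 10 points has little influence on where a poorly placed centroid ends up.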


