Meta-Learning to Cluster

10/30/2019
by Yibo Jiang, et al.

Clustering is one of the most fundamental and widespread techniques in exploratory data analysis. Yet the basic approach to clustering has not really changed: a practitioner hand-picks a task-specific clustering loss and optimizes it on the given data to reveal the underlying cluster structure. Some types of losses, such as k-means and its non-linear version, kernelized k-means (centroid based), and DBSCAN (density based), are popular choices due to their good empirical performance on a range of applications. Every so often, however, the clustering output produced with these standard losses fails to reveal the underlying structure, and the practitioner has to custom-design their own variation. In this work we take an intrinsically different approach to clustering: rather than fitting a dataset to a specific clustering loss, we train a recurrent model that learns how to cluster. The model is trained on pairs consisting of a dataset (as input) and its corresponding cluster identities (as output). By providing multiple types of training datasets as inputs, our model is able to generalize well to unseen datasets (new clustering tasks). Our experiments reveal that by training on simple synthetically generated datasets or on existing real datasets, we can achieve better clustering performance on unseen real-world datasets than standard benchmark clustering techniques. Our meta clustering model works well even for small datasets, where the usual deep learning models tend to perform worse.
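To make the training setup concrete, here is a minimal sketch of how a collection of (dataset, cluster identities) pairs could be generated synthetically. It is an illustration only, not the authors' generator: the Gaussian-blob construction, the function names `make_clustering_task` and `make_meta_training_set`, and all parameter ranges are assumptions, since the abstract does not say how the synthetic datasets are built.

```python
import random


def make_clustering_task(n_points=100, n_clusters=3, dim=2, spread=0.5, seed=None):
    """Return one (points, labels) pair: a dataset and its cluster identities.

    Assumption: clusters are isotropic Gaussian blobs. The abstract does not
    specify the synthetic generator, so this is only a stand-in.
    """
    rng = random.Random(seed)
    # Hypothetical choice: cluster centers drawn uniformly from [-5, 5]^dim.
    centers = [[rng.uniform(-5.0, 5.0) for _ in range(dim)]
               for _ in range(n_clusters)]
    points, labels = [], []
    for _ in range(n_points):
        k = rng.randrange(n_clusters)  # pick the cluster this point belongs to
        points.append([rng.gauss(c, spread) for c in centers[k]])
        labels.append(k)
    return points, labels


def make_meta_training_set(n_tasks=10, seed=0):
    """Build several clustering tasks that differ in size and cluster count."""
    rng = random.Random(seed)
    tasks = []
    for _ in range(n_tasks):
        tasks.append(make_clustering_task(
            n_points=rng.randint(50, 200),   # assumed range of dataset sizes
            n_clusters=rng.randint(2, 5),    # assumed range of cluster counts
            seed=rng.randrange(2 ** 32),
        ))
    return tasks


# Each element is one training pair: (dataset as input, labels as output).
tasks = make_meta_training_set(n_tasks=4)
```

Each task would then serve as one training example for a model that maps a whole dataset to its cluster assignments. Varying the number of points and clusters across tasks is one simple way to expose such a model to multiple types of training datasets.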


