Representation Learning for Clustering: A Statistical Framework

06/19/2015
by Hassan Ashtiani, et al.

We address the problem of communicating domain knowledge from a user to the designer of a clustering algorithm. We propose a protocol in which the user provides a clustering of a relatively small random sample of a data set. The algorithm designer then uses that sample to find a data representation under which k-means clustering of the full data set is aligned with the user's clustering. We provide a formal statistical model for analyzing the sample complexity of learning a clustering representation under this paradigm. We then introduce a notion of capacity for a class of possible representations, in the spirit of the VC-dimension, and show that classes with finite capacity can be learned successfully, with error bounds that depend on the sample size. We end with an analysis of this capacity for classes of representations induced by linear embeddings.
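
The protocol described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm, only a toy instance under stated assumptions: the class of representations is a hypothetical set of three diagonal linear embeddings (`candidates`), the user's clustering is simulated from synthetic ground-truth labels, agreement is measured by the fraction of point pairs on which two clusterings disagree, and k-means is a plain Lloyd iteration with farthest-point initialisation. The sketch picks the candidate whose k-means clustering of the sample best matches the user's clustering, then applies it to the full data set.

```python
import random
from itertools import combinations

def embed(points, weights):
    # Diagonal linear embedding: scale each coordinate by its weight.
    return [tuple(w * x for w, x in zip(weights, p)) for p in points]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    # Index of the nearest center for every point.
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def kmeans(points, k, iters=20):
    # Lloyd's algorithm; farthest-point initialisation is an assumption
    # made here for determinism, not something the source specifies.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iters):
        labels = assign(points, centers)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assign(points, centers)

def pair_disagreement(a, b):
    # Fraction of point pairs grouped together in one clustering but not the other.
    pairs = list(combinations(range(len(a)), 2))
    return sum((a[i] == a[j]) != (b[i] == b[j]) for i, j in pairs) / len(pairs)

def sample_error(weights, points, target, k):
    # Disagreement between k-means under the embedding and a target clustering.
    return pair_disagreement(kmeans(embed(points, weights), k), target)

# Synthetic data (assumption): coordinate 0 carries the cluster structure,
# coordinate 1 is high-variance noise.
rng = random.Random(0)
n, m, k = 200, 20, 2
truth = [i % 2 for i in range(n)]
data = [(t + rng.uniform(-0.2, 0.2), rng.uniform(-10, 10)) for t in truth]

# The user clusters a small random sample (simulated with the true labels).
idx = rng.sample(range(n), m)
sample = [data[i] for i in idx]
user_labels = [truth[i] for i in idx]

# Hypothetical finite class of representations; choose the best on the sample.
candidates = [(1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
best = min(candidates, key=lambda w: sample_error(w, sample, user_labels, k))

full_error = sample_error(best, data, truth, k)
print("chosen weights:", best, "disagreement on full data:", full_error)
```

The pairwise disagreement measure is used here because it does not depend on how cluster labels are numbered. Any other permutation-invariant distance between clusterings could be substituted.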

research
05/21/2020

Computationally efficient sparse clustering

We study statistical and computational limits of clustering when the mea...
research
06/26/2018

Deep k-Means: Jointly Clustering with k-Means and Learning Representations

We study in this paper the problem of jointly clustering and learning re...
research
02/20/2023

Replicable Clustering

In this paper, we design replicable algorithms in the context of statist...
research
01/16/2014

Which Clustering Do You Want? Inducing Your Ideal Clustering with Minimal Feedback

While traditional research on text clustering has largely focused on gro...
research
01/07/2022

Probabilistic spatial clustering based on the Self Discipline Learning (SDL) model of autonomous learning

Unsupervised clustering algorithm can effectively reduce the dimension o...
research
10/27/2021

Provable Lifelong Learning of Representations

In lifelong learning, the tasks (or classes) to be learned arrive sequen...
research
02/02/2021

A Basis Approach to Surface Clustering

This paper presents a novel method for clustering surfaces. The proposal...
