Semi-supervised clustering methods

07/01/2013
by Eric Bair, et al.

Cluster analysis methods seek to partition a data set into homogeneous subgroups. They are useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable and nothing is known about the relationships between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as "semi-supervised clustering" methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them are described in detail. A brief description of some other semi-supervised clustering algorithms is also provided.
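To make the first situation above concrete (cluster labels known for some observations), here is a minimal sketch of one way to modify k-means: the labeled observations initialize the centroids and keep their labels throughout, while the unlabeled observations are reassigned as usual. This is an illustration under stated assumptions, not necessarily any specific algorithm from the review. The function name `seeded_kmeans`, the convention of marking unlabeled observations with `-1`, and the requirement that every cluster has at least one labeled observation are all assumptions made for this example.

```python
import numpy as np


def seeded_kmeans(X, seed_labels, k, n_iter=100):
    """Hypothetical sketch of k-means with partially known labels.

    X           : (n, d) array of features.
    seed_labels : length-n integer array; values 0..k-1 mark known cluster
                  labels and -1 marks unlabeled observations (assumed convention).
    k           : number of clusters; each cluster is assumed to have at
                  least one labeled observation.
    """
    X = np.asarray(X, dtype=float)
    seed_labels = np.asarray(seed_labels)
    labeled = seed_labels >= 0

    # Initialize each centroid as the mean of the observations known to be in it.
    centroids = np.stack([X[seed_labels == j].mean(axis=0) for j in range(k)])
    labels = seed_labels.copy()

    for _ in range(n_iter):
        # Squared Euclidean distance from every observation to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        # Observations with known labels are never reassigned.
        new_labels[labeled] = seed_labels[labeled]

        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels

        # Recompute centroids; no cluster is empty because seeds stay fixed.
        centroids = np.stack([X[labels == j].mean(axis=0) for j in range(k)])

    return labels, centroids
```

Calling `seeded_kmeans(X, seed_labels, k=2)` returns the final label vector and the centroid matrix. Keeping the known labels fixed is a design choice; a softer variant could use them only for initialization and allow later reassignment.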


