A Computational Theory and Semi-Supervised Algorithm for Clustering

06/12/2023
by Nassir Mohammad, et al.

A computational theory for clustering and a semi-supervised clustering algorithm are presented. Clustering is defined as obtaining groupings of data such that each group contains no anomalies with respect to a chosen grouping principle and measure; all other examples are considered to be fringe points, isolated anomalies, anomalous clusters or unknown clusters. More precisely, after appropriate modelling under the assumption of uniform random distribution, any example whose expectation of occurrence is <1 with respect to a group is considered an anomaly; otherwise it is assigned membership of that group. Thus, clustering is conceived as the dual of anomaly detection. The data are represented by the Euclidean distance of each point to a cluster median. The median is chosen for its robustness to outliers and its approximate location of centrality, and so that decision boundaries are general purpose. The kernel of the clustering method is Mohammad's anomaly detection algorithm, resulting in a parameter-free, fast, and efficient clustering algorithm. Acknowledging that clustering is an interactive and iterative process, the algorithm relies on a small fraction of known relationships between examples. These relationships serve as seeds to define the user's objectives and guide the clustering process. The algorithm then expands the clusters accordingly, leaving the remaining examples for exploration and subsequent iterations. Results are presented on synthetic and real-world data sets, demonstrating advantages over the most widely used clustering methods.
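
The seed-and-expand loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: Mohammad's anomaly detection algorithm is not specified here, so a simple median-plus-MAD threshold stands in for it, and that stand-in introduces a tuning constant `k` which the paper's parameter-free method does not have. The function names (`coordinate_median`, `mad_threshold`, `expand_clusters`) and the coordinate-wise median are likewise assumptions made for illustration.

```python
import math
from statistics import median


def coordinate_median(points):
    """Coordinate-wise median of a list of equal-length tuples (assumed centre)."""
    return tuple(median(axis) for axis in zip(*points))


def mad_threshold(distances, k=3.0):
    """Stand-in anomaly rule (NOT the paper's detector): median + k * MAD."""
    med = median(distances)
    mad = median(abs(d - med) for d in distances)
    return med + k * mad


def expand_clusters(points, seeds, k=3.0, max_iter=20):
    """Grow seeded clusters; points anomalous to every cluster stay None.

    seeds maps a cluster label to a list of indices into points.
    """
    labels = [None] * len(points)
    for label, indices in seeds.items():
        for i in indices:
            labels[i] = label

    for _ in range(max_iter):
        # Model each cluster by its median and a distance threshold
        # computed from the current members only.
        models = {}
        for label in seeds:
            members = [p for p, l in zip(points, labels) if l == label]
            center = coordinate_median(members)
            dists = [math.dist(p, center) for p in members]
            models[label] = (center, mad_threshold(dists, k))

        changed = False
        for i, p in enumerate(points):
            if labels[i] is not None:
                continue
            # Keep only clusters for which p is not an anomaly, then
            # assign p to the nearest of those.
            candidates = [
                (math.dist(p, center), label)
                for label, (center, limit) in models.items()
                if math.dist(p, center) <= limit
            ]
            if candidates:
                labels[i] = min(candidates)[1]
                changed = True
        if not changed:
            break
    return labels


points = [
    (0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5),            # group near the origin
    (10, 10), (11, 10), (10, 11), (11, 11), (10.5, 10.5),  # group near (10, 10)
    (50, 50),                                              # isolated point
]
seeds = {"A": [0, 1, 2], "B": [5, 6, 7]}  # a few known memberships
labels = expand_clusters(points, seeds)
print(labels)  # the far point (50, 50) stays unassigned (None)
```

In this toy run the clusters grow over two passes from three seeds each, and the isolated point is left unlabelled for later exploration, mirroring the iterative workflow the abstract describes.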

research
11/01/2019

Integrated Clustering and Anomaly Detection (INCAD) for Streaming Data (Revised)

Most current clustering based anomaly detection methods use scoring sche...
research
12/21/2021

Anomaly Clustering: Grouping Images into Coherent Clusters of Anomaly Types

We introduce anomaly clustering, whose goal is to group data into semant...
research
03/23/2021

Anomaly detection using principles of human perception

In the fields of statistics and unsupervised machine learning a fundamen...
research
09/26/2013

Determinantal Clustering Processes - A Nonparametric Bayesian Approach to Kernel Based Semi-Supervised Clustering

Semi-supervised clustering is the task of clustering data points into cl...
research
07/31/2013

Who and Where: People and Location Co-Clustering

In this paper, we consider the clustering problem on images where each i...
research
10/17/2019

Multi-level conformal clustering: A distribution-free technique for clustering and anomaly detection

In this work we present a clustering technique called multi-level confor...
research
02/07/2022

A Least Square Approach to Semi-supervised Local Cluster Extraction

A least square semi-supervised local clustering algorithm based on the i...
