Penalized K-Nearest-Neighbor-Graph Based Metrics for Clustering

by Ariel E. Baya, et al.

A difficult problem in clustering is how to handle data with a manifold structure, i.e., data that does not form compact clouds of points but instead traces arbitrary shapes or paths embedded in a high-dimensional space. In this work we introduce the Penalized k-Nearest-Neighbor-Graph (PKNNG) based metric, a new tool for evaluating distances in such cases. The new metric can be used in combination with most clustering algorithms. The PKNNG metric is based on a two-step procedure: first, it constructs the k-Nearest-Neighbor-Graph of the dataset of interest using a low value of k; second, it adds edges with an exponentially penalized weight to connect the sub-graphs produced by the first step. We discuss several possible schemes for connecting the different sub-graphs. We use three artificial datasets in four different embedding situations to evaluate the behavior of the new metric, including a comparison among different clustering methods. We also evaluate the new metric in a real-world application, clustering the MNIST digits dataset. In all cases the PKNNG metric shows promising clustering results.
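The two-step procedure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not give the exact penalty formula or the connection scheme, so several details here are assumptions: the function name `pknng_distances`, the penalty `d * exp(d / mu)` with `mu` taken as the mean edge weight of the k-NN graph, and the scheme that joins every pair of sub-graphs through their closest pair of points. The final distance is taken as the shortest-path length in the resulting graph.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components, shortest_path
from scipy.spatial.distance import cdist


def pknng_distances(X, k=3):
    """Sketch of a penalized k-NN-graph distance (details are assumptions)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    D = cdist(X, X)  # pairwise Euclidean distances

    # Step 1: symmetric k-NN graph with a low k.
    # Assumes no duplicate points, so column 0 of the argsort is the point itself
    # and every stored edge weight is strictly positive.
    W = np.zeros((n, n))
    nn = np.argsort(D, axis=1)[:, 1:k + 1]
    rows = np.repeat(np.arange(n), k)
    cols = nn.ravel()
    W[rows, cols] = D[rows, cols]
    W[cols, rows] = D[rows, cols]

    # Sub-graphs produced by the first step.
    n_comp, labels = connected_components(csr_matrix(W), directed=False)

    # Step 2: connect sub-graphs with exponentially penalized edges.
    # Assumed scheme: link each pair of sub-graphs through their closest points.
    mu = W[W > 0].mean()  # assumed scale: mean edge weight of the k-NN graph
    for a in range(n_comp):
        ia = np.flatnonzero(labels == a)
        for b in range(a + 1, n_comp):
            ib = np.flatnonzero(labels == b)
            sub = D[np.ix_(ia, ib)]
            i, j = np.unravel_index(sub.argmin(), sub.shape)
            d = sub[i, j]
            W[ia[i], ib[j]] = W[ib[j], ia[i]] = d * np.exp(d / mu)

    # Distance between two points: shortest path in the penalized graph.
    return shortest_path(csr_matrix(W), method="D", directed=False)
```

The returned matrix could then be passed to any clustering algorithm that accepts precomputed pairwise distances. With widely separated groups the exponential penalty grows very quickly, so a practical version may need to cap or rescale it.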




