A Survey and Implementation of Performance Metrics for Self-Organized Maps

11/11/2020
by Florent Forest, et al.

Self-Organizing Map algorithms have been used for almost 40 years across various application domains such as biology, geology, healthcare, industry and humanities as an interpretable tool to explore, cluster and visualize high-dimensional data sets. In every application, practitioners need to know whether they can trust the resulting mapping, and must perform model selection to tune algorithm parameters (e.g. the map size). Quantitative evaluation of self-organizing maps (SOM) is a subset of clustering validation, which is a challenging problem in itself. Clustering model selection is typically achieved by using clustering validity indices. While these indices also apply to self-organized clustering models, they ignore the topology of the map and only answer the question: do the SOM code vectors approximate the data distribution well? Evaluating SOM models brings the additional challenge of assessing their topology: does the mapping preserve neighborhood relationships between the map and the original data? The problem of assessing the performance of SOM models has already been tackled quite thoroughly in the literature, giving rise to a family of quality indices that incorporate neighborhood constraints, called topographic indices. Commonly used examples of such metrics are the topographic error, neighborhood preservation and the topographic product. However, open-source implementations are almost impossible to find. This is the issue we try to solve in this work: after a survey of existing SOM performance metrics, we implemented them in Python using widely used numerical libraries, and provide them as an open-source library, SOMperf. This paper introduces each metric available in our module along with usage examples.
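
To make the two questions above concrete, the following is a minimal NumPy sketch of one metric of each kind: the quantization error (how well the code vectors approximate the data) and the topographic error (how often the best and second-best matching units of a sample are not neighbors on the map). It is an illustration only and does not reproduce the SOMperf API. The function names, the row-major layout of the codebook, the rectangular grid with 4-neighborhood, and the use of the unsquared Euclidean distance are all assumptions made for this example.

```python
import numpy as np


def pairwise_sq_dists(data, codebook):
    """Squared Euclidean distance between every sample and every code vector."""
    diff = data[:, None, :] - codebook[None, :, :]
    return np.sum(diff ** 2, axis=2)


def quantization_error(data, codebook):
    """Mean Euclidean distance from each sample to its best-matching unit.

    Assumption: unsquared distance; some definitions use the squared one.
    """
    dists = np.sqrt(pairwise_sq_dists(data, codebook))
    return float(dists.min(axis=1).mean())


def topographic_error(data, codebook, grid_shape):
    """Fraction of samples whose two best-matching units are not adjacent.

    Assumptions: codebook rows are the map units in row-major order on a
    rectangular grid of shape (height, width), and two units are adjacent
    when their Manhattan distance on the grid equals 1.
    """
    order = np.argsort(pairwise_sq_dists(data, codebook), axis=1)
    bmu1, bmu2 = order[:, 0], order[:, 1]
    rows1, cols1 = np.unravel_index(bmu1, grid_shape)
    rows2, cols2 = np.unravel_index(bmu2, grid_shape)
    grid_dist = np.abs(rows1 - rows2) + np.abs(cols1 - cols2)
    return float(np.mean(grid_dist > 1))


# Toy 1 x 3 map over one-dimensional data.
data = np.array([[0.1], [1.9]])
ordered = np.array([[0.0], [1.0], [2.0]])    # neighbors on the map are neighbors in data space
scrambled = np.array([[0.0], [2.0], [1.0]])  # same code vectors, topology broken

qe_ordered = quantization_error(data, ordered)
qe_scrambled = quantization_error(data, scrambled)
te_ordered = topographic_error(data, ordered, (1, 3))
te_scrambled = topographic_error(data, scrambled, (1, 3))
```

Both toy maps contain the same code vectors, so their quantization errors are equal, yet only the ordered map preserves neighborhoods. This is the gap that a topology-blind clustering index cannot see and that topographic indices are meant to expose.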


research
07/10/2020

SacreROUGE: An Open-Source Library for Using and Developing Summarization Evaluation Metrics

We present SacreROUGE, an open-source library for using and developing s...
research
11/14/2006

Advances in Self Organising Maps

The Self-Organizing Map (SOM) with its related extensions is the most po...
research
01/08/2018

Online Cluster Validity Indices for Streaming Data

Cluster analysis is used to explore structure in unlabeled data sets in ...
research
05/23/2023

Clustering Indices based Automatic Classification Model Selection

Classification model selection is a process of identifying a suitable mo...
research
10/09/2016

A new selection strategy for selective cluster ensemble based on Diversity and Independency

This research introduces a new strategy in cluster ensemble selection by...
research
12/18/2013

SOMz: photometric redshift PDFs with self organizing maps and random atlas

In this paper we explore the applicability of the unsupervised machine l...
