Classes are not Clusters: Improving Label-based Evaluation of Dimensionality Reduction

08/01/2023
by Hyeon Jeon, et al.

A common way to evaluate the reliability of dimensionality reduction (DR) embeddings is to quantify how well labeled classes form compact, mutually separated clusters in the embeddings. This approach rests on the assumption that the classes form clear clusters in the original high-dimensional space. In reality, this assumption can be violated: a single class can be fragmented into multiple separated clusters, and multiple classes can be merged into a single cluster. We thus cannot always ensure the credibility of an evaluation that uses class labels. In this paper, we introduce two novel quality measures – Label-Trustworthiness and Label-Continuity (Label-T&C) – that advance the process of DR evaluation based on class labels. Instead of assuming that classes are well clustered in the original space, Label-T&C work by (1) estimating the extent to which classes form clusters in the original and embedded spaces and (2) evaluating the difference between the two. A quantitative evaluation showed that Label-T&C outperform widely used DR evaluation measures (e.g., Trustworthiness and Continuity, Kullback-Leibler divergence) in accuracy when assessing how well DR embeddings preserve the cluster structure, and that they are also scalable. Moreover, we present case studies demonstrating that Label-T&C can be successfully used to reveal the intrinsic characteristics of DR techniques and their hyperparameters.
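The two-step idea in the abstract (score how well classes form clusters in each space, then compare the scores) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the clustering-quality estimate, so `pair_separability` below is a hypothetical stand-in (centroid gap relative to within-class spread), and `label_structure_change` is an illustrative name. The stand-in score is scale-invariant, which matters because the original and embedded spaces generally have different scales.

```python
from itertools import combinations
from math import dist


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def pair_separability(a, b):
    """Hypothetical stand-in score in [0, 1): centroid gap divided by
    (gap + mean distance of points to their own class centroid)."""
    ca, cb = centroid(a), centroid(b)
    gap = dist(ca, cb)
    spread = (sum(dist(p, ca) for p in a) + sum(dist(p, cb) for p in b)) / (len(a) + len(b))
    return gap / (gap + spread) if gap + spread > 0 else 0.0


def label_structure_change(original, embedding, labels):
    """For every class pair, return (embedded score - original score).
    Positive: the pair looks more separated in the embedding than it was.
    Negative: the pair looks less separated than it was."""
    def group(points, c):
        return [p for p, lab in zip(points, labels) if lab == c]

    changes = {}
    for c1, c2 in combinations(sorted(set(labels)), 2):
        s_orig = pair_separability(group(original, c1), group(original, c2))
        s_emb = pair_separability(group(embedding, c1), group(embedding, c2))
        changes[(c1, c2)] = s_emb - s_orig
    return changes


# Toy data (made up for illustration): two classes in 3-D, and a 2-D
# "embedding" that pushes them much further apart than they originally were.
labels = [0, 1, 0, 1]
original = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
exaggerated = [(0, 0), (10, 0), (0, 1), (10, 1)]

same_space = label_structure_change(original, original, labels)
stretched = label_structure_change(original, exaggerated, labels)
```

Under these assumptions, an embedding that scores well on a conventional label-based measure (classes far apart) would still be flagged here when the classes were not that well separated to begin with, which is the failure mode the abstract describes.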


research
10/01/2021

Visual Cluster Separation Using High-Dimensional Sharpened Dimensionality Reduction

Applying dimensionality reduction (DR) to large, high-dimensional data s...
research
08/01/2023

ZADU: A Python Library for Evaluating the Reliability of Dimensionality Reduction Embeddings

Dimensionality reduction (DR) techniques inherently distort the original...
research
05/10/2019

Supporting Analysis of Dimensionality Reduction Results with Contrastive Learning

Dimensionality reduction (DR) is frequently used for analyzing and visua...
research
07/16/2021

Measuring and Explaining the Inter-Cluster Reliability of Multidimensional Projections

We propose Steadiness and Cohesiveness, two novel metrics to measure the...
research
01/15/2021

Multi-point dimensionality reduction to improve projection layout reliability

In ordinary Dimensionality Reduction (DR), each data instance in an m-di...
research
12/28/2019

Measuring group-separability in geometrical space for evaluation of pattern recognition and embedding algorithms

Evaluating data separation in a geometrical space is fundamental for pat...
research
08/26/2023

Class-constrained t-SNE: Combining Data Features and Class Probabilities

Data features and class probabilities are two main perspectives when, e....
