Explaining dimensionality reduction results using Shapley values

Dimensionality reduction (DR) techniques consistently support high-dimensional data analysis across many applications. Beyond the patterns these techniques uncover, interpreting DR results in terms of each feature's contribution to the low-dimensional representation supports new findings during exploratory analysis. Existing approaches for interpreting DR techniques do not explain feature contributions well, because they either focus only on the low-dimensional representation or ignore the relationships among features. This paper presents ClusterShapley to address these problems: it uses Shapley values to generate explanations of DR techniques and interprets their results through a cluster-oriented analysis. ClusterShapley explains how clusters form and what the relationships among them mean, which is useful for exploratory data analysis in various domains. We propose novel visualization techniques to guide the interpretation of feature contributions to cluster formation, and we validate our methodology through case studies on publicly available datasets. The results demonstrate the interpretability and analytical power of our approach in generating insights about pathologies and patients in different conditions from DR results.
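
To make the idea of a per-feature Shapley contribution to a cluster concrete, the following is a minimal sketch and not the ClusterShapley implementation. The abstract does not specify the value function or any approximation scheme, so everything here is an assumption made for illustration: the toy `points` and `labels`, the helper names `shapley_values`, `centroid`, and `make_separation_fn`, and the choice of value function (squared distance between a cluster's centroid and the centroid of all other points, restricted to a feature subset). The Shapley values are computed exactly by enumerating all subsets, which is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(n_features, value_fn):
    """Exact Shapley values by enumerating every feature subset.

    value_fn takes a frozenset of feature indices and returns a number.
    Cost grows as 2**n_features, so this is for small illustrations only.
    """
    phi = [0.0] * n_features
    for j in range(n_features):
        others = [k for k in range(n_features) if k != j]
        for size in range(len(others) + 1):
            # Standard Shapley weight: |S|! (n - |S| - 1)! / n!
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature j when added to subset s
                phi[j] += weight * (value_fn(s | {j}) - value_fn(s))
    return phi


def centroid(rows):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def make_separation_fn(points, labels, cluster):
    """Hypothetical value function (an assumption, not taken from the paper):
    squared distance between the cluster centroid and the centroid of the
    remaining points, using only the features in the given subset."""
    inside = centroid([p for p, l in zip(points, labels) if l == cluster])
    outside = centroid([p for p, l in zip(points, labels) if l != cluster])

    def value_fn(subset):
        return sum((inside[j] - outside[j]) ** 2 for j in subset)

    return value_fn


# Toy data: four points, three features, two clusters (made up for this sketch).
points = [[0.0, 1.0, 5.0], [0.2, 1.1, 5.0], [3.0, 1.0, 5.1], [3.2, 0.9, 4.9]]
labels = [0, 0, 1, 1]

value_fn = make_separation_fn(points, labels, cluster=0)
phi = shapley_values(3, value_fn)

for j, contribution in enumerate(phi):
    print(f"feature {j}: {contribution:.4f}")
```

In this toy setup the first feature receives almost all of the contribution, because it is the only one on which the two clusters differ substantially. The contributions also sum to the value of the full feature set minus the value of the empty set, which is the efficiency property of Shapley values. A realistic value function would typically not be additive across features, and exact enumeration would be replaced by a sampling-based approximation.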

research
05/10/2019

Supporting Analysis of Dimensionality Reduction Results with Contrastive Learning

Dimensionality reduction (DR) is frequently used for analyzing and visua...
research
06/14/2021

HUMAP: Hierarchical Uniform Manifold Approximation and Projection

Dimensionality reduction (DR) techniques help analysts to understand pat...
research
06/20/2021

ExplorerTree: a focus+context exploration approach for 2D embeddings

In exploratory tasks involving high-dimensional datasets, dimensionality...
research
01/26/2021

Contrastive analysis for scatter plot-based representations of dimensionality reduction

Exploring multidimensional datasets is a ubiquitous part of the ones wor...
research
01/07/2022

Unsupervised Machine Learning for Exploratory Data Analysis of Exoplanet Transmission Spectra

Transit spectroscopy is a powerful tool to decode the chemical compositi...
research
08/26/2023

Class-constrained t-SNE: Combining Data Features and Class Probabilities

Data features and class probabilities are two main perspectives when, e....
research
12/28/2019

Measuring group-separability in geometrical space for evaluation of pattern recognition and embedding algorithms

Evaluating data separation in a geometrical space is fundamental for pat...
