Balancing the Tradeoff Between Clustering Value and Interpretability

12/17/2019
by Sandhya Saisubramanian, et al.

Graph clustering groups entities – the vertices of a graph – based on their similarity, typically using a complex distance function over a large number of features. Successful integration of clustering approaches into automated decision-support systems hinges on the interpretability of the resulting clusters. This paper addresses the problem of generating interpretable clusters, given features of interest that signify interpretability to an end-user, by optimizing interpretability in addition to common clustering objectives. We propose a β-interpretable clustering algorithm that ensures that at least a β fraction of the nodes in each cluster share the same feature value. The tunable parameter β is user-specified. We also present a more efficient algorithm for the case β=1 and analyze the theoretical guarantees of the two algorithms. Finally, we empirically demonstrate the benefits of our approaches in generating interpretable clusters using four real-world datasets. The interpretability of the clusters is complemented by simple explanations, generated using frequent pattern mining, that denote the feature values of the nodes in each cluster.
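
To make the β condition concrete, the following minimal sketch checks whether a given clustering satisfies it. It is not the paper's algorithm: it only tests the constraint after the fact, it assumes a single categorical feature of interest per node, and the function names `cluster_interpretability` and `is_beta_interpretable` are hypothetical.

```python
from collections import Counter


def cluster_interpretability(feature_values):
    """Return the fraction of nodes in one cluster that share the
    most common value of the feature of interest (hypothetical helper)."""
    if not feature_values:
        raise ValueError("cluster must contain at least one node")
    most_common_count = Counter(feature_values).most_common(1)[0][1]
    return most_common_count / len(feature_values)


def is_beta_interpretable(clusters, beta):
    """Return True if every cluster has at least a beta fraction of
    nodes with the same feature value (assumed reading of the abstract)."""
    return all(cluster_interpretability(c) >= beta for c in clusters)


# Each inner list holds the feature value of every node in one cluster.
clusters = [["red", "red", "red", "blue"], ["blue", "blue"]]
scores = [cluster_interpretability(c) for c in clusters]
```

With these toy clusters, the first has three of four nodes sharing a value and the second is uniform, so the clustering passes for β=0.75 but fails for β=1, the case in which every cluster must be uniform in the feature of interest.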

research
01/30/2023

Optimal Decision Trees For Interpretable Clustering with Constraints

Constrained clustering is a semi-supervised task that employs a limited ...
research
05/24/2021

Deep Descriptive Clustering

Recent work on explainable clustering allows describing clusters when th...
research
10/11/2018

FeatureLego: Volume Exploration Using Exhaustive Clustering of Super-Voxels

We present a volume exploration framework, FeatureLego, that uses a nove...
research
09/15/2023

Automated dermatoscopic pattern discovery by clustering neural network output for human-computer interaction

Background: As available medical image datasets increase in size, it bec...
research
10/22/2019

Genetic Programming for Evolving Similarity Functions for Clustering: Representations and Analysis

Clustering is a difficult and widely-studied data mining task, with many...
research
12/21/2019

Regularized Operating Envelope with Interpretability and Implementability Constraints

Operating envelope is an important concept in industrial operations. Acc...
research
02/16/2022

Spatial Transformer K-Means

K-means defines one of the most employed centroid-based clustering algor...
