Clustrophile 2: Guided Visual Clustering Analysis

04/09/2018
by Marco Cavallo, et al.

Data clustering is a common unsupervised learning method frequently used in exploratory data analysis. However, identifying relevant structures in unlabeled, high-dimensional data is nontrivial, requiring iterative experimentation with clustering parameters as well as with data features and instances. The space of possible clusterings for a typical dataset is vast, and navigating it is challenging. The absence of ground-truth labels makes it impossible to define an optimal solution, so user judgment is required to establish what counts as a satisfactory clustering result. Data scientists need adequate interactive tools to effectively explore and navigate the large space of clusterings and thereby improve the effectiveness of exploratory clustering analysis. We introduce Clustrophile 2, a new interactive tool for guided clustering analysis. Clustrophile 2 guides users in clustering-based exploratory analysis, adapts to user feedback to improve its guidance, facilitates the interpretation of clusters, and helps users quickly reason about differences between clusterings. To this end, Clustrophile 2 contributes a novel feature, the clustering tour, which helps users choose clustering parameters and assess the quality of different clustering results in relation to their current analysis goals and expectations. We evaluate Clustrophile 2 through a user study with 12 data scientists, who used our tool to explore and interpret sub-cohorts in a dataset of Parkinson's disease patients. Results suggest that Clustrophile 2 improves the speed and effectiveness of exploratory clustering analysis for both experts and non-experts.
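The abstract does not specify how the clustering tour enumerates or scores candidate clusterings. The sketch below is only a minimal illustration of the general idea of iterating over clustering parameters and ranking the results by a quality measure. It assumes k-means (with deterministic farthest-first initialization) as the clustering algorithm, the number of clusters k as the only parameter, and the silhouette coefficient as the quality score; these choices and all function names (`kmeans`, `silhouette`, `clustering_tour`) are hypothetical and are not Clustrophile 2's actual method.

```python
import math
import random


def kmeans(points, k, iters=100):
    """Lloyd's k-means with farthest-first initialization (deterministic)."""
    # Farthest-first init: start at the first point, then repeatedly add the
    # point whose distance to its nearest chosen centroid is largest.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0  # silhouette is undefined for a single cluster
    total = 0.0
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            continue
        # a: mean distance to the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / len(own)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def clustering_tour(points, ks):
    """Cluster once per candidate k and rank results by silhouette (best first)."""
    results = [(k, silhouette(points, kmeans(points, k))) for k in ks]
    return sorted(results, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    # Synthetic data: three well-separated 2-D Gaussian blobs.
    rng = random.Random(42)
    centers = [(0, 0), (10, 0), (0, 10)]
    data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)) for cx, cy in centers for _ in range(30)]
    for k, score in clustering_tour(data, range(2, 6)):
        print(f"k={k}: silhouette={score:.3f}")
```

A single automatic score like this cannot replace the user judgment the abstract emphasizes; in a guided tool, such a ranking would at most suggest which candidate clusterings to inspect first.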



research
09/29/2017

Foresight: Rapid Data Exploration Through Guideposts

Current tools for exploratory data analysis (EDA) require users to manua...
research
07/12/2017

Foresight: Recommending Visual Insights

Current tools for exploratory data analysis (EDA) require users to manua...
research
07/29/2021

Interactive Region-of-Interest Discovery using Exploratory Feedback

In this paper, we propose a geospatial data management framework called ...
research
10/05/2017

Clustrophile: A Tool for Visual Clustering Analysis

While clustering is one of the most popular methods for data mining, ana...
research
05/06/2020

Integrating Prior Knowledge in Mixed Initiative Social Network Clustering

We propose a new paradigm—called PK-clustering—to help social scientists...
research
11/03/2019

Geono-Cluster: Interactive Visual Cluster Analysis for Biologists

Biologists often perform clustering analysis to derive meaningful patter...
research
09/21/2020

Interactive Steering of Hierarchical Clustering

Hierarchical clustering is an important technique to organize big data f...
