Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps

05/14/2022
by Donald Bertucci, et al.

In this paper, we present DendroMap, a novel approach to interactively exploring large-scale image datasets for machine learning. Machine learning practitioners often explore image datasets by generating a grid of images or by projecting high-dimensional representations of images into 2-D using dimensionality reduction techniques (e.g., t-SNE). However, neither approach scales effectively to large datasets, because the images are poorly organized and interaction is insufficiently supported. To address these challenges, we develop DendroMap by adapting Treemaps, a well-known visualization technique. DendroMap organizes images by extracting hierarchical cluster structures from high-dimensional representations of the images. It enables users to make sense of the overall distribution of a dataset and to interactively zoom into specific areas of interest at multiple levels of abstraction. Our case studies with widely used image datasets for deep learning demonstrate that users can discover insights about datasets and trained models by examining the diversity of images, identifying underperforming subgroups, and analyzing classification errors. We also conducted a user study that evaluated the effectiveness of DendroMap in grouping and searching tasks by comparing it with a gridified version of t-SNE, and found that participants preferred DendroMap over the compared method.
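To make the core idea concrete, the following is a minimal sketch of how hierarchical clusters extracted from image representations could be laid out as a treemap. It is not the authors' implementation: the function names `build_hierarchy` and `layout_treemap`, the choice of Ward linkage, the simple slice-and-dice split, and the random vectors standing in for image embeddings are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree


def build_hierarchy(embeddings):
    """Agglomerative clustering over image embeddings (Ward linkage is an assumption)."""
    merges = linkage(embeddings, method="ward")
    return to_tree(merges)  # root node of a binary cluster tree


def layout_treemap(node, x, y, w, h, depth=0, max_depth=3, out=None):
    """Hypothetical slice-and-dice layout: split each rectangle between the two
    child clusters in proportion to how many images each child contains."""
    if out is None:
        out = []
    if node.is_leaf() or depth == max_depth:
        # One visible cell: its rectangle plus the indices of the images it holds.
        out.append({"rect": (x, y, w, h), "images": node.pre_order()})
        return out
    left, right = node.get_left(), node.get_right()
    frac = left.get_count() / node.get_count()
    if w >= h:  # cut across the longer side to keep cells closer to square
        layout_treemap(left, x, y, w * frac, h, depth + 1, max_depth, out)
        layout_treemap(right, x + w * frac, y, w * (1 - frac), h, depth + 1, max_depth, out)
    else:
        layout_treemap(left, x, y, w, h * frac, depth + 1, max_depth, out)
        layout_treemap(right, x, y + h * frac, w, h * (1 - frac), depth + 1, max_depth, out)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(40, 8))  # stand-in for real image embeddings
    root = build_hierarchy(embeddings)
    for cell in layout_treemap(root, 0.0, 0.0, 1.0, 1.0, max_depth=3):
        print(cell["rect"], len(cell["images"]))
```

In this sketch, raising `max_depth` splits the same unit square into finer clusters, which loosely mirrors zooming into an area of interest at a lower level of abstraction.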


