Exploring Multi-dimensional Data via Subset Embedding

04/24/2021
by Peng Xie, et al.

Multi-dimensional data exploration is a classic research topic in visualization. Most existing approaches are designed for identifying record patterns in the dimensional space or its subspaces. In this paper, we propose a visual analytics approach to exploring subset patterns. The core of the approach is a subset embedding network (SEN) that represents a group of subsets as uniformly formatted embeddings. We implement the SEN as multiple subnets with separate loss functions. This design lets the SEN handle arbitrary subsets and capture the similarity of subsets on single features, which enables accurate pattern exploration; in most cases, such exploration amounts to searching for subsets that have similar values on a few features. Moreover, each subnet is a fully connected neural network with one hidden layer, and this simple structure makes training efficient. We integrate the SEN into a visualization system that supports a three-step workflow. Specifically, analysts (1) partition the given dataset into subsets, (2) select portions of a projected latent space created using the SEN, and (3) determine whether patterns exist within the selected subsets. Overall, the system combines visualizations, interactions, automatic methods, and quantitative measures to balance exploration flexibility with operational efficiency and to improve the interpretability and faithfulness of the identified patterns. Case studies and quantitative experiments on multiple open datasets demonstrate the general applicability and effectiveness of our approach.
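
The idea of one small subnet per feature, each with its own loss, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the network inputs or the loss functions, so the per-feature histogram input, the autoencoder-style reconstruction loss, and all names (`feature_histograms`, `train_subnet`, `embed_subsets`) are assumptions made for illustration only.

```python
import numpy as np

def feature_histograms(data, subset_masks, bins=8):
    """Summarise each subset as one normalised histogram per feature (assumed input format)."""
    n_features = data.shape[1]
    out = np.zeros((n_features, len(subset_masks), bins))
    for f in range(n_features):
        # Shared bin edges per feature so that all subsets are comparable.
        edges = np.linspace(data[:, f].min(), data[:, f].max(), bins + 1)
        for s, mask in enumerate(subset_masks):
            counts, _ = np.histogram(data[mask, f], bins=edges)
            out[f, s] = counts / max(counts.sum(), 1)
    return out

def train_subnet(x, hidden=4, lr=0.5, epochs=200, seed=0):
    """Fully connected net with one hidden layer, trained on its own MSE loss (assumed)."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.1, (x.shape[1], hidden))
    w2 = rng.normal(0.0, 0.1, (hidden, x.shape[1]))
    losses = []
    for _ in range(epochs):
        h = np.tanh(x @ w1)              # hidden layer = per-feature embedding
        err = h @ w2 - x                 # reconstruction error
        losses.append(float((err ** 2).mean()))
        grad_y = 2.0 * err / err.size    # gradient of the mean squared error
        grad_w2 = h.T @ grad_y
        grad_w1 = x.T @ ((grad_y @ w2.T) * (1.0 - h ** 2))
        w1 -= lr * grad_w1
        w2 -= lr * grad_w2
    return np.tanh(x @ w1), losses

def embed_subsets(data, subset_masks, bins=8, hidden=4):
    """Train one subnet per feature and concatenate their hidden activations."""
    parts, losses = [], []
    for f, x in enumerate(feature_histograms(data, subset_masks, bins)):
        z, loss = train_subnet(x, hidden=hidden, seed=f)
        parts.append(z)
        losses.append(loss)
    return np.hstack(parts), losses
```

Every subset, whatever its size, ends up as a vector of the same length (`hidden` values per feature), which is one way to read "uniformly formatted embeddings". Because each block of the vector comes from a single feature's subnet, distances restricted to one block reflect similarity on that feature alone.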

research
09/23/2022

Incorporation of Human Knowledge into Data Embeddings to Improve Pattern Significance and Interpretability

Embedding is a common technique for analyzing multi-dimensional data. Ho...
research
11/02/2021

UnProjection: Leveraging Inverse-Projections for Visual Analytics of High-Dimensional Data

Projection techniques are often used to visualize high-dimensional data,...
research
06/14/2019

Confluent-Drawing Parallel Coordinates: Web-Based Interactive Visual Analytics of Large Multi-Dimensional Data

Parallel coordinates plot is one of the most popular and widely used vis...
research
08/15/2022

A Novel Tree Visualization to Guide Interactive Exploration of Multi-dimensional Topological Hierarchies

Understanding the response of an output variable to multi-dimensional in...
research
07/21/2021

Improving Visualization Interpretation Using Counterfactuals

Complex, high-dimensional data is used in a wide range of domains to exp...
research
08/05/2023

Dataopsy: Scalable and Fluid Visual Exploration using Aggregate Query Sculpting

We present aggregate query sculpting (AQS), a faceted visual query techn...
research
03/02/2023

DataPilot: Utilizing Quality and Usage Information for Subset Selection during Visual Data Preparation

Selecting relevant data subsets from large, unfamiliar datasets can be d...
