Selection Bias Tracking and Detailed Subset Comparison for High-Dimensional Data

by David Borland, et al.

The collection of large, complex datasets has become common across a wide variety of domains. Visual analytics tools increasingly play a key role in exploring and answering complex questions about these large datasets. However, many visualizations are not designed to concurrently visualize the large number of dimensions present in complex datasets (e.g., tens of thousands of distinct codes in an electronic health record system). This fact, combined with the ability of many visual analytics systems to enable rapid, ad hoc specification of groups, or cohorts, of individuals based on a small subset of visualized dimensions, leads to the possibility of introducing selection bias: when the user creates a cohort based on a specified set of dimensions, differences across many other unseen dimensions may also be introduced. These unintended side effects may result in the cohort no longer being representative of the larger population intended to be studied, which can negatively affect the validity of subsequent analyses. We present techniques for selection bias tracking and visualization that can be incorporated into high-dimensional exploratory visual analytics systems, with a focus on medical data with existing data hierarchies. These techniques include: (1) tree-based cohort provenance and visualization, with a user-specified baseline cohort against which all other cohorts are compared, and a visual encoding of the drift for each cohort, which indicates where selection bias may have occurred, and (2) a set of visualizations, including a novel icicle-plot-based visualization, to compare in detail the per-dimension differences between the baseline and a user-specified focus cohort. These techniques are integrated into a medical temporal event sequence visual analytics tool. We present example use cases and report findings from domain expert user interviews.
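To make the idea of comparing a focus cohort against a baseline more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say how drift or per-dimension differences are computed, so everything here is an assumption for illustration: each individual is represented as a set of codes, each dimension is summarized by its prevalence in the cohort, per-dimension difference is measured with the Hellinger distance between the two prevalences, and drift is taken as the mean of those distances. The function names `prevalence`, `hellinger_bernoulli`, and `compare_cohorts` are hypothetical.

```python
from math import sqrt


def prevalence(cohort, dimensions):
    """Fraction of individuals in the cohort that have each dimension (code)."""
    if not cohort:
        raise ValueError("cohort must contain at least one individual")
    n = len(cohort)
    return {d: sum(d in person for person in cohort) / n for d in dimensions}


def hellinger_bernoulli(p, q):
    """Hellinger distance between Bernoulli(p) and Bernoulli(q), in [0, 1].

    Assumption: the distance measure is chosen for illustration only.
    """
    bc = sqrt(p * q) + sqrt((1 - p) * (1 - q))  # Bhattacharyya coefficient
    return sqrt(max(0.0, 1.0 - bc))  # clamp guards against float round-off


def compare_cohorts(baseline, focus):
    """Return (per-dimension distances, overall drift) for focus vs. baseline.

    Assumption: drift is summarized as the mean per-dimension distance.
    """
    dimensions = sorted(set().union(*baseline, *focus))
    p = prevalence(baseline, dimensions)
    q = prevalence(focus, dimensions)
    per_dim = {d: hellinger_bernoulli(p[d], q[d]) for d in dimensions}
    drift = sum(per_dim.values()) / len(per_dim) if per_dim else 0.0
    return per_dim, drift


# Hypothetical toy data: each individual is a set of codes.
baseline = [{"diabetes", "hypertension"}, {"hypertension"}, {"asthma"}, set()]
# Focus cohort created by filtering on one visible dimension ("hypertension").
focus = [person for person in baseline if "hypertension" in person]

per_dim, drift = compare_cohorts(baseline, focus)
for code, dist in sorted(per_dim.items(), key=lambda kv: -kv[1]):
    print(f"{code}: {dist:.3f}")
print(f"drift: {drift:.3f}")
```

In this toy example the filter on "hypertension" also shifts the prevalence of "asthma" and "diabetes", two dimensions the user never selected on, which is the kind of unintended side effect the abstract describes.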




