VISPUR: Visual Aids for Identifying and Interpreting Spurious Associations in Data-Driven Decisions

07/26/2023
by Xian Teng, et al.

Big data and machine learning tools have jointly empowered humans to make data-driven decisions. However, many of these tools capture empirical associations that may be spurious because of confounding factors and subgroup heterogeneity. The well-known Simpson's paradox is one such phenomenon: aggregated and subgroup-level associations contradict each other, causing cognitive confusion and making it difficult to reach sound interpretations and decisions. Existing tools provide little insight to help humans locate, reason about, and prevent the pitfalls of spurious associations in practice. We propose VISPUR, a visual analytic system that provides a causal analysis framework and a human-centric workflow for tackling spurious associations. The system includes a CONFOUNDER DASHBOARD, which can automatically identify possible confounding factors, and a SUBGROUP VIEWER, which supports the visualization and comparison of diverse subgroup patterns that could lead to a misinterpretation of causality. Additionally, we propose a REASONING STORYBOARD, which uses a flow-based approach to illustrate paradoxical phenomena, and an interactive DECISION DIAGNOSIS panel that helps ensure accountable decision-making. Through an expert interview and a controlled user experiment, our qualitative and quantitative results demonstrate that the proposed "de-paradox" workflow and the visual analytic system are effective in helping users identify and understand spurious associations and make accountable causal decisions.
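
To make the Simpson's paradox mentioned above concrete, here is a minimal sketch in Python. It is not part of VISPUR and does not use data from the paper: the counts are the widely cited kidney-stone treatment figures often used as a textbook illustration, and the function names (`subgroup_rates`, `aggregate_rates`, `is_simpson_reversal`) are hypothetical helpers introduced only for this example. The sketch checks whether one treatment wins in every subgroup yet loses once the subgroups are pooled.

```python
# Textbook kidney-stone illustration of Simpson's paradox.
# Assumption: these counts are an external example, not data from VISPUR.
# counts[treatment][subgroup] = (successes, trials)
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def subgroup_rates(counts):
    """Success rate of each treatment within each subgroup."""
    return {
        treatment: {group: s / n for group, (s, n) in groups.items()}
        for treatment, groups in counts.items()
    }


def aggregate_rates(counts):
    """Success rate of each treatment after pooling all subgroups."""
    pooled = {}
    for treatment, groups in counts.items():
        successes = sum(s for s, _ in groups.values())
        trials = sum(n for _, n in groups.values())
        pooled[treatment] = successes / trials
    return pooled


def is_simpson_reversal(counts, a="A", b="B"):
    """True if one treatment wins in every subgroup but loses in aggregate."""
    sub = subgroup_rates(counts)
    agg = aggregate_rates(counts)
    a_wins_all = all(sub[a][g] > sub[b][g] for g in sub[a])
    b_wins_all = all(sub[a][g] < sub[b][g] for g in sub[a])
    return (a_wins_all and agg[a] < agg[b]) or (b_wins_all and agg[a] > agg[b])


if __name__ == "__main__":
    print("subgroup rates: ", subgroup_rates(counts))
    print("aggregate rates:", aggregate_rates(counts))
    print("reversal found: ", is_simpson_reversal(counts))
```

In this example the subgroup variable (stone size) acts as a confounder: treatment A is given more often to the harder cases, so its pooled rate looks worse even though it performs better within each subgroup. This is the kind of contradiction that subgroup-level inspection is meant to surface.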

research
03/16/2023

ESCAPE: Countering Systematic Errors from Machine's Blind Spots via Interactive Visual Analysis

Classification models learn to generalize the associations between data ...
research
03/01/2018

Challenges and opportunities in visual interpretation of Big Data

We live in a world where data generation is omnipresent. Innovations in ...
research
05/04/2017

A Workflow for Visual Diagnostics of Binary Classifiers using Instance-Level Explanations

Human-in-the-loop data analysis applications necessitate greater transpa...
research
05/31/2022

The Contribution of Lyrics and Acoustics to Collaborative Understanding of Mood

In this work, we study the association between song lyrics and mood thro...
research
07/21/2021

Improving Visualization Interpretation Using Counterfactuals

Complex, high-dimensional data is used in a wide range of domains to exp...
