Sirius: A Mutual Information Tool for Exploratory Visualization of Mixed Data

06/09/2021
by Jane L. Adams, et al.

Data scientists across disciplines increasingly need exploratory analysis tools for data sets with a large number of features. We expand upon graph mining approaches for exploratory analysis of high-dimensional data to introduce Sirius, a visualization package for researchers to explore feature relationships among mixed data types using mutual information and network backbone sparsification. Visualizations of feature relationships help data scientists find meaningful dependence among features, which can prompt further analysis for feature selection, feature extraction, projection, identification of proxy variables, or insight into temporal variation at the macro scale. Graph mining approaches for feature analysis exist, such as association networks of binary features or correlation networks of quantitative features, but mixed data types present a unique challenge for developing comprehensive feature networks for exploratory analysis. Using an information-theoretic approach, Sirius supports heterogeneous data sets consisting of binary, continuous quantitative, and discrete categorical data types, and provides a user interface for exploring feature pairs with high mutual information scores. We leverage a backbone sparsification approach from network theory as a dimensionality reduction technique, which probabilistically trims edges according to the local network context. Sirius is an open source Python package and Django web application for exploratory visualization, which can be deployed in data analysis pipelines. The Sirius codebase and example data sets can be found at: https://github.com/compstorylab/sirius
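To make the pipeline described above more concrete, the following is a minimal sketch, not the Sirius implementation. It assumes continuous features are first discretized by equal-width binning, that mutual information is estimated from empirical counts, and that the backbone step is a disparity-filter-style test (one well-known backbone method that trims edges relative to each node's local weights). All function names and the `alpha` threshold are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def bin_continuous(values, n_bins=4):
    """Assumed preprocessing: equal-width bins turn a continuous feature discrete."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant column
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mi_edges(columns):
    """Map every pair of (already discrete) named columns to its MI score."""
    return {
        (a, b): mutual_information(columns[a], columns[b])
        for a, b in combinations(sorted(columns), 2)
    }


def disparity_backbone(edges, alpha=0.05):
    """Keep an edge if it is unexpectedly heavy for at least one endpoint.

    For a node with degree k and total weight s, an edge of weight w gets the
    score (1 - w/s) ** (k - 1); the edge is kept when that score is below alpha.
    """
    strength, degree = Counter(), Counter()
    for (a, b), w in edges.items():
        for node in (a, b):
            strength[node] += w
            degree[node] += 1
    kept = {}
    for (a, b), w in edges.items():
        for node in (a, b):
            if strength[node] <= 0:
                continue  # node carries no weight, so nothing to test
            p = w / strength[node]
            if (1 - p) ** (degree[node] - 1) < alpha:
                kept[(a, b)] = w
                break
    return kept
```

In this sketch, binary and categorical columns are passed to `mi_edges` unchanged, while continuous columns go through `bin_continuous` first; the resulting weighted edge dictionary is then handed to `disparity_backbone`. Binning is only one possible way to estimate mutual information for continuous data, and the choice of estimator affects the scores.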


