Sirius: A Mutual Information Tool for Exploratory Visualization of Mixed Data

06/09/2021
by Jane L. Adams, et al.
Data scientists across disciplines increasingly need exploratory analysis tools for data sets with many features. Building on graph mining approaches to exploratory analysis of high-dimensional data, we introduce Sirius, a visualization package for exploring feature relationships among mixed data types using mutual information and network backbone sparsification. Visualizations of feature relationships help data scientists find meaningful dependence among features, which can motivate further analysis for feature selection, feature extraction, projection, identification of proxy variables, or insight into temporal variation at the macro scale. Graph mining approaches for feature analysis exist, such as association networks of binary features or correlation networks of quantitative features, but mixed data types present a unique challenge for developing comprehensive feature networks for exploratory analysis. Using an information-theoretic approach, Sirius supports heterogeneous data sets consisting of binary, continuous quantitative, and discrete categorical data types, and provides a user interface for exploring feature pairs with high mutual information scores. We apply a backbone sparsification approach from network theory as a dimensionality reduction technique, which probabilistically trims edges according to the local network context. Sirius is an open source Python package and Django web application for exploratory visualization that can be deployed in data analysis pipelines. The Sirius codebase and example data sets can be found at: https://github.com/compstorylab/sirius
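
To make the two ideas in the abstract concrete, the following is a minimal sketch, not the Sirius implementation. It assumes a plug-in mutual information estimate on discrete values, equal-width binning for continuous features, and the disparity filter as one example of a backbone method that trims edges based on local network context. The abstract does not say which estimator or backbone method Sirius uses, and every function name here (`discretize`, `mutual_information`, `disparity_backbone`) is hypothetical.

```python
import math
from collections import Counter


def discretize(values, bins=4):
    """Equal-width binning of a continuous feature (an assumed, simple choice)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum p(x,y) * log2( p(x,y) / (p(x) p(y)) ) over observed pairs.
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def disparity_backbone(weights, alpha=0.05):
    """Keep an edge if it is significant for at least one endpoint.

    weights maps (u, v) pairs to non-negative edge weights, e.g. MI scores.
    Illustrative disparity filter: for a node of degree k and strength s,
    an edge of weight w has p-value (1 - w / s) ** (k - 1).
    """
    strength, degree = Counter(), Counter()
    for (u, v), w in weights.items():
        for node in (u, v):
            strength[node] += w
            degree[node] += 1
    kept = {}
    for (u, v), w in weights.items():
        for node in (u, v):
            k, s = degree[node], strength[node]
            if k > 1 and s > 0 and (1 - w / s) ** (k - 1) < alpha:
                kept[(u, v)] = w
                break
    return kept


# Toy mixed-type features (made-up values for illustration only).
binary = [0, 0, 1, 1, 0, 0, 1, 1]
category = ["a", "a", "b", "b", "a", "a", "b", "b"]
continuous = discretize([0.1, 0.2, 0.9, 0.8, 0.15, 0.05, 0.95, 0.85], bins=2)
unrelated = [0, 1, 0, 1, 0, 1, 0, 1]

features = {
    "binary": binary,
    "category": category,
    "continuous": continuous,
    "unrelated": unrelated,
}
names = sorted(features)
mi_edges = {
    (a, b): mutual_information(features[a], features[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}
backbone = disparity_backbone(mi_edges, alpha=0.5)
```

In this toy data, `binary`, `category`, and the binned `continuous` feature are perfectly dependent, while `unrelated` is independent of them, so the pairwise scores separate cleanly. With only four nodes the significance threshold `alpha` is set loosely; on a real feature network it would typically be much smaller, and binning choices can change the estimated scores noticeably.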

05/01/2021

Stochastic Mutual Information Gradient Estimation for Dimensionality Reduction Networks

Feature ranking and selection is a widely used approach in various appli...
07/14/2012

Dimension Reduction by Mutual Information Feature Extraction

During the past decades, to study high-dimensional data in a large varie...
06/10/2012

Dimension Reduction by Mutual Information Discriminant Analysis

In the past few decades, researchers have proposed many discriminant ana...
02/23/2022

baker: An R package for Nested Partially-Latent Class Models

This paper describes and illustrates the functionality of the baker R pa...
08/12/2010

Viewpoints: A high-performance high-dimensional exploratory data analysis tool

Scientific data sets continue to increase in both size and complexity. I...
06/24/2019

AMIC: An Adaptive Information Theoretic Method to Identify Multi-Scale Temporal Correlations in Big Time Series Data

Recent development in computing, sensing and crowd-sourced data have res...
06/24/2019

AMIC: An Adaptive Information Theoretic Method to Identify Multi-Scale Temporal Correlations in Big Time Series Data -- Accepted Version

Recent development in computing, sensing and crowd-sourced data have res...