Union of Intersections (UoI) for Interpretable Data Driven Discovery and Prediction

05/22/2017
by Kristofer E. Bouchard, et al.

The increasing size and complexity of scientific data could dramatically enhance discovery and prediction for basic scientific applications. Realizing this potential, however, requires novel statistical analysis methods that are both interpretable and predictive. We introduce Union of Intersections (UoI), a flexible, modular, and scalable framework for enhanced model selection and estimation. Methods based on UoI perform model selection and model estimation through intersection and union operations, respectively. We show that UoI-based methods achieve low-variance and nearly unbiased estimation of a small number of interpretable features, while maintaining high-quality prediction accuracy. We perform extensive numerical investigation to evaluate a UoI algorithm (UoI_Lasso) on synthetic and real data. In doing so, we demonstrate the extraction of interpretable functional networks from human electrophysiology recordings as well as accurate prediction of phenotypes from genotype-phenotype data with reduced features. We also show (with the UoI_L1Logistic and UoI_CUR variants of the basic framework) improved prediction parsimony for classification and matrix factorization on several benchmark biomedical data sets. These results suggest that methods based on the UoI framework could improve interpretation and prediction in data-driven discovery across scientific fields.
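The two-stage idea (intersect supports for selection, then union or bag estimates) can be illustrated with a minimal sketch of a UoI_Lasso-style procedure. It is written under stated assumptions and is not the authors' implementation. It assumes centered data with no intercept and uses a plain coordinate-descent Lasso. For estimation it uses random train/held-out splits and combines the winning bootstrap estimates by simple averaging, so the final support is the union of the per-bootstrap winners. The function names `lasso_cd` and `uoi_lasso` and all parameter defaults are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=500, tol=1e-6):
    """Coordinate-descent Lasso for (1/2n)||y - Xb||^2 + lam * ||b||_1 (no intercept)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float)  # residual y - Xb with b = 0 (astype returns a copy)
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            if col_sq[j] == 0:
                continue
            old = b[j]
            r += X[:, j] * old  # partial residual that excludes feature j
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            r -= X[:, j] * b[j]  # put the updated contribution back
            max_delta = max(max_delta, abs(b[j] - old))
        if max_delta < tol:
            break
    return b


def uoi_lasso(X, y, lambdas, n_boot_sel=20, n_boot_est=20, frac=0.8, seed=0):
    """Hypothetical UoI-style sketch: intersection for selection, bagged union for estimation."""
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Selection: for each penalty, keep only features that are nonzero in
    # every bootstrap Lasso fit (the intersection of the supports).
    supports = []
    for lam in lambdas:
        support = np.ones(p, dtype=bool)
        for _ in range(n_boot_sel):
            idx = rng.choice(n, size=n, replace=True)
            support &= lasso_cd(X[idx], y[idx], lam) != 0
        supports.append(support)

    # Estimation: on each random split, fit OLS restricted to every candidate
    # support and keep the one with the lowest held-out squared error.
    winners = []
    for _ in range(n_boot_est):
        perm = rng.permutation(n)
        n_train = int(frac * n)
        tr, te = perm[:n_train], perm[n_train:]
        best_err, best_b = np.inf, np.zeros(p)
        for support in supports:
            b = np.zeros(p)
            if support.any():
                coef, *_ = np.linalg.lstsq(X[tr][:, support], y[tr], rcond=None)
                b[support] = coef
            err = np.mean((y[te] - X[te] @ b) ** 2)
            if err < best_err:
                best_err, best_b = err, b
        winners.append(best_b)

    # Averaging the winners yields a final support equal to their union.
    return np.mean(winners, axis=0)
```

In this sketch the intersection step controls false positives, because a spurious feature must survive every resample. The averaging step reduces the variance of the unpenalized OLS estimates on the selected features. On simulated sparse data (for example, 200 samples, 8 features, 3 of them nonzero), this sketch should recover coefficients close to the true values and near-zero values elsewhere.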
