JigSaw: A tool for discovering explanatory high-order interactions from random forests

05/09/2020
by Demetrius DiMucci, et al.

Machine learning is revolutionizing biology by facilitating the prediction of outcomes from complex patterns found in massive data sets. Large biological data sets, like those generated by transcriptome or microbiome studies, measure many relevant components that interact with one another in vivo in modular ways. Identifying the high-order interactions that machine learning models use to make predictions would facilitate the development of hypotheses linking combinations of measured components to outcomes. A new algorithmic approach, termed JigSaw, was developed that uses the structure of random forests to aid in the discovery of patterns that could explain predictions made by the forest. By examining the patterns within individual decision trees, JigSaw identifies high-order interactions between measured features that are strongly associated with a particular outcome, along with the relevant decision thresholds. JigSaw's effectiveness was tested in simulation studies, where it recovered multiple ground-truth patterns, even in the presence of significant noise. It was then used to find patterns associated with outcomes in two real-world data sets: first, patterns of clinical measurements associated with heart disease, and second, patterns of metabolites measured in the blood associated with breast cancer. In heart disease, JigSaw identified several three-way interactions that combine to explain most of the heart disease records (66%). In breast cancer, three two-way interactions were recovered that can be combined to explain almost all records (92%). JigSaw is a method for exploring high-dimensional feature spaces for rules that explain statistical associations with a given outcome, and it can inspire the generation of testable hypotheses.
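The abstract describes JigSaw only at a high level: it mines the decision trees of a random forest for combinations of features, and their thresholds, that are tied to one outcome. The Python sketch below illustrates that general idea on simulated data and is not the authors' implementation. It assumes scikit-learn; the helpers `leaf_rules`, `mine_patterns` and `coverage`, the purity and sample-size cut-offs, and the three-way ground truth on features 0, 3 and 6 are all hypothetical choices made for illustration.

```python
# Hypothetical sketch of mining high-order interactions from random-forest trees.
# This is NOT the JigSaw implementation; helper names and cut-offs are assumptions.
from collections import Counter, defaultdict

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def leaf_rules(estimator, target_index, min_purity=0.95, min_samples=20):
    """Collect root-to-leaf conditions for leaves that strongly favor one class.

    Each condition is (feature, direction, threshold); direction is "<=" for the
    left branch and ">" for the right branch, as in scikit-learn trees.
    """
    tree = estimator.tree_
    rules = []
    stack = [(0, [])]  # (node id, conditions accumulated from the root)
    while stack:
        node, conds = stack.pop()
        left, right = tree.children_left[node], tree.children_right[node]
        if left == right:  # leaf: both child ids are -1
            dist = tree.value[node][0]  # class counts or fractions, depending on version
            purity = dist[target_index] / dist.sum()
            if purity >= min_purity and tree.n_node_samples[node] >= min_samples:
                rules.append(conds)
            continue
        f, t = int(tree.feature[node]), float(tree.threshold[node])
        stack.append((left, conds + [(f, "<=", t)]))
        stack.append((right, conds + [(f, ">", t)]))
    return rules


def mine_patterns(forest, target_class=1, **kwargs):
    """Count (feature, direction) combinations across all trees and pool their thresholds."""
    target_index = list(forest.classes_).index(target_class)
    counts = Counter()
    thresholds = defaultdict(list)
    for est in forest.estimators_:
        for conds in leaf_rules(est, target_index, **kwargs):
            # Keep only the tightest bound when a feature is split twice on one path.
            tightest = {}
            for f, d, t in conds:
                key = (f, d)
                if key not in tightest:
                    tightest[key] = t
                elif d == ">":
                    tightest[key] = max(tightest[key], t)
                else:
                    tightest[key] = min(tightest[key], t)
            pattern = frozenset(tightest)
            counts[pattern] += 1
            for key, t in tightest.items():
                thresholds[(pattern, key)].append(t)
    return counts, thresholds


def coverage(X, y, rule, target_class=1):
    """Fraction of records with the target outcome that satisfy every condition in `rule`."""
    mask = np.ones(len(X), dtype=bool)
    for (f, d), t in rule.items():
        mask &= (X[:, f] > t) if d == ">" else (X[:, f] <= t)
    return mask[y == target_class].mean()


# Simulated data with a hypothetical three-way ground-truth interaction.
rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 10))
y = ((X[:, 0] > 0.5) & (X[:, 3] > 0.5) & (X[:, 6] <= 0.5)).astype(int)

forest = RandomForestClassifier(n_estimators=200, max_depth=3, random_state=0).fit(X, y)
counts, thresholds = mine_patterns(forest)
top_pattern, n_paths = counts.most_common(1)[0]
rule = {key: float(np.median(thresholds[(top_pattern, key)])) for key in top_pattern}
print(n_paths, sorted(rule.items()), coverage(X, y, rule))
```

Keying each pattern on (feature, direction) rather than on exact thresholds lets paths from different trees be pooled, and the median of the per-path thresholds gives one decision threshold per feature. The `coverage` step is a rough analogue of asking what fraction of outcome records a rule explains; combining several patterns would amount to taking the union of their masks.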


