Contextual Regression: An Accurate and Conveniently Interpretable Nonlinear Model for Mining Discovery from Scientific Data

10/30/2017
by   Chengyu Liu, et al.

Machine learning algorithms such as linear regression, SVMs and neural networks have played an increasingly important role in the process of scientific discovery. However, none of them is both interpretable and accurate on nonlinear datasets. Here we present contextual regression, a method that combines these two desirable properties using a hybrid architecture of a neural network embedding and a dot product layer. We demonstrate its high prediction accuracy and sensitivity through the task of predictive feature selection on a simulated dataset and through the application of predicting open chromatin sites in the human genome. On the simulated data, our method achieved high-fidelity recovery of feature contributions under random noise levels up to 200%. On the open chromatin dataset, our method not only outperformed the state-of-the-art method in terms of accuracy, but also unveiled two previously undiscovered histone marks related to open chromatin. Our method can fill the gap of accurate and interpretable nonlinear modeling in scientific data mining tasks.
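
The abstract describes the architecture only as a neural network embedding combined with a dot product layer. One plausible reading, sketched here as an illustration and not as the authors' implementation, is that a small network maps each input to a per-sample weight vector, and the prediction is the dot product of that weight vector with the input itself. The layer sizes, the `tanh` activation, and the names `init_params`, `context_weights` and `predict` are all assumptions made for this sketch; training is omitted.

```python
import numpy as np


def init_params(n_features, n_hidden, seed=0):
    """Randomly initialise a one-hidden-layer embedding network (assumed shape)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.1, size=(n_features, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.1, size=(n_hidden, n_features)),
        "b2": np.zeros(n_features),
    }


def context_weights(params, X):
    """Embedding network: map each sample to its own linear weight vector."""
    hidden = np.tanh(X @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]


def predict(params, X):
    """Dot product layer: the prediction is the sum of weight * feature terms."""
    weights = context_weights(params, X)
    contributions = weights * X  # per-sample, per-feature contribution
    return contributions.sum(axis=1), contributions


# Untrained forward pass on random data, only to show the shapes involved.
params = init_params(n_features=5, n_hidden=8)
X = np.random.default_rng(1).normal(size=(4, 5))
y_hat, contributions = predict(params, X)
```

Under this reading, each prediction is locally linear in the input, so the per-feature terms in `contributions` can be read as feature contributions for that sample, while the embedding network lets the weights vary nonlinearly across samples.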


