A principal feature analysis

01/29/2021
by Tim Breitenbach, et al.

A key task of data science is to identify the relevant features linked to the output variables that are to be modeled or predicted. To obtain a small but meaningful model, it is important to find stochastically independent variables that capture all the information necessary to model or predict the output variables sufficiently well. In this work, we therefore introduce a framework to detect linear and non-linear dependencies between features. As we will show, features that are functions of other features carry no further information. Consequently, a model reduction that neglects such features preserves the relevant information, reduces noise, and thus improves the quality of the model. Furthermore, a smaller model makes it easier to adopt a model of a given system. In addition, the approach structures the dependencies among all the considered features. This provides advantages for classical modeling, ranging from regression to differential equations, and for machine learning. To show the generality and applicability of the presented framework, 2154 features of a data center are measured and a model is set up that classifies the states of the data center as faulty or non-faulty. The framework automatically reduces this number of features to 161. The prediction accuracy of the reduced model even improves compared to the model trained on all features. A second example is the analysis of a gene expression data set in which 9 of 9513 genes are extracted whose expression levels suffice to distinguish two cell clusters of macrophages.
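To illustrate why detecting non-linear dependencies matters, the following minimal sketch contrasts a linear measure (Pearson correlation) with a binned mutual-information estimate on synthetic data. The abstract does not state which dependency test the framework uses, so the mutual-information estimate here is only an illustrative stand-in; all function names, the bin count, and the synthetic data are assumptions made for this example.

```python
import math
import random
from collections import Counter


def equal_width_bins(values, n_bins):
    """Map each value to the index of an equal-width bin (hypothetical helper)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant features
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=8):
    """Plug-in mutual information estimate (in nats) of two binned features."""
    bx, by = equal_width_bins(x, n_bins), equal_width_bins(y, n_bins)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )


def pearson(x, y):
    """Pearson correlation coefficient, which only captures linear dependence."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Synthetic data (an assumption for illustration only).
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
squared = [v * v for v in x]                           # a function of x
noise = [rng.uniform(-1.0, 1.0) for _ in range(5000)]  # independent of x

corr = pearson(x, squared)                     # close to zero
mi_dependent = mutual_information(x, squared)  # clearly positive
mi_independent = mutual_information(x, noise)  # close to zero

print(f"Pearson(x, x^2) = {corr:.3f}")
print(f"MI(x, x^2)      = {mi_dependent:.3f} nats")
print(f"MI(x, noise)    = {mi_independent:.3f} nats")
```

In this toy setting, the squared feature is a function of `x` and therefore carries no further information, yet its linear correlation with `x` is near zero. A purely linear screen would keep it, whereas a measure sensitive to non-linear dependence flags it as redundant.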


