An improved chromosome formulation for genetic algorithms applied to variable selection with the inclusion of interaction terms

04/22/2016
by Chee Chun Gan, et al.

Genetic algorithms are a well-known method for tackling the problem of variable selection. Because they are non-parametric and can use a large variety of fitness functions, they are well suited as a variable selection wrapper that can be applied to many different models. In almost all cases, the chromosome formulation used in these genetic algorithms is a binary vector of length n, for n potential variables, in which each element indicates the presence or absence of the corresponding variable. While this formulation has exhibited good performance for relatively small n, problems can arise when n grows very large, especially when interaction terms are considered. We introduce a modification to the standard chromosome formulation, the indexed chromosome formulation, that allows for better scalability and model sparsity when interaction terms are included in the predictor search space. Experimental results show that the indexed chromosome formulation improves computational efficiency and sparsity on high-dimensional datasets with interaction terms compared to the standard chromosome formulation.
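To make the scaling issue concrete, the following minimal Python sketch contrasts the standard binary chromosome with one possible indexed encoding. The abstract does not specify how the indexed chromosome is laid out, so everything about `decode_indexed` and `random_indexed` (a short list of indices into the candidate-term list, capped at a hypothetical `max_terms`) is an assumption for illustration and not the authors' implementation. Restricting interactions to pairwise terms is likewise an assumption.

```python
import random
from itertools import combinations


def candidate_terms(n_vars):
    """Main effects plus all pairwise interactions, as tuples of variable indices."""
    main_effects = [(i,) for i in range(n_vars)]
    pairwise = list(combinations(range(n_vars), 2))
    return main_effects + pairwise


def decode_binary(chromosome, terms):
    """Standard formulation: bit k set to 1 means terms[k] is in the model."""
    return [term for bit, term in zip(chromosome, terms) if bit]


def decode_indexed(chromosome, terms):
    """Assumed indexed formulation: each gene is an index into terms.

    Duplicate genes collapse to a single term, so the model has at most
    len(chromosome) terms.
    """
    return [terms[k] for k in sorted(set(chromosome))]


def random_binary(n_terms, rng):
    """Random binary chromosome: one gene per candidate term."""
    return [rng.randint(0, 1) for _ in range(n_terms)]


def random_indexed(n_terms, max_terms, rng):
    """Random indexed chromosome (hypothetical): max_terms distinct indices."""
    return rng.sample(range(n_terms), max_terms)
```

Under these assumptions, n = 100 variables give 100 + 4950 = 5050 candidate terms, so a binary chromosome needs 5050 genes, while an indexed chromosome with `max_terms = 10` needs only 10 genes and can never describe a model with more than 10 terms.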


research
03/23/2020

Large-P Variable Selection in Two-Stage Models

Model selection in the large-P small-N scenario is discussed in the fram...
research
04/22/2016

Developing an ICU scoring system with interaction terms using a genetic algorithm

ICU mortality scoring systems attempt to predict patient mortality using...
research
02/14/2020

A comparison of different types of Niching Genetic Algorithms for variable selection in solar radiation estimation

Variable selection problems generally present more than a single solutio...
research
11/17/2017

Variable selection with genetic algorithms using repeated cross-validation of PLS regression models as fitness measure

Genetic algorithms are a widely used method in chemometrics for extracti...
research
02/26/2018

Scalable kernel-based variable selection with sparsistency

Variable selection is central to high-dimensional data analysis, and var...
research
09/02/2022

EPA Particulate Matter Data – Analyses using Local Control Strategy

Analyses of large observational datasets tend to be complicated and pron...
research
09/23/2013

Data Mining using Unguided Symbolic Regression on a Blast Furnace Dataset

In this paper a data mining approach for variable selection and knowledg...
