1. Introduction
When traditional genetic programming (GP) is applied to classification or regression, individual programs assume the roles of feature selection, transformation, and model prediction, and are evaluated for their ability to make accurate predictions. The flexibility of evolving both the structure and the parameters of a model comes with a heavy computational cost, which can be mitigated by instead using a fast (e.g. polynomial-time) machine learning (ML) method to optimize the parameters of a GP model with respect to an objective function (for example, least-squares error minimization with linear regression). With this in mind, many variants of GP have been proposed that embed linear regression and/or local search in each program, leading to better models
(Iba and Sato, 1994; Kommenda et al., 2013; Arnaldo et al., 2014; La Cava et al., 2015). The high-level takeaway from the success of these hybrid methods is that it is best to focus the computational effort of GP on the parts of the modeling process that are known to be NP-hard, namely feature selection (Foster et al., 2015) and feature construction (Krawiec, 2002). The task of feature construction, also known as feature engineering or representation learning, is well motivated, since the central factor affecting the quality of a model derived from ML is the ability of the data representation to facilitate learning (Bengio et al., 2013). This paper focuses on the supervised classification task, for which the goal is to find a mapping ŷ(x) that associates the vector of attributes x ∈ ℝᵈ with class labels from the set 𝒦 = {1, …, K} using paired examples {(xᵢ, yᵢ)}. The goal of feature engineering is to find a new representation of x via a k-dimensional feature mapping Φ(x): ℝᵈ → ℝᵏ, such that a classifier ŷ(Φ(x)) more accurately classifies samples than ŷ(x). GP-based approaches to representation learning include evolving single features for decision trees (DT)
(Muharram and Smith, 2005), or coupling an ML model with each program (Krawiec, 2002; Silva et al., 2015; Žegklitz and Pošík, 2017). Recent work (De Melo, 2014; Arnaldo et al., 2015) has advocated what we refer to as an “ensemble” approach, which treats the entire GP population as the representation Φ, with each program representing a single transformation of the form φ(x): ℝᵈ → ℝ. These methods feed the population output into a linear regression model to make predictions. The ML-specific nature of these previous approaches motivates our development of the more general feature engineering wrapper (FEW)¹, a wrapper-based ensemble method of feature engineering with GP (La Cava and Moore, 2017). Unlike previous approaches, FEW allows any learning algorithm in scikit-learn format (Pedregosa et al., 2011) to be used for estimation. FEW has been demonstrated for regression with several ML pairings, including Lasso (Tibshirani, 1996), linear and nonlinear support vector regression, DT, and k-nearest neighbors (KNN). Central to its ability to evolve features in a single population is the introduction of ε-lexicase survival, which produces uncorrelated population behavior.

The wrapper-based ensemble approach to GP is understudied and presents new challenges from an evolutionary computation standpoint, namely the need for individuals in the population to complement each other in facilitating the learning of the ML method with which they are paired. Our goal in this paper is to use FEW as a test bed for evaluating several survival and fitness techniques in this new framework for supervised classification. In addition, whereas FEW was previously demonstrated in side-by-side comparisons with default ML methods, here we analyze more robustly whether FEW can, in general, produce better models than existing ML techniques when hyperparameter optimization of every method is considered.

¹Available from https://lacava.github.io/few and via the Python Package Index: https://pypi.python.org/pypi/FEW
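To make the feature engineering goal concrete, the following minimal sketch (not from the paper; the XOR dataset and the engineered feature are illustrative choices) shows how a representation Φ(x) can let a simple classifier succeed where the raw attributes x fail:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(400, 2)).astype(float)   # two binary attributes
y = np.logical_xor(X[:, 0], X[:, 1]).astype(int)      # XOR labels: not linearly separable

# y-hat(x): logistic regression on the raw representation performs near chance
raw = LogisticRegression().fit(X, y).score(X, y)

# y-hat(Phi(x)): a single engineered XOR feature makes the problem trivial
phi = np.logical_xor(X[:, 0], X[:, 1]).astype(float).reshape(-1, 1)
engineered = LogisticRegression().fit(phi, y).score(phi, y)

assert engineered == 1.0 and engineered > raw
```

The same effect appears later in the paper's GAMETES example, where an XOR-like epistatic interaction must be constructed before any of the paired ML methods can classify accurately.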
This paper contains four main contributions. First, it presents a much-needed analysis of fitness and survival methods for ensemble-based representation learning with GP. Second, it focuses on the classification task, which has not been the focus of previous methods in this GP framework. Third, it presents robust comparisons of FEW to other ML methods, including a previously proposed GP method that also focuses on feature learning. As a final contribution, we analyze a biomedical problem for which FEW correctly identifies the nonlinear underlying structure of the data across ML pairings, thereby showing the usefulness of learning readable data representations.
We pair FEW with several well-known classifiers in our analysis: logistic regression (LR), support vector classification (SVC), KNN, DT, and random forests (RF). We present an overview of FEW in Section
2, including a description of the fitness and survival methods that are tested. We review related work more thoroughly in Section 3, distinguishing between wrapper and filter approaches as well as single, multiple, and ensemble representations of features in GP. The experimental setup is described in Section 4. The results of the experiments on FEW and its comparisons to other methods are shown in Section 5, with discussion and conclusions following in Section 6.

2. Methods
The components of FEW are summarized in Figure 1. The learning process begins by fitting the ML method to the original data. FEW maintains an internal validation set to evaluate new models, which guarantees that the returned model will have a cross-validation (CV) fitness at least as good as the initial data representation can produce. FEW then initializes a population of feature transformations, Φ = {φ₁, …, φₖ}, seeded with those features from the initial ML model that have nonzero coefficients. Each generation, a new ML model is trained on the transformed data Φ(x) to produce ŷ(Φ(x)).
The selection step of FEW is the entry point for new information from the ML method about the quality of the current representation. Methods that admit regularization (available in the scikit-learn implementations of LR and SVC) or feature importance scores (DT and RF) apply selective pressure to the GP population by eliminating any individuals with a corresponding coefficient or feature importance of zero in the ML model. Feature importance for DT and RF is measured using the Gini importance (Breiman and Cutler, 2003). Thus ML and GP share the feature selection role. After selection, the remaining individuals (Figure 1) are used to produce offspring via subtree crossover and point mutation. In this way FEW differs from previous ensemble representation learning approaches (Arnaldo et al., 2015; McConaghy, 2011), which rely strictly on mutation for variation.
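The ML-guided selection step can be sketched directly with scikit-learn. The synthetic data below stands in for the population's feature outputs and is purely illustrative; the pruning rule (drop any feature with zero Gini importance) is the one described above:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
Phi = rng.normal(size=(200, 5))               # stand-in for 5 evolved features' outputs
y = (Phi[:, 0] + Phi[:, 2] > 0).astype(int)   # only features 0 and 2 carry signal

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(Phi, y)
importances = tree.feature_importances_        # Gini importances, one per feature
survivors = [j for j in range(Phi.shape[1]) if importances[j] > 0]
print(survivors)                               # individuals with zero importance are culled
```

With a depth-limited tree, at most three of the five features can receive nonzero importance, so the uninformative features are reliably eliminated from the population.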
The fitness step (see Section 2.1) evaluates the ability of the parents and offspring to adequately distinguish between classes in the training data. The survival step (see Section 2.2) then reduces the pool of parents and offspring back to the original population size; the surviving set of transformations is used at the beginning of the next generation to fit a new ML model.
2.1. Fitness
We compare three fitness metrics (Eqns. 1–3 below) in our experimental analysis in Section 4.1. In contrast to traditional GP, the fitness of an engineered feature must measure the feature's ability to separate data between classes rather than its predictive capacity, since φ(x) is not itself a model. A simple approach to assessing feature quality is the coefficient of determination, computed here as the squared Pearson correlation of the feature's output with the class labels:
R²(φ) = [Σᵢ (φᵢ − φ̄)(yᵢ − ȳ)]² / [Σᵢ (φᵢ − φ̄)² · Σᵢ (yᵢ − ȳ)²]    (1)

where φᵢ = φ(xᵢ) and φ̄ and ȳ denote sample means.
For binary classification, R² seems appropriate, since it only has to capture the correlation of the feature with a change from 0 to 1. For multiclass classification, however, R² imposes an additional constraint on the feature by rewarding it for increasing in the direction of the class label values. For certain problems (e.g. one in which the ordering of the class labels corresponds to a degree of risk), this imposed fitness pressure may be warranted, but in the general case we do not want to assume that the order of the class labels, or the relative distance between them in a feature, is meaningful. Instead, we want to reward features that separate samples from different classes and cluster samples within classes.
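A small sketch of the R² fitness (Eqn. 1), implemented here as the squared Pearson correlation between a single feature's output and the labels:

```python
import numpy as np

def r2_fitness(phi, y):
    """Squared Pearson correlation of one engineered feature's output with y."""
    phi, y = np.asarray(phi, float), np.asarray(y, float)
    c = np.corrcoef(phi, y)[0, 1]
    return c ** 2

y = np.array([0, 0, 1, 1])
print(r2_fitness([0.1, 0.2, 0.9, 1.0], y))   # high: feature tracks the labels
print(r2_fitness([0.9, 0.1, 1.0, 0.2], y))   # low: feature scatters the labels
```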
Other GP feature construction methods have used the Fisher criterion (Guo and Nandi, 2006; Ahmed et al., 2014) to achieve such a measure. The Fisher criterion assigns the fitness of a feature as
F(φ) = (1/P) Σ_{i&lt;j} (μᵢ − μⱼ)² / (σᵢ² + σⱼ²),  P = K(K−1)/2    (2)
where μ_c is the mean of φ(x) over the samples belonging to class label c, and σ_c is the corresponding standard deviation. The Fisher criterion gives a measure of the average pairwise separation between, and dispersion within, classes for φ(x).
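As code, the averaged pairwise form of the Fisher criterion (Eqn. 2) reads as follows; the small additive constant in the denominator is an assumption of this sketch, added to guard against zero within-class variance:

```python
import numpy as np
from itertools import combinations

def fisher_fitness(phi, y):
    """Mean pairwise Fisher ratio between class means of one feature's output."""
    phi, y = np.asarray(phi, float), np.asarray(y)
    stats = {c: (phi[y == c].mean(), phi[y == c].var()) for c in np.unique(y)}
    return float(np.mean([
        (stats[a][0] - stats[b][0]) ** 2 / (stats[a][1] + stats[b][1] + 1e-12)
        for a, b in combinations(stats, 2)
    ]))

y = np.array([0, 0, 1, 1, 2, 2])
print(fisher_fitness([0.0, 0.1, 1.0, 1.1, 2.0, 2.1], y))  # separated classes: large
print(fisher_fitness([0.0, 2.1, 1.0, 0.1, 2.0, 1.1], y))  # mixed classes: small
```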
However, it does not provide fine-grained information about the distance between specific samples in the transformation. To extract this information, we include the silhouette score (Rousseeuw, 1987) in our comparisons. Like Eqn. 2, the silhouette score assesses feature quality by combining the within-class variance with the distance between neighboring classes; it thus captures both the tightness of a cluster and its overlap with the nearest cluster. The silhouette score
for a single sample xᵢ is defined as

sᵢ = (bᵢ − aᵢ) / max(aᵢ, bᵢ)    (3)
Here, aᵢ is the mean distance from φ(xᵢ) to the samples in Cᵢ, the set of samples with class label yᵢ, and bᵢ is the mean distance from φ(xᵢ) to the samples in Nᵢ, the set of samples in the next nearest class (according to centroid distance). Thus Eq. (3) takes into account both the pairwise distances within a class and the separation of neighboring classes from each other. The Euclidean distance metric is used. For the aggregate fitness of an engineered feature, the average silhouette score over all samples, s̄, is used.
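The silhouette fitness can be computed with scikit-learn, as sketched below. One caveat: scikit-learn chooses the neighboring cluster by mean dissimilarity rather than by centroid distance, a close but not identical choice to the definition given above:

```python
import numpy as np
from sklearn.metrics import silhouette_samples, silhouette_score

y = np.array([0, 0, 0, 1, 1, 1])
phi = np.array([0.1, 0.2, 0.3, 5.1, 5.2, 5.3]).reshape(-1, 1)  # tight, well-separated clusters

s_i = silhouette_samples(phi, y)    # per-sample scores in [-1, 1], Euclidean by default
fitness = silhouette_score(phi, y)  # aggregate fitness: mean over all samples
print(fitness)                       # close to 1 for this feature
```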
2.2. Survival
Unlike typical populations in model-based GP, the surviving individuals in FEW are assessed together in an ML estimation, and therefore benefit from being chosen to work well together. In fact, many ML pairings, including LR and SVC, depend on low collinearity between features. We test four methods for achieving this cooperation: tournament survival (tournaments of size 2), deterministic crowding, ε-lexicase survival, and random survival. Tournament survival is agnostic to the population structure when selecting survivors, and simply picks the individual in the tournament with the best fitness to survive. Meanwhile, deterministic crowding and ε-lexicase survival are designed to promote feature diversity, which should influence the ability of the population to effectively produce a representation for the ML training step. We include random survival tests to control for the effect of unguided search.
Deterministic crowding (Mahfoud, 1995) is a niching mechanism in which offspring compete only with the parent they are most similar to. We define similarity as the correlation (R², Eqn. 1) between a child and each of its parents. In the case of mutation, there is only one parent, so no similarity comparison is necessary. Although deterministic crowding is traditionally a steady-state algorithm, its implementation here is generational. Children take the place of their parent in the surviving population if and only if they have a better fitness. This algorithm produces niches in the population that should maintain diverse features.
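A minimal sketch of this crowding rule (the data representation and variable names are illustrative, not FEW's internals): a child replaces its most R²-similar parent only when the child is fitter.

```python
import numpy as np

def corr2(a, b):
    """Squared Pearson correlation; 0.0 if either vector is constant."""
    c = np.corrcoef(a, b)[0, 1]
    return 0.0 if np.isnan(c) else c ** 2

def crowding_step(child, parents):
    """child and parents are (output_vector, fitness) pairs; higher fitness wins.
    The child replaces its most similar parent only if it is fitter."""
    j = max(range(len(parents)), key=lambda k: corr2(child[0], parents[k][0]))
    parents = list(parents)
    if child[1] > parents[j][1]:
        parents[j] = child
    return parents

p1 = (np.array([1.0, 2.0, 3.0, 4.0]), 0.5)     # the parent the child resembles
p2 = (np.array([4.0, 1.0, 3.0, 2.0]), 0.9)
child = (np.array([1.1, 2.2, 2.9, 4.2]), 0.7)  # correlated with p1, fitter than p1
out = crowding_step(child, [p1, p2])           # child displaces p1; p2 survives
```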
ε-lexicase survival is a new survival technique adapted from ε-lexicase selection (La Cava et al., 2016) for use in FEW. ε-lexicase selection is, in turn, an adaptation of lexicase selection (Spector, 2012; Helmuth et al., 2014) for continuous-valued problems. Lexicase selection works by pressuring individuals in the population to solve unique subsets of the training samples (i.e. cases), shifting selective pressure to the cases that are most difficult in terms of population performance. ε-lexicase survival differs from ε-lexicase selection in that it removes the individuals selected at each step from the remaining selection pool and adds them to the survivors for the next generation. Each iteration of ε-lexicase survival proceeds as follows:
GetSurvivors():
    S ← ∅                                              ▹ survivors
    P ← current population                             ▹ initial selection pool
    T ← training cases
    compute ε_t for each case t ∈ T                    ▹ get ε for each case
    for each survivor to be selected:
        pool ← P                                       ▹ initial pool
        cases ← T
        while |pool| > 1 and |cases| > 0:              ▹ main loop
            t ← random choice from cases               ▹ pick a case
            elite ← best fitness in pool on case t     ▹ determine elite
            pool ← {i ∈ pool : fitness_t(i) ≥ elite − ε_t}   ▹ reduce pool
            cases ← cases ∖ {t}                        ▹ reduce cases
        s ← random choice from pool                    ▹ pick survivor
        S ← S ∪ {s};  P ← P ∖ {s}                      ▹ survivor leaves the pool
    return S                                           ▹ return survivors

In the routine above, ε_t is the median absolute deviation (MAD) of the fitnesses on case t across the population.
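A runnable sketch of this survival routine, under the assumptions that per-case fitnesses are maximized and that ε_t is the MAD of each case's fitnesses across the whole population:

```python
import numpy as np

def mad(x):
    """Median absolute deviation, used as the per-case threshold epsilon_t."""
    return np.median(np.abs(x - np.median(x)))

def eps_lexicase_survival(fitness, n_survivors, seed=0):
    """fitness: (n_individuals x n_cases) matrix of per-case fitnesses (higher is
    better). Selected survivors leave the pool, so they cannot be re-picked."""
    rng = np.random.default_rng(seed)
    fitness = np.asarray(fitness, float)
    n, m = fitness.shape
    eps = np.array([mad(fitness[:, t]) for t in range(m)])
    remaining = list(range(n))
    survivors = []
    for _ in range(n_survivors):
        pool = list(remaining)
        cases = list(rng.permutation(m))
        while len(pool) > 1 and cases:
            t = cases.pop()                                # pick a case
            elite = max(fitness[i, t] for i in pool)       # determine elite
            pool = [i for i in pool if fitness[i, t] >= elite - eps[t]]  # reduce pool
        s = pool[rng.integers(len(pool))]                  # pick survivor
        survivors.append(s)
        remaining.remove(s)                                # survivor leaves the pool
    return survivors

# Three case-specialists and one generalist that never wins a case:
F = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]], float)
print(sorted(eps_lexicase_survival(F, 3)))  # -> [0, 1, 2]
```

Because selected individuals are removed from the pool, the three case-specialists survive exactly once each, illustrating how the method spreads survival across individuals with uncorrelated behavior.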
3. Related Work
Feature construction has received considerable attention in GP, with implementations falling into single-feature, multiple-feature, and ensemble categories. Single-feature representations attempt to evolve a single solution that is an engineered feature, as in (Muharram and Smith, 2005; Guo and Nandi, 2006). Multiple-feature representations encode a candidate set of feature transformations in each individual (Krawiec, 2002; Smith and Bull, 2005; Silva et al., 2015; La Cava, William et al., 2017), such that each individual is a multi-output estimate of Φ. In this case, a separate ML model is trained on the outputs of each program, and the resulting output is used to assign fitness to each individual. Ensembles are a more recent approach (McConaghy, 2011; De Melo, 2014; Arnaldo et al., 2015; La Cava and Moore, 2017) designed to reduce the computational complexity of fitting a model to each individual: a single ML model is instead fit to the output of the entire population. This ensemble-like approach treats each individual in the population as a single feature φ, and treats the combined output of the population as Φ. Among these ensemble methods, FEW shares the most in common with evolutionary feature synthesis (EFS) (Arnaldo et al., 2015) in that it uses the more successful wrapper-based approach (Krawiec, 2002; Smith and Bull, 2005) and incorporates feature selection information from the ML routine. Unlike FEW, EFS pairs exclusively with Lasso (Tibshirani, 1996), uses three population partitions, and does not incorporate crossover between individuals. FEW is motivated by the hypotheses that 1) the ML pairing is best treated as a hyperparameter of the method, and 2) existing diversity-preserving selection methods can be successfully adapted to the purpose of ensemble-based feature survival.
As a final note, previous work does not often consider the effect of tuning the proposed algorithm or the ML approaches to which it is compared, which is a vital step both in algorithm comparisons (Caruana and Niculescu-Mizil, 2006) and in the application of ML to real-world problems.
4. Experimental Setup
We conduct two separate sets of experiments. The first set, described in Section 4.1, is designed to compare the fitness and survival methods for FEW in combination with different ML methods and hyperparameters. We use the results of the first experiment to choose the fitness and survival methods for FEW in the second set of experiments. The second set, described in Section 4.2, is a benchmark comparison of FEW to several ML methods on a larger set of classification problems. All of the datasets used in the comparison are freely available from the Penn Machine Learning Benchmark repository².

²https://github.com/EpistasisLab/penn-ml-benchmarks
4.1. FEW comparisons
We tune the choice of fitness and survival methods by performing an experimental analysis of FEW on the tuning problems in Table 3 using the parameters listed in Table 1.
Setting  Values 

Population size  10, 50, 100 
Max depth  2,3 
Fitness  R², silhouette, Fisher 
Survival  tournament, deterministic crowding, ε-lexicase, random 
ML  LR, DT, KNN 
4.2. Comparison to other methods
We evaluate FEW’s performance in comparison to six other ML approaches: Gaussian naïve Bayes (NB), LR, KNN, SVC, RF, and M4GP (La Cava, William et al., 2017), a multiple-feature GP method derived from Silva et al. (2015) that couples a multiple-feature representation with a nearest centroid classifier (Tibshirani et al., 2002). For more information on the implementations of NB, LR, KNN, SVC, and RF, refer to (Pedregosa et al., 2011). These methods are evaluated on 20 classification problems that vary in their numbers of classes, samples, and features, as seen in Table 2. To ensure robust comparisons, we include hyperparameter optimization in the training phase for each method. To do so, we perform a grid search over the hyperparameters of each method (shown in Table 2), using 5-fold cross-validation on the training set to choose the final parameters. The model with the best average cross-validation accuracy on the training set is evaluated on the test set. This process is repeated for 30 shuffled, 50/50 train/test splits of the data. To control for the different numbers of possible hyperparameter combinations between the methods, we limited each grid search to a maximum of 100 combinations of hyperparameter settings during training.
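The training protocol above can be sketched with scikit-learn. The dataset and grid below are illustrative stand-ins (a subset of the KNN grid in Table 2), not the benchmark itself:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.5, random_state=0)

grid = {"n_neighbors": [1, 5, 25], "weights": ["uniform", "distance"]}
search = GridSearchCV(KNeighborsClassifier(), grid, cv=5).fit(X_tr, y_tr)  # 5-fold CV on the training split
print(search.best_params_, search.score(X_te, y_te))  # single held-out test evaluation
```

In the full protocol this fit-and-score step is repeated over 30 shuffled 50/50 splits per dataset and method.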
The hyperparameters considered for FEW (see Table 2) include the population size, expressed as a multiple of the number of features in the data, the ML method, the output type of the features (float or boolean), and the max feature depth. Floating-point outputs use an operator set of standard arithmetic and transcendental functions, and boolean outputs add logical and comparison operators (AND, OR, XOR, and others). It is important to note that hyperparameter tuning of the ML method itself is not performed when it is paired with FEW. As a result, this experiment compares the relative effect of learning a representation for a default ML method against that of tuning the hyperparameters of those methods.
Method  hyperparameters 

FEW  Population size (0.25, …, 3, as a multiple of the number of features); ML (LR, KNN, RF, SVM); output type (bool, float); max depth (2, 3) 
M4GP  Population size (250, 500, 1000); generations (50,100,500,1000); selection method (tournament, lexicase); max length (10, 25, 50, 100) 
Gaussian Naïve Bayes  none 
Logistic Regression  Regularization coefficient (0.001, …, 100); penalty (ℓ1, ℓ2, elastic net); epochs (5, 10) 
Support Vector Classifier  Regularization coefficient (0.01, …, 100, ‘auto’); kernel coefficient γ (0.01, 10, 1000, ‘auto’); kernel (linear, sigmoid, radial basis function) 
Random Forest Classifier  No. estimators (10, 100, 1000); minimum weight fraction for leaf (0.0, 0.25, 0.5); max features (sqrt, log2, None); splitting criterion (entropy, gini) 
KNearest Neighbor Classifier  K (1,…,50); weights (uniform, distance) 
Dataset  Classes  Samples  Features 

Tuning Problems  
auto  5  202  25 
calendarDOW  5  399  32 
corral  2  160  6 
new thyroid  3  215  5 
Benchmark Problems  
analcatdata authorship  4  841  70 
analcatdata cyyoung8092  2  97  10 
coil2000  2  9822  85 
GMT 2w1000a0.4h  2  1600  1000 
GMT 2w20a0.4h  2  1600  20 
german  2  1000  20 
Hill Valley with noise  2  1212  100 
Hill Valley without noise  2  1212  100 
magic  2  19020  10 
mfeat fourier  10  2000  76 
mfeat pixel  10  2000  240 
molecular biology promoters  2  106  58 
monk2  2  601  6 
optdigits  10  5620  64 
parity5+5  2  1124  10 
schizo  2  340  14 
texture  11  5500  40 
vowel  11  990  13 
xd6  2  973  9 
yeast  9  1479  8 
5. Results
The fitness and survival methods are compared on the tuning datasets in Figures 2 and 3, respectively. The fitness metric comparisons yield unexpected results. The Fisher criterion is outperformed by both R² and the silhouette score on 3 out of 4 problems (p < 4.8e−7). Surprisingly, we find that the silhouette score does not outperform R² as a fitness metric either; across problems and ML pairings, there is no significant difference in performance aside from on new thyroid. This is surprising given our hypothesis in Section 2.1 that the class label assumptions implicit in R² would make it less suited to classification with multiple labels. Given this evidence, in conjunction with the lower computational complexity of R², we opt to use R² as the fitness criterion for the benchmark comparison.
We find that ε-lexicase survival produces more accurate classifiers than deterministic crowding, tournament, and random survival across problems and ML pairings. It is significantly correlated with higher test accuracy according to a t-test (p < 2e−16) and significantly outperforms tournament survival (p < 0.002) and deterministic crowding (p < 2.4e−7) according to pairwise Wilcoxon tests, correcting for multiple comparisons. ε-lexicase survival also outperforms random survival on auto (p < 4.4e−8) and new thyroid (p < 2e−16), and ties it on the other two problems (for calendarDOW, p = 0.094). Random survival performs strongly compared to tournament and deterministic crowding survival, outperforming those methods on 3 out of 4 problems. These results motivate our use of ε-lexicase survival in the benchmark comparison.

The test set accuracies of the 7 methods on the benchmark datasets are shown in boxplot form in Figure 4, and the mean rankings are summarized in Figure 5. Performance varies across problems, with RF, SVC, M4GP, or FEW generally producing the highest test accuracy. Whereas FEW generally does well on the problems for which M4GP excels, FEW also does well in cases where M4GP underperforms, which is likely due to FEW’s ability to tune the ML method with which it is paired. Three problems stand out as particularly amenable to feature engineering: GMT 2w20a0.4h, Hill_Valley_without_noise, and parity5+5. These three problems are well known for containing strong interactions between features, which helps explain the observed increase in performance from FEW. In terms of mean rankings across problems, FEW generates the best classifiers among the methods tested, followed closely by SVC and RF. A Friedman test of the rankings with post-hoc analysis reveals that RF, SVC, and FEW significantly outperform NB and LR across all problems (p < 0.039).
As expected, the computation time of FEW is higher than that of the other ML methods (see Figure 6) due to its wrapper-based approach. The shorter runtime of M4GP may be explained by its C++ implementation, compared to FEW’s Python implementation, as well as by M4GP’s use of a consistently fast ML pairing.
We show models generated by single runs of FEW on GMT 2w20a0.4h in Table 4, using the DT and LR pairings. This genetics problem was generated using the GAMETES simulation tool (Urbanowicz et al., 2012). It consists of 20 attributes, 18 of which are noise and two of which interact epistatically, meaning they must be considered together to infer the correct class (the labels contain noise as well). The models correctly identify the interaction between features 18 and 19. For this problem, FEW’s transformation provides the essential knowledge required to solve the problem, whereas the ML methods simply serve as discriminant functions for processing the information presented via the transformation.
Decision Tree Model  

Importance  Feature 
0.899  XOR 
0.084  
0.017  
Logistic Regression Model  
Coefficient  Feature 
1.992  XOR 
1.433  
0.996  XOR 
0.102  
Performance  Decision Tree  Logistic Regression 

Initial ML CV accuracy  0.487  0.473 
Final model CV accuracy  0.763  0.803 
Test accuracy  0.787  0.755 
Runtime (s)  8.2  8.2 
6. Discussion & Conclusion
Our results suggest that FEW is a useful technique for supervised classification problems. FEW performs best on average among the algorithms tested, which include optimized SVM, RF, KNN, M4GP, LR, and NB models. For these ML methods, this result provides evidence that the data representation can influence algorithm performance as much as, if not more than, the parameter settings of those algorithms. Although it has not been tested here, including hyperparameter optimization of the ML methods paired with FEW in the tuning step would likely show even greater gains in performance over the baseline approach. FEW also performs better than a multiple-feature GP approach (M4GP) that uses a fixed ML pairing.
Despite FEW’s runtime in these tests, a complexity analysis suggests it is well positioned for large datasets in comparison to other feature construction techniques. Whereas techniques like polynomial feature expansion scale poorly with the number of features d (O(dᵖ) features for a degree-p polynomial) and techniques like kernel transformations scale poorly with the number of samples N (O(N²)) (Friedman et al., 2001), FEW scales independently of the number of features in the dataset, linearly with N, and quadratically with the population size. These observations warrant further investigation on large datasets.
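The poor scaling of polynomial expansion can be checked directly with scikit-learn: for degree p = 2 the number of constructed features is C(d+2, 2), which already exceeds half a million columns at d = 1000.

```python
import numpy as np
from math import comb
from sklearn.preprocessing import PolynomialFeatures

for d in (10, 20, 1000):
    n_out = PolynomialFeatures(degree=2).fit(np.zeros((1, d))).n_output_features_
    assert n_out == comb(d + 2, 2)   # C(d + p, p) terms for degree p = 2 (incl. bias)
    print(d, n_out)                  # 10 -> 66, 20 -> 231, 1000 -> 501501
```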
7. Acknowledgements
This work was supported by the Warren Center for Network and Data Science at the University of Pennsylvania, as well as NIH grants P30ES013508, AI116794 and LM009012.
References
 Ahmed et al. (2014) Soha Ahmed, Mengjie Zhang, Lifeng Peng, and Bing Xue. 2014. Multiple feature construction for effective biomarker identification and classification using genetic programming. In Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation. ACM, 249–256. http://dl.acm.org/citation.cfm?id=2598292
 Arnaldo et al. (2014) Ignacio Arnaldo, Krzysztof Krawiec, and UnaMay O’Reilly. 2014. Multiple regression genetic programming. In Proceedings of the 2014 conference on Genetic and evolutionary computation. ACM Press, 879–886. DOI:http://dx.doi.org/10.1145/2576768.2598291
 Arnaldo et al. (2015) Ignacio Arnaldo, UnaMay O’Reilly, and Kalyan Veeramachaneni. 2015. Building Predictive Models via Feature Synthesis. ACM Press, 983–990. DOI:http://dx.doi.org/10.1145/2739480.2754693
 Bengio et al. (2013) Yoshua Bengio, Aaron Courville, and Pascal Vincent. 2013. Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence 35, 8 (2013), 1798–1828. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6472238
 Breiman and Cutler (2003) Leo Breiman and Adele Cutler. 2003. Random Forests. (2003). http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm

 Caruana and Niculescu-Mizil (2006) Rich Caruana and Alexandru Niculescu-Mizil. 2006. An empirical comparison of supervised learning algorithms. In Proceedings of the 23rd International Conference on Machine Learning. ACM, 161–168. http://dl.acm.org/citation.cfm?id=1143865
 De Melo (2014) Vinícius Veloso De Melo. 2014. Kaizen programming. In GECCO ’14: Proceedings of the Genetic and Evolutionary Computation Conference. ACM Press, 895–902. DOI:http://dx.doi.org/10.1145/2576768.2598264
 Foster et al. (2015) Dean Foster, Howard Karloff, and Justin Thaler. 2015. Variable selection is hard. In Proceedings of The 28th Conference on Learning Theory. 696–709. http://www.jmlr.org/proceedings/papers/v40/Foster15.pdf
 Friedman et al. (2001) Jerome Friedman, Trevor Hastie, and Robert Tibshirani. 2001. The elements of statistical learning. Vol. 1. Springer series in statistics Springer, Berlin. http://statweb.stanford.edu/~tibs/book/preface.ps
 Guo and Nandi (2006) Hong Guo and Asoke K. Nandi. 2006. Breast cancer diagnosis using genetic programming generated feature. Pattern Recognition 39, 5 (May 2006), 980–987. DOI:http://dx.doi.org/10.1016/j.patcog.2005.10.001
 Helmuth et al. (2014) T. Helmuth, L. Spector, and J. Matheson. 2014. Solving Uncompromising Problems with Lexicase Selection. IEEE Transactions on Evolutionary Computation PP, 99 (2014), 1–1. DOI:http://dx.doi.org/10.1109/TEVC.2014.2362729
 Iba and Sato (1994) Hitoshi Iba and Taisuke Sato. 1994. Genetic Programming with Local Hill-Climbing. Technical Report ETL-TR-94-4. Electrotechnical Laboratory, 1-1-4 Umezono, Tsukuba-city, Ibaraki, 305, Japan. http://www.cs.ucl.ac.uk/staff/W.Langdon/ftp/papers/Iba_1994_GPlHC.pdf
 Kommenda et al. (2013) Michael Kommenda, Gabriel Kronberger, Stephan Winkler, Michael Affenzeller, and Stefan Wagner. 2013. Effects of constant optimization by nonlinear least squares minimization in symbolic regression. In GECCO ’13 Companion: Proceedings of the fifteenth annual conference companion on Genetic and evolutionary computation. ACM, Amsterdam, The Netherlands, 1121–1128. DOI:http://dx.doi.org/10.1145/2464576.2482691
 Krawiec (2002) Krzysztof Krawiec. 2002. Genetic programmingbased construction of features for machine learning and knowledge discovery tasks. Genetic Programming and Evolvable Machines 3, 4 (2002), 329–343. http://link.springer.com/article/10.1023/A:1020984725014
 La Cava et al. (2015) William La Cava, Thomas Helmuth, Lee Spector, and Kourosh Danai. 2015. Genetic Programming with Epigenetic Local Search. In GECCO ’15: Proceedings of the Genetic and Evolutionary Computation Conference. ACM Press, 1055–1062. DOI:http://dx.doi.org/10.1145/2739480.2754763
 La Cava and Moore (2017) William La Cava and Jason Moore. 2017. A General Feature Engineering Wrapper for Machine Learning Using Lexicase Survival. In European Conference on Genetic Programming. Springer, 80–95. https://link.springer.com/chapter/10.1007/9783319556963_6 DOI: 10.1007/9783319556963_6.
 La Cava et al. (2016) William La Cava, Lee Spector, and Kourosh Danai. 2016. EpsilonLexicase Selection for Regression. In GECCO ’16: Proceedings of the Genetic and Evolutionary Computation Conference. ACM, New York, NY, USA, 741–748. DOI:http://dx.doi.org/10.1145/2908812.2908898
 La Cava, William et al. (2017) La Cava, William, Silva, Sara, Vanneschi, Leonardo, Spector, Lee, and Moore, Jason H. 2017. Genetic Programming Representations for Multidimensional Feature Learning in Biomedical Classification. In European Conference on the Applications of Evolutionary Computation. Springer, 158173. https://link.springer.com/chapter/10.1007/9783319558493_11 DOI: 10.1007/9783319558493_11.

 Mahfoud (1995) Samir W. Mahfoud. 1995. Niching methods for genetic algorithms. Ph.D. Dissertation.
 McConaghy (2011) Trent McConaghy. 2011. FFX: Fast, scalable, deterministic symbolic regression technology. In Genetic Programming Theory and Practice IX. Springer, 235–260. http://link.springer.com/chapter/10.1007/9781461417705_13
 Muharram and Smith (2005) Mohammed Muharram and George D. Smith. 2005. Evolutionary constructive induction. IEEE Transactions on Knowledge and Data Engineering 17, 11 (2005), 1518–1528. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1512037
 Pedregosa et al. (2011) Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, and others. 2011. Scikitlearn: Machine learning in Python. Journal of Machine Learning Research 12, Oct (2011), 2825–2830. http://www.jmlr.org/papers/v12/pedregosa11a.html

 Rousseeuw (1987) Peter J. Rousseeuw. 1987. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20 (Nov. 1987), 53–65. DOI:http://dx.doi.org/10.1016/0377-0427(87)90125-7
 Silva et al. (2015) Sara Silva, Luis Muñoz, Leonardo Trujillo, Vijay Ingalalli, Mauro Castelli, and Leonardo Vanneschi. 2015. Multiclass Classification Through Multidimensional Clustering. In Genetic Programming Theory and Practice XIII. Vol. 13. Springer, Ann Arbor, MI.
 Smith and Bull (2005) Matthew G. Smith and Larry Bull. 2005. Genetic programming with a genetic algorithm for feature construction and selection. Genetic Programming and Evolvable Machines 6, 3 (2005), 265–281. http://link.springer.com/article/10.1007/s1071000529887
 Spector (2012) Lee Spector. 2012. Assessment of problem modality by differential performance of lexicase selection in genetic programming: a preliminary report. In Proceedings of the fourteenth international conference on Genetic and evolutionary computation conference companion. 401–408. http://dl.acm.org/citation.cfm?id=2330846
 Tibshirani (1996) Robert Tibshirani. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological) (1996), 267–288. http://www.jstor.org/stable/2346178
 Tibshirani et al. (2002) Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu. 2002. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences 99, 10 (May 2002), 6567–6572. DOI:http://dx.doi.org/10.1073/pnas.082099299
 Urbanowicz et al. (2012) Ryan J. Urbanowicz, Jeff Kiralis, Nicholas A. SinnottArmstrong, Tamra Heberling, Jonathan M. Fisher, and Jason H. Moore. 2012. GAMETES: a fast, direct algorithm for generating pure, strict, epistatic models with random architectures. BioData mining 5, 1 (2012), 1. https://biodatamining.biomedcentral.com/articles/10.1186/17560381516
 Žegklitz and Pošík (2017) Jan Žegklitz and Petr Pošík. 2017. Symbolic Regression Algorithms with Built-in Linear Regression. arXiv:1701.03641 [cs] (Jan. 2017). http://arxiv.org/abs/1701.03641