Generating Redundant Features with Unsupervised Multi-Tree Genetic Programming

02/02/2018
by Andrew Lensen, et al.

Recently, feature selection has become an increasingly important area of research due to the surge in high-dimensional datasets in all areas of modern life. A plethora of feature selection algorithms have been proposed, but it is difficult to truly analyse the quality of a given algorithm. Ideally, an algorithm would be evaluated by measuring how well it removes known bad features. Acquiring datasets with such features is inherently difficult, and so a common technique is to add synthetic bad features to an existing dataset. While adding noisy features is an easy task, it is very difficult to automatically add complex, redundant features. This work proposes one of the first approaches to generating redundant features, using a novel genetic programming approach. Initial experiments show that our proposed method can automatically create difficult, redundant features which have the potential to be used for creating high-quality feature selection benchmark datasets.
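To make the contrast in the abstract concrete, the following minimal sketch shows why noisy features are easy to add while redundant ones need more thought. It is not the genetic programming method proposed in the paper: the helper names `add_noisy_features` and `add_handcrafted_redundant_feature`, the toy data, and the hand-picked formula `sin(x_i) * x_j^2` are all assumptions made purely for illustration.

```python
import numpy as np


def add_noisy_features(X, n_noise, rng):
    """Append n_noise Gaussian columns that are independent of X (hypothetical helper)."""
    noise = rng.normal(size=(X.shape[0], n_noise))
    return np.hstack([X, noise])


def add_handcrafted_redundant_feature(X, i, j):
    """Append one column that is a deterministic, nonlinear function of columns i and j.

    The formula is an arbitrary illustrative choice, not one taken from the paper.
    """
    redundant = np.sin(X[:, i]) * X[:, j] ** 2
    return np.hstack([X, redundant[:, None]])


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 4))        # toy dataset: 200 instances, 4 features

X_noisy = add_noisy_features(X, n_noise=3, rng=rng)               # 4 + 3 columns
X_aug = add_handcrafted_redundant_feature(X_noisy, i=0, j=1)      # 4 + 3 + 1 columns
```

A hand-written formula like this does not scale: each redundant feature has to be designed manually, and simple choices are easy for a feature selection algorithm to detect. That limitation is the gap an automatic approach to generating redundant features aims to close.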


