Data Aggregation for Reducing Training Data in Symbolic Regression

08/24/2021
by Lukas Kammerer, et al.

The growing volume of data makes computationally intensive machine learning techniques, such as symbolic regression with genetic programming, increasingly impractical. This work discusses methods to reduce the training data and thereby the runtime of genetic programming. The data is aggregated in a preprocessing step before the actual machine learning algorithm is run. K-means clustering and data binning are used for data aggregation and compared with random sampling, the simplest data reduction method. We analyze the achieved speed-up in training and the effect on the trained models' test accuracy for every method on four real-world data sets. The performance of genetic programming is compared with random forests and linear regression. It is shown that k-means and random sampling lead to a very small loss in test accuracy when the data is reduced down to only 30% of the original data, while the speed-up is proportional to the size of the data set. Binning, on the contrary, leads to models with very high test error.
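As a rough illustration of the preprocessing step described above, the following sketch reduces a synthetic data set to 30% of its rows, once by random sampling and once by replacing the rows with k-means centroids. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `random_sample` and `kmeans_aggregate`, the plain Lloyd iterations, the choice to cluster features and target jointly, and the synthetic data are all hypothetical. Binning is not shown because the abstract does not describe how the bins are formed.

```python
import numpy as np


def random_sample(data, fraction, rng):
    """Keep a random subset of rows, drawn without replacement."""
    n_keep = max(1, int(round(fraction * len(data))))
    idx = rng.choice(len(data), size=n_keep, replace=False)
    return data[idx]


def kmeans_aggregate(data, fraction, rng, n_iter=20):
    """Replace the rows by k centroids found with plain Lloyd iterations.

    Assumption: features and target are clustered jointly and unscaled.
    """
    k = max(1, int(round(fraction * len(data))))
    # Initialise the centroids with k distinct rows of the data.
    centroids = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared distance from every row to every centroid, shape (n, k).
        dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    return centroids


# Hypothetical synthetic regression data: two features and one target.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(500, 2))
y = x[:, 0] ** 2 + np.sin(x[:, 1]) + rng.normal(0, 0.1, size=500)
data = np.column_stack([x, y])

sampled = random_sample(data, 0.3, rng)
aggregated = kmeans_aggregate(data, 0.3, rng)
print(sampled.shape, aggregated.shape)
```

Either reduced array could then be passed to the learner in place of the full training set. In practice the columns would likely need scaling before clustering so that no single variable dominates the distance.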


research
03/28/2021

Symbolic regression outperforms other models for small data sets

Machine learning is often applied to obtain predictions and new understa...
research
02/08/2023

Down-Sampled Epsilon-Lexicase Selection for Real-World Symbolic Regression Problems

Epsilon-lexicase selection is a parent selection method in genetic progr...
research
06/18/2019

Symbolic regression by random search

Purpose: To compare symbolic regression by genetic programming (SRGP) wi...
research
04/01/2019

Fast, accurate, and transferable many-body interatomic potentials by symbolic regression

The length and time scales of atomistic simulations are limited by the c...
research
04/01/2019

Fast, accurate, and transferable many-body interatomic potentials by genetic programming

The length and time scales of atomistic simulations are limited by the c...
research
12/06/2017

Perceived Audiovisual Quality Modelling based on Decision Trees, Genetic Programming and Neural Networks

Our objective is to build machine learning based models that predict aud...
research
02/06/2017

Toward the automated analysis of complex diseases in genome-wide association studies using genetic programming

Machine learning has been gaining traction in recent years to meet the d...
