Symbolic regression outperforms other models for small data sets

03/28/2021
by   Casper Wilstrup, et al.

Machine learning is often applied to obtain predictions and new understanding of complex phenomena and relationships, but a lack of sufficient data for model training is a widespread problem. Traditional machine learning techniques such as random forests and gradient boosting tend to overfit when working with data sets of only a few hundred samples. This study demonstrates that for small training sets of 250 observations, symbolic regression is a superior alternative to these machine learning models: it provides better accuracy while preserving the interpretability of linear models and decision trees. In 132 out of 240 cases, the symbolic regression model performs better than any of the other models on the out-of-sample data. The second-best algorithm was found to be a random forest, which performs best in 37 of the 240 cases. When the comparison is restricted to interpretable models, symbolic regression performs best in 184 out of 240 cases.
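The "best in N out of 240 cases" figures are win counts: for each case, the model with the best out-of-sample score is credited with one win. The following is a minimal Python sketch of that tally, not the study's code. The function name `count_wins`, the model labels, and the toy scores are all hypothetical, and the abstract does not state which metric was used, so the sketch simply assumes that higher is better by default. Only the counts 132, 37, 184 and the total of 240 come from the abstract.

```python
from collections import Counter


def count_wins(scores_by_case, higher_is_better=True):
    """Count, per model, the cases in which it has the best out-of-sample score.

    scores_by_case: list of dicts mapping a model name to its score on one case.
    Ties go to the model listed first in the dict (an arbitrary choice here).
    """
    pick = max if higher_is_better else min
    wins = Counter()
    for scores in scores_by_case:
        best_model = pick(scores, key=scores.get)
        wins[best_model] += 1
    return wins


# Made-up scores for three cases (illustrative only, NOT the study's results).
toy_cases = [
    {"symbolic_regression": 0.81, "random_forest": 0.78, "gradient_boosting": 0.74},
    {"symbolic_regression": 0.66, "random_forest": 0.70, "gradient_boosting": 0.69},
    {"symbolic_regression": 0.90, "random_forest": 0.85, "gradient_boosting": 0.88},
]
toy_wins = count_wins(toy_cases)

# Shares implied by the win counts reported in the abstract (240 cases in total).
TOTAL_CASES = 240
reported_wins = {
    "symbolic regression (all models)": 132,
    "random forest (all models)": 37,
    "symbolic regression (interpretable models only)": 184,
}
shares = {name: n / TOTAL_CASES for name, n in reported_wins.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Expressed as shares, the reported counts mean symbolic regression wins in 55% of all cases and in roughly 77% of cases when only interpretable models are compared, while the random forest wins in about 15%.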


research
08/24/2021

Data Aggregation for Reducing Training Data in Symbolic Regression

The growing volume of data makes the use of computationally intense mach...
research
05/18/2020

Applying Genetic Programming to Improve Interpretability in Machine Learning Models

Explainable Artificial Intelligence (or xAI) has become an important res...
research
09/13/2020

That looks interesting! Personalizing Communication and Segmentation with Random Forest Node Embeddings

Communicating effectively with customers is a challenge for many markete...
research
02/07/2023

Machine learning benchmarks for the classification of equivalent circuit models from solid-state electrochemical impedance spectra

Analysis of Electrochemical Impedance Spectroscopy (EIS) data for electr...
research
04/12/2022

Automated Learning of Interpretable Models with Quantified Uncertainty

Interpretability and uncertainty quantification in machine learning can ...
research
06/18/2022

Reduced Robust Random Cut Forest for Out-Of-Distribution detection in machine learning models

Most machine learning-based regressors extract information from data col...
research
04/12/2019

Boosting insights in insurance tariff plans with tree-based machine learning

Pricing actuaries typically stay within the framework of generalized lin...
