Improving generalisation of AutoML systems with dynamic fitness evaluations

01/23/2020
by Benjamin Patrick Evans, et al.

A common problem machine learning developers face is overfitting, that is, fitting a pipeline so closely to the training data that performance degrades on unseen data. Automated machine learning aims to free (or at least ease) the developer from the burden of pipeline creation, but the overfitting problem can persist. In fact, it can become more of a problem when we iteratively optimise performance on an internal cross-validation (most often k-fold). While this internal cross-validation is intended to reduce overfitting, we show that we still risk overfitting to the particular folds used. In this work, we aim to remedy this problem by introducing dynamic fitness evaluations which approximate repeated k-fold cross-validation, at little extra cost over single k-fold and at far lower cost than typical repeated k-fold. The results show that, for an equal time budget, the proposed fitness function gives a significant improvement over the current state-of-the-art baseline method, which uses an internal single k-fold. Furthermore, the proposed extension is very simple to implement on top of existing evolutionary computation methods, and can provide an essentially free boost in generalisation/testing performance.
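
The abstract does not spell out how the dynamic fitness evaluation is implemented. The following is a minimal sketch of one plausible reading, assuming the k-fold split is simply re-drawn with a new seed at every generation, so that surviving candidates are scored on many different splits over a run while each generation still pays for only one k-fold. The toy 1-D data, the k-nearest-neighbour "pipeline", and all function names (`k_fold_indices`, `cv_fitness`, `evolve`) are illustrative assumptions, not the authors' code.

```python
import random
from statistics import mean

def make_toy_data(n=60, seed=1):
    """Hypothetical noisy 1-D data: label is 1 when x > 0, flipped 20% of the time."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        label = int(x > 0)
        if rng.random() < 0.2:
            label = 1 - label
        data.append((x, label))
    return data

def k_fold_indices(n_samples, k, seed):
    """Shuffle the indices with `seed` and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def knn_predict(train, x, n_neighbours):
    """Majority vote among the n_neighbours training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:n_neighbours]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def cv_fitness(n_neighbours, data, folds):
    """Mean held-out accuracy of a k-NN candidate over the given folds."""
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [p for i, p in enumerate(data) if i not in held_out]
        hits = sum(knn_predict(train, data[i][0], n_neighbours) == data[i][1]
                   for i in fold)
        scores.append(hits / len(fold))
    return mean(scores)

def evolve(data, k=5, generations=10, pop_size=8, dynamic=True, seed=0):
    """Toy evolutionary search over the number of neighbours (1 to 15).

    dynamic=True re-draws the folds every generation (the assumed dynamic
    fitness); dynamic=False reuses one fixed split (single k-fold).
    """
    rng = random.Random(seed)
    population = [rng.randint(1, 15) for _ in range(pop_size)]
    for gen in range(generations):
        fold_seed = seed + gen if dynamic else seed
        folds = k_fold_indices(len(data), k, fold_seed)
        # Survivors are re-scored on the current folds, not on cached fitness.
        ranked = sorted(population,
                        key=lambda c: cv_fitness(c, data, folds),
                        reverse=True)
        parents = ranked[: pop_size // 2]
        children = [max(1, min(15, p + rng.choice([-2, -1, 1, 2])))
                    for p in parents]
        population = parents + children
    return population[0]  # best candidate under the final generation's folds

if __name__ == "__main__":
    data = make_toy_data()
    print("dynamic folds:", evolve(data, dynamic=True))
    print("fixed folds:  ", evolve(data, dynamic=False))
```

In this sketch the only difference between the two modes is the seed passed to `k_fold_indices`, which is why the per-generation cost matches single k-fold. Whether the dynamic variant actually generalises better on a given problem has to be checked on a separate held-out test set; the toy example makes no claim about that.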

research
03/23/2017

Cross-Validation with Confidence

Cross-validation is one of the most popular model selection methods in s...
research
06/30/2018

chemmodlab: A Cheminformatics Modeling Laboratory for Fitting and Assessing Machine Learning Models

The goal of chemmodlab is to streamline the fitting and assessment pipel...
research
08/22/2019

Efficient Cross-Validation of Echo State Networks

Echo State Networks (ESNs) are known for their fast and precise one-shot...
research
12/08/2012

An Empirical Comparison of V-fold Penalisation and Cross Validation for Model Selection in Distribution-Free Regression

Model selection is a crucial issue in machine-learning and a wide variet...
research
04/04/2019

Subject Cross Validation in Human Activity Recognition

K-fold Cross Validation is commonly used to evaluate classifiers and tun...
research
11/17/2017

Variable selection with genetic algorithms using repeated cross-validation of PLS regression models as fitness measure

Genetic algorithms are a widely used method in chemometrics for extracti...
research
02/08/2020

On a scalable entropic breaching of the overfitting barrier in machine learning

Overfitting and treatment of "small data" are among the most challenging...
