TPOT-SH: a Faster Optimization Algorithm to Solve the AutoML Problem on Large Datasets

11/27/2021
by lpyparmenier, et al.

Data are omnipresent nowadays and contain knowledge and patterns that machine learning (ML) algorithms can extract in order to make decisions or perform a task without explicit instructions. To achieve that, these algorithms learn a mathematical model from sample data. However, there are numerous ML algorithms, each learning a different model of reality. Furthermore, the behavior of these algorithms can be altered by modifying some of their many hyperparameters. Tuning these algorithms well is costly but essential to reach decent performance. It also requires a lot of expertise and remains hard even for experts, who tend to resort to exploration-only approaches such as random search and grid search. The field of AutoML has consequently emerged in the quest for automated machine learning processes that are less expensive than brute-force searches. In this paper we continue the research initiated on the Tree-based Pipeline Optimization Tool (TPOT), an AutoML tool based on Evolutionary Algorithms (EAs). EAs are typically slow to converge, which prevents TPOT from scaling to large datasets. We therefore introduce TPOT-SH, inspired by the concept of Successive Halving used in Multi-Armed Bandit problems. This solution allows TPOT to explore the search space faster and to achieve much better performance on larger datasets.
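To make the Successive Halving idea concrete, the sketch below shows the generic procedure: score every candidate on a small budget, keep the best fraction, and repeat with a larger budget. It is not the authors' TPOT-SH implementation, and it does not use the TPOT API. The names `successive_halving`, `evaluate`, `min_budget` and `eta` are assumptions chosen for illustration, and the budget is assumed to stand for something like the number of training samples.

```python
import random


def successive_halving(candidates, evaluate, min_budget, eta=2):
    """Generic Successive Halving (illustrative sketch, not TPOT-SH itself).

    candidates : list of candidate configurations (any objects)
    evaluate   : hypothetical callable (candidate, budget) -> score, higher is better
    min_budget : budget for the first round (e.g. a number of samples)
    eta        : each round keeps 1/eta of the candidates and multiplies the budget by eta
    """
    if eta < 2:
        raise ValueError("eta must be at least 2, otherwise no candidate is eliminated")
    survivors = list(candidates)
    budget = min_budget
    while len(survivors) > 1:
        # Score every remaining candidate on the current (cheap) budget.
        scored = [(evaluate(c, budget), c) for c in survivors]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Keep the top 1/eta fraction, but never fewer than one candidate.
        keep = max(1, len(survivors) // eta)
        survivors = [c for _, c in scored[:keep]]
        # Give the survivors a larger budget in the next round.
        budget *= eta
    return survivors[0]


def noisy_score(true_quality, budget, rng):
    """Toy evaluation: the noise shrinks as the budget grows (assumed noise model)."""
    return true_quality + rng.gauss(0.0, 1.0) / budget ** 0.5


if __name__ == "__main__":
    rng = random.Random(0)
    qualities = [0.1 * i for i in range(1, 9)]  # eight toy candidates
    best = successive_halving(qualities, lambda c, b: noisy_score(c, b, rng), min_budget=10)
    print("selected candidate quality:", best)
```

With eight candidates and `eta=2`, the loop runs three rounds (8, 4, then 2 candidates) on budgets of 10, 20 and 40, so most of the evaluation cost goes to the candidates that survive early, cheap rounds.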

