TPOT-SH: a Faster Optimization Algorithm to Solve the AutoML Problem on Large Datasets

by lpyparmenier, et al.

Data are omnipresent nowadays and contain knowledge and patterns that machine learning (ML) algorithms can extract in order to make decisions or perform a task without explicit instructions. To achieve this, these algorithms learn a mathematical model from sample data. However, there are numerous ML algorithms, each learning a different model of reality. Furthermore, the behavior of these algorithms can be altered by modifying their many hyperparameters. Tuning these algorithms carefully is costly but essential to reach decent performance. It also requires a lot of expertise and remains hard even for experts, who tend to resort to exploration-only approaches such as random search and grid search. The field of AutoML has consequently emerged in the quest for automated machine learning processes that are less expensive than brute-force searches. In this paper we continue the research initiated on the Tree-based Pipeline Optimization Tool (TPOT), an AutoML tool based on Evolutionary Algorithms (EAs). EAs are typically slow to converge, which prevents TPOT from scaling to large datasets. We therefore introduce TPOT-SH, inspired by the concept of Successive Halving used in Multi-Armed Bandit problems. This solution allows TPOT to explore the search space faster and to achieve much better performance on larger datasets.
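
To make the Successive Halving idea concrete, here is a minimal Python sketch of the generic technique, not the TPOT-SH implementation itself. It assumes that each candidate pipeline can be scored on a given budget (for example, a number of training samples), and that the worst candidates are dropped while the budget grows by a factor `eta` each round. The names `successive_halving`, `evaluate`, `noisy_score`, `min_budget`, `max_budget` and `eta` are hypothetical and chosen only for illustration.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def successive_halving(
    candidates: Sequence[object],
    evaluate: Callable[[object, int], float],
    min_budget: int,
    max_budget: int,
    eta: int = 2,
) -> Tuple[object, List[Tuple[int, int]]]:
    """Generic successive halving (illustrative sketch).

    Scores every surviving candidate on the current budget, keeps the
    top 1/eta of them, multiplies the budget by eta, and repeats until
    one candidate is left or the maximum budget has been used.
    Returns the best candidate and the (n_survivors, budget) schedule.
    """
    survivors = list(candidates)
    budget = min_budget
    schedule: List[Tuple[int, int]] = []
    while True:
        schedule.append((len(survivors), budget))
        # Higher score is better; each candidate is evaluated once per round.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        if len(ranked) == 1 or budget >= max_budget:
            return ranked[0], schedule
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget = min(max_budget, budget * eta)


def noisy_score(rng: random.Random) -> Callable[[float, int], float]:
    """Toy stand-in for pipeline evaluation (an assumption of this sketch):
    a candidate is just its 'true' quality, observed with noise that
    shrinks as the budget (number of samples) grows."""
    def evaluate(quality: float, budget: int) -> float:
        return quality + rng.gauss(0.0, 1.0 / math.sqrt(budget))
    return evaluate


if __name__ == "__main__":
    rng = random.Random(0)
    qualities = [rng.random() for _ in range(16)]
    best, schedule = successive_halving(
        qualities, noisy_score(rng), min_budget=50, max_budget=800
    )
    print("schedule (survivors, budget):", schedule)
    print("selected quality:", best, "true best:", max(qualities))
```

Most of the evaluation effort goes to cheap, small-budget rounds over many candidates, and only a few survivors are ever scored on the full budget. This is the property that makes the approach attractive when a single evaluation on a large dataset is expensive.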

