Benchmarking AutoML algorithms on a collection of binary problems

12/06/2022
by Pedro Henrique Ribeiro, et al.

Automated machine learning (AutoML) algorithms have grown in popularity due to their high performance and their flexibility in adapting to different problems and data sets. As the number of AutoML algorithms grows, deciding which one best suits a given problem becomes increasingly difficult. It is therefore essential to use complex, challenging benchmarks that can differentiate AutoML algorithms from one another. This paper compares the performance of four AutoML algorithms: Tree-based Pipeline Optimization Tool (TPOT), Auto-Sklearn, Auto-Sklearn 2, and H2O AutoML. We use the Diverse and Generative ML benchmark (DIGEN), a diverse set of synthetic datasets derived from generative functions designed to highlight the strengths and weaknesses of common machine learning algorithms. We confirm that AutoML can identify pipelines that perform well on all included datasets. Most AutoML algorithms performed similarly, leaving little room for improvement; however, some were more consistent than others at finding high-performing solutions for some datasets.
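A comparison of this kind (several AutoML tools, several datasets, repeated runs) can be organized with a small harness. The sketch below is illustrative only and is not the authors' code. `fit_and_score` is a hypothetical callback that would train one AutoML tool on one dataset with a given random seed and return a held-out score. The dataset names are placeholders, the toy scorer only simulates scores, and reading the standard deviation across seeds as "consistency" is an assumption made for illustration.

```python
import random
import statistics
from typing import Callable, Dict, List, Tuple

# Hypothetical callback: (tool, dataset, seed) -> held-out score, e.g. in [0, 1].
Scorer = Callable[[str, str, int], float]
Results = Dict[Tuple[str, str], List[float]]


def run_benchmark(tools: List[str], datasets: List[str],
                  seeds: List[int], fit_and_score: Scorer) -> Results:
    """Run every tool on every dataset once per seed and collect the scores."""
    results: Results = {}
    for tool in tools:
        for dataset in datasets:
            results[(tool, dataset)] = [
                fit_and_score(tool, dataset, seed) for seed in seeds
            ]
    return results


def summarize(results: Results) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Return (mean, population std) of the scores per (tool, dataset).

    Assumption: a lower std across seeds is read as "more consistent".
    """
    return {
        key: (statistics.mean(scores), statistics.pstdev(scores))
        for key, scores in results.items()
    }


if __name__ == "__main__":
    # Toy stand-in for real AutoML runs: deterministic, seeded noise around 0.9.
    # These numbers are simulated and say nothing about the real tools.
    def toy_scorer(tool: str, dataset: str, seed: int) -> float:
        rng = random.Random(f"{tool}|{dataset}|{seed}")
        return 0.9 + rng.uniform(-0.05, 0.05)

    tools = ["TPOT", "Auto-Sklearn", "Auto-Sklearn 2", "H2O AutoML"]
    datasets = ["dataset_a", "dataset_b"]  # placeholder names, not DIGEN IDs
    summary = summarize(run_benchmark(tools, datasets, [0, 1, 2, 3, 4], toy_scorer))
    for (tool, dataset), (mean, std) in sorted(summary.items()):
        print(f"{tool:15s} {dataset:10s} mean={mean:.3f} std={std:.3f}")
```

With a real `fit_and_score` plugged in, a tool whose mean score is close to the others but whose standard deviation across seeds is smaller would count as the more consistent one, under the assumption stated above.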
