Assembled-OpenML: Creating Efficient Benchmarks for Ensembles in AutoML with OpenML

07/01/2023
by Lennart Purucker, et al.

Automated Machine Learning (AutoML) frameworks regularly use ensembles. To select appropriate ensemble techniques for an AutoML framework from the many candidates, developers need to compare them. So far, such comparisons are often computationally expensive, because many base models must be trained and evaluated one or more times. Therefore, we present Assembled-OpenML, a Python tool that builds meta-datasets for ensembles using OpenML. A meta-dataset, called a Metatask, consists of the data of an OpenML task, the task's dataset, and prediction data from model evaluations for the task. Using the predictions stored in a Metatask instead of training and evaluating base models makes the comparison of ensemble techniques computationally cheaper. To introduce Assembled-OpenML, we describe the first version of our tool. Moreover, we present an example of using Assembled-OpenML to compare a set of ensemble techniques. For this example comparison, we built a benchmark with Assembled-OpenML and implemented ensemble techniques that take predictions, instead of base models, as input. In our example comparison, we gathered the prediction data of 1523 base models for 31 datasets. Obtaining the prediction data for all base models with Assembled-OpenML took about 1 hour in total. In comparison, obtaining the prediction data by training and evaluating just one base model on the most computationally expensive dataset took about 37 minutes.
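A Metatask stores the predictions of every base model. An ensemble technique can therefore be evaluated by combining those stored predictions, without refitting any model. The Python sketch below illustrates this idea under explicit assumptions. `ToyMetatask`, `accuracy`, `average`, `single_best`, and `greedy_ensemble_selection` are hypothetical names written for illustration; they are not the Assembled-OpenML API. The two techniques shown are generic examples and are not necessarily the ones compared in the paper: picking the single best base model, and Caruana et al.'s greedy ensemble selection with replacement. For brevity, the sketch uses the same instances both to select and to score the ensemble. A real benchmark would select on validation predictions and score on held-out predictions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Proba = List[List[float]]  # one class-probability vector per instance


@dataclass
class ToyMetatask:
    """Hypothetical stand-in for a Metatask (NOT the Assembled-OpenML API):
    ground-truth labels plus the stored predictions of each base model."""
    y_true: List[int]
    predictions: Dict[str, Proba] = field(default_factory=dict)


def accuracy(y_true: List[int], proba: Proba) -> float:
    """Fraction of instances whose argmax class equals the true label."""
    hits = sum(1 for y, p in zip(y_true, proba) if max(range(len(p)), key=p.__getitem__) == y)
    return hits / len(y_true)


def average(probas: List[Proba]) -> Proba:
    """Element-wise mean of several models' probability matrices."""
    n = len(probas)
    return [[sum(col) / n for col in zip(*rows)] for rows in zip(*probas)]


def single_best(mt: ToyMetatask) -> str:
    """Baseline technique: the base model with the highest accuracy."""
    return max(mt.predictions, key=lambda name: accuracy(mt.y_true, mt.predictions[name]))


def greedy_ensemble_selection(mt: ToyMetatask, n_iterations: int = 10) -> List[str]:
    """Greedy ensemble selection with replacement (after Caruana et al.):
    repeatedly add the base model whose stored predictions most improve
    the accuracy of the averaged ensemble. No model is ever trained."""
    selected: List[str] = []
    for _ in range(n_iterations):
        best_name, best_score = None, -1.0
        for name, proba in mt.predictions.items():
            candidate = [mt.predictions[s] for s in selected] + [proba]
            score = accuracy(mt.y_true, average(candidate))
            if score > best_score:  # ties keep the earlier model
                best_name, best_score = name, score
        selected.append(best_name)
    return selected


# Toy data, invented for illustration only.
mt = ToyMetatask(
    y_true=[0, 1, 1, 0],
    predictions={
        "A": [[0.9, 0.1], [0.4, 0.6], [0.6, 0.4], [0.8, 0.2]],
        "B": [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8], [0.4, 0.6]],
        "C": [[0.2, 0.8], [0.5, 0.5], [0.7, 0.3], [0.9, 0.1]],
    },
)

print(single_best(mt))                                # A
print(greedy_ensemble_selection(mt, n_iterations=3))  # ['A', 'B', 'A']
```

In this sketch, each candidate ensemble costs only one averaging pass over stored probability vectors. This is why comparing techniques on prediction data is cheap compared with training and evaluating base models.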
