On Taking Advantage of Opportunistic Meta-knowledge to Reduce Configuration Spaces for Automated Machine Learning

08/08/2022
by David Jacob Kedziora, et al.

The automated machine learning (AutoML) process can require searching through complex configuration spaces of not only machine learning (ML) components and their hyperparameters but also ways of composing them together, i.e. forming ML pipelines. Optimisation efficiency and the model accuracy attainable for a fixed time budget suffer if this pipeline configuration space is excessively large. A key research question is whether it is both possible and practical to preemptively avoid costly evaluations of poorly performing ML pipelines by leveraging their historical performance for various ML tasks, i.e. meta-knowledge. The previous experience comes in the form of classifier/regressor accuracy rankings derived from either (1) a substantial but non-exhaustive number of pipeline evaluations made during historical AutoML runs, i.e. 'opportunistic' meta-knowledge, or (2) comprehensive cross-validated evaluations of classifiers/regressors with default hyperparameters, i.e. 'systematic' meta-knowledge. Numerous experiments with the AutoWeka4MCPS package suggest that (1) opportunistic/systematic meta-knowledge can improve ML outcomes, typically in line with how relevant that meta-knowledge is, and (2) configuration-space culling is optimal when it is neither too conservative nor too radical. However, the utility and impact of meta-knowledge depend critically on numerous facets of its generation and exploitation, warranting extensive analysis; these are often overlooked or underappreciated within the AutoML and meta-learning literature. In particular, we observe strong sensitivity to the 'challenge' of a dataset, i.e. whether specificity in choosing a predictor leads to significantly better performance. Ultimately, identifying 'difficult' datasets, thus defined, is crucial to both generating informative meta-knowledge bases and understanding optimal search-space reduction strategies.
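The culling idea described above can be illustrated with a minimal sketch. It is not the paper's procedure or anything from AutoWeka4MCPS: it simply assumes that meta-knowledge is available as per-dataset rankings of predictors (best first), averages each predictor's rank, and keeps the top few. The function names, the `keep` parameter, the dataset labels, and the rankings themselves are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def mean_ranks(rankings):
    """Average each predictor's rank (1 = best) over the datasets that ranked it."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ranked in rankings.values():
        for position, predictor in enumerate(ranked, start=1):
            totals[predictor] += position
            counts[predictor] += 1
    return {name: totals[name] / counts[name] for name in totals}


def cull_search_space(rankings, keep):
    """Return the `keep` predictors with the lowest mean rank (ties broken by name)."""
    averages = mean_ranks(rankings)
    ordered = sorted(averages, key=lambda name: (averages[name], name))
    return ordered[:keep]


# Hypothetical meta-knowledge: per-dataset predictor rankings, best first.
# These values are made up for illustration; they are not results from the paper.
meta_knowledge = {
    "dataset_a": ["RandomForest", "SMO", "NaiveBayes", "ZeroR"],
    "dataset_b": ["SMO", "RandomForest", "ZeroR", "NaiveBayes"],
    "dataset_c": ["RandomForest", "NaiveBayes", "SMO", "ZeroR"],
}

# Keep only the two predictors with the best average historical rank.
reduced = cull_search_space(meta_knowledge, keep=2)
print(reduced)
```

In this sketch, `keep` plays the role of the culling severity: a value close to the full pool size is conservative and prunes little, while a very small value is radical and risks discarding the predictor that suits a new dataset.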
