Probabilistic Matrix Factorization for Automated Machine Learning

05/15/2017
by Nicolo Fusi, et al.

To achieve state-of-the-art performance, modern machine learning techniques require careful data pre-processing and hyperparameter tuning. Moreover, given the ever-increasing number of machine learning models being developed, model selection is becoming increasingly important. Automating the selection and tuning of machine learning pipelines, which consist of data pre-processing methods and machine learning models, has long been one of the goals of the machine learning community. In this paper, we tackle this meta-learning task by combining ideas from collaborative filtering and Bayesian optimization. Using probabilistic matrix factorization techniques and acquisition functions from Bayesian optimization, we exploit experiments performed on hundreds of different datasets to guide the exploration of the space of possible pipelines. In our experiments, we show that our approach quickly identifies high-performing pipelines across a wide range of datasets, significantly outperforming the current state-of-the-art.
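To make the idea in the abstract concrete, here is a minimal sketch of how a factorized dataset-by-pipeline performance matrix could feed an acquisition function. It is not the authors' model: it assumes a plain regularized alternating-least-squares factorization, a Bayesian linear regression over the learned pipeline features for the new dataset, and expected improvement as the acquisition function. All function names, hyperparameters (`rank`, `reg`, `noise`, `prior`), and the synthetic data are hypothetical and chosen only for illustration.

```python
import math
import numpy as np

def factorize(Y, mask, rank=3, reg=0.1, iters=30, seed=0):
    """Fit Y ~ U @ V.T on observed entries (mask == True) by alternating ridge solves."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    U = rng.standard_normal((n, rank))
    V = rng.standard_normal((m, rank))
    ridge = reg * np.eye(rank)
    for _ in range(iters):
        for i in range(n):  # update each dataset's latent vector
            obs = mask[i]
            Vo = V[obs]
            U[i] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ Y[i, obs])
        for j in range(m):  # update each pipeline's latent vector
            obs = mask[:, j]
            Uo = U[obs]
            V[j] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ Y[obs, j])
    return U, V

def posterior(V, tried, scores, noise=0.01, prior=1.0):
    """Bayesian linear regression on pipeline features: predictive mean and std per pipeline."""
    X = V[tried]
    precision = X.T @ X / noise + np.eye(V.shape[1]) / prior
    cov = np.linalg.inv(precision)
    w_mean = cov @ X.T @ scores / noise
    mu = V @ w_mean
    var = np.einsum("ij,jk,ik->i", V, cov, V) + noise
    return mu, np.sqrt(var)

_erfc = np.vectorize(math.erfc, otypes=[float])

def expected_improvement(mu, sigma, best):
    """Expected improvement over `best`, assuming higher scores are better."""
    z = (mu - best) / sigma
    cdf = 0.5 * _erfc(-z / math.sqrt(2.0))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf

def suggest_next(V, tried, scores):
    """Return the index of the untried pipeline with the highest expected improvement."""
    scores = np.asarray(scores, dtype=float)
    mu, sigma = posterior(V, tried, scores)
    ei = expected_improvement(mu, sigma, float(np.max(scores)))
    ei[tried] = -np.inf  # never re-suggest an evaluated pipeline
    return int(np.argmax(ei))

# Synthetic stand-in for past experiments: 40 datasets x 25 pipelines, about 60% observed.
rng = np.random.default_rng(0)
Y = rng.standard_normal((40, 3)) @ rng.standard_normal((25, 3)).T
mask = rng.random(Y.shape) < 0.6

# Train on all but the last dataset, which plays the role of the new task.
U, V = factorize(Y[:-1], mask[:-1])
pred = U @ V.T
rmse = float(np.sqrt(np.mean((pred[mask[:-1]] - Y[:-1][mask[:-1]]) ** 2)))

tried = [0, 5, 10]        # pipelines already evaluated on the new dataset
scores = Y[-1, tried]     # their observed performance
nxt = suggest_next(V, tried, scores)
```

In a loop, the suggested pipeline would be evaluated on the new dataset, appended to `tried` and `scores`, and the suggestion recomputed. Swapping in a different acquisition function only requires replacing `expected_improvement`.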
