Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science

03/20/2016
by Randal S. Olson, et al.

As the field of data science continues to grow, there will be an ever-increasing demand for tools that make machine learning accessible to non-experts. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning: pipeline design. We implement an open source Tree-based Pipeline Optimization Tool (TPOT) in Python and demonstrate its effectiveness on a series of simulated and real-world benchmark data sets. In particular, we show that TPOT can design machine learning pipelines that provide a significant improvement over a basic machine learning analysis while requiring little to no input or prior knowledge from the user. We also address the tendency of TPOT to design overly complex pipelines by integrating Pareto optimization, which produces compact pipelines without sacrificing classification accuracy. As such, this work represents an important step toward fully automating machine learning pipeline design.
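
The Pareto optimization idea mentioned in the abstract can be illustrated with a small sketch. This is not TPOT's implementation; it only shows the general notion of Pareto dominance over two objectives, assuming that one objective is classification accuracy (to maximize) and the other is pipeline size (to minimize). The candidate names, accuracies, and sizes below are made up for illustration, and the helper names `dominates` and `pareto_front` are hypothetical.

```python
from typing import List, Tuple

# Each candidate is (name, accuracy, n_operators).
# All values are invented for illustration, not results from the paper.
Candidate = Tuple[str, float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if `a` is no worse than `b` on both objectives
    (higher accuracy, fewer operators) and strictly better on at least one."""
    _, acc_a, size_a = a
    _, acc_b, size_b = b
    no_worse = acc_a >= acc_b and size_a <= size_b
    strictly_better = acc_a > acc_b or size_a < size_b
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


candidates = [
    ("A", 0.90, 12),  # same accuracy as B but larger, so B dominates it
    ("B", 0.90, 4),
    ("C", 0.85, 2),
    ("D", 0.80, 3),   # less accurate and larger than C, so C dominates it
    ("E", 0.92, 9),
]

front = pareto_front(candidates)
print(sorted(name for name, _, _ in front))
```

In this toy example the larger pipeline A is discarded because the smaller pipeline B matches its accuracy, which mirrors the stated goal of compact pipelines that do not sacrifice accuracy.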


research
01/28/2016

Automating biomedical data science through tree-based pipeline optimization

Over the past decade, data science and machine learning has grown from a...
research
01/18/2018

Layered TPOT: Speeding up Tree-based Pipeline Optimization

With the demand for machine learning increasing, so does the demand for ...
research
10/17/2020

MLCask: Efficient Management of Component Evolution in Collaborative Data Analytics Pipelines

With the ever-increasing adoption of machine learning for data analytics...
research
05/03/2022

Automatically Debugging AutoML Pipelines using Maro: ML Automated Remediation Oracle (Extended Version)

Machine learning in practice often involves complex pipelines for data c...
research
12/16/2019

Pipelines for Procedural Information Extraction from Scientific Literature: Towards Recipes using Machine Learning and Data Science

This paper describes a machine learning and data science pipeline for st...
research
08/20/2021

A Recommender System for Scientific Datasets and Analysis Pipelines

Scientific datasets and analysis pipelines are increasingly being shared...
