STREAMLINE: A Simple, Transparent, End-To-End Automated Machine Learning Pipeline Facilitating Data Analysis and Algorithm Comparison

06/23/2022
by Ryan J. Urbanowicz, et al.

Machine learning (ML) offers powerful methods for detecting and modeling associations, often in data with large feature spaces and complex underlying relationships. Many useful tools/packages (e.g. scikit-learn) have been developed to make the various elements of data handling, processing, modeling, and interpretation accessible. However, it is not trivial for most investigators to assemble these elements into a rigorous, replicable, unbiased, and effective data analysis pipeline. Automated machine learning (AutoML) seeks to address these issues by simplifying the process of ML analysis for all. Here, we introduce STREAMLINE, a simple, transparent, end-to-end AutoML pipeline designed as a framework to easily conduct rigorous ML modeling and analysis (limited initially to binary classification). STREAMLINE is specifically designed to compare performance between datasets, ML algorithms, and other AutoML tools. It is unique among AutoML tools in offering a fully transparent and consistent baseline of comparison using a carefully designed series of pipeline elements including: (1) exploratory analysis, (2) basic data cleaning, (3) cross validation partitioning, (4) data scaling and imputation, (5) filter-based feature importance estimation, (6) collective feature selection, (7) ML modeling with Optuna hyperparameter optimization across 15 established algorithms (including the less well-known Genetic Programming and rule-based ML), (8) evaluation across 16 classification metrics, (9) model feature importance estimation, (10) statistical significance comparisons, and (11) automated export of all results, plots, a PDF summary report, and models that can be easily applied to replication data.
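To make elements (3) and (4) concrete, the following is a minimal sketch, not STREAMLINE's own code. It assumes stratified partitioning, mean imputation, and standard scaling, none of which are specified in the abstract, and every function name here is hypothetical. The point it illustrates is that imputation and scaling parameters are learned from the training partition only and then applied unchanged to the testing partition, which is one common way to keep a comparison unbiased.

```python
import random
import statistics
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign each row index to one of k folds, keeping class ratios similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a similar share of the class.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def fit_impute_scale(column):
    """Learn imputation and scaling parameters from training values only."""
    observed = [v for v in column if v is not None]
    mean = statistics.fmean(observed)
    std = statistics.pstdev(observed) or 1.0  # fall back to 1.0 for constant columns
    return mean, std


def apply_impute_scale(column, mean, std):
    """Fill missing values with the training mean, then standardize."""
    return [((mean if v is None else v) - mean) / std for v in column]


# Toy data (assumed for illustration): one feature with gaps, binary labels.
feature = [1.0, None, 3.0, 4.0, 5.0, None, 7.0, 8.0, 9.0, 10.0]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

folds = stratified_folds(labels, k=2)
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # Parameters come from the training partition and are reused on the test partition.
    mean, std = fit_impute_scale([feature[i] for i in train_idx])
    train_x = apply_impute_scale([feature[i] for i in train_idx], mean, std)
    test_x = apply_impute_scale([feature[i] for i in test_idx], mean, std)
    print(len(train_x), len(test_x))
```

In practice the same idea is usually expressed with library components (for example scikit-learn's cross-validation splitters and transformers); the standard-library version above is only meant to show the order of operations.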


