Building a Reproducible Machine Learning Pipeline

10/09/2018
by Peter Sugimura, et al.

Model reproducibility is a problem for every machine learning practitioner, whether in industry or academia. The consequences of an irreproducible model can include significant financial costs, lost time, and even damage to personal reputation if results cannot be replicated. This paper first discusses the problems we have encountered while building a variety of machine learning models, and then describes the framework we built to tackle model reproducibility. The framework consists of four main components (data, feature, scoring, and evaluation layers), each composed of well-defined transformations. This enables us not only to replicate a model exactly, but also to reuse the transformations across different models. As a result, the platform has dramatically increased the speed of both offline and online experimentation while also ensuring model reproducibility.
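
To make the layered design in the abstract concrete, here is a minimal Python sketch of a pipeline built from four layers of named, versioned transformations. It is an illustration under stated assumptions, not the authors' implementation: the class names (Transformation, Layer, Pipeline), the fingerprint method, the build_pipeline helper, and every example transformation are hypothetical and do not come from the paper.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

Record = Dict[str, Any]


@dataclass(frozen=True)
class Transformation:
    """Hypothetical unit of work: a named, versioned, pure function."""
    name: str
    version: str
    fn: Callable[[Record], Record]


@dataclass(frozen=True)
class Layer:
    """One of the four layers; applies its transformations in order."""
    name: str
    transformations: Tuple[Transformation, ...]

    def apply(self, record: Record) -> Record:
        for t in self.transformations:
            record = t.fn(record)
        return record


@dataclass(frozen=True)
class Pipeline:
    layers: Tuple[Layer, ...]

    def run(self, record: Record) -> Record:
        for layer in self.layers:
            record = layer.apply(record)
        return record

    def fingerprint(self) -> str:
        # Hash the ordered (layer, transformation, version) spec so that two
        # pipelines with the same definition get the same identifier.
        spec = [
            [layer.name, [[t.name, t.version] for t in layer.transformations]]
            for layer in self.layers
        ]
        return hashlib.sha256(json.dumps(spec).encode("utf-8")).hexdigest()


# Illustrative transformations (assumed, not from the paper). Each returns a
# new dict instead of mutating its input, so it can be reused across models.
clean_amount = Transformation(
    "clean_amount", "1",
    lambda r: {**r, "amount": float(r.get("amount") or 0.0)},
)
amount_is_large = Transformation(
    "amount_is_large", "1",
    lambda r: {**r, "is_large": r["amount"] > 100.0},
)
abs_error = Transformation(
    "abs_error", "1",
    lambda r: {**r, "abs_error": abs(r["label"] - r["score"])},
)


def build_pipeline(score_version: str = "1") -> Pipeline:
    """Assemble the data, feature, scoring, and evaluation layers."""
    threshold_score = Transformation(
        "threshold_score", score_version,
        lambda r: {**r, "score": 0.9 if r["is_large"] else 0.1},
    )
    return Pipeline((
        Layer("data", (clean_amount,)),
        Layer("feature", (amount_is_large,)),
        Layer("scoring", (threshold_score,)),
        Layer("evaluation", (abs_error,)),
    ))


if __name__ == "__main__":
    pipeline = build_pipeline()
    print(pipeline.run({"amount": "250", "label": 1}))
    print(pipeline.fingerprint())
```

In this sketch, two pipelines assembled from the same transformation names and versions share a fingerprint, which is one simple way to check that a model definition has been replicated exactly. Changing a transformation's version changes the fingerprint. Because the transformations are standalone objects, the same clean_amount step could be placed in the data layer of a different model.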


research
04/18/2021

Perspectives on Machine Learning from Psychology's Reproducibility Crisis

In the early 2010s, a crisis of reproducibility rocked the field of psyc...
research
01/05/2017

OpenML: An R Package to Connect to the Machine Learning Platform OpenML

OpenML is an online machine learning platform where researchers can easi...
research
05/01/2020

DriveML: An R Package for Driverless Machine Learning

In recent years, the concept of automated machine learning has become ve...
research
05/21/2023

Reproducibility Requires Consolidated Artifacts

Machine learning is facing a 'reproducibility crisis' where a significan...
research
12/25/2021

A comparative study on machine learning models combining with outlier detection and balanced sampling methods for credit scoring

Peer-to-peer (P2P) lending platforms have grown rapidly over the past de...
research
08/21/2022

Performance, Opaqueness, Consequences, and Assumptions: Simple questions for responsible planning of machine learning solutions

The data revolution has generated a huge demand for data-driven solution...
research
07/16/2019

Evaluating the Reproducibility of Research in Obstetrics and Gynecology

Objective: Reproducibility is a core tenet of scientific research. A rep...
