Demystifying a Dark Art: Understanding Real-World Machine Learning Model Development

05/04/2020
by Angela Lee, et al.

It is well known that developing machine learning (ML) workflows is a dark art: even experts struggle to find an optimal workflow that yields a high-accuracy model. Users currently rely on empirical trial and error to build their own set of battle-tested guidelines for modeling decisions. In this study, we aim to demystify this dark art by understanding how people iterate on ML workflows in practice. We analyze over 475k user-generated workflows on OpenML, an open-source platform for tracking and sharing ML workflows. We find that users often adopt a manual, automated, or mixed approach when iterating on their workflows. We observe that manual approaches result in fewer wasted iterations than automated approaches. Yet automated approaches often explore more preprocessing and hyperparameter options, resulting in higher overall performance. This suggests potential benefits for a human-in-the-loop ML system that recommends a well-chosen combination of the two strategies.
