Design choice and machine learning model performances

01/25/2022
by Rosa Arboretti, et al.

An increasing number of publications present the joint application of Design of Experiments (DOE) and machine learning (ML) as a methodology to collect and analyze data on a specific industrial phenomenon. However, the literature shows that the choice of design for data collection and of model for data analysis is often driven by incidental factors rather than by statistical or algorithmic advantages. As a result, there is a lack of studies that provide guidelines on which designs and ML models to use jointly for data collection and analysis. This paper is the first in the literature to discuss the choice of design in relation to ML model performance. It reports an extensive study that considers 12 experimental designs, 7 families of predictive models, 7 test functions that emulate physical processes, and 8 noise settings, both homoscedastic and heteroscedastic. The results can have an immediate impact on the work of practitioners by providing guidelines for practical applications of DOE and ML.

