Sample selection from a given dataset to validate machine learning models

04/27/2021
by Bertrand Iooss, et al.

Selecting a validation basis from a full dataset is often required in the industrial use of supervised machine learning algorithms. This validation basis serves to provide an independent evaluation of the machine learning model. To select it, we propose adopting a "design of experiments" point of view based on statistical criteria. We show that the "support points" concept, based on Maximum Mean Discrepancy criteria, is particularly relevant. An industrial test case from the company EDF illustrates the practical interest of the methodology.
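To make the idea of a discrepancy-based selection concrete, here is a minimal Python sketch. It is not the authors' support-points algorithm: it swaps in a simple greedy search that, at each step, adds the data point that most reduces the energy distance (assumed here as one representative criterion of the Maximum Mean Discrepancy family) between the selected subset and the full dataset. The function names `pairwise_distances`, `energy_distance`, and `greedy_validation_subset` are hypothetical, and the dense distance matrix restricts the sketch to small datasets.

```python
import numpy as np


def pairwise_distances(X):
    """Dense Euclidean distance matrix (n x n); only suitable for small n."""
    X = np.asarray(X, dtype=float)
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)


def energy_distance(D, idx):
    """Energy distance between the subset `idx` and the full dataset,
    both taken as empirical distributions, from the distance matrix D."""
    idx = np.asarray(idx)
    return 2.0 * D[idx].mean() - D[np.ix_(idx, idx)].mean() - D.mean()


def greedy_validation_subset(D, m):
    """Greedily pick m row indices that keep the energy distance to the
    full dataset small. Illustrative heuristic, not a global optimum."""
    n = D.shape[0]
    mean_to_all = D.mean(axis=1)      # mean distance of each point to the data
    cross = np.zeros(n)               # sum of distances to the selected points
    sum_mean = 0.0                    # sum of mean_to_all over selected points
    sum_within = 0.0                  # sum of D[i, j] over selected pairs (i, j)
    available = np.ones(n, dtype=bool)
    selected = []
    for k in range(1, m + 1):
        # Energy distance (up to a constant) if each candidate were added.
        score = 2.0 * (sum_mean + mean_to_all) / k \
            - (sum_within + 2.0 * cross) / k**2
        score[~available] = np.inf
        c = int(np.argmin(score))
        selected.append(c)
        available[c] = False
        sum_mean += mean_to_all[c]
        sum_within += 2.0 * cross[c]
        cross += D[c]
    return np.array(selected)


# Usage on synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
D = pairwise_distances(X)
validation_idx = greedy_validation_subset(D, 10)
training_idx = np.setdiff1d(np.arange(len(X)), validation_idx)
```

The selected indices form the validation basis and the remaining points are left for training. A greedy search is easy to follow but gives no optimality guarantee, so it should be read as an illustration of the criterion rather than a substitute for the method described in the paper.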


