Rethinking Machine Learning Model Evaluation in Pathology

04/11/2022
by Syed Ashar Javed, et al.

Machine learning (ML) has been applied to pathology images in research and clinical practice with promising outcomes. However, standard ML models often lack the rigorous evaluation required for clinical decisions. ML techniques developed for natural images are ill-equipped to deal with pathology images, which are very large and noisy, require expensive labeling, are hard to interpret, and are susceptible to spurious correlations. We propose a set of practical guidelines for ML evaluation in pathology that address these concerns. The paper covers measures for setting up the evaluation framework, methods for dealing effectively with variability in labels, and a recommended suite of tests for issues related to domain shift, robustness, and confounding variables. We hope that the proposed framework will bridge the gap between ML researchers and domain experts, leading to wider adoption of ML techniques in pathology and improved patient outcomes.

research
05/13/2019

What Clinicians Want: Contextualizing Explainable Machine Learning for Clinical End Use

Translating machine learning (ML) models effectively to clinical practic...
research
08/27/2023

Empowering Clinicians and Democratizing Data Science: Large Language Models Automate Machine Learning for Clinical Studies

A knowledge gap persists between Machine Learning (ML) developers (e.g.,...
research
03/01/2021

Practices for Engineering Trustworthy Machine Learning Applications

Following the recent surge in adoption of machine learning (ML), the neg...
research
03/02/2021

Understanding the Usability Challenges of Machine Learning In High-Stakes Decision Making

Machine learning (ML) is being applied to a diverse and ever-growing set...
research
07/05/2022

A domain-specific language for describing machine learning datasets

Datasets play a central role in the training and evaluation of machine l...
research
06/18/2022

Weakly Supervised Classification of Vital Sign Alerts as Real or Artifact

A significant proportion of clinical physiologic monitoring alarms are f...
research
07/07/2022

Calibrate to Interpret

Trustworthy machine learning is driving a large number of ML community w...
