Understanding Silent Failures in Medical Image Classification

07/27/2023
by Till J. Bungert, et al.

To ensure the reliable use of classification systems in medical applications, it is crucial to prevent silent failures. This can be achieved either by designing classifiers that are robust enough to avoid failures in the first place, or by detecting remaining failures using confidence scoring functions (CSFs). A predominant source of failures in image classification is distribution shifts between training data and deployment data. To understand the current state of silent failure prevention in medical imaging, we conduct the first comprehensive analysis comparing various CSFs across four biomedical tasks and a diverse range of distribution shifts. Since none of the benchmarked CSFs can reliably prevent silent failures, we conclude that a deeper understanding of the root causes of failures in the data is required. To facilitate this, we introduce SF-Visuals, an interactive analysis tool that uses latent space clustering to visualize shifts and failures. Through several examples, we demonstrate how this tool can help researchers gain insight into the requirements for the safe application of classification systems in the medical domain. The open-source benchmark and tool are available at: https://github.com/IML-DKFZ/sf-visuals.
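
To illustrate the failure-detection setting described in the abstract, below is a minimal sketch (not taken from the paper or the SF-Visuals code) of how a confidence scoring function can be used to flag predictions for review. Maximum softmax probability is used here only as a common baseline CSF; the threshold value and the helper names are hypothetical choices for this example.

# Minimal sketch: thresholding a confidence scoring function (CSF) to catch
# potential silent failures. Maximum softmax probability (MSP) serves as a
# simple baseline CSF; the benchmark in the paper compares several CSFs.
import numpy as np

def max_softmax_probability(logits: np.ndarray) -> np.ndarray:
    """Return the maximum softmax probability per sample as a confidence score."""
    z = logits - logits.max(axis=1, keepdims=True)        # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return probs.max(axis=1)

def flag_potential_failures(logits: np.ndarray, threshold: float = 0.8) -> np.ndarray:
    """Mark predictions whose confidence falls below the threshold for human review.

    A prediction that is wrong *and* passes the threshold would be a silent failure.
    """
    confidence = max_softmax_probability(logits)
    return confidence < threshold

# Hypothetical usage with logits from a trained classifier:
logits = np.array([[2.5, 0.1, -1.0],    # confidently predicted class 0
                   [0.6, 0.5, 0.4]])    # ambiguous sample, e.g. under distribution shift
print(flag_potential_failures(logits))  # -> [False  True]

The benchmark's point is that no single such score reliably separates correct from incorrect predictions under the studied distribution shifts, which is what motivates inspecting the failures themselves with a tool like SF-Visuals.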

Related research

11/28/2022
A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification
Reliable application of machine learning-based decision systems in the w...

05/27/2022
Failure Detection in Medical Image Classification: A Reality Check and Benchmarking Testbed
Failure detection in automated image classification is a critical safegu...

09/25/2021
A Principled Approach to Failure Analysis and Model Repairment: Demonstration in Medical Imaging
Machine learning models commonly exhibit unexpected failures post-deploy...

03/17/2023
Deephys: Deep Electrophysiology, Debugging Neural Networks under Distribution Shifts
Deep Neural Networks (DNNs) often fail in out-of-distribution scenarios....

02/15/2023
Dataset Interfaces: Diagnosing Model Failures Using Controllable Counterfactual Generation
Distribution shifts are a major source of failure of deployed machine le...

06/21/2023
Mass-Producing Failures of Multimodal Systems with Language Models
Deployed multimodal systems can fail in ways that evaluators did not ant...

02/17/2021
DepOwl: Detecting Dependency Bugs to Prevent Compatibility Failures
Applications depend on libraries to avoid reinventing the wheel. Librari...
