A Principled Approach to Failure Analysis and Model Repairment: Demonstration in Medical Imaging

09/25/2021
by Thomas Henn, et al.

Machine learning models commonly exhibit unexpected failures post-deployment due to either data shifts or situations that were uncommon in the training environment. Domain experts typically go through the tedious process of manually inspecting the failure cases, identifying failure modes, and then attempting to fix the model. In this work, we aim to standardise and bring principles to this process by answering two critical questions: (i) how do we know that we have identified meaningful and distinct failure types? (ii) how can we validate that a model has, indeed, been repaired? We suggest that the quality of the identified failure types can be validated by measuring the intra- and inter-type generalisation after fine-tuning, and we introduce metrics to compare different subtyping methods. Furthermore, we argue that a model can be considered repaired if it achieves high accuracy on the failure types while retaining performance on the previously correct data. We combine these two ideas into a principled framework for evaluating the quality of both the identified failure subtypes and model repairment. We evaluate its utility on a classification task and an object detection task. Our code is available at https://github.com/Rokken-lab6/Failure-Analysis-and-Model-Repairment
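The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation; the exact metrics are defined in the paper and the linked repository. It assumes a hypothetical matrix `gen` in which `gen[i][j]` is the accuracy on held-out failures of type `j` after fine-tuning on failures of type `i`. The function names and the thresholds `min_failure_acc` and `min_retained_acc` are likewise illustrative assumptions.

```python
from typing import Dict, List, Tuple


def intra_inter_generalisation(gen: List[List[float]]) -> Tuple[float, float]:
    """Summarise a square generalisation matrix (hypothetical layout).

    gen[i][j]: accuracy on held-out failures of type j after fine-tuning
    on failures of type i. Returns (mean diagonal, mean off-diagonal).
    """
    n = len(gen)
    if n < 2 or any(len(row) != n for row in gen):
        raise ValueError("gen must be a square matrix with at least 2 types")
    # Diagonal entries: fine-tune and evaluate on the same failure type.
    intra = sum(gen[i][i] for i in range(n)) / n
    # Off-diagonal entries: fine-tune on one type, evaluate on another.
    inter = sum(gen[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    return intra, inter


def is_repaired(
    failure_acc_by_type: Dict[str, float],
    retained_acc: float,
    min_failure_acc: float = 0.9,    # assumed threshold, not from the paper
    min_retained_acc: float = 0.95,  # assumed threshold, not from the paper
) -> bool:
    """Check the abstract's repair criterion under assumed thresholds:
    high accuracy on every failure type AND retained accuracy on the
    data the original model already classified correctly."""
    fixes_failures = all(a >= min_failure_acc for a in failure_acc_by_type.values())
    keeps_correct = retained_acc >= min_retained_acc
    return fixes_failures and keeps_correct


# Made-up numbers for two failure types, for illustration only.
gen = [[0.9, 0.2],
       [0.4, 0.8]]
intra, inter = intra_inter_generalisation(gen)
repaired = is_repaired({"type_a": 0.95, "type_b": 0.92}, retained_acc=0.97)
```

Reporting the diagonal and off-diagonal averages separately keeps the two quantities named in the abstract distinct; how they are combined into a single score for comparing subtyping methods is left to the paper.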
