Data Models for Dataset Drift Controls in Machine Learning With Images

11/04/2022
by Luis Oala, et al.

Camera images are ubiquitous in machine learning research. They also play a central role in the delivery of important services spanning medicine and environmental surveying. However, the application of machine learning models in these domains has been limited because of robustness concerns. A primary failure mode is a drop in performance caused by differences between the training and deployment data. While there are methods to prospectively validate the robustness of machine learning models to such dataset drifts, existing approaches do not account for explicit models of the primary object of interest: the data. This makes it difficult to create physically faithful drift test cases or to specify data models that should be avoided when deploying a machine learning model. In this study, we demonstrate how these shortcomings can be overcome by pairing machine learning robustness validation with physical optics. We examine the role that raw sensor data and differentiable data models can play in controlling performance risks related to image dataset drift. The findings are distilled into three applications. First, drift synthesis enables the controlled generation of physically faithful drift test cases. The experiments presented here show that the average decrease in model performance is four to ten times less severe than under post-hoc augmentation testing. Second, the gradient connection between task and data models allows for drift forensics, which can be used to specify performance-sensitive data models that should be avoided during deployment of a machine learning model. Third, drift adjustment opens up the possibility of processing adjustments in the face of drift. This can speed up and stabilize classifier training by a margin of up to 20% in accuracy. A guide to accessing the open code and datasets is available at https://github.com/aiaudit-org/raw2logit.
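To make the gradient connection between data model and task model concrete, the following minimal sketch (PyTorch-style; it is not the raw2logit repository's actual API, and the DataModel class, its gain/gamma parameters, and the toy classifier are illustrative assumptions) chains a differentiable raw-processing model to a classifier so that the task loss can be differentiated with respect to the processing parameters, the mechanism underlying drift forensics and drift adjustment.

# Minimal sketch: a differentiable "data model" turning raw sensor readings
# into an image, chained to a task model, so gradients flow back into the
# processing parameters. All names are illustrative, not the repository's API.
import torch
import torch.nn as nn

class DataModel(nn.Module):
    """Differentiable stand-in for an image signal processing pipeline."""
    def __init__(self):
        super().__init__()
        self.gain = nn.Parameter(torch.tensor(1.0))   # sensor/ISO gain
        self.gamma = nn.Parameter(torch.tensor(2.2))  # tone-mapping exponent

    def forward(self, raw):
        # raw: (B, 1, H, W) normalized raw sensor intensities in [0, 1]
        img = torch.clamp(self.gain * raw, 1e-6, 1.0)
        return img ** (1.0 / self.gamma)

data_model = DataModel()
task_model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 10))  # toy classifier

raw = torch.rand(4, 1, 32, 32)            # dummy raw captures
labels = torch.randint(0, 10, (4,))

logits = task_model(data_model(raw))      # raw -> image -> prediction, end to end
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()

# Drift forensics: gradients w.r.t. processing parameters indicate which
# acquisition/processing settings the task model is most sensitive to.
print(data_model.gain.grad, data_model.gamma.grad)

In the study itself, the data model would be a full differentiable processing pipeline fitted to real raw sensor captures; the two scalar parameters above merely illustrate how the task loss's sensitivity to processing settings can be read off their gradients.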


