Detecting model drift using polynomial relations

10/24/2021
by Eliran Roffe, et al.

Machine learning (ML) models serve critical functions, such as classifying loan applicants as good or bad risks. Each model is trained under the assumption that the training data and the data encountered in the field come from the same underlying, unknown distribution. In practice, this assumption is often violated. It is desirable to identify when this occurs in order to minimize the impact on model performance. We suggest a new approach to detecting change in the data distribution by identifying polynomial relations between the data features. We measure the strength of each identified relation using its R-squared value. A strong polynomial relation captures a significant trait of the data, which should remain stable if the data distribution does not change. We therefore use a set of learned strong polynomial relations to identify drift. For each polynomial relation that is stronger than a given threshold, we calculate the amount of drift observed for that relation. The amount of drift is estimated by calculating the Bayes factor for the likelihood of the polynomial relation on the baseline data versus the field data. We empirically validate the approach by simulating a range of changes in three publicly available data sets, and demonstrate the ability to identify drift using the Bayes factor of the change in the polynomial relation likelihood.
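
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact Bayes factor computation, so the sketch substitutes a simpler Gaussian log-likelihood ratio as a stand-in. The synthetic data, the single relation y ≈ a·(x1·x2) + b, the 0.9 R-squared threshold, and all function names are assumptions made for illustration only.

```python
import math
import random


def fit_relation(z, y):
    """Least-squares fit of y ~ a*z + b, where z is a polynomial term
    of the features (here x1*x2). Returns (a, b, sigma, r_squared)."""
    n = len(z)
    mean_z = sum(z) / n
    mean_y = sum(y) / n
    s_zz = sum((zi - mean_z) ** 2 for zi in z)
    s_zy = sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y))
    a = s_zy / s_zz
    b = mean_y - a * mean_z
    ss_res = sum((yi - (a * zi + b)) ** 2 for zi, yi in zip(z, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    sigma = math.sqrt(ss_res / n)  # maximum-likelihood residual std
    return a, b, sigma, 1.0 - ss_res / ss_tot


def log_likelihood(z, y, a, b, sigma):
    """Gaussian log-likelihood of the residuals of y ~ a*z + b."""
    ss_res = sum((yi - (a * zi + b)) ** 2 for zi, yi in zip(z, y))
    n = len(z)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - ss_res / (2 * sigma ** 2)


def drift_score(baseline_fit, z_field, y_field):
    """Log-likelihood ratio on field data: relation refit on the field data
    versus the relation learned on the baseline. A simplified stand-in for
    the Bayes factor mentioned in the abstract (assumption)."""
    a, b, sigma, _ = baseline_fit
    a_f, b_f, sigma_f, _ = fit_relation(z_field, y_field)
    return (log_likelihood(z_field, y_field, a_f, b_f, sigma_f)
            - log_likelihood(z_field, y_field, a, b, sigma))


def sample(rng, n, coef):
    """Synthetic data (hypothetical): y = coef * x1 * x2 + 1 + noise."""
    z, y = [], []
    for _ in range(n):
        x1, x2 = rng.uniform(1, 5), rng.uniform(1, 5)
        z.append(x1 * x2)
        y.append(coef * x1 * x2 + 1.0 + rng.gauss(0, 0.5))
    return z, y


rng = random.Random(0)
R2_THRESHOLD = 0.9  # hypothetical strength threshold

# Learn the relation on baseline data and keep it only if it is strong.
z_base, y_base = sample(rng, 500, coef=2.0)
baseline_fit = fit_relation(z_base, y_base)
r2 = baseline_fit[3]
is_strong = r2 >= R2_THRESHOLD

# Score field data drawn from the same process and from a changed one.
score_same = drift_score(baseline_fit, *sample(rng, 500, coef=2.0))
score_drift = drift_score(baseline_fit, *sample(rng, 500, coef=2.5))

print(f"R^2 = {r2:.3f}, strong = {is_strong}")
print(f"score without drift = {score_same:.2f}")
print(f"score with drift    = {score_drift:.2f}")
```

In this sketch a score near zero means the baseline relation still explains the field data about as well as a freshly fitted one, while a large score suggests the relation has changed. A full Bayes factor would compare marginal likelihoods rather than maximized ones.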

research
11/10/2021

Automatically detecting data drift in machine learning classifiers

Classifiers and other statistics-based machine learning (ML) techniques ...
research
08/29/2019

An Auto-ML Framework Based on GBDT for Lifelong Learning

Automatic Machine Learning (Auto-ML) has attracted more and more attenti...
research
08/09/2020

Concept Drift Detection: Dealing with Missing Values via Fuzzy Distance Estimations

In data streams, the data distribution of arriving observations at diffe...
research
05/28/2023

Reliable and Interpretable Drift Detection in Streams of Short Texts

Data drift is the change in model input data that is one of the key fact...
research
05/21/2022

Temporal Domain Generalization with Drift-Aware Dynamic Neural Network

Temporal domain generalization is a promising yet extremely challenging ...
research
04/21/2022

The Silent Problem – Machine Learning Model Failure – How to Diagnose and Fix Ailing Machine Learning Models

The COVID-19 pandemic has dramatically changed how healthcare is deliver...
research
03/27/2021

Human-in-the-loop Handling of Knowledge Drift

We introduce and study knowledge drift (KD), a complex form of drift tha...
