DeepAI AI Chat
Log In Sign Up

Detecting model drift using polynomial relations

by   Eliran Roffe, et al.

Machine learning (ML) models serve critical functions, such as classifying loan applicants as good or bad risks. Each model is trained under the assumption that the data used in training, and the data used in field come from the same underlying unknown distribution. Often this assumption is broken in practice. It is desirable to identify when this occurs in order to minimize the impact on model performance. We suggest a new approach to detect change in the data distribution by identifying polynomial relations between the data features. We measure the strength of each identified relation using its R-square value. A strong polynomial relation captures a significant trait of the data which should remain stable if the data distribution does not change. We thus use a set of learned strong polynomial relations to identify drift. For a set of polynomial relations that are stronger than a given desired threshold, we calculate the amount of drift observed for that relation. The amount of drift is estimated by calculating the Bayes Factor for the polynomial relation likelihood of the baseline data versus field data. We empirically validate the approach by simulating a range of changes in three publicly-available data sets, and demonstrate the ability to identify drift using the Bayes Factor of the polynomial relation likelihood change.


page 1

page 2

page 3

page 4


Automatically detecting data drift in machine learning classifiers

Classifiers and other statistics-based machine learning (ML) techniques ...

An Auto-ML Framework Based on GBDT for Lifelong Learning

Automatic Machine Learning (Auto-ML) has attracted more and more attenti...

Concept Drift Detection: Dealing with MissingValues via Fuzzy Distance Estimations

In data streams, the data distribution of arriving observations at diffe...

Temporal Domain Generalization with Drift-Aware Dynamic Neural Network

Temporal domain generalization is a promising yet extremely challenging ...

A probability theoretic approach to drifting data in continuous time domains

The notion of drift refers to the phenomenon that the distribution, whic...

The Silent Problem – Machine Learning Model Failure – How to Diagnose and Fix Ailing Machine Learning Models

The COVID-19 pandemic has dramatically changed how healthcare is deliver...

Concept Drift and Covariate Shift Detection Ensemble with Lagged Labels

In model serving, having one fixed model during the entire often life-lo...