Algebraic Ground Truth Inference: Non-Parametric Estimation of Sample Errors by AI Algorithms

06/15/2020
by Andrés Corrada-Emmanuel, et al.

Binary classification is widely used in ML production systems, and methods for monitoring classifiers in a constrained event space are well known. However, real-world production systems often lack the ground truth these methods require, and privacy concerns may also prevent the ground truth needed to evaluate the classifiers from being made available. In these autonomous settings, non-parametric estimators of performance are an attractive solution. They do not require a theoretical model of how the classifiers made errors in any given sample; they only estimate how many errors there are in a sample of an industrial or robotic datastream. We construct one such non-parametric estimator of the sample errors for an ensemble of weak binary classifiers. Our approach uses algebraic geometry to reformulate the self-assessment problem for ensembles of binary classifiers as an exact polynomial system. The polynomial formulation can then be used to prove, as an algebraic geometry algorithm, that no general solution to the self-assessment problem is possible. However, specific solutions are possible in settings where the engineering context puts the classifiers close to independent errors. The practical utility of the method is illustrated on a real-world dataset from an online advertising campaign and on a sample of common classification benchmarks. In the experiments where we have ground truth, the accuracy estimates are accurate to better than one part in a hundred. The estimates for the online advertising campaign data, where we do not have ground truth, are verified by an internal consistency approach whose validity we conjecture as an algebraic geometry theorem. We call this approach algebraic ground truth inference.
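
To make the idea of an exact polynomial system concrete, here is a minimal sketch of the forward model for the special case of independent errors. It is an illustration under stated assumptions, not the paper's own formulation: the function name `vote_pattern_frequencies`, the parameter names, and the numeric values are all hypothetical. Each observable vote-pattern frequency is a polynomial in the unknown prevalence of label A and the per-label accuracies of each classifier. For three classifiers there are eight patterns, so seven independent frequencies, against seven unknowns.

```python
from itertools import product


def vote_pattern_frequencies(prevalence, acc_a, acc_b):
    """Frequencies of every vote pattern for independent binary classifiers.

    Assumed (hypothetical) parameterization:
      prevalence -- fraction of items whose true label is A
      acc_a[i]   -- probability that classifier i votes A on an A item
      acc_b[i]   -- probability that classifier i votes B on a B item
    A vote of 1 means "A" and a vote of 0 means "B".
    """
    freqs = {}
    for votes in product((1, 0), repeat=len(acc_a)):
        term_a = prevalence          # contribution of true-A items
        term_b = 1.0 - prevalence    # contribution of true-B items
        for vote, a, b in zip(votes, acc_a, acc_b):
            term_a *= a if vote else 1.0 - a
            term_b *= (1.0 - b) if vote else b
        freqs[votes] = term_a + term_b
    return freqs


# Illustrative values only; they are not taken from the paper.
prevalence = 0.3
acc_a = [0.8, 0.7, 0.9]
acc_b = [0.75, 0.85, 0.6]
freqs = vote_pattern_frequencies(prevalence, acc_a, acc_b)

# Swapping the roles of the two labels yields identical observable frequencies.
mirror = vote_pattern_frequencies(
    1.0 - prevalence,
    [1.0 - b for b in acc_b],
    [1.0 - a for a in acc_a],
)
```

In this sketch `freqs` and `mirror` agree on every pattern, so the observable frequencies alone cannot distinguish a parameter set from its label-swapped counterpart. Any inversion of such a system therefore needs some outside assumption to pick between the two, for example that the classifiers are better than random.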

