Humanly Certifying Superhuman Classifiers

09/16/2021
by   Qiongkai Xu, et al.
0

Estimating the performance of a machine learning system is a longstanding challenge in artificial intelligence research. Today, this challenge is especially relevant given the emergence of systems which appear to increasingly outperform human beings. In some cases, this "superhuman" performance is readily demonstrated; for example by defeating legendary human players in traditional two player games. On the other hand, it can be challenging to evaluate classification models that potentially surpass human performance. Indeed, human annotations are often treated as a ground truth, which implicitly assumes the superiority of the human over any models trained on human annotations. In reality, human annotators can make mistakes and be subjective. Evaluating the performance with respect to a genuine oracle may be more objective and reliable, even when querying the oracle is expensive or impossible. In this paper, we first raise the challenge of evaluating the performance of both humans and models with respect to an oracle which is unobserved. We develop a theory for estimating the accuracy compared to the oracle, using only imperfect human annotations for reference. Our analysis provides a simple recipe for detecting and certifying superhuman performance in this setting, which we believe will assist in understanding the stage of current research on classification. We validate the convergence of the bounds and the assumptions of our theory on carefully designed toy experiments with known oracles. Moreover, we demonstrate the utility of our theory by meta-analyzing large-scale natural language processing tasks, for which an oracle does not exist, and show that under our assumptions a number of models from recent years are with high probability superhuman.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/07/2020

Learning from Imperfect Annotations

Many machine learning systems today are trained on large amounts of huma...
research
02/01/2022

Rethinking the notion of oracle: A link between synthetic descriptive set theory and effective topos theory

We present three different perspectives of oracle. First, an oracle is a...
research
02/01/2019

The Hanabi Challenge: A New Frontier for AI Research

From the early days of computing, games have been important testbeds for...
research
05/23/2019

On modelling the emergence of logical thinking

Recent progress in machine learning techniques have revived interest in ...
research
10/23/2018

What can AI do for me: Evaluating Machine Learning Interpretations in Cooperative Play

Machine learning is an important tool for decision making, but its ethic...
research
08/03/2023

LOUC: Leave-One-Out-Calibration Measure for Analyzing Human Matcher Performance

Schema matching is a core data integration task, focusing on identifying...
research
02/17/2022

Human-Algorithm Collaboration: Achieving Complementarity and Avoiding Unfairness

Much of machine learning research focuses on predictive accuracy: given ...

Please sign up or login with your details

Forgot password? Click here to reset