Optimal and Efficient Binary Questioning for Human-in-the-Loop Annotation

07/04/2023
by Franco Marchesoni-Acland, et al.

Even though data annotation is extremely important for the interpretability, research, and development of artificial intelligence solutions, most research efforts, such as active learning or few-shot learning, focus on the sample efficiency problem. This paper studies the neglected complementary problem of obtaining annotated data given a predictor. For the simple binary classification setting, we present the spectrum ranging from optimal general solutions to practical efficient methods. The problem is framed as the full annotation of a binary classification dataset with the minimal number of yes/no questions when a predictor is available. For the case of general binary questions, the solution is found in coding theory, where the optimal questioning strategy is given by the Huffman encoding of the possible labelings. However, this approach is computationally intractable even for small dataset sizes. We propose an alternative practical solution based on several heuristics and lookahead minimization of proxy cost functions. The proposed solution is analysed, compared with optimal solutions, and evaluated on several synthetic and real-world datasets. On these datasets, the method allows a significant improvement (23-86%) in annotation efficiency.
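The link between Huffman coding and the number of yes/no questions can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the predictor outputs an independent positive-class probability for each sample (an assumption made here for illustration), and the function names `labeling_probs`, `huffman_expected_questions`, and `entropy_bits` are hypothetical. Each internal node of the Huffman tree corresponds to one binary question, so the expected codeword length is the expected number of questions needed to identify the full labeling.

```python
import heapq
import itertools
import math


def labeling_probs(p):
    """Probability of each of the 2^n labelings.

    Assumption (not from the source): labels are independent and p[i]
    is the predictor's probability that sample i is positive.
    """
    probs = []
    for labels in itertools.product([0, 1], repeat=len(p)):
        q = 1.0
        for p_i, y in zip(p, labels):
            q *= p_i if y == 1 else 1.0 - p_i
        probs.append(q)
    return probs


def huffman_expected_questions(probs):
    """Expected Huffman codeword length, i.e. expected number of yes/no questions.

    Uses the standard identity: the expected length equals the sum of the
    weights of all merged (internal) nodes of the Huffman tree.
    """
    heap = list(probs)
    heapq.heapify(heap)
    expected = 0.0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        expected += a + b  # one more question for every labeling under this node
        heapq.heappush(heap, a + b)
    return expected


def entropy_bits(probs):
    """Shannon entropy in bits, a lower bound on the expected number of questions."""
    return -sum(q * math.log2(q) for q in probs if q > 0)


if __name__ == "__main__":
    p = [0.1, 0.1, 0.1, 0.1]  # hypothetical predictor outputs for 4 samples
    probs = labeling_probs(p)
    print("labelings enumerated:", len(probs))
    print("expected questions (Huffman):", huffman_expected_questions(probs))
    print("entropy lower bound (bits):", entropy_bits(probs))
    print("one question per sample:", len(p))
```

The sketch enumerates all 2^n labelings before building the tree, so its memory and time grow exponentially with the number of samples, which is consistent with the intractability noted above.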


