ED2: Two-stage Active Learning for Error Detection – Technical Report

08/17/2019
by Felix Neutatz, et al.

Traditional error detection approaches require user-defined parameters and rules, so the user has to know both the error detection system and the data. However, error detection can also be formulated as a semi-supervised classification problem that requires only domain expertise. The challenges for such an approach are twofold: (1) to represent the data in a way that enables a classification model to identify various kinds of data errors, and (2) to pick the most promising data values for learning. In this paper, we address these challenges with ED2, our new example-driven error detection method. First, we present a new two-dimensional multi-classifier sampling strategy for active learning. Second, we propose novel multi-column features. Combined, these techniques provide fast convergence of the classification task with high detection accuracy. On several real-world datasets, ED2 requires, on average, less than 1% labels to outperform existing error detection approaches. This report extends the peer-reviewed paper "ED2: A Case for Active Learning in Error Detection". All source code related to this project is available on GitHub.
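To make the sampling idea concrete, the following is a minimal sketch of a two-stage selection step, not the authors' implementation. It assumes one classifier per column that outputs an error probability for every cell, and it assumes that "two-dimensional" means first choosing a column and then choosing cells within that column. The function names (`uncertainty`, `pick_column`, `pick_cells`, `next_batch`), the mean-uncertainty criterion, and the example probabilities are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def uncertainty(p: float) -> float:
    """Return 1.0 when p == 0.5 (least certain) and 0.0 when p is 0 or 1."""
    return 1.0 - abs(2.0 * p - 1.0)


def pick_column(prob_by_column: Dict[str, List[float]]) -> str:
    """Stage 1 (assumed criterion): column with the highest mean uncertainty.

    Every column is assumed to contain at least one cell.
    """
    def mean_uncertainty(col: str) -> float:
        probs = prob_by_column[col]
        return sum(uncertainty(p) for p in probs) / len(probs)

    return max(prob_by_column, key=mean_uncertainty)


def pick_cells(probs: List[float], labeled: Set[int], k: int) -> List[int]:
    """Stage 2: up to k unlabeled row indices, most uncertain first."""
    candidates = [i for i in range(len(probs)) if i not in labeled]
    candidates.sort(key=lambda i: uncertainty(probs[i]), reverse=True)
    return candidates[:k]


def next_batch(
    prob_by_column: Dict[str, List[float]],
    labeled_by_column: Dict[str, Set[int]],
    k: int = 2,
) -> Tuple[str, List[int]]:
    """Return the column and the row indices to show to the user next."""
    col = pick_column(prob_by_column)
    already_labeled = labeled_by_column.get(col, set())
    return col, pick_cells(prob_by_column[col], already_labeled, k)


# Hypothetical error probabilities, one list per column, one entry per row.
example_probs = {
    "zip": [0.05, 0.95, 0.10, 0.90],   # classifier is already confident
    "city": [0.45, 0.60, 0.20, 0.52],  # classifier is still unsure
}
example_labeled = {"city": {0}}        # row 0 of "city" was labeled earlier

if __name__ == "__main__":
    print(next_batch(example_probs, example_labeled, k=2))
```

In a full active learning loop, the user would label the returned cells as erroneous or clean, the classifier for that column would be retrained, and the step would repeat until the labeling budget is spent.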


