Active Learning from Crowd in Document Screening

11/11/2020
by Evgeny Krivosheev, et al.

In this paper, we explore how to efficiently combine crowdsourcing and machine intelligence for the problem of document screening, where we need to screen documents with a set of machine-learning filters. Specifically, we focus on building a set of machine learning classifiers that evaluate documents and then screen them efficiently. This is a challenging task because the budget is limited and there are countless ways to spend it. We propose a screening-specific, multi-label active learning sampling technique, objective-aware sampling, for querying unlabelled documents for annotation. Our algorithm decides which machine filter needs more training data and how to choose unlabelled items to annotate so as to minimize the risk of overall classification errors rather than the error of a single filter. We demonstrate that objective-aware sampling significantly outperforms state-of-the-art active learning sampling strategies.
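To make the contrast between minimizing overall screening error and minimizing a single filter's error more concrete, here is a minimal Python sketch. It is not the authors' algorithm, which the abstract does not spell out. It assumes, purely for illustration, that a document is kept only if it passes every filter, and it uses a hypothetical scoring heuristic: a filter's uncertainty on a document is weighted by the probability that all other filters let that document through. The names `uncertainty`, `objective_aware_pick`, and `per_filter_uncertainty_pick` are invented for this example.

```python
from math import prod


def uncertainty(p):
    """Binary uncertainty: 1.0 at p = 0.5, falling to 0.0 at p = 0 or p = 1."""
    return 1.0 - abs(2.0 * p - 1.0)


def per_filter_uncertainty_pick(pass_probs):
    """Baseline: pick the (document, filter) pair whose filter is most uncertain,
    ignoring what the other filters say about that document."""
    pairs = [(i, f) for i, probs in enumerate(pass_probs) for f in range(len(probs))]
    return max(pairs, key=lambda pair: uncertainty(pass_probs[pair[0]][pair[1]]))


def objective_aware_pick(pass_probs):
    """Illustrative heuristic (an assumption, not the paper's method).

    pass_probs[i][f] is the current estimate that document i passes filter f.
    A label for filter f on document i only matters for the final decision if
    the other filters are likely to let the document through, so the filter's
    uncertainty is weighted by the product of the other pass probabilities.
    """
    best_pair, best_score = None, -1.0
    for i, probs in enumerate(pass_probs):
        for f, p in enumerate(probs):
            others_pass = prod(q for g, q in enumerate(probs) if g != f)
            score = uncertainty(p) * others_pass
            if score > best_score:
                best_pair, best_score = (i, f), score
    return best_pair


# Document 0: filter 0 is maximally unsure, but filter 1 almost surely rejects it.
# Document 1: filter 0 is fairly unsure and filter 1 almost surely accepts it.
estimates = [[0.5, 0.05], [0.6, 0.9]]
baseline_choice = per_filter_uncertainty_pick(estimates)
objective_choice = objective_aware_pick(estimates)
```

In this toy input the baseline spends its query on document 0, whose fate is already nearly settled by the other filter, while the weighted score prefers document 1, where a new label is more likely to change the overall screening decision.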

04/01/2019

Combining Crowd and Machines for Multi-predicate Item Screening

This paper discusses how crowd and machine classifiers can be efficientl...
04/03/2019

Empirical Evaluations of Active Learning Strategies in Legal Document Review

One type of machine learning, text classification, is now regularly appl...
03/21/2018

Crowd-Machine Collaboration for Item Screening

In this paper we describe how crowd and machine classifier can be effici...
01/21/2021

Active Hybrid Classification

Hybrid crowd-machine classifiers can achieve superior performance by com...
10/12/2020

Active learning with RESSPECT: Resource allocation for extragalactic astronomical transients

The recent increase in volume and complexity of available astronomical d...
10/01/2021

OPAD: An Optimized Policy-based Active Learning Framework for Document Content Analysis

Documents are central to many business systems, and include forms, repor...
10/19/2012

Budgeted Learning of Naive-Bayes Classifiers

Frequently, acquiring training data has an associated cost. We consider ...