Optimising Human-Machine Collaboration for Efficient High-Precision Information Extraction from Text Documents

02/18/2023
by Bradley Butcher, et al.

While humans can extract information from unstructured text with high precision and recall, this is often too time-consuming to be practical. Automated approaches, on the other hand, produce near-immediate results, but may not be reliable enough for high-stakes applications where precision is essential. In this work, we consider the benefits and drawbacks of various human-only, human-machine, and machine-only information extraction approaches. We argue for the utility of a human-in-the-loop approach in applications where high precision is required, but purely manual extraction is infeasible. We present a framework and an accompanying tool for information extraction using weak-supervision labelling with human validation. We demonstrate our approach on three criminal justice datasets. We find that the combination of computer speed and human understanding yields precision comparable to manual annotation while requiring only a fraction of the time, and significantly outperforms fully automated baselines in terms of precision.

research
12/25/2020

A Cascaded Residual UNET for Fully Automated Segmentation of Prostate and Peripheral Zone in T2-weighted 3D Fast Spin Echo Images

Multi-parametric MR images have been shown to be effective in the non-in...
research
05/05/2021

Iterative Human and Automated Identification of Wildlife Images

Camera trapping is increasingly used to monitor wildlife, but this techn...
research
11/05/2018

The one comparing narrative social network extraction techniques

Analysing narratives through their social networks is an expanding field...
research
12/04/2018

Information Extraction Framework to Build Legislation Network

This paper concerns an Information Extraction process for building a dyn...
research
12/12/2018

Towards Automating Precision Studies of Clone Detectors

Current research in clone detection suffers from poor ecosystems for eva...
research
05/24/2023

A Human-in-the-Loop Approach for Information Extraction from Privacy Policies under Data Scarcity

Machine-readable representations of privacy policies are door openers fo...
research
07/07/2020

Unsupervised Data Extraction from Computer-generated Documents with Single Line Formatting

Processing large amounts of data is an essential problem of the big data...
