End-to-End Information Extraction without Token-Level Supervision

07/16/2017
by Rasmus Berg Palm, et al.

Most state-of-the-art information extraction (IE) approaches rely on token-level labels to find the areas of interest in text. Unfortunately, these labels are time-consuming and costly to create and are consequently unavailable for many real-life IE tasks. To make matters worse, token-level labels are usually not the desired output, but just an intermediate step. End-to-end (E2E) models, which take raw text as input and produce the desired output directly, need not depend on token-level labels. We propose an E2E model based on pointer networks, which can be trained directly on pairs of raw input and output text. We evaluate our model on the ATIS data set, the MIT restaurant corpus, and the MIT movie corpus, and compare it to neural baselines that do use token-level labels. We achieve competitive results, within a few percentage points of the baselines, showing the feasibility of E2E information extraction without the need for token-level labels. This opens up new possibilities: for many tasks currently addressed by human extractors, raw input and output data are available, but token-level labels are not.
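To make the pointer idea in the abstract concrete, here is a minimal sketch of a single pointing step: the decoder scores every input position and copies the token it attends to most. This is an illustration under stated assumptions, not the authors' model. The dot-product scoring, the function names, the example sentence, and the hand-picked two-dimensional vectors are all hypothetical; a trained model would learn its states and would likely use a different scoring function.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pointer_step(encoder_states, decoder_state):
    """Return a probability distribution over input positions.

    Assumption: plain dot-product scoring, chosen for brevity.
    """
    scores = [
        sum(e * d for e, d in zip(state, decoder_state))
        for state in encoder_states
    ]
    return softmax(scores)


# Hypothetical input sentence and toy (not learned) encoder states.
tokens = ["flights", "from", "boston", "to", "denver"]
encoder_states = [
    [0.1, 0.0],
    [0.0, 0.1],
    [2.0, 0.1],
    [0.0, 0.2],
    [0.1, 2.0],
]

# Hypothetical decoder state while filling a "departure city" field.
decoder_state = [3.0, 0.0]

probs = pointer_step(encoder_states, decoder_state)
best = max(range(len(tokens)), key=probs.__getitem__)
copied = tokens[best]
print(copied)
```

Because the output is produced by pointing at input positions, training in this sketch would need only the target string for each field (for example, the departure city) and not a label for every input token.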


research
12/18/2018

Attend, Copy, Parse - End-to-end information extraction from documents

Document information extraction tasks performed by humans create data co...
research
10/28/2020

CopyNext: Explicit Span Copying and Alignment in Sequence to Sequence Models

Copy mechanisms are employed in sequence to sequence models (seq2seq) to...
research
10/05/2022

Token Classification for Disambiguating Medical Abbreviations

Abbreviations are unavoidable yet critical parts of the medical text. Us...
research
05/28/2021

ByT5: Towards a token-free future with pre-trained byte-to-byte models

Most widely-used pre-trained language models operate on sequences of tok...
research
05/06/2018

Zero-shot Sequence Labeling: Transferring Knowledge from Sentences to Tokens

Can attention- or gradient-based visualization techniques be used to inf...
research
08/02/2022

Lost in Space Marking

We look at a decision taken early in training a subword tokenizer, namel...
research
09/06/2021

End-to-end Neural Information Status Classification

Most previous studies on information status (IS) classification and brid...
