Zero-shot Task Transfer for Invoice Extraction via Class-aware QA Ensemble

08/13/2021
by Prithiviraj Damodaran, et al.

We present VESPA, an intentionally simple yet novel zero-shot system for layout-, locale-, and domain-agnostic document extraction. Although large corpora of documents are available, the lack of labeled and validated datasets makes it challenging for enterprises to discriminatively train document extraction models. We show that this problem can be addressed by simply transferring the information extraction (IE) task to a natural language question-answering (QA) task, without engineering task-specific architectures. We demonstrate the effectiveness of our system by evaluating it on a closed corpus of real-world retail and tax invoices spanning multiple complex layouts, domains, and geographies. The empirical evaluation shows that our system outperforms 4 prominent commercial invoice solutions that use discriminatively trained models with architectures specifically crafted for invoice extraction. Our system extracts 6 fields with an average F1 of 87.50, with zero upfront human annotation or training.
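To make the IE-to-QA transfer concrete, here is a minimal Python sketch. It is not VESPA's implementation: the field names, the question templates in `FIELD_QUESTIONS`, the `QAModel` callable interface, the keyword-matching `toy_model`, and the score-weighted voting used to combine answers are all hypothetical assumptions for illustration. Each target field is phrased as one or more natural-language questions, any extractive QA model answers them against the document text, and the answers are aggregated per field.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical field-to-question templates (not taken from the paper).
FIELD_QUESTIONS: Dict[str, List[str]] = {
    "invoice_number": ["What is the invoice number?", "What is the bill number?"],
    "total_amount": ["What is the total amount?", "What is the amount due?"],
}

# Assumed interface: any QA model is a callable (question, context) -> (answer, score).
QAModel = Callable[[str, str], Tuple[str, float]]


def extract_fields(
    document_text: str,
    field_questions: Dict[str, List[str]],
    models: List[QAModel],
) -> Dict[str, str]:
    """Ask every question for every field with every model; keep the answer
    with the highest summed score (an assumed voting scheme)."""
    results: Dict[str, str] = {}
    for field, questions in field_questions.items():
        votes: Dict[str, float] = defaultdict(float)
        for model in models:
            for question in questions:
                answer, score = model(question, document_text)
                if answer:  # skip empty (no-answer) predictions
                    votes[answer.strip()] += score
        if votes:
            results[field] = max(votes, key=votes.get)
    return results


def toy_model(question: str, context: str) -> Tuple[str, float]:
    """Hypothetical stand-in for a real extractive QA model, using regexes."""
    if "number" in question:
        match = re.search(r"Invoice No[.:]?\s*(\S+)", context)
    else:
        match = re.search(r"Total:?\s*([\d.,]+)", context)
    return (match.group(1), 0.9) if match else ("", 0.0)


SAMPLE_INVOICE = "ACME Retail\nInvoice No: INV-001\nTotal: 123.45"

if __name__ == "__main__":
    print(extract_fields(SAMPLE_INVOICE, FIELD_QUESTIONS, [toy_model]))
    # prints {'invoice_number': 'INV-001', 'total_amount': '123.45'}
```

Swapping `toy_model` for real extractive QA models changes nothing in `extract_fields`. This mirrors the abstract's claim that the transfer needs questions rather than a field-specific architecture.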


