One-shot Information Extraction from Document Images using Neuro-Deductive Program Synthesis

06/06/2019
by Vishal Sunder, et al.

Our interest in this paper is in meeting a rapidly growing industrial demand for information extraction from images of documents such as invoices, bills, and receipts. In practice, users are able to provide only a very small number of example images labeled with the information that needs to be extracted. We adopt a novel two-level neuro-deductive approach in which (a) we use pre-trained deep neural networks to populate a relational database with facts about each document image; and (b) we use a form of deductive reasoning, related to meta-interpretive learning of transition systems, to learn extraction programs. Given task-specific transitions defined using the entities and relations identified by the neural detectors, and a small number of instances (usually 1, sometimes 2) of images with the desired outputs, a resource-bounded meta-interpreter constructs proofs for the instance(s) via logical deduction; a set of logic programs that extract each desired entity is then easily synthesized from these proofs. In most cases, a single training example together with a noisy clone of itself suffices to learn a program set that generalizes well to test documents. At test time, the value of each entity is determined by a majority vote across its program set. We demonstrate our two-level neuro-deductive approach on publicly available datasets ("Patent" and "Doctor's Bills") and also describe its use in a real-life industrial problem.
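The test-time step described above, a majority vote across the program set learned for one entity, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the triple-based fact store, the relation names `right_of` and `below`, the helper `relation_program`, and the sample values are all assumptions made for illustration, and plain Python closures stand in for the synthesized logic programs.

```python
from collections import Counter
from typing import Callable, Optional

# Hypothetical fact store: (relation, anchor, value) triples of the kind a
# neural detector might write into the relational database for one image.
Fact = tuple[str, str, str]
Program = Callable[[set[Fact]], Optional[str]]


def relation_program(relation: str, anchor: str) -> Program:
    """Build a toy extraction program: return the value linked to `anchor`
    by `relation`, or None if the document has no such fact."""
    def program(facts: set[Fact]) -> Optional[str]:
        # Sort so the result is deterministic if several facts match.
        matches = sorted(v for r, a, v in facts if r == relation and a == anchor)
        return matches[0] if matches else None
    return program


def majority_vote(programs: list[Program], facts: set[Fact]) -> Optional[str]:
    """Run every program in the set and return the most frequent answer.
    Programs that fail to produce a value do not vote."""
    votes = Counter(v for p in programs if (v := p(facts)) is not None)
    return votes.most_common(1)[0][0] if votes else None


# Illustrative (made-up) facts for one invoice image.
facts: set[Fact] = {
    ("right_of", "Invoice No", "INV-42"),
    ("below", "Invoice No", "INV-42"),
    ("right_of", "Date", "06/06/2019"),
}

# A hypothetical program set for the entity "invoice number"; the third
# program is deliberately wrong, to show the vote tolerating a bad program.
programs: list[Program] = [
    relation_program("right_of", "Invoice No"),
    relation_program("below", "Invoice No"),
    relation_program("right_of", "Date"),
]

invoice_number = majority_vote(programs, facts)
```

Here two of the three programs agree, so the vote returns their shared value even though the third program extracts the wrong field. Voting over several programs rather than trusting a single one is what lets a program set learned from one example absorb layout variation and detector noise.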

research
04/11/2022

Landmarks and Regions: A Robust Approach to Data Extraction

We propose a new approach to extracting data items or field values from ...
research
10/19/2021

Using Program Synthesis and Inductive Logic Programming to solve Bongard Problems

The ability to recognise and make analogies is often used as a measure o...
research
11/15/2017

WebRelate: Integrating Web Data with Spreadsheets using Examples

Data integration between web sources and relational data is a key challe...
research
12/11/2018

Deep Reader: Information extraction from Document images via relation extraction and Natural Language

Recent advancements in the area of Computer Vision with state-of-art Neu...
research
06/15/2023

Document Entity Retrieval with Massive and Noisy Pre-training

Visually-Rich Document Entity Retrieval (VDER) is a type of machine lear...
research
12/06/2019

Integrating Deep Learning with Logic Fusion for Information Extraction

Information extraction (IE) aims to produce structured information from ...
research
01/24/2023

Sherlock in OSS: A Novel Approach of Content-Based Searching in Object Storage System

Object Storage Systems (OSS) inside a cloud promise scalability, durabil...
