End-to-End Document Classification and Key Information Extraction using Assignment Optimization

06/01/2023
by Ciaran Cooney, et al.

We propose end-to-end document classification and key information extraction (KIE) for automating the processing of form documents. Accurate document classification lets us harness known information from templates to enhance KIE from forms. We use text and layout encoding with a cosine similarity measure to classify visually similar documents. We then demonstrate a novel application of mixed integer programming, using assignment optimization to extract key information from documents. Our approach is validated on an in-house dataset of noisy scanned forms. The best-performing document classification approach achieved an F1 score of 0.97. A mean F1 score of 0.94 for the KIE task suggests there is significant potential in applying optimization techniques. Ablation results show that the method relies on document preprocessing techniques to mitigate Type II errors and achieve optimal performance.
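The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the embedding vectors, template names, field positions, and the helper names `cosine_similarity`, `classify`, and `assign_fields` are all hypothetical, and the assignment step is solved by exhaustive search over a tiny cost matrix rather than by the mixed integer program used in the paper. The cost of pairing a field with a text box is assumed here to be the Euclidean distance between the field's expected position on the template and the box's detected position.

```python
import math
from itertools import permutations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def classify(doc_vec, template_vecs):
    """Return the name of the template most similar to the document."""
    return max(
        template_vecs,
        key=lambda name: cosine_similarity(doc_vec, template_vecs[name]),
    )


def assign_fields(cost):
    """Find the minimum-cost one-to-one assignment of fields to text boxes.

    cost[i][j] is the cost of giving field i the text box j. Every field
    gets exactly one box and no box is used twice. Exhaustive search is
    only practical for very small problems; it stands in for a real solver.
    """
    n_fields, n_boxes = len(cost), len(cost[0])
    best_total, best_perm = math.inf, None
    for perm in permutations(range(n_boxes), n_fields):
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if total < best_total:
            best_total, best_perm = total, perm
    return list(best_perm), best_total


# Hypothetical text-and-layout embeddings for two known templates.
templates = {"invoice": [1.0, 0.0, 0.2], "claim_form": [0.1, 1.0, 0.3]}
document = [0.2, 0.9, 0.4]
label = classify(document, templates)

# Hypothetical expected field positions for the matched template, and
# positions of text boxes detected on the scanned page.
expected = {"name": (10, 10), "date": (10, 50)}
boxes = [(12, 48), (11, 9), (80, 80)]
cost = [[math.dist(pos, box) for box in boxes] for pos in expected.values()]
assignment, total_cost = assign_fields(cost)

for field, box_index in zip(expected, assignment):
    print(label, field, "->", boxes[box_index])
```

Classifying first is what makes the second step possible: once the template is known, its expected field positions supply the rows of the cost matrix. For realistic page sizes the exhaustive search would be replaced by a dedicated assignment or integer programming solver.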

research
12/14/2021

Text Classification Models for Form Entity Linking

Forms are a widespread type of template-based document used in a great v...
research
10/17/2020

Learning from similarity and information extraction from structured documents

Neural networks have successfully advanced in the task of information ex...
research
12/11/2022

Extending TrOCR for Text Localization-Free OCR of Full-Page Scanned Receipt Images

Digitization of scanned receipts aims to extract text from receipt image...
research
05/16/2023

About Evaluation of F1 Score for RECENT Relation Extraction System

This document contains a discussion of the F1 score evaluation used in t...
research
10/25/2016

How Document Pre-processing affects Keyphrase Extraction Performance

The SemEval-2010 benchmark dataset has brought renewed attention to the ...
research
01/28/2021

DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents

Creating presentation materials requires complex multimodal reasoning sk...
