Key-value information extraction from full handwritten pages

04/26/2023
by Solène Tarride, et al.

We propose a Transformer-based approach for information extraction from digitized handwritten documents. Our approach combines, in a single model, the steps that have so far been performed by separate models: feature extraction, handwriting recognition, and named entity recognition. We compare this integrated approach with traditional two-stage methods that perform handwriting recognition before named entity recognition, and present results at three levels: line, paragraph, and page. Our experiments show that attention-based models are especially interesting when applied to full pages, as they do not require any prior segmentation step. Finally, we show that they are able to learn from key-value annotations: a list of important words with their corresponding named entities. We compare our models to state-of-the-art methods on three public databases (IAM, ESPOSALLES, and POPP) and outperform previous results on all three datasets.
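To make the difference between a full transcription target and a key-value annotation concrete, here is a minimal sketch that builds both from one record. The tag syntax (`<name> Joan Vidal </name>`), the entity labels, the function names `full_target` and `key_value_target`, and the record itself are hypothetical illustrations, not the encoding or data used in the paper.

```python
from typing import List, Optional, Tuple

# A segment is a run of transcribed words plus an optional entity label
# (None means the words are not part of any named entity).
Segment = Tuple[str, Optional[str]]


def full_target(segments: List[Segment]) -> str:
    """Full transcription in reading order, with each entity wrapped in
    hypothetical opening/closing tags (one possible joint HTR+NER target)."""
    parts = []
    for text, label in segments:
        if label is None:
            parts.append(text)
        else:
            parts.append(f"<{label}> {text} </{label}>")
    return " ".join(parts)


def key_value_target(segments: List[Segment]) -> List[Tuple[str, str]]:
    """Key-value annotation: only the tagged words, each paired with its
    entity label; untagged words are dropped entirely."""
    return [(label, text) for text, label in segments if label is not None]


# Hypothetical record, invented for illustration only.
page: List[Segment] = [
    ("On", None),
    ("12 May 1620", "date"),
    ("married", None),
    ("Joan Vidal", "name"),
    ("weaver", "occupation"),
]

print(full_target(page))
print(key_value_target(page))
```

The key-value target omits the untagged words ("On", "married"), so producing it does not require a transcription of the whole page, only the important words and their entity labels.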
