Extending TrOCR for Text Localization-Free OCR of Full-Page Scanned Receipt Images

12/11/2022
by Hongkuan Zhang, et al.

Digitization of scanned receipts aims to extract text from receipt images and save it into structured documents. This is usually split into two sub-tasks: text localization and optical character recognition (OCR). Most existing OCR models focus only on cropped text instance images, which require bounding box information provided by a text region detection model. Introducing an additional detector to identify the text instance images in advance is inefficient; however, instance-level OCR models have very low accuracy when processing a whole image for document-level OCR, such as receipt images containing multiple text lines arranged in various layouts. To this end, we propose a localization-free document-level OCR model that transcribes all the characters in a receipt image into an ordered sequence end-to-end. Specifically, we finetune the pretrained Transformer-based instance-level model TrOCR with randomly cropped image chunks, and gradually increase the image chunk size to generalize the recognition ability from instance images to full-page images. In our experiments on the SROIE receipt OCR dataset, the model finetuned with our strategy achieved a 64.4 F1-score and a 22.8% character error rate (CER) on the word-level and character-level metrics, respectively, outperforming the baseline results of 48.5 F1-score and 50.6% CER. The best model, which splits the full image into 15 equally sized chunks, gives an 87.8 F1-score and 4.98% CER with minimal additional pre- or post-processing of the output. Moreover, the characters in the generated document-level sequences are arranged in reading order, which is practical for real-world applications.
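The two quantities the abstract relies on, equally sized image chunks and the character error rate, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `chunk_bounds`, `edit_distance`, and `cer` are hypothetical, the split into horizontal strips along the image height is an assumption (the abstract does not state the split direction), and CER is taken here as the Levenshtein distance divided by the reference length, which is the common definition but may differ in detail from the paper's evaluation script.

```python
from typing import List, Tuple


def chunk_bounds(height: int, n_chunks: int) -> List[Tuple[int, int]]:
    """Split `height` pixel rows into `n_chunks` near-equal (top, bottom) strips.

    Assumption: chunks are horizontal strips stacked along the image height.
    """
    if n_chunks <= 0 or height < n_chunks:
        raise ValueError("need 0 < n_chunks <= height")
    # Integer arithmetic keeps the strips contiguous and covers every row.
    edges = [i * height // n_chunks for i in range(n_chunks + 1)]
    return list(zip(edges[:-1], edges[1:]))


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate in percent, relative to the reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # A hypothetical 1500-row receipt scan split into 15 strips.
    print(chunk_bounds(1500, 15)[:2])
    # One wrong character out of eleven.
    print(round(cer("TOTAL 12.50", "TOTAL 12.80"), 2))
```

In such a setup each strip would be passed to the recognizer separately and the transcriptions concatenated in top-to-bottom order before scoring against the full-page reference.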


research
07/29/2022

PageNet: Towards End-to-End Weakly Supervised Page-Level Handwritten Chinese Text Recognition

Handwritten Chinese text recognition (HCTR) has been an active research ...
research
10/22/2020

TLGAN: document Text Localization using Generative Adversarial Nets

Text localization from the digital image is the first step for the optic...
research
03/18/2020

ScanSSD: Scanning Single Shot Detector for Mathematical Formulas in PDF Document Images

We introduce the Scanning Single Shot Detector (ScanSSD) for locating ma...
research
10/21/2019

CNN based Extraction of Panels/Characters from Bengali Comic Book Page Images

Peoples nowadays prefer to use digital gadgets like cameras or mobile ph...
research
10/17/2020

Learning from similarity and information extraction from structured documents

Neural networks have successfully advanced in the task of information ex...
research
06/01/2023

End-to-End Document Classification and Key Information Extraction using Assignment Optimization

We propose end-to-end document classification and key information extrac...
research
06/25/2011

Morphological Reconstruction for Word Level Script Identification

A line of a bilingual document page may contain text words in regional l...
