TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models

09/21/2021
by Minghao Li, et al.

Text recognition is a long-standing research problem for document digitalization. Existing approaches for text recognition are usually built on a CNN for image understanding and an RNN for character-level text generation. In addition, a separate language model is usually needed as a post-processing step to improve overall accuracy. In this paper, we propose an end-to-end text recognition approach with pre-trained image Transformer and text Transformer models, namely TrOCR, which leverages the Transformer architecture for both image understanding and wordpiece-level text generation. The TrOCR model is simple but effective, and can be pre-trained with large-scale synthetic data and fine-tuned with human-labeled datasets. Experiments show that the TrOCR model outperforms the current state-of-the-art models on both printed and handwritten text recognition tasks. The code and models will be publicly available at https://aka.ms/TrOCR.
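To make the contrast between char-level and wordpiece-level generation concrete, here is a minimal sketch of a greedy longest-match subword splitter. It is an illustration only: the function name `wordpiece_split`, the `##` continuation marker, and the toy vocabulary are assumptions made for this example, and this is not the tokenizer used by the TrOCR models. The point is simply that a decoder emitting word pieces produces a much shorter target sequence than one emitting single characters.

```python
def wordpiece_split(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into subword pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a continuation piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, chosen only for this illustration.
TOY_VOCAB = {"recogn", "##ition", "##ize", "text", "##s"}

word = "recognition"
char_level = list(word)                          # one target token per character
piece_level = wordpiece_split(word, TOY_VOCAB)   # one target token per piece

print(len(char_level), char_level)
print(len(piece_level), piece_level)
```

With this toy vocabulary, the char-level target for "recognition" has 11 tokens while the piece-level target has 2, which is the kind of sequence shortening a wordpiece-level decoder benefits from.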

research
08/30/2023

DTrOCR: Decoder-only Transformer for Optical Character Recognition

Typical text recognition methods rely on an encoder-decoder structure, i...
07/27/2023

A Transformer-based Approach for Arabic Offline Handwritten Text Recognition

Handwriting recognition is a challenging and critical problem in the fie...
05/18/2020

GPT-too: A language-model-first approach for AMR-to-text generation

Abstract Meaning Representations (AMRs) are broad-coverage sentence-leve...
09/20/2023

Kosmos-2.5: A Multimodal Literate Model

We present Kosmos-2.5, a multimodal literate model for machine reading o...
12/19/2022

(Psycho-)Linguistic Features Meet Transformer Models for Improved Explainable and Controllable Text Simplification

State-of-the-art text simplification (TS) systems adopt end-to-end neura...
07/13/2020

Paranoid Transformer: Reading Narrative of Madness as Computational Approach to Creativity

This paper revisits the receptive theory in context of computational cr...
02/19/2021

Progressive Transformer-Based Generation of Radiology Reports

Inspired by Curriculum Learning, we propose a consecutive (i.e. image-to...
