DTrOCR: Decoder-only Transformer for Optical Character Recognition

08/30/2023
by   Masato Fujitake, et al.

Typical text recognition methods rely on an encoder-decoder structure, in which the encoder extracts features from an image, and the decoder produces recognized text from these features. In this study, we propose a simpler and more effective method for text recognition, known as the Decoder-only Transformer for Optical Character Recognition (DTrOCR). This method uses a decoder-only Transformer to take advantage of a generative language model that is pre-trained on a large corpus. We examined whether a generative language model that has been successful in natural language processing can also be effective for text recognition in computer vision. Our experiments demonstrated that DTrOCR outperforms current state-of-the-art methods by a large margin in the recognition of printed, handwritten, and scene text in both English and Chinese.
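To make the contrast with an encoder-decoder pipeline concrete, the following is a minimal sketch of how a decoder-only recognizer might consume an image: the image is cut into patches, the patches are placed in front of the text tokens as one sequence, and the text is generated one token at a time under a causal mask. The abstract does not give these details, so everything here is an assumption for illustration: the patch splitting, the `SEP_ID` and `EOS_ID` token ids, and the function names are hypothetical. `toy_next_token` is a stand-in for a pre-trained generative language model, not the authors' model.

```python
from typing import Callable, List, Sequence

Patch = List[float]

# Hypothetical special-token ids, chosen for this sketch only.
SEP_ID = 256  # marks the boundary between image patches and text
EOS_ID = 257  # marks the end of the generated text


def image_to_patches(image: Sequence[Sequence[float]], patch: int) -> List[Patch]:
    """Split a 2-D grayscale image into flattened, non-overlapping square tiles."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    tiles: List[Patch] = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tiles.append([image[r][c]
                          for r in range(top, top + patch)
                          for c in range(left, left + patch)])
    return tiles


def causal_mask(length: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i).

    A real decoder-only Transformer would apply this inside self-attention;
    it is shown here only to illustrate the left-to-right constraint.
    """
    return [[j <= i for j in range(length)] for i in range(length)]


def greedy_decode(patches: List[Patch],
                  next_token: Callable[[List[Patch], List[int]], int],
                  max_len: int = 32) -> List[int]:
    """Generate text tokens one at a time after the image-patch prefix."""
    tokens = [SEP_ID]  # the sequence is: patches, SEP, then generated text
    while len(tokens) <= max_len:
        token = next_token(patches, tokens)
        if token == EOS_ID:
            break
        tokens.append(token)
    return tokens[1:]  # drop the SEP marker


def toy_next_token(patches: List[Patch], tokens: List[int]) -> int:
    """Stand-in for a language model: always spells "ocr", then stops."""
    target = [ord(ch) for ch in "ocr"]
    produced = len(tokens) - 1  # tokens generated so far, excluding SEP
    return target[produced] if produced < len(target) else EOS_ID


if __name__ == "__main__":
    image = [[float(c) for c in range(8)] for _ in range(4)]  # 4 x 8 toy image
    patches = image_to_patches(image, patch=4)
    text = "".join(chr(t) for t in greedy_decode(patches, toy_next_token))
    print(len(patches), text)
```

In this sketch there is no separate image encoder: the patches and the text share one sequence, so a single model handles both. Replacing `toy_next_token` with a real language model that embeds the patches is the part the sketch leaves out.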

