Transformer based Urdu Handwritten Text Optical Character Reader

06/09/2022
by Mohammad Daniyal Shaiq, et al.

Extracting handwritten text is one of the most important components of digitizing information and making it available at a large scale. Handwritten Optical Character Recognition (OCR) is a research problem spanning computer vision and natural language processing. A lot of work has been done for English, but unfortunately very little has been done for low-resource languages such as Urdu. The Urdu script is very difficult to recognize because it is cursive and the shape of each character changes with its position in a word. Therefore, a model is needed that can learn these complex features and generalize across every kind of handwriting style. In this work, we propose a transformer-based model for extracting Urdu handwritten text. Since transformers have been very successful in natural language understanding tasks, we explore them further for understanding complex Urdu handwriting.
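
To make the point about position-dependent character shapes concrete, the following Python sketch uses only the standard `unicodedata` module to list the four contextual forms of the letter beh (U+0628), which Urdu shares with Arabic. It is an illustration under stated assumptions, not part of the authors' model: the OCR system described here reads images, not code points, and the names `BEH_FORMS` and `describe_forms` are hypothetical. The sketch only shows that one character has four visually distinct glyphs (isolated, initial, medial, final), all of which a recognizer must map back to the same letter.

```python
import unicodedata

# Hypothetical lookup table: Arabic Presentation Forms-B code points for the
# letter beh (U+0628), one glyph per position of the letter within a word.
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}


def describe_forms(forms):
    """Return (position, Unicode name, base code point) for each contextual form."""
    rows = []
    for position, glyph in forms.items():
        # NFKC folds a presentation form back to its base letter, which is the
        # label a recognizer should output regardless of the glyph's shape.
        base = unicodedata.normalize("NFKC", glyph)
        rows.append((position, unicodedata.name(glyph), f"U+{ord(base):04X}"))
    return rows


if __name__ == "__main__":
    for position, name, base in describe_forms(BEH_FORMS):
        print(f"{position:>8}: {name} -> base {base}")
```

Every row reports the same base code point, U+0628, even though each glyph has its own Unicode name. A handwriting model faces the same many-shapes-to-one-label mapping, with the added variation of individual writing styles.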
