Word Spotting in Cursive Handwritten Documents using Modified Character Shape Codes

10/22/2013
by Sayantan Sarkar, et al.

There is a large collection of handwritten English paper documents of historical and scientific importance, but computers cannot read paper documents directly. The closest way of indexing these documents is to store a digital image of each one, so that a large database of document images can replace the paper originals. The text in each image, however, still cannot be recognized directly by the computer. This paper applies word spotting using a Modified Character Shape Code to handwritten English document images, for quick and efficient query search of words in a database of document images. It differs from other word spotting techniques in that it applies two levels of selection to the word segments matched against a search query: first by word size, then by the character shape code of the query. This makes the process faster and more efficient and reduces the need for multiple pre-processing steps.
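
The two-level selection described above can be illustrated with a minimal sketch. It is not the paper's implementation: the shape classes (`A` for ascender, `D` for descender, `x` for x-height), the letter sets, the `WordSegment` record, and the `avg_char_width_px` and `tolerance` parameters are all hypothetical choices made for illustration, and the sketch assumes each word segment's width and shape code have already been extracted from the document image.

```python
from dataclasses import dataclass

# Hypothetical shape classes; the paper's modified code may group letters differently.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")


def shape_code(word: str) -> str:
    """Map each letter to a coarse class: A (ascender), D (descender), x (x-height)."""
    code = []
    for ch in word:
        if ch.isupper() or ch in ASCENDERS:
            code.append("A")
        elif ch in DESCENDERS:
            code.append("D")
        else:
            code.append("x")
    return "".join(code)


@dataclass
class WordSegment:
    page: int
    width_px: int
    code: str  # assumed to be extracted from the word image beforehand


def spot(query, segments, avg_char_width_px=20, tolerance=0.25):
    """Return the segments that pass both selection levels for the query."""
    # Assumption: expected width is estimated from the query length.
    expected = len(query) * avg_char_width_px
    low = expected * (1 - tolerance)
    high = expected * (1 + tolerance)
    target = shape_code(query)
    # Level 1: keep only segments whose width is close to the expected word size.
    candidates = [s for s in segments if low <= s.width_px <= high]
    # Level 2: compare shape codes only for the surviving candidates.
    return [s for s in candidates if s.code == target]


segments = [
    WordSegment(page=1, width_px=100, code="DxDxx"),
    WordSegment(page=1, width_px=60, code="AAx"),
    WordSegment(page=2, width_px=200, code="DxDxx"),
    WordSegment(page=3, width_px=95, code="AAxxx"),
]
matches = spot("paper", segments)
```

In this sketch the cheap width check discards most segments before any shape codes are compared, which is the kind of saving the abstract attributes to selecting by word size first.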


