
A Probabilistic Framework for Lexicon-based Keyword Spotting in Handwritten Text Images

04/09/2021
by E. Vidal, et al.

Query-by-string Keyword Spotting (KWS) is considered here as a key technology for indexing large collections of handwritten text images to allow fast textual access to their contents. From this perspective, a probabilistic framework for lexicon-based KWS in text images is presented. The presentation aims to provide a tutorial view that helps to understand the relations between classical statements of KWS and the challenges these statements entail. More specifically, the development of the proposed framework makes it evident that word recognition or classification implicitly or explicitly underlies any formulation of KWS. Moreover, it suggests that the same statistical models and training methods successfully used for handwritten text recognition can also be used to advantage for KWS, even though KWS does not generally require or rely on any previously produced image transcripts. These ideas are developed into a specific, probabilistically sound approach for segmentation-free, lexicon-based, query-by-string KWS. Experiments carried out using this approach are presented, which support the consistency and general interest of the proposed framework. Several datasets traditionally used for KWS benchmarking are considered, with results significantly better than those previously published for these datasets. In addition, results on two new, larger handwritten text image datasets are reported, showing the great potential of the proposed methods for indexing and textual search in large collections of handwritten documents.


06/20/2022

Open Set Classification of Untranscribed Handwritten Documents

Huge amounts of digital page images of important manuscripts are preserv...
10/02/2021

Asking questions on handwritten document collections

This work addresses the problem of Question Answering (QA) on handwritte...
03/22/2017

Neural Ctrl-F: Segmentation-free Query-by-String Word Spotting in Handwritten Manuscript Collections

In this paper, we approach the problem of segmentation-free query-by-str...
04/06/2020

Indexing Highly Repetitive String Collections

Two decades ago, a breakthrough in indexing string collections made it p...
10/14/2020

Contextual Pattern Matching

The research on indexing repetitive string collections has focused on th...
06/15/2021

TextStyleBrush: Transfer of Text Aesthetics from a Single Example

We present a novel approach for disentangling the content of a text imag...
03/24/2020

Bootstrapping Weakly Supervised Segmentation-free Word Spotting through HMM-based Alignment

Recent work in word spotting in handwritten documents has yielded impres...