ST-KeyS: Self-Supervised Transformer for Keyword Spotting in Historical Handwritten Documents

by Sana Khamekhem Jemni, et al.

Keyword spotting (KWS) in historical documents is an important tool for the initial exploration of digitized collections. Nowadays, the most efficient KWS methods rely on machine learning techniques that require a large amount of annotated training data. However, annotated training corpora are scarce for historical manuscripts. To handle this data scarcity, we investigate the merits of self-supervised learning for extracting useful representations of the input data without relying on human annotations, and then using these representations in the downstream task. We propose ST-KeyS, a masked auto-encoder model based on vision transformers whose pretraining stage follows the mask-and-predict paradigm and requires no labeled data. In the fine-tuning stage, the pretrained encoder is integrated into a siamese neural network that is fine-tuned to improve the feature embeddings of the input images. We further improve the image representation using the pyramidal histogram of characters (PHOC) embedding to create and exploit an intermediate representation of images based on text attributes. In an exhaustive experimental evaluation on three widely used benchmark datasets (Botany, Alvermann Konzilsprotokolle and George Washington), the proposed approach outperforms state-of-the-art methods trained on the same datasets.
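
The mask-and-predict pretraining can be illustrated with a minimal sketch in the style of a generic masked auto-encoder: the word image is cut into non-overlapping patches, a random subset is hidden, and the reconstruction loss is computed only on the hidden patches. The 8-pixel patch size, the 0.75 masking ratio (a common masked auto-encoder default, not a value reported in this abstract) and the helper names `patchify`, `random_mask` and `masked_mse` are hypothetical; the transformer encoder and decoder are omitted.

```python
import random
from typing import List, Tuple

Image = List[List[float]]  # H x W grayscale image stored as rows


def patchify(image: Image, patch: int) -> List[List[float]]:
    """Split an image into flattened, non-overlapping patch x patch tiles (row-major order)."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by the patch size"
    tiles = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tiles.append([image[top + r][left + c] for r in range(patch) for c in range(patch)])
    return tiles


def random_mask(num_patches: int, mask_ratio: float, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Randomly split patch indices into (visible, masked); only visible ones reach the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)
    return sorted(order[num_masked:]), sorted(order[:num_masked])


def masked_mse(pred: List[List[float]], target: List[List[float]], masked: List[int]) -> float:
    """Mean squared reconstruction error, averaged over the masked patches only."""
    errors = [(p - t) ** 2 for i in masked for p, t in zip(pred[i], target[i])]
    return sum(errors) / len(errors)


# Usage on a hypothetical 32 x 128 word image with random pixel values.
rng = random.Random(0)
image = [[rng.random() for _ in range(128)] for _ in range(32)]
patches = patchify(image, patch=8)
visible, masked = random_mask(len(patches), mask_ratio=0.75, rng=rng)
print(len(patches), len(visible), len(masked))  # 64 16 48
print(masked_mse(patches, patches, masked))  # 0.0 for a perfect reconstruction
```

Because the loss ignores the visible patches, the model is only rewarded for inferring content it did not see, which is what lets pretraining proceed without labels.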

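The siamese fine-tuning stage can be sketched as two branches that share a single encoder, followed by a pairwise loss on the two embeddings. The abstract does not say which loss ST-KeyS uses; the classic margin-based contrastive loss below, its `margin` value and the `toy_encoder` stand-in for the pretrained transformer are all assumptions for illustration.

```python
import math
from typing import Callable, List, Sequence

Embedding = Sequence[float]


def euclidean(a: Embedding, b: Embedding) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(e1: Embedding, e2: Embedding, same_word: bool, margin: float = 1.0) -> float:
    """Pull matching pairs together; push non-matching pairs at least `margin` apart."""
    d = euclidean(e1, e2)
    if same_word:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def siamese_loss(encoder: Callable[[List[float]], Embedding], img_a: List[float],
                 img_b: List[float], same_word: bool, margin: float = 1.0) -> float:
    """Both branches call the *same* encoder, i.e. the weights are shared."""
    return contrastive_loss(encoder(img_a), encoder(img_b), same_word, margin)


# Hypothetical toy encoder standing in for the pretrained transformer encoder.
def toy_encoder(img: List[float]) -> Embedding:
    return [sum(img), max(img)]


print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same_word=True))                # 12.5
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same_word=False, margin=10.0))  # 12.5
print(siamese_loss(toy_encoder, [0.1, 0.9], [0.1, 0.9], same_word=True))       # 0.0
```

Sharing the encoder between branches means every gradient step updates one set of weights, so images of the same word are mapped close together regardless of which branch they enter.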

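The PHOC attribute representation can be sketched as follows, using the commonly cited rule that a character belongs to a pyramid region when at least half of its normalized span lies inside that region. The pyramid levels `(1, 2, 3)`, the 36-symbol lowercase-plus-digits alphabet, the absence of bigram attributes and the cosine ranking are assumptions; the abstract does not specify the PHOC configuration used by ST-KeyS.

```python
import string
from fractions import Fraction
from typing import List, Sequence

ALPHABET = string.ascii_lowercase + string.digits  # assumed 36-symbol alphabet


def phoc(word: str, levels: Sequence[int] = (1, 2, 3), alphabet: str = ALPHABET) -> List[int]:
    """Binary unigram PHOC: one alphabet-sized histogram per region at every pyramid level."""
    word = word.lower()
    n = len(word)
    index = {ch: i for i, ch in enumerate(alphabet)}
    vec: List[int] = []
    for level in levels:
        for region in range(level):
            r_start, r_end = Fraction(region, level), Fraction(region + 1, level)
            hist = [0] * len(alphabet)
            for k, ch in enumerate(word):
                if ch not in index:
                    continue  # characters outside the alphabet are ignored
                c_start, c_end = Fraction(k, n), Fraction(k + 1, n)
                overlap = max(Fraction(0), min(c_end, r_end) - max(c_start, r_start))
                # A character belongs to a region if at least half of its span falls inside it.
                if overlap / (c_end - c_start) >= Fraction(1, 2):
                    hist[index[ch]] = 1
            vec.extend(hist)
    return vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity used to rank word images against a query in attribute space."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm if norm else 0.0


v = phoc("ab")
print(len(v), sum(v))  # 216 6
```

Exact `Fraction` arithmetic avoids floating-point ties at the 50% boundary. Since "ab" and "ba" share only the level-1 histogram, their cosine similarity is 1/3, showing how the pyramid encodes character order as well as character presence.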


Estimating Galactic Distances From Images Using Self-supervised Representation Learning

We use a contrastive self-supervised learning framework to estimate dist...

Lacuna Reconstruction: Self-supervised Pre-training for Low-Resource Historical Document Transcription

We present a self-supervised pre-training approach for learning rich vis...

Self-Supervised Learning of Echocardiogram Videos Enables Data-Efficient Clinical Diagnosis

Given the difficulty of obtaining high-quality labels for medical image ...

Self-omics: A Self-supervised Learning Framework for Multi-omics Cancer Data

We have gained access to vast amounts of multi-omics data thanks to Next...

Graph Masked Autoencoder

Transformers have achieved state-of-the-art performance in learning grap...

A Few Shot Multi-Representation Approach for N-gram Spotting in Historical Manuscripts

Despite recent advances in automatic text recognition, the performance r...

How to Choose Pretrained Handwriting Recognition Models for Single Writer Fine-Tuning

Recent advancements in Deep Learning-based Handwritten Text Recognition ...
