ST-KeyS: Self-Supervised Transformer for Keyword Spotting in Historical Handwritten Documents

03/06/2023
by Sana Khamekhem Jemni, et al.

Keyword spotting (KWS) in historical documents is an important tool for the initial exploration of digitized collections. The most efficient KWS methods today rely on machine learning techniques that require large amounts of annotated training data. For historical manuscripts, however, annotated corpora for training are scarce. To address this data scarcity, we investigate self-supervised learning, which extracts useful representations of the input data without relying on human annotations; these representations are then used in the downstream task. We propose ST-KeyS, a masked auto-encoder model based on vision transformers, whose pre-training stage follows the mask-and-predict paradigm and requires no labeled data. In the fine-tuning stage, the pre-trained encoder is integrated into a siamese neural network that is fine-tuned to improve the feature embeddings of the input images. We further improve the image representation with a pyramidal histogram of characters (PHOC) embedding, which creates and exploits an intermediate representation of images based on text attributes. In an exhaustive experimental evaluation on three widely used benchmark datasets (Botany, Alvermann Konzilsprotokolle and George Washington), the proposed approach outperforms state-of-the-art methods trained on the same datasets.
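To make the PHOC idea concrete, here is a minimal sketch of how a PHOC vector can be computed from a word's transcription. It follows the commonly used formulation, in which a character is assigned to a pyramid region when at least half of its span falls inside that region. The alphabet, the pyramid levels, and the function name `phoc` are assumptions chosen for illustration; the abstract does not state the configuration used in ST-KeyS.

```python
import string

# Assumed configuration: the abstract specifies neither the alphabet
# nor the pyramid levels, so these are illustrative defaults only.
ALPHABET = string.ascii_lowercase + string.digits
LEVELS = (1, 2, 3)


def phoc(word, alphabet=ALPHABET, levels=LEVELS):
    """Return a binary PHOC vector (list of 0/1) for `word`.

    For each pyramid level the word is split into `level` equal regions.
    Each region gets one bit per alphabet symbol, set to 1 when a
    character occupying at least half of its own span inside that
    region matches the symbol.
    """
    word = word.lower()
    n = len(word)
    vec = [0] * (len(alphabet) * sum(levels))
    offset = 0
    for level in levels:
        for region in range(level):
            # Integer coordinates scaled by n * level avoid float rounding:
            # region spans [region * n, (region + 1) * n],
            # character i spans [i * level, (i + 1) * level].
            r_start, r_end = region * n, (region + 1) * n
            for i, ch in enumerate(word):
                idx = alphabet.find(ch)
                if idx < 0:
                    continue  # symbols outside the alphabet are ignored
                c_start, c_end = i * level, (i + 1) * level
                overlap = min(c_end, r_end) - max(c_start, r_start)
                # Character width is `level`; require overlap >= 50 %.
                if 2 * overlap >= level:
                    vec[offset + idx] = 1
            offset += len(alphabet)
    return vec
```

With the defaults above, each word maps to a vector of 36 × (1 + 2 + 3) = 216 bits. Such a vector can serve as a regression target for an image encoder, so that word images and query strings are compared in the same attribute space.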


