A Few-shot Learning Approach for Historical Ciphered Manuscript Recognition

09/26/2020
by Mohamed Ali Souibgui, et al.

Encoded (or ciphered) manuscripts are a special type of historical document that contains encrypted text. The automatic recognition of this kind of document is challenging because: 1) the cipher alphabet changes from one document to another, 2) annotated corpora for training are lacking, and 3) touching symbols make symbol segmentation difficult. To overcome these difficulties, we propose a novel method for handwritten cipher recognition based on few-shot object detection. Our method first detects all symbols of a given alphabet in a line image, and then a decoding step maps the symbol similarity scores to the final sequence of transcribed symbols. By training on synthetic data, we show that the proposed architecture is able to recognize handwritten ciphers with unseen alphabets. In addition, if a few labeled pages with the same alphabet are used for fine-tuning, our method surpasses existing unsupervised and supervised HTR methods for cipher recognition.
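The abstract does not say how the decoding step works, so the following is only a minimal sketch of one plausible way to turn per-symbol similarity scores into a symbol sequence. It assumes the scores arrive as one list per alphabet symbol, indexed by horizontal position in the line image. The function name `decode_line`, the `threshold` parameter, and the greedy "best symbol per column, then collapse repeats" rule are all illustrative assumptions, not the authors' method.

```python
from typing import Dict, List, Optional


def decode_line(similarity: Dict[str, List[float]],
                threshold: float = 0.5) -> List[str]:
    """Greedy decoding sketch (hypothetical, not the paper's algorithm).

    similarity maps each alphabet symbol to its similarity score at every
    horizontal position of the line image. At each position the best-scoring
    symbol is kept if it reaches the threshold, and consecutive repeats of
    the same symbol are collapsed into a single output symbol.
    """
    if not similarity:
        return []

    # Assumption: all score lists have the same length (the line width).
    width = len(next(iter(similarity.values())))
    decoded: List[str] = []
    previous: Optional[str] = None

    for x in range(width):
        # Pick the symbol with the highest similarity at this position.
        best_symbol, best_score = max(
            ((symbol, scores[x]) for symbol, scores in similarity.items()),
            key=lambda pair: pair[1],
        )
        # Positions where no symbol is confident enough count as gaps.
        current = best_symbol if best_score >= threshold else None
        # Emit a symbol only when it starts a new run.
        if current is not None and current != previous:
            decoded.append(current)
        previous = current

    return decoded
```

Because a gap resets the previous symbol, two occurrences of the same symbol separated by a low-confidence region are transcribed twice, while a single symbol spanning several positions is transcribed once.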

