A Comprehensive Study of ImageNet Pre-Training for Historical Document Image Analysis

05/22/2019
by Linda Studer, et al.

Automatic analysis of scanned historical documents comprises a wide range of image analysis tasks, which are often challenging for machine learning due to a lack of human-annotated learning samples. With the advent of deep neural networks, a promising way to cope with the lack of training data is to pre-train models on images from a different domain and then fine-tune them on historical documents. In current research, a typical example of such cross-domain transfer learning is the use of neural networks that have been pre-trained on the ImageNet database for object recognition. It remains a largely open question whether this pre-training helps to analyse historical documents, whose image properties differ fundamentally from those of ImageNet images. In this paper, we present a comprehensive empirical survey of the effect of ImageNet pre-training on diverse historical document analysis tasks, including character recognition, style classification, manuscript dating, semantic segmentation, and content-based retrieval. While we obtain mixed results for semantic segmentation at the pixel level, we observe a clear trend across different network architectures that ImageNet pre-training has a positive effect on classification as well as content-based retrieval.
