DeepWriter: A Multi-Stream Deep CNN for Text-independent Writer Identification

06/21/2016 ∙ by Linjie Xing, et al. ∙ 0

Text-independent writer identification is challenging because written content varies widely and the writing styles of different writers can be ambiguous. This paper proposes DeepWriter, a deep multi-stream CNN that learns a powerful representation for recognizing writers. DeepWriter takes local handwritten patches as input and is trained with a softmax classification loss. The main contributions are: 1) we design and optimize a multi-stream structure for the writer identification task; 2) we introduce data augmentation learning to enhance the performance of DeepWriter; 3) we introduce a patch scanning strategy to handle text images of different lengths. In addition, we find that different languages such as English and Chinese may share common features for writer identification, and joint training can yield better performance. Experimental results on the IAM and HWDB datasets show that our models achieve high identification accuracy: 99.01% on 301 writers with one English sentence as input and 93.85% on 300 writers with one Chinese character as input, which outperforms previous methods by a large margin. Moreover, our models obtain an accuracy of 98.01% on 301 writers with only about 4 English letters as input.


I Introduction

This paper addresses the problem of automatic writer identification using off-line handwritten images. Handwriting is a kind of behavioural biometric. A writer can be recognized by capturing the specific characteristics of his or her handwriting habits, which differ from those of other writers [1]. Writer identification has been applied in anti-crime work and historical document analysis, fields that require a high level of domain expertise and heavy manual work.

Automatic writer identification aims to recognize a person based on his or her handwritten text. Research in writer identification can be divided into two categories: off-line and on-line identification. On-line writer identification requires recording the whole writing procedure with special devices, so the input is a time series of pen-tip positions, pressures, angles and other information about the writing. Off-line identification, on the other hand, takes only scanned images of handwritten text as input, which is usually more difficult [3].

Methods for off-line writer identification can be further categorized into two groups: text-dependent and text-independent. Text-dependent methods [18, 19, 20, 21] require input images with fixed text content and usually compare the input with registered templates for identification. In contrast, text-independent methods [1, 2, 4] do not make assumptions about the input content and have broader applications. However, compared with the text-dependent setting, text-independent writer identification needs to deal with images of arbitrary text, which exhibit huge intra-category variations, and is therefore much more challenging. Figure 1 and Figure 2 show several examples of handwritten English and Chinese by different writers. As can be seen, the main difference between two handwritten images is dominated by the text content. For writer identification, one needs to extract abstract writing style features and fine details that reflect personal writing habits. This poses a great challenge for current handcrafted features, which usually capture local shape and gradient information. These handcrafted features may mix information about the written content (text) with information about the writing style (person), which may limit their performance on this task.

(a) Two English text lines written by writer 009 from IAM dataset
(b) Two English text lines written by writer 010 from IAM dataset
Figure 1: Different writer examples from IAM dataset
(a) Two Chinese characters written by writer 1001 from HWDB dataset
(b) Two Chinese characters written by writer 1002 from HWDB dataset
Figure 2: Different writer examples from HWDB dataset

To address this challenging problem, this paper leverages deep CNNs (Convolutional Neural Networks) as a powerful model to learn effective representations for off-line text-independent writer identification. Deep CNNs have demonstrated their effectiveness in various computer vision problems by improving state-of-the-art results by a large margin, including image classification [5, 6, 7], object detection [8, 9], face recognition [10, 11], and handwriting recognition [12]. We propose DeepWriter, a multi-stream CNN, for extracting writer-sensitive features. DeepWriter takes multiple local regions as input and is trained with a softmax loss on identification. The main contributions are threefold. First, we design a multi-stream structure and optimize its configuration for the writer identification task. Second, we introduce data augmentation to enhance the performance of DeepWriter. Finally, we introduce a patch scanning strategy to handle handwritten images of various lengths. We evaluate the proposed methods on the IAM dataset [15] and the HWDB1.1 dataset [14]. Our method achieves high identification accuracy of 99.01% on 301 writers and 97.3% on 657 writers from the IAM dataset at the English sentence level, and 93.85% on 300 writers from the HWDB1.1 dataset at the Chinese character level, which outperforms the previous state of the art. Interestingly, our results also show that handwritten texts of different languages such as English and Chinese may share common features for writer identification, and pretraining CNNs on another language can lead to better performance.

II Related Works

Writer verification is similar to writer identification. A writer verification system [1, 22, 23, 24] performs a one-to-one comparison and determines whether or not the two input examples were written by the same writer. A writer identification system [1, 2] performs a one-to-many search in a large database of handwriting samples of known authorship and returns a list of likely candidates. Writer verification is thus a two-class classification problem, while writer identification is a multi-class classification problem. The work in [25] investigates how much handwritten text is needed for text-independent writer verification and identification. Its experimental results demonstrate that, given the same number of handwritten characters, verification systems achieve a lower error rate than identification systems using identical features. Therefore, writer identification is the more ambiguous and difficult task.

Previously proposed methods generally follow the pipeline of pre-processing, feature extraction and feature matching or classification, and mainly focus on feature extraction. In [1], Bulacu et al. combined multiple features (directional, grapheme, and run-length) and used probability distribution functions (PDFs) extracted from the handwriting images to characterize writer individuality, achieving an identification accuracy of 89% on 650 writers from the IAM dataset at the page level. In [2], Jain et al. used K-adjacent segments (KAS) features to model character contours, achieving an identification accuracy of 93.3% on 300 writers from the IAM dataset at the page level. These methods depend on features defined by humans, whereas deep CNNs have been shown to learn such features automatically. We believe that with integrated training and overall optimization, a deep CNN can learn to extract features appropriate to this task and outperform traditional methods.

The work in [3] also leverages CNNs to identify writers, but it addresses on-line text-independent writer identification. Using on-line writing information and deep CNNs, it obtains an accuracy of 95.72% on 187 writers with Chinese page input and 98.51% on 134 writers with English page input on the CASIA Handwriting Database [16]. In contrast, this paper addresses off-line text-independent writer identification, which is more general and difficult. We feed the model with only scanned gray-scale handwritten images and learn an effective representation with a carefully designed deep CNN model, leading to a simpler and more elegant method.

III DeepWriter

Figure 3: Image patches cropped from IAM dataset
Figure 4: Network structure of DeepWriter. Boxes labelled ConvX denote convolutional layers; the notation attached to each one specifies the kernels and their size, the stride in pixels, and the padding in pixels. Boxes labelled MP denote max-pooling layers; their notation specifies the size of the pooling neighbourhood and the stride in pixels. Boxes labelled FCX denote fully-connected layers, and the number that follows specifies the number of neurons. The Sum box denotes the element-wise sum operation, and Softmax denotes the softmax classifier. All convolutional and fully-connected layers are followed by a Rectified Linear Unit (ReLU) layer. FC6 and FC7 are followed by a dropout layer with ratio=0.5 to prevent overfitting.
Figure 5: Network structure of Half DeepWriter

This section first introduces the design of the multi-stream structure of DeepWriter and discusses how to preprocess input images of various lengths as input for DeepWriter. We then describe the training and testing processes with implementation details.

III-A Multi-Stream

Our basic network structure is similar to the AlexNet structure [5], as depicted in Figure 5. In this paper, we denote this basic network structure as Half DeepWriter. Half DeepWriter takes a single image patch as input. Input handwritten text images for identifying the author vary in height and width. In particular, handwritten English sentence images usually have a high aspect ratio, with width much larger than height. Resizing the input image to a fixed size distorts the shape of the handwriting, leading to serious information loss. We thus employ a patch scanning strategy, detailed below, to address this problem. However, scanning ignores the spatial relationships between image patches, which contain important information for determining the writer. On the other hand, it is expensive to keep the complete spatial relationships between all image patches of a scanned handwritten image. As a trade-off, we leverage the relationship between two adjacent image patches, leading to the DeepWriter structure. The network structure of DeepWriter is depicted in Figure 4. DeepWriter takes a pair of image patches as input. Patch 2 is adjacent to Patch 1, as depicted in Figure 6. Out1 and Out2, the FC7 output vectors of the two streams of DeepWriter, are merged by an element-wise sum operation. The detailed configuration of DeepWriter is specified in the caption of Figure 4. Because the two streams share the same parameters (Figure 6), the number of model parameters in DeepWriter is the same as in Half DeepWriter. Therefore, DeepWriter does not increase the risk of overfitting and requires the same amount of training data as Half DeepWriter. We experimentally demonstrate that considering the spatial relationship between image patches benefits writer identification. The comparison between DeepWriter and Half DeepWriter on 301 writers from the IAM dataset with handwritten English sentences as input is shown in Table 1.

Model Accuracy
Half DeepWriter 98.23%
DeepWriter 99.01%
TABLE 1: Comparison between DeepWriter and Half DeepWriter
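
The two-stream merge described above can be illustrated with a toy sketch. This is not the authors' Caffe model: the whole Conv1 to FC7 stack is replaced here by a single ReLU layer, and the names `stream`, `deepwriter_forward`, `stream_params` and `clf_params` are hypothetical. The only points it is meant to show are that both patches pass through the same parameters and that the two FC7-like outputs are summed element-wise before the softmax classifier.

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def linear(weights, bias, v):
    # weights is a list of rows; returns weights @ v + bias
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def stream(params, patch):
    # Stand-in for one stream (Conv1..FC7), reduced to one ReLU layer (assumption).
    return relu(linear(params["w"], params["b"], patch))

def deepwriter_forward(stream_params, clf_params, patch1, patch2):
    out1 = stream(stream_params, patch1)
    out2 = stream(stream_params, patch2)          # same params: shared weights
    merged = [a + b for a, b in zip(out1, out2)]  # element-wise sum
    return softmax(linear(clf_params["w"], clf_params["b"], merged))
```

Since both streams reuse `stream_params`, adding the second stream adds no parameters, which matches the argument that DeepWriter and Half DeepWriter have the same parameter count.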

III-B Patch Scanning Strategy

First, we resize the image so that min(w,h)=113, where w and h are the image width and height, while maintaining its aspect ratio. Second, image patches are cropped from the resized image. Finally, image patches for testing are uniformly sampled from these cropped image patches with a specific ratio. The sample ratio in this paper is set to 20% for Chinese character input and 10% for English sentence input.
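
The three steps above can be sketched as follows. Several details are assumptions made only for illustration: the square 113-pixel patch, the sliding stride of 1 pixel, and the reading of "uniformly sampled" as evenly spaced selection are not stated in the text, and the helper names are hypothetical. The sketch works on sizes and crop boxes only, so it does not depend on any image library.

```python
def resized_size(w, h, short_edge=113):
    """Target (width, height) with min(w, h) == short_edge, aspect ratio kept."""
    scale = short_edge / min(w, h)
    return max(short_edge, round(w * scale)), max(short_edge, round(h * scale))

def patch_boxes(w, h, patch=113, stride=1):
    """Square crop boxes (left, top, right, bottom) on a sliding grid.

    patch and stride are illustrative assumptions, not values from the paper.
    """
    return [(x, y, x + patch, y + patch)
            for y in range(0, h - patch + 1, stride)
            for x in range(0, w - patch + 1, stride)]

def sample_uniform(items, ratio):
    """Keep roughly `ratio` of the items at evenly spaced positions."""
    if not items:
        return []
    n = max(1, round(len(items) * ratio))
    step = len(items) / n
    return [items[int(i * step)] for i in range(n)]

# Example: a 1000 x 100 sentence image, sampled at the 10% ratio for English.
w, h = resized_size(1000, 100)
boxes = patch_boxes(w, h)
test_boxes = sample_uniform(boxes, 0.10)
```

Each resulting box can then be cropped from the resized image with whatever image library is in use.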

III-C Kernel Size

The Conv1 and Conv2 layers of DeepWriter and Half DeepWriter filter their input with smaller kernels and smaller strides than those of AlexNet. This structural adjustment is motivated by the observation that the original AlexNet configuration yields lower identification accuracy when fed with image patches. We therefore decrease the kernel size and stride of the Conv1 and Conv2 layers to capture more image detail. This adjustment also decreases the number of parameters, which reduces the risk of overfitting. The comparison between AlexNet and its variants on 301 writers from the IAM dataset with English handwritten image patches as input is shown in Table 2.

Patch size Configuration Accuracy
Conv1:
Conv2:
Conv1:
Conv2:
Conv1:
Conv2:
91.35%
TABLE 2: Kernel size comparison

III-D Neuron Number

Compared to AlexNet, the FC6 and FC7 layers of DeepWriter and Half DeepWriter have fewer neurons. The size of the training data and the number of classes in this task are smaller than those of ILSVRC [13]. We therefore believe that an appropriate neuron number reduces the risk of overfitting. We chose the number of neurons of FC6 and FC7 through a comparison experiment on the validation set, varying the neuron number of Half DeepWriter on 301 writers from the IAM dataset with English handwritten image patches as input. The results are shown in Table 3. We finally set the neuron number of the FC6 and FC7 layers of DeepWriter and Half DeepWriter to 1024.

Neuron number Accuracy
4096
1024 92.15%
512
TABLE 3: Neuron number comparison

III-E Feature Sharing

We also observe that handwritten images of different languages share some common features for identifying writers. On the IAM dataset, we fine-tune DeepWriter from a Half DeepWriter model pretrained on HWDB1.1, whose data size is much larger than that of the IAM dataset. On the HWDB1.1 dataset, we fine-tune Half DeepWriter from the above DeepWriter model. Table 4 compares the results with and without joint training.

Dataset Train Accuracy
IAM Pretrained on HWDB 99.01%
IAM Trained directly on IAM 98.80%
HWDB1.1 Pretrained on IAM 93.85%
HWDB1.1 Trained directly on HWDB1.1 93.45%
TABLE 4: Benefit from joint training

III-F Training Details

We augment training data by resizing the shorter edge of input image to 113 with original aspect ratio and then randomly cropping image patches from the input image. It is important to keep the original aspect ratio which contains important information of handwriting habits for identifying writer. The identification accuracy degrades seriously when the input image is distorted.

Firstly, the Half DeepWriter was trained on HWDB1.1 dataset. We trained Half DeepWriter using mini-batch gradient descent. The batch size was set to 256, momentum to 0.9, and weight decay to . The learning rate was initialized at , and then decreased by a factor of 10 every iterations. The learning was stopped after 400K iterations.

Secondly, the DeepWriter for the IAM dataset was fine-tuned from the Half DeepWriter model pretrained on the HWDB1.1 dataset. The batch size was set to 256, momentum to 0.9, and weight decay to . The base learning rate was initialized at , and then decreased by a factor of 10 every 20K iterations. The learning was stopped after 40K iterations. The learning rate of the dataset-specific softmax layer was set ten times larger than the base learning rate.
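
The step schedule described above can be written as a small function. The base learning rate used in the example (0.001) is only a placeholder, since the actual value is not recoverable from the text; the function names are hypothetical. The formula follows the usual step policy: multiply the base rate by 0.1 once per completed block of 20K iterations, with the new softmax layer using a tenfold multiplier.

```python
def step_lr(base_lr, iteration, step_size=20_000, gamma=0.1):
    """Learning rate after `iteration` steps under a step-decay policy."""
    return base_lr * gamma ** (iteration // step_size)

def layer_lr(base_lr, iteration, is_new_softmax_layer, multiplier=10.0):
    """Per-layer rate: the dataset-specific softmax layer learns 10x faster."""
    lr = step_lr(base_lr, iteration)
    return lr * multiplier if is_new_softmax_layer else lr

# Placeholder base rate (assumption), fine-tuning run of 40K iterations.
BASE_LR = 0.001
schedule = [(it, step_lr(BASE_LR, it)) for it in (0, 19_999, 20_000, 39_999)]
```

With a 40K-iteration run and a 20K step, the rate is reduced exactly once, at iteration 20K.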

Finally, the Half DeepWriter for the HWDB1.1 dataset was fine-tuned from the above DeepWriter model, using the same settings as when it was trained directly.

III-G Testing Details

Given a scanned handwritten image, the testing procedure follows this pipeline: scan the image to generate image patches following the strategy presented above; feed each image patch pair (or single image patch) into DeepWriter (or Half DeepWriter) to compute a score vector $f_i$; compute the final score of writer $j$ as $s_j = \frac{1}{N} \sum_{i=1}^{N} f_{ij}$, where $N$ denotes the number of image patches and $f_{ij}$ is the $j$-th entry of $f_i$; return the writer with the highest score. Because the score vector output by DeepWriter can be treated as a probability distribution over all writers, we average the score vectors of all image patch pairs (or image patches) to construct the final prediction for the input image. The testing pipeline is depicted in Figure 6.

Figure 6: Pipeline of testing. Stream 1 and Stream 2 share the same parameters.
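
The final aggregation step of the testing pipeline, averaging the per-patch score vectors and returning the highest-scoring writer, can be sketched as below. The function name `identify_writer` is hypothetical, and the score vectors in the example are made-up numbers standing in for softmax outputs of the network.

```python
def identify_writer(score_vectors):
    """Average per-patch score vectors and return (best_writer_index, averaged_scores)."""
    if not score_vectors:
        raise ValueError("at least one score vector is required")
    n = len(score_vectors)
    num_writers = len(score_vectors[0])
    final = [sum(v[j] for v in score_vectors) / n for j in range(num_writers)]
    best = max(range(num_writers), key=final.__getitem__)
    return best, final

# Three patches (or patch pairs), three candidate writers; values are illustrative.
scores = [
    [0.7, 0.2, 0.1],
    [0.2, 0.5, 0.3],
    [0.6, 0.3, 0.1],
]
writer, averaged = identify_writer(scores)
```

Averaging keeps the result a probability distribution when every input vector is one, so a single ambiguous patch (the second row here) does not decide the prediction on its own.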

IV Experiments

IV-A Data sets

The IAM dataset (version 3.0) [15] contains unconstrained handwritten English text from 657 different writers, written with different pens. Handwritten pages in the IAM dataset were scanned at a resolution of 300dpi and saved as PNG images with 256 gray levels. The IAM dataset contains 1,539 pages of scanned text comprising 5,685 isolated sentences. 301 writers contribute more than 1 page of scanned text. In this paper, we train, validate and test on sentence images. The sentence images contributed by each writer are divided into training, validation and testing sets according to the ratio 4 : 1 : 1.

The HWDB1.1 dataset [14] contains handwritten Chinese text from 300 different writers, which was scanned at a resolution of 300dpi and saved with 256 gray levels. HWDB1.1 contains 1,172,907 Chinese character images. Each writer contributes about 3,755 different Chinese characters. The Chinese character images contributed by each writer are divided into training, validation and testing sets according to the ratio 4 : 1 : 1.
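
A per-writer 4 : 1 : 1 split like the one described for both datasets can be sketched as follows. Whether the original split was random or ordered is not stated, so the seeded shuffle here is an assumption, as are the function name `split_per_writer` and the rule that any remainder goes to the test set.

```python
import random

def split_per_writer(samples_by_writer, ratios=(4, 1, 1), seed=0):
    """Split each writer's samples into train/val/test lists of (writer, sample)."""
    rng = random.Random(seed)  # seeded for reproducibility (assumption)
    total = sum(ratios)
    train, val, test = [], [], []
    for writer, samples in samples_by_writer.items():
        items = list(samples)
        rng.shuffle(items)
        n_train = len(items) * ratios[0] // total
        n_val = len(items) * ratios[1] // total
        train += [(writer, s) for s in items[:n_train]]
        val += [(writer, s) for s in items[n_train:n_train + n_val]]
        test += [(writer, s) for s in items[n_train + n_val:]]  # remainder
    return train, val, test
```

Splitting inside each writer, rather than over the pooled data, guarantees that every writer appears in all three sets, which is required for closed-set identification.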

IV-B Experimental Results

We use the off-the-shelf framework Caffe [17] to train Half DeepWriter and DeepWriter. Half DeepWriter achieves an identification accuracy of 93.85% on 300 writers with merely one Chinese character as input. DeepWriter achieves an identification accuracy of 99.01% on 301 writers and 97.3% on 657 writers from the IAM dataset at the English sentence level. In addition, DeepWriter achieves an identification accuracy of 96.92% when given two adjacent English handwritten image patches, which usually cover 2 to 3 English letters. DeepWriter taking three adjacent image patches as input, which usually cover 3 to 4 English letters, achieves an identification accuracy of 98.01%. These results demonstrate that our models can obtain high identification accuracy with little handwritten text as input.

We summarize the experimental results of our method and several published writer identification methods in Table 5. The methods in [1, 2, 26, 27, 28, 29] follow the classic pipeline to address the off-line writer identification problem: propose and combine multiple handcrafted features; employ Euclidean distance, cosine similarity or a trained SVM (Support Vector Machine) as the similarity metric; perform nearest neighbour search to determine the writer of the input handwritten image. The work in [3] employs deep CNNs to address the on-line writer identification problem, as summarized in the Related Works section. Our method outperforms previous state-of-the-art methods by a large margin. DeepWriter also achieves similar identification accuracy with much less input text. In addition, DeepWriter only needs to store the trained model for testing, without storing a large reference data set. Because DeepWriter does not need to perform heavy search computation, the test procedure is fast.

Method Year Input type Dataset Language Number of writers Input text for test Accuracy
DeepWriter 2016 off-line IAM English 301 1 sentence 99.01%
DeepWriter 2016 off-line IAM English 657 1 sentence 97.3%
DeepWriter 2016 off-line IAM English 301 about 3 letters 96.92%
DeepWriter 2016 off-line IAM English 301 about 4 letters 98.01%
Half DeepWriter 2016 off-line HWDB1.1 Chinese 300 1 character 93.85%
Bulacu et al. [1] 2007 off-line IAM English 650 1 page 89%
Jain et al. [2] 2011 off-line IAM English 300 1 page 93.3%
Jain et al. [2] 2011 off-line IAM English 650 1 page 92.1%
Brink et al. [27] 2012 off-line IAM English 657 1 page 97%
Bertolini et al. [26] 2013 off-line IAM English 650 1 page 96.7%
He et al. [28] 2015 off-line IAM English 650 1 page 91.1%
Hannad et al. [29] 2016 off-line IAM English 657 6 text lines at most 89.54%
Yang et al. [3] 2015 on-line CASIA Handwriting Database English 134 1 page 98.51%
Yang et al. [3] 2015 on-line CASIA Handwriting Database Chinese 187 1 page 95.72%
TABLE 5: Comparison with text-independent writer identification methods

V Conclusion and Future Work

In this paper, we introduce a novel data-driven text-independent model to identify the writer of off-line scanned handwritten images. We train a carefully designed deep Convolutional Neural Network to extract discriminative features from handwritten image patches. We investigate how the network structure affects identification accuracy and introduce a multi-stream structure to leverage the spatial relationship between handwritten image patches. We also investigate an appropriate method to augment training data for writer identification. We achieve high identification accuracy even when taking as input merely one Chinese character or about 4 English letters. In the future, we will investigate the off-line text-independent writer verification task with discriminative features extracted by DeepWriter. We will also investigate multi-task learning of identification and verification.

References

  • [1] Bulacu M, Schomaker L. Text-independent writer identification and verification using textural and allographic features[J]. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2007, 29(4): 701-717.
  • [2] Jain R, Doermann D. Offline writer identification using k-adjacent segments[C]. Document Analysis and Recognition (ICDAR), 2011 International Conference on. IEEE, 2011: 769-773.
  • [3] Yang W, Jin L, Liu M. DeepWriterID: An End-to-end Online Text-independent Writer Identification System[J]. arXiv preprint arXiv:1508.04945, 2015.
  • [4] Li B, Sun Z, Tan T. Online text-independent writer identification based on stroke’s probability distribution function[M]. Advances in Biometrics. Springer Berlin Heidelberg, 2007: 201-210.
  • [5] Krizhevsky, A., Sutskever, I., Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097–1105).
  • [6] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J]. arXiv preprint arXiv:1409.1556, 2014.
  • [7] He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition[J]. arXiv preprint arXiv:1512.03385, 2015.
  • [8] Girshick R. Fast r-cnn[C]. Proceedings of the IEEE International Conference on Computer Vision. 2015: 1440-1448.
  • [9] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[C]. Advances in Neural Information Processing Systems. 2015: 91-99.
  • [10] Sun Y, Wang X, Tang X. Deep learning face representation from predicting 10,000 classes[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 1891-1898.
  • [11] Sun Y, Liang D, Wang X, et al. Deepid3: Face recognition with very deep neural networks[J]. arXiv preprint arXiv:1502.00873, 2015.
  • [12] Ciresan D, Meier U, Schmidhuber J. Multi-column deep neural networks for image classification[C]. Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012: 3642-3649.
  • [13] J. Deng, A. Berg, S. Satheesh, H. Su, A. Khosla, and L. Fei-Fei. ILSVRC-2012, 2012. URL http://www.image-net.org/challenges/LSVRC/2012/.
  • [14] C.-L. Liu, F. Yin, D.-H. Wang, Q.-F. Wang, CASIA online and offline Chinese handwriting databases, Proc. 11th International Conference on Document Analysis and Recognition (ICDAR), Beijing, China, 2011, pp.37-41.
  • [15] U. Marti and H. Bunke. The IAM-database: An English Sentence Database for Off-line Handwriting Recognition. Int. Journal on Document Analysis and Recognition, Volume 5, pages 39 - 46, 2002.
  • [16] CASIA Handwriting Database, http://biometrics.idealtest.org/dbDetailForUser.do?id=10
  • [17] Jia Y, Shelhamer E, Donahue J, et al. Caffe: Convolutional architecture for fast feature embedding[C]. Proceedings of the ACM International Conference on Multimedia. ACM, 2014: 675-678.
  • [18] H. Said, T. Tan, and K. Baker, “Personal Identification Based on Handwriting,” Pattern Recognition, vol. 33, no. 1, pp. 149-160, 2000.
  • [19] H. Said, G. Peake, T. Tan, and K. Baker, “Writer Identification from Non-Uniformly Skewed Handwriting Images,” Proc. Ninth British Machine Vision Conf., pp. 478-487, 1998.
  • [20] T. Tan, “Rotation Invariant Texture Features and Their Use in Automatic Script Identification,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 7, pp. 751-756, July 1998.
  • [21] Y. Zhu, T. Tan, and Y. Wang, “Font Recognition Based on Global Texture Analysis,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 10, pp. 1192-1200, Oct. 2001.
  • [22] Leclerc F, Plamondon R. Automatic signature verification: The state of the art—1989–1993[J]. International Journal of Pattern Recognition and Artificial Intelligence, 1994, 8(03): 643-660.
  • [23] Srihari S N, Beal M J, Bandi K, et al. A statistical model for writer verification[C]. Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on. IEEE, 2005: 1105-1109.
  • [24] Hafemann L G, Sabourin R, Oliveira L S. Writer-independent Feature Learning for Offline Signature Verification using Deep Convolutional Neural Networks[J]. arXiv preprint arXiv:1604.00974, 2016.
  • [25] Brink A, Bulacu M, Schomaker L. How much handwritten text is needed for text-independent writer verification and identification[C]. Pattern Recognition, 2008. ICPR 2008. 19th International Conference on. IEEE, 2008: 1-4.
  • [26] Bertolini D, Oliveira L S, Justino E, et al. Texture-based descriptors for writer identification and verification[J]. Expert Systems with Applications, 2013, 40(6): 2069-2080.
  • [27] Brink A A, Smit J, Bulacu M L, et al. Writer identification using directional ink-trace width measurements[J]. Pattern Recognition, 2012, 45(1): 162-171.
  • [28] He S, Wiering M, Schomaker L. Junction detection in handwritten documents and its application to writer identification[J]. Pattern Recognition, 2015, 48(12): 4036-4048.
  • [29] Hannad Y, Siddiqi I, El Kettani M E Y. Writer identification using texture descriptors of handwritten fragments[J]. Expert Systems with Applications, 2016, 47: 14-22.