Handwritten script recognition is the problem of identifying the script in which a particular text document has been written. The basis for script recognition is the unique spatial relation that the strokes of a particular script have with each other, which makes it possible to distinguish scripts from one another. Most prevalent handwritten text recognition systems are language dependent, yet many digital documents and images use multiple scripts (Fig. 1), especially in geographical regions where multiple scripts are in use. This makes handwritten script recognition an important first step towards automated interpretation of handwritten text documents [8, 33].
Recurrent Neural Networks (RNNs) have proven to be very effective [11, 30, 21, 34] for handwriting recognition tasks. An RNN is a class of Artificial Neural Network in which connections between the hidden nodes can form directed loops. Delay functions on these loops enable the network to have an internal state, or memory. This looping architecture allows RNNs to process inputs of arbitrary length and produce outputs of arbitrary length. For training RNNs we use rnnlib.
To train our recognition model, the input can be presented as online or offline data. Online data comprises the trace (strokes) of a pen on a recording screen as separate characters and words are written, for character-level and word-level data respectively. Offline data comprises images: at the character level, these are images of characters of a script and, similarly, at the word level, these are images of words. Online data possesses spatio-temporal information, which proves to be beneficial for our script recognition task.
Existing models for script recognition focus on extracting linguistic and/or statistical models for script recognition at the word or the text-line level [3, 7, 14, 25, 8, 33]. In contrast, our model is based on the hypothesis that the curves of online character-level strokes of each script contain sufficiently distinctive features for script recognition at the word level as well. The curves are represented by the spatio-temporal information obtained as the text is written. Online character-level training data is much lighter than word-level data, which provides the benefit of faster training and reduces the need for large word-level datasets. This is highly applicable in a country such as India, which has a diverse set of scripts, in order to bootstrap script recognition with smaller amounts of data. We focus on models trained using only character-level data and test them on word-level data.
To extend our online models to the prediction of offline data, we use Stroke Recovery. A Stroke Recovery method is one which takes as input a text image and outputs a possible order of the strokes that are required to write the character [9, 13, 16]. In other words, it extracts the temporal information of the handwritten data. Training on offline data demands larger datasets and longer training times than training on online data. Thus, we developed a Stroke Recovery method to obtain a possible sequence of pen strokes, where each stroke is a sequence of pixels the pen would cover. We then classify the output into the appropriate script using recognition models obtained by training on the online data.
The main contributions of this paper are: (i) evaluating script recognition using spatio-temporal features; (ii) evaluating the model for 5 scripts at the character level and applying character-level models to word-level prediction for 4 distinct scripts; and (iii) applying the same to offline word-level recognition and proposing clubbed model implementations.
The rest of the paper is structured as follows. The second section discusses recent works on script recognition and stroke recovery. The third section describes the methodology followed. The fourth section details the datasets used and the results of all experiments conducted. The fifth section discusses possible sources of error in the results and the last section covers conclusions, followed by the references.
2 Recent Works
Handwritten script recognition is an important component in the recognition of documents written in more than one script; comprehensive surveys are given in [8, 33]. Script identification from printed documents is not as complex and difficult as from handwritten documents, because of the various writing styles of individuals in handwritten data. Various works have been reported on printed document script identification.
A popularly explored approach to handwritten script recognition is to extract linguistic and/or statistical features and combine them with an SVM model. In , such a solution for word-level offline classification is proposed for Thai-Roman script classification. It exploits linguistic features such as the loop width feature, the component overlapping feature and others. The authors used a standard SVM and a Gaussian kernel SVM classifier and reported a best accuracy of 99.62% for Thai-Roman classification.
A technique for script identification in torn documents is proposed by Chanda et al., in which Roman and Indic scripts are considered. The authors work with rotation-invariant Zernike features and rotation-dependent gradient features, using PCA-based methods to predict orientation, and then apply an SVM classifier at the character level. The results are calculated for the word level using majority voting at the character level, followed by prediction at the document level in a similar fashion. At the character level the results are 81.13% and 71.33%, and at the document level 98.33% and 96.7%, for Gradient and Zernike features respectively. Compared to script recognition at the paragraph level, recognition at the character level is more difficult. However, it enjoys the benefit of smaller datasets and faster training times. In our approach, the system learns the curves at the character level and is applied directly to the word. It does not use majority voting or prediction on segmented characters for word-level prediction.
An application by Moalla et al. [23, 22] was proposed for extraction of Arabic text from documents containing Arabic and English words. The method proposed in  was to match a trained Arabic template dataset with characters in the test set. The accuracy was 100% on a total of 478 words in the test set. The method in  was based on feature (morphological and statistical) matching and the recognition accuracy was 98%. Unlike these works, our approach does not use linguistic or statistical features, instead utilizing spatio-temporal curves in online data.
In , Ferrer et al. propose a method to train several classifiers for offline word-level script recognition in domains where more than one script could be used in the same sentence. They use a word information index (wii) to estimate the amount of information included in a word. Different classifiers are trained using sets of words with different amounts of information. During recognition, the wii for the test word is calculated and the corresponding classifier is used. The results are reported across a set of 6 wii classifiers, with the classifier with the least amount of information reporting 72.92% accuracy and the one with the most information reporting 99.49%.
KNN based techniques have also been explored by various works for offline script recognition, as discussed in . In , regional local features belonging to each script are used to detect the different scripts. In , to identify eight major scripts, namely Latin, Devanagari, Gujarati, Gurumukhi, Kannada, Malayalam, Tamil, and Telugu, at block level, a scheme based upon features extracted using the Discrete Cosine Transform (DCT) and wavelets is used. A KNN classifier is then employed for the identification. In and , different approaches for script identification using texture features are used. In , the texture features are extracted using the co-occurrence histograms of wavelet-decomposed images. The correlation between the sub-bands at the same resolution exhibits a strong relationship and a KNN classifier is used for the identification of scripts.
Neural network based solutions are also popular for script identification, as discussed in . In some of the earliest works, neural nets were employed for script identification in postal automation systems [29, 28, 26, 27]. In , a method for locating the address block and extracting the postal code from an address written in more than one script is proposed. In , a two-stage neural network based general classifier is used for the recognition of postal code digits written in Arabic or Bengali numerals, and the final assignment of script class is done in a second stage using majority voting. Methods for word-wise script recognition in postal addresses using features like the water reservoir concept, headline etc., in a tree classifier were proposed in . Further along this idea, a two-stage MLP network for script identification was proposed in .
Though MLPs are trainable, their large number of parameters makes them harder to train. In , a combined architecture using Convolutional Neural Networks followed by RNNs followed by a fully connected layer is proposed for script identification in images. This end-to-end architecture first extracts the image features and then uses a Bi-RNN to learn the arbitrary output for script identification; it is evaluated on SIW-13 and CVSI2015. In , BLSTM is used for printed Devanagari script recognition. For every word 5 different features are extracted: (a) the lower profile, (b) the upper profile, (c) the ink-background transitions, (d) the number of black pixels, and (e) the span of the foreground pixels. These are then passed through a Bi-RNN architecture using the Connectionist Temporal Classification objective function, leading to an improvement of more than 9% in WER. In , a 1D-LSTM architecture with one hidden layer is used for script identification at the text-line level to learn binary script models, and the prediction accuracy obtained for English-Greek identification is 98.19%. In our approach, the system uses RNNs for character-level modeling and uses this model for prediction at the word level.
When working on offline script recognition, one approach is to extract the initial strokes. In , Elbaati et al. proposed an approach to stroke recovery that first segments the image into strokes and labels all the edges as segments or parts of strokes. They run a Genetic Algorithm to optimize these strokes and produce the best possible segment order. An application of the above method is used in , which combines offline and online data for recognition. The offline features are complemented by the temporal data from the extracted strokes, which is a future possibility for our work as well.
In , Kato et al. propose a stroke recovery technique which works for single-stroke characters. The system labels each edge in the image and bridges them algorithmically, without the use of any learning methods. Our stroke recovery method is similar, and we extend it to account for multiple strokes in the input image.
3 Methodology
Our recurrent neural network uses the BLSTM architecture and Cross Entropy Error for the objective function optimized by the network. This section details our model, the training and testing method followed by stroke recovery.
BLSTM is a recurrent neural network architecture which comprises Long Short-Term Memory [15, 11] layers and bidirectional connections. The LSTM nodes have an internal architecture primarily composed of three gate nodes. At each time step, the input gate determines the contribution of the incoming value to the retained value, the forget gate determines the contribution of the previous retained value towards the next value, and the output gate controls the contribution of the retained value towards the output of the node. This allows the memory cell to preserve its state over a long range of time and to model the context at the feature level.
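The role of the three gates can be made concrete with a minimal sketch of one time step of an LSTM node. It assumes the standard formulation without peephole connections, which is not necessarily the exact variant implemented in rnnlib; the function name `lstm_step` and the stacked parameter layout `W`, `U`, `b` are illustrative choices, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One time step of a standard LSTM cell (assumed form, no peepholes).

    W has shape (4*H, D), U has shape (4*H, H) and b has shape (4*H,).
    They stack the parameters of the input gate, forget gate, output gate
    and candidate value, in that order.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: share of the incoming value
    f = sigmoid(z[H:2 * H])        # forget gate: share of the previous retained value
    o = sigmoid(z[2 * H:3 * H])    # output gate: share of the retained value exposed
    g = np.tanh(z[3 * H:4 * H])    # candidate (incoming) value
    c = f * c_prev + i * g         # new retained value of the memory cell
    h = o * np.tanh(c)             # output of the node
    return h, c
```

With all parameters set to zero every gate evaluates to 0.5, so the retained value is simply halved at each step, which is a convenient sanity check for the update rule.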
The 1D sequence recognition is improved by processing the input signal in both directions, i.e., one layer processes the signal in the forward direction while another layer processes it in the backward direction. This also helps the prediction accuracy when predicting for data obtained from stroke recovery, as it accounts for strokes recovered in the backward direction. The output of both layers is combined at the next layer as a feature map. Using rnnlib, it is possible to have multiple forward and backward layers in each LSTM layer as well as multiple feature maps at the output layer, and to stack multiple LSTM layers using sub-sampling.
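The bidirectional processing described above can be sketched as follows. For brevity the sketch uses a plain tanh recurrent layer as a stand-in for the LSTM layer, and combines the two directions by concatenation; both are simplifying assumptions, and the names `run_direction` and `bidirectional` are hypothetical.

```python
import numpy as np

def run_direction(seq, W, U):
    """Simple tanh recurrent layer (stand-in for an LSTM layer).

    seq has shape (T, D); returns the hidden state at every step, shape (T, H).
    """
    h = np.zeros(U.shape[0])
    states = []
    for x in seq:
        h = np.tanh(W @ x + U @ h)
        states.append(h)
    return np.stack(states)

def bidirectional(seq, fwd_params, bwd_params):
    """Process seq forwards and backwards and join the two feature maps."""
    fwd = run_direction(seq, *fwd_params)
    # Run on the reversed sequence, then flip back so that row t of both
    # outputs refers to the same input point.
    bwd = run_direction(seq[::-1], *bwd_params)[::-1]
    return np.concatenate([fwd, bwd], axis=1)
```

Each output row then carries context from both the points written before and the points written after the current one, which is why a stroke traced in the reverse direction still produces informative features.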
3.2 Cross Entropy Error
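When Cross Entropy Error is used as the objective function in a binary network, the error is:
$$E = -\left[ z \ln y + (1 - z) \ln (1 - y) \right] \qquad (1)$$
where $z$ is the target class (0 or 1), and $y$ can be understood as the probability that the input belongs to class 1. Details can be found in .
When extended to multi-class networks with $K$ classes, the error function becomes:
$$E = -\sum_{k=1}^{K} z_k \ln y_k$$
where $z_k$ is 1 for the target class and 0 otherwise, and $y_k$ is the predicted probability of class $k$.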
Similarly, Extending (2) for multiple classes.
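As a small numerical illustration of the cross entropy error, the sketch below assumes the standard forms E = -[z ln y + (1 - z) ln(1 - y)] for the binary case and E = -sum_k z_k ln y_k for the multi-class case with one-hot targets. The function names are illustrative, and the probabilities are assumed to lie strictly between 0 and 1.

```python
import math

def binary_cross_entropy(z, y):
    """z is the target class (0 or 1); y is the predicted probability of class 1."""
    return -(z * math.log(y) + (1 - z) * math.log(1 - y))

def multiclass_cross_entropy(targets, probs):
    """targets is a one-hot list; probs holds the predicted class probabilities."""
    # Terms with a zero target contribute nothing, so they are skipped.
    return -sum(z * math.log(y) for z, y in zip(targets, probs) if z > 0)
```

With two classes the multi-class form reduces to the binary one: `multiclass_cross_entropy([0, 1], [0.3, 0.7])` and `binary_cross_entropy(1, 0.7)` both evaluate to -ln 0.7.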
3.3 Training using Character-Level Data
As detailed in the flow-chart figure, the first step is to train our models using online character-level data. These are then sequentially tested on character-level and word-level data. Further, we extend the predictions using the same model to offline data by using stroke recovery.
3.3.1 Online character based approach
Each character in the raw online data was presented as a sequence of strokes. We appended the data together, as a series of vectors, each vector with three values. The first and the second values are the x and y coordinate of the pixel respectively and the third value is 1 or 0, depending on whether the pixel is the first pixel of a stroke, or not. The datasets were obtained from various different sources and recording devices. This prompted a need for a universal Normalization Technique for the vectors in the training data, to eliminate features specific to the dataset and retain only features of the script. The following standardization is used:
$$\hat{v} = \frac{v - \mu}{\sigma}$$
where the vector $v$ is the array of three values, and $\mu$ and $\sigma$ are the mean and the standard deviation of all the vectors in the particular dataset. Subsequently, the system was extended to 5 languages by adding the datasets for Bengali, Tamil and Telugu to the training.
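The input encoding and the per-dataset standardization can be sketched as below. The sketch assumes that the mean and standard deviation are computed separately for each of the three components over all points of a dataset; the guard against a zero standard deviation and the names `strokes_to_vectors` and `standardize` are additions made for illustration.

```python
import numpy as np

def strokes_to_vectors(strokes):
    """strokes: list of strokes, each a list of (x, y) points.

    Returns an array of shape (T, 3): x, y and a flag that is 1 for the
    first point of each stroke and 0 otherwise.
    """
    rows = []
    for stroke in strokes:
        for k, (x, y) in enumerate(stroke):
            rows.append((x, y, 1.0 if k == 0 else 0.0))
    return np.array(rows, dtype=float)

def standardize(samples):
    """samples: list of (T_i, 3) arrays that all come from the same dataset."""
    stacked = np.concatenate(samples, axis=0)
    mean = stacked.mean(axis=0)
    std = stacked.std(axis=0)
    std[std == 0] = 1.0  # assumption: leave constant components unscaled
    return [(s - mean) / std for s in samples]
```

For example, a character written with two strokes, `[[(0, 0), (1, 1)], [(2, 0)]]`, becomes three vectors, with the stroke-start flag set on the first and third.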
The same approach was applied to word level online data, which was also collected from various different datasets. First we tested the word level dataset on the models trained using the character level data followed by training a separate word level model for benchmarking. The test results for both are reported.
3.3.2 Offline character based approach
The offline model was trained using the raw pixel data of the offline images. We trained for Hindi, Bengali, Tamil and English and normalized the datasets by resizing the height of the character image to a constant value, while retaining the aspect ratio of the original image. Different datasets present images with different stroke widths. This variation is removed systematically by first thinning the image to a unit-width image and then thickening the strokes to a uniform width, all using the Fiji application. This approach was limited to the character level as it was very data intensive and slow to train. For classification of word images, we developed a mechanism for stroke recovery.
3.4 Offline to Online conversion - Stroke Recovery
To utilize neural networks trained using online character-level data for offline data prediction, we need to extract the temporal data. Thus, we extract stroke information from offline data using the following form of stroke recovery. The first step is to obtain the skeleton of the provided binary image dataset, for which we used Fiji’s skeletonize functionality. Then the main module calculates the critical points of the skeleton in a given image. These comprise the endpoints, which are points connected to only one other point, and the junction points, which are points connected to more than two points, i.e. they lie on at least two strokes.
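The detection of critical points can be sketched by counting skeleton neighbours. The sketch assumes a one-pixel-wide skeleton stored as a 2-D array and 8-connectivity between pixels; the function name `critical_points` is illustrative and the paper's own module may differ in detail.

```python
import numpy as np

def critical_points(skel):
    """skel: 2-D array that is nonzero on skeleton pixels.

    Returns (endpoints, junctions) as lists of (row, col) tuples.
    """
    s = (np.asarray(skel) > 0).astype(int)
    rows, cols = s.shape
    padded = np.pad(s, 1)  # zero border so that edge pixels need no special case
    # Number of 8-connected skeleton neighbours of every pixel.
    neigh = sum(
        padded[1 + dy:1 + dy + rows, 1 + dx:1 + dx + cols]
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    endpoints = [(int(y), int(x)) for y, x in np.argwhere((s == 1) & (neigh == 1))]
    junctions = [(int(y), int(x)) for y, x in np.argwhere((s == 1) & (neigh > 2))]
    return endpoints, junctions
```

On a plus-shaped skeleton this returns the four tips as endpoints, and both the centre and its four direct neighbours as junction points, which illustrates why adjacent junction points need to be handled together.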
The method then reduces each junction point and its neighboring junction points into one joint junction area, by connecting each junction point in a neighborhood with each of the outlets of each of the neighboring junction points (Fig. 4).
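The grouping part of this step can be sketched as a connected-component search over the junction points. The sketch assumes that two junction points belong to the same area when they are 8-connected neighbours; it only forms the groups and does not reproduce the connection of each point to the outlets of its neighbours. The name `junction_areas` is hypothetical.

```python
def junction_areas(junctions):
    """Group 8-connected junction pixels.

    junctions: iterable of (row, col) tuples.
    Returns a list of sets, one set of pixels per junction area.
    """
    remaining = set(junctions)
    areas = []
    while remaining:
        seed = remaining.pop()
        area, stack = {seed}, [seed]
        while stack:
            y, x = stack.pop()
            # Absorb every not-yet-assigned junction pixel that touches (y, x).
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    q = (y + dy, x + dx)
                    if q in remaining:
                        remaining.remove(q)
                        area.add(q)
                        stack.append(q)
        areas.append(area)
    return areas
```

The outlets of an area are then the skeleton pixels that touch the area but are not part of it.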
To begin stroke recovery, our method needs to select a suitable start point. It uses a straightforward strategy: the endpoint closest to the top-left corner of the image is selected as the start point. This is based on the insight that the scripts we are working on are written in left-to-right order and the strokes begin from the top-left corner. Demerits of this strategy are discussed in the final section under error analysis.
On reaching a junction area during the stroke recovery process, the system calculates the slope of the incoming curve and the slope of each of the curves exiting the junction area. Our strategy is based on maintaining continuity and avoiding jerks, to recover the most probable path of the stroke. The slope-based selection method selects the outgoing curve whose slope is closest to that of the incoming curve (Fig. 4(a)), thus ensuring the system continues along the outlet that best maintains continuity.
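The outlet selection can be sketched as follows. Instead of comparing raw slopes, which are undefined for vertical segments, the sketch compares direction vectors and picks the outlet with the smallest turning angle; this is a substitution made for robustness in the example and not necessarily how the slopes are compared in our implementation. The names `direction` and `pick_outlet` are illustrative.

```python
import math

def direction(p, q):
    """Unit vector pointing from p to q; points are (x, y) tuples."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    n = math.hypot(dx, dy)
    return (dx / n, dy / n)

def pick_outlet(incoming, outlets):
    """Choose the outlet that best continues the incoming curve.

    incoming: (point before the junction, point where the junction is entered)
    outlets:  list of (exit point, next point along that branch)
    Returns the index of the outlet with the smallest turning angle.
    """
    din = direction(*incoming)

    def turn_angle(outlet):
        dout = direction(*outlet)
        dot = din[0] * dout[0] + din[1] * dout[1]
        return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding

    return min(range(len(outlets)), key=lambda i: turn_angle(outlets[i]))
```

In practice the two points defining each direction would be taken a few pixels apart along the skeleton, so that single-pixel staircase effects do not dominate the estimate.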
In case the image contains multiple disjoint strokes, we might reach an endpoint in the stroke recovery method before all the points have been covered. This implies that all reachable points have been covered and prompts a need to select the next start point to continue stroke recovery and cover the rest of the strokes. The selection of the next start point needs to be made from the set of all calculated endpoints and the neighbors of the junction points encountered thus far. We experimented with two strategies here. The first is to prioritize selection from the endpoints over the neighbors of the junction points; among the endpoints, the ones closest to the top-left corner are given higher priority. The second is to prioritize the neighbors of the junction points over the endpoints, where among the neighbors, priority is given on a first-come, first-served basis. The latter approach outperforms the former in terms of the accuracy of predictions made on the strokes retrieved.
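The two strategies for choosing the next start point can be sketched with one function and a flag. The sketch assumes (row, col) coordinates with the top-left corner at (0, 0) and a set of already covered points; the name `next_start` and its parameters are hypothetical.

```python
import math

def next_start(endpoints, junction_neighbours, visited, prefer_endpoints):
    """Pick the next start point once the current stroke has ended.

    endpoints:            list of (row, col) skeleton endpoints
    junction_neighbours:  neighbours of junction points, in order of encounter
    visited:              set of points that have already been covered
    prefer_endpoints:     True for the first strategy, False for the second
    """
    # Unvisited endpoints, closest to the top-left corner first.
    free_ends = sorted(
        (p for p in endpoints if p not in visited),
        key=lambda p: math.hypot(p[0], p[1]),
    )
    # Unvisited junction neighbours, first encountered first.
    free_neigh = [p for p in junction_neighbours if p not in visited]
    pools = (free_ends, free_neigh) if prefer_endpoints else (free_neigh, free_ends)
    for pool in pools:
        if pool:
            return pool[0]
    return None  # every candidate has been covered
```

Calling it with `prefer_endpoints=False` corresponds to the second strategy, which gave the better prediction accuracy in our experiments.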
3.5 Combination of Online Models
Two RNN instances trained using the same data might converge to different local minima. A combination model approach clubs the results of different models in an attempt to improve the overall prediction. For a test word, the model takes the predictions made by the different networks and the cross entropy error for each prediction. If the predictions mismatch, it computes the error that would result if both networks had predicted the same value. For a binary model, the error for predicting one class, given the error for the other class, can be calculated using (1).
The model chooses the prediction with the lower error value as the final output for a data point. It was used in an attempt to improve the predictions made by the networks on word-level data recovered by stroke recovery.
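One possible reading of this combination rule for two binary networks is sketched below. It assumes that each reported error is -ln(y) for the class the network predicted, so that the error for the opposite class is -ln(1 - exp(-E)), and that the candidate class with the lower summed error over both networks is selected. The summation and the names `flipped_error` and `combine` are assumptions made for illustration; errors are assumed to be strictly positive.

```python
import math

def flipped_error(err):
    """Error for the opposite class, given the error -ln(y) of the predicted class."""
    y = math.exp(-err)           # probability assigned to the predicted class
    return -math.log(1.0 - y)    # error if the other class had been predicted

def combine(pred_a, err_a, pred_b, err_b):
    """Combine the binary predictions of two networks."""
    if pred_a == pred_b:
        return pred_a
    # Total error if both networks are taken to predict pred_a ...
    total_a = err_a + flipped_error(err_b)
    # ... and if both are taken to predict pred_b.
    total_b = err_b + flipped_error(err_a)
    return pred_a if total_a <= total_b else pred_b
```

Under this reading, a network that is only mildly confident in its class is overruled by a network that is strongly confident in the other class.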
4 Datasets and Experiments
4.1 Datasets
The source for the online data, Table 1, for the English language at the word level is IAM Online  and for the character data is the Chars74K dataset . For Hindi, Tamil and Telugu languages, the online character and word data is obtained from the Lipi Toolkit, HP Labs . For the Bangla script, the online character and word data is obtained from CMATERdb  and the Assamese online character data is obtained from the UCI Repository .
|Script Name||Data Level||Data Set Size||Reference|
The source for the offline data, Table 2, for the English language at the word (line) level is IAM Dataset  and for the character data is the Chars74K dataset . For Hindi and Bangla the offline character and word data is obtained from .
4.2 Experiments on Online Data
This subsection covers the results for models trained with online character-level data predicting online character-level and word-level data (Table 3). The RNN model architectures used 3 hidden layers, all with LSTM nodes, using a gradient descent optimizer with momentum set at 0.9, a learning rate of , the cross entropy error function, and a stopping value for required improvement in the best result set at 20.
|Eng-Hindi Model||Test-Char level||Test-Word Level|
|RNN Char Model||99.75%||100%|
|RNN Word Model||-||100%|
|HMM Char Model||95.70%||77.66%|
|HMM Word Model||-||98.60%|
|Perceptron Char Model||77.38%||56.6%|
|Perceptron Word Model||-||94.10%|
|Many-language Model||5-Lang Char||4-Lang Word|
|5-Lang RNN Char Model||99.74%||99.82%|
|4-Lang RNN Word Model||-||99.7%|
In the first script recognition task, handwritten character-level online data was used to train an RNN for two languages, Hindi and English. The input provided had 2000 instances of online character-level data from each of the languages and the model was able to converge within two hours. The accuracy obtained on a test set of 2000 unseen instances from each language was 96.95%.
HMM and Structured Perceptron models are trained with input data comprising 2000 instances and are used to set up the baseline for this and all further experiments. In the HMM model, an individual HMM was trained for each label class at training time using the hmmlearn Toolkit. For a test sequence, the probability of the sequence belonging to each of the trained classes is calculated and the label with the highest probability is assigned to the test sequence. HMMs converge to a local solution depending on a random initial state, so we found the optimal combination of HMM models for each language by evaluating different combinations of models on a validation set. The Structured Perceptron is discussed in  and we use its implementation in the seqlearn Toolkit.
We then trained RNN models for these two scripts using larger amounts of data, 12500 instances of each of the languages. The network took longer to train, of the order of a day, and predicted with a 99.75% accuracy on the same test set used above. The error on the validation set per training epoch is seen in Fig 6.
Script recognition of the word-level handwritten online data was then tested on the above character-level network. The test set comprised 2250 test cases of each language and the accuracy of prediction was 100% for the binary English-Hindi model. This result demonstrated the basic hypothesis of this paper. To compare these results with models trained using online word data, we then trained a word-level Eng-Hin model, which required a considerably longer training time, of the order of a week. The accuracy of the word-level model was 100%. Results with the baseline models are detailed in Table 3.
Thereafter, we trained four Indic scripts, Hindi, Bengali, Telugu and Tamil and one Latin script, English, using 2000 samples from each of the languages. The prediction accuracy on a test set of 4000 unseen data points of each language was 99.74%. We tested the prediction of word level data for four languages using the trained character level network as the Telugu on-line word dataset was not available. The prediction accuracy on a test set with 2000 word level instances of each language was 99.8%. The model trained with online word data for the four languages reported an accuracy of 99.7%.
|Recovered Strokes - Char Data||Prediction - online Model|
|Recovered Strokes - Word Data||Prediction - online Model|
|Eng-Hin RNN Clubbed||75.2%|
4.3 Experiments on Offline Data
We discuss the results of the pixel-based RNN model, the results on stroke-recovered offline data, and the final results of clubbing the predictions of more than one online model.
4.3.1 Offline character training
At the character level, an RNN based model was trained for English, Hindi, Bengali and Tamil scripts using the normalized raw pixel data as the initial input. The model was trained using 550 instances of each language and obtained an accuracy of 80.8% on a test set comprising 250 instances of each language.
|Images||Offline Model||Online Model||Combination of Models|
4.3.2 Stroke Recovery and Recognition
We retrieved strokes from the character and word images using the stroke recovery method and tested the prediction results from networks trained with online character-level data. Our experiments were restricted to binary models. At the character level, the test set contained strokes retrieved from 250 character images from each of the languages. The English-Hindi model obtained the highest prediction accuracy of 89.2%.
The same tests were repeated for word-level classification for two models, English-Hindi and English-Bangla, with test sets of 250 word images from each of the languages. The RNN models obtained prediction accuracies of 75.6% and 72.9% respectively. Detailed results, along with the performance of the baseline models, are in Table 4. Qualitative results can be seen in Table 5 and Fig. 8.
In another experiment, we trained the Baseline Perceptron models using the Stroke Recovered data for English and Hindi data, using 700 training data points and tested on 250 instances to obtain an accuracy of 84.64%. Using the same technique as above, we used this model for word-level classification to get 59.91% as the accuracy.
4.3.3 Combination of Online models for Offline prediction
For the combination of models, the system was provided with the predictions of two separate English-Hindi networks on a given test set. One of these was biased towards the Hindi data and obtained an initial prediction accuracy of 70.9% on the test set, and the second network obtained an initial prediction accuracy of 75.6%. The clubbed model obtained an overall prediction accuracy of 75.2%. As seen in Table 5, it improved the result for the data points shown.
5 Error Analysis
A major discrepancy is seen in the results between the predictions for online and offline data, both at the character and at the word level. A possible cause of this discrepancy is in the stroke recovery method. The method uses heuristics for tasks such as selection of a start point for the stroke recovery process which might be erroneous in some cases. An example is in Fig.7. The continuity model used in the stroke recovery algorithm can be further improved by using contour information of the incoming curve and outgoing paths at the junction points for better outlet selection.
A second discrepancy is seen in the predictions made for the offline character level data and the offline word level data. Any error in the stroke recovery of an image propagates and leads to a cascading effect in the subsequent strokes. Since the number of strokes in a sample word is generally greater than the number of strokes in a character, this error builds up and can be the cause of the drop in accuracy. A possible solution to this could be to divide the word into regions and histograms and apply stroke recovery individually on each of the regions, followed by prediction of the individual entities. The final result could then be calculated via a voting system.
A future direction of this work to reduce the discrepancy between the online and offline results would be to combine online and offline features to train RNNs for better character wise training.
6 Conclusion
This paper proposes a unique approach for word-wise script recognition using character-wise training of Recurrent Neural Networks. This is demonstrated for both online and offline datasets. It demonstrates the basic hypothesis that the curves, or the temporal data, of online character-level data contain sufficient features for the script recognition task on word-level data. Training networks at the character level has major benefits, such as faster training and a much smaller data requirement, compared to word-level training. Thus, it is highly applicable to bootstrapping script recognition using limited data for a diverse set of scripts, and stands out from previous models that focus on extracting linguistic and/or statistical models and require more data for script recognition.
Offline data lacks the temporal data, which forms a crucial part of script prediction using our model. To overcome this, we have developed a stroke recovery system that retrieves the strokes from offline character-level and word-level data, and we then use networks trained with online character-level data for script prediction.
References
-  B. V. Dhandra and M. Hangarge. Offline handwritten script identification in document images. International Journal of Computer Applications, 4(5):1–5, July 2010.
-  S. Chanda, K. Franke, and U. Pal. Identification of indic scripts on torn-documents. In 2011 International Conference on Document Analysis and Recognition, ICDAR 2011, Beijing, China, September 18-21, 2011, pages 713–717, 2011.
-  S. Chanda, U. Pal, and O. R. Terrades. Word-wise thai and roman script identification. ACM Trans. Asian Lang. Inf. Process., 8(3):11:1–11:21, 2009.
-  M. Collins. Discriminative training methods for hidden markov models: Theory and experiments with perceptron algorithms. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10, EMNLP ’02, pages 1–8, Stroudsburg, PA, USA, 2002. Association for Computational Linguistics.
-  N. Das, R. Sarkar, S. Basu, M. Kundu, M. Nasipuri, and D. K. Basu. A genetic algorithm based region sampling for selection of local features in handwritten digit recognition application. Appl. Soft Comput., 12(5):1592–1606, May 2012.
-  T. E. de Campos, B. R. Babu, and M. Varma. Character recognition in natural images. In VISAPP 2009 - Proceedings of the Fourth International Conference on Computer Vision Theory and Applications, Lisboa, Portugal, February 5-8, 2009 - Volume 2, pages 273–280, 2009.
-  B. V. Dhandra and M. Hangarge. Morphological reconstruction for word level script identification. CoRR, abs/1106.5156, 2011.
-  T. Dube, A. P. Shivaprasad, and D. Ghosh. Script recognition - a review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32:2142–2161, 2010.
-  A. Elbaati, M. Kherallah, A. Ennaji, and A. M. Alimi. Temporal order recovery of the scanned handwriting. In Proceedings of the 2009 10th International Conference on Document Analysis and Recognition, ICDAR ’09, pages 1116–1120, Washington, DC, USA, 2009. IEEE Computer Society.
-  M. A. Ferrer, A. Morales, N. Rodriguez, and U. Pal. Multiple training - one test methodology for handwritten word-script identification. In 14th International Conference on Frontiers in Handwriting Recognition, ICFHR 2014, Crete, Greece, September 1-4, 2014, pages 754–759, 2014.
-  A. Graves, S. Fernández, M. Liwicki, H. Bunke, and J. Schmidhuber. Unconstrained on-line handwriting recognition with recurrent neural networks. In Advances in Neural Information Processing Systems 20, Proceedings of the Twenty-First Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 3-6, 2007, pages 577–584, 2007.
-  A. Graves, M. Liwicki, S. Fernández, R. Bertolami, H. Bunke, and J. Schmidhuber. A novel connectionist system for unconstrained handwriting recognition. IEEE Trans. Pattern Anal. Mach. Intell., 31(5):855–868, May 2009.
-  M. Hamdani, H. E. Abed, M. Kherallah, and A. M. Alimi. Combining multiple hmms using on-line and off-line features for off-line arabic handwriting recognition. In 10th International Conference on Document Analysis and Recognition, ICDAR 2009, Barcelona, Spain, 26-29 July 2009, pages 201–205, 2009.
-  P. S. Hiremath, J. D. Pujari, S. Shivashankar, and V. Mouneswara. Script identification in a handwritten document image using texture features. 2010 IEEE 2nd International Advance Computing Conference (IACC), pages 110–114, 2010.
-  S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Comput., 9(8):1735–1780, Nov. 1997.
-  Y. Kato and M. Yasuhara. Recovery of drawing order from single-stroke handwriting images. IEEE Trans. Pattern Anal. Mach. Intell., 22(9):938–949, Sept. 2000.
-  M. Lichman. UCI machine learning repository, 2013.
-  M. Liwicki and H. Bunke. Iam-ondb - an on-line english sentence database acquired from handwritten text on a whiteboard. In Proceedings of the Eighth International Conference on Document Analysis and Recognition, ICDAR ’05, pages 956–961, Washington, DC, USA, 2005. IEEE Computer Society.
-  S. Madhvanath, D. Vijayasenan, and T. M. Kadiresan. Lipitk: A generic toolkit for online handwriting recognition. In ACM SIGGRAPH 2007 Courses, SIGGRAPH ’07, New York, NY, USA, 2007. ACM.
-  U. Marti and H. Bunke. The iam-database: an english sentence database for offline handwriting recognition. IJDAR, 5(1):39–46, 2002.
-  J. Mei, L. Dai, B. Shi, and X. Bai. Scene text script identification with convolutional recurrent neural networks. In 23rd International Conference on Pattern Recognition, ICPR 2016, Cancún, Mexico, December 4-8, 2016, pages 4053–4058, 2016.
-  I. Moalla, A. Alimi, and A. Benhamadou. Extraction of arabic words from multilingual documents. In Proc. Conf. Artificial Intelligence and Soft Computing, Marbella, Sep 2004.
-  I. Moalla, A. Elbaati, A. A. Alimi, and A. Benhamadou. Extraction of arabic text from multilingual documents. In IEEE International Conference on Systems, Man and Cybernetics, volume 4, page 5 pp., Oct 2002.
-  U. Pal, R. Jayadevan, and N. Sharma. Handwriting recognition in indian regional scripts: A survey of offline techniques. ACM Trans. Asian Lang. Inf. Process., 11(1):1:1–1:35, Mar. 2012.
-  G. G. Rajput and A. H. B. Handwritten script recognition using dct and wavelet features at block level. IJCA, Special Issue on RTIPPR, (3):158–163, 2010.
-  K. Roy and U. Pal. Word-wise hand-written script separation for indian postal automation. In Proceedings of the Ninth International Workshop on Frontiers in Handwriting Recognition, pages 521–526, 2006.
-  K. Roy, U. Pal, and B. B. Chaudhuri. Neural network based word-wise handwritten script identification system for indian postal automation. In Proceedings of 2005 International Conference on Intelligent Sensing and Information Processing, 2005., pages 240–245, Jan 2005.
-  K. Roy, S. Vajda, A. Belaid, U. Pal, and B. B. Chaudhuri. A system for indian postal automation. In Proceedings of the Eighth International Conference on Document Analysis and Recognition, ICDAR ’05, pages 1060–1064, Washington, DC, USA, 2005. IEEE Computer Society.
-  K. Roy, S. Vajda, U. Pal, and B. B. Chaudhuri. A system towards indian postal automation. In Proceedings of the Ninth International Workshop on Frontiers in Handwriting Recognition, IWFHR ’04, pages 580–585, Washington, DC, USA, 2004. IEEE Computer Society.
-  N. Sankaran and C. V. Jawahar. Recognition of printed devanagari text using blstm neural network. In Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012), pages 322–325, Nov 2012.
-  J. Schindelin, I. Arganda-Carreras, E. Frise, V. Kaynig, M. Longair, T. Pietzsch, S. Preibisch, C. Rueden, S. Saalfeld, B. Schmid, J.-Y. Tinevez, D. J. White, V. Hartenstein, K. Eliceiri, P. Tomancak, and A. Cardona. Fiji: an open-source platform for biological-image analysis. Nat Meth, 9(7):676–682, July 2012.
-  N. Sharma, R. Mandal, R. Sharma, U. Pal, and M. Blumenstein. Icdar2015 competition on video script identification (cvsi 2015). In Proceedings of the 2015 13th International Conference on Document Analysis and Recognition (ICDAR), ICDAR ’15, pages 1196–1200, Washington, DC, USA, 2015. IEEE Computer Society.
-  K. Ubul, G. Tursun, A. Aysa, D. Impedovo, G. Pirlo, and T. Yibulayin. Script identification of multi-script documents: A survey. IEEE Access, 5:6546–6559, 2017.
-  A. Ul-Hasan, M. Z. Afzal, F. Shafait, M. Liwicki, and T. M. Breuel. A sequence learning approach for multiple script identification. In Proceedings of the 2015 13th International Conference on Document Analysis and Recognition (ICDAR), ICDAR ’15, pages 1046–1050, Washington, DC, USA, 2015. IEEE Computer Society.