
EdgeNet: A novel approach for Arabic numeral classification

Despite the importance of handwritten numeral classification, a robust and effective method for a widely used language like Arabic is still lacking. This study aims to overcome two major limitations of existing works: data diversity and an effective learning method. Hence, the existing Arabic numeral datasets have been merged into a single dataset and augmented to introduce data diversity. Moreover, a novel deep model has been proposed to exploit the diverse data samples of the unified dataset. The proposed deep model utilizes low-level edge features by propagating them through a residual connection. To make a fair comparison with the proposed model, the existing works have been studied under the unified dataset. The comparison experiments illustrate that the unified dataset improves the performance of the existing works. Moreover, the proposed model outperforms the existing state-of-the-art Arabic handwritten numeral classification methods and obtains an accuracy of 99.59% in the validation phase. Apart from that, different state-of-the-art classification models have been studied with the same dataset to reveal their feasibility for Arabic numeral classification. Code available at



1 Introduction

Handwritten numeral classification (HNC) has been one of the most prominent research areas in computer vision for several decades. HNC is considered a sub-field of optical character recognition (OCR) and aims to convert handwritten images into computer-readable text [1]. In general, HNC has several real-life applications such as reading passports, recognizing license plates, sorting postal mail, processing bank cheques, and address book identification [2]. Considering the importance of HNC, a noticeable amount of research has been conducted on Arabic script.

Arabic is known to be the fourth most widely used language script with more than 422 million native speakers. It is used as an official language in more than twenty-six countries [3]. Unfortunately, existing Arabic HNC methods are still inefficient compared to the classification methods of other language scripts [4, 5]. Existing Arabic HNC methods are not optimized for two main reasons. First, the recent Arabic HNC works focus on homogeneous data samples [5]; the performance of these methods on diverse data samples has not yet been evaluated. Second, the recent works [6, 7] used stacked CNNs inspired by LeNet [8]. However, it is well known that a network architecture with residual connections can improve classification performance even for complex data samples [9]. Moreover, the performance of existing state-of-the-art network architectures such as VGG, ResNet, and DenseNet in Arabic HNC remains unexplored.

The limitations of the existing work have inspired this study to propose a robust and effective method for Arabic HNC. Here, robust and effective are defined as learning from diverse data samples and utilizing available knowledge with an efficient deep model. Hence, this work proposes to unify the existing Arabic numeral datasets and augment them for data diversity (robust), followed by a novel deep network that uses low-level (edge) features as a residual connection (effective). In particular, edge information has been used in different computer vision applications such as object detection [10], face recognition [11], stereo matching [12], image synthesis [13], and image inpainting [14]. So far, none of the existing HNC works utilize edge features as a residual connection [9]. The proposed model with the edge connection is referred to as EdgeNet in this study.

This study contributes to the Arabic HNC as follows:

  1. Unify and extend (with augmentation) existing Arabic numeral datasets to introduce data diversity.

  2. Propose a novel deep network (EdgeNet), which can utilize the low-level edge features and propagate it with residual connections.

  3. Study the performance of the existing work with the proposed unified dataset.

  4. Explore different state-of-the-art classification networks for Arabic HNC.

This paper is organized into five sections. Section 2 reviews related works, and Section 3 describes the proposed method. Section 4 presents results and comparisons. Finally, Section 5 concludes the study.

2 Related Works

According to the classification approaches, the Arabic HNC works can be divided into three basic categories: 1) Handcrafted features with a linear classifier, 2) Deep Learning, and 3) Hybrid Approach.

Handcrafted features with a linear classifier. Prior to the widespread usage of deep learning methods, handcrafted features with a linear classifier were considered the state-of-the-art method for numeral classification. In that period, [5] studied the feasibility of different linear classifiers for Arabic numeral classification (ANC). They found that an SVM trained on gradient features can perform better than the other linear classifiers. Later, [15] used Gabor-based feature extraction with an SVM for ANC. With their method, they achieved a validation accuracy of 97.94% on a private dataset containing 21,120 image samples. In another study, [16] applied discrete cosine transform (DCT) coefficients to a dynamic Bayesian network (DBN). They used a public dataset [17] with 70,000 image samples and obtained a validation accuracy of 85.26%.

Deep Learning. In recent years, the convolutional neural network (CNN) has outperformed handcrafted-feature-based methods for HNC. In particular, LeNet demonstrated a significant improvement over handcrafted-feature-based classification methods. Following the trend, [6] applied LeNet for Arabic HNC and used the same dataset as [16]. They outperformed their previous study with deep learning and achieved an accuracy of 88.00%. In the following year, [18] utilized a deep autoencoder on the same dataset and set a new benchmark with 98.50% validation accuracy. In the same year, [7] applied a stacked CNN to another public dataset containing 3,000 data samples. Unfortunately, due to the lack of training samples, they were unable to outperform other deep-learning-based methods and obtained a classification accuracy of 97.40%. However, in a follow-up study, [19] introduced data augmentation and significantly increased the number of trainable parameters. Subsequently, the model demonstrated a satisfactory improvement over the previous approaches and obtained a validation accuracy of 99.40%. To the best of our knowledge, this method used data augmentation for the first time in ANC. However, their method is questionable due to its large number of trainable parameters; it is suspected to suffer from overfitting due to the lack of sufficient training samples.

Hybrid Approach. Apart from the previous two categories, a hybrid approach has been adopted by [20]. They combined a restricted Boltzmann machine (RBM) and a CNN for Arabic HNC. They used the same dataset as [7] and achieved an accuracy of 98.59% in the validation phase.

Despite the potential of edge features as a residual connection, the recent methods on Arabic HNC still focus on stacked CNNs. Moreover, existing works focus on a specific type of data sample. These conservative approaches prevent them from achieving maximum performance. Thus, this study focuses on overcoming the limitations of the existing works with a robust and effective HNC method. Moreover, various state-of-the-art classification networks have also been studied to assess their feasibility for a widely used language script like Arabic.

3 Proposed Method

This study proposes a novel method for Arabic handwritten numeral classification. As Fig. 1 demonstrates, the proposed method consists of two basic phases: a) data preparation and b) learning from data (EdgeNet). The data preparation phase merges the existing Arabic numeral datasets and represents all data as a uniform dataset. Here, the merged dataset with unprocessed data samples is denoted as the unprocessed unified dataset (). The unprocessed unified dataset () is then processed to obtain the processed unified dataset (). The training set of the processed unified dataset () is enlarged with augmentation and is denoted as the unified dataset (). Later, the image samples from the unified dataset () and their corresponding edge images are used as inputs to the proposed EdgeNet. EdgeNet takes both inputs and propagates the low-level edge features through a residual connection. The features extracted by the feature extraction block of EdgeNet are fed into a softmax classifier for the final prediction.

Figure 1: Overview of the proposed work. a) Data Preparation: existing numeral datasets have been merged into a single dataset. The unprocessed unified dataset has been processed to obtain the processed unified dataset (). The training set () from the processed unified dataset () has been augmented to introduce data diversity, yielding the unified dataset (). b) Learning from data: image samples from the unified dataset () have been used to extract edge images. The edge images and the preprocessed images have been used as inputs to the proposed EdgeNet.

3.1 Data Preparation

Depending on the region, Arabic script can have different handwritten numeral shapes [21, 22]. Moreover, the handwritten shape of a numeral can overlap with that of another numeral class under a different convention, e.g. "0" in the Eastern Arabic dataset looks almost the same as "5" in the Perso-Arabic dataset. As a result, Arabic HNC is considered a challenging task. Fig. 2 illustrates samples of the widely used handwritten numeral shapes from the Arabic script. To utilize the variation of handwritten samples, this study proposes to merge existing Arabic numeral datasets and prepare them for model evaluation.

Figure 2: Sample shapes of Arabic handwritten numerals. The top row: Latin equivalent of Arabic numerals, the middle row: handwritten numeral samples of Eastern-Arabic region, the bottom row: handwritten numeral samples of Perso-Arabic region. The used datasets contain more variations of Arabic handwritten numerals.

3.1.1 Data Unification

Three publicly available datasets have been used to prepare the unprocessed unified dataset (): PMU-AD [23], CMATERDB 3.3.1 [24], and MADBASE [17]. These datasets were collected separately and contain a different number of image samples per image class. In the unprocessed unified dataset (), the data samples have been shuffled randomly without changing their corresponding image classes. After that, the unprocessed unified dataset () containing 78,180 image samples has been divided into three subsets as per convention [25], where the training set () contains 62,540 samples (80% of the overall data), the validation set () contains 11,712 samples (15%), and the testing set () contains 3,928 samples (5%). Table 1 shows the data distribution used in this study. Note that the processed unified dataset () follows the same data distribution as the unprocessed unified dataset ().

| Dataset | Training Samples () | Validation Samples () | Testing Samples () | Total |
|---|---|---|---|---|
| CMATERDB 3.3.1 | 3,000 | - | - | 3,000 |
| PMU-UD | 5,180 | - | - | 5,180 |
| MADBase | 60,000 | 10,000 | - | 70,000 |
| Unified Dataset () | 62,540 (80%) | 11,712 (15%) | 3,928 (5%) | 78,180 |

Table 1: Overview of used datasets. The proposed unprocessed and processed unified datasets have been obtained from the existing Arabic benchmark datasets (numerals).
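
The shuffle-and-split step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the fixed seed, and the rounding rule are assumptions, and a plain fractional split of 78,180 samples gives slightly different counts than the ones reported in Table 1.

```python
import random

def shuffle_and_split(samples, fractions=(0.80, 0.15, 0.05), seed=0):
    """Shuffle (image, label) pairs together and cut them into train/val/test."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(samples)
    # Each pair moves as a unit, so labels stay attached to their images.
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = round(n * fractions[0])   # rounding rule is an assumption
    n_val = round(n * fractions[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # remainder, so no sample is dropped
    return train, val, test

# Toy stand-in for the merged dataset: 78,180 (image_id, label) pairs.
toy = [(i, i % 10) for i in range(78180)]
train, val, test = shuffle_and_split(toy)
print(len(train), len(val), len(test))
```

Taking the test set as the remainder guarantees that the three subsets partition the merged data regardless of rounding.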

3.1.2 Data Preprocessing

The unprocessed unified dataset () comprises sample images with various text representations (as they are collected from different datasets). Fig. 3 shows image samples from the used datasets. In the PMU-AD dataset, the text appears as black on a white background with an image dimension of pixels. The CMATERDB 3.3.1 dataset has the same text representation as PMU-AD with an image dimension of pixels. The MADBase dataset has a different text representation than the previously described datasets: it adopted the image representation convention suggested by the well-known MNIST dataset [26]. Moreover, image samples of the MADBase dataset have an image dimension of pixels, and the text appears as white on a black background. However, in the proposed unified dataset () all image samples have to be represented in a uniform manner. Hence, the unprocessed unified dataset () has been processed as follows: . Here, , , and represent a preprocessing function, the unprocessed unified dataset, and the processed unified dataset respectively. As a part of data preprocessing, a sample image () from the unprocessed unified dataset () has been resized into dimension. The background of all images has been set to white through a bitwise "not" operation. Although a previous study [17] reported that white information on a black background can improve model performance, it does not fit well for the proposed EdgeNet (please see Section 4.1 for details). A morphological dilation [27] has been applied with a kernel size of to enhance the information [25]. Finally, the processed unified dataset () has been presented as , where ,, and denote the preprocessed training set, preprocessed validation set, and preprocessed testing set. Moreover, each image sample () of the processed unified dataset () has been presented as , where and denote height and width respectively.

Figure 3: Image samples from the used datasets. a) Sample image from PMU-AD dataset with the dimension of . b) Sample image from CMATERDb 3.3.1 dataset with the dimension of . c) Sample image from MADBase dataset with the dimension of .
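
The two pixel-level operations named in the preprocessing step, the bitwise "not" and the morphological dilation, can be sketched in plain Python on a list-of-rows grayscale image. The kernel size `k=3` is a placeholder, since the value is not recoverable from the text above, and applying the dilation while the strokes are still the bright pixels is an assumption about the order of the two steps (a max-based dilation only thickens bright regions).

```python
def invert(image):
    """Bitwise NOT for 8-bit pixels: 0 <-> 255, so black and white swap."""
    return [[255 - p for p in row] for row in image]

def dilate(image, k=3):
    """Grayscale dilation: each pixel becomes the max of its k x k neighbourhood."""
    h, w = len(image), len(image[0])
    r = k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Clip the window at the image border instead of padding.
            window = [image[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1))]
            row.append(max(window))
        out.append(row)
    return out

# A single bright stroke pixel on a black 5 x 5 canvas (MNIST-style polarity).
stroke = [[0] * 5 for _ in range(5)]
stroke[2][2] = 255
thick = dilate(stroke, k=3)        # stroke grows to a 3 x 3 block
black_on_white = invert(thick)     # uniform polarity: dark text, white background
```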

3.1.3 Data Augmentation

Data augmentation is known to be a useful technique for introducing data diversity into a specific dataset [28]. It also helps the deep model to avoid overfitting. In this study, data augmentation is intended to introduce data diversity into the available data samples. However, this study does not propose any new data augmentation method. Hence, the augmentation techniques have been adopted as per the suggestions of previous studies [25, 2]. The augmentation has been performed as follows: , where , , represent the data augmentation method, preprocessed training set, and augmented training set respectively. Here, the augmented training set () can be represented as , where, represent the preprocessed, rotated, compressed, and translated images respectively. In later sections, images from have been denoted as for simplicity. The individual augmentations are as follows:

  • Rotation: Image () has been randomly rotated by an angle between (-45, +45) degrees to obtain the rotated image () [25, 29].

  • Block Effect: Image () has been resized into 1444 dimension to lose some information [2]. After that, it has been resized back to the actual input dimension to obtain a compressed image ().

  • Translation: Translated image () has been obtained by applying a translation matrix () as an affine translation. To build the translation matrix (), the shift () has been randomly selected between (-5, +5) pixels [30, 25].
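
The three augmentations listed above can be sketched in plain Python on a list-of-rows image. This is an illustrative sketch rather than the authors' implementation: nearest-neighbour sampling, the white fill value, and the intermediate block size `small=14` (the source value is garbled) are all assumptions.

```python
import math
import random

WHITE = 255  # assumed background value after preprocessing

def translate(image, dx, dy, fill=WHITE):
    """Shift the image by (dx, dy) pixels; uncovered pixels get the fill value."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = image[y][x]
    return out

def block_effect(image, small):
    """Nearest-neighbour downsample to small x small, then upsample back."""
    h, w = len(image), len(image[0])
    low = [[image[y * h // small][x * w // small] for x in range(small)]
           for y in range(small)]
    return [[low[y * small // h][x * small // w] for x in range(w)]
            for y in range(h)]

def rotate(image, degrees, fill=WHITE):
    """Rotate about the image centre with nearest-neighbour sampling."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    t = math.radians(degrees)
    cos_t, sin_t = math.cos(t), math.sin(t)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: find the source pixel that lands on (x, y).
            sx = cos_t * (x - cx) + sin_t * (y - cy) + cx
            sy = -sin_t * (x - cx) + cos_t * (y - cy) + cy
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= iy < h and 0 <= ix < w:
                out[y][x] = image[iy][ix]
    return out

def augment(image, rng, small=14):
    """Return the four variants kept per training sample."""
    return {
        "original": image,
        "rotated": rotate(image, rng.uniform(-45, 45)),
        "blocked": block_effect(image, small),
        "translated": translate(image, rng.randint(-5, 5), rng.randint(-5, 5)),
    }

# Toy 28 x 28 white canvas with one dark pixel (the size is illustrative only).
img = [[WHITE] * 28 for _ in range(28)]
img[10][12] = 0
variants = augment(img, random.Random(0))
```

Keeping the original alongside the three transformed copies is what makes the augmented training set four times the size of the preprocessed one.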

Fig. 4 shows a sample image (processed) and its corresponding augmented variants.

Figure 4: Example of data augmentation. a) Sample image (). b) Rotated image (). c) Blocked image () d) Translated image ().

The training set () has been extended to 250,160 samples in the augmented training set (), i.e., four variants (preprocessed, rotated, compressed, and translated) of each of the 62,540 training images. The processed unified dataset () has been extended to the unified dataset () as . The unified dataset () helps this study to introduce data diversity for Arabic HNC. In addition, the diverse data collection aims to prepare the proposed EdgeNet to handle more real-life data samples. As Fig. 5 demonstrates, the proposed unified dataset () has a more scattered data distribution compared to the existing benchmark datasets. Here, the data distribution has been visualized with t-Distributed Stochastic Neighbor Embedding (t-SNE) [31], which gives an intuition of the data arrangement of each dataset in a two-dimensional (2D) space.

Figure 5: Data distribution of used datasets using t-Distributed Stochastic Neighbor Embedding (t-SNE) (best viewed in color). The proposed unified dataset () has a more scattered distribution compared to the existing datasets, which aims to help the proposed model learn more diverse information for real-life applications. a) CMATERDb 3.3.1. b) PMU-UD. c) MADBase. d) Unified dataset ().

3.2 Learning from Data

Edge features and residual connections are two different techniques that are useful for improving the performance of a deep model [13]. This study proposes to use an edge feature as a residual connection for Arabic HNC. Hence, the edge image () has been extracted from a sample image (). Moreover, the proposed EdgeNet has been fed with an input set () as . This allows the proposed model to learn two different kinds of information simultaneously.

3.2.1 Edge Extraction

Low-level edge features have been used in several computer vision applications. Although there are different edge extraction techniques, the Canny edge extraction method is known to be useful for its noise suppression and precise shape extraction [32]. Hence, the Canny edge extraction method has been used for the proposed EdgeNet. The edge has been extracted from a sample image () as , where is the edge image with height () and width (). An optimal parameter setting of , , and has been applied as per the suggestion from a previous study [13]. Fig. 6 shows the extracted edge images and their corresponding input samples. The feasibility of the different edge extraction methods has also been demonstrated in Section 4.1.

Figure 6: Edge image () has been extracted from Sample image () through canny edge extraction method. a) Image (). b) Edge image ().
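
A full Canny detector adds Gaussian smoothing, non-maximum suppression, and hysteresis thresholding on top of an image gradient. The sketch below shows only that shared gradient step, using Sobel kernels and a single threshold, so it is a simplified stand-in and not the Canny method used for EdgeNet. The threshold value and function names are assumptions.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def gradient_magnitude(image):
    """Sobel gradient magnitude; border pixels are left at zero."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    p = image[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * p
                    gy += SOBEL_Y[j][i] * p
            out[y][x] = math.hypot(gx, gy)
    return out

def edge_map(image, threshold):
    """Binary edge image: white edges on a black background."""
    mag = gradient_magnitude(image)
    return [[255 if m >= threshold else 0 for m in row] for row in mag]

# Vertical step edge: dark left half, bright right half.
step = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
edges = edge_map(step, threshold=500)
```

The output polarity (white edges on black) matches the description of the edge images given later in Section 4.1.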

3.2.2 EdgeNet Architecture

As Fig. 7 shows, the proposed model utilizes the input set () for feature extraction with two concurrent convolutional layers (iConv for image feature extraction and eConv for edge feature extraction). Each of these layers has a feature depth of 16. The outputs of iConv and eConv are concatenated into a single tensor and fed into the Conv1 layer with a depth of 32. After that, two consecutive dilated convolution layers (Conv2 and Conv3) with a dilation rate of 2 are used to extract high-level features [13]. The output of the Conv3 layer is concatenated with the low-level edge features (eConv) as a residual connection and fed into another convolutional layer with a feature depth of 32. This work denotes this residual connection as an edge connection. Here, the edge connection is used to propagate low-level information to the top layer of the feature extraction block. In the feature extraction block, each convolutional layer (including iConv and eConv) has a kernel size of , a stride of 1, a ReLU [33] activation, and is followed by a dropout of 25%. Here, the dropout is used to avoid overfitting [25]. After the feature extraction, a global average pooling layer has been introduced to optimize the extracted features [34]. The pooling layer has a kernel dimension of . The output feature vector of the pooling layer is flattened into a 1D tensor and fed into a softmax classifier. The softmax classifier consists of a fully connected (FC1) layer with a dimension of 128, activated with a ReLU function. As in the feature extraction block, a dropout (rate of 25%) is applied after FC1. The FC1 layer is followed by FC2, which is the final layer of the proposed EdgeNet. FC2 has the same dimension as the number of image classes and is activated with a softmax function.

Figure 7: The network architecture of proposed EdgeNet. The low-level edge features have been propagated in the feature extraction block as an edge connection. Extracted features have been fed to a softmax classifier for the final prediction.
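
To make the wiring easier to follow, the sketch below traces only the feature depths through the two concatenations described above. It is bookkeeping rather than a trainable model. The depths of Conv2 and Conv3, the name `Conv4` for the layer after the edge connection, the 10 output classes, and the example kernel size of 3 are assumptions, since the text does not state them.

```python
def effective_kernel(kernel, dilation):
    """Spatial extent covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel + (kernel - 1) * (dilation - 1)

def edgenet_depths(num_classes=10):
    """Trace channel depths through the blocks of the feature extractor."""
    depth = {}
    depth["iConv"] = 16                                   # image branch
    depth["eConv"] = 16                                   # edge branch
    depth["input_concat"] = depth["iConv"] + depth["eConv"]
    depth["Conv1"] = 32
    depth["Conv2"] = 32                                   # dilated, rate 2 (depth assumed)
    depth["Conv3"] = 32                                   # dilated, rate 2 (depth assumed)
    # Edge connection: Conv3 output concatenated with the early eConv features.
    depth["edge_concat"] = depth["Conv3"] + depth["eConv"]
    depth["Conv4"] = 32                                   # hypothetical layer name
    depth["FC1"] = 128
    depth["FC2"] = num_classes
    return depth

depths = edgenet_depths()
for name, d in depths.items():
    print(f"{name:13s} {d}")
print("extent of a 3 x 3 kernel at dilation 2:", effective_kernel(3, 2))
```

Note that the edge connection here is a concatenation, so the layer after it sees 48 input channels rather than the 32 an additive skip would give.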

3.3 Experiment Setup

The trainable parameters of a deep model can be calculated with the following equation [36]:


Here, = total number of parameters, = total number of convolutional layers, = index of convolutional layer, = input dimension of convolutional layer, = filter size, = number of filters, = output dimension of convolutional layer, = index of fully connected (FC) layer, = number of FC layers, = input dimension of FC layer, and = output dimension of FC layer. Note that the pooling layer has been ignored as it does not add any trainable parameters to a deep model. As per Eq. 1, the total number of trainable parameters in EdgeNet is 836,938, which is far fewer than the 4,977,290 trainable parameters of the state-of-the-art Arabic HNC work [19]. A lower number of trainable parameters helps the proposed model to train faster and avoid overfitting (please see Section 4 for details). The proposed model has been trained with the augmented training set () and validated with the validation set (). The best weights from the validation phase have been preserved and evaluated with the testing set () for cross-validation. A learning rate of 0.001 has been selected to control the gradient steps. The batch size has been fixed to 128, and the model has been trained for 100 epochs. The Adadelta optimizer has been applied to minimize the categorical cross-entropy loss. In the later section, the different variants of the proposed model have been evaluated with the same hyperparameters for justification (please see Section 4.1 for details). All experiments have been conducted on a machine running Ubuntu 16.04.6 with 24GB of random-access memory (RAM) and an Intel Core i7-7700 central processing unit (CPU). A Titan-XP graphics processing unit (GPU) has been utilized to accelerate the learning process.
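
The per-layer counting rule behind the parameter totals can be sketched as below. These are the standard formulas for convolutional and fully connected layers with a bias term. The example layer sizes are hypothetical (the kernel size is not recoverable from the text), and only the two totals are taken from the paper.

```python
def conv_params(kernel, in_channels, filters, bias=True):
    """Weights of a square 2D convolution: k * k * C_in * F, plus one bias per filter."""
    return kernel * kernel * in_channels * filters + (filters if bias else 0)

def fc_params(in_features, out_features, bias=True):
    """Weights of a fully connected layer, plus one bias per output unit."""
    return in_features * out_features + (out_features if bias else 0)

# Hypothetical examples: a 3 x 3 convolution from 1 to 16 channels,
# and a 128 -> 10 classifier layer.
print(conv_params(3, 1, 16))   # 3*3*1*16 + 16
print(fc_params(128, 10))      # 128*10 + 10

# Totals reported in the text.
EDGENET_PARAMS = 836_938
CNN19_PARAMS = 4_977_290       # model of [19]
ratio = CNN19_PARAMS / EDGENET_PARAMS
print(f"[19] has about {ratio:.2f}x the parameters of EdgeNet")
```

Pooling and dropout layers contribute nothing to either function, which is why they are left out of the count.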

4 Results and Comparison

The proposed EdgeNet has been justified with different experiments and compared with existing Arabic HNC works. Finally, the feasibility of the state-of-the-art network architecture from different classification application has been studied for the Arabic HNC.

4.1 Experiments with EdgeNet Variants

The different variants of the proposed EdgeNet have been studied by modifying the data representation, changing the edge extraction method, and removing the edge connection. However, the overall architecture of EdgeNet has remained unchanged. Table 2 shows the experiments with EdgeNet variants. Here, EdgeNet, EdgeNet, EdgeNet, and EdgeNet denote the variants of EdgeNet without the edge connection, with the Sobel edge extraction method, with inverse data, and with the Laplacian of Gaussian (LoG) edge extraction method respectively. The experimental results illustrate that precise shape extraction through the Canny edge method and propagation of low-level features through the edge connection help the proposed EdgeNet to outperform the other variants. Moreover, it has obtained the maximum accuracy of 99.59% in the validation phase. Fig. 8 and Fig. 9 illustrate the validation accuracy and training loss over training iterations of three EdgeNet variants. Note that only Canny edge features have been considered for data visualization as they outperformed other edge features such as Sobel and LoG. From the data visualization, it can be observed that the edge connection with low-level edge features improves the learning and validation performance of the proposed EdgeNet. However, the edge images contain white information on a black background. Hence, the inverse data (white information on a black background) does not fit well with the proposed EdgeNet. The black background of inverse data samples is inefficient for the proposed EdgeNet as it introduces a tendency to overfit while training the proposed model.

| Network Variant | Description | Accuracy (Max, %) |
|---|---|---|
| EdgeNet | Edge connection and edge image have been removed. | 99.51 |
| EdgeNet | Sobel edge kernel [38]. | 99.53 |
| EdgeNet | Image background changed into black. | 99.54 |
| EdgeNet | Laplacian of Gaussian (LoG) with kernel [39]. | 99.57 |
| EdgeNet | Proposed method. | 99.59 |

Table 2: Experiments with EdgeNet variants. The proposed EdgeNet with the Canny edge extraction method outperforms the other variants due to precise edge extraction through the Canny edge method and low-level edge feature propagation through the edge connection.
Figure 8: Validation accuracy over training iterations. The proposed EdgeNet demonstrates a consistent validation accuracy over the training phase and outperforms the other variants.
Figure 9: Training loss over training iteration. The proposed EdgeNet with canny edge learn features comparing to other variants.

The best weight of proposed EdgeNet has been utilized for further analysis. Moreover, each numeral class has been studied to identify the weak points of the proposed model. Fig. 10 demonstrates that similar shape like , and degrade the overall performance of the proposed EdgeNet.

Figure 10: Confusion matrix on the best weight of EdgeNet. Handwritten numerals with similar shape degraded the performance of the proposed EdgeNet.
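
A per-class analysis of this kind can be sketched as follows. The helper names and the toy labels are illustrative assumptions, not the paper's data. The matrix is indexed as `matrix[true][predicted]`.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes=10):
    """Count how often each true class is predicted as each class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix

def per_class_accuracy(matrix):
    """Diagonal count divided by the row total for each true class."""
    return [row[i] / sum(row) if sum(row) else float("nan")
            for i, row in enumerate(matrix)]

def most_confused_pairs(matrix, top=3):
    """Largest off-diagonal entries as (count, true_class, predicted_class)."""
    n = len(matrix)
    pairs = [(matrix[t][p], t, p)
             for t in range(n) for p in range(n)
             if t != p and matrix[t][p] > 0]
    return sorted(pairs, reverse=True)[:top]

# Toy example with three classes.
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
acc = per_class_accuracy(cm)
confused = most_confused_pairs(cm)
```

Sorting the off-diagonal entries is a quick way to surface the numeral pairs with similar shapes that the figure highlights.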

4.2 Comparison with Existing Arabic Numeral Works

The existing Arabic HNC works lack fair comparisons with other works. In most cases, the existing works compare against other methods that use different datasets and different learning processes (e.g., number of iterations). For a fair comparison, every method should be studied with the same dataset and should follow the same learning process. Hence, the existing HNC methods have been studied with the unified dataset (). This allows the study to make a fair comparison with the proposed EdgeNet. It also reveals the performance of existing methods on a diverse data collection. Like EdgeNet, the existing methods have been trained with the augmented training set (), validated with the processed validation set (), and cross-validated with the processed testing set (). The existing models have been tuned with their suggested hyperparameters. Linear classifiers have been fed with their suggested handcrafted feature vectors. All models have been trained for 100 iterations (epochs). Table 3 shows the methods, their reported results, and their performance on the unified dataset ().

| Method | Reported Accuracy (%) | Reported Data Samples | Unified Dataset: Validation (%) | Unified Dataset: Testing (%) |
|---|---|---|---|---|
| DCN-DBN [16] | 85.26 | 70000 | 92.41 | 87.02 |
| LeNet [6] | 88.00 | 70000 | 99.12 | 99.10 |
| MLP [7] | 93.80 | 3000 | 95.57 | 88.78 |
| CNN [7] | 97.40 | 3000 | 99.03 | 99.01 |
| Gabor-SVM [15] | 97.94 | 21120 | 98.34 | 95.99 |
| RBM-CNN [20] | 98.59 | 3000 | 99.02 | 98.93 |
| Autoencoder [18] | 98.50 | 70000 | 98.96 | 98.93 |
| CNN [19] | 99.40 | 3000 | 99.40 | 99.33 |
| EdgeNet (Proposed) | - | - | 99.59 | 99.50 |

Table 3: Comparison with existing works. The data diversity of the unified dataset helps existing methods to improve their performance. Moreover, the proposed EdgeNet has utilized data diversity and outperformed the existing works in both validation and testing.

As Table 3 demonstrates, the proposed method outperforms the existing works in both the validation and testing phases, achieving accuracies of 99.59% and 99.50% respectively. The data diversity of the unified dataset () improves the performance of the existing methods as well. In the validation phase, every existing method except [19] surpassed its reported result. In the testing phase, MLP [7], Gabor-SVM [15], and CNN [19] fell below their reported accuracies, while the remaining methods surpassed them. In particular, the method of [6] demonstrates a significant improvement (more than 11 percentage points) over its reported result with the unified dataset () due to the optimized performance of LeNet. The experimental results also indicate that deep models outperform the handcrafted-feature-based methods on a diverse data collection, and most deep models remain consistent in the testing phase as well. As an exception, the method of [19] does not show a validation improvement on the unified dataset (), which is attributed to its large number of trainable parameters (4,977,290) and apparent overfitting.
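
The gains discussed above can be recomputed directly from Table 3. The short script below copies the table values and reports, for each existing method, the change in percentage points relative to its reported accuracy. The variable names are illustrative.

```python
# (reported, unified validation, unified testing) accuracies in %, from Table 3.
TABLE3 = {
    "DCN-DBN [16]":     (85.26, 92.41, 87.02),
    "LeNet [6]":        (88.00, 99.12, 99.10),
    "MLP [7]":          (93.80, 95.57, 88.78),
    "CNN [7]":          (97.40, 99.03, 99.01),
    "Gabor-SVM [15]":   (97.94, 98.34, 95.99),
    "RBM-CNN [20]":     (98.59, 99.02, 98.93),
    "Autoencoder [18]": (98.50, 98.96, 98.93),
    "CNN [19]":         (99.40, 99.40, 99.33),
}

# Change in percentage points versus the originally reported accuracy.
gains = {name: (round(val - rep, 2), round(test - rep, 2))
         for name, (rep, val, test) in TABLE3.items()}

for name, (val_gain, test_gain) in gains.items():
    print(f"{name:17s} validation {val_gain:+6.2f}   testing {test_gain:+6.2f}")

# Methods whose testing accuracy on the unified dataset is below the reported one.
below_reported_in_test = [n for n, (_, t) in gains.items() if t < 0]
```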

4.3 Study state-of-the-art Classification Networks on Unified Dataset

Several state-of-the-art network architectures have outperformed simple LeNet-like stacked CNNs for image classification. Unfortunately, none of the existing Arabic HNC works explored those network architectures or compared against them. Therefore, this work studies the feasibility of state-of-the-art network architectures for Arabic HNC. All networks have been tuned with their suggested hyperparameters and trained until the model converged on the given data. Table 4 gives an overview of the studied network architectures and their performance on the unified dataset.

| Network Architecture | Accuracy (%) |
|---|---|
| ACGAN [40] | 98.83 |
| Hybrid-HOG [29] | 99.20 |
| EvilNet [2] | 99.23 |
| HRNN [41] | 99.25 |
| VGG19 [42] | 99.27 |
| ResNet [9] | 99.31 |
| VGG16 [42] | 99.42 |
| InceptionV2 [43] | 99.43 |
| Densenet [44] | 99.47 |
| EdgeNet (Proposed) | 99.59 |

Table 4: Comparison with state-of-the-art network architectures. The proposed EdgeNet outperforms the compared network architectures without introducing a massive number of trainable parameters.

As Table 4 shows, the proposed EdgeNet can outperform the existing network architectures with significantly fewer trainable parameters. It also reveals that network architectures such as ResNet, DenseNet, VGG, and Inception can outperform the existing Arabic HNC methods.

4.4 EdgeNet performance on benchmark digit dataset (MNIST)

The proposed EdgeNet has been optimized particularly for ANC. In general, handwritten numerals of Arabic script are more complex than those of other language scripts (e.g., English). Nevertheless, the performance of the proposed EdgeNet has also been studied on a benchmark digit dataset known as MNIST. Here, the base MNIST dataset (i.e., without augmentation) has been used for the performance evaluation. As Table 5 demonstrates, the proposed EdgeNet outperforms other classification models on the MNIST dataset. It has obtained an accuracy of 99.55% in the validation phase.

| Network Architecture | Accuracy (%) |
|---|---|
| ACGAN | 98.78 |
| HRNN | 98.92 |
| ResNet | 99.21 |
| Hybrid-HOG | 99.22 |
| VGG16 | 99.29 |
| LeNet | 99.31 |
| VGG19 | 99.31 |
| InceptionV2 | 99.32 |
| Densenet | 99.39 |
| EvilNet | 99.49 |
| EdgeNet (Proposed) | 99.55 |

Table 5: Performance evaluation of state-of-the-art network architectures on the base MNIST dataset. The proposed EdgeNet demonstrates consistency on the MNIST dataset and outperforms existing methods.

5 Conclusion

In this study, a novel approach for Arabic handwritten numeral classification has been proposed. The existing benchmark datasets have been unified and extended with augmentation. It allows this study to introduce data diversity. Finally, a deep network with edge connection has been proposed. The experimental results demonstrate that the proposed model with edge connection can outperform the existing state-of-the-art methods for the unified dataset. In the validation phase, the proposed EdgeNet achieved a classification accuracy of 99.59%. The feasibility of state-of-the-art network architectures is also studied for Arabic HNC. It has been found that different network architectures like VGG, DenseNet, ResNet, Inception etc. can outperform existing Arabic HNC methods. In addition, the proposed EdgeNet can outperform those network architectures without introducing a massive number of trainable parameters. In the foreseeable future, EdgeNet will be studied for more complex data distribution.


  • [1] Dil Nawaz Hakro, Imdad A Ismaili, Abdullah Zawawi Talib, Zeeshani Bhatti, and Ghulam Nabi Mojai. Issues and challenges in sindhi ocr. Sindh University Research Journal-SURJ (Science Series), 46(2), 2014.
  • [2] SMA Sharif and Mahdin Mahboob. Evil method: A deep cnn model for bangla handwritten numeral classification. In 2017 4th International Conference on Advances in Electrical Engineering (ICAEE), pages 217–222. IEEE, 2017.
  • [3] Iyad Abu Doush, Faisal Alkhateeb, and Anwaar Hamdi Gharaibeh. A novel arabic ocr post-processing using rule-based and word context techniques. International Journal on Document Analysis and Recognition (IJDAR), 21(1-2):77–89, 2018.
  • [4] Yasser M Alginahi. A survey on arabic character segmentation. International Journal on Document Analysis and Recognition (IJDAR), 16(2):105–126, 2013.
  • [5] Sherif Abdleazeem and Ezzat El-Sherif. Arabic handwritten digit recognition. International Journal of Document Analysis and Recognition (IJDAR), 11(3):127–141, 2008.
  • [6] Ahmed El-Sawy, EL-Bakry Hazem, and Mohamed Loey. Cnn for handwritten arabic digits recognition based on lenet-5. In International Conference on Advanced Intelligent Systems and Informatics, pages 566–575. Springer, 2016.
  • [7] Akm Ashiquzzaman and Abdul Kawsar Tushar. Handwritten arabic numeral recognition using deep learning neural networks. In

    2017 IEEE International Conference on Imaging, Vision & Pattern Recognition (icIVPR)

    , pages 1–4. IEEE, 2017.
  • [8] Yann LeCun et al. Lenet-5, convolutional neural networks. URL: http://yann. lecun. com/exdb/lenet, 20, 2015.
  • [9] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • [10] C Lawrence Zitnick and Piotr Dollár. Edge boxes: Locating object proposals from edges. In European conference on computer vision, pages 391–405. Springer, 2014.
  • [11] Yongsheng Gao and Maylor KH Leung. Face recognition using line edge map. IEEE transactions on pattern analysis and machine intelligence, 24(6):764–779, 2002.
  • [12] Xiao Song, Xu Zhao, Hanwen Hu, and Liangji Fang. Edgestereo: A context integrated residual pyramid network for stereo matching. arXiv preprint arXiv:1803.05196, 2018.
  • [13] SMA Sharif and Young Ju Jung. Deep color reconstruction for sparse color sensor. Optics Express, 27(16), 2019.
  • [14] Kamyar Nazeri, Eric Ng, Tony Joseph, Faisal Qureshi, and Mehran Ebrahimi. Edgeconnect: Generative image inpainting with adversarial edge learning. arXiv preprint arXiv:1901.00212, 2019.
  • [15] Sabri A Mahmoud. Arabic (indian) handwritten digits recognition using gabor-based features. In 2008 International Conference on Innovations in Information Technology, pages 683–687. IEEE, 2008.
  • [16] Jawad H AlKhateeb and Marwan Alseid. Dbn-based learning for arabic handwritten digit recognition using dct features. In 2014 6th international conference on Computer Science and Information Technology (CSIT), pages 222–226. IEEE, 2014.
  • [17] Ezzat Ali El-Sherif and Sherif Abdelazeem. A two-stage system for arabic handwritten digit recognition tested on a new large database. In Artificial Intelligence and Pattern Recognition, pages 237–242, 2007.
  • [18] Mohamed Loey, Ahmed El-Sawy, and Hazem El-Bakry. Deep learning autoencoder approach for handwritten arabic digits recognition. arXiv preprint arXiv:1706.06720, 2017.
  • [19] Akm Ashiquzzaman, Abdul Kawsar Tushar, Ashiqur Rahman, and Farzana Mohsin. An efficient recognition method for handwritten arabic numerals using cnn with data augmentation and dropout. In Data Management, Analytics and Innovation, pages 299–309. Springer, 2019.
  • [20] Ali Alani. Arabic handwritten digit recognition based on restricted boltzmann machine and convolutional neural networks. Information, 8(4):142, 2017.
  • [21] Seyyed Khorashadizadeh and Ali Latif. Arabic/farsi handwritten digit recognition usin histogra of oriented gradient and chain code histogram. International Arab Journal of Information Technology (IAJIT), 13(4), 2016.
  • [22] Hamid Salimi and Davar Giveki. Farsi/arabic handwritten digit recognition based on ensemble of svd classifiers and reliable multi-phase pso combination rule. International Journal on Document Analysis and Recognition (IJDAR), 16(4):371–386, 2013.
  • [23] Ghazanfar Latif. PMU-UD dataset download, 2018.
  • [24] N Das. CMATERDb dataset download.
  • [25] SMA Sharif and Mahdin Mahboob. Deep hog: A hybrid model to classify bangla isolated alpha-numerical symbols. Neural Network World, 29(3):111–133, 2019.
  • [26] Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • [27] Ole Sigmund. Morphology-based black and white filters for topology optimization. Structural and Multidisciplinary Optimization, 33(4-5):401–424, 2007.
  • [28] K Manjusha, M Anand Kumar, and KP Soman. Integrating scattering feature maps with convolutional neural networks for malayalam handwritten character recognition. International Journal on Document Analysis and Recognition (IJDAR), 21(3):187–198, 2018.
  • [29] SMA Sharif, Nabeel Mohammed, Nafees Mansoor, and Sifat Momen. A hybrid deep model with hog features for bangla handwritten numeral classification. In 2016 9th International Conference on Electrical and Computer Engineering (ICECE), pages 463–466. IEEE, 2016.
  • [30] SMA Sharif and Mahdin Mahboob. A comparison between hybrid models for classifying bangla isolated basic characters. In 2017 4th International Conference on Advances in Electrical Engineering (ICAEE), pages 211–216. IEEE, 2017.
  • [31] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne.

    Journal of machine learning research

    , 9(Nov):2579–2605, 2008.
  • [32] Chaohui Zhan, Xiaohui Duan, Shuoyu Xu, Zheng Song, and Min Luo. An improved moving object detection algorithm based on frame difference and edge detection. In Fourth International Conference on Image and Graphics (ICIG 2007), pages 519–523. IEEE, 2007.
  • [33] Yuanzhi Li and Yang Yuan. Convergence analysis of two-layer neural networks with relu activation. In Advances in Neural Information Processing Systems, pages 597–607, 2017.
  • [34] Y-Lan Boureau, Jean Ponce, and Yann LeCun. A theoretical analysis of feature pooling in visual recognition. In Proceedings of the 27th international conference on machine learning (ICML-10), pages 111–118, 2010.
  • [35] Rob A Dunne and Norm A Campbell.

    On the pairing of the softmax activation and cross-entropy penalty functions and the derivation of the softmax activation function.

    In Proc. 8th Aust. Conf. on the Neural Networks, Melbourne, volume 181, page 185. Citeseer, 1997.
  • [36] Kaiming He and Jian Sun. Convolutional neural networks at constrained time cost. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5353–5360, 2015.
  • [37] Matthew D Zeiler. Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701, 2012.
  • [38] Claudia I Gonzalez, Patricia Melin, Juan R Castro, Olivia Mendoza, and Oscar Castillo. An improved sobel edge detection method based on generalized type-2 fuzzy logic. Soft Computing, 20(2):773–784, 2016.
  • [39] Şaban Öztürk and Bayram Akdemir. Comparison of edge detection algorithms for texture analysis on glass production. Procedia-Social and Behavioral Sciences, 195:2675–2682, 2015.
  • [40] Augustus Odena, Christopher Olah, and Jonathon Shlens. Conditional image synthesis with auxiliary classifier gans. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 2642–2651. JMLR. org, 2017.
  • [41] Junyoung Chung, Sungjin Ahn, and Yoshua Bengio. Hierarchical multiscale recurrent neural networks. arXiv preprint arXiv:1609.01704, 2016.
  • [42] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  • [43] Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A Alemi. Inception-v4, inception-resnet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence, pages 4278–4284, 2017.
  • [44] Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4700–4708, 2017.