Cervical Cytology Classification Using PCA & GWO Enhanced Deep Features Selection

06/09/2021, by Hritam Basak, et al.

Cervical cancer is one of the most deadly and common diseases among women worldwide. It is completely curable if diagnosed at an early stage, but the tedious and costly detection procedure makes population-wide screening unviable. Thus, to augment the effort of clinicians, in this paper we propose a fully automated framework that utilizes Deep Learning and feature selection using evolutionary optimization for cytology image classification. The proposed framework extracts deep features from several Convolutional Neural Network models and uses a two-step feature reduction approach to reduce computation cost and ensure faster convergence. The features extracted from the CNN models form a large feature space whose dimensionality is reduced using Principal Component Analysis while preserving 99% of the variance; the optimal feature subset is then selected from this feature space using an evolutionary optimization algorithm, the Grey Wolf Optimizer, thus improving the classification performance. Finally, the selected feature subset is used to train an SVM classifier for generating the final predictions. The proposed framework is evaluated on three publicly available benchmark datasets: the Mendeley Liquid Based Cytology (4-class) dataset, the Herlev Pap Smear (7-class) dataset, and the SIPaKMeD Pap Smear (5-class) dataset, achieving classification accuracies of 99.47%, 98.32% and 97.87% respectively, thus justifying the reliability of the approach. The relevant code for the proposed approach can be found at: https://github.com/DVLP-CMATERJU/Two-Step-Feature-Enhancement




Cervical Cytology Classification Using PCA & GWO Enhanced Deep Features Selection

Hritam Basak

Department of Electrical Engineering, Jadavpur University

188, Raja S.C. Mullick Road,

Jadavpur, Kolkata-700032, West Bengal, INDIA

Email: hritambasak48@gmail.com

Rohit Kundu

Department of Electrical Engineering, Jadavpur University

188, Raja S.C. Mullick Road,

Jadavpur, Kolkata-700032, West Bengal, INDIA

Email: rohitkunduju@gmail.com

Sukanta Chakraborty

Theism Medical Diagnostics Centre

Dum Dum, Kolkata-700030, West Bengal, INDIA

Email: drsukantachakraborty@gmail.com

Nibaran Das*

Department of Computer Science & Engineering, Jadavpur University

188, Raja S.C. Mullick Road,

Jadavpur, Kolkata-700032, West Bengal, INDIA

Email: nibaran.das@jadavpuruniversity.in

*Corresponding author: Nibaran Das

*Corresponding author email: nibaran.das@jadavpuruniversity.in

*Corresponding author contact number: +913324572407

1 Introduction


Figure 1: Overall workflow of the proposed framework

Cancer is the second leading cause of death worldwide, causing around 9.6 million deaths every year, about one-sixth of all deaths globally. Studies also suggest that the economic impact of cancer is significant, as most of these deaths are reported from low- and middle-income countries where the living index is low and healthcare infrastructure is comparatively insufficient for diagnosis and treatment. Among cancers, cervical cancer is the fourth most common worldwide, with around 570,000 reported cases every year, and the second most common cancer in women, causing around 311,000 deaths per year [17].

Pap smear test is currently the most reliable and renowned screening tool for the detection of pre-cancerous cells and cervical lesions in women based on microscopic studies of the cells. However, the process is time-consuming since pathologists need a lot of time to classify each cell from a slide of over 10,000 cells. Thus, an automated detection tool needs to be developed to classify pre-cancerous lesions for early detection and widespread screening.

With the advent of artificial intelligence and deep learning in the domain of medical sciences and healthcare [7, 5, 4], it is becoming more common to rely on the results predicted by such decision-support systems [6, 12] to mitigate observer-bias issues. In this paper, we develop an alternative approach that utilizes Deep Learning-based feature extraction and an evolutionary optimization algorithm to achieve excellent multi-class classification accuracy, performing robustly and outperforming several existing methods [18, 40, 11, 9, 32, 31, 15].

In this research, we extracted deep features [3, 8] from pre-trained CNN models and concatenated the features to form a large feature space. This is followed by the application of the Principal Component Analysis (PCA) method for dimensionality reduction of the feature space while keeping most of the important features intact, for which we retained 99% of the variance of the feature space. Some of the features extracted from the CNN classifiers might be non-informative or might even lead to a higher misclassification rate. To eliminate such redundant, misleading features, researchers generally prefer evolutionary optimization algorithms, but applying such algorithms directly to CNN-extracted features wastes computation due to the very high dimensionality of the feature space. PCA reduces the dimensionality of the feature space by combining highly correlated variables into a smaller number of new variables, known as "principal components", while retaining most of the variation in the data even in the lower dimension. This new feature space of lower dimensionality, when used as the input to a metaheuristic optimization algorithm, significantly reduces the computation cost, since a smaller population size suffices for convergence and the number of iterations required to reach the global optimum is also greatly reduced (i.e., faster evolution).

Specifically, among the different evolutionary optimization algorithms available, we use the Grey Wolf Optimizer (GWO) [29] for optimal feature subset selection. The GWO is embedded with a Support Vector Machine (SVM) classifier with a Radial Basis Function (RBF) kernel for fitness assignment and final classification. This significantly decreases the training time for the classification task while maintaining competitive prediction accuracy. The overall workflow of the proposed framework is shown in Figure 1.


The contributions of this paper can be summarized as follows:

  1. In the current paper, we propose a framework for the optimal selection of deep features extracted from Convolutional Neural Network (CNN) classifiers.

  2. The dimensionality of the feature set extracted from the CNNs is large and thus Principal Component Analysis (PCA) is used to reduce the dimensionality while retaining the highly discriminating features. The resulting feature subset, when used as the input of GWO, reduces the computation and ensures faster convergence.

  3. Optimal features are selected using a nature-inspired evolutionary optimization algorithm, the Grey Wolf Optimizer (GWO), applied for the first time in the cervical cytology domain, which retains only the non-redundant features for making the final predictions.

  4. The proposed framework has been evaluated on three publicly available datasets: the Herlev Pap Smear dataset [24], the SIPaKMeD Pap Smear dataset [34] and the Mendeley Liquid Based Cytology dataset [23], achieving classification accuracies of 98.32%, 97.87% and 99.47% respectively. The proposed method outperformed traditional CNN based approaches and is comparable to state-of-the-art methods.

The rest of the paper is organized as follows: Section 2 surveys some of the recent developments in the automated detection of cervical cancer; Section 3 describes the proposed methodology in detail; Section 4 evaluates the performance of the proposed framework on three publicly available benchmark datasets and Section 5 concludes the paper.

2 Related Work

Previous studies show that extracting features and selecting the non-redundant ones is an important part of the classification process and affects the classification result significantly. Different methods have been explored over the years, such as traditional handcrafted feature extraction and selection [2], Simulated Annealing (SA) [37], Convolutional Neural Networks [41], and Fuzzy C-means [39], to name a few. Some have given very good results in binary classification but fared worse in multi-class classification; only a few have performed well even on multi-class problems [9, 11, 27]. One of the major concerns is the availability of open-access datasets and the number of images available for each class, which has been a major setback for many methods proposed in the literature.

Chankong et al. [11] used different classifiers for binary and multi-class classification, with the best accuracy obtained using an ANN. William et al. [38] used an enhanced Fuzzy C-means method, extracting features from the cell images in the Herlev Dataset, and obtained an accuracy of 98.88%. Byriel et al. [10] used the ANFIS neuro-fuzzy classification technique, which achieved an accuracy of 96% on the 2-class problem but performed much worse on the 7-class problem. Zhang et al. [45] used MLBC slides stained with H&E in two ways, once posing it as a 2-class dataset and again as a 7-class dataset; they used the Fuzzy C-means approach and obtained accuracies of 96-97% on the 2-class problem and 72-77% on the 7-class problem. Bora et al. [9] used an ensemble classifier on the Herlev Dataset and obtained an accuracy of 96.5%. Marinakis et al. [26], in a way similar to [45, 42], used the Herlev Dataset as both a 2-class and a 7-class problem, combining a genetic algorithm with a nearest-neighbour classifier; the best result was obtained with the 1-nearest-neighbour classifier, giving accuracies of 98.14% on the 2-class problem and 96.95% on the 7-class problem, both under 10-fold cross-validation. Zhang et al. [46] used a deep convolutional neural network, thus removing the need for cell segmentation as in [22, 12], and achieved an accuracy of 98.3% on the Herlev Dataset.

However, the number of publicly available datasets of cervical cytology smear images is quite small, and each contains only about a thousand images or fewer. It is therefore quite difficult to design a classical deep learning or machine learning model that, with so few images, classifies them more accurately than pre-existing methods. Transfer learning can significantly mitigate this issue: a pre-trained model (for example, ResNet-50 trained on ImageNet [14]) is fine-tuned and then used for classification. Akter et al. [1] performed experiments with different machine learning classifiers and carried out a detailed comparative analysis of their performance. Data augmentation can be another solution, virtually increasing the dataset size through slight translations, rotations, or other transformations of the images. However, these methods cannot improve the results substantially, as they do not add new features or information for the algorithm to learn from. Therefore, as suggested by [33], we tried a new optimal feature selection approach to improve the accuracy and robustness of the classification task.

3 Materials and Method

The experiment consisted of the following steps: (1) Data acquisition (collecting image datasets of Pap smear test results from different sources); (2) Data preprocessing (structuring the data in correct formats and verifying the datasets); (3) Feature extraction (extracting the important features from the datasets using different CNN models); (4a) Combining the features from different CNNs (to increase the effectiveness of the features); (4b) Feature reduction using the Principal Component Analysis (PCA) method (to discard redundant features and improve classification time); (5) Fitting these features to the classifier; and (6) Analysis of the results. The whole task was performed on a machine with an NVIDIA Tesla K80 GPU and 12 GB of available memory.

3.1 Datasets Used

(a) Herlev Dataset
(b) Mendeley Dataset
(c) SIPaKMeD Dataset
Figure 2: Examples of images from each class in the three publicly available datasets

We use three publicly available cervical cytology datasets in this study for evaluating the proposed classification framework:

  1. Herlev Pap Smear dataset by Jantzen et al. [24]

  2. Mendeley Liquid Based Cytology dataset by Hussain et al. [23]

  3. SIPaKMeD Pap Smear dataset by Plissiti et al. [34]

These datasets are described in brief in the following subsections.

| Dataset | Class | Category | Cell type | Number of images |
|---|---|---|---|---|
| Herlev Pap Smear (Total: 917) | 1 | Normal | Intermediate Squamous Epithelial | 70 |
| | 2 | Normal | Columnar Epithelial | 98 |
| | 3 | Normal | Superficial Squamous Epithelial | 74 |
| | 4 | Abnormal | Mild Squamous non-keratinizing Dysplasia | 182 |
| | 5 | Abnormal | Squamous cell carcinoma in-situ intermediate | 150 |
| | 6 | Abnormal | Moderate Squamous non-keratinizing Dysplasia | 146 |
| | 7 | Abnormal | Severe Squamous non-keratinizing Dysplasia | 197 |
| Mendeley LBC (Total: 963) | 1 | Normal | Negative for Intraepithelial Malignancy | 613 |
| | 2 | Abnormal | Low grade Squamous Intraepithelial Lesion (LSIL) | 163 |
| | 3 | Abnormal | High grade Squamous Intraepithelial Lesion (HSIL) | 113 |
| | 4 | Abnormal | Squamous Cell Carcinoma (SCC) | 74 |
| SIPaKMeD Pap Smear (Total: 4049) | 1 | Normal | Superficial-Intermediate | 831 |
| | 2 | Normal | Parabasal | 787 |
| | 3 | Abnormal | Koilocytotic | 825 |
| | 4 | Abnormal | Dyskeratotic | 813 |
| | 5 | Benign | Metaplastic | 793 |

Table 1: Distribution of images in the three publicly available datasets

3.1.1 Herlev Pap Smear Dataset

The Herlev Pap Smear dataset is a publicly available benchmark dataset consisting of 917 single-cell images distributed unevenly among 7 different classes. The distribution of images in each class is tabulated in Table 1.

3.1.2 Mendeley Liquid Based Cytology Dataset

The Mendeley LBC dataset [23], developed at the Obstetrics and Gynecology department of Guwahati Medical College and Hospital, consists of 963 whole slide images of cervical cytology distributed unevenly across four different classes, as shown in Table 1.

3.1.3 SIPaKMeD Pap Smear dataset

The SIPaKMeD Pap Smear dataset by Plissiti et al. [34] consists of 4049 images of isolated cells (extracted from 966 whole slide images) categorized into five different classes based on their cytomorphological features. The distribution of images in the dataset is shown in Table 1.

3.2 Deep Features Extraction

Handcrafted or manual feature extraction using traditional Machine Learning techniques has limitations both in terms of the number of features and their correlations. Extracting features manually from a large dataset is tedious and can incorporate human biases, affecting the quality of the features and, eventually, the classification. Redundant features might be extracted, leading to higher misclassification rates. In this work, we therefore extract deep features from CNN classifiers. Deep Learning models use backpropagation to learn the important features themselves, eliminating the tedious process of handcrafting features. For the present study, we used ResNet-50 [19], VGG-16 [35], Inception v3 [36] and DenseNet-121 [21], extracting features from the penultimate layer of each model.

When performing feature extraction from a CNN, we take the pre-trained model, fine-tune it on our data, let each image propagate through the layers in the forward direction, terminate at the pre-final layer, and take the output of this layer as the feature vector. We use pre-trained weights (Transfer Learning) in this study because biomedical data is scarce and insufficient for Deep Learning models to work efficiently when trained from scratch. The ImageNet dataset consists of over 14 million images divided into 1000 classes. We use models pretrained on this dataset and replace the final 1000-way classification layer with a layer whose size equals the number of classes in our dataset. A model pretrained on such a large dataset has already learned important features from image data and only needs fine-tuning for a small number of epochs to train the newly added classification layer.

3.2.1 VGG-16

The main characteristic of VGG nets [35] is the use of small convolution filters, which gave a noticeable improvement in network performance while making the network deeper. 3×3 receptive filters are used throughout the entire network with a stride of 1. Local Response Normalization (LRN) is not used in VGG nets because it increases memory consumption. The small convolution filters allow VGG nets to have a very large number of weight layers, which in turn boosts performance. The input has a shape of 224×224×3. In the present work, we fine-tune the VGG-16 model on our datasets using the Stochastic Gradient Descent (SGD) optimizer and the Rectified Linear Unit (ReLU) activation function.

3.2.2 ResNet-50

The ResNet-50 architecture [19] embeds residual skip connections that make the training of the network easier. The skip connections also address the vanishing gradient problem, allowing very deep networks to be accommodated at a controlled computation cost. Inputs of size 224×224×3 are used in the ResNet-50 model, with the SGD optimizer and ReLU activation function.

3.2.3 Inception v3

The salient feature of the Inception v3 architecture [36] is the inception block, which uses parallel convolutions followed by channel concatenation. This leads to rich features being extracted by comparatively shallow networks. Parallel convolutions also help address overfitting while controlling the computational complexity. Inputs of shape 299×299×3 are used with the SGD optimizer and ReLU activation function for deep feature extraction.

3.2.4 DenseNet-121

The DenseNet model [21] was proposed to address the vanishing gradient problem. The fundamental blocks in the DenseNet architecture are densely connected to each other, leading to low computational requirements since the number of trainable parameters is heavily reduced. DenseNet architectures add small sets of feature maps owing to their narrow layers. We used the DenseNet-121 variant with the SGD optimizer and ReLU activation function for deep feature extraction.

3.3 Principal Component Analysis

Principal Component Analysis (PCA) is a linear dimensionality reduction method that transforms higher-dimensional data into a lower dimension while maximizing the variance retained in the lower dimension. A detailed treatment of PCA was given by Wold et al. in 1987 [44], and its development and application in machine learning problems continued significantly in the following decades [45], [46]. The covariance matrix of the feature vectors is computed first, followed by the computation of the eigenvectors of this matrix. The eigenvectors with the largest eigenvalues form the new, reduced-dimensionality feature representation. Thus, instead of losing important characteristics of the data, we keep the most important features by preserving 99% of the variance. Before applying the PCA algorithm for feature dimension reduction, we perform the data preprocessing required for the subsequent steps.

Given an n-dimensional training set of m samples {x^(1), x^(2), ..., x^(m)}, we perform mean normalization or feature scaling, as in supervised learning algorithms. The mean of each feature is computed as in Equation 1:

μ_j = (1/m) Σ_{i=1}^{m} x_j^(i)    (1)

We then replace each value x_j^(i) with (x_j^(i) − μ_j) so that every feature has exactly zero mean. If different features span very different ranges, we additionally scale them into a comparable range. This scaling is defined by Equation 2, where s_j is the range or the standard deviation of feature j:

x_j^(i) ← (x_j^(i) − μ_j) / s_j    (2)


To reduce the dimension of the features from n to k (where k < n), and to define the surface in k-dimensional space onto which we project the data, we minimize the mean squared error of the projected data on the k-dimensional vectors. The full derivation of these k vectors and of the projected points onto them is involved and beyond the scope of this paper. The covariance matrix is computed as in Equation 3, where the vector x^(i) has dimension n×1 and (x^(i))^T has dimension 1×n, making the covariance matrix of dimension n×n:

Σ = (1/m) Σ_{i=1}^{m} x^(i) (x^(i))^T    (3)

Next, we calculate the eigenvalues and eigenvectors of the covariance matrix, which represent the magnitudes of the feature vectors in the transformed space and their corresponding directions, respectively. Since we are dealing with the covariance matrix, the eigenvalues quantify the variance along the eigenvectors. If an eigenvector has a large eigenvalue, it has high variance and carries important information about the dataset; eigenvectors with small eigenvalues carry very little information.


Hence, the k-th principal component score of a data vector x in the transformed coordinates is t_k = x · w^(k), where w^(k) is the k-th eigenvector of the covariance matrix of x. The full PCA decomposition of the data matrix X can therefore be represented as T = XW, where W is the matrix whose columns are the eigenvectors of the covariance matrix. We then select the k eigenvectors that maximize the variance preserved from the original data while minimizing the total squared reconstruction error. Next, we calculate the Cumulative Explained Variance (CEV), the sum of the variances (information) contained in the top principal components, and set a threshold above which eigenvalues are considered useful; the remaining eigenvalues are discarded as unimportant features. For our experiments we set the threshold to 99%, meaning that 99% of the variance of the data is retained in the reduced feature vector. As different CNNs extract features of different modalities, the numbers of features selected after PCA and GWO differ according to the feature distributions of those feature sets.

The pseudo-code of dimensionality reduction using PCA is shown in Algorithm 1.

define_function: PCA
Input: feature set of dimension d
Compute covariance matrix Σ:
while (i ≤ d) do
    while (j ≤ d) do
        Σ[i][j] = covariance of features i and j,
            computed from the sample means of features i and j
    end while
end while
Decompose Σ into eigenvalues and eigenvectors
Calculate cumulative explained variance (CEV)
if (CEV ≥ threshold) then
    Construct projection matrix W from the corresponding eigenvectors
end if
Transform the input using W
Obtain the k-dimensional feature subspace
Algorithm 1 Pseudo-code for Principal Component Analysis
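As an illustration of Algorithm 1 (not the authors' code), the same retained-variance criterion can be sketched with scikit-learn's `PCA`, where `n_components=0.99` keeps the smallest number of principal components whose cumulative explained variance reaches 99%; the random matrix below is a stand-in for the concatenated deep features:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Stand-in for a concatenated deep-feature matrix: 500 samples, 2048 dimensions
features = rng.normal(size=(500, 2048)) @ rng.normal(size=(2048, 2048))

# n_components=0.99 keeps the smallest number of principal components
# whose cumulative explained variance (CEV) is at least 99%
pca = PCA(n_components=0.99, svd_solver="full")
reduced = pca.fit_transform(features)

print(reduced.shape)                        # far fewer than 2048 columns
print(pca.explained_variance_ratio_.sum())  # at least 0.99
```

On the paper's feature sets, this step shrinks vectors of tens or hundreds of thousands of dimensions to a few hundred components (Table 2), which is what makes the subsequent GWO search affordable.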

3.4 Grey Wolf Optimizer

Grey Wolf Optimization (GWO) [29] is a nature-inspired meta-heuristic optimization algorithm that mimics the leadership hierarchy and hunting process of grey wolves. Four types of GWO agents, named alpha, beta, delta, and omega, are deployed to simulate the optimization. They mimic the three-step hunting method of grey wolves: finding the prey, encircling it, and finally attacking it.

The grey wolves follow the leadership of the alpha wolf, the topmost rank in their strict social hierarchy. The alpha is not necessarily the strongest or fittest wolf, but it maintains the discipline of the whole pack. Major decisions are taken by the alpha, often with the support of its subordinates, the beta wolves. The betas occupy the next level of the social hierarchy and convey the decisions of the alpha to the lower-ranking wolves. They are generally the fittest candidates to replace the alpha if it becomes old or weak, and they play a major role in holding the pack together. The next level of wolves, the deltas, also take an important part in decision-making and other key activities of the pack. The lowest-ranking category, the omegas, often play the role of scapegoat in the pack. The complete pack is thus organized as a dominance hierarchy. The mathematical model of the optimization steps that mimic this hunting process is described below.

3.4.1 Social Hierarchy

Mirroring the social hierarchy of grey wolves, the optimizer designates the three fittest solutions as alpha, beta, and delta; the remaining search agents arrange and adjust themselves according to the parameters of these three wolves. The three leading wolves are followed by the omega wolves.

3.4.2 Encircling the Prey (Optimal Solution)

To mathematically represent the encircling of prey, Equations 4 and 5 are used, where t is the current iteration, A and C are coefficient vectors, X_p is the position vector of the prey, and X is the position vector of a grey wolf:

D = |C · X_p(t) − X(t)|    (4)

X(t+1) = X_p(t) − A · D    (5)

The expressions for A and C are given in Equations 6 and 7 respectively, where r_1 and r_2 are random vectors with components between 0 and 1 inclusive, and the value of a decreases linearly from 2 to 0 as the iterations progress:

A = 2a · r_1 − a    (6)

C = 2 · r_2    (7)

The two random vectors r_1 and r_2 allow a grey wolf agent to reach any position between the points, so the agents can position themselves around the fittest solution by adjusting the values of A and C. The same applies in n-dimensional optimization, where the grey wolf agents move over hyper-spheres or hyper-cubes around the fittest solution obtained so far.

3.4.3 Reaching the Optimal Solution

In real life, grey wolves hunt their prey led by the alpha, accompanied by the beta and delta wolves, while the other wolves follow their instructions. To simulate this hunting principle, we initialize agents at random positions, evaluate their fitness, and take the three best solutions as alpha, beta, and delta, since in an abstract search space we have no prior knowledge of the positions of the agents relative to the prey. The remaining agents, including the omega wolves, update their positions and orientations according to the three best agents:

D_α = |C_1 · X_α − X|,   D_β = |C_2 · X_β − X|,   D_δ = |C_3 · X_δ − X|    (8-10)

X_1 = X_α − A_1 · D_α,   X_2 = X_β − A_2 · D_β,   X_3 = X_δ − A_3 · D_δ    (11-13)

X(t+1) = (X_1 + X_2 + X_3) / 3    (14)

The search agents update their positions by these equations; the final position of an agent is not predefined, but is a random position within a region determined by the positions of the alpha, beta, and delta agents, i.e., the three best-fit solutions.

3.4.4 Exploiting the Prey

Grey wolves encircle their prey until it stops moving; this immobilization of the prey is known as exploitation. In the mathematical model, the value of a is decreased as the agents approach the prey, and hence the value of A changes as well: A takes random values in the interval [−a, a], so its fluctuation shrinks as a decreases from 2 to 0 over the iterations. When |A| < 1, a search agent can take any position between its current position and the position of the prey, as the alpha, beta, and delta wolves close in on the prey.

3.4.5 Exploring for the prey

The grey wolf agents diverge in search of prey and finally converge to attack it. In the mathematical model, this behaviour is regulated by the value of A: if A is greater than 1 or less than −1, the agents diverge from each other to search for more suitable prey (exploration), whereas if A has a value between −1 and 1, the agents converge towards the prey (exploitation).

Figure 3: Flowchart showing the workflow of the Grey Wolf Optimization algorithm used in the proposed framework.

The overall pseudo-code of the GWO algorithm is shown in Algorithm 2 and the flowchart for the algorithm is shown in Figure 3.

define_function: GWO
Number of search agents: n
Maximum number of iterations: Max_iter
Initialize the GWO population X_i (i = 1, 2, ..., n)
Initialize a, A, and C // According to Equations 6, 7
Calculate the fitness of each search agent
X_α = Alpha wolf (best search agent)
X_β = Beta wolf (second best search agent)
X_δ = Delta wolf (third best search agent)
while (t < Max_iter) do
    for each search agent do
        Update position of current search agent // According to Equation 5
    end for
    Update a, A, and C
    Calculate the fitness of each search agent
    Update X_α, X_β, X_δ // According to Equations 11, 12 & 13
    t = t + 1
end while
Algorithm 2 Pseudo-code for the Grey Wolf Optimizer for feature selection.
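A minimal continuous GWO sketch in NumPy (an illustration, not the authors' implementation; the sphere function stands in for the SVM-based fitness used in the paper):

```python
import numpy as np

def gwo(fitness, dim, n_agents=10, max_iter=100, lb=-5.0, ub=5.0, seed=0):
    """Minimize `fitness` over a box using the Grey Wolf Optimizer."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(n_agents, dim))  # initialize the pack

    for t in range(max_iter):
        scores = np.array([fitness(x) for x in X])
        order = np.argsort(scores)
        # The three fittest agents lead the pack
        alpha = X[order[0]].copy()
        beta = X[order[1]].copy()
        delta = X[order[2]].copy()

        a = 2.0 * (1.0 - t / max_iter)             # a decreases linearly 2 -> 0
        for i in range(n_agents):
            new_pos = np.zeros(dim)
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(dim), rng.random(dim)
                A = 2.0 * a * r1 - a               # Equation 6
                C = 2.0 * r2                       # Equation 7
                D = np.abs(C * leader - X[i])      # Equation 4
                new_pos += leader - A * D          # Equations 11-13
            X[i] = np.clip(new_pos / 3.0, lb, ub)  # average of the three pulls

    scores = np.array([fitness(x) for x in X])
    best = int(np.argmin(scores))
    return X[best], float(scores[best])

# Sphere function as a stand-in fitness; the paper instead uses the accuracy
# of an RBF-kernel SVM trained on the candidate feature subset
best_x, best_f = gwo(lambda x: float(np.sum(x * x)), dim=5)
print(best_f)
```

For feature selection, each wolf position would be thresholded into a binary mask over the PCA components, with the classifier's validation accuracy serving as the fitness.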

3.5 Classification

After optimization, the final step is to fit the selected features to the classifier. Due to the large number of features in some cases, we used incremental learning: a small batch of the dataset is selected to train the classifier, and we then loop over the whole dataset, continuing training until convergence. This is fast and computationally efficient. We used an SVM classifier with the RBF kernel for the multi-class classification task.

Input: raw RGB images
F_i = deep features extracted from the i-th CNN
F = concat(F_1, ..., F_n) // concatenation of features
F_PCA = PCA(F) // Using Algorithm 1
F_GWO = GWO(F_PCA) // Using Algorithm 2
Train-test split of F_GWO
Train the SVM classifier on the training split
Make predictions on the test set
Compare predictions and labels and evaluate performance
Algorithm 3 Pseudo-code for overall workflow.
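The overall pipeline of Algorithm 3 can be sketched end to end on toy data (the `digits` set stands in for the deep features, and a fixed random mask stands in for the GWO-selected subset):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy stand-in for the concatenated deep-feature matrix
X, y = load_digits(return_X_y=True)

# Step 1: PCA retaining 99% of the variance (Algorithm 1)
X_pca = PCA(n_components=0.99).fit_transform(X)

# Step 2: placeholder for GWO selection (Algorithm 2) -- a random binary
# mask standing in for the best wolf's selected components
rng = np.random.default_rng(0)
mask = rng.random(X_pca.shape[1]) > 0.3
X_sel = X_pca[:, mask]

# Step 3: train-test split and RBF-kernel SVM classification
X_tr, X_te, y_tr, y_te = train_test_split(X_sel, y, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(acc)
```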

3.5.1 Support Vector Machine

SVM [47] is a supervised learning model used for classification and regression tasks. Given a set of training examples properly labelled with their classes, an SVM builds a non-probabilistic binary classifier that assigns new examples to one of the classes. The model represents the training samples in the feature space such that the separation between examples of different classes is as prominent as possible: the SVM fits a decision boundary that maintains the maximum distance from the nearest points of each class.
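The maximum-margin idea can be seen on a tiny separable example (illustrative only): the fitted linear SVM keeps just the boundary-defining points as support vectors:

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable 2-D classes
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear SVM fits the boundary with maximum margin; only the points
# nearest the boundary are retained as support vectors
clf = SVC(kernel="linear", C=1000.0).fit(X, y)
print(clf.support_vectors_)
print(clf.predict([[1.0, 1.5], [5.0, 5.0]]))
```

With the RBF kernel used in this work, the same margin maximization happens in an implicit high-dimensional space, yielding a non-linear boundary in the original feature space.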

4 Results and Discussion

| Model used for feature extraction | No. of features (before PCA) | No. of features (after PCA) | Reduction in feature dimension (%) | Improvement in average training time (%) |
|---|---|---|---|---|
| ResNet-50 [19] | 100353 | 383 | 99.62 | 88.425 |
| VGG-16 [35] | 25088 | 364 | 98.55 | 85.215 |
| DenseNet-121 [21] | 50177 | 330 | 99.34 | 81.449 |
| Inception v3 [36] | 131073 | 325 | 99.75 | 84.228 |
| ResNet-50+VGG-16 | 125441 | 456 | 99.63 | 80.221 |
| DenseNet-121+Inception v3 | 181250 | 687 | 99.62 | 82.694 |
| DenseNet-121+Inception v3+ResNet-50+VGG-16 | 306691 | 796 | 99.74 | 85.737 |

Table 2: Reduction in feature dimension and improvements in training time after principal component analysis on the Herlev dataset
(a) Herlev Pap Smear dataset
(b) Mendeley LBC dataset
(c) SIPaKMeD Pap Smear dataset
Figure 4: ROC curves obtained by the proposed method for the three datasets: (a) Herlev Pap Smear dataset (b) Mendeley LBC dataset and (c) SIPaKMeD Pap Smear dataset.

After extracting features from the datasets using the CNN architectures described in Section 3, the features were concatenated. We then used PCA (retaining 99% of the variance of the data) to reduce the dimensionality of the feature space and improve feature quality. Table 2 shows the reduction in feature dimensionality as well as the resulting improvement in training time for the Herlev dataset. We then applied the GWO algorithm, and finally split the dataset and calculated accuracy scores on the training, validation, and test sets. The overall workflow is shown as pseudo-code in Algorithm 3. The results of our experiments are discussed in this section.

The metrics used for performance evaluation of the multi-class classification task are calculated based on Equations 15, 16, 17 and 18, which are derived from the confusion matrix (TP: true positives, TN: true negatives, FP: false positives, FN: false negatives):

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (15)

Precision = TP / (TP + FP)    (16)

Recall = TP / (TP + FN)    (17)

F1-score = 2 × (Precision × Recall) / (Precision + Recall)    (18)
To cross-validate the classification results on the different datasets and feature sets, we performed AUC-ROC analysis. The ROC (Receiver Operating Characteristic) curve is an important tool for validating the clinical relevance of our experiments. The different line segments in the OvA (One-vs-All) ROC curves represent the different classes, showing how well the selected features and the classifier separate the classes, which can be broadly categorized into normal and infected cases. The curve plots the TPR (True Positive Rate) against the FPR (False Positive Rate) as the two operating characteristics of the classifier given the selected features. A false positive occurs when a sample from a healthy or uninfected class is predicted as unhealthy or infected, which is a major drawback in a screening task. Points lying far above the diagonal of the ROC curve indicate that the TPR is significantly higher than the FPR. Another important quantity for analyzing the classification result is the AUC (Area Under the Curve) of the ROC curve, which was computed at a 97% confidence interval. The AUC-ROC analyses for the different datasets and feature sets are discussed below.
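The evaluation metrics and the one-vs-all AUC can be sketched with scikit-learn (toy data and classifier in place of the paper's features and results):

```python
from sklearn.datasets import load_digits
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# probability=True yields per-class scores for the one-vs-all ROC analysis
clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)
y_prob = clf.predict_proba(X_te)

acc = accuracy_score(y_te, y_pred)                     # Equation 15
prec = precision_score(y_te, y_pred, average="macro")  # Equation 16
rec = recall_score(y_te, y_pred, average="macro")      # Equation 17
f1 = f1_score(y_te, y_pred, average="macro")           # Equation 18
auc = roc_auc_score(y_te, y_prob, multi_class="ovr")   # one-vs-all AUC
print(acc, prec, rec, f1, auc)
```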

4.1 Results on Herlev Pap Smear Dataset


Figure 5: Results on the Herlev Pap Smear dataset

The results obtained from the different experiments on the Herlev Pap Smear dataset are shown in Figure 5. The best classification result on this dataset was achieved by merging the features extracted from the ResNet-50 and VGG-16 models, yielding the following performance metrics: Accuracy = 98.32%, Precision = 98.66%, Recall = 97.65%, and F1-score = 98.12%.

4.2 Results on the Mendeley LBC Dataset


Figure 6: Results on the Mendeley LBC dataset

The results obtained from the different experiments on the Mendeley LBC dataset are shown in Figure 6. The best results on this dataset are obtained by merging the features extracted from VGG-16, ResNet-50, Inception v3, and DenseNet-121: Accuracy = 99.47%, Precision = 99.14%, Recall = 99.27%, and F1-score = 99.20%.

4.3 Results on SIPaKMeD Pap Smear Dataset


Figure 7: Results on the SIPaKMeD Pap Smear Dataset

The results obtained from the different experiments on the SIPaKMeD Pap Smear dataset are shown in Figure 7. The best results on this dataset are obtained by merging the features extracted from VGG-16 and ResNet-50: Accuracy = 97.87%, Precision = 98.56%, Recall = 99.12%, and F1-score = 98.89%.
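The feature merging referred to throughout these experiments is plain concatenation along the feature axis; a shape-level sketch with dummy arrays (2048 and 4096 are the usual ResNet-50 pooled and VGG-16 fc-layer descriptor widths):

```python
# Feature-level fusion: descriptors from two backbones are concatenated
# per image before dimensionality reduction and classification.
import numpy as np

n_images = 100
resnet_feats = np.random.rand(n_images, 2048)   # stand-in ResNet-50 features
vgg_feats = np.random.rand(n_images, 4096)      # stand-in VGG-16 features

merged = np.concatenate([resnet_feats, vgg_feats], axis=1)
print(merged.shape)   # (100, 6144)
```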

| Dataset | Feature extractor model | Training Acc. | Training Loss | Validation Acc. | Validation Loss | Testing Acc. | Testing Loss |
|---|---|---|---|---|---|---|---|
| Herlev Pap Smear | ResNet-50 [19] | 97.77 | 0.026 | 96.33 | 0.032 | 96.55 | 0.028 |
| | VGG-16 [35] | 96.36 | 0.031 | 94.21 | 0.109 | 95.56 | 0.081 |
| | DenseNet-121 [21] | 97.89 | 0.02 | 97.01 | 0.024 | 96.61 | 0.024 |
| | Inception v3 [36] | 96.33 | 0.032 | 95.17 | 0.098 | 95.32 | 0.094 |
| | ResNet-50+VGG-16 | 98.77 | 0.011 | 98.00 | 0.019 | 98.32 | 0.016 |
| | DenseNet-121+Inception v3 | 97.91 | 0.026 | 96.01 | 0.031 | 97.62 | 0.027 |
| | ResNet-50+VGG-16+DenseNet-121+Inception v3 | 98.3 | 0.018 | 97.95 | 0.021 | 98.06 | 0.019 |
| Mendeley LBC | ResNet-50 [19] | 96.88 | 0.066 | 96.11 | 0.071 | 96 | 0.079 |
| | VGG-16 [35] | 97.91 | 0.051 | 96.39 | 0.068 | 97.56 | 0.059 |
| | DenseNet-121 [21] | 97.16 | 0.061 | 96.15 | 0.07 | 96.32 | 0.071 |
| | Inception v3 [36] | 97.5 | 0.058 | 97.05 | 0.06 | 97.2 | 0.061 |
| | ResNet-50+VGG-16 | 99.04 | 0.039 | 97.96 | 0.054 | 98.64 | 0.054 |
| | DenseNet-121+Inception v3 | 98.06 | 0.044 | 96.49 | 0.067 | 97.02 | 0.064 |
| | ResNet-50+VGG-16+DenseNet-121+Inception v3 | 99.58 | 0.03 | 98.88 | 0.043 | 99.47 | 0.04 |
| SIPaKMeD Pap Smear | ResNet-50 [19] | 96.85 | 0.028 | 96.77 | 0.049 | 96.03 | 0.048 |
| | VGG-16 [35] | 96.71 | 0.03 | 94.02 | 0.071 | 95.26 | 0.059 |
| | DenseNet-121 [21] | 96.31 | 0.035 | 96.04 | 0.058 | 96.12 | 0.046 |
| | Inception v3 [36] | 96.02 | 0.039 | 95.91 | 0.06 | 95.78 | 0.055 |
| | ResNet-50+VGG-16 | 98.48 | 0.014 | 97.55 | 0.041 | 97.87 | 0.034 |
| | DenseNet-121+Inception v3 | 97.32 | 0.02 | 95.39 | 0.066 | 96.33 | 0.044 |
| | ResNet-50+VGG-16+DenseNet-121+Inception v3 | 96.92 | 0.025 | 96.66 | 0.051 | 96.46 | 0.042 |

Table 3: Accuracies and losses on the training, validation, and testing sets using both PCA and GWO on the three datasets (all accuracy values are in % and measured after 30 epochs)

4.4 Comparison with Existing Literature

| Optimization Algorithm | Mendeley LBC: ACC | # of Features | Herlev Pap Smear: ACC | # of Features | SIPaKMeD 5-class: ACC | # of Features |
|---|---|---|---|---|---|---|
| PSO | 95.90 | 920 | 92.58 | 992 | 90.14 | 1014 |
| MVO | 96.91 | 720 | 94.26 | 764 | 90.48 | 843 |
| GWO | 92.14 | 810 | 92.40 | 807 | 89.98 | 791 |
| MFO | 94.20 | 803 | 93.19 | 851 | 90.58 | 832 |
| WOA | 95.08 | 843 | 92.36 | 847 | 90.58 | 802 |
| FFA | 94.42 | 715 | 92.46 | 820 | 89.56 | 792 |
| BAT | 95.64 | 857 | 94.58 | 762 | 90.21 | 749 |
| GA | 98.23 | 724 | 95.26 | 784 | 95.43 | 796 |
| PCA+GWO | 99.47 | 762 | 98.32 | 796 | 97.87 | 736 |

Table 4: Comparison (ACC, in %) with standard optimization algorithms: PSO = Particle Swarm Optimization [25]; MVO = Mean Variance Optimization [16]; GWO = Grey Wolf Optimizer [29]; MFO = Moth Flame Optimization [30]; WOA = Whale Optimization Algorithm [28]; FFA = Firefly Algorithm [44]; BAT = Bat Optimization Algorithm [43]; GA = Genetic Algorithm [13, 20]
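For context, a compact sketch of GWO-driven wrapper feature selection of the kind these algorithms perform; the population size, iteration count, dataset, and 0.01 sparsity weight are illustrative choices, not the settings used in the paper:

```python
# Grey Wolf Optimizer for feature selection: continuous wolf positions are
# thresholded into binary masks, and fitness trades validation error
# against the number of selected features.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X, y = X[:500], y[:500]                       # small subset keeps the demo fast
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)
dim = X.shape[1]
rng = np.random.default_rng(0)

def fitness(pos):
    """Validation error plus a small penalty on the feature count."""
    mask = pos > 0.5
    if not mask.any():
        return 1.0                            # empty mask = worst fitness
    acc = SVC().fit(X_tr[:, mask], y_tr).score(X_va[:, mask], y_va)
    return (1 - acc) + 0.01 * mask.sum() / dim

n_wolves, n_iter = 6, 8
wolves = rng.random((n_wolves, dim))
for t in range(n_iter):
    a = 2 - 2 * t / n_iter                    # control parameter: 2 -> 0
    order = np.argsort([fitness(w) for w in wolves])
    alpha, beta, delta = wolves[order[:3]]    # the three best wolves lead
    for i in range(n_wolves):
        new = np.zeros(dim)
        for leader in (alpha, beta, delta):
            r1, r2 = rng.random(dim), rng.random(dim)
            A, C = 2 * a * r1 - a, 2 * r2
            new += leader - A * np.abs(C * leader - wolves[i])
        wolves[i] = np.clip(new / 3, 0, 1)    # average of the three pulls

best_mask = wolves[np.argmin([fitness(w) for w in wolves])] > 0.5
print(f"selected {best_mask.sum()} of {dim} features")
```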

Several models have been proposed in the literature for cervical cell classification, as discussed in Section 2. Our proposed work and the results achieved are therefore compared with some of these models that used the same datasets, to assess the reliability of the proposed framework, and the results are tabulated in Table 5. No papers have yet been published that use the Mendeley LBC dataset, and thus we are unable to compare our method on that dataset.

| Dataset | Method | Results |
|---|---|---|
| Herlev Pap Smear | Gençtav et al. [18] | Precision: 88%±0.15; Recall: 93%±0.15 |
| | Bora et al. [9] | Accuracy: 96.51% |
| | Win et al. [40] | Accuracy: 90.84% |
| | Chankong et al. [11] | Accuracy: 93.78% |
| | Proposed Method | Accuracy: 98.32%; Precision: 98.66%; Recall: 97.65%; F1-score: 98.12% |
| SIPaKMeD Pap Smear | Win et al. [40] | Accuracy: 94.09% |
| | Plissiti et al. [34] | 1. Deep Convolutional+SVM: 93.35%±0.62; 2. Deep Fully Connected+SVM: 94.44%±1.21; 3. CNN: 95.35%±0.42 |
| | Proposed Method | Accuracy: 97.87%; Precision: 98.56%; Recall: 99.12%; F1-score: 98.89% |

Table 5: Comparison of the proposed method with existing literature

4.5 McNemar’s Statistical Test

McNemar's statistical test has been performed in the present work for the statistical analysis of the proposed classification framework. For this, the proposed model has been compared to the CNN models from which the features were extracted and used for the final classification. The results are shown in Table 6. To reject the null hypothesis that the two models are similar, the p-value from McNemar's test should remain below 5% (i.e., 0.05), and from the table it can be seen that for every comparison case the p-value < 0.05. Thus, the null hypothesis can be rejected, and it can be concluded that the proposed model is dissimilar to each of the feature extractor models and performs better than them. This statistical analysis justifies the reliability of the approach devised in this research.
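McNemar's test operates on the disagreement counts of two classifiers evaluated on the same test set; a sketch with statsmodels on hypothetical counts (not the paper's actual data):

```python
# McNemar's test on a 2x2 contingency table of per-sample correctness.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# rows = model A correct/wrong, cols = model B correct/wrong:
#                 B correct   B wrong
# A correct          50          4
# A wrong            18          8
table = np.array([[50, 4], [18, 8]])

# The exact test is a binomial test on the off-diagonal (4 vs 18) split;
# only disagreements carry information about which model is better.
result = mcnemar(table, exact=True)
print(result.pvalue < 0.05)   # True -> reject "the two models are similar"
```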

McNemar's test p-values:

| Performed with | Herlev Pap Smear | Mendeley LBC | SIPaKMeD Pap Smear |
|---|---|---|---|
| ResNet-50 | 0.0046 | 0.0012 | 0.0005 |
| VGG-16 | 0.0001 | 0.0211 | 0.0007 |
| DenseNet-121 | 0.0103 | 0.0089 | 0.0315 |
| Inception v3 | 0.0007 | 0.0061 | 0.0100 |

Table 6: Results obtained from McNemar's statistical test. For all three datasets, the proposed framework is compared to the CNN models whose features have been used. The p-value is less than 0.05 in every case and thus the null hypothesis is rejected.

5 Conclusions and Future Work

The need for automation in the cervical cancer detection domain arises from the disease's high mortality rate worldwide. Motivated by this, we developed a fully automated detection framework that optimizes deep features for classification. The two-level feature enhancement boosted the classification performance while significantly reducing the training time. This research also explored the hybridization of deep features from multiple CNNs to extract more discriminative information from the data.

An alternative approach to feature selection is presented in this research, using Principal Component Analysis (PCA) and Grey Wolf Optimization (GWO). The two-level feature reduction approach introduced in this paper leverages the advantages of both methods, resulting in an optimal feature subset. The proposed method achieves better results than end-to-end classification with CNN models, while simultaneously reducing the computation cost. The high classification accuracies of 99.47%, 98.32%, and 97.87% on the three publicly available benchmark datasets, namely the Mendeley LBC, Herlev Pap Smear, and SIPaKMeD Pap Smear datasets respectively, are comparable to state-of-the-art methods.

However, there is scope for further improvement by utilizing different classification models and hybrid metaheuristic feature selection algorithms. This paper carves a path for further research in this field, as well as toward multi-domain adaptation.

The proposed pipeline can be used as a test-bed for several classification problems, not only in biomedical applications but in other computer vision problems as well. The feature selection can be further addressed by developing an end-to-end multi-objective hybrid optimization algorithm, that selects optimal feature set, where the objective function aims to increase the classification performance by selecting the least number of features, thereby reducing the computational cost simultaneously.


Acknowledgement

The work is supported by SERB (DST), Govt. of India (Ref. no. EEQ/2018/000963).

Conflict of interest

The authors declare that they have no conflict of interest.


  • [1] L. Akter, M. M. Islam, M. S. Al-Rakhami, M. R. Haque, et al. (2021) Prediction of cervical cancer from behavior risk using machine learning techniques. SN Computer Science 2 (3), pp. 1–10. Cited by: §2.
  • [2] H. A. AlMubarak, J. Stanley, P. Guo, R. Long, S. Antani, G. Thoma, R. Zuna, S. Frazier, and W. Stoecker (2019) A hybrid deep learning and handcrafted feature approach for cervical cancer digital histology image classification. International Journal of Healthcare Information Systems and Informatics (IJHISI) 14 (2), pp. 66–87. Cited by: §2.
  • [3] A. Azaza, M. Abdellaoui, and A. Douik (2021) Off-the-shelf deep features for saliency detection. SN Computer Science 2 (2), pp. 1–10. Cited by: §1.
  • [4] H. Basak, S. Ghosal, M. Sarkar, M. Das, and S. Chattopadhyay (2020)

    Monocular depth estimation using encoder-decoder architecture and transfer learning from single rgb image

    In 2020 IEEE 7th Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON), pp. 1–6. Cited by: §1.
  • [5] H. Basak, R. Hussain, and A. Rana (2021) DFENet: a novel dimension fusion edge guided network for brain mri segmentation. arXiv preprint arXiv:2105.07962. Cited by: §1.
  • [6] H. Basak, R. Kundu, A. Agarwal, and S. Giri (2020)

    Single image super-resolution using residual channel attention network

    In 2020 IEEE 15th International Conference on Industrial and Information Systems (ICIIS), pp. 219–224. Cited by: §1.
  • [7] H. Basak and R. Kundu (2020) Comparative study of maturation profiles of neural cells in different species with the help of computer vision and deep learning. In International Symposium on Signal Processing and Intelligent Recognition Systems, pp. 352–366. Cited by: §1.
  • [8] H. Basak and A. Rana (2021) F-unet: a modified u-net architecture for segmentation of stroke lesion. In Computer Vision and Image Processing, pp. 32–43. External Links: ISBN 978-981-16-1086-8 Cited by: §1.
  • [9] K. Bora, M. Chowdhury, L. B. Mahanta, M. K. Kundu, and A. K. Das (2017) Automated classification of pap smear images to detect cervical dysplasia. Computer methods and programs in biomedicine 138, pp. 31–47. Cited by: §1, §2, §2, Table 5.
  • [10] J. Byriel (1999) Neuro-fuzzy classification of cells in cervical smears. Master’s Thesis, Technical University of Denmark: Oersted-DTU, Automation. Cited by: §2.
  • [11] T. Chankong, N. Theera-Umpon, and S. Auephanwiriyakul (2014) Automatic cervical cell segmentation and classification in pap smears. Computer methods and programs in biomedicine 113 (2), pp. 539–556. Cited by: §1, §2, §2, Table 5.
  • [12] S. Chattopadhyay and H. Basak (2020) Multi-scale attention u-net (msaunet): a modified u-net architecture for scene segmentation. arXiv preprint arXiv:2009.06911. Cited by: §1, §2.
  • [13] K. A. De Jong (1975) Analysis of the behavior of a class of genetic adaptive systems. Technical report Cited by: Table 4.
  • [14] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei (2009) Imagenet: a large-scale hierarchical image database. In

    2009 IEEE conference on computer vision and pattern recognition

    pp. 248–255. Cited by: §2, §3.2.
  • [15] S. Dey, S. Das, S. Ghosh, S. Mitra, S. Chakrabarty, and N. Das (2020) SynCGAN: using learnable class specific priors to generate synthetic data for improving classifier performance on cytological images. In Communications in Computer and Information Science, pp. 32–42. External Links: Link Cited by: §1.
  • [16] I. Erlich, G. K. Venayagamoorthy, and N. Worawat (2010) A mean-variance optimization algorithm. In

    IEEE Congress on Evolutionary Computation

    pp. 1–6. Cited by: Table 4.
  • [17] J. Ferlay, M. Colombet, I. Soerjomataram, C. Mathers, D. Parkin, M. Piñeros, A. Znaor, and F. Bray (2019) Estimating the global cancer incidence and mortality in 2018: globocan sources and methods. International journal of cancer 144 (8), pp. 1941–1953. Cited by: §1.
  • [18] A. Gençtav, S. Aksoy, and S. Önder (2012) Unsupervised segmentation and classification of cervical cell images. Pattern Recognition 45 (12), pp. 4151–4168. Cited by: §1, Table 5.
  • [19] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §3.2.2, §3.2, Table 2, Table 3.
  • [20] J. H. Holland et al. (1992) Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. MIT press. Cited by: Table 4.
  • [21] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger (2017) Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4700–4708. Cited by: §3.2.4, §3.2, Table 2, Table 3.
  • [22] J. Huang, T. Wang, D. Zheng, and Y. He (2020) Nucleus segmentation of cervical cytology images based on multi-scale fuzzy clustering algorithm. Bioengineered 11 (1), pp. 484–501. Cited by: §2.
  • [23] E. Hussain, L. B. Mahanta, H. Borah, and C. R. Das (2020) Liquid based-cytology pap smear dataset for automated multi-class diagnosis of pre-cancerous and cervical cancer lesions. Data in Brief, pp. 105589. Cited by: item 4, item 2, §3.1.2.
  • [24] J. Jantzen, J. Norup, G. Dounias, and B. Bjerregaard (2005) Pap-smear benchmark data for pattern classification. Nature inspired Smart Information Systems (NiSIS 2005), pp. 1–9. Cited by: item 4, item 1.
  • [25] J. Kennedy and R. Eberhart (1995) Particle swarm optimization. In Proceedings of ICNN’95-International Conference on Neural Networks, Vol. 4, pp. 1942–1948. Cited by: Table 4.
  • [26] Y. Marinakis, G. Dounias, and J. Jantzen (2009) Pap smear diagnosis using a hybrid intelligent scheme focusing on genetic algorithm based feature selection and nearest neighbor classification. Computers in Biology and Medicine 39 (1), pp. 69–78. Cited by: §2.
  • [27] J. Martínez-Más, A. Bueno-Crespo, R. Martínez-España, M. Remezal-Solano, A. Ortiz-González, S. Ortiz-Reina, and J. Martínez-Cendán (2020) Classifying papanicolaou cervical smears through a cell merger approach by deep learning technique. Expert Systems with Applications 160, pp. 113707. Cited by: §2.
  • [28] S. Mirjalili and A. Lewis (2016) The whale optimization algorithm. Advances in engineering software 95, pp. 51–67. Cited by: Table 4.
  • [29] S. Mirjalili, S. M. Mirjalili, and A. Lewis (2014) Grey wolf optimizer. Advances in engineering software 69, pp. 46–61. Cited by: §1, §3.4, Table 4.
  • [30] S. Mirjalili (2015) Moth-flame optimization algorithm: a novel nature-inspired heuristic paradigm. Knowledge-based systems 89, pp. 228–249. Cited by: Table 4.
  • [31] S. Mitra, N. Das, S. Dey, S. Chakrabarty, M. Nasipuri, and M. K. Naskar (2020) Cytology image analysis techniques towards automation: systematically revisited. arXiv preprint arXiv:2003.07529. Cited by: §1.
  • [32] S. Mitra, S. Dey, N. Das, S. Chakrabarty, M. Nasipuri, and M. K. Naskar (2019-04) Identification of malignancy from cytological images based on superpixel and convolutional neural networks. In Studies in Computational Intelligence, pp. 103–122. External Links: Link Cited by: §1.
  • [33] K. Niedzielewski, M. E. Marchwiany, R. Piliszek, M. Michalewicz, and W. Rudnicki (2020) Multidimensional feature selection and high performance parallex. SN Computer Science 1 (1), pp. 1–7. Cited by: §2.
  • [34] M. E. Plissiti, P. Dimitrakopoulos, G. Sfikas, C. Nikou, O. Krikoni, and A. Charchanti (2018) SIPAKMED: a new dataset for feature and image based classification of normal and pathological cervical cells in pap smear images. In 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 3144–3148. Cited by: item 4, item 3, §3.1.3, Table 5.
  • [35] K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Cited by: §3.2.1, §3.2, Table 2, Table 3.
  • [36] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna (2016) Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2818–2826. Cited by: §3.2.3, §3.2, Table 2, Table 3.
  • [37] X. Wang and J. M. Garibaldi (2005) Simulated annealing fuzzy clustering in cancer diagnosis. Informatica 29 (1). Cited by: §2.
  • [38] W. William, A. Ware, A. H. Basaza-Ejiri, and J. Obungoloch (2019) A pap-smear analysis tool (pat) for detection of cervical cancer from pap-smear images. Biomedical engineering online 18 (1), pp. 16. Cited by: §2.
  • [39] W. William, A. Ware, A. H. Basaza-Ejiri, and J. Obungoloch (2019) Cervical cancer classification from pap-smears using an enhanced fuzzy c-means algorithm. Informatics in Medicine Unlocked 14, pp. 23–33. Cited by: §2.
  • [40] K. P. Win, Y. Kitjaidure, K. Hamamoto, and T. Myo Aung (2020) Computer-assisted screening for cervical cancer using digital image processing of pap smear images. Applied Sciences 10 (5), pp. 1800. Cited by: §1, Table 5.
  • [41] M. Wu, C. Yan, H. Liu, Q. Liu, and Y. Yin (2018) Automatic classification of cervical cancer from cytological images by using convolutional neural network. Bioscience reports 38 (6). Cited by: §2.
  • [42] D. Xue, X. Zhou, C. Li, Y. Yao, M. M. Rahaman, J. Zhang, H. Chen, J. Zhang, S. Qi, and H. Sun (2020) An application of transfer learning and ensemble learning techniques for cervical histopathology image classification. IEEE Access 8, pp. 104603–104618. Cited by: §2.
  • [43] X. Yang and A. H. Gandomi (2012) Bat algorithm: a novel approach for global engineering optimization. Engineering computations. Cited by: Table 4.
  • [44] X. Yang (2009) Firefly algorithms for multimodal optimization. In International symposium on stochastic algorithms, pp. 169–178. Cited by: Table 4.
  • [45] L. Zhang, H. Kong, C. Ting Chin, S. Liu, X. Fan, T. Wang, and S. Chen (2014) Automation-assisted cervical cancer screening in manual liquid-based cytology with hematoxylin and eosin staining. Cytometry Part A 85 (3), pp. 214–230. Cited by: §2.
  • [46] L. Zhang, L. Lu, I. Nogues, R. M. Summers, S. Liu, and J. Yao (2017) DeepPap: deep convolutional networks for cervical cell classification. IEEE journal of biomedical and health informatics 21 (6), pp. 1633–1643. Cited by: §2.
  • [47] Y. Zhang (2012) Support vector machine classification algorithm and its application. In International Conference on Information Computing and Applications, pp. 179–186. Cited by: §1, §3.5.1.