Deep Cervix Model Development from Heterogeneous and Partially Labeled Image Datasets

by   Anabik Pal, et al.
National Institutes of Health

Cervical cancer is the fourth most common cancer in women worldwide. A robust automated cervical image classification system can help clinical care providers overcome the limitations of traditional visual inspection with acetic acid (VIA). However, there is a wide variety of cervical inspection objectives, which affects the labeling criteria used for criteria-specific prediction model development. Moreover, due to the lack of confirmatory test results and inter-rater labeling variation, many images are left unlabeled. Motivated by these challenges, we propose a self-supervised learning (SSL) based approach to produce a pre-trained cervix model from unlabeled cervical images. The developed model is further fine-tuned to produce criteria-specific classification models with the available labeled images. We demonstrate the effectiveness of the proposed approach using two cervical image datasets. Both datasets are partially labeled, and their labeling criteria differ. The experimental results show that the SSL-based initialization improves classification performance (accuracy: at least 2.5%) and that including images from both datasets during SSL further improves the performance (accuracy: at least 1.5%). Further, considering data-sharing restrictions, we experiment with Federated SSL and find that it can improve performance over an SSL model developed with only a dataset's own images. This justifies the importance of SSL-based cervix model development. We believe that the present research shows a novel direction in developing criteria-specific custom deep models for cervical image classification by combining images from different sources, unlabeled and/or labeled with varying criteria, while addressing image access restrictions.




I Introduction

Cervical cancer is the fourth most common cancer in women worldwide. Regular cervical screening can help in the early detection of pre-cancerous lesions and reduce premature death. Visual inspection with acetic acid (VIA) is a commonly used low-cost cervical screening approach, but it suffers from significant inter- and intra-reader variability. An automated cervical image classification system could address this limitation [6]. Developing a diagnostic system with traditional classification approaches is challenging, as they require enormous effort to design effective hand-crafted features and are known to underperform [10, 7, 5]. Hence, deep learning approaches [8] can be considered for building a robust classification model. However, developing a deep model for robust classification needs a huge number of images and their class labels [3]. Labeling cervical images is costly, needs agreement among multiple experts, and requires multiple pieces of diagnostic information. Transfer learning [13], i.e. transferring knowledge from natural images, is a commonly used approach to overcome these data limitations. However, transferring knowledge from the same domain may be more effective than transferring knowledge from a different domain.

Collaboration among multiple organizations to perform centralized learning (CL) by uniting all labeled images from all sources is another effective solution to overcome data scarcity. However, supervised learning using combined images has limitations. First, different imaging devices may be used for image acquisition, resulting in variations in visual quality across sources. It is challenging to estimate the proper ratio of images to select from the various sources for CL. Moreover, there may be data-sharing restrictions. To deal with data-sharing restrictions, federated learning (FL) can be employed [12]. However, federated supervised learning needs research efforts to address variability in class distribution among the sources [11]. Furthermore, including a dataset with noisy labels impedes training and makes it harder to produce robust models. Finally, and most importantly, cervical image labeling criteria vary and depend on the availability of other diagnostic results, the population under study, treatment planning, the severity grading strategy, etc. This variety in labeling criteria across datasets makes the task more challenging, as it prevents researchers from performing any kind of supervised collaborative (CL or FL) learning.

In this paper, we propose a self-supervised learning (SSL) based approach to develop a pre-trained cervix model (or cervix model). We use two cervical image datasets which are (i) labeled in a heterogeneous manner: labeling criteria vary across datasets, and (ii) partially labeled: not all images in the datasets are labeled. As SSL does not require any labels, it allows us to include all available images in our datasets for cervix model development. We experiment with both centralized SSL and federated SSL. To evaluate the effectiveness of the developed cervix model, criteria (or dataset) specific classification models are trained with the available labeled images. The classification networks are initialized with the developed cervix models. Note that, according to our survey, no image dataset is publicly available for machine learning research towards supporting experts’ effort in visual assessment of the acetic acid applied cervix, and no well-accepted classification network is available for the present task. Hence, we confine our experiments to the present datasets and the chosen competing algorithms.

In summary, our work is motivated by [14] and [2]. In [14], for medical image representation, an encoder-decoder based architecture was used to reconstruct original images from synthetically distorted images. In [2], a contrastive learning-based framework is proposed for natural image representation. We utilize the framework presented in [2] and demonstrate its power for cervical image representation tasks in both centralized and federated learning schemes. We believe that our work has two key novelties: (a) it is the first work attempting to develop a cervix model from unlabeled images; (b) it is the first work where Federated Self-Supervised Learning (FSSL) is demonstrated for any medical image representation.

The remainder of the paper is organized as follows: Section II discusses the proposed methodology. The experimental protocol and analysis of experimental results are presented in Section III and Section IV respectively. Finally, Section V concludes the paper.

II Methods

II-A Self-supervised learning

Self-supervised learning (SSL) is a discriminative approach for visual representation learning. Developers define a pretext task and develop a deep model which captures the image semantics (i.e. good initialization weights for related domain’s downstream tasks) with zero labeling cost. In this paper, we employ a contrastive feature learning algorithm as the pretext task, i.e. all images will be semantically well separated, and an image and its augmented version will be semantically closer [2]. During SSL training, every mini-batch of size $2N$ is constructed with $N$ random images and an augmented version of them. The training loss ($\ell_{i,j}$) between an image $i$ and its augmented version $j$ is given as:

$$\ell_{i,j} = -\log \frac{\exp\left(z_i^{T} z_j / \tau\right)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp\left(z_i^{T} z_k / \tau\right)} \tag{1}$$

where $z_i$ is the feature vector of the $i$-th image; $\mathbb{1}_{[k \neq i]}$ is an indicator function evaluating to 1 iff $k \neq i$; $(\cdot)^{T}$ represents the transpose operation and $\tau$ is a constant. The loss in a mini-batch is computed across all pairs constructed with an image and its augmented version.
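As a rough illustration of the contrastive loss in Eq 1, the following pure-Python sketch computes the pairwise loss and its mini-batch average. It is not the authors' implementation: the function names, the pairing convention (positions `2k` and `2k+1` hold an image and its augmented version), the L2 normalization of feature vectors (borrowed from [2]), and the default value of the constant `tau` are all assumptions made for this example.

```python
import math


def dot(a, b):
    """Inner product z_a^T z_b of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a non-zero vector to unit length (assumption, following [2])."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def nt_xent_pair_loss(z, i, j, tau=0.5):
    """Loss for anchor i and its positive j over all feature vectors in z.

    tau=0.5 is a hypothetical default, not a value taken from the paper.
    """
    numerator = math.exp(dot(z[i], z[j]) / tau)
    # The indicator in Eq 1 removes only the k == i term from the sum.
    denominator = sum(
        math.exp(dot(z[i], z[k]) / tau) for k in range(len(z)) if k != i
    )
    return -math.log(numerator / denominator)


def nt_xent_batch_loss(features, tau=0.5):
    """Mean loss over all (image, augmented) pairs, in both directions.

    Assumes features[2k] and features[2k + 1] form a positive pair.
    """
    z = [l2_normalize(v) for v in features]
    total = 0.0
    for k in range(0, len(z), 2):
        total += nt_xent_pair_loss(z, k, k + 1, tau)
        total += nt_xent_pair_loss(z, k + 1, k, tau)
    return total / len(z)


# Toy check: matched pairs should give a lower loss than mismatched ones.
aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
shuffled = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(nt_xent_batch_loss(aligned) < nt_xent_batch_loss(shuffled))
```

In practice the same computation would be vectorized in a tensor library and back-propagated through the encoder; the loop form is used here only to keep the correspondence with Eq 1 visible.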

II-B Federated Self-supervised Learning (FSSL)

Recently, several data protection laws and regulations (GDPR 2018 in the EU, CCPA 2020 in the US, etc.) have been created to prevent information leakage through the sharing of sensitive data (especially in the medical and banking domains). This motivates researchers to develop federated learning (FL) algorithms for robust, collaborative, inter-institutional deep model development from data distributed across multiple institutions (clients) without sharing the raw data [12].

In this paper, we experiment with Federated Self-supervised Learning (FSSL) for cervix model development. We perform FSSL in two different ways: (a) Client-Server FSSL (CSFSSL) and (b) Peer-to-Peer FSSL (PPFSSL). CSFSSL involves multiple clients and a server. In this framework, the server first sends a deep model to all clients. Then, independently at every client, the model is fine-tuned for a fixed number of epochs with local data. After that, the updated models from all clients are aggregated at the server and used as the initial model for the next iteration. This procedure continues until the deep model is trained. Note that we use weight averaging for model aggregation. In PPFSSL, on the other hand, SSL is performed circularly: the starting client first runs SSL with its own data for a fixed number of epochs and then sends the trained model to the next client, which performs SSL for the same number of epochs using the received model as the starting point; the updated model is then sent to the next client, and so on. This circular computation is performed for a few iterations, and the final model is shared among the clients.
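The two federated schedules described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: model weights are represented as a dictionary of flat float lists, `local_train` is a hypothetical callback standing in for a fixed number of local SSL epochs at one client, and the averaging is unweighted (the text says only "weight averaging", so weighting by client size is left open).

```python
from typing import Callable, Dict, List

Weights = Dict[str, List[float]]                 # layer name -> flat parameters
LocalTrain = Callable[[Weights, int], Weights]   # (start weights, client id) -> new weights


def average_weights(client_weights: List[Weights]) -> Weights:
    """Element-wise, unweighted mean of the clients' parameters."""
    n = len(client_weights)
    return {
        name: [sum(vals) / n for vals in zip(*(w[name] for w in client_weights))]
        for name in client_weights[0]
    }


def client_server_fssl(global_w: Weights, num_clients: int, rounds: int,
                       local_train: LocalTrain) -> Weights:
    """CSFSSL: every client trains from the server model, then the server averages."""
    for _ in range(rounds):
        updates = [local_train(global_w, c) for c in range(num_clients)]
        global_w = average_weights(updates)
    return global_w


def peer_to_peer_fssl(start_w: Weights, client_order: List[int], cycles: int,
                      local_train: LocalTrain) -> Weights:
    """PPFSSL: the model is passed from client to client in a fixed circular order."""
    w = start_w
    for _ in range(cycles):
        for c in client_order:
            w = local_train(w, c)
    return w


# Toy stand-in for local SSL: move each weight halfway towards a client-specific value.
TARGETS = [0.0, 4.0]


def toy_local_train(w: Weights, client: int) -> Weights:
    return {k: [x + 0.5 * (TARGETS[client] - x) for x in v] for k, v in w.items()}


print(client_server_fssl({"w": [0.0]}, 2, 2, toy_local_train))     # {'w': [1.5]}
print(peer_to_peer_fssl({"w": [0.0]}, [0, 1], 1, toy_local_train))  # {'w': [2.0]}
print(peer_to_peer_fssl({"w": [0.0]}, [1, 0], 1, toy_local_train))  # {'w': [1.0]}
```

The last two calls show that, in the peer-to-peer schedule, the result depends on which client starts, which is why the starting client is treated as an experimental factor.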

II-C Proposed Approach: Cervical Model Development

In this paper, we use centralized SSL (CSSL) and the two variants of FSSL presented in Sec II-B for cervix model development. The block diagrams of CSSL, CSFSSL, and PPFSSL are shown in Fig 1 (a), Fig 1 (b) and Fig 1 (c) respectively. All methods use the contrastive feature learning-based SSL presented in Sec II-A for cervical model development. To obtain the augmented version of an image, we apply horizontal and vertical flips, rotation, random shift, random zoom, gamma changes, and brightness changes.
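To make the pairing of images and augmented views concrete, here is a small sketch that builds a mini-batch from a list of images and one augmented copy of each. It is illustrative only: it operates on grayscale images stored as nested lists with values in [0, 1], implements just the flips, gamma change, and brightness change (rotation, shift, and zoom are omitted for brevity), and every probability and range in it is a hypothetical choice rather than a setting from the paper.

```python
import random


def augment(image, rng):
    """Return a randomly augmented copy of a grayscale image (rows of floats in [0, 1])."""
    out = [row[:] for row in image]
    if rng.random() < 0.5:               # horizontal flip (hypothetical probability)
        out = [row[::-1] for row in out]
    if rng.random() < 0.5:               # vertical flip (hypothetical probability)
        out = out[::-1]
    gamma = rng.uniform(0.8, 1.25)       # hypothetical gamma range
    delta = rng.uniform(-0.1, 0.1)       # hypothetical brightness shift
    # Apply gamma, then brightness, and clip back into [0, 1].
    return [[min(1.0, max(0.0, px ** gamma + delta)) for px in row] for row in out]


def make_pair_batch(images, rng):
    """Interleave each image with one augmented view: [x1, aug(x1), x2, aug(x2), ...]."""
    batch = []
    for img in images:
        batch.extend([img, augment(img, rng)])
    return batch


rng = random.Random(0)                   # fixed seed so the sketch is repeatable
images = [[[0.0, 0.5], [1.0, 0.25]], [[0.2, 0.4], [0.6, 0.8]]]
batch = make_pair_batch(images, rng)
print(len(batch))                        # twice the number of input images
```

Interleaving originals and augmented views keeps each positive pair at adjacent positions, which is one convenient convention for computing a pairwise contrastive loss afterwards.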

Fig. 1: Block diagram of the SSL training system- (a) Centralized SSL (CSSL) (b) Client-Server FSSL (CSFSSL) and (c) Peer-to-Peer FSSL (PPFSSL).

III Experimental Protocol

III-A Dataset description

The National Cancer Institute (NCI) at the US National Institutes of Health (NIH) conducted two different cohort studies (NHS [1] and ALTS [9]) for cervical examination. During these studies, two images of the acetic acid applied cervix were captured at every visit. For the present research, NCI provided us with a subset of the images collected during these studies. The dataset containing NHS images is referred to as the NHS dataset, and the dataset containing ALTS images is referred to as the ALTS dataset. NCI scientists labeled a subset of the images in the NHS and ALTS datasets as case (disease) or control (non-disease) based on the availability of several pieces of screening and diagnostic information (such as visual assessment, HPV, cytology, histopathology, colposcopy, etc.). Different criteria were used to label the two datasets because the study objectives differ: NHS was a general-population study, and ALTS was a triage study in colposcopic clinics. For our research, we split both the NHS and ALTS datasets at the woman level into three disjoint subsets: train, validation, and test. The training images are used for model training, the validation images are used for training hyper-parameter selection, and the test images are used for classification performance evaluation. The split-wise number of patients, number of labeled images, and total available images (including labeled images) for the datasets are given in Table I.
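Splitting at the woman level means that all images of one patient land in the same subset, so no patient contributes to both training and evaluation. A minimal sketch of such a split is given below; the function name, the identifier format, the split fractions, and the seed are assumptions for illustration and do not reproduce the actual split used in the study.

```python
import random


def split_by_patient(image_to_patient, fractions=(0.7, 0.15, 0.15), seed=0):
    """Assign whole patients to train/valid/test, then route each image accordingly.

    fractions and seed are hypothetical; the paper does not state how its split was drawn.
    """
    patients = sorted(set(image_to_patient.values()))
    random.Random(seed).shuffle(patients)

    n_train = round(fractions[0] * len(patients))
    n_valid = round(fractions[1] * len(patients))
    groups = {
        "train": set(patients[:n_train]),
        "valid": set(patients[n_train:n_train + n_valid]),
        "test": set(patients[n_train + n_valid:]),
    }

    # Look up each patient's split once, then place every image of that patient there.
    patient_to_split = {p: name for name, members in groups.items() for p in members}
    splits = {name: [] for name in groups}
    for image, patient in sorted(image_to_patient.items()):
        splits[patient_to_split[patient]].append(image)
    return splits, groups


# Toy data: 10 women with two images each, mirroring "two images per visit".
mapping = {f"img_{p}_{v}": f"woman_{p}" for p in range(10) for v in range(2)}
splits, groups = split_by_patient(mapping)
print({name: len(images) for name, images in splits.items()})
```

Shuffling patients instead of images is the key design choice: an image-level shuffle could place the two images from one visit in different subsets and leak information into the test set.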

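| Split | Class | Patients (NHS) | Patients (ALTS) | Labeled Images (NHS) | Labeled Images (ALTS) | Total Images per split (NHS) | Total Images per split (ALTS) |
|---|---|---|---|---|---|---|---|
| Train | Case | 91 | 124 | 182 | 248 | 2029 | 3145 |
| Train | Control | 181 | 242 | 361 | 481 | | |
| Valid | Case | 22 | 31 | 44 | 62 | 520 | 791 |
| Valid | Control | 45 | 60 | 90 | 120 | | |
| Test | Case | 25 | 34 | 49 | 68 | - | - |
| Test | Control | 50 | 65 | 99 | 130 | | |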
TABLE I: Data set splits.

III-B Network architecture

We use ResNet-50 as the backbone network architecture. For the SSL approaches, the top 1000-way classification layer is first removed and a dense layer with a ReLU activation is added. The output of this dense layer serves as the image representation vector. We vary the number of neurons in this layer among [64, 128, 256] and fix a single value empirically, as all settings yield very close performance. To prepare the classification model, we remove the dense layer and add a single output neuron with sigmoid activation. The output of the sigmoid layer is the case probability. The classification network is initialized with the prepared cervix model and fully fine-tuned.

III-C Competing methods

The paper aims to develop a cervix model, i.e., a weight initialization method for a cervical image classification network. We compare six different approaches: (i) Random: network weights are randomly initialized. (ii) ImageNet: network weights are taken from the pre-trained ImageNet classification model, so knowledge is transferred from natural images. (iii) Self-supervised Learning (SSL): SSL is employed with the available images in a single dataset. (iv) Centralized Self-supervised Learning (CSSL): all images from both datasets are combined to train the SSL model, as shown in Fig 1(a). (v) Client-Server Federated Self-supervised Learning (CSFSSL): images are not shared, and client SSL models are aggregated at the server (see Fig 1(b)). (vi) Peer-to-Peer Federated Self-supervised Learning (PPFSSL): images are not shared, and SSL-based fine-tuning is performed circularly among the clients (see Fig 1(c)).

III-D Parameter Settings

We vary the network hyper-parameters for both SSL and classification model training and choose the best hyper-parameters based on the validation loss. For the SSL networks, the tuned hyper-parameters are the learning rate, weight decay, momentum, the loss parameters (see Eq 1), and the number of epochs; for the classification networks, they are the learning rate, weight decay, momentum, and number of epochs, with batch size = 4. In both SSL and classification model training, we randomly shuffle images during batch construction. Training stops when the validation loss has not decreased for a fixed number of epochs. We use reverse class weighting to address the class imbalance issue in classification model training.
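The text does not spell out the formula behind reverse class weighting, so the sketch below assumes a common convention: each class is weighted by the inverse of its frequency, scaled so that both classes contribute equally to the loss. The function names are hypothetical, and the counts of 182 case and 361 control images are the training counts from Table I, read here as the NHS labeled-image column.

```python
import math


def inverse_frequency_weights(class_counts):
    """Weight each class by total / (num_classes * count); an assumed convention."""
    total = sum(class_counts.values())
    k = len(class_counts)
    return {label: total / (k * n) for label, n in class_counts.items()}


def weighted_bce(labels, probs, weights):
    """Mean class-weighted binary cross-entropy; label 1 = case, 0 = control.

    probs must lie strictly between 0 and 1.
    """
    total = 0.0
    for y, p in zip(labels, probs):
        w = weights["case"] if y == 1 else weights["control"]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


# Training counts read from Table I (assumed to be the NHS labeled images).
train_counts = {"case": 182, "control": 361}
weights = inverse_frequency_weights(train_counts)
print({label: round(w, 4) for label, w in weights.items()})

# A misclassified case now costs more than an equally misclassified control.
print(weighted_bce([1], [0.3], weights) > weighted_bce([0], [0.7], weights))
```

With these weights the minority (case) class receives roughly twice the weight of the control class, which is the intended counterbalance to the roughly 1:2 class ratio in the training data.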

III-E Implementation

The Keras [4] deep learning toolkit is used to implement the networks. The networks are trained on 2 GeForce RTX 2080 Ti GPUs in a machine with an Intel(R) Xeon(R) Gold 5218 CPU (@ 2.30GHz). We implement federated learning on the same computing resources as sequential processes.

III-F Evaluation metrics

In this paper, we evaluate the performance of the dataset-specific classification models produced by supervised learning with the varying initialization approaches mentioned in Sec III-C. The following four quantitative evaluation metrics are computed for performance comparison: (1) Accuracy (ACC), (2) Recall, (3) Precision, (4) F1-Score.
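For reference, the four metrics can be computed from confusion-matrix counts as sketched below, using their standard definitions with "case" as the positive class. The function name is hypothetical, and the example counts are not reported in the paper: they were chosen because they are consistent with the NHS test set size in Table I (49 case and 99 control images) and reproduce, to the reported precision, the "Random" NHS row of Table II.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, recall, precision and F1 from confusion-matrix counts (case = positive)."""
    def safe_div(num, den):
        # Return 0.0 instead of failing when a denominator is empty.
        return num / den if den else 0.0

    accuracy = safe_div(tp + tn, tp + fp + fn + tn)
    recall = safe_div(tp, tp + fn)        # fraction of true cases that are detected
    precision = safe_div(tp, tp + fp)     # fraction of predicted cases that are correct
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {"accuracy": accuracy, "recall": recall, "precision": precision, "f1": f1}


# Illustrative counts (inferred, not reported): 49 case and 99 control test images.
m = classification_metrics(tp=26, fp=7, fn=23, tn=92)
print(round(100 * m["accuracy"], 2))   # 79.73
print(round(m["recall"], 4))           # 0.5306
print(round(m["precision"], 4))        # 0.7879
print(round(m["f1"], 4))               # 0.6341
```

Note that accuracy is expressed as a percentage in Table II, while recall, precision, and F1-Score are fractions.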

IV Experimental results and discussion

Fig. 2: Receiver Operating Characteristic (ROC) curves: (a) NHS and (b) ALTS. The numeric values represent the AUC values for the classifiers with the considered initialization.

The Receiver Operating Characteristic (ROC) curves for all supervised classifiers developed with the varying initialization approaches are shown in Fig 2 for both the NHS and ALTS datasets. The corresponding quantitative classification performance (using the metrics listed in Sec III-F) is shown in Table II. According to Table II, ImageNet initialization improves the accuracy for both datasets, but recall does not improve. Initializing the cervical image classification model with the SSL-based approach is more effective than using the ImageNet weights. In the SSL approach, for the NHS dataset, the best accuracy, recall, and F1-Score are obtained when $N = 16$, and the best precision is obtained when $N = 8$ (see Table II). In the SSL approach for the ALTS dataset, the best accuracy and precision are obtained when $N = 16$; the best recall and F1-Score are obtained when $N = 8$. We observe a noticeable performance improvement from using SSL; the experimental results thus justify the importance of SSL-based cervical model development. The CSSL-based initialization approach further improves on the performance of SSL for both datasets. This observation supports developing the cervix model by uniting images from both datasets. We find that CSSL with $N = 8$, in general, produces the best performance on both NHS and ALTS. Hence, we evaluate both Federated Self-Supervised Learning (FSSL) models for $N = 8$. For CSFSSL, we vary the number of local model-update epochs and report the best-performing setting in the ninth row of Table II. For PPFSSL, we consider two approaches. The first starts SSL training with NHS images (called PPFSSLNHS) and the other starts SSL training with ALTS images (called PPFSSLALTS). We keep the number of local epochs the same for experimental similarity, and the classification performance with these model initializations is listed in the tenth and eleventh rows of Table II. We find that PPFSSL and CSFSSL produce comparable performance and that, in general, federated SSL produces better results than SSL with only the images in a dataset's own collection. This justifies the effectiveness of FSSL in addressing data-sharing constraints in cervix model development.

| Initialization Method | NHS ACC (%) | NHS Recall | NHS Precision | NHS F1_Score | ALTS ACC (%) | ALTS Recall | ALTS Precision | ALTS F1_Score |
|---|---|---|---|---|---|---|---|---|
| Random | 79.73 | 0.5306 | 0.7879 | 0.6341 | 76.77 | 0.5735 | 0.6964 | 0.6290 |
| ImageNet | 80.41 | 0.4898 | 0.8571 | 0.6234 | 77.78 | 0.5588 | 0.7308 | 0.6333 |
| SSL (N = 8) | 83.11 | 0.6327 | 0.8158 | 0.7126 | 79.80 | 0.7206 | 0.7000 | 0.7101 |
| SSL (N = 16) | 83.11 | 0.6939 | 0.7727 | 0.7312 | 80.30 | 0.6029 | 0.7736 | 0.6777 |
| SSL (N = 32) | 83.11 | 0.6531 | 0.8000 | 0.7191 | 79.29 | 0.6176 | 0.7368 | 0.6720 |
| CSSL (N = 8) | 86.49 | 0.7143 | 0.8537 | 0.7778 | 81.82 | 0.7794 | 0.7162 | 0.7465 |
| CSSL (N = 16) | 85.81 | 0.7551 | 0.8043 | 0.7789 | 81.31 | 0.6324 | 0.7818 | 0.6992 |
| CSSL (N = 32) | 85.14 | 0.7959 | 0.7647 | 0.7800 | 81.31 | 0.6471 | 0.7719 | 0.7040 |
| CSFSSL | 84.46 | 0.6735 | 0.8250 | 0.7416 | 79.80 | 0.6912 | 0.7121 | 0.7015 |
| PPFSSLNHS | 84.46 | 0.7347 | 0.7826 | 0.7579 | 80.81 | 0.6618 | 0.7500 | 0.7031 |
| PPFSSLALTS | 84.46 | 0.7755 | 0.7600 | 0.7677 | 80.30 | 0.6618 | 0.7377 | 0.6977 |
TABLE II: Performance evaluation

V Conclusion and scope of future work

This paper discusses the challenges in cervical image analysis that arise from label unavailability and variability, and presents a novel direction to address them. Experimental results show that self-supervised learning is an efficient and effective way to deal with label scarcity as well as labeling variability. In addition, the experiments on Federated Self-Supervised Learning shed light on how to deal with data-sharing restrictions. To the best of our knowledge, this is the first attempt to develop a cervix model, i.e., a domain-specific pre-trained model for task-specific fine-tuning.

The development of an improved cervix model by uniting larger datasets from different sources, independent of labeling criteria, unavailability of labels, and different imaging devices, is the immediate future scope of this work. The cervical image datasets for which labeling efforts by researchers at NCI are in progress will be used for this research. The engineering implementation of federated learning for a real deployment scenario is another important direction for future work. The presented idea can be employed in other medical image analysis tasks to utilize unlabeled data and provide same-domain transfer learning.

VI Acknowledgement

We are very much grateful to Dr. Mark Schiffman of the Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, and his team for providing us the images and labels used in this paper.


References

  • [1] M. C. Bratti, A. C. Rodríguez, M. Schiffman, A. Hildesheim, J. Morales, M. Alfaro, D. Guillén, M. Hutchinson, M. E. Sherman, C. Eklund, J. Schussler, J. Buckland, L. A Morera, F. Cárdenas, M. Barrantes, E. Pérez, T. J Cox, R. D Burk, and R. Herrero (2004-02) Description of a seven-year prospective study of human papillomavirus infection and cervical neoplasia among 10000 women in Guanacaste, Costa Rica. Pan American journal of public health 2 (15), pp. 75–89.
  • [2] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton (2020-13–18 Jul) A simple framework for contrastive learning of visual representations. In Proceedings of the 37th International Conference on Machine Learning, H. D. III and A. Singh (Eds.), Proceedings of Machine Learning Research, Vol. 119, Virtual, pp. 1597–1607.
  • [3] T. Ching, D. S. Himmelstein, B. K. Beaulieu-Jones, A. A. Kalinin, B. T. Do, G. P. Way, E. Ferrero, P. Agapow, M. Zietz, M. M. Hoffman, W. Xie, G. L. Rosen, B. J. Lengerich, J. Israeli, J. Lanchantin, S. Woloszynek, A. E. Carpenter, A. Shrikumar, J. Xu, E. M. Cofer, C. A. Lavender, S. C. Turaga, A. M. Alexandari, Z. Lu, D. J. Harris, D. DeCaprio, Y. Qi, A. Kundaje, Y. Peng, L. K. Wiley, M. H. S. Segler, S. M. Boca, S. J. Swamidass, A. Huang, A. Gitter, and C. S. Greene (April, 2018) Opportunities and obstacles for deep learning in biology and medicine. Journal of the Royal Society, Interface.
  • [4] F. Chollet et al. (2015) Keras.
  • [5] K. Fernandes, J. S. Cardoso, and J. Fernandes (2018) Automated methods for the decision support of cervical cancer screening using digital colposcopies. IEEE Access 6, pp. 33910–33927.
  • [6] L. Hu, D. Bell, S. Antani, Z. Xue, K. Yu, M. P. Horning, N. Gachuhi, B. Wilson, M. S. Jaiswal, B. Befano, L. R. Long, R. Herrero, M. H. Einstein, R. D. Burk, M. Demarco, J. C. Gage, A. C. Rodriguez, N. Wentzensen, and M. Schiffman (2019-01) An Observational Study of Deep Learning and Automated Evaluation of Cervical Images for Cancer Screening. JNCI: Journal of the National Cancer Institute 111 (9), pp. 923–932. ISSN 0027-8874.
  • [7] E. Kim and X. Huang (2013) A data driven approach to cervigram image analysis and classification. In Color Medical Image Analysis, pp. 1–13. ISBN 978-94-007-5389-1.
  • [8] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A.W.M. van der Laak, B. van Ginneken, and C. I. Sánchez (2017) A survey on deep learning in medical image analysis. Medical Image Analysis 42, pp. 60–88. ISSN 1361-8415.
  • [9] M. Schiffman and M. E. Adrianza (2000-Sept-Oct) ASCUS-LSIL triage study. Design, methods and characteristics of trial participants. Acta Cytol 44 (5), pp. 726–742.
  • [10] Y. Srinivasan, B. Nutter, S. Mitra, B. Phillips, and E. Sinzinger (2006) Classification of cervix lesions using filter bank-based texture mode. In 19th IEEE Symposium on Computer-Based Medical Systems (CBMS’06), pp. 832–840.
  • [11] M. Yang, A. Wong, H. Zhu, H. Wang, and H. Qian (2020) Federated learning with class imbalance reduction. arXiv:2011.11266.
  • [12] Q. Yang, Y. Liu, Y. Cheng, Y. Kang, T. Chen, and H. Yu (2019).
  • [13] Q. Yang, Y. Zhang, W. Dai, and S. J. Pan (2020) Transfer learning. Cambridge University Press.
  • [14] Z. Zhou, V. Sodha, J. Pang, M. B. Gotway, and J. Liang (2021) Models genesis. Medical Image Analysis 67, pp. 101840. ISSN 1361-8415.