I Introduction
Cervical cancer is the fourth most common cancer in women worldwide. Regular cervical screening helps detect pre-cancerous lesions early and reduces premature death. Visual inspection with acetic acid (VIA) is a commonly used low-cost cervical screening approach, but it suffers from significant inter- and intra-reader variability. An automated cervical image classification system could address this limitation [6]. Developing such a diagnostic system with traditional classification approaches is challenging: they require enormous effort to design effective hand-crafted features and are known to underperform [10, 7, 5]. Hence, deep learning approaches [8] can be considered for building a robust classification model. However, developing a deep model for robust classification requires a huge number of images and their class labels [3]. Labeling cervical images is costly, needs agreement among multiple experts, and requires multiple pieces of diagnostic information. Transfer learning [13], i.e., transferring knowledge from natural images, is a commonly used approach to overcome these data limitations. However, transferring knowledge from the same domain may be more effective than transferring knowledge from a different domain.
Collaboration among multiple organizations to perform centralized learning (CL) by uniting all labeled images from all sources is another effective way to overcome data scarcity. However, supervised learning on the combined images has limitations. First, different imaging devices may be used for image acquisition, resulting in variations in visual quality across sources. It is also challenging to estimate the proper ratio of images to select from each source for CL. Moreover, there may be data-sharing restrictions. To deal with data-sharing restrictions, federated learning (FL) can be employed [12]. However, federated supervised learning needs research effort to address variability in class distribution among the sources [11]. Furthermore, including a dataset with noisy labels impedes the training of robust models. Finally, and most importantly, cervical image labeling criteria vary and depend on the availability of other diagnostic results, the population under study, treatment planning, the severity grading strategy, etc. This variety in labeling criteria across datasets makes the task more challenging, as it prevents researchers from performing any kind of supervised collaborative (CL or FL) learning.
In this paper, we propose a self-supervised learning (SSL) based approach to develop a pre-trained cervix model (or cervix model). We use two cervical image datasets that are (i) labeled in a heterogeneous manner: labeling criteria vary across datasets, and (ii) partially labeled: not all images in the datasets are labeled. As SSL does not require any labels, it allows us to include all available images in our datasets for cervix model development. We experiment with both centralized SSL and federated SSL. To evaluate the effectiveness of the developed cervix model, criteria-specific (or dataset-specific) classification models are trained with the available labeled images, with the classification networks initialized from the developed cervix models. Note that, according to our survey, no image dataset is publicly available for machine learning research supporting experts' visual assessment of the acetic acid applied cervix, and no well-accepted classification network is available for the present task. Hence, we confine our experiments to the present datasets and the chosen competing algorithms.
In summary, our work is motivated by [14] and [2]. In [14], for medical image representation, an encoder-decoder based architecture was used to reconstruct original images from synthetically distorted images. In [2], a contrastive learning-based framework was proposed for natural image representation. Unlike these works, we apply the framework presented in [2] to cervical images and demonstrate its power for cervical image representation in both centralized and federated learning schemes. We believe that our work has two key novelties: (a) it is the first work attempting to develop a cervix model from unlabeled images; and (b) it is the first work in which Federated Self-Supervised Learning (FSSL) is demonstrated for medical image representation.
II Methods
II-A Self-supervised learning
Self-supervised learning (SSL) is a discriminative approach to visual representation learning. Developers define a pretext task and train a deep model that captures image semantics (i.e., good initialization weights for downstream tasks in a related domain) at zero labeling cost. In this paper, we employ a contrastive feature learning algorithm as the pretext task: all images should be semantically well separated, while an image and its augmented version should be semantically close [2]. During SSL training, every mini-batch of size $2N$ is constructed from $N$ random images and an augmented version of each. The training loss $\ell_{i,j}$ between an image $i$ and its augmented version $j$ is given as:
$$\ell_{i,j} = -\log \frac{\exp\left(z_i^{T} z_j / \tau\right)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp\left(z_i^{T} z_k / \tau\right)} \qquad (1)$$
where $z_i$ is the feature vector of the $i$-th image; $\mathbb{1}_{[k \neq i]}$ is an indicator function evaluating to 1 iff $k \neq i$; $T$ represents the transpose operation; and $\tau$ is a constant. The loss in a mini-batch is computed across all pairs constructed from an image and its augmented version.
II-B Federated Self-supervised Learning (FSSL)
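The contrastive loss of Sec II-A (Eq. 1), which each client also optimizes in the federated variants, can be illustrated with a minimal NumPy sketch. The function name `nt_xent_loss`, the symbols `z` and `tau`, the default `tau=0.5`, the row ordering of the batch, and the L2 normalization of the features (as in [2]) are assumptions made for illustration; they are not taken from the paper's implementation.

```python
import numpy as np


def nt_xent_loss(z, tau=0.5):
    """Mean contrastive loss over a mini-batch of 2N feature vectors.

    Assumed layout: rows i and i + N of `z` hold the features of an image
    and of its augmented version. `tau` is the constant of Eq. 1; 0.5 is a
    placeholder value, not the paper's setting.
    """
    z = np.asarray(z, dtype=np.float64)
    two_n = z.shape[0]
    if two_n < 2 or two_n % 2:
        raise ValueError("expected an even number of rows (2N >= 2)")
    n = two_n // 2

    # Assumption following [2]: features are L2-normalized before the dot product.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)

    sim = z @ z.T / tau              # z_i^T z_k / tau for all pairs (i, k)
    np.fill_diagonal(sim, -np.inf)   # indicator 1[k != i]: drop self-similarity

    # Numerically stable log of the denominator (log-sum-exp over k != i).
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))

    # Index of the positive partner for every row.
    partner = np.concatenate([np.arange(n, two_n), np.arange(0, n)])
    positive = sim[np.arange(two_n), partner]

    # -log(exp(pos) / denom) = log_denom - pos, averaged over all pairs.
    return float(np.mean(log_denom - positive))


rng = np.random.default_rng(0)
features = rng.normal(size=(8, 16))
# Stand-in for "augmented" features: small perturbations of the originals.
batch = np.vstack([features, features + 0.01 * rng.normal(size=features.shape)])
loss = nt_xent_loss(batch)
```

In this sketch the loss decreases as each image's features move closer to those of its augmented version and away from all other images in the batch, which is the behavior the pretext task asks for.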
Recently, several data protection laws and regulations (GDPR 2018 in the EU, CCPA 2020 in the US, etc.) have been introduced to prevent information leakage caused by sharing sensitive data, especially in the medical and banking domains. This has motivated researchers to develop federated learning (FL) algorithms for robust inter-institutional collaborative deep model development from data distributed across multiple institutions (clients) without sharing the raw data [12].
In this paper, we experiment with Federated Self-supervised Learning (FSSL) for cervix model development. We perform FSSL in two different ways: (a) Client-Server FSSL (CSFSSL) and (b) Peer-to-Peer FSSL (PPFSSL). CSFSSL involves multiple clients and a server. In this framework, the server first sends a deep model to all clients. Then, independently at every client, the model is fine-tuned for a fixed number of epochs on local data. After that, the updated models from all clients are aggregated at the server, and the aggregated model is used as the initial model for the next iteration. This procedure continues until the deep model is trained. Note that we use weight averaging for model aggregation. In PPFSSL, on the other hand, SSL is performed circularly: the starting client first runs SSL on its own data for a fixed number of epochs and sends the trained model to the next client, which runs SSL for the same number of epochs using the received model as the starting point, then sends the updated model to the next client, and so on. This circular computation is repeated for a few iterations, and the final model is shared among the clients.
II-C Proposed Approach: Cervical Model Development
In this paper, we use centralized SSL (CSSL) and the two variants of FSSL presented in Sec II-B for cervix model development. Block diagrams of CSSL, CSFSSL, and PPFSSL are shown in Fig 1(a), Fig 1(b), and Fig 1(c), respectively. All methods use the contrastive feature learning-based SSL presented in Sec II-A for cervix model development. To obtain the augmented version of an image, we apply horizontal and vertical flips, rotation, random shift, random zoom, gamma change, and brightness change.
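The two federated schemes of Sec II-B can be sketched in a framework-agnostic way. This is a minimal illustration under stated assumptions: a model is represented as a list of NumPy arrays, `local_update` is a hypothetical callback standing in for a fixed number of SSL epochs on one client's local data, and aggregation is an unweighted mean of client weights (the paper states only that weight averaging is used).

```python
import numpy as np


def average_weights(client_weights):
    """Unweighted element-wise mean of the clients' weight lists (assumption)."""
    return [np.mean(np.stack(layers), axis=0) for layers in zip(*client_weights)]


def client_server_fssl(global_weights, client_data, local_update, rounds, local_epochs):
    """CSFSSL sketch: broadcast, local SSL at every client, server-side averaging."""
    for _ in range(rounds):
        updated = [
            # Each client starts from its own copy of the current server model.
            local_update([w.copy() for w in global_weights], data, local_epochs)
            for data in client_data
        ]
        global_weights = average_weights(updated)
    return global_weights


def peer_to_peer_fssl(weights, client_data, local_update, cycles, local_epochs):
    """PPFSSL sketch: the model is passed circularly from client to client."""
    for _ in range(cycles):
        for data in client_data:  # list order defines which client starts
            weights = local_update(weights, data, local_epochs)
    return weights


def toy_local_update(weights, data, epochs):
    """Hypothetical stand-in for local SSL: pull the weights toward the data."""
    new_weights = [w.copy() for w in weights]
    for _ in range(epochs):
        new_weights[0] = new_weights[0] + 0.5 * (data - new_weights[0])
    return new_weights


clients = [np.array([0.0]), np.array([4.0])]
start = [np.array([0.0])]
server_model = client_server_fssl(start, clients, toy_local_update, rounds=1, local_epochs=1)
ring_model = peer_to_peer_fssl(start, clients, toy_local_update, cycles=1, local_epochs=1)
```

In the peer-to-peer variant the result depends on the order of `client_data`, which mirrors the paper's distinction between starting the circular training at one dataset or the other.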

III Experimental Protocol
III-A Dataset description
The National Cancer Institute (NCI) at the US National Institutes of Health (NIH) conducted two different cohort studies (NHS [1] and ALTS [9]) for cervical examination. During these studies, two images of the acetic acid applied cervix were captured at every visit. For the present research, NCI provided us with a subset of the images collected during these studies. The dataset containing NHS images is referred to as the NHS dataset, and the dataset containing ALTS images is referred to as the ALTS dataset. NCI scientists labeled a subset of the images in the NHS and ALTS datasets as case (disease) or control (non-disease) based on the availability of several kinds of screening and diagnostic information (visual assessment, HPV, cytology, histopathology, colposcopy, etc.). Different criteria were used to label the two datasets because the study objectives differ: NHS was a general-population study, whereas ALTS was a triage study in colposcopy clinics. For our research, we split both the NHS and ALTS datasets at the woman level into three disjoint subsets: train, validation, and test. The training images are used for model training, the validation images for hyper-parameter selection, and the test images for classification performance evaluation. The split-wise number of patients, number of labeled images, and total available images (including labeled images) for the datasets are given in Table I.
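Table I: Split-wise number of patients, labeled images, and total available images (including labeled images).
Split | Class | Patients (NHS) | Patients (ALTS) | Labeled images (NHS) | Labeled images (ALTS) | Total images (NHS) | Total images (ALTS) |
---|---|---|---|---|---|---|---|
Train | Case | 91 | 124 | 182 | 248 | 2029 | 3145 |
Train | Control | 181 | 242 | 361 | 481 | | |
Valid | Case | 22 | 31 | 44 | 62 | 520 | 791 |
Valid | Control | 45 | 60 | 90 | 120 | | |
Test | Case | 25 | 34 | 49 | 68 | | |
Test | Control | 50 | 65 | 99 | 130 | | |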
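The woman-level split described above can be sketched as follows: the split is drawn over women rather than over images, so both images from a visit, and all visits of one woman, stay in the same subset. The record layout `(woman_id, image_id)`, the function name `split_by_woman`, the fractions, and the seed are hypothetical; the paper does not report how its split was drawn.

```python
import random
from collections import defaultdict


def split_by_woman(records, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split (woman_id, image_id) records into disjoint train/valid/test sets.

    The fractions are placeholders for illustration, not the paper's values.
    """
    by_woman = defaultdict(list)
    for woman_id, image_id in records:
        by_woman[woman_id].append(image_id)

    women = sorted(by_woman)               # deterministic order before shuffling
    random.Random(seed).shuffle(women)

    n_train = round(fractions[0] * len(women))
    n_valid = round(fractions[1] * len(women))
    groups = {
        "train": women[:n_train],
        "valid": women[n_train:n_train + n_valid],
        "test": women[n_train + n_valid:],
    }
    # Every image follows its woman into exactly one subset.
    return {
        name: [(w, img) for w in ids for img in by_woman[w]]
        for name, ids in groups.items()
    }


records = [(f"w{i}", f"w{i}_img{j}") for i in range(20) for j in range(2)]
splits = split_by_woman(records)
```

Splitting at the woman level avoids leakage between training and test images of the same cervix, which would otherwise inflate the test performance.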
III-B Network architecture
We use ResNet-50 as the backbone network architecture. For the SSL approaches, the top 1000-way classification layer is first removed and a dense layer with ReLU activation is added; the output of this dense layer serves as the image representation vector. We vary the number of neurons in this layer among [64, 128, 256] and fix it empirically, as all three settings give very similar performance. To prepare the classification model, we remove the dense layer and attach a single output neuron with sigmoid activation, whose output is the case probability. The classification network is initialized with the prepared cervix model and then fully fine-tuned.
III-C Competing methods
The paper aims to develop a cervix model, i.e., a weight initialization method for cervical image classification networks. We compare six different approaches: (i) Random: network weights are randomly initialized. (ii) ImageNet: network weights are taken from the pre-trained ImageNet classification model, i.e., knowledge is transferred from natural images. (iii) Self-supervised Learning (SSL): SSL is performed with the available images of a single dataset. (iv) Centralized Self-supervised Learning (CSSL): all images from both datasets are combined to train the SSL model, as shown in Fig 1(a). (v) Client-Server Federated Self-supervised Learning (CSFSSL): images are not shared; client SSL models are aggregated at the server (see Fig 1(b)). (vi) Peer-to-Peer Federated Self-supervised Learning (PPFSSL): images are not shared; SSL-based fine-tuning is performed circularly among the clients (see Fig 1(c)).
III-D Parameter Settings
We vary the network hyper-parameters for both SSL and classification model training and choose the best hyper-parameters based on the validation loss. For the SSL networks, the tuned hyper-parameters are the learning rate, weight decay, momentum, the constant in Eq 1, and the number of epochs; for the classification networks, they are the learning rate, weight decay, momentum, and number of epochs, with a batch size of 4. In both SSL and classification model training, we randomly shuffle images during batch construction. Training stops when the validation loss has not decreased for a fixed number of consecutive epochs. We use reverse class weighting to address the class imbalance in classification model training.
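Two of the training choices above, reverse class weighting and stopping on the validation loss, can be sketched in a few lines. The weighting formula (weights inversely proportional to class frequency, scaled so that every class contributes the same total weight) is one common reading of "reverse class weighting" and is an assumption here, as are the function names and the `patience` parameter. The class counts are the labeled NHS training images from Table I.

```python
def inverse_frequency_weights(class_counts):
    """Per-class loss weights inversely proportional to class frequency."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: total / (n_classes * count) for label, count in class_counts.items()}


def stopping_epoch(val_losses, patience):
    """Epoch index at which training stops under a patience rule.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: trained through the last epoch


# Labeled NHS training images (Table I): 182 case, 361 control.
weights = inverse_frequency_weights({"case": 182, "control": 361})
stop = stopping_epoch([1.0, 0.8, 0.9, 0.85, 0.7], patience=2)
```

With these counts the minority case class receives roughly twice the weight of the control class, so errors on cases are penalized more heavily during classifier training.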
III-E Implementation
III-F Evaluation metrics
In this paper, we evaluate the performance of the dataset-specific classification models produced by supervised learning with the different initialization approaches described in Sec III-C. The following four quantitative evaluation metrics are computed for performance comparison: (1) Accuracy (ACC), (2) Recall, (3) Precision, and (4) F1-Score.
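The four metrics can be computed from the entries of a binary confusion matrix, as in the sketch below, with case as the positive class. The function name is hypothetical. The example counts were back-computed here from the Random/NHS row of Table II together with the 49 case and 99 control NHS test images of Table I; the paper itself does not report confusion matrices.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, recall, precision and F1-score with 'case' as the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall, "precision": precision, "f1": f1}


# Back-computed counts (an inference, not reported data): 26 of 49 cases
# detected, 7 of 99 controls flagged as cases.
metrics = classification_metrics(tp=26, fp=7, fn=23, tn=92)
```

Rounded, these counts give an accuracy of 79.73%, recall of 0.5306, precision of 0.7879, and F1-score of 0.6341, matching that row of Table II. Note that Table II reports accuracy in percent and the other three metrics as fractions.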
IV Experimental results and discussion
Fig 2: Receiver operating characteristic (ROC) curves: (a) NHS and (b) ALTS. The numeric values are the AUC values of the classifiers for each considered initialization.
The receiver operating characteristic (ROC) curves of all supervised classifiers for both the NHS and ALTS datasets, developed with the different initialization approaches, are shown in Fig 2. The corresponding quantitative classification performance (computed with the metrics listed in Sec III-F) is shown in Table II. According to Table II, ImageNet initialization improves the accuracy for both datasets, but recall does not improve. Initializing the cervical image classification model with the SSL-based approach is more effective than using the ImageNet weights. In the SSL approach, for the NHS dataset, the best recall and F1-score are obtained when $N = 16$ (accuracy is identical, 83.11, for all three values of $N$), and the best precision is obtained when $N = 8$. For the ALTS dataset, the best accuracy and precision are obtained when $N = 16$, and the best recall and F1-score are obtained when $N = 8$. Overall, we observe a noticeable performance improvement from using SSL, which justifies the importance of SSL-based cervix model development. The CSSL-based initialization further improves on SSL for both datasets. This observation supports developing the cervix model by uniting images from both datasets. We find that, for CSSL, $N = 8$ in general produces the best performance on both NHS and ALTS. Hence, we evaluate both Federated Self-Supervised Learning (FSSL) models with $N = 8$. For CSFSSL, we vary the number of local epochs per model update and report the best-performing setting in the ninth row of Table II. For PPFSSL, we consider two approaches: the first starts SSL training with the NHS images (called PPFSSL_NHS) and the other starts SSL training with the ALTS images (called PPFSSL_ALTS). We use the same number of local epochs for experimental consistency, and the classification performance with this model initialization is listed in the tenth and eleventh rows of Table II.
We find that PPFSSL and CSFSSL produce comparable performance and that, in general, federated SSL produces better results than SSL trained only on the images of a single dataset. This justifies the effectiveness of FSSL in addressing data-sharing constraints in cervix model development.
Table II: Classification performance on the NHS and ALTS test sets for each initialization approach.
Initialization | NHS ACC (%) | NHS Recall | NHS Precision | NHS F1-Score | ALTS ACC (%) | ALTS Recall | ALTS Precision | ALTS F1-Score |
---|---|---|---|---|---|---|---|---|
Random | 79.73 | 0.5306 | 0.7879 | 0.6341 | 76.77 | 0.5735 | 0.6964 | 0.6290 |
ImageNet | 80.41 | 0.4898 | 0.8571 | 0.6234 | 77.78 | 0.5588 | 0.7308 | 0.6333 |
SSL (N = 8) | 83.11 | 0.6327 | 0.8158 | 0.7126 | 79.80 | 0.7206 | 0.7000 | 0.7101 |
SSL (N = 16) | 83.11 | 0.6939 | 0.7727 | 0.7312 | 80.30 | 0.6029 | 0.7736 | 0.6777 |
SSL (N = 32) | 83.11 | 0.6531 | 0.8000 | 0.7191 | 79.29 | 0.6176 | 0.7368 | 0.6720 |
CSSL (N = 8) | 86.49 | 0.7143 | 0.8537 | 0.7778 | 81.82 | 0.7794 | 0.7162 | 0.7465 |
CSSL (N = 16) | 85.81 | 0.7551 | 0.8043 | 0.7789 | 81.31 | 0.6324 | 0.7818 | 0.6992 |
CSSL (N = 32) | 85.14 | 0.7959 | 0.7647 | 0.7800 | 81.31 | 0.6471 | 0.7719 | 0.7040 |
CSFSSL | 84.46 | 0.6735 | 0.8250 | 0.7416 | 79.80 | 0.6912 | 0.7121 | 0.7015 |
PPFSSL_NHS | 84.46 | 0.7347 | 0.7826 | 0.7579 | 80.81 | 0.6618 | 0.7500 | 0.7031 |
PPFSSL_ALTS | 84.46 | 0.7755 | 0.7600 | 0.7677 | 80.30 | 0.6618 | 0.7377 | 0.6977 |
V Conclusion and scope of future work
This paper discusses the challenges that label unavailability and labeling variability pose for cervical image analysis and presents a novel direction to address them. Experimental results show that self-supervised learning is an efficient and effective way to deal with both label scarcity and labeling variability. In addition, the experiments on Federated Self-Supervised Learning shed light on how to deal with data-sharing restrictions. To the best of our knowledge, this is the first attempt to develop a cervix model as a domain-specific pre-trained model for task-specific fine-tuning.
The immediate future scope of this work is the development of an improved cervix model by uniting larger datasets from different sources, independent of labeling criteria, label availability, and imaging devices. The cervical image datasets for which labeling efforts by researchers at NCI are in progress will be used for this research. The engineering implementation of federated learning for a real-world scenario is another important direction for future work. The presented idea can also be employed in other medical image analysis tasks to utilize unlabeled data and to enable same-domain transfer learning.
VI Acknowledgement
We are very much grateful to Dr. Mark Schiffman of the Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, and his team for providing us the images and labels used in this paper.
References
- [1] (2004) Description of a seven-year prospective study of human papillomavirus infection and cervical neoplasia among 10000 women in Guanacaste, Costa Rica. Pan American Journal of Public Health 2 (15), pp. 75–89.
- [2] (2020) A simple framework for contrastive learning of visual representations. In Proceedings of the 37th International Conference on Machine Learning, H. D. III and A. Singh (Eds.), Proceedings of Machine Learning Research, Vol. 119, Virtual, pp. 1597–1607.
- [3] (2018) Opportunities and obstacles for deep learning in biology and medicine. Journal of the Royal Society, Interface.
- [4] (2015) Keras. https://keras.io
- [5] (2018) Automated methods for the decision support of cervical cancer screening using digital colposcopies. IEEE Access 6, pp. 33910–33927.
- [6] (2019) An observational study of deep learning and automated evaluation of cervical images for cancer screening. JNCI: Journal of the National Cancer Institute 111 (9), pp. 923–932. ISSN 0027-8874.
- [7] (2013) A data driven approach to cervigram image analysis and classification. In Color Medical Image Analysis, pp. 1–13. ISBN 978-94-007-5389-1.
- [8] (2017) A survey on deep learning in medical image analysis. Medical Image Analysis 42, pp. 60–88. ISSN 1361-8415.
- [9] (2000) ASCUS-LSIL triage study: design, methods and characteristics of trial participants. Acta Cytol 44 (5), pp. 726–742.
- [10] (2006) Classification of cervix lesions using filter bank-based texture mode. In 19th IEEE Symposium on Computer-Based Medical Systems (CBMS'06), pp. 832–840.
- [11] (2020) Federated learning with class imbalance reduction. arXiv:2011.11266.
- [12] (2019).
- [13] (2020) Transfer learning. Cambridge University Press.
- [14] (2021) Models genesis. Medical Image Analysis 67, pp. 101840. ISSN 1361-8415.