1 Introduction
Deep convolutional neural networks have become the state-of-the-art approach to tackle segmentation problems in medicine Ronneberger et al. (2015); Sirinukunwattana and others (2017). However, there are several challenges that hinder the training and deployment of deep learning models in this context. First of all, a considerable amount of annotated images is needed to train a deep model, and annotating datasets for image segmentation is a tedious and time-consuming task that requires expert knowledge Litjens and others (2017). Moreover, there is an important generalisation challenge when using trained models that is known as domain shift (also known as distribution shift) Arvidsson and others (2018); Choudary and others (2020). This problem arises when the data distribution of the dataset used for training a model is not the same as the distribution of the data that the model encounters when deployed. This is common in biomedical datasets since images greatly vary due to the experimental conditions, the equipment (for instance, microscopes), and the settings (for instance, focus and magnification) employed for capturing those images. This generalisation problem can be tackled by combining datasets from multiple sources Irvin et al. (2019) or using techniques like data augmentation Simard et al. (2003)
; nevertheless, it is not possible to foresee every new and unknown distribution. A different approach consists in applying transfer learning Razavian et al. (2014), a technique that, instead of training a model from scratch, reuses a model pre-trained on a source dataset to train a new model on a target dataset. However, this requires annotating the target dataset, a time-consuming task that must be carried out for every new dataset. A different approach to handle the domain shift problem is image-to-image translation Isola and others (2017), a set of techniques that aim to learn the mapping between an input image and an output image using a training set of aligned image pairs; however, this requires paired data from the source and target domains, a challenge that can be faced by using unpaired image-to-image translation Zhu and others (2017). Unpaired image-to-image translation methods translate an image from a domain X to a domain Y, and vice versa, in the absence of paired examples. This approach has already been employed in several medical segmentation tasks; for instance, the segmentation of the left ventricle in magnetic resonance images Yan and others (2019), the segmentation of digitally reconstructed radiographs Zhang and others (2018), and the segmentation of magnetic resonance imaging (MRI), abdominal CT and MRI, and mammography X-rays Jinzheng and others (2019). All these works are based on variants of CycleGAN Zhu and others (2017)
, an unpaired image-to-image translation method based on Generative Adversarial Networks (GANs) that requires two datasets: one of them contains images from the distribution employed for training the segmentation model, and the other contains images acquired in a different setting. This approach poses two challenges. First, both datasets must be available, and this might be an issue due to privacy concerns
Adler-Milstein and Jha (2012); and, second, CycleGAN variants must be trained, a process that demands the usage of GPUs and might be challenging for many users due to the difficulties of training GAN models Salimans and others (2016). The approach proposed in this paper to tackle these drawbacks consists in using style transfer methods Gatys et al. (2016); that is, techniques that render the content of an image using the style of another. These techniques do not require a training process, and it suffices to release a single image from the dataset employed for training the model that suffers from the domain shift problem. In this work, we have studied both unpaired image-to-image translation methods and style transfer techniques to deal with the domain shift problem. As a running example, we have focused on the problem of segmenting tumour spheroids Nath and Devi (2016). In this context, we have observed (see Section 2) that models that achieve a mean IoU over 97% when evaluated with data following the same distribution as the training set fail when they are employed with data following a different distribution (the IoU is, in some cases, under 15%). We have faced this domain shift problem by using both unpaired image-to-image translation methods and style transfer techniques. Namely, the contributions of our work are:
- We explore several state-of-the-art style transfer and unpaired image-to-image translation methods to tackle the domain shift problem in the context of tumour spheroid segmentation.
- We demonstrate the effectiveness of using both style transfer and unpaired image-to-image translation methods to improve the performance of a variety of advanced deep segmentation networks.
- We show that style transfer methods achieve performance similar to that of unpaired image-to-image translation methods, with the advantage of skipping the training step.
- We provide an API to apply the studied methods not only in the context of spheroid segmentation but also, in general, for medical imaging tasks. The API is available at https://github.com/ManuGar/ImageStyleTransfer.
2 Materials
Spheroids are the most widely used 3D models to study cancer since they can be used for studying the effects of different micro-environmental characteristics on tumour behaviour and for testing different preclinical and clinical treatments Nath and Devi (2016). The images of tumour spheroids vary greatly depending on the experimental conditions, and also on the equipment (microscopes) and settings (focus and magnification) employed to capture the images Lacalle and others (2021).
| Dataset | Images | Image size | Microscope | Magnification | Format | Type |
|---|---|---|---|---|---|---|
| BL5S | 50 | 1296 × 966 | Leica | 5x | TIFF | RGB |
| BN2S | 154 | 1002 × 1004 | Nikon | 2x | ND2 | Gray 16 bits |
| BN10S | 105 | 1002 × 1004 | Nikon | 10x | ND2 | Gray 16 bits |
| BO10S | 64 | 3136 × 2152 | Olympus | 10x | JPG | RGB |
For our experiments, we have employed the 4 datasets presented in Lacalle and others (2021); a description of those datasets is provided in Table 1, and an image of each dataset is shown in Figure 1. As can be noticed from Table 1 and Figure 1, there are considerable differences among the images of each dataset. Three of those datasets (the BL5S, BN2S, and BN10S datasets) were employed for training 4 segmentation models (using the algorithms DeepLab v3 Chen et al. (2018), HRNet Seg Wang et al. (2020), U-Net Ronneberger et al. (2015), and U²-Net Qin et al. (2020)), and the last dataset (the BO10S dataset) was employed for testing. We have used this dataset split because the last dataset comes from a different laboratory, so its style is not the same as that of the others. The definition of those 4 architectures is available in the SemTorch package (available at https://github.com/WaterKnight1998/SemTorch). All the architectures were trained with the libraries PyTorch Paszke et al. (2019) and FastAI Howard and Gugger (2020) using a GPU Nvidia RTX 2080 Ti. In order to set the learning rate for the different architectures, we employed the procedure presented in Howard and Gugger (2020), and we applied early stopping when training all the architectures to avoid overfitting. The metric employed to measure the accuracy of the different methods is the IoU, also known as the Jaccard index: this metric measures the area of the intersection between the ground truth and the predicted region divided by the area of their union.
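For clarity, the IoU computation can be sketched as follows; this is a minimal NumPy illustration on binary masks, not the exact implementation used by the training libraries.

```python
import numpy as np

def iou(ground_truth: np.ndarray, prediction: np.ndarray) -> float:
    """Jaccard index (IoU) between two binary masks of the same shape."""
    gt = ground_truth.astype(bool)
    pred = prediction.astype(bool)
    intersection = np.logical_and(gt, pred).sum()
    union = np.logical_or(gt, pred).sum()
    # Two empty masks are considered a perfect match by convention.
    return float(intersection) / float(union) if union > 0 else 1.0

# Toy example: the masks share 1 pixel out of 2 marked pixels, so IoU = 0.5.
print(iou(np.array([[1, 0], [0, 0]]), np.array([[1, 1], [0, 0]])))
```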
When the models were evaluated using a test set formed by images following the same distribution as the training set, the 4 models achieved a performance over 97%, see Table 2. On the contrary, when those models were employed with images captured under different conditions (namely, using the BO10S dataset), the performance of the models decreased by up to 84%. In the next sections, we explore how style transfer methods and unpaired image-to-image translation models can serve to deal with the domain shift problem in this context.
| | DeepLab v3 | HRNet-Seg | U-Net | U²-Net |
|---|---|---|---|---|
| BL5S-BN2S-BN10S | 97.00 | 97.32 | 97.25 | 97.26 |
| BO10S | 83.61 | 92.65 | 13.64 | 95.65 |
3 Style transfer
This section is devoted to presenting how style transfer methods can handle the domain shift problem. In addition, we introduce the API that we have developed to facilitate the use of those methods. Finally, we present the results obtained by the spheroid segmentation models when applied to images transformed using style transfer techniques.
We start by explaining the procedure to apply style transfer methods to deal with the domain shift problem of a model; such a procedure is summarised in Figure 2. We assume that a model has been trained using a source dataset of images, and we are interested in applying such a model to obtain the prediction associated with an image from a distribution different from that of the source dataset; we call this image the target image. Instead of feeding the target image directly to the model, we first take an image from the source dataset and transfer its style to the target image while preserving the target image's content, producing a transformed image. Finally, the transformed image is fed to the model to obtain the associated prediction.
The key component of the aforementioned process is the algorithm that transfers the style from the source dataset while keeping the content of the target image. In the literature, there are several style transfer algorithms Liu and others (2019); but, for our experiments, we have focused on three of them: neural style transfer (NST) Gatys et al. (2016), an optimisation technique that uses a Convolutional Neural Network (CNN) to decompose the content and style from images; deep image analogy Liao and others (2017), a method that finds semantically-meaningful correspondences between two input images by adapting the notion of image analogy with features extracted from a CNN; and STROTSS Kolkin et al. (2019), a variant of the NST algorithm that changes the optimisation objective of NST.
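To illustrate how NST decomposes content and style, the sketch below (a rough PyTorch illustration, with layer indices, weights, and preprocessing chosen for brevity rather than taken from Gatys et al.'s exact setup) computes the content loss on deep VGG-19 features and the style loss on their Gram matrices; the stylised image is obtained by minimising this loss with respect to its pixels.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

# Frozen pre-trained feature extractor; inputs are (1, 3, H, W) ImageNet-normalised tensors.
vgg = vgg19(pretrained=True).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def features(img, layers=(3, 8, 17, 26)):   # illustrative VGG-19 layer indices
    feats, x = [], img
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in layers:
            feats.append(x)
    return feats

def gram(f):
    # Style is captured by channel-wise feature correlations (Gram matrices).
    b, c, h, w = f.shape
    f = f.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def nst_loss(generated, content_img, style_img, style_weight=1e5):
    gen_f, con_f, sty_f = features(generated), features(content_img), features(style_img)
    content_loss = F.mse_loss(gen_f[-1], con_f[-1])        # match deep features (content)
    style_loss = sum(F.mse_loss(gram(g), gram(s))          # match Gram matrices (style)
                     for g, s in zip(gen_f, sty_f))
    return content_loss + style_weight * style_loss
```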
It is worth noting that the style transfer approach presented here can be applied to deal with the domain shift problem not only for segmentation problems, as in our work, but also to other computer vision tasks. Hence, these methods can be helpful for a great variety of problems. However, it might be difficult to apply these techniques since they are implemented in different libraries and using different frameworks, and each of them has its own particularities. In this work, we have addressed this drawback by developing a high-level Python API that allows the integration of style transfer algorithms independently of their underlying library and framework. The API currently includes the aforementioned methods (the project webpage provides information about the library that implements each method) and can be easily extended with new techniques. In order to apply the previously introduced procedure using our API, users only have to provide the style image, the target image, and the name of the algorithm to apply; the rest of the transformation process is automatically conducted by the API.
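A possible invocation is sketched below; the import path, function name, and arguments are hypothetical and only illustrate the kind of call the API expects (the exact interface is documented on the project webpage).

```python
# Hypothetical usage sketch; the actual module and function names may differ
# from those in the released package at https://github.com/ManuGar/ImageStyleTransfer.
from imagestyletransfer import apply_style_transfer  # hypothetical import

apply_style_transfer(
    style_image="source_dataset/style.png",    # an image from the dataset used to train the model
    target_image="new_images/target.png",      # the image we want to segment
    algorithm="NST",                           # or "STROTSS", "DeepImageAnalogy"
    output_path="new_images/target_transformed.png",
)
```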
In our running example of segmenting tumour spheroids, and using our API, we randomly picked an image from the combination of the datasets BL5S, BN2S, and BN10S, and used it to transform the images from the BO10S dataset. Subsequently, we fed those images to the segmentation models presented in Section 2 and evaluated their performance, see Table 3. Of the three studied style transfer algorithms, both the NST and STROTSS algorithms handle the domain shift problem, whereas the images transformed with the deep image analogy algorithm produce even worse results than the original images from the BO10S dataset. Using the NST algorithm, all the segmentation models improve their IoU (the U-Net model improves its performance from 13.64% to 89.21%, and the other models have an IoU close to 95%). For the STROTSS algorithm, the results are also positive: two of the segmentation models improve (DeepLab v3 and U-Net), and the other two achieve worse results, but their IoU is still over 92%. In the next section, we extend this study with unpaired image-to-image translation methods.
| | DeepLab v3 | HRNet-Seg | U-Net | U²-Net |
|---|---|---|---|---|
| Base | 83.61 | 92.65 | 13.64 | 95.65 |
| NST | 95.64 | 94.91 | 89.21 | 95.89 |
| Deep Image Analogy | 0.00 | 45.13 | 0.66 | 0.84 |
| STROTSS | 94.86 | 92.38 | 78.08 | 94.14 |
4 Unpaired image-to-image translation
We focus now on the procedure to apply unpaired image-to-image translation methods to tackle the domain shift problem, see Figure 3. Analogously to the style transfer approach, we assume that a model has been trained using a source dataset of images, and we have a dataset of images with a different data distribution called the target dataset. From the source and target datasets, we build a model that transforms images following the data distribution of the source dataset to the data distribution of the target dataset, and vice versa. Now, when we are interested in obtaining the prediction associated with a target image, we first employ the transformation model to transform the image; and, subsequently, the transformed image is fed to the prediction model.
In this approach, the key component is the algorithm employed to construct the transformation model. Currently, the most successful approaches for this task are based on Generative Adversarial Networks (GANs) Goodfellow and others (2014) and are variants of the CycleGAN algorithm Zhu and others (2017), which translates an image from a source domain X to a target domain Y by learning two mappings G : X → Y and F : Y → X that satisfy the cycle-consistency properties; that is, F(G(x)) ≈ x and G(F(y)) ≈ y. For our experiments, we have studied 6 algorithms: CycleGAN Zhu and others (2017), DualGAN Yi and others (2017), ForkGAN Zheng and others (2020), GANILLA Hicsonmez and others (2020), CUT Park et al. (2020), and FASTCUT Park et al. (2020).
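The cycle-consistency term can be written compactly as in the sketch below; it assumes two generator networks G and F (any image-to-image architectures) and uses the L1 reconstruction penalty of CycleGAN, leaving out the adversarial losses and the training loop for brevity.

```python
import torch
import torch.nn as nn

l1 = nn.L1Loss()

def cycle_consistency_loss(G, F, x, y, lam=10.0):
    """L1 cycle consistency: F(G(x)) should reconstruct x, and G(F(y)) should reconstruct y.

    G and F are generators mapping X -> Y and Y -> X respectively; x and y are
    batches of images from the two domains. The adversarial losses that CycleGAN
    also optimises are omitted here.
    """
    return lam * (l1(F(G(x)), x) + l1(G(F(y)), y))
```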

As we have previously mentioned for the style transfer methods, the aforementioned procedure for unpaired image-to-image translation can be applied to several kinds of bioimaging problems Palladino et al. (2020). Hence, we have extended the API presented in the previous section to provide access to unpaired image-to-image translation techniques. In this case, users of the API only have to provide the path to the source and target datasets, and the name of the translation algorithm to apply (the 6 algorithms previously mentioned are available in the API, and new methods can be easily included); after that, the transformation model is automatically trained and the images from the target dataset are transformed.
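As before, the call below is only an illustrative sketch; the import path and function name are hypothetical and may differ from the released package.

```python
# Hypothetical usage sketch of the unpaired image-to-image translation part of the API;
# the actual module and function names may differ from the released package.
from imagestyletransfer import train_and_translate  # hypothetical import

train_and_translate(
    source_dir="datasets/source/",        # images following the training distribution
    target_dir="datasets/target/",        # images to be transformed
    algorithm="CycleGAN",                 # or "DualGAN", "ForkGAN", "GANILLA", "CUT", "FASTCUT"
    output_dir="datasets/target_translated/",
)
```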
In our running example, the source dataset was formed by the combination of the datasets BL5S, BN2S, and BN10S, and the target dataset was the BO10S dataset. For training the translation models, we employed a GPU Nvidia RTX 2080 Ti, and the results obtained by the segmentation models when they were fed with the transformed images are summarised in Table 4. Of the 6 studied algorithms, only the CycleGAN method solves the domain shift problem in our context. Using the transformation model produced by this algorithm, all the segmentation models improve their IoU (the U-Net model improves its performance from 13.64% to 72.34%, and the other models have an IoU over 92%). On the contrary, the images produced by the rest of the transformation models are more difficult to segment, and the performance of the segmentation models decreases; the exception is the U-Net model, which, in some cases, obtains better results with the transformed images, but its IoU is still under 40%.
| | DeepLab v3 | HRNet-Seg | U-Net | U²-Net |
|---|---|---|---|---|
| Base | 83.61 | 92.65 | 13.64 | 95.65 |
| CycleGAN | 94.97 | 92.97 | 72.34 | 95.87 |
| DualGAN | 4.09 | 73.37 | 24.67 | 34.45 |
| ForkGAN | 32.63 | 46.10 | 38.33 | 44.46 |
| GANILLA | 24.27 | 76.24 | 3.26 | 82.97 |
| CUT | 0.48 | 38.01 | 20.94 | 52.20 |
| FastCUT | 6.08 | 79.52 | 1.12 | 2.98 |
5 Discussion
In the previous sections, we have demonstrated that both style transfer and unpaired image-to-image translation algorithms can be applied to deal with the domain shift problem in the context of segmenting tumour spheroids. However, there are only three successful methods: NST, STROTSS, and CycleGAN. Two of those methods (NST and STROTSS) are style transfer algorithms; this is a relevant result since these methods do not require the training step of image-to-image translation models, can be run on a computer without special-purpose hardware like GPUs, and only require the availability of an image from the source dataset. The most likely reason for the failure of most unpaired image-to-image translation methods is the challenge of training their underlying GAN models Salimans and others (2016); therefore, more research is necessary to facilitate the use of this kind of models.
[Figure 4: an image from the BO10S dataset (Base) and its transformations produced by NST, Deep Image Analogy, STROTSS, CycleGAN, DualGAN, ForkGAN, GANILLA, CUT, and FastCUT, together with the segmentations produced by the DeepLab v3, HRNet Seg, U-Net, and U²-Net models for each image.]
We can visually inspect the images produced by the different transformation algorithms to discover the difficulties faced by the segmentation models, see Figure 4. We can notice that the three successful methods (NST, STROTSS, and CycleGAN) produce images that preserve the content of the image but with a style similar to that of the images used for training the segmentation models. On the contrary, the deep image analogy method and the DualGAN and ForkGAN models do not keep the content of the image, and thus the segmentation models are not able to properly segment the images. For the rest of the image-to-image translation models (GANILLA, CUT, and FastCUT), the content of the image is kept, but the style is not properly transferred (colour artefacts are added to the transformed image); hence, the images are not segmented properly by the models.
From Figure 4, we can also appreciate the sensitivity of the segmentation models to variations in the input image. The HRNet Seg and U²-Net models are more robust than the DeepLab v3 and U-Net models; recall that all the models achieved an IoU over 97% when evaluated on data from the distribution of the training set. Hence, the style transfer and image-to-image translation methods can be employed not only to deal with the domain shift problem of computer vision models, but also to evaluate the robustness of such models.
6 Conclusions
In this paper, we have studied the benefits of applying style transfer techniques and unpaired image-to-image translation methods to deal with the domain shift problem in the context of tumour spheroid segmentation. The results show that, using those translation methods, it is possible to recover the performance of a model that suffers from the domain shift problem. In addition, we have shown that style transfer methods achieve results similar to those obtained by image-to-image translation methods with the advantage of not requiring a training step: they can be deployed by providing a single image from the source dataset. We have also noticed that style transfer techniques and image-to-image translation methods have a different impact on the performance of the segmentation models; hence, it is important to have a simple approach to test different algorithms. This has been addressed in this work with the development of a high-level API that facilitates the process of testing different alternatives for style transfer and unpaired image-to-image translation.
Competing interests
The authors declare that they have no competing interests.
Funding
This work was partially supported by Ministerio de Ciencia e Innovación [PID2020-115225RB-I00 / AEI / 10.13039/501100011033]. Manuel García-Domínguez has an FPI grant from the Community of La Rioja 2018.
References
- Sharing clinical data electronically: a critical challenge for fixing the health care system. JAMA 307 (16), pp. 1695–1696. External Links: Document Cited by: §1.
- Generalization of prostate cancer classification for multiple sites using deep learning. In IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), pp. 191–194. External Links: Document Cited by: §1.
- Encoder-decoder with atrous separable convolution for semantic image segmentation. In Computer Vision – ECCV 2018, pp. 833–851. External Links: Document Cited by: §2.
- Advancing medical imaging informatics by deep learning-based domain adaptation. Year book of medical informatics 29 (1), pp. 129–138. External Links: Document Cited by: §1.
- A neural algorithm of artistic style. Journal of Vision 16, pp. 326. External Links: Document Cited by: §1, §3.
- Generative adversarial networks. In 28th International Conference on Neural Information Processing Systems (NIPS’14), pp. 2672–2680. Cited by: §4.
- GANILLA: generative adversarial networks for image to illustration translation. Image and Vision Computing 95, pp. 103886. External Links: Document Cited by: §4.
- Deep learning for coders with fastai and pytorch. O’Reilly. Cited by: §2.
- CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. In Thirty-Third AAAI Conference on Artificial Intelligence, AAAI'19, Vol. 33, pp. 590–597. Cited by: §1.
- Image-to-image translation with conditional adversarial networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5967–5976. External Links: Document Cited by: §1.
- Towards cross-modal organ translation and segmentation: a cycle- and shape-consistent generative adversarial network. Medical Image Analysis 52, pp. 174–184. External Links: ISSN 1361-8415, Document Cited by: §1.
- Style transfer by relaxed optimal transport and self-similarity. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10043–10052. External Links: Document Cited by: §3.
- SpheroidJ: an open-source set of tools for spheroid segmentation. Computer Methods and Programs in Biomedicine. External Links: Document Cited by: §2.
- Visual attribute transfer through deep image analogy. ACM Transactions on Graphics 36 (4), pp. 120:1–120:15. External Links: Document Cited by: §3.
- A survey on deep learning in medical image analysis. Medical Image Analysis 42, pp. 60–88. External Links: Document Cited by: §1.
- Advanced deep learning techniques for image style transfer: a survey. Signal Processing: Image Communication 78, pp. 465–470. External Links: Document Cited by: §3.
- Three-dimensional culture systems in cancer research: Focus on tumor spheroid model. Pharmacology & Therapeutics 163, pp. 94–108. External Links: Document Cited by: §1, §2.
- Unsupervised domain adaptation via CycleGAN for white matter hyperintensity segmentation in multicenter MR images. In 16th International Symposium on Medical Information Processing and Analysis, J. Brieva, N. Lepore, M. G. Linguraru, and E. R. C. M.D. (Eds.), Vol. 11583, pp. 1 – 10. External Links: Document, Link Cited by: §4.
- Contrastive learning for unpaired image-to-image translation. In European Conference on Computer Vision, Cited by: §4.
- PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pp. 8024–8035. Cited by: §2.
- U²-Net: going deeper with nested U-structure for salient object detection. Pattern Recognition 106, pp. 107404. External Links: Document Cited by: §2.
- CNN features off-the-shelf: An astounding baseline for recognition. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW’14, pp. 512–519. Cited by: §1.
- U-net: convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Lecture Notes in Computer Science, Vol. 9351, pp. 234–241. External Links: Document Cited by: §1, §2.
- Improved techniques for training GANs. In 30th International Conference on Neural Information Processing Systems, NIPS'16, pp. 2234–2242. Cited by: §1, §5.
- Best practices for convolutional neural networks applied to visual document analysis. In Proceedings of the 12th International Conference on Document Analysis and Recognition (ICDAR’03), Vol. 2, pp. 958–964. Cited by: §1.
- Gland segmentation in colon histology images: the glas challenge contest. Medical Image Analysis 35, pp. 489–502. External Links: Document Cited by: §1.
- Deep high-resolution representation learning for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence. External Links: Document Cited by: §2.
- The domain shift problem of medical image segmentation and vendor-adaptation by unet-gan. In Medical Image Computing and Computer Assisted Intervention (MICCAI 2019), pp. 623–631. External Links: Document Cited by: §1.
- DualGAN: unsupervised dual learning for image-to-image translation. In 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2868–2876. External Links: Document Cited by: §4.
- Task driven generative modeling for unsupervised domain adaptation: application to x-ray image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 599–607. Cited by: §1.
- ForkGAN: seeing into the rainy night. In 16th European Conference on Computer Vision, pp. 1–16. Cited by: §4.
- Unpaired image-to-image translation using cycle-consistent adversarial networks. In IEEE International Conference on Computer Vision (ICCV 2017), pp. 2242–2251. External Links: Document Cited by: §1, §1, §4.