
SinGAN-Seg: Synthetic Training Data Generation for Medical Image Segmentation

by   Vajira Thambawita, et al.

Processing medical data to find abnormalities is a time-consuming and costly task, requiring tremendous effort from medical experts. Therefore, artificial intelligence (AI) has become a popular tool for the automatic processing of medical data, acting as a supportive tool for doctors. AI tools depend heavily on data for training the models. However, there are several constraints on access to large amounts of medical data to train machine learning algorithms in the medical domain, e.g., due to privacy concerns and the costly, time-consuming medical data annotation process. To address this, in this paper we present a novel synthetic data generation pipeline, called SinGAN-Seg, to produce synthetic medical images with corresponding annotated ground truth masks. We show that this synthetic data generation pipeline can be used as an alternative that bypasses privacy concerns and as a way to produce artificial segmentation datasets with corresponding ground truth masks, avoiding the tedious medical data annotation process. As a proof of concept, we used an open polyp segmentation dataset. By training UNet++ using both the real polyp segmentation dataset and the corresponding synthetic dataset generated by the SinGAN-Seg pipeline, we show that the synthetic data can achieve performance very close to that of the real data when the real segmentation datasets are large enough. In addition, we show that synthetic data generated by the SinGAN-Seg pipeline improve the performance of segmentation algorithms when the training dataset is very small. Since our SinGAN-Seg pipeline is applicable to any medical dataset, it can be used with any other segmentation dataset.



1 Introduction

AI has become a popular tool in medicine and has been widely discussed in recent decades as a means to augment the performance of clinicians [15, 6, 20, 14]. According to the statistics discussed by Jiang et al. [15], artificial neural networks (ANNs) [18] and support vector machines (SVMs) [11] are the most popular machine learning (ML) algorithms used with medical data. These ML models learn from data; thus, the medical data have a direct influence on the success of ML solutions in real applications. While SVM algorithms are popular for regression [10, 28] and classification [41] tasks, ANNs, or deep neural networks (DNNs), are used widely for all four task types: regression, classification, detection, and segmentation.

A segmentation model makes more advanced predictions than regression, classification, and detection models, as it performs pixel-wise classification of the input images. Therefore, medical image segmentation is a popular application of AI in medicine, and it is used widely with different kinds of medical image data [22, 29, 17]. Polyp segmentation is one of the most popular segmentation tasks; it uses ML techniques to detect and segment polyps in images and videos collected from gastrointestinal (GI) screenings. Early identification of polyps in the GI tract is critical to prevent colorectal cancers [37]. Therefore, many ML models have been investigated to automatically segment polyps in GI tract videos recorded during endoscopy [32, 12, 31] or PillCam examinations [23, 13, 8], augmenting the performance of doctors by detecting polyps missed by experts, thereby both decreasing miss rates and reducing inter-observer variation.

Most polyp segmentation models are based on CNNs and are trained using publicly available polyp segmentation datasets [3, 2, 27, 30, 25]. However, these datasets have a limited number of images with corresponding expert-annotated masks. For example, the CVC-VideoClinicDB [2] dataset has images from polyp videos and non-polyp videos, the PICCOLO dataset [25] has manually annotated images (white-light images and narrow-band images), and the Hyper-Kvasir [3] dataset has only a limited set of segmented images, although it also contains unlabeled images.

We identified two main reasons for having small datasets in the medical domain compared to other domains. The first is the privacy concerns attached to medical data, and the second is the costly and time-consuming medical data annotation process that medical domain experts must perform.

Privacy concerns vary from country to country and region to region according to the data protection regulations introduced in each area. For example, Norway must follow the rules given by the Norwegian Data Protection Authority (NDPA) [34] and enforce the Personal Data Act [35], in addition to following the GDPR [38] guidelines that apply to all European countries. While there is no central privacy protection regulation in the US like the GDPR in Europe, US rules and regulations are enforced through other privacy laws, such as the Health Insurance Portability and Accountability Act (HIPAA) [7] and the California Consumer Privacy Act (CCPA) [5]. Asian countries follow their own sets of rules, such as Japan's Act on the Protection of Personal Information [1], the South Korean Personal Information Protection Commission [21], and the Personal Data Protection Bill in India [36].

When research is performed under such privacy restrictions, the published papers often present theoretical methods only. According to the analysis of medical image segmentation studies in [24], a considerable share of them have used private datasets. As a result, the studies are not reproducible; researchers must keep datasets private due to medical data sharing restrictions. Furthermore, universities and research institutes that use medical domain data for teaching purposes use the same medical datasets for years, which affects the quality of education. In addition to the privacy concerns, the costly and time-consuming medical data labeling and annotation process [39] is an obstacle to producing big datasets for AI algorithms. Compared to other, already time-consuming, medical data labeling processes, pixel-wise data annotation is far more demanding of the valuable medical experts' time. Only experts in the medical domain can produce annotations that are fully trustworthy in terms of correctness. If annotation by experts is not possible, experts should at least review the annotations for correctness before they are used in AI algorithms. The importance of having accurate expert annotations for medical data is, for example, discussed by Yu et al. [40] using a mandible segmentation dataset of CT images. In this regard, researching a way to produce synthetic segmentation datasets is important to overcome the time-consuming and costly medical data annotation process. Therefore, researching an alternative way of sharing medical data, bypassing both the privacy and the time-consuming dataset generation challenges, is the main objective of this study.

In this regard, the contributions of this paper are as follows.

  • This study introduces the novel SinGAN-Seg pipeline to generate synthetic medical images and their corresponding segmentation masks using a modified version of the state-of-the-art SinGAN architecture, followed by a fine-tuning step using a style-transfer method. While we use polyp segmentation as a case study, SinGAN-Seg can be applied to all types of segmentation tasks.

  • We have published the biggest synthetic polyp dataset and the corresponding masks. Moreover, we have published our generators as a Python package on PyPI to generate an unlimited number of polyp and corresponding mask images as needed. To the best of our knowledge, this is the first publicly available synthetic polyp dataset and the first set of corresponding generative functions released as a PyPI package.

  • We show that synthetic images and corresponding mask images can improve the segmentation performance when the size of a training dataset is limited.

2 Method

As depicted in Figure 1, the SinGAN-Seg pipeline consists of two main steps: (1) training the novel SinGAN-Seg generative models and (2) style transferring. The first step generates synthetic polyp images and corresponding binary segmentation masks representing the polyp area. The novel four-channel SinGAN-Seg architecture, based on the vanilla SinGAN architecture [26], is introduced in this first step, together with its training process. Using a single SinGAN-Seg model, we can generate multiple synthetic images and masks from a single real image and its corresponding mask; this generation process can therefore be identified as one-to-many generation, where a single trained model produces an arbitrary number of samples. The second step focuses on transferring styles, such as features of the polyps' texture, from the real images into the corresponding generated synthetic images. This second step is depicted as Step 2 in Figure 1.

Figure 1: The complete pipeline of SinGAN-Seg to generate synthetic segmentation datasets. Step 1 represents the training of four-channel SinGAN models. Step 2 represents the fine-tuning step using neural style transfer [9]. Four-channel SinGAN: a single training step of our four-channel SinGAN. Note the stacked input and output compared to the original SinGAN implementation [26], which inputs only a single image with a noise vector and outputs only an image. In our SinGAN implementation, all the generators except the coarsest-scale one get a four-channel image (a polyp image and a ground truth mask) as input in addition to the input noise vector; the coarsest-scale generator gets only the noise vector as input. The discriminators also get four-channel images, consisting of an RGB polyp image and a binary mask, as input. The inputs to the discriminators can be either real or fake.

SinGAN-Seg is a modified version of SinGAN [26], which was designed to generate synthetic data from a GAN trained using only a single image. The original SinGAN is trained using different scales of the same input image, the so-called image pyramid. This image pyramid is a set of images of different resolutions of a single image, from low resolution to high resolution. SinGAN consists of a GAN pyramid, which takes the corresponding image pyramid as input. In this study, we build on the implementation and the training process used in SinGAN, except for the number of input and output channels. The original SinGAN implementation [26] uses a three-channel RGB image as the input and produces a three-channel RGB image as the output. In contrast, our SinGAN-Seg uses four-channel images as the input and the output. The four-channel image consists of the input RGB image and the single-channel ground truth mask, stacked together as depicted in the SinGAN-Seg model in Figure 1. The main purpose of this modification is to generate four-channel synthetic output, which consists of a synthetic image and the corresponding ground truth mask.
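The four-channel stacking described above can be sketched as follows (a minimal NumPy illustration; the function and array names are ours, not from the paper's code):

```python
import numpy as np

def stack_image_and_mask(rgb_image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Stack an RGB image (H, W, 3) and a binary mask (H, W) into a
    single four-channel array (H, W, 4), as in SinGAN-Seg's input/output."""
    assert rgb_image.ndim == 3 and rgb_image.shape[2] == 3
    assert mask.shape == rgb_image.shape[:2]
    return np.concatenate([rgb_image, mask[..., np.newaxis]], axis=2)

# Example: a 4x4 dummy polyp image with an all-polyp mask
image = np.zeros((4, 4, 3), dtype=np.float32)
mask = np.ones((4, 4), dtype=np.float32)
four_channel = stack_image_and_mask(image, mask)
print(four_channel.shape)  # (4, 4, 4)
```

Generating a synthetic sample then amounts to splitting the model's four-channel output back into its first three channels (the image) and last channel (the mask).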

In the second step of the SinGAN-Seg pipeline, we fine-tune the output of the four-channel SinGAN-Seg model using the style-transfer method introduced by Gatys et al. [9]. This step aims to improve the quality of the generated synthetic data by transferring realistic styles from real images to synthetic images. As depicted in Step 2 in Figure 1, every generated image is enhanced by transferring style from the corresponding real image, and the style-transferred output images form the final synthetic samples for the images in the training dataset. In this process, a suitable content-to-style weight ratio must be found; it is a hyper-parameter of this second stage. Note that this step is trained separately from the SinGAN-Seg generative models. It is therefore optional, but we strongly recommend this style-transferring step to enhance the quality of the output data from the first step.
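The content-to-style weighting can be illustrated with a simplified loss computation (a sketch only: the actual method [9] evaluates these losses over VGG feature maps during optimization, which we replace here with arbitrary arrays; the function names and weights are our own):

```python
import numpy as np

def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Gram matrix of a feature map flattened to (channels, height*width)."""
    return features @ features.T

def style_transfer_loss(gen, content, style, content_weight=1.0, style_weight=1000.0):
    """Weighted sum of content loss (feature MSE) and style loss (Gram MSE).
    The content_weight:style_weight ratio (e.g. 1:1000) is the
    hyper-parameter tuned in Step 2 of the pipeline."""
    content_loss = np.mean((gen - content) ** 2)
    style_loss = np.mean((gram_matrix(gen) - gram_matrix(style)) ** 2)
    return content_weight * content_loss + style_weight * style_loss

rng = np.random.default_rng(0)
gen = rng.standard_normal((8, 16))      # "generated" feature map being optimized
content = rng.standard_normal((8, 16))  # features of the synthetic (content) image
style = rng.standard_normal((8, 16))    # features of the real (style) image
loss = style_transfer_loss(gen, content, style, 1.0, 1000.0)
```

A larger style weight pulls the optimized image toward the texture statistics of the real image while a larger content weight preserves the synthetic image's layout.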

3 Experiments and results

This section presents all the experiments and results collected using a polyp dataset as a case study. For all the experiments discussed in the following sections, we have used the PyTorch deep learning framework.


3.1 Data

We have used the polyp dataset published with the HyperKvasir dataset [3], which consists of polyp findings extracted from endoscopy examinations. This polyp dataset contains polyp findings with corresponding segmentation masks annotated by experts. We use only the polyp dataset as a case study because of the time- and resource-consuming training process of the SinGAN-Seg pipeline. Furthermore, we use three-fold cross-validation, which is another time-consuming technique, for the experiments performed to assess the validity of using synthetic data instead of real data.

A few sample images and the corresponding masks of the polyp dataset of HyperKvasir are depicted in Figure 2. The polyp images of the dataset are RGB images. The masks are single-channel images with white for true pixels, which represent polyp regions, and black for false pixels, which represent clean colon or background regions. The dataset contains polyps of different sizes. The distribution of polyp sizes, as a percentage of the full image size, is presented in the histogram in Figure 3; according to this plot, the dataset contains relatively more small polyps than large polyps. Additionally, this dataset was used to show that the performance of segmentation models trained with small datasets can be improved using our SinGAN-Seg pipeline.
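The polyp-size distribution of Figure 3 can be reproduced by computing, for each mask, the percentage of true (white) pixels and histogramming the results (a sketch assuming masks are binary NumPy arrays; the bin count is our own choice):

```python
import numpy as np

def true_pixel_percentage(mask: np.ndarray) -> float:
    """Percentage of polyp (true) pixels in a binary mask."""
    return 100.0 * np.count_nonzero(mask) / mask.size

def size_histogram(masks, bins=10):
    """Histogram of true-pixel percentages over a list of masks."""
    percentages = [true_pixel_percentage(m) for m in masks]
    counts, edges = np.histogram(percentages, bins=bins, range=(0.0, 100.0))
    return counts, edges

# Example: three dummy masks with small, medium, and large "polyps"
masks = [np.zeros((10, 10)) for _ in range(3)]
masks[0][0, 0] = 1    # 1% polyp
masks[1][:5, :5] = 1  # 25% polyp
masks[2][:, :] = 1    # 100% polyp
counts, _ = size_histogram(masks, bins=10)
print(counts.sum())  # 3
```

The same computation is reused later to compare real and synthetic mask distributions.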

This dataset was used for two purposes.

  1. To train SinGAN-Seg models to generate synthetic data.

  2. To compare the performance of real and synthetic data for training segmentation ML models.

Figure 2: Sample images and corresponding masks from HyperKvasir [3] segmentation images.
Figure 3: Distribution of true pixel percentages of the polyp masks of HyperKvasir [3] dataset.

3.2 Training Generators

To use SinGAN-Seg to generate synthetic segmentation datasets that represent real segmentation datasets, we first trained one SinGAN-Seg model per image in the training dataset, i.e., one model per polyp image and its corresponding ground truth mask. To train these SinGAN-Seg models, we followed the same settings used in the vanilla SinGAN paper [26]; apart from the original training process, the input and output of SinGAN-Seg have four channels. After training each SinGAN-Seg model by iterating a fixed number of epochs per scale of the pyramidal GAN structure (see the four-channel SinGAN-Seg architecture in Figure 1), we stored the final checkpoints of each scale to generate synthetic data in later stages. The resolution of the training image of a SinGAN-Seg model is arbitrary because it depends on the size of the real polyp image. This input image is resized according to the pyramidal re-scaling structure introduced in the original implementation of SinGAN [26]; this re-scaling pattern is depicted in the four-channel SinGAN architecture in Figure 1. The re-scaling pattern used to train SinGAN-Seg models is also used to control the randomness of synthetic data when the pre-trained models are used for generation. Because training the GAN architectures one by one is a tremendous task, the models were trained on multiple computing nodes, such as Google Colab with Tesla P100 16GB GPUs and a DGX-2 GPU server with 16 V100 GPUs. The average training time per SinGAN-Seg model was on the order of minutes.
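The pyramidal re-scaling can be illustrated by computing the sizes of the image pyramid (a sketch: the scale factor of roughly 0.75 and minimum size of 25 px follow the defaults of the original SinGAN implementation [26]; they are assumptions here, not values reported in this paper):

```python
def pyramid_sizes(max_size: int, min_size: int = 25, scale_factor: float = 0.75):
    """Sizes (shorter image side, in px) of the image pyramid, coarsest first.
    Each scale is `scale_factor` times the next-finer one, stopping once
    the image would drop below `min_size`."""
    sizes = []
    size = max_size
    while size >= min_size:
        sizes.append(size)
        size = int(round(size * scale_factor))
    return list(reversed(sizes))  # coarse-to-fine, as in the GAN pyramid

print(pyramid_sizes(250))
```

One generator/discriminator pair is trained per size in this list, from the coarsest (noise-only input) to the finest scale.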

Figure 4: Sample real images and corresponding SinGAN-generated synthetic GI-tract images with corresponding masks. The first column shows real images and masks. All other columns show randomly generated synthetic data from SinGANs trained on the image in the first column.

After training the SinGAN-Seg models, we generated random samples per real image using the coarsest input scale, which is the lowest scale and uses a random noise input instead of a rescaled input image. For more details about these scaling numbers and the corresponding output behaviors, please refer to the vanilla SinGAN paper [26]. Three randomly selected training images and the corresponding synthetic images generated at this scale are depicted in Figure 4. The first column of the figure shows the real images and the ground truth masks annotated by experts. The remaining columns show randomly generated synthetic images and the corresponding generated masks.

In total, we generated a large set of synthetic polyp images and corresponding masks. SinGAN-Seg generates random samples with high variation when the coarsest input scale is used. This variation can be easily recognized using the standard deviation (std) and mean mask images presented in Figure 5. The mean and std images were calculated by stacking the generated mask images corresponding to each real image and computing the pixel-wise mean and std. Uniformly bright or dark regions in the mean images correspond to low variance of pixel values, visible as dark regions in the std images; intermediate values in the mean images correspond to high variance, visible as bright regions in the std images. By investigating Figure 5, we can notice that the masks of small polyps show higher variance than the mask of the large polyp presented in the figure.
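The mean and std images of Figure 5 can be computed by stacking the generated masks for one real image (a minimal NumPy sketch):

```python
import numpy as np

def mask_statistics(masks):
    """Pixel-wise mean and standard deviation over a set of binary masks
    generated from the same real image. High std marks pixels where the
    generator disagrees with itself across samples."""
    stack = np.stack(masks, axis=0)  # shape (num_samples, H, W)
    return stack.mean(axis=0), stack.std(axis=0)

# Example: three generated masks that agree only on the center pixel
m1 = np.zeros((3, 3)); m1[1, 1] = 1
m2 = np.zeros((3, 3)); m2[1, 1] = 1; m2[0, 0] = 1
m3 = np.zeros((3, 3)); m3[1, 1] = 1; m3[2, 2] = 1
mean, std = mask_statistics([m1, m2, m3])
print(mean[1, 1], std[1, 1])  # 1.0 0.0  (all samples agree on this pixel)
```

Pixels where all generated masks agree have zero std, while pixels covered by only some samples show non-zero std.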

To understand the difference between the mask distributions of real and synthetic images, we plotted the pixel distribution of the masks of the synthetic images in Figure 6. This plot is comparable to the pixel distribution presented in Figure 3. The randomness of the generations introduces differences in the distribution of true pixel percentages compared to the distribution of the real masks. However, the overall shape of the synthetic mask distribution follows a more or less similar pattern to the real true pixel percentage distribution.

Figure 5: Mean and standard deviation calculated from random masks generated from SinGAN-Seg. The corresponding real masks annotated by experts can be seen in Figure 4.
Figure 6: Distribution of true pixel percentages of the masks of the synthetic generations, covering all real polyp images. From each real image, multiple synthetic samples were generated. The synthetic dataset is publicly available for download.

3.3 Style Transferring

After finishing the training of the SinGAN-Seg models, the style transfer algorithm [9] was applied to every synthetic sample generated from SinGAN-Seg. In the style-transfer algorithm, several parameters can be changed, such as the number of epochs used to transfer style from one image to another and the content-to-style weight ratio. We used a fixed number of epochs to transfer style from a style image (real polyp image) to a content image (generated synthetic polyp). For performance comparisons, two content-to-style ratios, 1:1 and 1:1000, were used. On an NVIDIA GeForce RTX 3080 GPU, transferring style for a single image took on the order of seconds.

Figure 7: Direct generations of SinGAN-Seg versus style-transferred samples. The style transfer was performed using a fixed content-to-style ratio.

We depict a visual comparison between purely generated synthetic images and style-transferred images in Figure 7. Samples for the second style-transfer ratio are not depicted because the differences between the two ratios are difficult to see visually. The first column of Figure 7 shows the real images from which styles were transferred. The rest of the images in the first row of each example show synthetic images generated from SinGAN-Seg before applying the style-transfer algorithm. The second row of each example shows the style-transferred synthetic images. Differences between the synthetic images before and after applying style transfer are easiest to recognize in the second reference example in Figure 7.

3.4 Python package and synthetic data

Using all the pre-trained SinGAN-Seg checkpoints, we have published a PyPI package and the corresponding GitHub repository to make all the experiments reproducible. Additionally, we have published the first synthetic polyp dataset to demonstrate how synthetic data can be shared instead of a real dataset that may have privacy concerns. This synthetic dataset, generated using the SinGAN-Seg pipeline, is also an example of how to increase the size of a segmentation dataset without the time-consuming and costly medical data annotation process that requires experts' knowledge.

We named this PyPI package singan-seg-polyp (pip install singan-seg-polyp). To the best of our knowledge, this is the only PyPI package that can generate an unlimited number of synthetic polyps and corresponding masks; the source code is available in the corresponding GitHub repository. A set of functionalities is provided for end-users. Generative functions can generate random synthetic polyp data with corresponding masks, either for a given image id or for a given checkpoint directory; the checkpoints are downloaded automatically when the generative functions are called. A style-transfer function is also included to transfer style from the real polyp images to the corresponding synthetic polyp images. For both functionalities, the relevant hyper-parameters can be changed as needed by the end-users of this PyPI package.

3.5 Baseline experiments

Two different sets of baseline experiments were performed for two different objectives. The first objective was to compare the quality of the generated synthetic data against the real data; using these baseline experiments, we can assess whether SinGAN-Seg synthetic data can be shared instead of real datasets to avoid privacy concerns. The second objective was to test how the SinGAN-Seg pipeline can improve segmentation performance when the training dataset of real images and masks is small. For all the baseline experiments, we selected UNet++ [42] as the main segmentation model according to the performance comparison done by the winning team at EndoCV 2021 [31]. The single-channel dice loss function used in the same study was used to train the UNet++ polyp segmentation models. Following the winning team's configuration, the se_resnext50_32x4d network was used as the encoder of the UNet++ model, and softmax2d was used as the activation function of the last layer.

The PyTorch deep learning library was used as the main development framework for the baseline experiments as well. The training data stream was handled using the PYRA [32] data loader with the Albumentations augmentation library [4]. The real and synthetic images were resized by this data handler for all the baseline experiments to save training time, because we had to train multiple models for fair comparisons. For all UNet++ training processes, we used a small initial learning rate for the first epochs and then changed it for the remaining epochs. The UNet++ models used to compare real versus synthetic data were trained for a fixed number of epochs in total. In contrast, the UNet++ models used to measure the effect of SinGAN-Seg synthetic data on small segmentation datasets were trained for fewer epochs, because the data splits used to train the models grow as the amount of training data increases. In all the experiments, we selected the best checkpoint using the best validation IoU score. Finally, dice loss, IoU score, F-score, accuracy, recall, and precision were calculated for comparisons using the validation folds.
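The single-channel dice loss used to train UNet++ can be written as follows (a minimal NumPy sketch of the standard dice loss, not the exact implementation from [31]; the smoothing term `eps` is our own addition to avoid division by zero):

```python
import numpy as np

def dice_loss(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice loss = 1 - 2|A ∩ B| / (|A| + |B|), computed on a single-channel
    predicted probability map and a binary ground-truth mask."""
    intersection = np.sum(pred * target)
    return 1.0 - (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

perfect = np.ones((4, 4))
print(round(dice_loss(perfect, perfect), 6))   # 0.0  (perfect overlap)
disjoint = np.zeros((4, 4))
print(round(dice_loss(perfect, disjoint), 6))  # 1.0  (no overlap)
```

The loss is 0 for a perfect prediction and approaches 1 as the overlap between prediction and mask vanishes, which makes it well suited to the class imbalance of small polyps.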

3.5.1 Synthetic data vs real data for segmentation

We have performed three-fold cross-validation to compare polyp segmentation performance using UNet++ trained on real versus synthetic data. First, we divided the real dataset (the polyp images and their corresponding segmentation masks) into three folds. Then, the trained SinGAN-Seg generative models and the corresponding generated synthetic data were divided into the same three folds, presented using three colors in the figure. In none of the experiments were training data folds and the corresponding synthetic data folds mixed with the validation data folds; mixing them would lead to a data leakage problem.

Figure 8: Three step experiment setup to analyze the quality of SinGAN output.

Then, the baseline performance of the UNet++ model was evaluated using the three folds of real data. In this experiment, the UNet++ model was trained using two folds and validated using the remaining fold. In total, three UNet++ models were trained, and the average performance was calculated using dice loss, IoU score, F-score, accuracy, recall, and precision, for the polyp class only, because it is the most important class of this dataset. This three-fold baseline experiment setup is depicted on the left side of Figure 8.

The usability of the synthetic images and corresponding masks generated from SinGAN-Seg was investigated using three-fold experiments organized as shown on the right side of Figure 8. In this case, UNet++ models were trained only on synthetic data generated from pre-trained generative models and tested on the real data folds that were not used to train those generative models. Five different amounts of synthetic data per real image were used to train the UNet++ models. This data organization can be identified easily using the color scheme of the figure. To test the quality of the pure generations, we first used the direct output from SinGAN-Seg to train UNet++ models. Then, the style transfer method was applied with a 1:1 content-to-style ratio to all the synthetic data; these style-transferred images were used as training data and tested against the real dataset. In addition to the 1:1 ratio, 1:1000 was tested as a style-transfer ratio in the same set of experiments.
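The leakage-free fold organization can be sketched as follows: a synthetic sample may appear in training only if the real image its generator was trained on belongs to a training fold (illustrative code; the image ids, fold assignment, and naming scheme are our own):

```python
def make_split(real_ids, fold_of, val_fold, synthetic_per_image=5):
    """Split real image ids into train/validation folds and attach synthetic
    samples only to generators whose real image lies in a training fold,
    so no synthetic sample leaks information about the validation fold."""
    train_real = [i for i in real_ids if fold_of[i] != val_fold]
    val_real = [i for i in real_ids if fold_of[i] == val_fold]
    # synthetic ids like "img7_syn0", derived only from training-fold images
    train_synth = [f"img{i}_syn{k}" for i in train_real
                   for k in range(synthetic_per_image)]
    return train_synth, val_real

real_ids = list(range(9))
fold_of = {i: i % 3 for i in real_ids}  # three folds: 0, 1, 2
train_synth, val_real = make_split(real_ids, fold_of, val_fold=0)
print(len(train_synth), len(val_real))  # 30 3
```

Validation is always performed on real images whose generators contributed nothing to the training set.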

Train data ST (cw:sw) dice_loss iou_score fscore accuracy recall precision
REAL NA 0.1123 0.8266 0.8882 0.9671 0.8982 0.9161
FAKE-1 No ST 0.1645 0.7617 0.8357 0.9531 0.863 0.8793
1:1 0.1504 0.7782 0.85 0.9572 0.8672 0.8917
1:1000 0.1473 0.782 0.853 0.9591 0.8624 0.9005
FAKE-2 No ST 0.1549 0.7729 0.8453 0.9561 0.8692 0.8895
1:1 0.155 0.7765 0.8453 0.9575 0.8729 0.8852
1:1000 0.1477 0.7854 0.8525 0.9609 0.8647 0.9038
FAKE-3 No ST 0.161 0.7683 0.8391 0.9556 0.8568 0.8945
1:1 0.1475 0.7845 0.8525 0.9585 0.8723 0.8936
1:1000 0.1408 0.7923 0.8593 0.9629 0.8693 0.9078
FAKE-4 No ST 0.1649 0.7638 0.8352 0.9525 0.8669 0.878
1:1 0.1464 0.7848 0.8537 0.9594 0.8713 0.8921
1:1000 0.137 0.7983 0.863 0.9636 0.8653 0.9185
FAKE-5 No ST 0.1654 0.7668 0.8345 0.9563 0.8565 0.8919
1:1 0.1453 0.7887 0.8547 0.961 0.8703 0.9
1:1000 0.1458 0.7889 0.8543 0.962 0.8527 0.9211
Table 1: Three-fold averages of basic metrics comparing real versus synthetic training data with UNet++, and the effect of style transfer on performance.
Figure 9: Real versus synthetic data performance comparison with UNet++ and the effect of applying the style-transferring post processing.

Table 1 shows the results collected from the UNet++ segmentation experiments for the baseline experiment and the experiments conducted with synthetic data, covering purely generated synthetic data and style-transferred data using the 1:1 and 1:1000 ratios. Differences in the IoU scores of these experiments are plotted in Figure 9 for easy comparison.

3.5.2 Synthetic segmentation data for real small datasets

Figure 10: Distribution comparison between real and synthetic masks. The synthetic masks were generated using SinGAN-Seg.

The main purpose of these experiments is to measure the effect of using synthetic data generated from the SinGAN-Seg pipeline in place of small real datasets, because the SinGAN-Seg pipeline can generate an unlimited number of synthetic samples per real image. A synthetic sample consists of a synthetic image and the corresponding ground truth mask; therefore, experts' knowledge is not required to annotate the ground truth mask. For these experiments, we selected the best parameters of the SinGAN-Seg pipeline from the experiments performed in Section 3.5.1. First, we created small real polyp datasets from fold one, such that each dataset contains n images, with n ranging from 5 to 50 in steps of 5 (see Table 2). The corresponding synthetic datasets were created by generating 10 synthetic images and corresponding masks per real image, so each synthetic dataset contains 10n images. Then, we compared the true pixel percentages of the real masks and the synthetic masks generated by the SinGAN-Seg pipeline using histograms, depicted in Figure 10. The first row represents the histograms of the small real datasets, and the second row represents the histograms of the corresponding synthetic datasets. Comparing each pair (one from the top row and the corresponding one from the bottom row) gives a clear idea of how the generated synthetic data improved the coverage of the mask distribution.

dice_loss iou_score fscore accuracy recall precision
Real 5 0.4662 0.4618 0.5944 0.8751 0.7239 0.6305
Fake 50 0.3063 0.5993 0.7048 0.9211 0.7090 0.8133
Real 10 0.3932 0.5969 0.7079 0.9164 0.7785 0.7516
Fake 100 0.2565 0.6478 0.7457 0.9259 0.7911 0.7970
Real 15 0.2992 0.6431 0.7402 0.9322 0.7388 0.8602
Fake 150 0.2852 0.6559 0.7624 0.9329 0.8172 0.7833
Real 20 0.3070 0.6680 0.7668 0.9328 0.7771 0.8566
Fake 200 0.2532 0.6569 0.7544 0.9342 0.7317 0.8827
Real 25 0.2166 0.6995 0.7929 0.9405 0.7955 0.8804
Fake 250 0.2182 0.6961 0.7860 0.9418 0.7690 0.8957
Real 30 0.2100 0.7037 0.7971 0.9417 0.8005 0.8758
Fake 300 0.2228 0.6843 0.7797 0.9388 0.7683 0.8810
Real 35 0.2164 0.6955 0.7889 0.9398 0.8157 0.8456
Fake 350 0.2465 0.6677 0.7543 0.9346 0.7385 0.8933
Real 40 0.2065 0.7085 0.7974 0.9417 0.7881 0.8947
Fake 400 0.2194 0.6894 0.7816 0.9305 0.8276 0.8219
Real 45 0.1982 0.7188 0.8062 0.9441 0.8120 0.8839
Fake 450 0.2319 0.6794 0.7697 0.9341 0.7859 0.8633
Real 50 0.2091 0.7115 0.7948 0.9418 0.7898 0.8932
Fake 500 0.2255 0.6896 0.7756 0.9380 0.7961 0.8644
Table 2: Real vs. fake comparisons for small datasets. The fake images were generated using the style-transfer ratio selected in Section 3.5.1.
Figure 11: Real versus Fake performance comparison with small training datasets

UNet++ segmentation models were trained using these real and synthetic datasets separately, and we compared the performance differences using validation folds. In these experiments, the training datasets were prepared using fold one, and the remaining two folds were used as the validation dataset. The results collected from the UNet++ models trained with the real and synthetic datasets are tabulated in Table 2. A comparison of the corresponding IoU scores is plotted in Figure 11.

4 Discussion

The SinGAN-Seg pipeline has two steps. The first is generating synthetic polyp images and the corresponding ground truth masks. The second is transferring style from real polyp images to the synthetic polyp images to make them more realistic than the pure generations of the first step. We developed this pipeline to achieve two main goals. The first is to enable sharing of medical data when privacy concerns prevent sharing the real data. The second is to improve polyp segmentation performance when the size of the training dataset is small.

4.1 SinGAN-Seg as data sharing technique

SinGAN-Seg can generate unlimited synthetic data with corresponding ground truth masks, representing real datasets. The SinGAN-Seg pipeline is applicable to any dataset with segmentation masks, particularly when the dataset cannot be shared due to privacy concerns. In this study, we applied the pipeline to a public polyp dataset with segmentation masks as a case study, treating the polyp dataset as if it were private. In this case, we published a PyPI package, singan-seg-polyp, which can generate an unlimited number of polyp images and corresponding ground truth masks. If the real polyp dataset were restricted from public use, this type of pip package could be published as an alternative dataset representing the real dataset. Alternatively, a pre-generated synthetic dataset produced with the SinGAN-Seg pipeline can be published, such as the synthetic polyp dataset published as a case study at

According to the results presented in Table 1, the UNet++ segmentation network performs better when real data is used for training compared to synthetic data. However, the small performance gap between real and synthetic training data implies that synthetic data generated by SinGAN-Seg can be used as an alternative way of sharing segmentation data when the real datasets are restricted from sharing. The style-transfer step of the SinGAN-Seg pipeline could reduce this performance gap further. The remaining gap between real and synthetic training data is acceptable because the primary purpose of producing synthetic data is not to improve the performance of segmentation models but to introduce an alternative data-sharing method that is practically applicable when privacy concerns prevent sharing the real datasets.
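The IOU scores used to quantify this gap (and reported in Tables 1 and 2) are computed per mask as intersection over union. A minimal NumPy version, with a small epsilon to keep empty masks well-defined:

```python
import numpy as np

def iou_score(pred, target, eps=1e-7):
    """Intersection over union for binary segmentation masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (intersection + eps) / (union + eps)

pred = np.array([[1, 1, 0, 0]])
target = np.array([[1, 0, 1, 0]])
print(round(iou_score(pred, target), 4))  # 1 overlap / 3 union -> 0.3333
```

In practice the score would be computed on thresholded UNet++ outputs and averaged over the validation folds.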

4.2 SinGAN-Seg with small datasets

In addition to using the SinGAN-Seg pipeline as a data-sharing technique when real datasets are restricted from publication, the pipeline can improve the performance of segmentation tasks when a dataset is very small. In this case, the SinGAN-Seg pipeline can generate synthetic data to overcome the problems associated with the small dataset. In other words, the SinGAN-Seg pipeline acts as a data augmentation technique. SinGAN-Seg-based augmentation behaves like an unlimited set of stochastic augmentations because of the randomness of the synthetic data generated by the model. For example, consider a manual segmentation process such as cell segmentation in a medical laboratory experiment. This type of task is hard to perform even for experts, so the amount of data collected with manually annotated masks is limited. Our SinGAN-Seg pipeline can enrich such datasets by generating an unlimited number of random samples from a single manually annotated image. This study showed that synthetic data generated from a small real dataset can improve the performance of segmentation machine learning models. For example, when only a few real polyp samples were available to train our UNet++ model, a larger synthetic dataset improved the IOU score compared to using the real samples alone.
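Used as augmentation, the pipeline simply extends a small real training set with synthetic image-mask pairs generated from each real image. The sketch below is schematic: `generate_synthetic` is a hypothetical stand-in for the per-image trained SinGAN-Seg generators, and the identifiers are placeholders:

```python
# Schematic sketch of SinGAN-Seg-style augmentation: a tiny real training
# set is extended with synthetic image/mask pairs produced per real image.

def generate_synthetic(real_pair, n_samples):
    """Hypothetical stand-in for a trained single-image generator that
    yields n_samples random image/mask variations of one real pair."""
    image_id, mask_id = real_pair
    return [(f"{image_id}_syn{j}", f"{mask_id}_syn{j}") for j in range(n_samples)]

real_pairs = [(f"img_{i}", f"mask_{i}") for i in range(5)]  # small real dataset

train_set = list(real_pairs)
for pair in real_pairs:
    train_set.extend(generate_synthetic(pair, n_samples=10))

print(len(train_set))  # 5 real + 50 synthetic = 55
```

The ratio of synthetic to real samples is a free parameter; in the paper's experiments the synthetic sets are roughly ten times the size of the corresponding real sets (Table 2).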

5 Conclusions and future work

This paper presented a four-channel SinGAN-Seg model and the corresponding SinGAN-Seg pipeline with a style transfer method to generate realistic synthetic polyp images and corresponding ground truth masks. The SinGAN-Seg pipeline can be used as an alternative data-sharing method when real datasets are restricted from sharing. Moreover, the pipeline can improve segmentation performance when only small real segmentation datasets are available. The conducted three-fold cross-validation experiments show that models trained only on synthetic images and corresponding masks achieve segmentation performance very close to that of models trained on real, expert-annotated data, provided the real dataset is sufficiently large. In addition, we show that the SinGAN-Seg pipeline can achieve better segmentation performance when the training datasets are very small.

In future studies, researchers can combine a super-resolution GAN model [16] with this pipeline to improve the quality of the output after the style transfer step. When high-resolution images are available, machine learning algorithms show better performance than algorithms trained using low-resolution images [33].

6 Acknowledgments

The research has benefited from the Experimental Infrastructure for Exploration of Exascale Computing (eX3), which is financially supported by the Research Council of Norway under contract 270053.


  • [1] (2003) Act on the protection of personal information. External Links: Link Cited by: §1.
  • [2] J. Bernal, F. J. Sánchez, G. Fernández-Esparrach, D. Gil, C. Rodríguez, and F. Vilariño (2015) WM-dova maps for accurate polyp highlighting in colonoscopy: validation vs. saliency maps from physicians. Computerized Medical Imaging and Graphics 43, pp. 99–111. Cited by: §1.
  • [3] H. Borgli, V. Thambawita, P. H. Smedsrud, S. Hicks, D. Jha, S. L. Eskeland, K. R. Randel, K. Pogorelov, M. Lux, D. T. D. Nguyen, D. Johansen, C. Griwodz, H. K. Stensland, E. Garcia-Ceja, P. T. Schmidt, H. L. Hammer, M. A. Riegler, P. Halvorsen, and T. de Lange (2020) HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy. Scientific Data 7 (1), pp. 283. Cited by: §1, Figure 2, Figure 3, §3.1.
  • [4] A. Buslaev, V. I. Iglovikov, E. Khvedchenya, A. Parinov, M. Druzhinin, and A. A. Kalinin (2020) Albumentations: fast and flexible image augmentations. Information 11 (2). External Links: Link, ISSN 2078-2489, Document Cited by: §3.5.
  • [5] (2018) California consumer privacy act. External Links: Link Cited by: §1.
  • [6] S. E. Dilsizian and E. L. Siegel (2013) Artificial intelligence in medicine and cardiac imaging: harnessing big data and advanced computing to provide personalized medical diagnosis and treatment. Current Cardiology Reports 16 (1), pp. 441. External Links: Document, ISBN 1534-3170, Link Cited by: §1.
  • [7] P. Edemekong, P. Annamaraju, and M. Haydel (2020) Health insurance portability and accountability act. StatPearls. Cited by: §1.
  • [8] I. N. Figueiredo, S. Prasath, Y. R. Tsai, and P. N. Figueiredo (2010) Automatic detection and segmentation of colonic polyps in wireless capsule images. ICES REPORT, pp. 10–36. Cited by: §1.
  • [9] L. A. Gatys, A. S. Ecker, and M. Bethge (2015) A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576. Cited by: Figure 1, §2, §3.3.
  • [10] H. Wang and D. Hu (2005) Comparison of svm and ls-svm for regression. In 2005 International Conference on Neural Networks and Brain, Vol. 1, pp. 279–283. External Links: Document Cited by: §1.
  • [11] M. A. Hearst (1998-07) Support vector machines. IEEE Intelligent Systems 13 (4), pp. 18–28. External Links: ISSN 1541-1672, Link, Document Cited by: §1.
  • [12] D. Jha, S. Ali, N. K. Tomar, H. D. Johansen, D. Johansen, J. Rittscher, M. A. Riegler, and P. Halvorsen (2021) Real-time polyp detection, localization and segmentation in colonoscopy using deep learning. Ieee Access 9, pp. 40496–40510. Cited by: §1.
  • [13] D. Jha, N. K. Tomar, S. Ali, M. A. Riegler, H. D. Johansen, D. Johansen, T. de Lange, and P. Halvorsen (2021) NanoNet: real-time polyp segmentation in video capsule endoscopy and colonoscopy. arXiv preprint arXiv:2104.11138. Cited by: §1.
  • [14] S. Jha and E. J. Topol (2016) Adapting to artificial intelligence: radiologists and pathologists as information specialists. Jama 316 (22), pp. 2353–2354. Cited by: §1.
  • [15] F. Jiang, Y. Jiang, H. Zhi, Y. Dong, H. Li, S. Ma, Y. Wang, Q. Dong, H. Shen, and Y. Wang (2017) Artificial intelligence in healthcare: past, present and future. Stroke and vascular neurology 2 (4). Cited by: §1.
  • [16] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. (2017) Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4681–4690. Cited by: §5.
  • [17] L. K. Lee, S. C. Liew, and W. J. Thong (2015) A review of image segmentation methodologies in medical image. Advanced computer and communication engineering technology, pp. 1069–1080. Cited by: §1.
  • [18] W. S. McCulloch and W. Pitts (1943) A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics 5 (4), pp. 115–133. Cited by: §1.
  • [19] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala (2019) PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. dAlché-Buc, E. Fox, and R. Garnett (Eds.), pp. 8024–8035. Cited by: §3.
  • [20] V. L. Patel, E. H. Shortliffe, M. Stefanelli, P. Szolovits, M. R. Berthold, R. Bellazzi, and A. Abu-Hanna (2009) The coming of age of artificial intelligence in medicine. Artificial Intelligence in Medicine 46 (1), pp. 5–17. Note: Artificial Intelligence in Medicine AIME’ 07 External Links: Document, ISSN 0933-3657, Link Cited by: §1.
  • [21] (2011) Personal information protection commission. External Links: Link Cited by: §1.
  • [22] D. L. Pham, C. Xu, and J. L. Prince (2000) Current methods in medical image segmentation. Annual review of biomedical engineering 2 (1), pp. 315–337. Cited by: §1.
  • [23] V. Prasath (2017) Polyp detection and segmentation from video capsule endoscopy: a review. Journal of Imaging 3 (1), pp. 1. Cited by: §1.
  • [24] F. Renard, S. Guedria, N. D. Palma, and N. Vuillerme (2020) Variability and reproducibility in deep learning for medical image segmentation. Scientific Reports 10 (1), pp. 13724. External Links: Document, ISBN 2045-2322, Link Cited by: §1.
  • [25] L. F. Sánchez-Peralta, J. B. Pagador, A. Picón, Á. J. Calderón, F. Polo, N. Andraka, R. Bilbao, B. Glover, C. L. Saratxaga, and F. M. Sánchez-Margallo (2020) PICCOLO white-light and narrow-band imaging colonoscopic dataset: a performance comparative of models and datasets. Applied Sciences 10 (23). External Links: Document, ISSN 2076-3417, Link Cited by: §1.
  • [26] T. R. Shaham, T. Dekel, and T. Michaeli (2019) Singan: learning a generative model from a single natural image. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4570–4580. Cited by: Figure 1, §2, §2, §3.2, §3.2.
  • [27] J. Silva, A. Histace, O. Romain, X. Dray, and B. Granado (2014) Toward embedded detection of polyps in wce images for early diagnosis of colorectal cancer. International journal of computer assisted radiology and surgery 9 (2), pp. 283–293. Cited by: §1.
  • [28] A. Suárez Sánchez, P.J. García Nieto, P. Riesgo Fernández, J.J. del Coz Díaz, and F.J. Iglesias-Rodríguez (2011) Application of an svm-based regression model to the air quality study at local scale in the avilés urban area (spain). Mathematical and Computer Modelling 54 (5), pp. 1453–1466. External Links: Document, ISSN 0895-7177, Link Cited by: §1.
  • [29] A. A. Taha and A. Hanbury (2015) Metrics for evaluating 3d medical image segmentation: analysis, selection, and tool. BMC Medical Imaging 15 (1), pp. 29. External Links: Document, ISBN 1471-2342, Link Cited by: §1.
  • [30] N. Tajbakhsh, S. R. Gurudu, and J. Liang (2015) Automated polyp detection in colonoscopy videos using shape and context information. IEEE transactions on medical imaging 35 (2), pp. 630–644. Cited by: §1.
  • [31] V. Thambawita, S. A. Hicks, P. Halvorsen, and M. A. Riegler (2021) DivergentNets: medical image segmentation by network ensemble.. In EndoCV@ ISBI, pp. . Cited by: §1, §3.5.
  • [32] V. Thambawita, S. Hicks, P. Halvorsen, and M. A. Riegler (2020) Pyramid-focus-augmentation: medical image segmentation with step-wise focus. arXiv preprint arXiv:2012.07430. Cited by: §1, §3.5.
  • [33] V. L. Thambawita, S. Hicks, I. Strümke, M. A. Riegler, P. Halvorsen, and S. Parasa (2021) Fr615 impact of image resolution on convolutional neural networks performance in gastrointestinal endoscopy. Gastroenterology 160 (6), pp. S–377. Cited by: §5.
  • [34] The norwegian data protection authority. Note: Accessed: 2021-04-25 External Links: Link Cited by: §1.
  • [35] The personal data act. Note: Accessed: 2021-04-25 External Links: Link Cited by: §1.
  • [36] (2018) THE personal data protection bill. External Links: Link Cited by: §1.
  • [37] L. A. Torre, F. Bray, R. L. Siegel, J. Ferlay, J. Lortet-Tieulent, and A. Jemal (2015) Global cancer statistics, 2012. CA: a cancer journal for clinicians 65 (2), pp. 87–108. Cited by: §1.
  • [38] P. Voigt and A. Von dem Bussche The eu general data protection regulation (gdpr). Cited by: §1.
  • [39] M. J. Willemink, W. A. Koszek, C. Hardell, J. Wu, D. Fleischmann, H. Harvey, L. R. Folio, R. M. Summers, D. L. Rubin, and M. P. Lungren (2020) Preparing medical imaging data for machine learning. Radiology 295 (1), pp. 4–15. Cited by: §1.
  • [40] S. Yu, M. Chen, E. Zhang, J. Wu, H. Yu, Z. Yang, L. Ma, X. Gu, and W. Lu (2020-08) Robustness study of noisy annotation in deep learning based medical image segmentation. Physics in Medicine & Biology 65 (17), pp. 175007. External Links: Document, Link Cited by: §1.
  • [41] S. Yue, P. Li, and P. Hao (2003) SVM classification: its contents and challenges. Applied Mathematics-A Journal of Chinese Universities 18 (3), pp. 332–342. External Links: Document, ISBN 1993-0445, Link Cited by: §1.
  • [42] Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang (2018) Unet++: a nested u-net architecture for medical image segmentation. In Deep learning in medical image analysis and multimodal learning for clinical decision support, pp. 3–11. Cited by: §3.5.