Unsupervised Instance Segmentation in Microscopy Images via Panoptic Domain Adaptation and Task Re-weighting

Unsupervised domain adaptation (UDA) for nuclei instance segmentation is important for digital pathology, as it alleviates both the burden of labor-intensive annotation and the domain shift across datasets. In this work, we propose a Cycle Consistency Panoptic Domain Adaptive Mask R-CNN (CyC-PDAM) architecture for unsupervised nuclei segmentation in histopathology images, by learning from fluorescence microscopy images. More specifically, we first propose a nuclei inpainting mechanism to remove the auxiliary generated objects in the synthesized images. Secondly, a semantic branch with a domain discriminator is designed to achieve panoptic-level domain adaptation. Thirdly, in order to avoid the influence of source-biased features, we propose a task re-weighting mechanism to dynamically add trade-off weights for the task-specific loss functions. Experimental results on three datasets indicate that our proposed method significantly outperforms state-of-the-art UDA methods and achieves performance comparable to fully supervised methods.




1 Introduction

Nuclei instance segmentation in histopathology images is an important step in the digital pathology workflow. Pathologists diagnose cancers and assess prognosis according to mitosis counts, the morphological structure of each nucleus, and the spatial distribution of groups of nuclei [elston1991pathological, le1989prognostic, clayton1991pathologic, basavanhally2011multi, nawaz2016computational]. Currently, supervised learning-based methods for nuclei instance segmentation are prevalent, as they are efficient while preserving high accuracy [kumar2017dataset, naylor2018segmentation, chen2017dcan, graham2019hover, mahmood2018deep, zhang2018panoptic, liu2019nuclei, liu2020cell]. However, their performance relies heavily on large-scale training data, which requires expertise to annotate. This process is time-consuming and labor-intensive due to the complicated cellular structures, as shown in Fig. 1(b), and the large image sizes. For example, annotating a histopathology dataset with images and M pixels costs a pathologist to hours [hou2019robust]. Moreover, in real clinical studies, even one whole slide image in objective magnification contains B pixels [gutman2013cancer]. Therefore, it is necessary to investigate methods that do not depend on histopathology annotations. Such methods can reduce the workload of pathologists and address the shortage of histopathology annotations.

Figure 1: Example images of our proposed framework. (a) fluorescence microscopy images; (b) real histopathology images; (c) our synthesized histopathology images; (d) nuclei segmentation generated by our proposed UDA method; (e) ground truth.

The recently proposed unsupervised domain adaptation (UDA) methods tackle this issue by conducting supervised learning on the source domain and obtaining a well-performing model for the target domain without target annotations [pan2009survey, ganin2014unsupervised, tzeng2017adversarial]. Most current UDA methods reduce the distance between the feature-map distributions of the source and target domains. In addition, other methods focus on pixel-to-pixel translation from source domain images to target ones, to align cross-domain image appearances [isola2017image, zhu2017unpaired]. For these translation methods, some differences between the distributions of the synthesized and real images still remain, due to imperfect translation [hoffman2017cycada, chen2019synergistic, kim2019diversify].

To combine the benefits of image translation and UDA methods, several works have been proposed to learn domain-invariant features between the target and the synthesized target-like images [hoffman2017cycada, kim2019diversify, chen2019synergistic]. Such methods achieve state-of-the-art performance on UDA classification, object detection, and semantic segmentation tasks. However, there is currently a lack of UDA methods specifically designed for instance segmentation, and directly extending the existing UDA methods for object detection [chen2018domain, kim2019diversify, he2019multi] to UDA nuclei instance segmentation still faces challenges. First, existing UDA object detection methods focus on alleviating the domain bias at the image level (image contrast, brightness, etc.) and the instance level (object scale, style, etc.) [kim2019diversify, chen2018domain, he2019multi]. They ignore the domain shift at the semantic level, such as the relationship between the foreground and background, and the spatial distribution of the objects. Second, these UDA object detection methods are multi-task learning paradigms, which optimize different loss functions simultaneously. If the feature extractors fail to generate domain-invariant features in some training iterations, then back-propagating the task losses in these iterations biases the model towards the source domain.

To solve the aforementioned problems in UDA nuclei instance segmentation in histopathology images, we propose a Cycle-Consistent Panoptic Domain Adaptive Mask R-CNN (CyC-PDAM) model. As none of the previous UDA methods is specifically designed for instance segmentation, we extend CyCADA [hoffman2017cycada] to an instance segmentation version based on Mask R-CNN [he2017mask], as our baseline. In our CyC-PDAM, we first propose a simple nuclei inpainting mechanism to remove the auxiliary nuclei in the synthesized histopathology images. Second, inspired by the panoptic segmentation architectures [kirillov2019panoptic, kirillov2019panopticfpn], we propose a semantic-level adaptation module for domain-invariant features based on the relationship between the foreground and the background. By reconciling the domain-invariant features at the semantic and instance levels, our proposed CyC-PDAM achieves panoptic-level domain adaptation. Third, a task re-weighting mechanism is proposed to adjust the importance of each task loss. During training, a task loss is down-weighted if the features used for its predictions are source-biased rather than domain-invariant, and up-weighted if the domain of those features is hard to distinguish.

To demonstrate the effectiveness of our proposed CyC-PDAM architecture, experiments were conducted on three public datasets: unsupervised nuclei instance segmentation is evaluated on two histopathology datasets, by unsupervised domain adaptation from a fluorescence microscopy image dataset. Unlike histopathology images, the background of fluorescence microscopy images contains no structures similar to nuclei, due to the differences between image acquisition techniques, as shown in Fig. 1(a). Manual annotation is therefore much easier to obtain for fluorescence microscopy images than for histopathology images, which is why fluorescence microscopy is chosen as our source domain.

Our contributions are summarized as follows: (1) We propose a CyC-PDAM model for UDA nuclei instance segmentation in histopathology images. To the best of our knowledge, this is the first UDA instance segmentation method. (2) A simple nuclei inpainting mechanism is proposed to remove false-positive objects in the synthesized images. (3) Our CyC-PDAM produces domain-invariant features at the panoptic level, by integrating the instance-level adaptation with a newly proposed semantic-level adaptation module. (4) A task re-weighting mechanism is proposed to alleviate the bias towards the source domain. (5) Our proposed CyC-PDAM paradigm outperforms state-of-the-art UDA methods by a large margin. Moreover, it achieves competitive performance compared with state-of-the-art fully supervised methods for nuclei segmentation.

2 Related Work

Figure 2: Overall architecture of our proposed CyC-PDAM. The annotations of the real histopathology patches are not used during training.

2.1 Domain Adaptation for Natural Images

Domain adaptation aims at transferring the knowledge learned from one labeled domain to another without annotation [pan2009survey]. Recent UDA methods reduce the cross-domain discrepancies in content at the feature level and in appearance at the pixel level. For feature-level adaptation, adversarial learning for domain-invariant features [ganin2014unsupervised, tzeng2017adversarial], Maximum Mean Discrepancy (MMD) minimization [long2015learning], local pattern alignment [wen2019exploiting], and cross-domain covariance alignment [sun2016return] are widely employed for classification tasks. In addition, domain adaptation is further employed for other tasks such as semantic segmentation [vu2019advent, li2019bidirectional] and object detection [chen2018domain, kim2019diversify, inoue2018cross, wang2019few]. In semantic segmentation tasks, the segmentation results are forced to be domain-invariant, together with intermediate feature maps [li2019bidirectional, vu2019advent, tsai2018learning]. Additionally, ADVENT [vu2019advent] further minimizes the Shannon entropy of the semantic segmentation predictions in the source and target domains to alleviate the cross-domain discrepancy. For object detection, a domain adaptive Faster R-CNN [ren2015faster], consisting of image- and instance-level adaptations, is commonly used to obtain domain-invariant features of the whole image and of each object [chen2018domain, kim2019diversify, he2019multi]. On the other hand, image-to-image translation addresses domain adaptation at the pixel level by generating target-like images and training task-specific fully supervised models on them [liu2017unsupervised, huang2018multimodal, isola2017image, zhu2017unpaired, mahmood2018deep, park2019semantic]. However, domain bias still exists because of imperfect translation. Moreover, several methods have been proposed to combine feature-level adaptation with pixel-level adaptation, by learning domain-invariant features between the target images and the synthesized images [hoffman2017cycada, kim2019diversify, chen2019synergistic].

2.2 Domain Adaptation for Medical Images

Unsupervised domain adaptation for medical image analysis has rarely been explored [ren2018adversarial, zhang2018task, chen2019synergistic, huang2017epithelium, hou2019robust]. [ren2018adversarial] and [huang2017epithelium] address UDA classification of histopathology images with GAN-based architectures. In addition, DAM [dou2018unsupervised] is proposed to generate domain-invariant intermediate features and model predictions, for UDA semantic segmentation in CT images. With the help of cycle-consistency reconstruction, TD-GAN [zhang2018task] and SIFA [chen2019synergistic] are proposed for semantic segmentation on different medical images, with both pixel- and feature-level adaptations. However, none of them is designed for UDA nuclei instance segmentation. Even though Hou et al. [hou2019robust] proposed to train a GAN-based refiner and a nuclei segmentation model with synthesized histopathology images for unsupervised nuclei instance segmentation, their paradigm only contains pixel-level adaptation and is not capable of minimizing the domain gap at the feature level. In this work, we therefore propose a CyC-PDAM paradigm for UDA nuclei instance segmentation, which alleviates the domain bias at both the pixel and feature levels.

3 Methods

Our proposed architecture is based on CyCADA, which we fuse with the instance segmentation framework Mask R-CNN. Furthermore, we improve it with a nuclei inpainting mechanism, panoptic-level domain adaptation, and a task re-weighting mechanism. Fig. 2 illustrates the overall architecture of our approach.

3.1 CyCADA with Mask R-CNN

Name Hyperparameters Output size
Table 1: The parameters for each block in the image-level discriminator for PDAM. The hyperparameters list the kernel size, stride, and padding of each convolution operation.

As there are no UDA architectures targeting instance-level segmentation, we first design a domain adaptive Mask R-CNN. The backbone of the Mask R-CNN in this work is constructed with ResNet101 [he2016deep] and a Feature Pyramid Network (FPN) [lin2017feature]. Inspired by previous UDA methods for object detection [chen2018domain, kim2019diversify], we add one discriminator after the FPN for image-level adaptation, and another after the instance branch for instance-level adaptation, as shown in Fig. 3. For the image-level adaptation, the multi-resolution feature maps of the FPN output are first down-sampled to a common size with average pooling, and then summed together as the input of the image-level discriminator. The image-level discriminator consists of convolutional layers (details in Table 1) and a gradient reversal layer (GRL) for adversarial learning. In the instance-level adaptation, the feature map in the mask branch is down-scaled with average pooling and then resized so that it can be summed with the feature from the bounding box branch. The instance-level discriminator, whose input is this summed feature, consists of fully connected layers and a GRL.
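As a reminder of how the GRL enables adversarial learning, the block below restates the standard definition from the domain-adversarial training literature [ganin2014unsupervised]. It is assumed, not confirmed by this section, that the GRLs here follow that definition; the scaling factor $\lambda$ is a hypothetical symbol introduced only for illustration.

$$R_\lambda(x) = x, \qquad \frac{\partial R_\lambda(x)}{\partial x} = -\lambda I$$

The forward pass leaves the features unchanged, while the backward pass flips the sign of the gradient coming from the domain discriminator. The discriminator is therefore trained to separate the two domains, while the feature extractor is pushed to make source and target features harder to tell apart.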

Figure 3: Detailed illustration of Panoptic Domain Adaptive Mask R-CNN (PDAM). The labels in the figure distinguish convolution layers from fully connected layers, and the first convolutional layer of each residual block from the second. Normalization layers after each convolutional block are omitted for brevity.

3.2 Nuclei Inpainting Mechanism

Even though CycleGAN is effective for synthesizing histopathology-like images, due to the large domain gap and nuclei number incompatibility between the source and target domains, the label space for the generated images sometimes changes after transferring from the source domain. For example, there are redundant and undesired nuclei in the synthesized images shown in Fig. 4. If these images are directly used to train the task-specific CNN with the original labels, the model is forced to regard redundant nuclei as background, even though they appear as real nuclei.

Figure 4: Visual results for the effectiveness of nuclei inpainting mechanism. (a) original fluorescence microscopy patches; (b) corresponding nuclei annotations; (c) initial synthesized images from CycleGAN; (d) final synthesized images after nuclei inpainting mechanism.

Therefore, we propose an auxiliary nuclei inpainting mechanism to remove the nuclei that appear only in the synthesized images, without corresponding annotations. Denoting a raw synthesized histopathology image from CycleGAN as $S_{raw}$ and its corresponding mask as $M$, we first obtain the mask prediction $M_{aux}$ of all the auxiliary generated nuclei, formulated as:

$$M_{aux} = (T(S_{raw}) \cup M) - M \quad (1)$$

where $T$ represents a binary segmentation method for $S_{raw}$ based on the Otsu threshold. In $M_{aux}$, only auxiliary nuclei without annotations are labeled. Then, we get the newly synthesized image $S_{inp}$ after removing these nuclei, which can be represented as:

$$S_{inp} = I(S_{raw}, M_{aux}) \quad (2)$$

where $I$ is a fast marching based method for inpainting objects [telea2004image], which replaces the pixel values of the auxiliary nuclei labeled in $M_{aux}$ with those of the unlabeled background. Fig. 4 illustrates the visual effectiveness of our proposed nuclei inpainting mechanism. However, some background materials are labeled as false positive predictions in $M_{aux}$. Directly inpainting them makes the texture and appearance of the synthesized images unrealistic, and enlarges the domain gap between the synthesized and real images. Nevertheless, the image-level adaptation is able to address this issue by alleviating the domain bias on global visual information, such as curve, texture, and illumination. Our nuclei inpainting mechanism is time-efficient, which takes second to process one single synthesized histopathology patch, on average.
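The auxiliary-mask step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation. It assumes 8-bit grayscale intensities and that nuclei are darker than the background, so the Otsu foreground is the set of pixels at or below the threshold. The function names `otsu_threshold` and `auxiliary_mask` are hypothetical. The inpainting step itself is not shown; OpenCV's `cv2.inpaint` with the `cv2.INPAINT_TELEA` flag is one existing implementation of Telea's fast marching method.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold t for 8-bit intensities (class 0 is <= t)."""
    hist = [0] * 256
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # no pixels in the lower class yet
        if w1 == 0:
            break  # all pixels are in the lower class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance (up to a constant factor).
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def auxiliary_mask(image, annotation):
    """Label pixels that Otsu marks as nuclei but the annotation does not.

    Assumption: nuclei are darker than the background, so the Otsu
    foreground is the set of pixels with intensity <= threshold.
    """
    flat = [v for row in image for v in row]
    t = otsu_threshold(flat)
    return [
        [1 if (v <= t and a == 0) else 0 for v, a in zip(img_row, ann_row)]
        for img_row, ann_row in zip(image, annotation)
    ]


# Toy example: two dark blobs, only the first one is annotated.
image = [[10, 200, 10], [200, 200, 200]]
annotation = [[1, 0, 0], [0, 0, 0]]
print(auxiliary_mask(image, annotation))  # [[0, 0, 1], [0, 0, 0]]
```

Only the unannotated dark pixel is kept in the auxiliary mask, which is the region a subsequent inpainting step would fill with background-like values.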

3.3 Panoptic Level Domain Adaptation

We define the semantic-level features of an image as the relationship between its foreground and background. In addition to the image- and feature-level domain bias, domain shift also exists at the semantic level. Due to the differences in the nuclei objects and background between the synthesized and real histopathology images, the domain adaptive Mask R-CNN described in Sec. 3.1 suffers from domain bias in the semantic-level features, as Mask R-CNN only focuses on the local features of each object and lacks a semantic view of the whole image. Inspired by the previous panoptic segmentation architecture, which unifies semantic and instance segmentation to process the global and local features of the images, we propose a semantic-level adaptation to induce the model to learn domain-invariant features based on the relationship between the foreground and background. By incorporating the semantic- and instance-level adaptation, our panoptic domain adaptive method reduces the cross-domain discrepancies from both a global and a local view.

As shown in Fig. 3, a semantic branch for semantic segmentation prediction is added to the output of the FPN. Our semantic branch has the same implementation as [kirillov2019panopticfpn]. As fluorescence microscopy images and histopathology images can both be acquired from tissue samples and can show complementary and correlated information, the semantic segmentation label spaces of the synthesized and real histopathology images have a strong similarity. In addition, aligning the cross-domain entropy distributions helps to minimize the prediction entropy in the target domain, which makes the model suitable for the target images [vu2019advent]. Therefore, we use the Shannon entropy [shannon1948mathematical] of the softmax semantic predictions to induce the model to learn domain-invariant features at the semantic level. Denoting the softmax semantic prediction as $P$, its Shannon entropy is defined as $-P \log(P)$, computed element-wise.
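To make the entropy input of the semantic-level discriminator concrete, the sketch below computes the element-wise term $-p \log p$ for the class probabilities of a single pixel. It is an illustration in plain Python under the assumption that the entropy is taken element-wise on softmax outputs with the natural logarithm; the helper names `softmax` and `entropy_terms` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over the class logits of one pixel."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def entropy_terms(probs):
    """Element-wise -p * log(p), with 0 * log(0) treated as 0."""
    return [-p * math.log(p) if p > 0.0 else 0.0 for p in probs]


# A maximally uncertain two-class pixel: the terms sum to log(2).
uncertain = entropy_terms(softmax([0.0, 0.0]))
print(sum(uncertain))

# A fully confident pixel contributes no entropy.
confident = entropy_terms([1.0, 0.0])
print(sum(confident))
```

Confident predictions yield values near zero and ambiguous predictions yield larger values, so the resulting map highlights where the segmentation is uncertain, which is the signal the discriminator is asked to align across domains.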

Fig. 3 and Table 2 indicate the detailed structure of the discriminator for semantic-level adaptation. We employ residual connected CNN blocks to avoid gradient vanishing [he2016deep, he2016identity]. To make the adversarial learning more stable, instead of bilinear interpolation, we use stride convolutional layers for upsampling. Finally, the domain label is predicted as a patch. Because the mini-batch size is small, the patch-based domain label prediction increases the number of training samples, which helps to avoid overfitting.

Name Hyperparameters Output size
R11 and R12
R21 and R22
R31 and R32
R41 and R42
Table 2: The parameters for each block in the semantic-level discriminator for PDAM. The hyperparameters follow the same convention as in Table 1.

3.4 Task Re-weighting Mechanism

In the previous UDA methods, the task-specific loss functions (segmentation, classification, and detection) are based on the source domain predictions. Even though several adversarial domain discriminators are employed to ensure that the predicted feature maps are domain-invariant, the cross-domain discrepancies of these feature maps are still large in some training iterations, where the features are far from the decision boundaries of the domain discriminators. If the task-specific losses are used to optimize the model with these easily distinguished features, the model will be biased towards the source images when tested on target data. To this end, we propose a task re-weighting mechanism to add a trade-off weight for each task-specific loss function according to the prediction of the domain discriminator. Denote the probabilities of the feature map before the final task prediction belonging to the source and target domains as $p_s$ and $p_t$, respectively, and the task-specific loss function as $L$. The re-weighted task-specific loss $L_{rw}$ is then:

$$L_{rw} = \min\left(\frac{p_t}{p_s}, \beta\right) L = \min\left(\frac{1 - p_s}{p_s}, \beta\right) L \quad (3)$$

where $\beta$ is a threshold value that prevents $\frac{p_t}{p_s}$ from becoming large and making the model collapse when $p_s \to 0$. According to Eq. 3, if the feature map deciding the task prediction belongs to the source domain ($p_s \to 1$), the loss function is down-weighted, to alleviate the source-biased feature learning of the model. As illustrated in Fig. 3, the loss functions for the region proposal network (RPN), the semantic branch, and the instance branch are re-weighted by the predictions of the image-, semantic-, and instance-level domain discriminators, respectively.
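The re-weighting rule can be illustrated with a short plain-Python sketch. It assumes that the trade-off weight is the ratio of the target probability to the source probability, clipped at a threshold, and that the two probabilities sum to one. The function names and the default threshold of 2.0 are hypothetical values chosen for illustration, not settings confirmed by this section.

```python
def reweight_factor(p_source, beta):
    """Clipped ratio of target to source probability, with p_target = 1 - p_source."""
    if not 0.0 <= p_source <= 1.0:
        raise ValueError("p_source must lie in [0, 1]")
    if p_source == 0.0:
        return beta  # the ratio is unbounded, so the threshold applies
    return min((1.0 - p_source) / p_source, beta)


def reweighted_loss(task_loss, p_source, beta=2.0):
    """Scale a task loss by the re-weighting factor (beta is a hypothetical default)."""
    return reweight_factor(p_source, beta) * task_loss


# Features the discriminator cannot place keep their loss unchanged.
print(reweighted_loss(1.0, p_source=0.5))  # 1.0
# Clearly source-like features are down-weighted.
print(reweighted_loss(1.0, p_source=0.9) < 1.0)  # True
# Target-like features are up-weighted, but only up to the threshold.
print(reweighted_loss(1.0, p_source=0.1))  # 2.0
```

In an automatic differentiation framework, the factor would presumably be treated as a constant so that no gradient flows through the discriminator output into the task loss; that detail is an assumption here.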

3.5 Network Overview and Training Details

In our proposed CyC-PDAM, the CycleGAN has the same implementation as its original work [zhu2017unpaired]. When training the CycleGAN, the initial learning rate was set to for the first of the total training iterations, and linearly decayed to for the other .

Each PDAM training batch contains two images, one from the source domain and the other from the target domain. Due to the small batch size, we replace traditional batch normalization layers with group normalization [wu2018group] layers, with the default group number as in [wu2018group].

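The overall loss function of PDAM is defined as:

$$L_{pdam} = \alpha_{img} L_{rpn} + \alpha_{ins} L_{det} + \alpha_{sem} L_{seg} + \alpha_{da} (L_{img\text{-}da} + L_{sem\text{-}da} + L_{ins\text{-}da}) \quad (4)$$

where $L_{rpn}$ is the loss function for the RPN, $L_{det}$ is the loss of class, bounding box, and instance mask prediction of Mask R-CNN, $L_{seg}$ is the cross entropy loss for semantic segmentation, and $L_{img\text{-}da}$, $L_{sem\text{-}da}$, and $L_{ins\text{-}da}$ are cross entropy losses for domain classification at the image, semantic, and instance levels. $\alpha_{img}$, $\alpha_{sem}$, and $\alpha_{ins}$ are calculated according to Eq. 3 for task re-weighting. In our experiment, we set as . The weight $\alpha_{da}$ of the domain classification losses is updated according to the training progress and is gradually increased during training, to avoid the noise from the unstable domain discriminators in the early training stage.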
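A progress-dependent weight of this kind is often implemented with the sigmoid-shaped ramp popularized by domain-adversarial training [ganin2014unsupervised]. The sketch below uses that ramp as an assumption; it may differ from the exact schedule used here. The steepness `gamma=10.0`, the function names, and the way the losses are combined (re-weighted task losses plus one shared weight on the summed domain losses) are all illustrative choices.

```python
import math


def domain_loss_weight(progress, gamma=10.0):
    """Sigmoid-shaped ramp from 0 towards 1 as progress goes from 0 to 1.

    Assumption: this is the schedule commonly paired with gradient reversal
    layers; gamma is a hypothetical steepness value.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must lie in [0, 1]")
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


def total_loss(task_losses, task_weights, domain_losses, progress):
    """Re-weighted task losses plus the ramped sum of domain losses."""
    if len(task_losses) != len(task_weights):
        raise ValueError("one weight is needed per task loss")
    task_term = sum(w * l for w, l in zip(task_weights, task_losses))
    return task_term + domain_loss_weight(progress) * sum(domain_losses)


# At the start of training the domain losses are switched off.
print(total_loss([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [5.0, 5.0, 5.0], progress=0.0))  # 6.0
# Late in training they contribute almost fully.
print(domain_loss_weight(1.0) > 0.999)  # True
```

The ramp keeps the adversarial signal small while the discriminators are still unreliable and lets it dominate later, which matches the stated motivation of avoiding noise in the early training stage.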

During training, the PDAM is optimized by SGD, with a weight decay of and a momentum of . The initial learning rate is , with linear warming up in the first iterations. The learning rate is then decreased to when it reaches of the total training iteration. During inference, only the original Mask R-CNN architecture is used with the adapted weights, and all of the hyperparameters for testing are fine-tuned on the validation set. All of our experiments were implemented with PyTorch [paszke2017automatic], on two NVIDIA GeForce 1080Ti GPUs.
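The shape of the learning rate schedule described above (linear warm-up followed by a single step decay) can be sketched as follows. Every number in the usage example is a hypothetical placeholder, since the actual values are not available in this text, and the warm-up is assumed to rise linearly from a fraction of the base rate.

```python
def learning_rate(step, total_steps, base_lr, warmup_steps, decay_fraction, decayed_lr):
    """Linear warm-up, then base_lr, then decayed_lr after a fraction of training.

    All arguments are hypothetical placeholders for illustration.
    """
    if step < warmup_steps:
        # Rise linearly and reach base_lr at the last warm-up step.
        return base_lr * (step + 1) / warmup_steps
    if step >= decay_fraction * total_steps:
        return decayed_lr
    return base_lr


# Hypothetical settings: 1000 steps, 10 warm-up steps, decay at 75% of training.
for step in (0, 9, 500, 750):
    print(step, learning_rate(step, 1000, 0.1, 10, 0.75, 0.01))
```

In PyTorch, a comparable schedule could be expressed with `torch.optim.lr_scheduler.LambdaLR`, although whether the original experiments used that utility is not stated.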

4 Experiments

Methods AJI Pixel-F1 Object-F1 AJI Pixel-F1 Object-F1
CyCADA [hoffman2017cycada]
Chen et al. [chen2018domain]
SIFA [chen2019synergistic]
DDMRL [kim2019diversify]
Hou et al. [hou2019robust]
Table 3: Comparison with other unsupervised methods on the two histopathology datasets.

4.1 Datasets Description and Evaluation Metrics

Our proposed architecture was validated on three public datasets, referred to as Kumar [kumar2017dataset], TNBC [naylor2018segmentation], and BBBC039V1 [ljosa2012annotated], respectively. Among them, Kumar and TNBC are histopathology datasets, while BBBC039V1 is a fluorescence microscopy dataset. Kumar was acquired from The Cancer Genome Atlas (TCGA) at magnification, containing annotated patches from whole slide images of different patients. All these images are from different hospitals and different organs (breast, liver, kidney, prostate, bladder, colon, and stomach). In contrast to the disease variability in Kumar, the TNBC dataset focuses specifically on Triple-Negative Breast Cancer (TNBC) [naylor2018segmentation]. In TNBC, there are annotated patches from different patients from the Curie Institute at magnification. BBBC039V1 consists of images of U2OS cells from a high-throughput chemical screen [ljosa2012annotated]. It contains images about bioactive compounds, with the DNA channel staining of a single field of view.

For evaluation, we employ three commonly used pixel- and object-level metrics. Aggregated Jaccard Index (AJI) is an extended Jaccard Index for object-level evaluation, and the object-level F1 score is the average harmonic mean between the precision and recall for each object. For pixel-level evaluation, we employ the pixel-level F1 score for binarization predictions.
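For readers unfamiliar with these metrics, the sketch below computes the pixel-level F1 score and a common formulation of AJI in plain Python. Masks and objects are represented as sets of `(row, col)` pixel coordinates, which is a simplification chosen for illustration. The AJI variant shown matches each ground-truth object to the prediction with the highest IoU and adds unmatched predictions to the union; evaluation code used in practice may handle edge cases differently.

```python
def pixel_f1(pred, gt):
    """F1 between two binary masks given as sets of foreground pixels."""
    if not pred and not gt:
        return 1.0
    true_pos = len(pred & gt)
    # 2TP / (2TP + FP + FN) equals 2TP / (|pred| + |gt|).
    return 2.0 * true_pos / (len(pred) + len(gt))


def aggregated_jaccard_index(gt_objects, pred_objects):
    """AJI over lists of objects, each a set of pixel coordinates."""
    used = set()
    inter_sum, union_sum = 0, 0
    for g in gt_objects:
        best_j, best_iou = None, 0.0
        for j, s in enumerate(pred_objects):
            inter = len(g & s)
            if inter == 0:
                continue
            iou = inter / len(g | s)
            if iou > best_iou:
                best_j, best_iou = j, iou
        if best_j is None:
            union_sum += len(g)  # missed object only enlarges the union
        else:
            s = pred_objects[best_j]
            inter_sum += len(g & s)
            union_sum += len(g | s)
            used.add(best_j)
    # Predictions never matched to any ground-truth object are penalized.
    for j, s in enumerate(pred_objects):
        if j not in used:
            union_sum += len(s)
    return inter_sum / union_sum if union_sum else 0.0


gt = [{(0, 0), (0, 1)}]
pred = [{(0, 1), (0, 2)}]
print(pixel_f1(pred[0], gt[0]))  # 0.5
print(aggregated_jaccard_index(gt, pred))  # one shared pixel over a union of three
```

Unlike the pixel-level F1 score, AJI penalizes merged, split, and spurious objects, which is why it is the more demanding metric for instance segmentation.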

4.2 Experiment Setting

We conducted our experiments on two nuclei segmentation tasks: adapting from BBBC039V1 to Kumar, and from BBBC039V1 to TNBC. As the source domain in both experiments, training images and validation images from BBBC039V1 are used, following the official data split (https://data.broadinstitute.org/bbbc/BBBC039/). The annotations for Kumar and TNBC are not used when training the UDA architecture; they are used only for evaluation.

The preprocessing for source fluorescence microscopy images has four steps. First, all images are normalized into range . Second, patches in size are randomly cropped from the training images, with data augmentation including rotation, scaling, and flipping to avoid overfitting. Third, the patches with fewer than objects are removed. Finally, to better synthesize target-like histopathology images, we invert the pixel values of foreground nuclei and background for all source fluorescence microscopy patches. For validation, images in the BBBC039V1 validation set are transferred to synthesized histopathology images by CycleGAN and the nuclei inpainting mechanism.
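The normalization, filtering, and inversion steps above can be sketched in plain Python as follows. The normalization range is assumed to be [0, 1] and min-max scaling is assumed as the normalization method; neither is confirmed by this text. The function names and the `min_objects` parameter are hypothetical.

```python
def normalize(patch):
    """Min-max scale a 2D patch to [0, 1] (assumed normalization)."""
    flat = [v for row in patch for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in patch]
    return [[(v - lo) / (hi - lo) for v in row] for row in patch]


def invert(patch):
    """Swap bright foreground and dark background for values in [0, 1]."""
    return [[1.0 - v for v in row] for row in patch]


def keep_patch(num_objects, min_objects):
    """Keep only patches with at least min_objects annotated nuclei."""
    return num_objects >= min_objects


# Bright nuclei on a dark background become dark nuclei on a bright background.
patch = [[0, 5, 10]]
print(invert(normalize(patch)))  # [[1.0, 0.5, 0.0]]
```

Inverting the intensities makes fluorescence patches look closer to stained tissue, where nuclei are darker than their surroundings, which gives the image translation step an easier starting point.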

For the Kumar dataset as the target domain, we use the same data split as previous work [kumar2017dataset, naylor2018segmentation], with images for training, and for testing. When training the model, a total of patches in size are randomly cropped from the training histopathology images, with basic data augmentation including flipping and rotation, to avoid overfitting. As for TNBC, we use cases with images for training, and the remaining cases with images for testing. To train the model with TNBC, patches are randomly extracted from the training images with basic data augmentation including flipping and rotation.

4.3 Comparison Experiments

4.3.1 Comparison with Unsupervised Methods

In this section, our proposed CyC-PDAM is compared with several state-of-the-art UDA methods, including CyCADA [hoffman2017cycada], Chen et al. [chen2018domain], SIFA [chen2019synergistic], and DDMRL [kim2019diversify]. As the original CyCADA focuses on classification and semantic segmentation, we extend it with Mask R-CNN for UDA instance segmentation, as described in Sec. 3.1. The method of Chen et al. [chen2018domain] was originally designed for UDA object detection based on Faster R-CNN, by adapting the features at the image and instance levels. For UDA instance segmentation, we replace the original VGG16-based Faster R-CNN with the same Mask R-CNN as in our architecture, and the original image- and instance-level adaptation in [chen2018domain] with ours from Sec. 3.1. SIFA [chen2019synergistic] is a UDA semantic segmentation architecture for CT and MR images, with pixel- and feature-level adaptation. In our experiment, we add the watershed algorithm to separate the touching objects in the semantic segmentation prediction of SIFA, for a fair comparison. DDMRL [kim2019diversify] learns multi-domain-invariant features from various generated domains for UDA object detection, and it is extended to instance segmentation in a similar way as CyCADA [hoffman2017cycada] and Chen et al. [chen2018domain]. In addition, we also compare with Hou et al. [hou2019robust], which is specifically designed for unsupervised nuclei segmentation in histopathology images. They trained a multi-task (segmentation, detection, and refinement) CNN architecture with histopathology images synthesized from randomly generated binary nuclei masks.

Figure 5: Visualization results for the comparison experiments. The first rows are from Kumar dataset, and the last rows are from TNBC.

Table 3 shows that our proposed method outperforms all the comparison methods by a large margin on both histopathology datasets. In addition, a one-tailed paired t-test is employed to verify that all of our improvements are statistically significant, with all the p-values under . Chen et al. [chen2018domain] learns domain-invariant features at the image and instance levels. However, due to the large differences between the fluorescence microscopy and real histopathology images, feature-level adaptation alone is not enough to reduce the domain gap. With pixel-level adaptation on appearance, all the other methods achieve better performance. Compared with the baseline method CyCADA [hoffman2017cycada], our CyC-PDAM has a large improvement of , due to the effectiveness of our proposed nuclei inpainting mechanism, panoptic-level adaptation, and task re-weighting mechanism. SIFA [chen2019synergistic] focuses on domain-invariant features at the image and semantic levels, with a UDA semantic segmentation structure. As there is a large number of nuclei objects in the histopathology images, the effectiveness of SIFA is still limited without any instance-level learning or adaptation. Although DDMRL [kim2019diversify] only adapts the features at the image level, its performance is still at the same level as CyCADA, because it adapts knowledge across various domains. Among all the comparison methods, Hou et al. [hou2019robust] achieves the second-best performance. Due to the effectiveness of panoptic-level feature adaptation and the task re-weighting mechanism, our method still outperforms it under all three metrics, in both experiments. Fig. 5 shows visualization examples of all the comparison methods.
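The paired t-test mentioned above compares per-image scores of two methods on the same test images. The sketch below computes the test statistic and degrees of freedom in plain Python; the scores in the example are made-up numbers for illustration only, and the function name is hypothetical. Converting the statistic to a one-tailed p-value requires the CDF of Student's t distribution, for which `scipy.stats.ttest_rel` with `alternative="greater"` is one readily available option.

```python
import math


def paired_t_statistic(ours, baseline):
    """Return (t, degrees_of_freedom) for paired per-image scores."""
    if len(ours) != len(baseline) or len(ours) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(ours, baseline)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Unbiased sample variance of the paired differences.
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0.0:
        raise ValueError("differences have zero variance")
    return mean / math.sqrt(var / n), n - 1


# Made-up per-image scores, for illustration only.
t_stat, dof = paired_t_statistic([2.0, 3.0, 4.0], [1.0, 1.0, 1.0])
print(t_stat, dof)
```

A large positive statistic indicates that the first method scores consistently higher on the same images, which is the one-tailed alternative being tested.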

4.3.2 Ablation Study

AJI Pixel-F1 Object-F1
w/o NI
w/o TR
w/o SEM
Table 4: Ablation study on BBBC039V1 to Kumar experiment. NI, TR, and SEM represent the nuclei inpainting mechanism, task re-weighting mechanism, and semantic branch, respectively.
AJI Pixel-F1
Methods seen unseen all seen unseen all
CNN3 [kumar2017dataset]
DIST [naylor2018segmentation]
Upper bound [kirillov2019panopticfpn]
Table 5: Comparison experiments between our UDA method and fully supervised methods, for BBBC039V1 to Kumar experiment. For CNN3 and DIST, the results of object-level F1 are unknown.

In order to test the effectiveness of each component in our proposed CyC-PDAM, ablation experiments are conducted on the Kumar dataset. Starting from our CyC-PDAM, we remove in turn the nuclei inpainting mechanism, the task re-weighting mechanism, and the semantic branch for panoptic-level adaptation, and train the ablated models with the same setting and dataset as in Sec. 4.3.1. Table 4 and Fig. 6 show the detailed results of the ablation experiment. As shown in Fig. 6, the method without the nuclei inpainting mechanism (w/o NI) tends to ignore some nuclei, which increases the false-negative predictions. Moreover, we notice that there are also falsely split and merged predictions for the w/o NI model. This is because the additional false-negative predictions distort the spatial distribution of the objects, which further affects the effectiveness of the semantic-level adaptation. Among the predictions of the method without the task re-weighting mechanism (w/o TR), there exist some objects with irregular sizes. The task re-weighting mechanism prevents the model from being influenced by the domain-specific features of the source domain, and removing it therefore incurs source-biased predictions. Compared with our method, the model without the semantic branch (w/o SEM) is not able to learn domain-invariant features at the semantic level, including the spatial distribution of the nuclei objects and the detailed information in the background. Therefore, there remain not only falsely split and merged predictions, but also false-positive and imperfect segmentation results. As shown in Table 4, the segmentation accuracy under three metrics decreases by after removing each module. In addition, the one-tailed paired t-test is employed to calculate the p-value between our proposed method and the other ablated methods. After adding each of the three modules, the improvements are statistically significant (), which further demonstrates the effectiveness of our proposed method.

Figure 6: Visualization results for the ablation experiment. NI: nuclei inpainting mechanism; TR: task re-weighting mechanism; SEM: semantic branch.

4.3.3 Comparison with Fully Supervised Methods

As our data split for the Kumar dataset is the same as that of several state-of-the-art methods for fully supervised nuclei segmentation, we compare their originally reported results with ours. Table 5 presents the comparison between our proposed UDA architecture and the fully supervised methods. CNN3 [kumar2017dataset] is a contour-based nuclei segmentation architecture, which treats nuclei boundaries as a third class, in addition to the foreground and background classes. DIST [naylor2018segmentation] is a regression model based on the distance map. Panoptic FPN [kirillov2019panopticfpn] is trained directly on the same set of real histopathology patches as CNN3 and DIST, and serves as the upper bound of our unsupervised method. The testing images for Kumar are divided into two subsets: one contains images from organs known to the training set, referred to as seen, and the other contains images from organs unknown to the training set, referred to as unseen.
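To make the three-class formulation concrete, the sketch below derives a background / interior / boundary target from an instance label map. It is only one simple way to build such a target and is not necessarily how CNN3 constructs its boundary class. The function name and the 4-neighbourhood rule are assumptions made for illustration.

```python
def three_class_map(label_map):
    """Convert an instance label map (0 = background) into class ids:
    0 = background, 1 = nucleus interior, 2 = nucleus boundary."""
    height = len(label_map)
    width = len(label_map[0]) if height else 0
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            inst = label_map[r][c]
            if inst == 0:
                continue  # background stays 0
            boundary = False
            # A nucleus pixel is a boundary pixel if any 4-neighbour lies
            # outside the image or carries a different label.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                outside = not (0 <= rr < height and 0 <= cc < width)
                if outside or label_map[rr][cc] != inst:
                    boundary = True
                    break
            out[r][c] = 2 if boundary else 1
    return out
```

With this rule, two touching nuclei are separated by boundary pixels on both sides. A semantic model trained on such a target can therefore be post-processed into instances by taking the connected components of the interior class.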

As shown in Table 5, the performance of our proposed UDA architecture is superior to that of the fully supervised CNN3 and DIST. This is because our proposed method is able to process each ROI at the local level, while CNN3 and DIST only process the image at a global semantic level. By adapting the semantic-level features of the foreground and the background, our method reaches the same level as the fully supervised Panoptic FPN in pixel-level F1-score. Even though our AJI is slightly lower than that of the fully supervised Panoptic FPN, we notice that our method works better on the unseen testing set. This is because our proposed CyC-PDAM focuses on learning domain-invariant features and avoids being influenced by the domain bias of testing images from unseen organs. These results show that, although there remain large differences between fluorescence microscopy images and histopathology images, our proposed UDA architecture successfully narrows the domain gap between them, and even outperforms some fully supervised methods that require histopathology nuclei annotations.
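For reference, the two metrics discussed above can be sketched in plain Python. The AJI function follows the commonly used definition of the Aggregated Jaccard Index introduced together with the Kumar dataset. It is an illustrative sketch rather than the evaluation code behind Table 5, and the function names and the list-of-lists label map format (0 = background, positive integers = instance ids) are assumptions.

```python
from collections import defaultdict


def instance_pixels(label_map):
    """Map each non-zero instance id to its set of (row, col) pixels."""
    pixels = defaultdict(set)
    for r, row in enumerate(label_map):
        for c, inst in enumerate(row):
            if inst != 0:
                pixels[inst].add((r, c))
    return dict(pixels)


def pixel_f1(gt, pred):
    """Pixel-level F1 (Dice) between the two foreground masks."""
    gt_fg = {(r, c) for r, row in enumerate(gt) for c, v in enumerate(row) if v}
    pr_fg = {(r, c) for r, row in enumerate(pred) for c, v in enumerate(row) if v}
    if not gt_fg and not pr_fg:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2 * len(gt_fg & pr_fg) / (len(gt_fg) + len(pr_fg))


def aji(gt, pred):
    """Aggregated Jaccard Index between two instance label maps."""
    gt_inst = instance_pixels(gt)
    pr_inst = instance_pixels(pred)
    used = set()
    inter_sum = 0
    union_sum = 0
    for g in gt_inst.values():
        # Match each ground-truth nucleus to the prediction with highest IoU.
        best_iou, best_id = 0.0, None
        for pid, p in pr_inst.items():
            inter = len(g & p)
            if inter == 0:
                continue
            iou = inter / len(g | p)
            if iou > best_iou:
                best_iou, best_id = iou, pid
        if best_id is None:
            union_sum += len(g)  # missed nucleus only enlarges the union
        else:
            matched = pr_inst[best_id]
            inter_sum += len(g & matched)
            union_sum += len(g | matched)
            used.add(best_id)
    # Predictions never matched to any nucleus are penalised in the union.
    for pid, p in pr_inst.items():
        if pid not in used:
            union_sum += len(p)
    return inter_sum / union_sum if union_sum else 1.0
```

Unlike pixel-level F1, AJI penalises missed nuclei, unmatched predictions, and falsely merged or split instances. This is why a method can match another in pixel-level F1 while differing from it in AJI.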

5 Conclusion

In this work, we propose a CyC-PDAM architecture for UDA nuclei segmentation in histopathology images. We first design a baseline architecture for UDA instance segmentation, including appearance-, image-, and instance-level adaptation. Next, a nuclei inpainting mechanism is designed to remove the auxiliary objects in the synthesized images, which further avoids false-negative predictions. For feature-level adaptation, a semantic branch is proposed to adapt the features with respect to the foreground and background. Incorporating both semantic- and instance-level adaptation enables the model to learn domain-invariant features at the panoptic level. In addition, a task re-weighting mechanism is proposed to reduce the bias towards the source domain. Extensive experiments on three public datasets indicate that our proposed method outperforms the state-of-the-art UDA methods by a large margin and reaches the same level as the fully supervised methods. From a broader perspective, UDA instance segmentation problems are not limited to histopathology image analysis. Given that its performance in this work is close to that of fully supervised methods, we suggest that our proposed method can also contribute to other general image analysis applications.