Domain Adversarial RetinaNet as a Reference Algorithm for the MItosis DOmain Generalization (MIDOG) Challenge

by Frauke Wilm, et al.
Technische Hochschule Ingolstadt

Assessment of the Mitotic Count is known to suffer from a high degree of intra- and inter-rater variability. Computer-aided systems have proven to decrease this variability and to reduce labelling time. These systems, however, are generally highly dependent on their training domain and show poor applicability to unseen domains. In histopathology, such domain shifts can result from various sources, including different slide scanning systems used to digitize histologic samples. The MItosis DOmain Generalization (MIDOG) challenge focuses on this specific domain shift for the task of mitotic figure detection. This work presents a mitotic figure detection algorithm developed as a baseline for the challenge, based on domain adversarial training. On the preliminary test set, the algorithm scores an F_1 score of 0.7514.







A well-established method of assessing tumor proliferation is the Mitotic Count (MC) [meuten2016] - a quantification of mitotic figures in a selected field of interest. Identifying mitotic figures, however, is prone to a high level of intra- and inter-observer variability [aubreville2020]. Recent work has shown that deep learning-based algorithms can guide pathologists during MC assessment and lead to faster and more accurate results [aubreville2020]. These algorithmic solutions, however, are highly domain-dependent, and performance decreases significantly when applying these algorithms to data from unseen domains [lafarge2017]. In histopathology, domain shifts are oftentimes attributed to varying sample preparation or staining protocols used at different laboratories. These sources of domain shift have been approached with a wide range of strategies, e.g. stain normalization [macenko2009], stain augmentation [tellez2018] and domain adversarial training [lafarge2017]. Domain shifts, however, cannot only be attributed to staining variations but can also include variations induced by different slide scanners [aubreville2021]. The MItosis DOmain Generalization (MIDOG) challenge [midog], hosted as a satellite event of the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2021, addresses this topic in the form of assessing the MC on a multi-scanner dataset. This work presents the reference algorithm developed out-of-competition as a baseline for the MIDOG challenge. The RetinaNet-based architecture was trained in a domain adversarial fashion and scored an F_1 score of 0.7514 on the preliminary test set.

Material and Methods

The reference algorithm was developed on the official training subset of the MIDOG dataset. We did not use any additional datasets and had no access to the preliminary test set during method development. The algorithm is based on a publicly available implementation of RetinaNet [marzahl2020], which was extended by a domain classification path to enable domain adversarial training.


The MIDOG training subset consists of 200 Whole Slide Images (WSIs) from human breast cancer tissue samples stained with routine Hematoxylin & Eosin (H&E) dye. The samples were digitized with four slide scanning systems: the Hamamatsu XR NanoZoomer 2.0, the Hamamatsu S360, the Aperio ScanScope CS2 and the Leica GT450, resulting in 50 WSIs per scanner. For the slides of three scanners, a selected field of interest (equivalent to ten high power fields) was annotated for mitotic figures and hard negative look-alikes. These annotations were collected in a multi-expert blinded set-up. For the Leica GT450, no annotations were available. The preliminary test set consists of five WSIs each for four undisclosed slide scanning systems, of which only two were also part of the training set. This preliminary test set was used for evaluating the algorithms prior to submission and for publishing preliminary results on a leaderboard. The final test set consists of 20 additional WSIs from the same four scanners used for the preliminary test set. The evaluation through a Docker-based submission system ensured that the participants had no access to the (preliminary) test images during method development.

Domain Adversarial RetinaNet

For the domain adversarial training, we customized a publicly available RetinaNet implementation [marzahl2020] by adding a Gradient Reversal Layer (GRL) and a domain classifier. For our encoder, we used a ResNet18 backbone pre-trained on ImageNet. For the domain discriminator, we were inspired by the work of Pasqualino et al. [pasqualino2021] and likewise chose a sequence of three blocks consisting of a convolutional layer, batch normalization, ReLU activation and dropout, followed by an adaptive average pooling and a fully connected layer. We experimented with varying the number and positions of domain classifiers but ultimately decided on positioning a single discriminator at the bottleneck of the encoding branch.
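A minimal PyTorch sketch of this component is given below. The GRL follows the standard custom-autograd pattern; the channel count, dropout rate and number of domains are illustrative assumptions, not values taken from the paper.

```python
import torch
from torch import nn


class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; negates and scales the gradient
    in the backward pass (standard GRL pattern)."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse the gradient and scale it by the weighting factor.
        return grad_output.neg() * ctx.lambd, None


class DomainClassifier(nn.Module):
    """Three conv/BN/ReLU/dropout blocks, adaptive average pooling and
    a fully connected layer, as described in the text. Channel width,
    dropout probability and num_domains are assumptions."""

    def __init__(self, in_channels=512, num_domains=4, p_drop=0.5):
        super().__init__()
        blocks = []
        for _ in range(3):
            blocks += [
                nn.Conv2d(in_channels, in_channels, 3, padding=1),
                nn.BatchNorm2d(in_channels),
                nn.ReLU(inplace=True),
                nn.Dropout(p_drop),
            ]
        self.features = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, num_domains)

    def forward(self, x, lambd=1.0):
        x = GradientReversal.apply(x, lambd)
        x = self.features(x)
        return self.fc(self.pool(x).flatten(1))
```

Attached at the encoder bottleneck, the reversed gradient pushes the backbone towards features from which the scanner domain cannot be predicted.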

Figure 1 schematically visualizes the modified RetinaNet architecture.

Figure 1: Domain adversarial RetinaNet architecture (ResNet encoder, Feature Pyramid Network, class + box subnets).

Network Training

We split our training data into 40 training and ten validation WSIs per scanner and ensured a similar distribution of samples with a high and a low density of mitotic figures in each subset. For network training, we used a patch size of 512 × 512 pixels and a batch size of 12, with each batch containing three patches from each scanner. To overcome class imbalance, we employed a custom patch sampling scheme, where half of the training patches were sampled randomly from the slides and the other half was sampled within a 512-pixel radius around a randomly chosen mitotic figure. Furthermore, we performed online data augmentation with random flipping, affine transformations and random lighting and contrast changes. We trained the network with a cyclical learning rate schedule for 200 epochs until convergence.

For loss computation, we calculated the standard RetinaNet loss as the sum of the bounding box regression loss and the instance classification loss and added the domain classification loss. Both classification losses (instance and domain) were calculated using the Focal Loss. For patches of the Leica scanner, which were not annotated, only the domain classification loss was considered. During backpropagation, the GRL negates the gradient and multiplies it by a weighting factor, which was gradually increased from 0 to 1 during training. Model selection was guided by the highest performance on the validation set as well as the highest domain confusion, i.e. the highest domain classification loss, to ensure domain independence of the computed features.
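The ramp-up of the GRL weighting factor and the loss masking for unannotated Leica patches can be sketched as follows. The text only states a gradual increase from 0 to 1, so the sigmoid schedule below (the shape popularized by the original DANN work) is an assumption:

```python
import math


def grl_weight(epoch, total_epochs=200):
    """GRL weighting factor, ramped from 0 towards 1 over training.
    The sigmoid shape is an assumption; the paper only states a
    gradual 0-to-1 increase."""
    p = epoch / total_epochs
    return 2.0 / (1.0 + math.exp(-10.0 * p)) - 1.0


def total_loss(box_loss, cls_loss, dom_loss, annotated):
    """Combine the RetinaNet losses with the domain classification
    loss; unannotated (Leica) patches contribute only the domain
    term, as described in the text."""
    detection = (box_loss + cls_loss) if annotated else 0.0
    return detection + dom_loss
```

Note that the weighting factor acts inside the GRL's backward pass, so the domain classifier itself always trains at full strength while the reversed signal reaching the encoder is scaled.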

Evaluation and Results

The training procedure elaborated above was repeated three times, and the validation slides of the three annotated scanners were used for performance assessment. To compare results across different model operating points, we constructed precision-recall curves and compared the area under the precision-recall curve (AUCPR) averaged over the three scanners for which mitotic figure annotations were available. As our final model, we selected the model with the highest mean AUCPR on the validation set and selected the operating point according to the highest mean F_1 score. Figure 2 shows the AUCPRs of the final model, with a mean AUCPR of 0.7964 and an F_1 score of 0.7533 at an operating point of 0.62. When integrating the selected model into a submission Docker container and evaluating it on the preliminary test set, we scored a mean F_1 score of 0.7514, resulting from a precision of 0.6939 and a recall of 0.8193.
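As a sanity check, the reported test-set precision and recall are consistent with the reported F_1 score, since F_1 is the harmonic mean of the two:

```python
def f1_score(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Reported preliminary test-set values from the text.
print(round(f1_score(0.6939, 0.8193), 4))  # 0.7514
```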

Figure 2: Validation area under the precision-recall curve per scanner.

Discussion and Conclusion

In this work, we presented our baseline algorithm for the MIDOG challenge, based on domain adversarial training. With an F_1 score of 0.7514 on the preliminary test set, the algorithm is in line with previous mitotic figure algorithms trained and tested on breast cancer images from the same domain [bertram2020]. The similar F_1 scores on the validation and preliminary test sets indicate a successful domain generalization of the proposed network. The code used for training the network will be made publicly available in our GitHub repository after the final submission deadline.