In computer vision, data shift (i.e. a shift in the data distribution) has proven to be a major barrier to safe and robust real-world deep learning applications, such as computer vision for autonomous vehicles, pose estimation, and medical image segmentation and classification.
Among medical images, histopathological images are tissue sections stained and analyzed under a microscope by a pathologist to highlight tissue features related to disease. These images are the gold standard for cancer diagnosis and still have huge potential to improve clinical practice. Some data shifts for histopathological images are known, such as differences in acquisition device parameters, differences in staining, and the multitude of parameters in the different steps of histopathology slide preparation. However, risk may also come from types of data shift that are not yet known.
Domain Adversarial (DA) training has proven effective against data shifts, notably in histopathological images. However, the classic DA training configuration requires a sample from the targeted source of data (e.g. data from a new laboratory). This is generally not a problem, as it only requires picking samples from the new environment and training with DA, without requiring these new data to be labeled. However, in clinical applications it is generally not possible to fine-tune a posteriori, as this would require clinically validating the model again; clinical applications need proven robustness to satisfy regulatory requirements.
In light of this DA technique and the robustness requirement of medical applications, an important question is whether we can use data with different semantics (e.g. flower images and vehicle images have different semantics; prostate cancer and lung cancer images have close but still different semantics) as DA data. For example, can we use a multi-source dataset of prostate cancer for DA training while running a lung cancer classification task? In other words, we question the transferability of the domain adversarial process across tasks and image semantics. Transferability of DA training would imply that any task could gain generalization from a large dataset not only through classical transfer learning but also through DA transfer.
In this paper, we first investigate to what extent DA methods are beneficial and whether they can be deleterious. We then investigate to what extent DA methods can be transferred across datasets of different semantics (inter-semantic domain adversarial). Our contributions are:
We analyze DA efficiency by describing 3 effects (Figure 4): the cost (i.e. the negative difference in accuracy from the baseline due to DA training), the degradation (i.e. the negative difference in accuracy from the baseline due to data shift), and the gain (i.e. the positive difference from the baseline, after accounting for cost and degradation, due to the consistency between the data shift and the DA). We further combine these effects in a regression model (see Overview of our approach) and show that misuse of an artificial domain shift such as color shift can be deleterious.
We test to what extent DA training can be effective when the main task datasets and domain adversarial datasets are of different semantics. We show that DA can be transferred inter-semantically and that a small shift intensity is sufficient to prevent most of the performance degradation due to data shift.
II Background and Related Work
In this section, we review methods that have been described to increase robustness and generalization under data shift.
Among these methods, we find:
Stain and brightness normalization: this technique requires inferring a source and a target stain, then transforming the input image. It is easy to implement but can hurt performance if the stain inference step for the source image is not robust enough; Ren et al. provided a solution using ensembles. Stain normalization can provide precise information about the stainings that can later be used for quality control.
Domain Adversarial: a domain adaptation method where a DA branch is added after a feature extractor using a Gradient Reversal Layer, which prevents the top layer of the feature extractor from containing information about the domain that is considered irrelevant for the main task. DA can be seen as a method for disentangling data shift from informative features. DA training directly targets the model, so prediction time is not modified. Visual domain adaptation methods are reviewed in the literature.
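The gradient reversal layer itself is simple: identity on the forward pass, gradient negation (scaled by a factor lambda) on the backward pass. A minimal numpy sketch in a toy manual-autograd style (the class and names are illustrative, not a real library API):

```python
import numpy as np

class GradientReversal:
    """Identity in the forward pass; multiplies gradients by -lambda in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        # Features pass through unchanged.
        return x

    def backward(self, grad_output):
        # The domain classifier's gradient is reversed before reaching the
        # feature extractor, pushing it toward domain-invariant features.
        return -self.lam * grad_output

grl = GradientReversal(lam=0.5)
x = np.array([1.0, 2.0])
g = np.array([0.1, -0.2])
assert np.allclose(grl.forward(x), x)
assert np.allclose(grl.backward(g), [-0.05, 0.1])
```

In an autograd framework the same behavior is implemented as a custom layer so that the reversal happens automatically during backpropagation.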
III Material and Methods
We used the MNIST and Fashion-MNIST datasets, both composed of 60,000 images of dimensions 28x28 (Figure 6).
We used 2 datasets of histopathological images. The first is CAMELYON, composed of 327,680 color images with dimensions 96x96x3 extracted from histopathological analyses of lymph node sections. Each image is annotated with a binary label indicating the presence of metastatic tissue. The second dataset, which we call TissueNet, is the dataset from the TissueNet challenge. It is composed of more than 5,000 images of uterine cervical tissue from 18 medical centers across France. The images are labeled on 4 levels according to cancer grade:
0 : benign
1 : low malignant potential
2 : high malignant potential
3 : invasive cancer
We used the RandomResizedCrop transformation from the torchvision Python library to generate 60,000 images of dimensions 96x96x3.
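For illustration, the crop-and-resize operation can be sketched in plain numpy (torchvision's RandomResizedCrop additionally jitters the aspect ratio and uses antialiased resampling; this sketch uses nearest-neighbour resizing and a seeded RNG):

```python
import numpy as np

def random_resized_crop(img, out_size, scale=(0.08, 1.0), rng=None):
    """Crop a random square region covering a random fraction of the image
    area, then resize it to (out_size, out_size) with nearest-neighbour
    sampling. Illustrative re-implementation, not torchvision's."""
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = img.shape[:2]
    area = h * w * rng.uniform(*scale)
    side = max(1, min(int(np.sqrt(area)), h, w))
    top = rng.integers(0, h - side + 1)
    left = rng.integers(0, w - side + 1)
    crop = img[top:top + side, left:left + side]
    # Nearest-neighbour resize to the target resolution.
    rows = np.arange(out_size) * side // out_size
    cols = np.arange(out_size) * side // out_size
    return crop[rows][:, cols]

img = np.arange(128 * 128 * 3).reshape(128, 128, 3)
patch = random_resized_crop(img, 96)
assert patch.shape == (96, 96, 3)
```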
III-B Domain Adversarial
The proposed architecture is based on the one described in Ganin et al. It includes a deep feature extractor and a deep label predictor, which together constitute a standard feed-forward architecture such as a CNN. Domain adaptation is achieved by adding a domain classifier connected to the feature extractor via a gradient reversal layer (Figure 1). By adding a domain classifier after the feature extractor, we build the domain adversarial neural network (DANN). The domain classifier is trained with a mix of datasets from different domains (i.e. with different data shifts), labeled with their domain (Figure 3).
III-C Data shift
We tested both destructive shifts (e.g. blur or noise) and domain shifts (e.g. color shift and luminosity shift). The former aim to provide robustness against degraded inputs that can drastically hurt performance when the model was trained on a curated dataset of high-quality images. The latter aim to provide invariance, and therefore robustness, against inputs of the same quality but with a natural data shift.
Blur is performed by convolving the image with a normalized uniform filter. The kernel is given by:

$K = \frac{1}{k^2} J_k$

where $J_k$ is a $k \times k$ matrix of ones.
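A minimal numpy sketch of this uniform blur (direct, unoptimized convolution; a real pipeline would use e.g. scipy.ndimage.uniform_filter):

```python
import numpy as np

def uniform_kernel(k):
    # Normalized k x k uniform kernel: every entry is 1/k^2, so the
    # kernel sums to 1 and preserves the image's mean intensity.
    return np.ones((k, k)) / (k * k)

def blur(img, k):
    """Convolve a 2-D image with the normalized uniform kernel,
    using edge padding to keep the output the same size."""
    kern = uniform_kernel(k)
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kern)
    return out

img = np.zeros((5, 5))
img[2, 2] = 9.0
assert np.isclose(uniform_kernel(3).sum(), 1.0)
assert np.isclose(blur(img, 3)[2, 2], 1.0)  # 9 spread over the 3x3 window
```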
III-C3 Color shift
The color shift is based on the stain normalization method of Reinhard et al. We convert the image to the LAB color space, then match the mean and the standard deviation of the 3 channels:

$I'_c = \left(I_c - \mu(I_c)\right)\frac{\sigma(T_c)}{\sigma(I_c)} + \mu(T_c)$

where $I$ is the original image, $T$ is the target image, $\mu$ is the average function, $\sigma$ is the standard deviation, and $c$ indexes the channels.
More precisely, we use this normalization process as a color shift according to a reference image (Figure 2).
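The per-channel statistic matching can be sketched as follows, operating directly on channel arrays (a full pipeline would first convert RGB to LAB, e.g. with skimage.color.rgb2lab, and convert back afterwards):

```python
import numpy as np

def match_stats(source, target, eps=1e-8):
    """Shift each channel of `source` so its mean and standard deviation
    match those of `target` (Reinhard-style statistic matching)."""
    out = np.empty_like(source, dtype=float)
    for c in range(source.shape[-1]):
        s, t = source[..., c], target[..., c]
        out[..., c] = (s - s.mean()) * (t.std() / (s.std() + eps)) + t.mean()
    return out

rng = np.random.default_rng(0)
src = rng.normal(5.0, 2.0, (32, 32, 3))
tgt = rng.normal(1.0, 0.5, (32, 32, 3))
shifted = match_stats(src, tgt)
# The shifted image inherits the target's per-channel statistics.
assert np.allclose(shifted.mean(axis=(0, 1)), tgt.mean(axis=(0, 1)), atol=1e-6)
assert np.allclose(shifted.std(axis=(0, 1)), tgt.std(axis=(0, 1)), atol=1e-3)
```

Normalizing toward different reference images produces the different color-shift domains used in the experiments.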
The default training configuration requires 4 datasets (only 2 if we do not run DA):
the train and test datasets for the main classification task
the two datasets used to train the DA (generally there is a data shift between these two DA datasets)
Note that the semantics of the datasets for the main classification task and for the DA training can differ. The classification task and the DA are trained simultaneously (Figure 3). Test shift refers to the data shift in the test dataset, and DA shift refers to the data shift between the two DA datasets.
III-E Regression model
We used a regression model in which the data shift and DA effects are described by 4 terms (Figure 3):
The reference is the performance of a raw CNN tested without data shift on the test set.
The degradation term corresponds to the difference in accuracy from the reference due to a data shift in the test dataset.
The cost term corresponds to the difference in accuracy from the reference due to DA training without any degradation.
The gain term corresponds to the difference in accuracy from a model after applying degradation and cost to the reference, such that: accuracy = reference − degradation − cost + gain.
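As an illustration, the gain can be recovered as the residual of this additive decomposition (a minimal sketch; the observed accuracy of 0.70 below is hypothetical, chosen to be consistent with the reference 0.98, degradation 0.78, and gain 0.50 reported in Section IV-A):

```python
def da_gain(observed, reference, degradation, cost):
    """Residual gain under the additive model:
    observed accuracy = reference - degradation - cost + gain."""
    return observed - (reference - degradation - cost)

# Reference 0.98, degradation 0.78, zero cost: a DA model observed at
# 0.70 accuracy corresponds to a gain of 0.50 over the degraded baseline.
assert abs(da_gain(0.70, 0.98, 0.78, 0.0) - 0.50) < 1e-9
```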
IV Experiments and Analyses
IV-A Characterization of the effect of domain adversarial training on model performance against noise perturbation
We first characterized the influence of data shift and DA training on model performance. We used a regression model to link model performance to data shift and DA training. This model is composed of 4 terms: the reference, the degradation due to data shift, the cost due to DA, and the gain due to DA after degradation and cost (see Materials and Methods).
Using the MNIST dataset and noise as the data shift, the reference model reached 0.98 accuracy. We found a sigmoidal relation between noise intensity and performance degradation: accuracy falls to 0.20 (a degradation of 0.78) at a noise intensity of 1.2. We found that DA training had no deleterious impact on model accuracy; therefore, DA has no cost here. We model the bivariate DA gain function as the product of two univariate functions: a Gaussian function of the noise intensity and a quadratic function of the DA intensity (Figure 5).
Interestingly, the gain depends heavily on the data shift intensity but is almost invariant to the DA intensity, suggesting that most of the DA benefit can be obtained at very low DA intensity; this is discussed further in the Discussion. The maximum gain of 0.50 accuracy is reached with an intermediate noise intensity (1.1) and a low DA intensity (0.5).
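The shape of this gain surface can be sketched as the product described above; all parameter values below are hypothetical placeholders, not the fitted coefficients:

```python
import math

def gain_model(noise_intensity, da_intensity, a=0.5, mu=1.1, s=0.3, b=0.1, c=0.02):
    """Sketch of the bivariate gain model: a Gaussian in the test-shift
    (noise) intensity times a quadratic in the DA intensity. The
    parameters a, mu, s, b, c are illustrative, not fitted values."""
    gaussian = a * math.exp(-((noise_intensity - mu) ** 2) / (2 * s ** 2))
    quadratic = 1.0 + b * da_intensity - c * da_intensity ** 2
    return gaussian * quadratic

# The gain peaks near the intermediate noise intensity mu; with a weak
# quadratic term, it varies only slightly with the DA intensity.
assert gain_model(1.1, 0.5) > gain_model(0.3, 0.5)
```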
IV-B Inter-semantic transferability of the domain adversarial training in the MNIST dataset against noise perturbation
Similar tests were performed using Fashion-MNIST instead of MNIST for the DA training, with noise applied as previously. In this configuration, the DA training runs inter-semantically.
DA training on Fashion-MNIST shows results close to the previous experiment (Figure 7). We find a reference accuracy of 0.98, a sigmoidal relation between noise intensity and performance degradation, and a maximum degradation of 0.78 at the maximum noise intensity of 1.2. We find no cost associated with DA training. The gain is also the same as in the configuration with MNIST as the DA datasets: it is modeled by the product of a Gaussian function and a quadratic function and is mainly sensitive to noise intensity. The maximum gain of 0.45 accuracy is reached with an intermediate noise intensity as the test shift (0.9) and an intermediate noise intensity as the DA shift (0.9).
This experiment shows that using DA inter-semantically yields a gain similar to intra-semantic DA. It suggests that DA training for noise is entirely transferable between MNIST and Fashion-MNIST, such that Fashion-MNIST can be used equivalently to MNIST for the DA training in an MNIST classification task.
We performed similar tests using black images (Figure 6) instead of the Fashion-MNIST dataset as DA datasets and found a significant DA gain of 0.27. However, the gain is smaller than in the previous experiments; DA training is therefore only partially transferable from black images to MNIST.
IV-C Characterization of the effect of domain adversarial against blur perturbation
In a second step, we replaced noise with blur as the data shift. Blur is a perturbation that often occurs in digitized histopathological samples. Here, we use MNIST for the train and test datasets and also MNIST for the DA. The reference accuracy is 0.98 and there is no DA cost. There is a sigmoidal relation between blur intensity and performance degradation, and accuracy falls to a minimum of 0.23 (a degradation of 0.75) for a kernel size of 9. Once more, the gain is mainly sensitive to the blur intensity, following a Gaussian function. The maximum gain is 0.25 for a kernel size of 7 in the test dataset and of 3 for the DA (Figure 7).
We then replaced the MNIST dataset with Fashion-MNIST as the DA datasets. We still observe a degradation that can be modeled by a sigmoid function, a constant cost equal to zero, and an important gain of 0.26. With similar results using Fashion-MNIST rather than MNIST in the DA, we showed that inter-semantic DA transferability also applies to a data shift other than noise, here blur.
IV-D Characterization of the effect of domain adversarial training on histopathological datasets against blur perturbation and color shift
We next used the CAMELYON dataset for both the main task and the DA, with blur as the data shift, as blur is often found as a natural perturbation in histopathological images. The reference accuracy is 0.85, and DA has a significant cost of 0.05 while showing no significant gain. This shows that DA can be deleterious depending on the processed dataset and that the gain of DA is not ubiquitous. A possible explanation is that blur erases important information from the histopathological data, so the model cannot recover from this kind of degradation. The maximum blur, with a kernel size of 9x9, produced a degradation of 0.18, making accuracy fall to 0.67 (Figure 8).
We next studied color shift, another common shift in histopathological images, as the data shift with the CAMELYON dataset. As color shift depends on more than one parameter, it is difficult to define a consistent intensity for it. We therefore created datasets of different color-shift domains, showing different degradations, by normalizing the dataset according to different reference images (Figure 2).
The reference has an accuracy of 0.85. The degradation varies from 0.03 to 0.26 depending on the domain shift. The cost varies from 0.0 to 0.10, with a small correlation with the amplitude of the degradation. As the cost is non-zero, it is important to carefully design the DA architecture and training process to prevent a loss of accuracy on the main task. The gain varies from 0.04 to 0.28 and is well correlated with the degradation. This is intuitive: the more performance degrades, the more DA training can help.
IV-E Characterization and inter-semantic transferability of the domain adversarial training on histopathological datasets against color shift
We next used the TissueNet dataset for the DA training instead of the CAMELYON dataset, while keeping the CAMELYON dataset for the main classification task and color shift as the data shift. This configuration is analogous to the previous experiment using MNIST for the main task and Fashion-MNIST for the DA (Figure 9). However, there is no notion of data shift intensity here, because color shift is multidimensional.
In this configuration, the reference accuracy is 0.82. The degradation varies from 0.03 to 0.26 depending on the domain shift. The cost varies from 0.0 to 0.16, with a small correlation with the amplitude of the degradation. The gain varies from 0.02 to 0.28 and is well correlated with the degradation. Values of cost, degradation and gain are very close to the previous intra-semantic configuration, showing that DA training can be efficiently transferred inter-semantically in real histopathological images. Further investigation will be needed to understand whether DA training can be applied to histopathological data with a gain universally greater than the cost, thereby increasing model robustness.
V Discussion and Conclusion
Histopathological data is highly heterogeneous due to the diversity of acquisition devices and the lack of standards, and it is hardly available because many regulatory requirements must be met to access clinical data. Together, lack of availability and heterogeneity are a major barrier to the development of safe and robust models. We therefore developed a strategy to increase robustness by exploiting all available data diversity through domain adversarial methods.
Through systematic analysis of the DA effect on model performance, we found that when DA is efficient, a low DA shift intensity is sufficient to provide most of the possible gain. But DA is not always efficient: blur degradation on histopathological datasets could not be recovered using DA methods. This could be explained by two reasons: first, blur is already present in the original CAMELYON dataset, so DA has no effect; second, blur is a destructive noise that quickly makes classification impossible, because relevant information may lie in high-resolution patterns. Finally, inter-semantic DA transferability is an efficient strategy, as it works with different datasets and even with non-real data: DA with black images showed an effective performance improvement.
In conclusion, DA training is transferable inter-semantically, and the robustness of clinical algorithms can be increased by taking advantage of the heterogeneity of available datasets, whatever their semantic content. Further investigation will be needed to understand when DA training is beneficial and when it is deleterious. Another remaining question is whether DA training can be done with inner inter-semantic datasets (where data of different domains are also of different semantics); in this configuration, DA might erase features that are relevant for the main task. In any case, DA should be used carefully, as it can significantly and negatively affect model performance.
References
- IDDA: a large-scale multi-domain dataset for autonomous driving. IEEE Robotics and Automation Letters 5(4), pp. 5526–5533, 2020. arXiv:2004.08298. https://idda-dataset.github.io/home/
- TissueNet: Detect Lesions in Cervical Biopsies.
- Tailoring automated data augmentation to H&E-stained histopathology.
- Can Autonomous Vehicles Identify, Recover From, and Adapt to Distribution Shifts? Proceedings of the 37th International Conference on Machine Learning, pp. 3145–3153.
- Domain-Adversarial Training of Neural Networks. JMLR 17, 2016. arXiv:1505.07818.
- Image-to-Image Translation with Conditional Adversarial Networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, pp. 5967–5976.
- A Nonlinear Mapping Approach to Stain Normalization in Digital Histopathology Images Using Image-Specific Color Deconvolution. IEEE Transactions on Biomedical Engineering 61(6), pp. 1729–1738, 2014.
- Domain-adversarial neural networks to address the appearance variability of histopathology images. MICCAI 2017 Workshop on Deep Learning in Medical Image Analysis. arXiv:1707.06183.
- 1399 H&E-stained sentinel lymph node sections of breast cancer patients: the CAMELYON dataset. GigaScience 7(6), 2018.
- A method for normalizing histology slides for quantitative analysis. 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Boston, MA, USA, pp. 1107–1110.
- Can we trust deep learning models diagnosis? The impact of domain shift in chest radiograph classification. 2019.
- Abstract 2105: HE2RNA: A deep learning model for transcriptomic learning from digital pathology. Cancer Research 80(16 Supplement), p. 2105, 2020.
- Color Transfer between Images. IEEE Computer Graphics and Applications, 2001.
- Unsupervised Domain Adaptation for Classification of Histopathology Whole-Slide Images. Frontiers in Bioengineering and Biotechnology 7, p. 102, 2019.
- Pix2Pix-based Stain-to-Stain Translation: A Solution for Robust Stain Normalization in Histopathology Images Analysis. 2020 International Conference on Machine Vision and Image Processing (MVIP), pp. 1–7.
- Quantifying the effects of data augmentation and stain color normalization in convolutional neural networks for computational pathology. Medical Image Analysis 58, p. 101544. arXiv:1902.06543.
- Structure-Preserving Color Normalization and Sparse Stain Separation for Histological Images. IEEE Transactions on Medical Imaging 35(8), pp. 1962–1971, 2016.
- Deep Visual Domain Adaptation: A Survey. Neurocomputing 312, pp. 135–153, 2018. arXiv:1802.03601.
- Cross-denoising Network against Corrupted Labels in Medical Image Segmentation with Domain Shift. 2020.
- Keypoint-Graph-Driven Learning Framework for Object Pose Estimation. pp. 1065–1073.
- Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, pp. 2242–2251.