Introduction
Histopathology involves staining patient biopsies for microscopic inspection to identify visual evidence of disease. The most widely used stains in histopathology are Hematoxylin and Eosin (H&E) [3]. Hematoxylin has a deep blue-purple color and stains acidic structures such as nucleic acids (DNA in cell nuclei), while Eosin is red-pink and stains basic structures such as nonspecific proteins in the cytoplasm and the stromal matrix. Staining therefore increases the visibility of the structural artifacts present in the biopsy, making it easier to examine. These biopsies are then digitized using slide scanning machines for future analysis and electronic transmission.
Computer vision is becoming increasingly useful in histology for computer-aided diagnosis and for discovering information about the microscopic cell structures contained within histopathological images. Digitized biopsies, as high-dimensional datasets, have shown tremendous potential for training deep learning algorithms for diagnosis and visual understanding of diseases. Convolutional Neural Networks (CNNs) have been applied to patches of whole slide images (WSIs) to predict the presence of disease, including breast cancer, enteropathies and other gastrointestinal diseases [14, 24, 33]. The performance and fairness of such data-driven methods depend on the data they are trained on. It is therefore critically important that the training data be free of any bias that might skew the models. A common source of such bias is significant variation in the stain color distribution from image to image. This variation arises from discrepancies in the staining process, including tissue preparation, raw materials, manufacturing protocol, and digital scanners, across the different sites where the slides are prepared. Multiple H&E stain distributions within CNN input data can lead to biased predictions in which the results are influenced by color differences rather than by the cell structures of interest for clinical diagnostic interpretation. It also makes it difficult for a trained model to make predictions on tissue with a stain appearance that the model has not been trained on.
To overcome these issues, researchers have developed normalization techniques to convert all input images to an equivalent color distribution. Some of the most popular stain normalization techniques depend on a qualitatively chosen target image that represents an ideal color appearance [15, 9, 28]. The input (source) image is normalized to match the stain profile of the chosen target image. The obvious downside of this approach is that the normalization is highly dependent on the color distribution of a single image. Rather than using just one target image to represent an entire stain distribution, an alternative is to consider an entire set of images that share the same stain distribution as the target domain. A mapping function can then be learned to translate images from a particular source domain to the target domain. The problem can be modelled as an unsupervised image-to-image translation task [13].
Recently, Generative Adversarial Networks (GANs) have demonstrated exceptional results in unpaired image translation tasks [34, 37, 10]. The challenge posed by the stain normalization task, however, is that the translated image must be as finely detailed as the input image and must preserve the microscopic structural properties of the input. Additionally, since biopsy slides can be sourced from multiple sites, the framework should be capable of mapping different stain distributions to a common target distribution.
In this paper, we propose Self-Attentive Adversarial Stain Normalization (SAASN), a novel adversarial approach that can execute many-to-one domain stain normalization. A custom loss function, the structural cycle consistency loss, is designed to make sure that the structure of the image is preserved. Self-attention is used to ensure that highly detailed microscopic features can be synthesized in the image. Our approach and other leading stain normalization techniques are compared on duodenal biopsy image data that was used to diagnose Celiac Disease or Environmental Enteropathy in children. SAASN demonstrated superior performance in preserving the structural integrity of images while transferring the stain distribution from one domain to the other.
Related Work
The earliest methods that attempted stain normalization were primarily simple style transfer techniques. Histogram specification maps the histogram statistics of the source image onto the histogram statistics of the target image [2]. This approach only works well if the target and source images have similar color distributions. Forcing the source image to match the histogram statistics of the target can create artifacts that alter the structural integrity of the image. Color transfer with histogram specification can also be performed in a decorrelated CIELAB (LAB) color space [20]. The LAB color space is designed to approximate the human visual system. For H&E stained histology images, however, the presence or absence of each stain at each pixel should be the most appropriate color representation. Considering this, researchers developed stain normalization methods that outperformed the histogram specification technique by leveraging stain separation.
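As a rough illustration of histogram specification, the sketch below remaps one image channel so that its empirical cumulative distribution follows that of a target channel. It is a minimal NumPy sketch, not the exact algorithm of the cited work; the function name `match_histogram` and the toy arrays are assumptions made for illustration.

```python
import numpy as np

def match_histogram(source, target):
    """Remap `source` values so their empirical CDF follows that of `target`."""
    src_values, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    tgt_values, tgt_counts = np.unique(target.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / source.size
    tgt_cdf = np.cumsum(tgt_counts) / target.size
    # For each source quantile, look up the target value at the same quantile.
    mapped = np.interp(src_cdf, tgt_cdf, tgt_values)
    return mapped[src_idx].reshape(source.shape)

# Toy single-channel images (hypothetical data): a dark source, a bright target.
rng = np.random.default_rng(0)
source = rng.integers(0, 100, size=(32, 32)).astype(float)
target = rng.integers(150, 256, size=(32, 32)).astype(float)
matched = match_histogram(source, target)
```

For an RGB or LAB image the same function would be applied to each channel separately. Because the mapping only looks at intensity ranks, it can place target colors on the wrong structures when the two images contain different tissue proportions, which is the failure mode described above.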
These techniques start by converting the RGB image into Optical Density (OD) as $OD = -\log_{10}(I / I_0)$, where $I_0$ is the total possible illumination intensity of the image and $I$ is the RGB image. Color Deconvolution (CD) is made easier in the OD space, because the stains now have a linear relationship with the OD values. The CD is typically expressed as $OD = VS$, where $V$ is the matrix of stain vectors and $S$ is the stain density map. The stain density map can preserve the cell structures of the source image, while the stain vectors are updated to reflect the stain colors of the target image.
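The optical density conversion and the linear stain model can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the illumination intensity is taken as 255, the stain densities are recovered by ordinary least squares, and the stain vectors in `EXAMPLE_HE_VECTORS` are illustrative values rather than vectors estimated from the data used in this paper. All function names are hypothetical.

```python
import numpy as np

# Illustrative H&E stain vectors (columns: hematoxylin, eosin); assumed values.
EXAMPLE_HE_VECTORS = np.array([[0.65, 0.07],
                               [0.70, 0.99],
                               [0.29, 0.11]])

def rgb_to_od(rgb, i0=255.0):
    """Optical density OD = -log10(I / I0), computed per pixel and channel."""
    rgb = np.maximum(rgb.astype(float), 1.0)  # avoid log(0) on black pixels
    return -np.log10(rgb / i0)

def separate_stains(od_pixels, stain_matrix):
    """Solve OD = V S for S by least squares; od_pixels has shape (n, 3)."""
    densities, *_ = np.linalg.lstsq(stain_matrix, od_pixels.T, rcond=None)
    return densities  # shape (n_stains, n_pixels)

def normalize(rgb, source_v, target_v, i0=255.0):
    """Keep the source stain densities and swap in the target stain vectors."""
    od = rgb_to_od(rgb, i0).reshape(-1, 3)
    s = separate_stains(od, source_v)
    od_new = (target_v @ s).T
    out = i0 * 10.0 ** (-od_new)
    return np.clip(out, 0.0, 255.0).reshape(rgb.shape)
```

The density map `s` carries the tissue structure, so replacing only the stain matrix changes the colors while leaving the spatial pattern of each stain untouched.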
In the method of Macenko et al. [15] (Macenko), stain separation is computed using Singular Value Decomposition on the OD tuples. A plane is created from the directions corresponding to the two largest singular values to represent the H&E stains. One useful assumption with this approach is that the color appearance matrix is non-negative, which makes sense because a stain value of zero means that the stain is not present at all. The approach by Vahadane et al. [28] (Vahadane) also includes the non-negativity assumption as well as sparsity, which assumes that each pixel is characterized by an effective stain that relates to a particular cell structure (nuclei, cytoplasm, etc.). The stain separation is generated with Sparse Non-negative Matrix Factorization (SNMF), where the sparsity acts as a constraint that greatly reduces the solution space. SNMF is calculated using dictionary learning via the SPAMS package.
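A simplified sketch of the SVD-based stain vector estimation is given below. It follows the commonly described steps (threshold faint pixels, find the plane of the two dominant directions, take robust extreme angles within that plane), but the threshold `beta`, the percentile `alpha`, the centering of the data and the orientation rule are assumptions of this sketch, not values taken from this paper.

```python
import numpy as np

def macenko_stain_vectors(od, beta=0.15, alpha=1.0):
    """Estimate two unit stain vectors from OD pixels of shape (n_pixels, 3)."""
    od = od[np.all(od > beta, axis=1)]      # drop near-transparent pixels
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(od - od.mean(axis=0), full_matrices=False)
    plane = vt[:2].T                        # (3, 2) orthonormal basis of the stain plane
    proj = od @ plane                       # pixel coordinates within the plane
    if proj[:, 0].mean() < 0:               # orient the first axis toward the data
        plane[:, 0] *= -1
        proj[:, 0] *= -1
    phi = np.arctan2(proj[:, 1], proj[:, 0])
    lo, hi = np.percentile(phi, [alpha, 100.0 - alpha])  # robust extreme angles
    v_lo = plane @ np.array([np.cos(lo), np.sin(lo)])
    v_hi = plane @ np.array([np.cos(hi), np.sin(hi)])
    return np.stack([v_lo, v_hi], axis=1)   # (3, 2), one stain vector per column
```

The two returned columns are not labelled; deciding which one corresponds to hematoxylin and which to eosin needs an extra heuristic that is omitted here.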
Macenko and Vahadane are both unsupervised techniques, but there are also supervised approaches to this problem. Khan et al. [9] apply a relevance vector machine or a random forest model to classify each pixel as hematoxylin, eosin or background. The authors provide a pre-trained model for cases where the stain color distribution of the source images is close to that of their training data. Training a new model would require a training set with pixel-level annotations for each stain. After the stain separation, the color of the target image is mapped with a non-linear spline. The non-linear mapping can lead to undesirable artifacts, and this normalization approach is more computationally costly than the unsupervised approaches.
Recently, techniques for stain normalization have progressed to include deep learning approaches such as autoencoders and GANs [8, 23]. The StainGAN approach [23] applied the CycleGAN framework to one-to-one domain stain transfers. In a one-to-one stain transfer situation, the cycle-consistency loss is calculated by taking the distance between the cycled image and the ground truth. In a many-to-one situation, the cycled image will likely have a different color appearance than the original image. Therefore, a new loss function that focuses on image structure rather than color differences is required.
Biopsy images contain many repetitive patterns in the form of recurring cell structures, stain gradients and background. During translation, these spatial dependencies can be used to synthesize realistic images with finer details. Self-attention [18] exhibits an impressive capability for modelling long-range dependencies in images. SAGAN [36] demonstrated the integration of a self-attention mechanism into convolutional GANs to synthesize images in a class-conditional image generation task. We incorporate these advances in SAASN to enable it to efficiently find spatial dependencies between different areas of the image.
Approach
The general objective of the proposed framework is to learn the mapping between the stain distributions represented by domains $X$ and $Y$. Since the aim is to normalize the stain patterns over the entire dataset, one of these domains can be considered the target domain (say $Y$). The task is then to generate images that are indistinguishable, in terms of stain, from the images in the target domain. Stain normalization requires translating images to a single domain of stain distribution, which allows domain $X$ to contain multiple sub-domains representing different stain patterns. The overall objective then becomes to learn mapping functions $G_{XY}: X \to Y$ and $G_{YX}: Y \to X$ given unpaired training samples $\{x_i^{(k)}\}_{i=1}^{N_k}$, $x_i^{(k)} \in X_k \subset X$, $k = 1, \dots, K$, where $K$ denotes the number of sub-domains in $X$, and $\{y_j\}_{j=1}^{M}$, $y_j \in Y$. The distributions of the training data are denoted $x \sim p_{data}(x)$ and $y \sim p_{data}(y)$. Additionally, two discriminator functions $D_X$ and $D_Y$ are used: $D_Y$ is employed to distinguish mapped images $G_{XY}(x)$ from $y$, while $D_X$ is similarly used to distinguish $G_{YX}(y)$ from $x$. As illustrated in Figure 1, the mapping function $G_{YX}$ maps images from domain $Y$ to a previously undefined sub-domain of $X$ whose boundary is defined by the optimization function and the training data distributions in domain $X$. The overall optimization function used to train the framework combines an adversarial loss [4], a cycle consistency loss [37], an identity loss [26], a structural cycle consistency loss based on the structural similarity index [32] and a discriminator boundary control factor.
Adversarial loss is used to ensure that the distribution of the generated images matches the distribution of the real (ground truth) images in that domain. The objective for the mapping function $G_{XY}$ and the corresponding discriminator $D_Y$ is defined as:
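For reference, the standard cross-entropy adversarial objective that matches this description can be written as below. The symbols are assumptions chosen for illustration: $G_{XY}$ maps the source domain $X$ to the target domain $Y$, $D_Y$ is the discriminator on $Y$, and $p_{data}$ denotes the training distributions. This is the usual GAN form, not necessarily the exact expression used by the authors.

```latex
\mathcal{L}_{adv}(G_{XY}, D_Y, X, Y) =
    \mathbb{E}_{y \sim p_{data}(y)}\left[\log D_Y(y)\right]
  + \mathbb{E}_{x \sim p_{data}(x)}\left[\log\left(1 - D_Y(G_{XY}(x))\right)\right]
```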
Here $G_{XY}$ tries to generate images that are indistinguishable from images in domain $Y$ and consequently fool the discriminator $D_Y$; that is, the generator tries to minimize the given objective function while the discriminator tries to maximize it. The objective for the reverse mapping function $G_{YX}$ is defined similarly. The presence of multiple distinct stain distributions in domain $X$ can make it challenging for the discriminator $D_X$ to learn the decision boundary surrounding $X$. This is especially so when the stain distribution of one of the sub-domains of $X$ overlaps with, or lies close to, the target domain $Y$ in the high-dimensional space. Therefore, to make sure that the decision boundary learned by $D_X$ does not include sections of the target domain $Y$, a discriminator boundary control factor is added to the optimization function as follows:
Cycle consistency loss [37] is implemented to reconcile with the unpaired nature of the task. To overcome the lack of a ground truth image for a fake image generated in a particular domain, the image is mapped back to its original domain using the reverse mapping function. The reconstructed image is then compared to the original source image to optimize the mapping function as follows:
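A numerical sketch of this round-trip comparison is shown below. It assumes the mean absolute (L1) difference as the distance, as in the CycleGAN formulation, and uses hypothetical callables `g_xy` and `g_yx` to stand in for the two generators; neither name nor the choice of distance is confirmed by the text above.

```python
import numpy as np

def cycle_consistency_loss(x, y, g_xy, g_yx):
    """Mean absolute error between each image and its round-trip reconstruction."""
    forward = np.mean(np.abs(g_yx(g_xy(x)) - x))   # X -> Y -> X
    backward = np.mean(np.abs(g_xy(g_yx(y)) - y))  # Y -> X -> Y
    return forward + backward
```

If the two mappings are exact inverses of each other the loss is zero; any information lost in one direction shows up as a reconstruction error in the other.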
Structural cycle consistency loss is added to the objective function to alleviate the shortcomings of the cycle consistency loss for many-to-one translation. In a many-to-one situation, the cycled images are likely to have a color distribution distinct from that of any of the sub-domains. Therefore, minimizing the distance between the original and the cycled image alone is not an effective way to ensure cycle consistency. We use a color-agnostic structural dissimilarity loss based on the Structural Similarity (SSIM) index [32] as follows:
Additionally, we need to make sure that the mapping learnt by the generator does not result in the loss of biological artifacts in the images. The structural dissimilarity loss is therefore also computed between the mapped and the original image:
Here $\mu_x$, $\mu_y$ and $\sigma_x$, $\sigma_y$ are the respective means and standard deviations of the windows ($x$ and $y$) of fixed size that stride over the input image, and $c_1$ and $c_2$ are stabilizing factors that prevent the denominator from vanishing. These measures are calculated for multiple corresponding windows of the gray-scaled input images and aggregated to obtain the final measure. Gray-scaled inputs are used to focus on structural differences between images rather than changes in color.
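The windowed computation described above can be sketched as follows. The SSIM expression is the standard one from the image quality literature; the window size, stride, grayscale weights, stabilizing constants and the final `(1 - SSIM) / 2` scaling are assumptions of this sketch, since the text does not state the values used.

```python
import numpy as np

def to_gray(rgb):
    """Luminance-weighted grayscale of an (H, W, 3) image with values in [0, 1]."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def ssim(img_a, img_b, window=8, stride=4, data_range=1.0):
    """Mean SSIM over corresponding square windows of two grayscale images."""
    c1 = (0.01 * data_range) ** 2   # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2   # stabilizes the contrast/structure term
    scores = []
    h, w = img_a.shape
    for top in range(0, h - window + 1, stride):
        for left in range(0, w - window + 1, stride):
            a = img_a[top:top + window, left:left + window]
            b = img_b[top:top + window, left:left + window]
            mu_a, mu_b = a.mean(), b.mean()
            cov = ((a - mu_a) * (b - mu_b)).mean()
            scores.append(((2 * mu_a * mu_b + c1) * (2 * cov + c2)) /
                          ((mu_a ** 2 + mu_b ** 2 + c1) * (a.var() + b.var() + c2)))
    return float(np.mean(scores))

def structural_dissimilarity(rgb_a, rgb_b):
    """Dissimilarity in [0, 1]; 0 means structurally identical in grayscale."""
    return (1.0 - ssim(to_gray(rgb_a), to_gray(rgb_b))) / 2.0
```

Because both inputs are reduced to grayscale first, a pure change of stain hue that leaves the luminance pattern intact contributes little to this loss, whereas blurred or displaced structures are penalized.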
Identity loss [26] is utilized to regularize the generator and preserve the overall composition of the image. The generators are rewarded if a near-identity mapping is produced when an image from the respective target domain is provided as the input. In other words, when an image is fed into the generator of its own domain, the generator should produce an image that is nearly identical to the input. This is enforced by minimizing the distance between the resulting image and the input image as follows:
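One common way to write such an identity term is shown below. The notation is assumed for illustration ($G_{XY}$ maps into domain $Y$, $G_{YX}$ maps into domain $X$), and the L1 norm is an assumption borrowed from the CycleGAN formulation rather than a detail confirmed by the text.

```latex
\mathcal{L}_{id}(G_{XY}, G_{YX}) =
    \mathbb{E}_{y \sim p_{data}(y)}\left[\lVert G_{XY}(y) - y \rVert_1\right]
  + \mathbb{E}_{x \sim p_{data}(x)}\left[\lVert G_{YX}(x) - x \rVert_1\right]
```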
The overall objective function then becomes:
where parameters , , and manage the importance of different loss terms. The parameters in the generators and the discriminators are tuned by solving the above objective as:
In the following sections, we describe the implementation and compare our results with other current state-of-the-art color normalization methods, using both multiple ($K = 2$) and single ($K = 1$) sub-domains in $X$.
Dataset and Implementation
For this paper, duodenal biopsy patches were extracted from 465 high-resolution WSIs from 150 H&E stained duodenal biopsy slides (where each glass slide could have one or more biopsies). The biopsies were from patients with Celiac Disease (CD) and Environmental Enteropathy (EE). The biopsies were from children who underwent endoscopy procedures at either Aga Khan University Hospital in Pakistan (10 children <2 years with growth faltering, EE diagnosed on endoscopy, n = 34 WSI), University Teaching Hospital in Zambia (16 children with severe acute malnutrition, EE diagnosed on endoscopy, n = 19 WSI), or the University of Virginia Children’s Hospital (63 children <18 years old with CD, n = 236 WSI; and 61 healthy children <5 years old, n = 173 WSI). Substantial stain variation was observed between images originating from different sites. Images from Pakistan were different tones of dark blue and images from the USA were more pink, with images from Zambia lying somewhere in the middle of this spectrum.
There is always some degree of physical variation between histological sections from different sites. In this study, our approach and the competing methods were run on pixel patches generated from the images, which were further resized to pixel to marginally reduce the resolution. In the multi-sub-domain setup, patches from Pakistan (sub-domain $X_1$) and Zambia (sub-domain $X_2$) were both considered to be in domain $X$, and patches from the USA to be in domain $Y$. In the single sub-domain setup, patches from Pakistan were considered to be in domain $X$ and patches from the USA in domain $Y$. For training both and had patches where contributed and patches. Testing metrics were computed on patches in each sub-domain.
Network Architecture
The generator network is a modified U-Net [21], which has been shown to generate excellent results in image translation tasks [7]. U-Net is an encoder-decoder network [5] that uses skip connections between layers $i$ and $n - i$, where $n$ is the total number of layers in the network. In previous encoder-decoder architectures [19, 31, 35], the input is passed through a series of convolutional layers that downsample it until a bottleneck is reached, after which the information is upsampled to generate an output of the desired dimensions. By design, therefore, all information passes through the bottleneck. In a stain normalization task, the input and the output of the network share a lot of general information that might get obscured while flowing through such a network. Skip connections in a U-Net solve this problem by circumventing the bottleneck and concatenating the output of the encoder layers to the input of the corresponding decoder layers.
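The wiring of the skip connections can be illustrated without any learned layers. In the toy sketch below, average pooling and nearest-neighbour upsampling stand in for the convolutional blocks, which is an assumption made purely to keep the example short; only the pattern of saving encoder outputs and concatenating them in the decoder reflects the description above.

```python
import numpy as np

def down(x):
    """Halve the spatial size by 2x2 average pooling; x has shape (C, H, W)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def up(x):
    """Double the spatial size by nearest-neighbour upsampling."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def toy_unet(x, depth=3):
    skips = []
    for _ in range(depth):                 # encoder: remember each input, then downsample
        skips.append(x)
        x = down(x)
    for _ in range(depth):                 # decoder: upsample, then concatenate the
        x = up(x)                          # encoder tensor of matching resolution
        x = np.concatenate([x, skips.pop()], axis=0)
    return x
```

With a 3-channel input and `depth=3`, the last three output channels are an untouched copy of the input, which shows how full-resolution detail bypasses the bottleneck.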
The discriminator is a block convolutional neural network, which eventually outputs the decision for each image. Every convolutional block in both the generator and the discriminator is a module consisting of convolution, normalization and ReLU layers, in that order. Both instance [27] and batch [6] normalization were tried, and batch normalization was empirically chosen for the final network. The convolutional layers have kernel size of and stride , with the exception of the last layer in the discriminator, which operates with stride .
Self-attention layers [18] were added after every convolutional block in both the generator and the discriminator networks. The self-attention mechanism complements the convolutions by establishing and leveraging long-range dependencies across image regions. It helps the generator synthesize images with finer details in one region based on information from a different spatial region of the image. Additionally, a discriminator with self-attention layers is able to enforce more complex structural constraints on input images while making a decision. As described in SAGAN [36], a non-local network [30] was used to apply the self-attention computation. The input features are transformed using three different learnable functions, analogous to the query, key and value setup in [29], as follows:
where , , and . Also, is the number of channels, of the feature map from the previous layer and is an adjustable parameter. For our model, was set as . The attention map is further calculated as:
where $\beta_{j,i}$ represents the attention placed on location $i$ while synthesizing location $j$. The output is calculated as:
The output is then scaled and added to the initial input to give the final result, $y_i = \gamma o_i + x_i$, where $\gamma$ is a learnable parameter that is initialized to 0.
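The self-attention computation can be sketched on a flattened feature map as below. The three projections are written as plain matrices `w_f`, `w_g` and `w_h` (playing the roles of key, query and value), and the channel sizes in the usage example are arbitrary toy values; these names and sizes are assumptions of this sketch rather than the configuration used in the paper.

```python
import numpy as np

def self_attention(x, w_f, w_g, w_h, gamma=0.0):
    """x: (C, N) features over N = H * W locations. Returns (output, attention)."""
    f = w_f @ x                                   # keys,    (C_bar, N)
    g = w_g @ x                                   # queries, (C_bar, N)
    h = w_h @ x                                   # values,  (C, N)
    s = f.T @ g                                   # s[i, j] = f(x_i)^T g(x_j)
    s = s - s.max(axis=0, keepdims=True)          # numerically stable softmax
    beta = np.exp(s)
    beta = beta / beta.sum(axis=0, keepdims=True) # beta[i, j]: attention on i for output j
    o = h @ beta                                  # o_j = sum_i beta[i, j] * h(x_i)
    return gamma * o + x, beta

# Toy usage with arbitrary sizes: 8 channels, 2 reduced channels, 16 locations.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))
w_f = rng.standard_normal((2, 8))
w_g = rng.standard_normal((2, 8))
w_h = rng.standard_normal((8, 8))
y, beta = self_attention(x, w_f, w_g, w_h, gamma=0.0)
```

With `gamma` at its initial value of 0 the layer is an identity mapping, so the network starts from purely local convolutional features and only gradually learns to mix in non-local evidence.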
Spectral normalization, when applied to the layers of the discriminator network, has been shown to stabilize the training of a GAN [17]. Moreover, based on findings about the effect of a generator’s conditioning on its performance, the authors of SAGAN [36] argue that, when training a self-attention based GAN, both the generator and the discriminator can benefit from spectral normalization. Therefore, spectral normalization (with spectral norm of all weight layers as ) was added to all the networks.
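Spectral normalization rescales a weight matrix by its largest singular value, which is usually estimated by power iteration. The sketch below shows that estimate on a plain 2-D matrix; the iteration count and the function name are assumptions for illustration, and practical implementations typically run a single iteration per training step while reusing the vector `u` between steps.

```python
import numpy as np

def spectral_normalize(w, n_iter=50, eps=1e-12):
    """Divide `w` by a power-iteration estimate of its largest singular value."""
    rng = np.random.default_rng(0)
    u = rng.standard_normal(w.shape[0])
    for _ in range(n_iter):
        v = w.T @ u
        v /= np.linalg.norm(v) + eps
        u = w @ v
        u /= np.linalg.norm(u) + eps
    sigma = u @ w @ v                 # estimated largest singular value
    return w / sigma, sigma
```

For a convolutional layer the kernel would first be reshaped to a matrix with one row per output channel before applying the same procedure.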
Training Details
The parameter values of , , and were empirically chosen after experimentation for the evaluation model. Across all experiments, we used the Adam optimizer [11] with a learning rate of . The model was trained for the first epochs with a fixed learning rate and for the next epochs while linearly decaying the learning rate to . Instead of updating the discriminator with an image generated from the latest generator, a random image selected from a buffer of previously generated images was used to perform the update cycle [25]. A least-squares adversarial loss inspired by LSGAN [16] was used instead of the described cross-entropy loss for some experiments. The least-squares loss stabilized the training, but there was no significant visual difference in the results produced.
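The learning rate schedule and the image buffer can be sketched as below. All numbers in the usage lines are toy placeholders and are not the values used in this paper; the buffer policy (return the new image while the buffer fills, then swap it with a random stored image) is likewise an assumption about one reasonable way to realise the description.

```python
import random

def linear_decay_lr(epoch, base_lr, n_fixed, n_decay, final_lr=0.0):
    """Constant rate for `n_fixed` epochs, then linear decay over `n_decay` epochs."""
    if epoch < n_fixed:
        return base_lr
    frac = min(1.0, (epoch - n_fixed) / n_decay)
    return base_lr + frac * (final_lr - base_lr)

class ImageBuffer:
    """Stores previously generated images for discriminator updates."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.images = []
        self.rng = random.Random(seed)

    def query(self, image):
        if len(self.images) < self.capacity:   # still filling: keep and return as-is
            self.images.append(image)
            return image
        idx = self.rng.randrange(self.capacity)
        old, self.images[idx] = self.images[idx], image  # store new, hand back old
        return old

# Toy usage with placeholder values.
rates = [linear_decay_lr(e, base_lr=1.0, n_fixed=10, n_decay=10) for e in (0, 10, 15, 20)]
buf = ImageBuffer(capacity=2)
first, second, third = buf.query("a"), buf.query("b"), buf.query("c")
```

Feeding the discriminator older samples keeps it from overfitting to the most recent generator state, which tends to reduce oscillation during adversarial training.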
Results and Evaluation
The SAASN approach is compared to two of the most popular unsupervised stain normalization techniques, Macenko [15] and Vahadane [28]. The popular supervised approach by Khan [9] could not be tested, because the default pre-trained classifier performed poorly when transferring stains for these stain domains, and we did not have access to data with per-pixel stain labels with which to train our own classifier for this method. Two different SAASN networks are tested: a many-to-one and a one-to-one stain transfer network. The many-to-one network demonstrates that the SAASN approach can transfer multiple stain domains to a single domain. The $X_1$ to $Y$ and $X_2$ to $Y$ stain transfers are evaluated with this network. The one-to-one network is used to evaluate the $Y$ to $X$ stain transfer.
To evaluate the stain transfer, the Structural Similarity (SSIM) index is again utilized. SSIM is calculated by comparing the normalized image with the original. Both images are converted to gray-scale before beginning SSIM calculations. For color images, SSIM is influenced greatly by the differences in color for each image. A change in color is expected when performing stain transfers, so SSIM on RGB images is not an effective measure in this case. The main concern is whether the structural integrity of the image is maintained after transferring stains which is why gray-scaled images are used. Additionally, visual comparisons are provided to evaluate stain transfer capability.
SSIM scores were calculated on test sets for each of the three stain normalization techniques. The results are compiled and displayed as box plots in Figure 4. For the to and the to stain transfers, the median values for SAASN are higher than the other two normalizations and the interquartile ranges are much smaller. This demonstrates that SAASN not only is better at preserving structure, but also consistently transfers stain without many anomalies. The unsupervised approaches can struggle if the source has a much different stain distribution than the target. This can lead to the stains appearing in the wrong areas on the normalized image. SAASN is able to leverage information from entire stain domains and therefore is not as affected by this issue. Patches with erroneous stain transfers can hinder the development of computer-aided diagnosis models. These results demonstrate that SAASN can be trusted to produce consistent stain transfers on a robust set of stain patterns in WSI patches.
In addition to assessing the structure-preserving ability of the stain normalization methods, visual comparisons are useful to ensure that the stains have transferred properly. In Figure 5, results are displayed for the three stain transfers. The images with the smallest norm of the combined Macenko and Vahadane SSIM values were selected to demonstrate the performance of SAASN. For $X_1$ to $Y$ and $X_2$ to $Y$, the same target image from domain $Y$ is used. For $Y$ to $X$, a target image from domain $X$ is used. The three selected source images are similar in that a large majority of their pixels contain connective tissue or background. The unsupervised approaches can struggle to execute color deconvolution on these types of images. This is apparent in the Macenko and Vahadane normalizations shown in Figure 5: either the stains are inverted (a hematoxylin-like color is transferred to the background) or connective tissue is mistaken for an actual cell structure. Meanwhile, SAASN did not have difficulty identifying the connective tissue or background pixels in the source image. SAASN excelled at preserving structural similarity with the source image and successfully transferred the stain to a new domain without error. The trained model, code and extra results have been made publicly available (link redacted for blind review).
Conclusion
The proposed framework successfully translated images from one stain appearance to a desired one while preserving the biological artifacts in the process. The setup was specifically designed to accommodate a many-to-one stain transfer situation in which multiple stains are converted to a common domain. SAASN was compared to other leading stain normalization techniques using duodenal biopsy image data originating from three sites with different stain variations. The results showed that SAASN performed best at preserving the structure of the source image after the stain transfer. SAASN consistently performed successful stain transfers even when the other techniques failed because of large variations between the source and target image stains or unconventional input image structures. We contend that the proposed unsupervised image-to-image translation approach can also be applied to general many-to-one image translation problems outside the medical domain.
Acknowledgments
Research reported in this publication was supported by the University of Virginia Translational Health Research Institute of Virginia (THRIV) Mentored Career Development Award (SS), the Bill and Melinda Gates Foundation (AA, OPP1138727; SRM, OPP1144149; PK, OPP1066118) and by the National Institute Of Diabetes And Digestive And Kidney Diseases of the National Institutes of Health under Award Number K23DK117061. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
References
[1] (2014) Quantitative analysis of stain variability in histology slides and an algorithm for standardization. In Medical Imaging 2014: Digital Pathology, Vol. 9041, pp. 904108.
[2] (2006) Exact histogram specification. IEEE Transactions on Image Processing 15 (5), pp. 1143–1152.
[3] (2008) Hematoxylin and eosin staining of tissue and cell sections. Cold Spring Harbor Protocols 2008 (5), pp. pdb–prot4986.
[4] (2014) Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680.
[5] (2006) Reducing the dimensionality of data with neural networks. Science 313 (5786), pp. 504–507.
[6] (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167.
[7] Image-to-image translation with conditional adversarial networks. Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1125–1134.
[8] (2017) Stain normalization using sparse autoencoders (stanosa): application to digital pathology. Computerized Medical Imaging and Graphics 57, pp. 50–61.
[9] (2014) A nonlinear mapping approach to stain normalization in digital histopathology images using image-specific color deconvolution. IEEE Transactions on Biomedical Engineering 61 (6), pp. 1729–1738.
[10] Learning to discover cross-domain relations with generative adversarial networks. Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 1857–1865.
[11] (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[12] (2017) A survey on deep learning in medical image analysis. Medical image analysis 42, pp. 60–88.
[13] (2017) Unsupervised image-to-image translation networks. In Advances in neural information processing systems, pp. 700–708.
[14] (2017) Detecting cancer metastases on gigapixel pathology images. Technical report, arXiv.
[15] (2009) A method for normalizing histology slides for quantitative analysis. In 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pp. 1107–1110.
[16] (2017) Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2794–2802.
[17] (2018) Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957.
[18] A decomposable attention model for natural language inference. arXiv preprint arXiv:1606.01933.
[19] (2016) Context encoders: feature learning by inpainting. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2536–2544.
[20] (2001) Color transfer between images. IEEE Computer graphics and applications 21 (5), pp. 34–41.
[21] (2015) U-net: convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pp. 234–241.
[22] (2018) A study about color normalization methods for histopathology images. Micron 114, pp. 42–61.
[23] (2019) Staingan: stain style transfer for digital histological images. In 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), pp. 953–956.
[24] (2019) Deep learning for visual recognition of environmental enteropathy and celiac disease. arXiv preprint arXiv:1908.03272.
[25] (2017) Learning from simulated and unsupervised images through adversarial training. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2107–2116.
[26] (2016) Unsupervised cross-domain image generation. arXiv preprint arXiv:1611.02200.
[27] (2016) Instance normalization: the missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022.
[28] (2016) Structure-preserving color normalization and sparse stain separation for histological images. IEEE transactions on medical imaging 35 (8), pp. 1962–1971.
[29] (2017) Attention is all you need. In Advances in neural information processing systems, pp. 5998–6008.
[30] (2018) Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7794–7803.
[31] (2016) Generative image modeling using style and structure adversarial networks. In European Conference on Computer Vision, pp. 318–335.
[32] (2004) Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing 13 (4), pp. 600–612.
[33] (2019) Automated detection of celiac disease on duodenal biopsy slides: a deep learning approach. Journal of pathology informatics 10.
[34] (2017) Dualgan: unsupervised dual learning for image-to-image translation. In Proceedings of the IEEE international conference on computer vision, pp. 2849–2857.
[35] (2016) Pixel-level domain transfer. In European Conference on Computer Vision, pp. 517–532.
[36] (2018) Self-attention generative adversarial networks. arXiv preprint arXiv:1805.08318.
[37] (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision, pp. 2223–2232.