Deep learning-based algorithms have been used for a variety of diagnostic tasks for histopathology images of tissue samples stained using hematoxylin and eosin (H&E) [golatkar, gleason, nature_lung]. Most of these algorithms yield expected results when training and testing data have a similar color appearance. However, the performance of these algorithms drops drastically when tested with images from other labs [sethi2016empirical], primarily due to variations in stain appearance that result from differences in scanner sensor responses, H&E staining protocols, reagents, and habits of technicians. Such variations in colors are not an issue for expert pathologists, as they are trained to prioritize morphological features over strict color definitions of H&E stained slides. However, stain colors of training data create bias for a deep learning algorithm, which can cause problems at the time of deployment.
Color normalization is used to transform input samples in computational pathology to a pre-defined color space to aid deep learning algorithms [sethi2016empirical, khan, reinhard, macenko, vahadhane, gautam]. Most practical color normalization approaches depend on matrix factorization for color deconvolution. Such approaches take the reference color matrix from a small patch of the H&E stained image, which causes problems when the source image is from a significantly different tissue region (anatomically speaking) than the target patch [gautam]. Additionally, estimating a robust factorization for an entire whole slide image (WSI) often makes such methods computationally expensive. On the other hand, more recent color normalization techniques have turned to deep learning, particularly generative adversarial networks [isbigan, staingan, miccai19, inceptionlab], although such methods are computationally quite expensive to train and cannot be easily integrated with downstream deep learning pipelines.
Our innovation lies in departing from matrix factorization as well as GANs. Instead, we use self-supervised learning, where a lightweight neural network is trained to estimate the color shift needed in each channel to match a pre-determined target stain in appearance. Compared to the state-of-the-art color normalization algorithms, the advantages of our algorithm are that it:
Trains fast (Section 3),
Tests faster (Table 2),
Has a more positive impact on the accuracy of downstream tasks, such as classification on CAMELYON17 dataset [camelyon] and segmentation on MoNuSeg dataset [monuseg] (Table 1),
Has fewer post-normalization image artifacts (Figure 1), and
Integrates easily with deep learning pipelines (Figure 2).
2 Related Work
Most color normalization methods compute a stain color matrix and stain density maps of the source and target images and try to project the color matrix of the source image onto that of the target image. Matrix factorization is a popular technique for color deconvolution. Macenko et al. [macenko] framed color deconvolution as a singular value decomposition (SVD) problem to guarantee a solution to the optimization problem. Reinhard et al. [reinhard] matched the histograms of color densities in the source and target images. Vahadane et al. [vahadhane] added sparsity to non-negative matrix factorization (NMF) to achieve better results. These approaches are patch-based; that is, the reference color matrix is taken from a small patch of the H&E stained image. This causes problems when the source image is from a significantly different tissue region than the target patch [gautam]. To avoid a biased estimate of the stain color matrix of the source image, these methods sample the matrix from the entire image, which is computationally expensive. The proposed method does not compute the stain color matrix.
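As a point of reference, the stain separation performed by factorization-based methods is usually written in optical density space following the Beer-Lambert law. The symbols below ($I$, $I_0$, $V$, $W$, $H$) are illustrative notation introduced here, not notation taken from this paper, and the non-negativity constraints apply to the NMF variants only:

```latex
V = -\log\frac{I}{I_0}, \qquad V \approx W H, \qquad W \ge 0,\; H \ge 0
```

Here $I$ is the $3 \times n$ matrix of RGB intensities for $n$ pixels, $I_0$ is the background (illumination) intensity, $W$ is the $3 \times r$ stain color matrix for $r$ stains, and $H$ holds the $r \times n$ stain density maps. Normalization then amounts, roughly, to recombining the source densities with the target color matrix, $\hat{V} = W_{\text{target}} H_{\text{source}}$.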
Deep learning-based approaches have recently been employed for color normalization, especially those using generative adversarial networks (GANs). For example, Aicha et al. [aicha] proposed an end-to-end deep learning method to learn stain normalization along with specific diagnosis tasks, where stain normalization is learned adversarially. Farhad et al. [isbigan] and Shaban et al. [staingan] use info-GAN [infogan] and cycleGAN [cyclegan], respectively, in their color normalization techniques. Niyun et al. [miccai19] use the color matrix of the source image as auxiliary information for a generator based on cycleGAN for better learning. GANs are hard to train and require more compute at inference time. Moreover, GANs can change microscopic morphological features that may be important for diagnosis. Dwarikanath et al. [inceptionlab] try to resolve this problem by adding auxiliary features from a CNN trained for the nucleus segmentation task, while SAASN [saasn] uses self-attention to preserve the local context, which makes the training process even more complicated. Additionally, Stanosa et al.'s model [stanosa] learns to pick a target template for normalization of an H&E image according to the tissue type of the source image.
3 Proposed Method
The proposed method alleviates several problems in the existing factorization-based and deep learning-based color normalization methods based on the following features:
It uses a self-supervised fully convolutional neural network that is lightweight (2,352 trainable parameters) and easy to train. Training the proposed CNN architecture takes only half an hour on an Nvidia 1080Ti GPU, and inference takes about a minute and a half per gigapixel. By comparison, GAN-based color normalization models can increase the inference time drastically, as they have millions of parameters. Also, template-based methods perform their computation on CPUs, which makes inference slower.
In the proposed method, a WSI or a set of WSIs with the target color appearance acts as the target domain, which can include a variety of tissue types. This eliminates problems caused by template-based methods, such as the introduction of artifacts.
We can easily change the target color appearance by training the proposed architecture on patches from new target WSIs, which is difficult in GAN-based color normalization methods.
The overall block diagram is shown in Figure 2. We next describe the self-supervision, the architecture, and the training details.
Self-supervision: In self-supervision, we use labels that come for "free" with the data. There are several self-supervised approaches to learning representations of unlabeled data [selfsup]. In our setting, we train a deep learning model to predict the offset that is added synthetically to an unaltered patch sampled from a WSI. The target domain is a set of image patches. We perturb two channels of a target histopathology tissue image by adding a random offset to each of them to get a synthetic source image. The offsets are random numbers drawn independently from a uniform distribution. We do not add an offset to the remaining channel because stain variation generally does not affect the intensity of that channel. Figure 2 shows a sample target image and the synthetically generated source image corresponding to it.
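The perturbation step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a three-channel image in which channel 0 carries intensity and channels 1 and 2 carry color (as in a luminance-chrominance space), and the function name `make_synthetic_source` and the offset range `max_shift` are hypothetical.

```python
import random


def make_synthetic_source(target, max_shift=20.0, rng=None):
    """Return (source, offsets): `target` with a random offset added to two channels.

    `target` is a list of rows, each row a list of (c0, c1, c2) pixel tuples.
    Assumption: channel 0 carries intensity and is left untouched, while
    channels 1 and 2 carry color. `max_shift` is a hypothetical offset range.
    """
    rng = rng or random.Random()
    # One offset per perturbed channel, drawn independently from a uniform distribution.
    d1 = rng.uniform(-max_shift, max_shift)
    d2 = rng.uniform(-max_shift, max_shift)
    source = [[(c0, c1 + d1, c2 + d2) for (c0, c1, c2) in row] for row in target]
    return source, (d1, d2)


# Toy 2x2 "patch"; the offsets returned alongside it serve as the free training label.
target_patch = [[(50.0, 10.0, -5.0), (60.0, 12.0, -4.0)],
                [(55.0, 11.0, -6.0), (58.0, 9.0, -3.0)]]
source_patch, offsets = make_synthetic_source(target_patch, rng=random.Random(0))
```

The pair `(source_patch, offsets)` is what makes the setup self-supervised: the regression target is generated together with the input, so no manual annotation is needed.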
Model Architecture: Our model uses a modified version of DenseNet blocks. We use four DenseNet blocks with only three convolutional filters in each layer. We hypothesize that learning the color offset does not require learning many features as long as the prediction is obtained using a large enough receptive field. To increase the receptive field, we use dilated convolutional layers in Dense Block 2 and Dense Block 3. Each convolutional layer in a Dense Block is followed by a leaky ReLU and a batch normalization layer. We do not use any pooling operation in our architecture, so the size of the feature map remains equal to the size of the input image throughout the model. As we use only three filters in every convolutional layer, the number of trainable parameters is also very small. Our model, which we call ColorNormNet, contains only 2,352 trainable parameters.
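To illustrate why dilation helps when there is no pooling, the sketch below computes the receptive field of a stack of stride-1 convolutions. The layer configuration is an assumption made for illustration only (3x3 kernels, three layers per block, dilation rate 2 in the second and third blocks); the text above does not give these values.

```python
def receptive_field(layers):
    """Receptive field (in pixels, along one axis) of a stack of stride-1 convolutions.

    `layers` is a sequence of (kernel_size, dilation) pairs. With stride 1 and
    no pooling, each layer widens the field by (kernel_size - 1) * dilation.
    """
    field = 1
    for kernel_size, dilation in layers:
        field += (kernel_size - 1) * dilation
    return field


# Hypothetical configuration: four blocks of three 3x3 layers each.
plain = [(3, 1)] * 12                                  # no dilation anywhere
dilated = [(3, 1)] * 3 + [(3, 2)] * 6 + [(3, 1)] * 3   # dilation 2 in blocks 2 and 3

print(receptive_field(plain))    # 25
print(receptive_field(dilated))  # 37
```

Under these assumed settings, dilation enlarges the receptive field without adding a single parameter, which is consistent with the goal of keeping the network very small.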
Training details: We optimize the parameters of a DenseNet-inspired [densenet] fully convolutional neural network to predict the offset tensor that was added to the target patch. We train ColorNormNet by optimizing a loss function with two terms, where a hyper-parameter balances the two loss terms. We train this model with patches from a target WSI. We observed that even a limited number of patches is sufficient to train a color normalization network. Training the proposed CNN architecture takes about half an hour (800-1000 iterations) to converge on an Nvidia 1080Ti GPU with a batch size of 128 and the Adam optimizer with learning rate 0.001. Figure 2 shows the training and inference pipeline along with the architecture of the CNN model. Because the proposed model is fully convolutional, it can also be trained with an arbitrary patch size.
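At inference time, a network trained to predict the added offset can be used to normalize an image by subtracting its prediction. The sketch below is one plausible reading, not the authors' implementation: it assumes the network emits a per-pixel offset map for the two color channels and that this map is averaged into a single offset per channel. The names `mean_offset` and `normalize` are hypothetical, and the "prediction" is a hand-written stand-in for a network output.

```python
def mean_offset(offset_map):
    """Average a per-pixel (d1, d2) offset map into a single offset per channel."""
    pixels = [p for row in offset_map for p in row]
    n = len(pixels)
    return (sum(p[0] for p in pixels) / n, sum(p[1] for p in pixels) / n)


def normalize(source, offset):
    """Subtract the estimated offset from the two color channels of `source`.

    `source` is a list of rows of (c0, c1, c2) tuples; channel 0 is left as is.
    """
    d1, d2 = offset
    return [[(c0, c1 - d1, c2 - d2) for (c0, c1, c2) in row] for row in source]


# Stand-in for a network output: noisy per-pixel estimates of an offset near (4, -2).
predicted = [[(4.1, -2.0), (3.9, -1.9)],
             [(4.0, -2.1), (4.0, -2.0)]]
source = [[(50.0, 14.0, -7.0), (60.0, 16.0, -6.0)],
          [(55.0, 15.0, -8.0), (58.0, 13.0, -5.0)]]
normalized = normalize(source, mean_offset(predicted))
```

Applying one shared offset to every pixel changes only the global color cast, so local structure in the image is left untouched.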
Table 1: Classification results (AUC) for Center 1 to Center 5 and their average, and segmentation results (Dice, AJI, PQ), for each normalization method.
4 Experiments and Results
We trained ColorNormNet on patches extracted from a randomly selected WSI from the CAMELYON17 dataset [camelyon]. We compared the inference time of ColorNormNet with that of four state-of-the-art methods: Vahadane et al. [vahadhane], Reinhard et al. [reinhard], Macenko et al. [macenko], and Ramakrishna et al. [gautam]. We extracted patches from a WSI more than one gigapixel in size and measured the inference time of each method. Not only is ColorNormNet faster than the other methods, but it also does not produce any artifacts, as it estimates a single offset for the entire image. ColorNormNet also generalizes to data from other centers and different organs. We normalized WSIs from the TCGA dataset and observed that ColorNormNet yields better normalization on them as well. Figure 1 shows color normalization results on a sample WSI each from the CAMELYON17 lymph node and TCGA lung adenocarcinoma datasets. A comparison of test time per gigapixel for different techniques, including ours, is shown in Table 2.
We used CAMELYON17 [camelyon] and MoNuSeg [monuseg] to evaluate ColorNormNet. CAMELYON17 contains a total of 500 WSIs, of which 400 WSIs are of normal tissue while the remaining 100 WSIs contain tumor regions. This dataset is well suited for the evaluation since the data is taken from five different centers and there is significant staining variation among them. Of the 100 positive WSIs, only 50 WSIs (10 WSIs from each center) are annotated with the tumor regions. We used these 50 positive WSIs in our training and testing data along with the available negative WSIs. We split the data from each center into training, validation, and testing sets with about 50%, 30%, and 20% of the slides in each split, respectively. We obtained the following train-validation-test splits: center 1: 33(4)-22(4)-19(2), center 2: 33(4)-20(3)-15(3), center 3: 41(5)-24(2)-19(3), center 4: 31(4)-19(3)-20(3), and center 5: 32(3)-20(4)-19(3) (numbers in parentheses indicate the number of positive WSIs in the split). We trained a ResNet50 [resnet] architecture for the classification task. We used a batch size of 32 with a learning rate of 0.001 and a weight decay of 0.01 for training. The classification model was trained and validated with data from one center and tested with test data from each center separately. We report the average of the AUC obtained on the test data of the five centers in Table 1. ColorNormNet achieves a better average AUC across the centers by a large margin.
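The cross-center evaluation metric can be illustrated with a small sketch. It computes the AUC as the probability that a positive sample scores higher than a negative one (ties count half) and then averages over centers. The function names and the toy labels and scores are invented for illustration and are not results from this study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_auc(per_center):
    """Mean AUC over test sets given as {center: (labels, scores)}."""
    aucs = [roc_auc(labels, scores) for labels, scores in per_center.values()]
    return sum(aucs) / len(aucs)


# Toy data only: two "centers" with four test samples each.
toy = {
    "center_1": ([1, 0, 0, 1], [0.9, 0.2, 0.4, 0.7]),
    "center_2": ([1, 0, 1, 0], [0.6, 0.7, 0.8, 0.1]),
}
print(average_auc(toy))  # 0.875
```

The pairwise form is quadratic in the number of samples, which is fine for a sketch; a rank-based implementation would be preferable for large test sets.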
We also evaluated ColorNormNet on the nucleus segmentation task. We used MoNuSeg [monuseg] for evaluation. The MoNuSeg dataset contains data from nine different organs with structural and stain variation across different images. It contains a total of 30 annotated images. We split these images into training, validation, and testing datasets with 20, 5, and 5 images in each split, respectively. For each normalization method, we trained HoVer-Net [hovernet] five times on the images normalized with that method and averaged the Dice score, aggregated Jaccard index (AJI), and panoptic quality (PQ) across the five runs. Table 1 shows that ColorNormNet gives better AJI and PQ than other normalization methods.
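Of the three segmentation metrics, the Dice score is the simplest to state in code. The sketch below computes it for binary masks and averages it over repeated runs; the masks and per-run values are toy inputs invented for illustration, and AJI and PQ (which require instance matching) are not covered.

```python
def dice_score(pred, truth):
    """Dice coefficient, 2 * overlap / (size of pred + size of truth), for binary masks."""
    flat_pred = [v for row in pred for v in row]
    flat_truth = [v for row in truth for v in row]
    if len(flat_pred) != len(flat_truth):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(1 for p, t in zip(flat_pred, flat_truth) if p and t)
    total = sum(1 for p in flat_pred if p) + sum(1 for t in flat_truth if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * overlap / total


def mean_over_runs(values):
    """Average a metric over repeated training runs."""
    return sum(values) / len(values)


# Toy masks: 2 overlapping foreground pixels, 3 foreground pixels in each mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
score = dice_score(pred, truth)  # 2 * 2 / (3 + 3), about 0.667
```

Averaging over several runs, as done here with `mean_over_runs`, reduces the influence of random initialization on the reported numbers.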
Color normalization is an often-used image pre-processing method for deep learning-based computational pathology tasks. We have proposed a color normalization method based on a fully convolutional neural network, which is trained in a self-supervised manner. The proposed ColorNormNet is computationally fast during both training and inference, and it yields a neural network that can be attached as a pre-processing block to task-specific CNNs. We have also validated ColorNormNet on tumor classification and nucleus segmentation tasks and shown that the accuracy of these downstream tasks improves more with ColorNormNet than with other color normalization methods.
6 Compliance with Ethical Standards
This research study was conducted retrospectively using human subject data made available in open access by [camelyon, monuseg]. Ethical approval was not required as confirmed by the license attached with the open access data.
No funding was received for conducting this study. The authors have no relevant financial or non-financial interests to disclose.