Learning a Deep Convolution Network with Turing Test Adversaries for Microscopy Image Super Resolution

01/18/2019 ∙ by Francis Tom, et al. ∙ SigTuple

Adversarially trained deep neural networks have significantly improved the performance of single image super resolution by hallucinating photorealistic local textures, thereby greatly reducing the perceptual difference between a real high resolution image and its super resolved (SR) counterpart. However, application to medical imaging requires preserving diagnostically relevant features while refraining from introducing any diagnostically confusing artifacts. We propose a deep convolutional super resolution network (SRNet) trained for (i) minimising the reconstruction loss between the real and SR images, and (ii) maximally confusing learned relativistic visual Turing test (rVTT) networks that discriminate between (a) a pair of real and SR images (T1) and (b) a pair of patches from a region of interest in the real and SR images (T2). The adversarial losses of T1 and T2, backpropagated through the SRNet, help it learn to reconstruct pathorealism in regions of interest such as white blood cells (WBC) in peripheral blood smears or epithelial cells in histopathology of cancerous biopsy tissues, as experimentally demonstrated here. Experiments measuring signal distortion loss using peak signal to noise ratio (pSNR) and structural similarity (SSIM) across SR scale factors, the impact of the rVTT adversarial losses, and the impact on reporting when using SR with a commercially available artificial intelligence (AI) digital pathology system substantiate our claims.




1 Introduction

(a) Overview of the adversarial learning process with Turing tests
(b) Real
(c) Bicubic
(d) SRNet
(e) T1
(f) T1+T2
(g) Real
(h) Bicubic
(i) SRNet
(j) T1
(k) T1+T2
Figure 1: Learning a super-resolution CNN for microscopy using relativistic visual Turing tests (T1 and T2): results obtained for super resolving by 16x, compared with bicubic interpolation. Recovery of cytoplasmic texture and nuclear chromatin is evident in the WBCs (b-f), while the relatively smooth-textured RBCs (g-k) are not significantly affected.

Single image super resolution (SISR) aims at estimating a high resolution (HR) image from a low resolution (LR) image. Image super resolution (SR) techniques can be used in microscopy to enhance the resolution of images acquired at a lower magnification, revealing fine structures that could otherwise only be observed using a higher magnification lens. Accordingly, SR images can be used to diagnose from images captured at a lower magnification with diagnostic precision matching that of images at a higher magnification, thereby significantly reducing the image acquisition time per slide. SR is an ill-posed inverse problem that becomes increasingly challenging at high scaling factors, often leaving diagnostically relevant details such as texture absent from the SR images. Fig. 1 illustrates a simple example of this failure to reproduce fine details when learning a SR network (SRNet) using only distortion losses like the mean squared error (MSE), which fail to capture intricate details.
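Distortion metrics such as pSNR, used throughout this paper, are derived directly from the MSE, which is why optimizing them alone tends to produce smooth, texture-free reconstructions. A minimal pure-Python sketch (the function name and the 8-bit dynamic range default are illustrative assumptions):

```python
import math

def psnr(ref, img, max_val=255.0):
    """Peak signal-to-noise ratio between a reference and a test image.

    Both images are flat sequences of pixel intensities on the same scale;
    a higher value means less distortion relative to the reference.
    """
    mse = sum((r - x) ** 2 for r, x in zip(ref, img)) / len(ref)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

As Fig. 1 illustrates, a high pSNR alone does not guarantee that fine texture such as nuclear chromatin has actually been recovered.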

(a) Stage 1: Training SRNet to minimize reconstruction error.
(b) Stage 2: Training rVTT (T1) network on whole image.
(c) Stage 3: Training rVTT (T2) network on region of interest.
(d) Stage 4: Training SRNet with adversarial loss.
Figure 2: Framework for microscopy image super resolution involving two relativistic visual Turing test (rVTT) networks.

2 Prior Art

SISR has traditionally been solved using analytic or iterative methods that are physics-driven [1]. Recent work uses deep learning as a data-driven means to enhance optical microscopy resolution [2] by employing mean square error with an edge weighting term as the loss function to be minimized. Another approach [3] employs a convolutional neural network (CNN) for the purpose but suffers from the same limitation. Recent attempts introducing adversarial learning in tandem with minimization of a reconstruction loss have been able to recover fine texture details in natural images [4, 5]. However, clinical-grade microscopy is sensitive to regions of interest (ROI), requiring the network to restore specific grades of texture representation within different regions of the image. Here we demonstrate that this is possible.

3 Method

The goal is to train a super resolution neural network (SRNet, G) that estimates, for a given LR input image I_LR, its SR counterpart I_SR = G(I_LR) that closely resembles the real HR image I_HR. To achieve this, we propose a four-stage learning process. (Stage 1) Train G with the objective of minimizing a reconstruction loss L_rec, using its gradients to update the parameters of G, as presented in Fig. 2(a). (Stage 2) Subsequently, another CNN, termed the relativistic visual Turing test (rVTT) network T1, is trained to discriminate between the SR and real HR images when presented with a matched pair of such images in shuffled order, minimizing its loss L_T1 while updating the parameters of T1, as illustrated in Fig. 2(b). This quantifies the pairwise subtle difference in image perception, a key factor distinct from the distortion loss quantified in [6], and is in line with the philosophy of [7, 5]. (Stage 3) Next, another rVTT network T2 is trained with the objective of quantifying the perception difference between SR and real HR within diagnostically relevant ROI, such as white blood cells (WBC) in peripheral blood smears or epithelial cells in histopathology of tissue biopsies collected from metaplastic regions. T2 is trained to minimize its loss L_T2 while updating its parameters, as illustrated in Fig. 2(c). The region proposal finder relies on ROI masks provided as ground truth along with the images. Here we differ from [5]: in pathological investigation microscopy the quantum of texture detail varies with cells and tissue structure, and since the density of pathologically alarming cells is in general low [8], a single rVTT such as T1 alone is not able to properly encapsulate texture perception for such rarely occurring cells.
Finally, (Stage 4) with the objective of updating G such that it can mimic in SR images the relativistic perception of global and ROI-specific texture evident in real HR images, we once again update G with gradients derived from the adversarial loss, as presented in Fig. 2(d). As G acquires this ability, it drives L_T1 and L_T2 towards their maxima, which forms the essence of adversarial learning.
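The per-mini-batch alternation of the four stages can be sketched as a plain-Python training schedule; the four updater callables are hypothetical stand-ins for the actual SRNet/T1/T2 gradient steps, which are not spelled out in the text:

```python
def train_schedule(num_batches, update_srnet_rec, update_t1, update_t2,
                   update_srnet_adv):
    """Run Stages 1-4 once per mini-batch, in order.

    Stage 1: SRNet takes a step minimizing the reconstruction loss.
    Stage 2: rVTT T1 learns to tell real HR from SR whole images.
    Stage 3: rVTT T2 does the same on ROI patches.
    Stage 4: SRNet takes a step against the combined adversarial loss.
    """
    for batch in range(num_batches):
        update_srnet_rec(batch)   # Stage 1
        update_t1(batch)          # Stage 2
        update_t2(batch)          # Stage 3
        update_srnet_adv(batch)   # Stage 4
```

Interleaving all four stages within each mini-batch keeps the generator and the two discriminators roughly in balance, rather than training any of them to convergence in isolation.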

Architecture: G is similar to the generator in [5], which features residual-in-residual dense blocks followed by strided convolutions for upsampling. T1 and T2 are also similar to the discriminator in [5], being modified versions of the VGG architecture [9] with leaky ReLU non-linearity.
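The residual-in-residual structure can be sketched numerically. The residual scaling factor beta = 0.2 follows ESRGAN [5]; the dense-block callables are placeholders for real convolutional blocks, so this shows only the skip-connection arithmetic, not a full implementation:

```python
import numpy as np

def rrdb(x, dense_blocks, beta=0.2):
    """Residual-in-residual dense block (structural sketch).

    Each dense block's output is scaled by beta and added to its input;
    the stacked result is itself scaled by beta and added back to x.
    """
    h = x
    for block in dense_blocks:
        h = h + beta * block(h)   # inner residual per dense block
    return x + beta * h           # outer residual around the stack
```

Scaling every residual branch by a small constant keeps activations bounded in very deep stacks, which is why [5] can remove batch normalization from the generator.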

Loss function: The Stage 1 loss is defined as

L_rec = L_MSE(I_SR, I_HR) + lambda * L_VGG(I_SR, I_HR)

where L_VGG is the VGG perceptual loss detailed in [10] and lambda is a weighting factor. The Stage 2 and Stage 3 loss functions follow the relativistic average discriminator proposed in [7]:

L_T1 = -E_HR[log(sigma(T1(I_HR) - E_SR[T1(I_SR)]))] - E_SR[log(1 - sigma(T1(I_SR) - E_HR[T1(I_HR)]))]

where E_HR denotes expectation over real HR images in a mini-batch, E_SR denotes expectation over SR images in a mini-batch, and sigma(.) is the sigmoid function. Analogously,

L_T2 = -E_HR[log(sigma(T2(p_HR) - E_SR[T2(p_SR)]))] - E_SR[log(1 - sigma(T2(p_SR) - E_HR[T2(p_HR)]))]

where p_HR and p_SR are the image patches corresponding to the ROI selected as in Fig. 2(c). The adversarial loss in Stage 4 is the relativistic generator counterpart of these losses, with the roles of the real and SR terms swapped, as presented in Fig. 2(d).
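The relativistic average discriminator loss of [7] used for T1 and T2 can be sketched in NumPy. Here c_real and c_fake are assumed to be the raw (pre-sigmoid) rVTT outputs for a mini-batch of real HR and SR inputs respectively:

```python
import numpy as np

def rvtt_d_loss(c_real, c_fake):
    """Relativistic average discriminator loss (Jolicoeur-Martineau, 2019).

    The discriminator estimates the probability that a real input is
    more realistic, on average, than the generated (SR) ones.
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    d_real = sigmoid(c_real - np.mean(c_fake))  # real vs. average fake
    d_fake = sigmoid(c_fake - np.mean(c_real))  # fake vs. average real
    return -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))
```

For the Stage 4 generator update, the roles of c_real and c_fake are swapped, so SRNet is rewarded when its SR outputs look relatively more realistic than the real images.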

3.1 Experiments, Results and Discussion

Dataset: We evaluate the performance on three datasets:

ALL-IDB [11], where the first 33 out of 108 images in ALL-IDB1, belonging to the same magnification, are used: 30 images for training and 3 for testing.

CRCHistoPhenotypes [12] has 100 H&E stained images of colorectal adenocarcinoma histology with nuclei annotated on them. We use 80 images for training and 20 for testing.

Sigtuple WBC dataset [13] contains images of WBCs randomly selected from normal and abnormal peripheral blood smears prepared using May Grunwald Giemsa and Leishman stains [8], imaged through a brightfield microscope objective. WBC patches were used for training and patches of the same size for testing.

Training: The Adam optimizer is used. The network was trained with Stages 1-4 updated per mini-batch, and the learning rate was decayed by a factor of 0.5 at regular iteration intervals. Experiments were performed on a server with an Intel Xeon 4110 CPU, DDR4 ECC registered RAM, and an Nvidia Tesla V100 GPU with 16GB HBM, with the software implemented on Ubuntu 16.04 LTS using Python 3.6, PyTorch 0.5, Nvidia CUDA 9.2 and cuDNN 7.1 for acceleration.
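The stated step decay (halving the learning rate at fixed iteration intervals) can be sketched as below; the base rate and interval are placeholder values, since the original numbers did not survive extraction:

```python
def stepped_lr(iteration, base_lr=1e-4, decay_interval=50000, factor=0.5):
    """Learning rate after step decay: multiplied by `factor` once per
    completed interval of `decay_interval` iterations."""
    return base_lr * factor ** (iteration // decay_interval)
```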

The impact of introducing the rVTTs, observed on the three datasets, is presented in Table 1, where the minimum perceptual index (https://www.pirm2018.org/PIRM-SR.html) [6] with inclusion of the rVTTs is evident, along with the perception-distortion tradeoff [6], in line with observations in [5]. This is also observed in Fig. 1 and Fig. 3.
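The PIRM-SR perceptual index combines two no-reference quality scores (the Ma et al. score and NIQE). A direct transcription of that definition, assuming ma_score and niqe_score have been computed by their respective reference implementations:

```python
def perceptual_index(ma_score, niqe_score):
    """PIRM-SR perceptual index: PI = 0.5 * ((10 - Ma) + NIQE).

    Lower is perceptually better; note PI needs no reference image,
    unlike PSNR and SSIM.
    """
    return 0.5 * ((10.0 - ma_score) + niqe_score)
```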

(a) Real (20x objective magnification)
(b) Bicubic (21.27 dB/0.49/6.81)
(c) SRNet (22.17 dB/0.59/6.38)
(d) SRNet-w (22.62 dB/0.62/6.69)
(e) T1 (20.58 dB/0.52/3.17)
(f) T1+T2 (19.33 dB/0.44/2.64)
Figure 3: Illustration of the performance on a sample from the CRCHistoPhenotypes dataset. Corresponding PSNR, SSIM and PI values are given in brackets.
Dataset             Metric   Nearest   Bicubic   SRNet   SRNet-w   T1      T1+T2
ALL-IDB             PSNR     32.83     37.65     38.64   43.03     32.77   37.98
                    SSIM     0.88      0.94      0.95    0.97      0.92    0.96
                    PI       7.27      8.15      7.2     7.32      5.95    5.31
CRCHistoPhenotypes  PSNR     22.41     25.26     25.91   26.31     24.74   23.34
                    SSIM     0.57      0.63      0.69    0.71      0.64    0.59
                    PI       13.07     7.38      6.53    7.19      3.71    3.27
Sigtuple WBC        PSNR     24.83     30.1      36.5    36.33     34.93   34.61
                    SSIM     0.78      0.88      0.96    0.95      0.94    0.94
                    PI       7.34      8.02      7.19    7.28      7.07    6.52
Table 1: Performance comparison over the three datasets. Higher values of PSNR and SSIM are better; a lower value of Perceptual Index (PI) is better. SRNet-w has edge weighting in the reconstruction loss.

The role of the rVTTs across the scale of super resolution is visible as the scale increases from 4x to 64x in Fig. 4.

(a) Original image (40x objective magnification)
(b) 4x (41.65 dB/0.98/6.35)
(c) 9x (33.42 dB/0.93/6.05)
(d) 16x (34.41 dB/0.94/5.95)
(e) 64x (27.86 dB/0.75/6.19)
Figure 4: Effect of the scale of SR on image appearance for a sample in the Sigtuple WBC dataset. Corresponding PSNR, SSIM and PI values are given in brackets.

Equivalence of diagnosis with the use of SR is demonstrated using a commercially available artificial intelligence (AI) digital pathology system (https://sigtuple.com/#s-solutions) [13] for inference on the Sigtuple WBC dataset, with results presented in Table 2. This justifies the role of T1 and T2 in restoring texture of diagnostic importance beyond what can be achieved using simple interpolation or an SRNet learned without the rVTT adversarial framework, demonstrating that SR with rVTT serves the same diagnostic purpose as real HR images while reducing the image acquisition time and speeding up the diagnosis delivery time.

Scale Nearest Bicubic SRNet SRNet-w T1 T1 + T2
4x 97.56 98.15 97.79 97.81 99.36 99.50
9x 85.55 96.11 97.79 97.44 98.29 98.58
16x 81.43 93.01 97.26 96.53 98.15 97.81
Table 2: Overlap of AI based diagnosis [13] (in %) using interpolated and SR images against using Real HR ground truth.
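The overlap reported in Table 2 is the percentage agreement between the AI system's labels on the SR (or interpolated) images and its labels on the real HR images; a minimal sketch, with the function name and label encoding assumed for illustration:

```python
def diagnosis_overlap(labels_sr, labels_hr):
    """Percentage of images on which the AI system's diagnosis from the
    SR image matches its diagnosis from the real HR ground truth."""
    assert len(labels_sr) == len(labels_hr)
    matches = sum(a == b for a, b in zip(labels_sr, labels_hr))
    return 100.0 * matches / len(labels_hr)
```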

4 Conclusion

Here we have proposed using two rVTTs to enhance the performance of an SRNet for microscopy, with a specific focus on restoring the texture within the diagnostically relevant nucleus and surrounding cytoplasm of cells. We demonstrate a marked rise in performance of the SRNet with this arrangement, in line with the philosophy of [5], and show that it elicits a response equivalent to that of a real HR image when used for inference with an AI-based digital pathology system. The quality of the super-resolved images, evaluated using recent advances in the understanding of distortion- and perception-based measures [6, 10], also supports our claim of being able to super resolve with pathorealism retained.


  • [1] S.C. Park, M.K. Park, and M.G Kang, “Super-resolution image reconstruction: a technical overview,” IEEE Signal Process. Mag., vol. 20, no. 3, pp. 21–36, 2003.
  • [2] Y. Rivenson, Z. Göröcs, H. Günaydin, Y. Zhang, H. Wang, and A. Ozcan, “Deep learning microscopy,” Optica, vol. 4, no. 11, pp. 1437–1443, 2017.
  • [3] Y. Rivenson, H.C. Koydemir, H. Wang, Z. Wei, Z. Ren, H. Günaydin, Y. Zhang, Z. Gorocs, K. Liang, and D. Tseng, “Deep learning enhanced mobile-phone microscopy,” ACS Photonics, vol. 5, no. 6, pp. 2354–2364, 2018.
  • [4] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A.P. Aitken, A. Tejani, J. Totz, and Z. Wang, “Photo-realistic single image super-resolution using a generative adversarial network,” in Proc. IEEE/CVF Conf. Comp. Vis., Patt. Recog., 2017, vol. 2, p. 4.
  • [5] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C.C. Loy, “ESRGAN: Enhanced super-resolution generative adversarial networks,” in Proc. Eur. Conf. Comp. Vis. Workshops, 2018.
  • [6] Y. Blau and T. Michaeli, “The perception-distortion tradeoff,” in Proc. IEEE/CVF Conf. Comp. Vis., Patt. Recog., 2018, pp. 6228–6237.
  • [7] A. Jolicoeur-Martineau, “The relativistic discriminator: a key element missing from standard GAN,” International Conference on Learning Representations, 2019.
  • [8] J.D. Bancroft and M. Gamble, Theory and practice of histological techniques, Elsevier Health Sciences, 2008.
  • [9] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proc. Int. Conf. Learn. Rep., 2015.
  • [10] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in Proc. Eur. Conf. Comp. Vis., 2016, pp. 694–711.
  • [11] R.D. Labati, V. Piuri, and F. Scotti, “ALL-IDB: The acute lymphoblastic leukemia image database for image processing,” in Proc. Int. Conf. Image Process., 2011, pp. 2045–2048.
  • [12] K. Sirinukunwattana, S.E.A. Raza, Y.W. Tsang, D.R.J. Snead, I.A. Cree, and N.M. Rajpoot, “Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images,” IEEE Trans. Med. Imaging, vol. 35, no. 5, pp. 1196–1206, 2016.
  • [13] D. Mundhra, B. Cheluvaraju, J. Rampure, and T.R. Dastidar, “Analyzing microscopic images of peripheral blood smear using deep learning,” in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 178–185. 2017.