Inferring a Third Spatial Dimension from 2D Histological Images

01/10/2018 ∙ by Maxime W. Lafarge, et al. ∙ 0

Histological images are obtained by transmitting light through a tissue specimen that has been stained in order to produce contrast. This process results in 2D images of a specimen that has a three-dimensional structure. In this paper, we propose a method to infer how the stains are distributed in the direction perpendicular to the surface of the slide for a given 2D image, in order to obtain a 3D representation of the tissue. This inference is achieved by decomposing the staining concentration maps under constraints that ensure realistic decomposition and reconstruction of the original 2D images. Our study shows that it is possible to generate realistic 3D images, making this method a potential tool for data augmentation when training deep learning models.




1 Introduction

In a clinical context, pathological diagnosis and prognosis commonly result from the analysis of bright-field microscopy images of histological slides. These 2D images are obtained by transmitting light through the histological specimens, stained beforehand, in order to attenuate light and produce contrast. When 3D microscopy techniques are not considered, pathologists quantify biomarkers of interest in 2D images by relying on their experience and knowledge of the 3D context of the objects they observe.

Taking inspiration from the image formation process of bright-field microscopy, we propose a method to infer a realistic decomposition of hematoxylin and eosin (H&E) stained histological slides along the axis of their thickness ($z$-axis), resulting in 3D images. The decomposition of a given histological image is achieved by generating a volume of its underlying stain concentrations, such that new images obtained by simulating transmitted light along other directions are realistic according to a trained discriminative deep learning model.

This study is motivated by the recent developments in deep generative models [1], in particular for generating biological microscopy images [2]. In [3] the authors showed that it is possible to train a generative adversarial network to infer 3D volumes from 2D training images only, without having to rely on 3D training data. Likewise, our method trains a discriminator from 2D training images only, but can generate 3D volumes that correspond to the decomposition of 2D images, and therefore does not require a generator drawing samples from a latent space.

The proposed algorithm can be seen as generating realistic 3D scenarios for the 2D observed scenes. As an example of a possible application, the generated 3D volumes can be used for data augmentation, as they allow new “views” of the same data to be created. Generalization of deep learning models is a known problem in automated histopathology image analysis, and new augmentation methods can help improve generalization [4]. The 3D information inferred by our method can also be used for analysis-by-synthesis strategies [5] to improve histopathology image analysis models, as it is a way to include the prior that processed objects have a 3D structure.

2 Method

Histological images can be modeled as a set of stain concentrations at every pixel location [6], as illustrated in Fig. 1. Thus, our method aims at solving the inverse problem of estimating the volume of stain concentrations that produced the original histological image, for a chosen model of light absorption. We hypothesize that decomposition in depth is possible since the thickness of the histological specimens is of the order of the image resolution. Such a volume is generated under two constraints: (C1) the reconstruction of the original image must be possible from the estimated volume, and (C2) new images produced from the volume must be realistic.

2.1 Model of Stain Concentration Volume

The RGB pixel intensities can be modeled according to the Beer-Lambert law of light absorption [6], such that the image intensity at each pixel location $(x, y)$ can be decomposed as $I_c(x, y) = I_0 \exp\left(-A_c \cdot C(x, y)\right)$, with $c$ the color-channel index, $I_0$ the intensity of the incident light, $A$ the matrix of absorption coefficients specific to the current image ($A_c$ being its row for channel $c$), and $C(x, y)$ the H&E stain concentrations. We used the method of [7] to achieve unsupervised stain unmixing of the images.
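As a rough illustration of this absorption model, the sketch below converts between H&E concentration maps and RGB intensities. It is not the unmixing method of [7]: the absorption matrix `A_HE` is assumed to be fixed to the stain vectors commonly quoted for color deconvolution, whereas the paper estimates it per image, and the incident intensity `I0 = 255` and the function names are assumptions made for the example.

```python
import numpy as np

# Assumed absorption matrix (rows: R, G, B; columns: hematoxylin, eosin).
# Illustrative fixed values; the paper estimates this matrix per image.
A_HE = np.array([[0.65, 0.07],
                 [0.70, 0.99],
                 [0.29, 0.11]])
I0 = 255.0  # assumed intensity of the unattenuated light


def concentrations_to_rgb(C, A=A_HE, I0=I0):
    """Beer-Lambert forward model: C of shape (H, W, 2) -> RGB of shape (H, W, 3)."""
    return I0 * np.exp(-C @ A.T)


def rgb_to_concentrations(I, A=A_HE, I0=I0, eps=1e-6):
    """Least-squares inverse: RGB of shape (H, W, 3) -> concentrations (H, W, 2)."""
    optical_density = -np.log(np.clip(I, eps, None) / I0)  # (H, W, 3)
    C = optical_density @ np.linalg.pinv(A).T               # (H, W, 2)
    return np.clip(C, 0.0, None)                            # concentrations are non-negative
```

Because the assumed matrix has full column rank, converting concentrations to RGB and back recovers the original concentrations up to numerical precision.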

Based on the same model, the stain concentrations $C(x, y)$ can be discretized along the $z$-axis in $N_z$ parts, such that $C(x, y) = \sum_{z=1}^{N_z} C(x, y, z)$.

Figure 1: Decomposition of the estimated stain concentration values of a digital slide along the $z$-axis at a given pixel location $(x, y)$.

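The constraint (C1) can be enforced by reducing the problem to finding the vectors $D(x, y, z)$, with $C(x, y, z) = D(x, y, z) \odot C(x, y)$ and $\sum_{z} D(x, y, z) = \mathbf{1}$, describing how the 2D concentrations $C(x, y)$ are distributed along the $z$-axis (the operator $\odot$ is the element-wise multiplication).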
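The reduction above can be sketched as follows: a tensor of per-pixel distribution vectors that sum to one along the depth axis is multiplied element-wise with the 2D concentration map, so that summing the resulting volume over depth returns the original map and (C1) holds by construction. The array layout `(x, y, z, stain)` and all function names are assumptions made for this example.

```python
import numpy as np


def uniform_distribution(n_x, n_y, n_z, n_stains=2):
    """Distribution tensor that spreads every concentration evenly over n_z parts."""
    return np.full((n_x, n_y, n_z, n_stains), 1.0 / n_z)


def normalize_distribution(D, eps=1e-12):
    """Project an arbitrary tensor onto non-negative vectors summing to 1 along z."""
    D = np.clip(D, 0.0, None)
    return D / np.maximum(D.sum(axis=2, keepdims=True), eps)


def decompose(C, D):
    """Element-wise product: C of shape (X, Y, S), D of shape (X, Y, Z, S) -> (X, Y, Z, S)."""
    return D * C[:, :, None, :]
```

For any normalized `D`, `decompose(C, D).sum(axis=2)` equals `C`, which is exactly the reconstruction constraint.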

2.2 Simulation of Transmitted Light

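For a given volume of concentrations, new images can be generated by simulating transmitted light from different directions, using the same model of light absorption. In particular, new projection images $I^{(x)}$ and $I^{(y)}$ are generated by simulating light transmission along the $x$-axis and $y$-axis through the slices $S_x$ and $S_y$, as shown in Fig. 2. The pixel intensities of these images are expressed in equation (1) as the sum of stain concentrations in the direction of projection.

$$I^{(x)}_c(y, z) = I_0 \exp\Big(-A_c \cdot \sum_{x \in S_x} C(x, y, z)\Big), \qquad I^{(y)}_c(x, z) = I_0 \exp\Big(-A_c \cdot \sum_{y \in S_y} C(x, y, z)\Big) \tag{1}$$

Here $c$ is the color-channel index, $I_0$ the intensity of the incident light, $A_c$ the absorption coefficients for channel $c$, and $C(x, y, z)$ the stain concentrations in the volume. The number of $z$-axis parts $N_z$ is carefully chosen such that the pixel resolution in the $x$-slices and $y$-slices is the same as in the original $xy$-plane.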

Figure 2: Illustration of an inferred concentration volume block of size pixels. (C1) is respected by enforcing the $z$-projection to reconstruct the original image patch. (C2) requires the $x$/$y$-projections, obtained by simulated transmitted light (red arrows), to be realistic.
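A minimal sketch of the simulated light transmission is given below: stain concentrations are summed along the chosen axis and passed through the same absorption model. It assumes that the whole block is used as the slab being projected, that the volume is laid out as `(x, y, z, stain)`, and that the absorption matrix `A_HE` and incident intensity are fixed illustrative values; none of these choices is taken from the paper.

```python
import numpy as np

# Assumed absorption matrix (rows: R, G, B; columns: hematoxylin, eosin).
A_HE = np.array([[0.65, 0.07],
                 [0.70, 0.99],
                 [0.29, 0.11]])


def project(volume, axis, A=A_HE, I0=255.0):
    """Simulate transmitted light through a concentration volume.

    volume: array of shape (X, Y, Z, S) holding stain concentrations.
    axis:   0 for the x-projection, 1 for the y-projection, 2 for the z-projection.
    Returns an RGB image whose two spatial axes are the remaining volume axes.
    """
    if axis not in (0, 1, 2):
        raise ValueError("axis must be 0 (x), 1 (y) or 2 (z)")
    summed = volume.sum(axis=axis)       # total concentration along the light path
    return I0 * np.exp(-summed @ A.T)    # Beer-Lambert attenuation per color channel
```

With this convention, `project(volume, axis=2)` reproduces the original image patch whenever the volume satisfies (C1), while `axis=0` and `axis=1` give the new views that the discriminator judges.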

2.3 Realism Constraint

A convolutional neural network can be trained to discriminate between “fake” generated projection images, which result from an underlying unrealistic concentration volume, and “real” images, which are assumed to result from a realistic volume of concentration distributions.

For a given image patch, 3D volume inference starts from a 4D tensor initialized with uniform concentration distributions. Then, the trained discriminative model (discriminator) can be used to update this tensor by gradient descent, so that the generated projections of the updated volume appear slightly more realistic. The gradient of the loss of the discriminator with respect to the input is computed via back-propagation.

This update process (Fig. 3) is iterated until convergence, that is, until the discriminator classifies the generated projections as realistic with small error.
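The iterative update can be sketched with a toy stand-in for the discriminator. Everything specific in the block below is an assumption made for illustration: the discriminator is a fixed linear-logistic model on the concentration map of the x-projection (the paper uses a convolutional network and both projections), the gradient is derived by hand rather than by back-propagation, and the distributions are parameterized with a softmax over depth so that they keep summing to one after each step.

```python
import numpy as np


def softmax_z(theta):
    """Softmax along the depth axis (axis 2), so distributions sum to 1 over z."""
    e = np.exp(theta - theta.max(axis=2, keepdims=True))
    return e / e.sum(axis=2, keepdims=True)


def infer_distribution(C, w, b=0.0, n_z=4, lr=0.5, n_steps=100):
    """Gradient descent on the depth distributions against a fixed toy discriminator.

    C: (X, Y, S) concentration map of the original patch.
    w: (Y, n_z, S) weights of the hypothetical linear discriminator.
    Returns the distribution tensor (X, Y, n_z, S) and the loss history.
    """
    n_x, n_y, n_s = C.shape
    theta = np.zeros((n_x, n_y, n_z, n_s))       # zero logits = uniform distributions
    losses = []
    for _ in range(n_steps):
        D = softmax_z(theta)
        V = D * C[:, :, None, :]                 # concentration volume, satisfies (C1)
        P = V.sum(axis=0)                        # x-projection concentrations (Y, n_z, S)
        p_real = 1.0 / (1.0 + np.exp(-(np.sum(w * P) + b)))
        losses.append(-np.log(p_real))           # loss is low when the projection looks real
        g_P = (p_real - 1.0) * w                 # d loss / d P
        g_D = g_P[None, :, :, :] * C[:, :, None, :]                  # chain rule through V and P
        g_theta = D * (g_D - (D * g_D).sum(axis=2, keepdims=True))   # chain rule through softmax
        theta -= lr * g_theta
    return softmax_z(theta), losses
```

The softmax parameterization is one simple way to keep the reconstruction constraint satisfied at every step; a projection onto the simplex after each update would be another.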

This image generation approach via optimization of the loss function of a neural network is similar to the methods developed in [8, 9], and plays a role comparable to the generator of standard generative adversarial networks [1] in that the generated images are used as input to a discriminator.

Figure 3: Iterative process of generating a volume of concentrations constrained by an original image. The stain concentration volume is updated by gradient descent in order to produce projection images that “fool” the fixed trained discriminator.

2.4 Discriminator Training

The discriminator is trained using two sets: a set of random real image patches, and a set of adversarial examples that are generated during training, using the projections computed with (1), from previous states of the trained model.

The training procedure alternates between two steps. First, the current state of the model is used to infer volumes from real images via gradient update using the process presented in Sect. 2.3, and the $x$/$y$-axis projections produced from these volumes are added to the set of adversarial examples. Second, a batch of image patches balanced between samples of the real set and of the adversarial set is used to train the discriminator. Images in the adversarial set are sampled according to their misclassification probability, such that the model learns from the “fake” generated images that are the most realistic and therefore the most challenging to classify.
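The sampling step can be sketched as below, assuming that the misclassification probability of a stored adversarial example is the probability the current discriminator assigns to the “real” class for it. The function names, the label convention (1 for real, 0 for fake) and the use of sampling with replacement are assumptions made for this example.

```python
import numpy as np


def sample_hard_fakes(p_real, n_samples, rng):
    """Draw indices of stored fakes, favoring those the discriminator takes for real."""
    p_real = np.asarray(p_real, dtype=float)
    total = p_real.sum()
    if total > 0:
        probs = p_real / total
    else:
        probs = np.full(len(p_real), 1.0 / len(p_real))  # fall back to uniform sampling
    return rng.choice(len(p_real), size=n_samples, replace=True, p=probs)


def balanced_batch(real_patches, fake_patches, p_real_fake, batch_size, rng):
    """Build a batch with half real patches and half adversarial examples."""
    n_real = batch_size // 2
    n_fake = batch_size - n_real
    real_idx = rng.choice(len(real_patches), size=n_real, replace=False)
    fake_idx = sample_hard_fakes(p_real_fake, n_fake, rng)
    x = np.concatenate([real_patches[real_idx], fake_patches[fake_idx]])
    y = np.concatenate([np.ones(n_real), np.zeros(n_fake)])  # 1 = real, 0 = fake
    return x, y
```

In a full training loop, `p_real_fake` would be refreshed with the current discriminator before each batch is drawn.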

3 Experiments and Results

3.1 Dataset

We used the high power field images of H&E stained slides of the public AMIDA13 dataset [10] for the experiments. 232 images of size pixels from 8 different breast cancer cases were used for training and the remaining images were used to generate test examples.

3.2 Discriminator Architecture and Training Procedure

We implemented the discriminator, which classifies input patches as “fake” or realistic, as a 6-layer convolutional neural network. The network takes image patches transformed to H&E concentration maps as input. Kernels of size , batch normalization, average-pooling and leaky ReLU non-linearities were used throughout. The network was trained by minimizing the cross-entropy loss using the Adam optimizer.

3.3 Generative Process and Extension to Large Images

We set the $z$-axis discretization to pixels as we considered micrometers as the maximum thickness of a tissue slice, in which case the $z$-axis pixel resolution of the inferred volumes can be the same as in the $xy$-plane ( micrometers).

The discriminator, as such, can only infer volumes from images of size . To overcome this limitation, volumes of larger images can be inferred by optimizing overlapping sub-volumes in parallel. This solution was used to produce stain concentration volumes from images.
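One possible way to organize the overlapping sub-volumes is sketched below. The paper does not detail how overlapping regions are combined, so the averaging used here is an assumption, as are the function names, the tile and stride parameters, and the callback `infer_tile`, which stands for any routine that returns a depth-distribution tensor for one patch.

```python
import numpy as np


def tile_origins(length, tile, stride):
    """Start indices of overlapping tiles of size `tile` that cover [0, length)."""
    if not 0 < stride <= tile <= length:
        raise ValueError("need 0 < stride <= tile <= length")
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)          # last tile is aligned with the image border
    return origins


def infer_large(C, tile, stride, n_z, infer_tile):
    """Average per-tile distribution tensors over a large concentration map.

    C: (X, Y, S) concentration map; infer_tile maps a (tile, tile, S) patch
    to a (tile, tile, n_z, S) distribution tensor.
    """
    n_x, n_y, n_s = C.shape
    acc = np.zeros((n_x, n_y, n_z, n_s))
    count = np.zeros((n_x, n_y, 1, 1))
    for x0 in tile_origins(n_x, tile, stride):
        for y0 in tile_origins(n_y, tile, stride):
            acc[x0:x0 + tile, y0:y0 + tile] += infer_tile(C[x0:x0 + tile, y0:y0 + tile])
            count[x0:x0 + tile, y0:y0 + tile] += 1
    return acc / count                     # every pixel is covered by at least one tile
```

Averaging keeps the reconstruction constraint intact, since a mean of distributions that each sum to one along the depth axis also sums to one.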

The generated projections presented in Fig. 4 indicate that the optimization process is able to distribute the stain concentrations of unseen images across the $z$-axis, and is able to create new tissue structures that are realistic for the trained discriminator.

Figure 4: Examples of projection images from generated volumes of stain concentration. The first row of each block shows the real image patches the volumes were inferred from. The other rows show the projections obtained by simulating light transmission in differently oriented slices, as indicated in the left column. The top block shows results on mitotic figures that were annotated by expert pathologists, and the bottom block includes non-mitotic figures only.

4 Discussion and Conclusions

We proposed a method for inferring the 3D structure of 2D histological images. The method showed good qualitative performance when applied to an image dataset of mitotic and non-mitotic objects extracted from breast cancer histology slides. Although the volumes generated by our method cannot be considered as representing the actual tissue structure, the generated projections can still be considered a likely scenario and thus be used as a data augmentation tool.

In addition to being driven by the image formation process of bright-field microscopy, our method has the property that the generated images are directly produced from the available data, the same way transformation-based augmentation methods work. In contrast, generators drawing inputs from a latent space, such as generative adversarial networks, do not have this property.

Directions of future work include further research to assess the realism of the generated images, and application of the generated 3D representations for data augmentation.


  • [1] I Goodfellow, J Pouget-Abadie, M Mirza, B Xu, D Warde-Farley, S Ozair, A Courville, and Y Bengio, “Generative adversarial nets,” in NIPS, 2014, pp. 2672–2680.
  • [2] A Osokin, A Chessel, RE Carazo Salas, and F Vaggi, “GANs for biological image synthesis,” in ICCV, 2017.
  • [3] M Gadelha, S Maji, and R Wang, “3D shape induction from 2D views of multiple objects,” arXiv:1612.05872, 2016.
  • [4] MW Lafarge, JPW Pluim, KAJ Eppenhof, P Moeskops, and M Veta, “Domain-adversarial neural networks to address the appearance variability of histopathology images,” in MICCAI-DLMIA, 2017, pp. 83–91.
  • [5] M Hejrati and D Ramanan, “Analysis by synthesis: 3D object recognition by object reconstruction,” in IEEE CVPR, 2014, pp. 2449–2456.
  • [6] AC Ruifrok, DA Johnston, et al., “Quantification of histochemical staining by color deconvolution,” Anal. Quant. Cytol., vol. 23, no. 4, pp. 291–299, 2001.
  • [7] M Macenko, M Niethammer, JS Marron, D Borland, JT Woosley, X Guan, C Schmitt, and NE Thomas, “A method for normalizing histology slides for quantitative analysis,” in IEEE ISBI, 2009, pp. 1107–1110.
  • [8] L Gatys, AS Ecker, and M Bethge, “Texture synthesis using convolutional neural networks,” in NIPS, 2015, pp. 262–270.
  • [9] L Gatys, AS Ecker, and M Bethge, “Image style transfer using convolutional neural networks,” in IEEE CVPR, 2016, pp. 2414–2423.
  • [10] M Veta, PJ van Diest, SM Willems, H Wang, A Madabhushi, A Cruz-Roa, F Gonzalez, ABL Larsen, JS Vestergaard, AB Dahl, et al., “Assessment of algorithms for mitosis detection in breast cancer histopathology images,” Med. Image Anal., vol. 20, no. 1, pp. 237–248, 2015.