Learning a low dimensional manifold of real cancer tissue with PathologyGAN

04/13/2020 ∙ Adalberto Claudio Quiros, Roderick Murray-Smith, Ke Yuan ∙ University of Glasgow

Application of deep learning in digital pathology shows promise in improving disease diagnosis and understanding. We present a deep generative model that learns to simulate high-fidelity cancer tissue images while mapping the real images onto an interpretable low dimensional latent space. The key to the model is an encoder trained by a previously developed generative adversarial network, PathologyGAN. We study the latent space using 249K images from two breast cancer cohorts. We find that the latent space encodes morphological characteristics of tissues (e.g. patterns of cancer, lymphocytes, and stromal cells). In addition, the latent space reveals distinctly enriched clusters of tissue architectures in the high-risk patient group.


1 Introduction

Diagnosis and treatment of cancer are commonly based on the assessment of histopathological images, such as haematoxylin and eosin (H&E) stained tissue images. The clinical utility of H&E images stems from the rich information about the tumor microenvironment recorded in them, such as the phenotypes of cancer cells and immune cells, the tissue architecture, and how they interact. Recently, advanced machine learning and deep learning approaches have been developed to improve our understanding of the tumor microenvironment [6]. A common theme of these approaches is to correlate quantifications of the tumor microenvironment with known clinically significant phenotypes [2, 35] and molecular characteristics [14, 9]. The quality of such correlation-based studies largely depends on the heterogeneity within the response and explanatory variables. Large cancer genome sequencing projects have revealed substantial diversity of molecular and clinical characteristics within and between patients [8]. Although it has been studied in breast and ovarian cancers [29, 36], the heterogeneity of the tumor microenvironment remains largely unknown.

Here, we propose a representation learning and disentanglement framework for the unsupervised quantification and clustering of tissue architectures, which relates phenotype to patient survival. We use Generative Adversarial Networks (GANs) as a tool to find useful representations of cancer tissue architectures. We summarize our contributions as follows:

  1. Based on the PathologyGAN model [30], we introduce an encoder that can be trained to act as an inverse function of the generator, taking advantage of the generator’s ability to capture tissue characteristics. This allows us to project real tissue images onto the generative model’s latent space.

  2. We demonstrate that the encoder interprets the morphological attributes of cancer tissue and places tissue images in distinct regions of the latent space.

  3. We capture the change in cancer tissue morphology densities between patients with survival times greater than and less than five years, aligning with previous findings [2].

2 Background

Generative Adversarial Networks [17] are models that learn diverse and faithful data representations from a given distribution. This is done with a generator, $G$, that maps random noise, $z \sim p_z(z)$, to samples that resemble the target data, $x \sim p_{data}(x)$, and a discriminator, $D$, whose goal is to distinguish between real and generated samples. The goal of a GAN is to find the equilibrium of the min-max problem:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right] \quad (1)$$
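As a minimal illustration of Equation 1, the sketch below computes both sides of the objective in Python, assuming hypothetical networks `G` and `D`, where `D` outputs a probability in (0, 1); practical GANs typically use stabilized variants of these losses.

```python
# Minimal sketch of the GAN min-max objective (Eq. 1); G, D, and z_dim
# are hypothetical placeholders, and D outputs a probability in (0, 1).
import torch

def gan_losses(G, D, real_batch, z_dim):
    z = torch.randn(real_batch.size(0), z_dim)
    fake_batch = G(z)

    # Discriminator maximizes log D(x) + log(1 - D(G(z))),
    # so we minimize the negative of that quantity.
    d_loss = -(torch.log(D(real_batch)).mean()
               + torch.log(1.0 - D(fake_batch.detach())).mean())

    # Generator minimizes log(1 - D(G(z))) (the saturating form).
    g_loss = torch.log(1.0 - D(fake_batch)).mean()
    return d_loss, g_loss
```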

GANs have since improved in image resolution, quality, and diversity with models such as SNGAN [28], BigGAN [3], ProGAN [21], RealnessGAN [33], and StyleGAN [22, 23]. There has also been an increased focus on improving GANs for disentanglement and representation learning, as in InfoGAN [7], BiGAN [11], StyleGAN [22, 23], and BigBiGAN [12]. These models allow a degree of control over image generation with specific feature properties. Simultaneously, projecting real images onto a GAN’s latent space has gained interest in the literature. Some works have used pre-trained generators and found real image projections through an iterative process [1, 25, 22, 23], yet these methods are usually costly and operate image by image. Alternatively, other models include an encoder with different optimization goals, such as smooth latent space interpolation [31], representation learning [11, 12, 26], or disentanglement [26]. Given the computational capacity and current state of generative models for representation learning, GANs and VAEs can make an impact on real-world applications such as histopathology.

Machine learning and especially deep learning approaches have shown early success in digital pathology, not only in achieving high-accuracy classification [13, 32, 18], but also in assisting the decision process through computer-human interaction [5]. These methods are usually supervised or weakly supervised, and require prior knowledge to train the models. On the other hand, unsupervised models only require the data samples (cancer tissue images) to find common attributes or properties that can explain the data. Unsupervised models are gaining interest in histopathology and have been applied to tasks including tissue or nuclei segmentation [34, 19, 10, 15, 16], classification [4], high resolution image generation [24, 30], and representation learning [19, 30]. Our work focuses on building a GAN with an encoder that has disentanglement and representation learning properties, thereby providing a framework for the unsupervised quantification and clustering of tissue architectures.

Figure 1: High level architecture of our GAN model.

3 PathologyGAN Encoder

We build upon PathologyGAN [30], which used techniques from BigGAN [3] and StyleGAN [22] to successfully reproduce cancer tissue images while having an interpretable latent space. In our model, the encoder $E$ learns to interpret tissue morphology through generated images, effectively acting as the inverse of the generator $G$. In PathologyGAN, the generator has disentanglement and representation learning properties. We take advantage of this by forcing the encoder to learn to place generated images back into the latent space. This process trains an encoder that is able to map tissue with different properties (e.g. cancer cell, lymphocyte, and stromal density) to distinct regions of the latent space. Figure 1 captures the high-level network architecture of our model. After training, the encoder can be used independently to map real images to their representations in the latent space.

We define the loss functions for the discriminator, $L_{Dis}$, and the generator, $L_{Gen}$, which remain the same as in the original PathologyGAN model (Equations 2 and 3):

$$L_{Dis} = -\mathbb{E}_{x_r \sim \mathbb{P}}\left[\log\left(\tilde{D}(x_r)\right)\right] - \mathbb{E}_{x_f \sim \mathbb{Q}}\left[\log\left(1 - \tilde{D}(x_f)\right)\right] \quad (2)$$

$$L_{Gen} = -\mathbb{E}_{x_f \sim \mathbb{Q}}\left[\log\left(\tilde{D}(x_f)\right)\right] - \mathbb{E}_{x_r \sim \mathbb{P}}\left[\log\left(1 - \tilde{D}(x_r)\right)\right] \quad (3)$$

where $\tilde{D}(x_r) = \mathrm{sigmoid}\left(C(x_r) - \mathbb{E}_{x_f \sim \mathbb{Q}} C(x_f)\right)$ and $\tilde{D}(x_f) = \mathrm{sigmoid}\left(C(x_f) - \mathbb{E}_{x_r \sim \mathbb{P}} C(x_r)\right)$, with $C$ the critic output of the discriminator, $\mathbb{P}$ the distribution of real images, and $\mathbb{Q}$ the distribution of generated images.

The encoder loss function, $L_{Enc}$, is defined to minimize the mean squared error between the latent vectors $w$ and their reconstructions through generated images (Equation 4):

$$L_{Enc} = \mathbb{E}_{w \sim p(w)}\left[\left\lVert w - E(G(w)) \right\rVert_2^2\right] \quad (4)$$
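A minimal sketch of these losses is given below, assuming hypothetical networks `C` (the discriminator's raw, pre-sigmoid critic output), `G`, and `E`; the batch means serve as estimates of the expectations in Equations 2 and 3.

```python
# Sketch of the relativistic average losses (Eqs. 2-3) and the encoder
# MSE loss (Eq. 4). C, G, and E are hypothetical placeholders.
import torch
import torch.nn.functional as F

def discriminator_loss(C, x_real, x_fake):
    c_real, c_fake = C(x_real), C(x_fake)
    d_real = torch.sigmoid(c_real - c_fake.mean())  # D~(x_r), batch estimate
    d_fake = torch.sigmoid(c_fake - c_real.mean())  # D~(x_f), batch estimate
    return -(torch.log(d_real).mean() + torch.log(1.0 - d_fake).mean())

def generator_loss(C, x_real, x_fake):
    c_real, c_fake = C(x_real), C(x_fake)
    d_real = torch.sigmoid(c_real - c_fake.mean())
    d_fake = torch.sigmoid(c_fake - c_real.mean())
    return -(torch.log(d_fake).mean() + torch.log(1.0 - d_real).mean())

def encoder_loss(E, G, w):
    # Mean squared error between latent vectors and their reconstruction
    # through generated images: || w - E(G(w)) ||^2.
    return F.mse_loss(E(G(w)), w)
```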

Although the encoder is trained simultaneously with the GAN model, we can separate the training into two parts: the mapping network $M$, generator $G$, and discriminator $D$ are trained as a GAN with the Relativistic Average Discriminator [20], while the encoder $E$ is trained to project the generated cancer tissue images back onto the latent space. In practice, the encoder learns simultaneously with the generator $G$.

We trained our encoder based on the assumption that the generator is successful in reproducing real cancer tissue; if the encoder is able to project generated images, it will therefore also learn to project real tissue images. Based on this logic, we use only generated images to train the encoder.

The encoder is only updated at steps where the generator is not trained with style mixing regularization [22]. Style mixing regularization uses two latent vectors $w_1$ and $w_2$ to force disentanglement in the generator. It is impractical to train the encoder at these steps because the resulting images have no clear assignment in the latent space. Since style mixing regularization is performed on 50% of the generator training steps, our encoder is updated every other generator step, as in the sketch below.
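To make the schedule concrete, here is a minimal sketch of the alternating updates, assuming hypothetical networks `M`, `G`, and `E`, a data `loader`, a latent size `z_dim`, an optimizer `opt_E`, a two-style generator call `G(w, w2)`, and the `encoder_loss` from the previous sketch; the actual training code may organize these steps differently.

```python
# Sketch of the alternating update scheme: style mixing on half of the
# generator steps, encoder updates on the remaining steps, where each
# generated image corresponds to a single latent vector w.
import torch

for step, x_real in enumerate(loader):          # `loader` is assumed
    z = torch.randn(x_real.size(0), z_dim)      # `z_dim` is assumed
    w = M(z)
    mix_step = (step % 2 == 0)                  # 50% style mixing steps

    if mix_step:
        w2 = M(torch.randn_like(z))
        x_fake = G(w, w2)                       # two mixed styles
    else:
        x_fake = G(w)                           # single latent vector

    # ... usual discriminator / generator updates on x_real, x_fake ...

    if not mix_step:
        e_loss = encoder_loss(E, G, w)          # from the sketch above
        opt_E.zero_grad()
        e_loss.backward()
        opt_E.step()
```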

To train our model we used two haematoxylin and eosin (H&E) breast cancer databases, from the Netherlands Cancer Institute (NKI) cohort and the Vancouver General Hospital (VGH) cohort, with 248 and 328 patients, respectively [2]. In total, this corresponded to a training set of 249K tissue images. We used an NVIDIA Titan RTX 24 GB to train the model for approximately 80 hours.

Figure 2: Real tissue images and their reconstructions. We take real tissue images, map them to the latent space with our encoder, and then use the generator on the latent vector representations to generate the image reconstructions. (a) corresponds to the real tissue images and (b) to the reconstructions; the images are paired in columns. We show different examples of stromal cells, lymphocytes, cancer cells, and combinations of these; the reconstructions follow the real image attributes.
Figure 3: Uniform Manifold Approximation and Projection (UMAP) representation of real tissue samples in the latent space, using samples from the Netherlands Cancer Institute (NKI) and Vancouver General Hospital (VGH) patient cohorts. We fitted a Gaussian mixture model with 100 components over the complete dataset to cluster the latent representations. We show different tissue images belonging to various clusters, demonstrating how tissues with similar features are assigned to common regions of the latent space.

4 Results and discussion

Our results focus on analyzing our model’s comprehension of tissue characteristics, such as colour, texture, and spatial features of cancer, lymphocytes, and stromal cells. For these results we used only real H&E breast tissue samples from the VGH and NKI cohorts.

4.1 Tissue image reconstruction

We start by analyzing how much information about the tissue the model is capturing. The assumption is that if the encoder truly finds meaningful representations of tissue morphology, the generator will reconstruct the attributes held in the latent vectors.

We use the encoder to find the latent vectors of real tissue images, and then apply the generator to those same vector representations to obtain the tissue image reconstructions. Figure 2 shows these reconstruction results. Although the reconstructions do not match the originals one-to-one at the pixel level, we judge our model by how it finds high-level features and assigns representations based on them. The reconstructions keep the same tissue attributes, whether we analyze stromal cells, lymphocytes, or cancer cells.
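The procedure itself is a single composition of the two trained networks; a minimal sketch, assuming trained placeholders `E` and `G` and an input batch `x_real`:

```python
# Sketch of the reconstruction procedure: the encoder maps a real tissue
# image to its latent representation, and the generator renders that
# representation back into an image. E, G, and x_real are assumed.
import torch

with torch.no_grad():
    w = E(x_real)   # real image -> latent representation
    x_rec = G(w)    # latent representation -> reconstruction
```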

Figure 4: Four different linear interpolations between clusters in extreme positions of the latent space, at ten equally distributed points each. In contrast to Figure 3, this figure shows the global structure of the latent space, where consecutive image points show gradual morphological changes in the tissue.

4.2 Analysis of real tissue representations

We also study the latent space representations of all available real tissue images from the VGH and NKI cohorts. We perform a Uniform Manifold Approximation and Projection (UMAP) [27] reduction on the latent vectors down to two dimensions, and then fit a Gaussian mixture model with 100 components to cluster the tissue points. We reason that a good representation should have the following properties: 1) points in close proximity should encode similar tissue architectures; 2) far-apart points should encode drastically different tissue architectures; 3) changes in tissue architectures should correspond to smooth manifolds in the representation space.
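A minimal sketch of this analysis step, assuming the latent vectors of all real tissue images are stacked row-wise in an array `latents`, and using the umap-learn and scikit-learn packages:

```python
# UMAP reduction of the latent vectors to two dimensions, followed by a
# 100-component Gaussian mixture model to cluster the tissue points.
import umap
from sklearn.mixture import GaussianMixture

embedding = umap.UMAP(n_components=2).fit_transform(latents)  # to 2-D
gmm = GaussianMixture(n_components=100).fit(embedding)
cluster_ids = gmm.predict(embedding)                          # per image
```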

Figure 3 shows the UMAP plot, which demonstrates that tissue points belonging to the same cluster have common characteristics, not only in colour and texture but also in the cell types present in the tissue. Figure 4 captures the global structure of the latent space by displaying four different linear interpolations between clusters at extreme positions of the latent space, each made of ten equally distributed points. Transitions between consecutive points show morphological similarities without abrupt changes. Together, Figures 3 and 4 show that our model learns to capture gradual changes in cancer tissue, with images of similar morphology clustered together.
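For reference, a sketch of one such interpolation, assuming two latent representations `w_a` and `w_b` (e.g. taken from two distant clusters) and the trained generator `G`:

```python
# Linear interpolation between two latent representations, rendered at
# ten equally spaced points with the generator. All names are assumed.
import numpy as np

ts = np.linspace(0.0, 1.0, 10)
frames = [G((1.0 - t) * w_a + t * w_b) for t in ts]
```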

4.3 Analysis of survival data using latent representations

Figure 5: Densities of tissue architectures in VGH cohort patients with greater (a) and less (b) than five-year survival. We highlight six Gaussian mixture components of tissue architectures that are enriched in high-risk (less than five-year survival) patients. (#-A) Tissue images in the cluster; (#-B) percentage of patients with the tissue pattern in the survival group.

Both the NKI and VGH cohorts consist of patients with survival times greater than five years and patients with survival times of five years or less. We use our model’s clustering of tissue morphology to show the differences in tissue architecture density between these two groups. Previous literature found that different tissue architectures are associated with improved or worse prognosis [2].
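A minimal sketch of the per-cluster density comparison, assuming the `cluster_ids` from the clustering step above and a hypothetical boolean array `high_risk` marking images from patients with five years of survival or less:

```python
# Compare cluster densities between high-risk and low-risk groups and
# rank clusters by their enrichment in the high-risk group.
import numpy as np

def cluster_density(ids, n_clusters=100):
    counts = np.bincount(ids, minlength=n_clusters)
    return counts / counts.sum()

density_high = cluster_density(cluster_ids[high_risk])
density_low = cluster_density(cluster_ids[~high_risk])
enriched = np.argsort(density_high - density_low)[::-1]  # most enriched
                                                         # in high-risk first
```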

Figure 5 highlights tissue architectures that are enriched in patients with less than five-year survival but less frequent in cases with greater than five-year survival. Across both the VGH and NKI cohorts, we find widespread, distinctly enriched clusters of tissue architectures. The complete results are included in the Appendix.

5 Conclusion and future work

We presented an improvement to PathologyGAN that includes an encoder and can integrate real tissue image data. We showed that this model distinguishes features of real tissue, such as colour, texture, and the presence of cancer, lymphocyte, and stromal cells; the model assigns low dimensional representations that preserve the meaning associated with these morphological characteristics. Furthermore, it revealed distinctly enriched clusters of tissue architectures in the high-risk patient groups.

This model opens the door to identifying common and distinct patterns of tissue architecture, which could greatly improve our understanding of the tumor microenvironment and its relation to patient outcome and underlying molecular characteristics. We are working towards generalizing our findings across large patient cohorts such as The Cancer Genome Atlas.

References

  • [1] R. Abdal, Y. Qin, and P. Wonka (2019-10) Image2StyleGAN: How to Embed Images Into the StyleGAN Latent Space?. 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Cited by: §2.
  • [2] A.H. Beck, A.R. Sangoi, and S. Leung (2011-01) Systematic analysis of breast cancer morphology uncovers stromal features associated with survival. Science Translational Medicine 3. Cited by: item 3, §1, §3, §4.3.
  • [3] A. Brock, J. Donahue, and K. Simonyan (2018) Large scale GAN training for high fidelity natural image synthesis. CoRR. Cited by: §2, §3.
  • [4] W. Bulten and G. Litjens (2018) Unsupervised prostate cancer detection on H&E using Convolutional Adversarial Autoencoders. External Links: 1804.07098. Cited by: §2.
  • [5] C. J. Cai, E. Reif, N. Hegde, J. D. Hipp, B. Kim, D. Smilkov, M. Wattenberg, F. B. Viégas, G. S. Corrado, M. C. Stumpe, and M. Terry (2019) Human-centered tools for coping with imperfect algorithms during medical decision-making. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, CHI 2019, pp. 4. Cited by: §2.
  • [6] G. Campanella, M. G. Hanna, L. Geneslaw, A. Miraflor, V. Werneck Krauss Silva, K. J. Busam, E. Brogi, V. E. Reuter, D. S. Klimstra, and T. J. Fuchs (2019-08) Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature Medicine 25 (8), pp. 1301–1309. Cited by: §1.
  • [7] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel (2016) InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. External Links: 1606.03657 Cited by: §2.
  • [8] G. Ciriello, M. L. Miller, B. A. Aksoy, Y. Senbabaoglu, N. Schultz, and C. Sander (2013-10) Emerging landscape of oncogenic signatures across human cancers. Nature Genetics 45 (10), pp. 1127–1133. Cited by: §1.
  • [9] N. Coudray, P. S. Ocampo, T. Sakellaropoulos, N. Narula, M. Snuderl, D. Fenyö, A. L. Moreira, N. Razavian, and A. Tsirigos (2018-10) Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning. Nature Medicine 24 (10), pp. 1559–1567. Cited by: §1.
  • [10] T. de Bel, M. Hermsen, B. Smeets, L. Hilbrands, J. van der Laak, and G. Litjens (2018) Automatic segmentation of histopathological slides of renal tissue using deep learning. In Medical Imaging 2018: Digital Pathology, J. E. Tomaszewski and M. N. Gurcan (Eds.), Vol. 10581, pp. 285 – 290. Cited by: §2.
  • [11] J. Donahue, P. Krähenbühl, and T. Darrell (2016) Adversarial feature learning. External Links: 1605.09782 Cited by: §2.
  • [12] J. Donahue and K. Simonyan (2019) Large scale adversarial representation learning. External Links: 1907.02544 Cited by: §2.
  • [13] A. Esteva, B. Kuprel, R. Novoa, J. Ko, S. M. Swetter, H. M. Blau, and S. Thrun (2017-01) Dermatologist-level classification of skin cancer with deep neural networks. Nature 542. Cited by: §2.
  • [14] Y. Fu, A. W. Jung, R. V. Torne, S. Gonzalez, H. Vohringer, M. Jimenez-Linan, L. Moore, and M. Gerstung (2019-10) Pan-cancer computational histopathology reveals mutations, tumor composition and prognosis. bioRxiv (0), pp. 813543. Cited by: §1.
  • [15] M. Gadermayr, L. Gupta, V. Appel, P. Boor, B. M. Klinkhammer, and D. Merhof (2019) Generative adversarial networks for facilitating stain-independent supervised and unsupervised segmentation: a study on kidney histology. IEEE Transactions on Medical Imaging 38 (10), pp. 2293–2302. Cited by: §2.
  • [16] M. Gadermayr, L. Gupta, B. M. Klinkhammer, P. Boor, and D. Merhof (2019) Unsupervisedly Training GANs for Segmenting Digital Pathology with Automatically Generated Annotations. In Proceedings of The 2nd Conference on Medical Imaging with Deep Learning, M. J. Cardoso, A. Feragen, B. Glocker, E. Konukoglu, I. Oguz, G. Unal, and T. Vercauteren (Eds.), Proceedings of Machine Learning Research, Vol. 102, pp. 175–184. Cited by: §2.
  • [17] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio (2014) Generative adversarial networks. CoRR. Cited by: §2.
  • [18] Z. Han, B. Wei, Y. Zheng, Y. Yin, K. Li, and S. Li (2017-06) Breast cancer multi-classification from histopathological images with structured deep learning model. Scientific Reports 7, pp. . Cited by: §2.
  • [19] L. Hou, V. Nguyen, A. B. Kanevsky, D. Samaras, T. M. Kurc, T. Zhao, R. R. Gupta, Y. Gao, W. Chen, D. Foran, and et al. (2019-02) Sparse autoencoder for unsupervised nucleus detection and representation in histopathology images. Pattern Recognition 86, pp. 188–200. Cited by: §2.
  • [20] A. Jolicoeur-Martineau (2018) The relativistic discriminator: a key element missing from standard GAN. CoRR. Cited by: §3.
  • [21] T. Karras, T. Aila, S. Laine, and J. Lehtinen (2018) Progressive growing of GANs for improved quality, stability, and variation. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, Cited by: §2.
  • [22] T. Karras, S. Laine, and T. Aila (2018) A style-based generator architecture for generative adversarial networks. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4396–4405. Cited by: §2, §3, §3.
  • [23] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila (2019) Analyzing and Improving the Image Quality of StyleGAN. External Links: 1912.04958 Cited by: §2.
  • [24] A. B. Levine, J. Peng, D. Farnell, M. Nursey, Y. Wang, J. R. Naso, H. Ren, H. Farahani, C. Chen, D. Chiu, A. Talhouk, B. Sheffield, M. Riazy, P. P. Ip, C. Parra-Herran, A. Mills, N. Singh, B. Tessier-Cloutier, T. Salisbury, J. Lee, T. Salcudean, S. J.M. Jones, D. G. Huntsman, C. B. Gilks, S. Yip, and A. Bashashati (2020) Synthesis of diagnostic quality cancer pathology images. Cited by: §2.
  • [25] Z. C. Lipton and S. Tripathi (2017) Precise recovery of latent vectors from generative adversarial networks. External Links: 1702.04782 Cited by: §2.
  • [26] A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, and B. Frey (2015) Adversarial autoencoders. External Links: 1511.05644 Cited by: §2.
  • [27] L. McInnes, J. Healy, and J. Melville (2018) UMAP: uniform manifold approximation and projection for dimension reduction. External Links: 1802.03426 Cited by: §4.2.
  • [28] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida (2018) Spectral normalization for generative adversarial networks. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, Cited by: §2.
  • [29] R. Natrajan, H. Sailem, F. K. Mardakheh, M. Arias Garcia, C. J. Tape, M. Dowsett, C. Bakal, and Y. Yuan (2016-02) Microenvironmental Heterogeneity Parallels Breast Cancer Progression: A Histology–Genomic Integration Analysis. PLoS Medicine 13 (2), pp. e1001961. Cited by: §1.
  • [30] A. C. Quiros, R. Murray-Smith, and K. Yuan (2019) PathologyGAN: learning deep representations of cancer tissue. External Links: 1907.02644. Cited by: item 1, §2, §3.
  • [31] T. Sainburg, M. Thielk, B. Theilman, B. Migliori, and T. Gentner (2018) Generative adversarial interpolative autoencoding: adversarial training on latent space interpolations encourage convex latent distributions. External Links: 1807.06650 Cited by: §2.
  • [32] J. W. Wei, L. J. Tafe, Y. A. Linnik, L. J. Vaickus, N. Tomita, and S. Hassanpour Pathologist-level classification of histologic patterns on resected lung adenocarcinoma slides with deep neural networks. Cited by: §2.
  • [33] Y. Xiangli*, Y. Deng*, B. Dai*, C. C. Loy, and D. Lin (2020) Real or not real, that is the question. In International Conference on Learning Representations, Cited by: §2.
  • [34] J. Xu, L. Xiang, Q. Liu, H. Gilmore, J. Wu, J. Tang, and A. Madabhushi (2016) Stacked sparse autoencoder (ssae) for nuclei detection on breast cancer histopathology images. IEEE Transactions on Medical Imaging 35 (1), pp. 119–130. Cited by: §2.
  • [35] Y. Yuan, H. Failmezger, O. M Rueda, H. Ali, S. Gräf, S. Chin, R. F Schwarz, C. Curtis, M. Dunning, H. Bardwell, N. Johnson, S. Doyle, G. Turashvili, E. Provenzano, S. Aparicio, C. Caldas, and F. Markowetz (2012-10) Quantitative image analysis of cellular heterogeneity in breast tumors complements genomic profiling. Science translational medicine 4, pp. 157ra143. Cited by: §1.
  • [36] A. W. Zhang, A. McPherson, K. Milne, D. R. Kroeger, P. T. Hamilton, A. Miranda, T. Funnell, N. Little, C. P.E. de Souza, S. Laan, S. LeDoux, D. R. Cochrane, J. L.P. Lim, W. Yang, A. Roth, M. A. Smith, J. Ho, K. Tse, T. Zeng, I. Shlafman, M. R. Mayo, R. Moore, H. Failmezger, A. Heindl, Y. K. Wang, A. Bashashati, D. S. Grewal, S. D. Brown, D. Lai, A. N.C. Wan, C. B. Nielsen, C. Huebner, B. Tessier-Cloutier, M. S. Anglesio, A. Bouchard-Côté, Y. Yuan, W. W. Wasserman, C. B. Gilks, A. N. Karnezis, S. Aparicio, J. N. McAlpine, D. G. Huntsman, R. A. Holt, B. H. Nelson, and S. P. Shah (2018-06) Interfaces of Malignant and Immunologic Clonal Dynamics in Ovarian Cancer. Cell 173 (7), pp. 1755–1769.e22. Cited by: §1.

Appendix 0.A Code

We provide the code in the repository Learning-a-low-dimensional-manifold-of-real-cancer-tissue.

Appendix 0.B Tissue image reconstruction

Real tissue images and their reconstructions. We take real tissue images, map them to the latent space with our encoder, and then use the generator on the latent vector representations to generate the image reconstructions.

In all these samples we use the following labeling: (a) corresponds to the real tissue images and (b) to the reconstructions; the images are paired in columns. We show different examples of stromal cells, lymphocytes, cancer cells, and combinations of these; the reconstructions follow the real image attributes.

Figure 6: Real tissue images and their reconstructions. (a) corresponds to the real tissue images and (b) to the reconstructions; the images are paired in columns. We show different examples of stromal cells, lymphocytes, cancer cells, and combinations of these; the reconstructions follow the real image attributes.
Figure 7: Real tissue images and their reconstructions. (a) corresponds to the real tissue images and (b) to the reconstructions; the images are paired in columns. We show different examples of stromal cells, lymphocytes, cancer cells, and combinations of these; the reconstructions follow the real image attributes.

Appendix 0.C Analysis of real tissue representations

Uniform Manifold Approximation and Projection (UMAP) representation of real tissue samples in our model’s latent space, using samples from the Netherlands Cancer Institute (NKI) and Vancouver General Hospital (VGH) patient cohorts. We fitted a Gaussian mixture model with 100 components over the complete dataset to cluster the latent representations.

We show two different types of figures, for the combined VGH and NKI datasets as well as for NKI and VGH independently:

  1. Clustering of tissue architectures into common regions of the latent space: Figures 8, 10, and 12.

  2. Global structure of the latent space: Figures 9, 11, and 13.

Figure 8: Uniform Manifold Approximation and Projection (UMAP) representation of real tissue samples in the latent space using samples from Netherlands Cancer Institute (NKI) and Vancouver General Hospital (VGH) patient cohorts. In this Figure, we fitted a Gaussian mixture model over the complete dataset and used 100 components to cluster the latent representations. We show different tissue images belonging to various unique clusters, demonstrating how tissues with similar features get assigned to common regions in the latent space.
Figure 9: Four different linear interpolations between clusters in extreme positions of the latent space at ten equally distributed points each. In contrast to Figure 8, this figure shows the global structure of the latent space where consecutive image points have gradual morphological changes in the tissue.
Figure 10: Uniform Manifold Approximation and Projection (UMAP) representation of real tissue samples in the latent space, using samples from the Netherlands Cancer Institute (NKI) patient cohort. We fitted a Gaussian mixture model with 100 components over the complete dataset to cluster the latent representations. We show different tissue images belonging to various clusters, demonstrating how tissues with similar features are assigned to common regions of the latent space.
Figure 11: Four different linear interpolations between clusters in extreme positions of the latent space at ten equally distributed points each. In contrast to Figure 10, this figure shows the global structure of the latent space where consecutive image points have gradual morphological changes in the tissue in the NKI patient cohort.
Figure 12: Uniform Manifold Approximation and Projection (UMAP) representation of real tissue samples in the latent space, using samples from the Vancouver General Hospital (VGH) patient cohort. We fitted a Gaussian mixture model with 100 components over the complete dataset to cluster the latent representations. We show different tissue images belonging to various clusters, demonstrating how tissues with similar features are assigned to common regions of the latent space.
Figure 13: Four different linear interpolations between clusters in extreme positions of the latent space at ten equally distributed points each. In contrast to Figure 12, this figure shows the global structure of the latent space where consecutive image points have gradual morphological changes in the tissue in the VGH patient cohort.

Appendix 0.D Analysis of survival data using latent representations

In this appendix we provide the collection of figures for the NKI and VGH patient cohorts. We show the density difference in tissue architectures between high-risk patients (less than five-year survival, panels (a)) and low-risk patients (greater than five-year survival, panels (b)).

Figures 14 and 16 present tissue architectures predominant in high-risk patients, and Figures 15 and 17 those predominant in low-risk patients.

(#-A) Tissue images belonging to the cluster; (#-B) percentage of patients with the tissue pattern in the survival group.

Figure 14: VGH cohort, tissue architectures more predominant in high-risk patients (survival times less than 5 years).
Figure 15: VGH cohort, tissue architectures more predominant in low-risk patients (survival times greater than 5 years).
Figure 16: NKI cohort, tissue architectures more predominant in high-risk patients (survival times less than 5 years).
Figure 17: NKI cohort, tissue architectures more predominant in low-risk patients (survival times greater than 5 years).

Appendix 0.E Model Architecture

Mapping Network

ResNet Dense Layer and ReLU,

ResNet Dense Layer and ReLU,
ResNet Dense Layer and ReLU,
ResNet Dense Layer and ReLU,
Dense Layer,
Table 1: Mapping network architecture details of the PathologyGAN model.
Generator Network
Dense Layer, adaptive instance normalization (AdaIN), and leakyReLU
Dense Layer, AdaIN, and leakyReLU
Reshape

ResNet Conv2D Layer, 3x3, stride 1, pad same, AdaIN, and leakyReLU

ConvTranspose2D Layer, 2x2, stride 2, pad upscale, AdaIN, and leakyReLU
ResNet Conv2D Layer, 3x3, stride 1, pad same, AdaIN, and leakyReLU
ConvTranspose2D Layer, 2x2, stride 2, pad upscale, AdaIN, and leakyReLU
ResNet Conv2D Layer, 3x3, stride 1, pad same, AdaIN, and leakyReLU
Attention Layer at
ConvTranspose2D Layer, 2x2, stride 2, pad upscale, AdaIN, and leakyReLU
ResNet Conv2D Layer, 3x3, stride 1, pad same, AdaIN, and leakyReLU
ConvTranspose2D Layer, 2x2, stride 2, pad upscale, AdaIN, and leakyReLU
ResNet Conv2D Layer, 3x3, stride 1, pad same, AdaIN, and leakyReLU
ConvTranspose2D Layer, 2x2, stride 2, pad upscale, AdaIN, and leakyReLU
Conv2D Layer, 3x3, stride 1, pad same,
Sigmoid
Table 2: Generator network architecture details of the PathologyGAN model.
Discriminator Network
ResNet Conv2D Layer, 3x3, stride 1, pad same, and leakyReLU
Conv2D Layer, 2x2, stride 2, pad downscale, and leakyReLU
ResNet Conv2D Layer, 3x3, stride 1, pad same, and leakyReLU
Conv2D Layer, 2x2, stride 2, pad downscale, and leakyReLU
ResNet Conv2D Layer, 3x3, stride 1, pad same, and leakyReLU
Conv2D Layer, 2x2, stride 2, pad downscale, and leakyReLU
ResNet Conv2D Layer, 3x3, stride 1, pad same, and leakyReLU
Attention Layer at
Conv2D Layer, 2x2, stride 2, pad downscale, and leakyReLU
ResNet Conv2D Layer, 3x3, stride 1, pad same, and leakyReLU
Conv2D Layer, 2x2, stride 2, pad downscale, and leakyReLU
Flatten
Dense Layer and leakyReLU,
Dense Layer and leakyReLU,
Table 3: Discriminator network architecture details of the PathologyGAN model.
Encoder Network
Conv2D Layer, 2x2, stride 2, pad downscale, and leakyReLU
ResNet Conv2D Layer, 3x3, stride 1, pad same, and leakyReLU
Conv2D Layer, 2x2, stride 2, pad downscale, and leakyReLU
ResNet Conv2D Layer, 3x3, stride 1, pad same, and leakyReLU
Conv2D Layer, 2x2, stride 2, pad downscale, and leakyReLU
ResNet Conv2D Layer, 3x3, stride 1, pad same, and leakyReLU
Conv2D Layer, 2x2, stride 2, pad downscale, and leakyReLU
ResNet Conv2D Layer, 3x3, stride 1, pad same, and leakyReLU
Attention Layer at
Conv2D Layer, 2x2, stride 2, pad downscale, and leakyReLU
ResNet Conv2D Layer, 3x3, stride 1, pad same, and leakyReLU
Conv2D Layer, 2x2, stride 2, pad downscale, and leakyReLU
Flatten
Dense Layer and leakyReLU,
Dense Layer and leakyReLU,
Table 4: Encoder network architecture details of the PathologyGAN model.