Learning stochastic object models from medical imaging measurements using Progressively-Growing AmbientGANs

05/29/2020 ∙ by Weimin Zhou, et al.

It has been advocated that medical imaging systems and reconstruction algorithms should be assessed and optimized by use of objective measures of image quality that quantify the performance of an observer at specific diagnostic tasks. One important source of variability that can significantly limit observer performance is variation in the objects to-be-imaged. This source of variability can be described by stochastic object models (SOMs). A SOM is a generative model that can be employed to establish an ensemble of to-be-imaged objects with prescribed statistical properties. In order to accurately model variations in anatomical structures and object textures, it is desirable to establish SOMs from experimental imaging measurements acquired by use of a well-characterized imaging system. Deep generative neural networks, such as generative adversarial networks (GANs) hold great potential for this task. However, conventional GANs are typically trained by use of reconstructed images that are influenced by the effects of measurement noise and the reconstruction process. To circumvent this, an AmbientGAN has been proposed that augments a GAN with a measurement operator. However, the original AmbientGAN could not immediately benefit from modern training procedures, such as progressive growing, which limited its ability to be applied to realistically sized medical image data. To circumvent this, in this work, a new Progressive Growing AmbientGAN (ProAmGAN) strategy is developed for establishing SOMs from medical imaging measurements. Stylized numerical studies corresponding to common medical imaging modalities are conducted to demonstrate and validate the proposed method for establishing SOMs.


I Introduction

Computer-simulation remains an important approach for the design and optimization of imaging systems. Such approaches can permit the exploration, refinement, and assessment of a variety of system designs that would be infeasible through experimental studies alone. In the field of medical imaging, it has been advocated that imaging systems and reconstruction algorithms should be assessed and optimized by use of objective measures of image quality (IQ) that quantify the performance of an observer at specific diagnostic tasks [42, 52, 8, 7, 3]. To accomplish this, all sources of variability in the measured data should be accounted for. One important source of variability that can significantly limit observer performance is variation in the objects to-be-imaged [44]. This source of variability can be described by stochastic object models (SOMs) [38]. A SOM is a generative model that can be employed to produce an ensemble of to-be-imaged objects that possess prescribed statistical properties.

Available SOMs include texture models of mammographic images with clustered lumpy backgrounds [10], simple lumpy background models [44], and more realistic anatomical phantoms that can be randomly perturbed [47]. A variety of other computational phantoms [46, 55, 58, 17, 14, 67, 47, 39], either voxelized or mathematical, have been proposed for medical imaging simulation, aiming to provide a practical solution to characterize object variability. However, the majority of these were established by use of image data corresponding to a few subjects. Therefore, they may not accurately describe the statistical properties of the ensemble of objects that is relevant to an imaging system optimization task. A variety of anatomical shape models have also been proposed to describe both the common geometric features and the geometric variability among instances of the population for shape analysis applications [19, 18, 28, 23, 48, 26, 51, 2]. To date, these have not been systematically explored for the purpose of constructing SOMs that capture realistic anatomical variations for use in imaging system optimization.

In order to establish SOMs that capture realistic textures and anatomical variations, it is desirable to utilize experimental imaging data. By definition, however, SOMs should be independent of the imaging system, measurement noise, and any reconstruction method employed. In other words, they should provide an in silico representation of the ensemble of objects to-be-imaged and not estimates of them that would be indirectly measured or computed by imaging systems. To address this need, Kupinski et al. [38] proposed an explicit generative model for describing object statistics that was trained by use of noisy imaging measurements and a computational model of a well-characterized imaging system. However, applications of this method have been limited to situations where the characteristic function of the corresponding imaging measurements can be analytically determined, such as with lumpy and clustered lumpy object models [36, 10]. As such, there remains a need to generalize the method so that anatomically realistic and more complicated SOMs can be established from experimental imaging measurements.

Deep generative neural networks, such as generative adversarial networks (GANs) [25], hold great potential for establishing SOMs that describe discretized objects. However, conventional GANs are typically trained by use of reconstructed images that are influenced by the effects of measurement noise and the reconstruction process. To circumvent this, an AmbientGAN has been proposed [12] that augments a GAN with a measurement operator. This permits a generative model that describes object randomness to be learned from indirect and noisy measurements of the objects themselves. In a preliminary study, the AmbientGAN was explored for establishing SOMs from imaging measurements for use in optimizing imaging systems [64]. However, similar to conventional GANs, the process of training AmbientGANs is inherently unstable. Moreover, the original AmbientGAN cannot immediately benefit from robust GAN training procedures, such as progressive growing [32], which limits its ability to synthesize high-dimensional images that depict objects of interest in medical imaging studies.

In this work, a new AmbientGAN approach is proposed that permits the utilization of the progressive growing strategy for training. In this way, SOMs can be established from noisy imaging measurements that can yield high-dimensional images that depict objects. The new approach, referred to as a Progressive Growing AmbientGAN (ProAmGAN), can utilize the progressive growing training strategy due to augmentation of the conventional AmbientGAN architecture with an image reconstruction operator. Stylized numerical studies corresponding to X-ray computed tomography (CT) and magnetic resonance imaging (MRI) are conducted to investigate the proposed ProAmGAN for establishing SOMs. Preliminary validation studies are presented that utilize standard quantitative measures for evaluating GANs and also objective measures based on signal detection performance.

The remainder of this paper is organized as follows. In Sec. II, previous works on learning SOMs by employing characteristic functions and AmbientGANs are summarized. The progressive growing training strategy for GANs is also reviewed. The proposed ProAmGAN for learning SOMs from noisy imaging measurements is described in Sec. III. Sections IV and V describe the numerical studies and results that demonstrate the ability of the ProAmGAN to learn SOMs from stylized X-ray CT and MRI measurements. Finally, a discussion and summary of the work is presented in Sec. VI.

II Background

Consider a discrete-to-discrete (D-D) description of a linear imaging system given by [7]:

$$\mathbf{g} = \mathcal{H}\mathbf{f} + \mathbf{n}, \tag{1}$$

where $\mathbf{g}$ is a vector that describes the measured image data, $\mathbf{f}$ denotes the finite-dimensional representation of the object being imaged, $\mathcal{H}$ denotes a D-D imaging operator that maps an object in the Hilbert space $\mathbb{E}^N$ to the measured discrete data in the Hilbert space $\mathbb{E}^M$, and the random vector $\mathbf{n}$ denotes the measurement noise. Below, the imaging process described in Eq. (1) is denoted as $\mathbf{f} \xrightarrow{\mathcal{H}} \mathbf{g}$. It is assumed that the D-D imaging model is a sufficiently accurate representation of the true continuous-to-discrete (C-D) imaging model that describes a digital imaging system, and the impact of model error will be neglected. When optimizing imaging system performance by use of objective measures of IQ, all sources of randomness in $\mathbf{g}$ should be considered. In diagnostic imaging applications, object variability is an important factor that limits observer performance. In such applications, the object $\mathbf{f}$ should be described as a random vector that is characterized by a multivariate probability density function (PDF) $p(\mathbf{f})$ that specifies the statistical properties of the ensemble of objects to-be-imaged.
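The D-D model of Eq. (1) can be sketched numerically as follows; the matrix sizes, the random-matrix stand-in for $\mathcal{H}$, and the noise level are illustrative assumptions, not values from this work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: an object with N = 256 elements, M = 128 measurements.
N, M = 256, 128

# A generic linear D-D imaging operator H (a random-matrix stand-in).
H = rng.standard_normal((M, N)) / np.sqrt(N)

def image(f, noise_std=0.1):
    """Simulate g = H f + n with i.i.d. zero-mean Gaussian measurement noise."""
    n = rng.normal(0.0, noise_std, size=M)
    return H @ f + n

f = rng.random(N)   # one object realization
g = image(f)        # its noisy measurement
```

In practice $\mathcal{H}$ would be a physics-based operator (e.g., a Radon transform or DFT, as in the studies below) rather than a dense random matrix.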

Direct estimation of $p(\mathbf{f})$ is rarely tractable in medical imaging applications due to the high dimensionality of $\mathbf{f}$. To circumvent this difficulty, a parameterized generative model, referred to throughout this work as a SOM, can be introduced and established by use of an ensemble of experimental measurements. The generative model can be explicit or implicit. Explicit generative models seek to approximate $p(\mathbf{f})$, or equivalently, its characteristic function, from which samples can subsequently be drawn. On the other hand, implicit generative models do not seek to estimate $p(\mathbf{f})$ directly, but rather define a stochastic process that seeks to draw samples from $p(\mathbf{f})$ without having to explicitly specify it. Variational autoencoders and GANs are examples of explicit and implicit generative models, respectively, that have been actively explored [24]. Two previous works that sought to learn SOMs from noisy and indirect imaging measurements by use of explicit and implicit generative models are presented below.

II-A Establishing SOMs by use of explicit generative modeling: Propagation of characteristic functionals

The first method to learn SOMs from imaging measurements was introduced by Kupinski et al. [38]. In that work, a C-D imaging model was considered in which a function that describes the object is mapped to a finite-dimensional image vector $\mathbf{g}$. For C-D operators, it has been demonstrated that the characteristic functional (CFl) describing the object can be readily related to the characteristic function (CF) of the measured data vector [16]. This provides a relationship between the PDFs of the object and measured image data. In their method, a parameterized object model was considered and analytic expressions for the CFl were utilized. Subsequently, by use of the known imaging operator and noise model, the corresponding CF was computed. The object parameters were estimated by minimizing the discrepancy between this model-based CF and an empirical estimate of the CF computed from an ensemble of noisy imaging measurements. From the estimated CFl, an ensemble of objects could be generated. This method was applied to establish SOMs where the CFl of the object can be analytically determined. Such cases include the lumpy object model [36] and clustered lumpy object model [10]. The applicability of the method to more complicated object models remains unexplored.

II-B Establishing SOMs by use of implicit generative modeling: GANs and AmbientGANs

Generative adversarial networks (GANs) [25, 4, 6, 20, 43, 45, 40, 49, 5, 27, 13] are implicit generative models that have been actively explored to learn the statistical properties of ensembles of images and generate new images that are consistent with them. A traditional GAN consists of two deep neural networks: a generator and a discriminator. The generator is jointly trained with the discriminator through an adversarial process. During this training process, the generator is trained to map random low-dimensional latent vectors to higher-dimensional images that represent samples from the distribution of training images. The discriminator is trained to distinguish the generated, or synthesized, images from the actual training images. These are often referred to as the "fake" and "real" images in the GAN literature. Subsequent to training, the discriminator is discarded, and the generator and associated latent vector probability distribution form an implicit generative model that can sample from the data distribution to produce new images. However, images produced by imaging systems are contaminated by measurement noise and potentially an image reconstruction process. Therefore, GANs trained directly on images do not generally represent SOMs because they do not characterize object variability alone.

An augmented GAN architecture named AmbientGAN has been proposed [12] that enables learning an SOM from noisy indirect measurements of an object. As shown in Fig. 1, the AmbientGAN architecture incorporates the measurement operator $\mathcal{H}$, defined in Eq. (1), into the traditional GAN framework. During the AmbientGAN training process, the generator is trained to map a random latent vector $\mathbf{z}$, described by a latent probability distribution, to a generated object $\hat{\mathbf{f}} = G(\mathbf{z}; \theta_G)$, where $G$ represents the generator network that is parameterized by a vector of trainable parameters $\theta_G$. Subsequently, the corresponding simulated imaging measurements are computed as $\hat{\mathbf{g}} = \mathcal{H}\hat{\mathbf{f}} + \mathbf{n}$. The discriminator neural network $D(\cdot\,; \theta_D)$, which is parameterized by the vector $\theta_D$, is trained to distinguish the real and simulated imaging measurements by mapping them to a real-valued scalar. The adversarial training process can be represented by the following two-player minimax game [25]:

$$\min_{\theta_G} \max_{\theta_D} \; \mathbb{E}_{\mathbf{g}}\!\left[l\big(D(\mathbf{g}; \theta_D)\big)\right] + \mathbb{E}_{\mathbf{z}}\!\left[l\big(1 - D(\hat{\mathbf{g}}; \theta_D)\big)\right], \tag{2}$$

where $\hat{\mathbf{g}} = \mathcal{H}G(\mathbf{z};\theta_G) + \mathbf{n}$ and $l$ represents a loss function. When the distribution of objects $p(\mathbf{f})$ uniquely induces the distribution of imaging measurements $p(\mathbf{g})$, i.e., when the imaging operator is injective, and the minimax game achieves the global optimum, the trained generator can be employed to produce object samples drawn from $p(\mathbf{f})$ [25, 12].

Fig. 1: An illustration of the AmbientGAN architecture. The generator is trained to generate objects, which are subsequently employed to simulate measurement data. The discriminator is trained to distinguish "real" measurement data from the "fake" measurement data that are simulated by use of the generated objects.

Zhou et al. have demonstrated the ability of the AmbientGAN to learn a simple SOM corresponding to a lumpy object model that could be employed to produce small object samples [64]. However, adversarial training is known to be unstable, and the use of AmbientGANs to establish realistic and large-scale SOMs has, to date, been limited.

II-C Progressively-Growing GAN Training Strategy

A novel training strategy for GANs—progressive growing of GANs (ProGANs)—has been recently developed to improve the stability of the GAN training process [32] and hence the ability to learn generators that sample from distributions of high-resolution images. GANs are conventionally trained directly on full size images through the entire training process. In contrast, ProGANs adopt a multi-resolution approach to training. Initially, a generator and discriminator are trained by use of down-sampled (low resolution) training images. During each subsequent training stage, higher resolution versions of the original training images are employed to train progressively deeper discriminators and generators, continuing until a final version of the generator is trained by use of the original high-resolution images. While this progressively growing training strategy has found widespread success with conventional GANs, as described below, it cannot generally be employed with AmbientGANs. A solution to this problem is described next.
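The multi-resolution schedule underlying progressive growing can be sketched as follows; the 4×4 starting resolution and the average-pooling downsampler are common ProGAN conventions, assumed here for illustration.

```python
import numpy as np

def downsample(img, factor):
    """Average-pool a square image by an integer factor."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def resolution_schedule(final_res, start_res=4):
    """Resolutions visited during progressive growing: start_res, 2*start_res, ..., final_res."""
    res = [start_res]
    while res[-1] < final_res:
        res.append(res[-1] * 2)
    return res

full = np.arange(64 * 64, dtype=float).reshape(64, 64)   # stand-in training image
schedule = resolution_schedule(64)                        # [4, 8, 16, 32, 64]
pyramid = [downsample(full, 64 // r) for r in schedule]   # per-stage training targets
```

At each stage, progressively deeper generator and discriminator networks are trained against the corresponding entry of this image pyramid before the next resolution is introduced.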

III Establishing SOMs by use of Progressively-Growing AmbientGANs

As discussed above, AmbientGANs enable the learning of SOMs from noisy imaging measurements but can be difficult to train, while ProGANs can be stably trained on higher-dimensional image data, which are generally affected by noise and the image formation process. Below, a novel strategy, Progressively Growing AmbientGANs (ProAmGANs), is proposed to enable progressive growing of AmbientGANs for learning realistic SOMs from noisy and indirect imaging measurements.

The ProAmGAN progressively grows the generator that establishes the SOM from a low-resolution version to the full-resolution version. As with AmbientGANs, the imaging measurements are subsequently simulated by applying the measurement operator to the generator-produced objects. However, imaging measurements acquired by most medical imaging systems are indirect representations of the objects to-be-imaged (e.g., Radon transform data, k-space data). In such cases, the low-resolution version of the measured image data and the low-resolution version of the objects may not be simply related because they generally reside in different Hilbert spaces. Accordingly, in these cases, the progressive growing strategy cannot be directly applied because the generator in the original ProGAN produces images that reside in the same Hilbert space as the training data employed by the discriminator. To address this issue, in addition to including the measurement operator $\mathcal{H}$ as with the AmbientGAN training strategy, an image reconstruction operator $\mathcal{O}: \mathbb{E}^M \to \mathbb{E}^N$ is included in the proposed ProAmGAN training strategy. In this way, the generator can be trained to produce images that reside in the same Hilbert space as the images employed by the discriminator, and the progressive growing strategy can subsequently be employed. The ProAmGAN training strategy is illustrated in Fig. 2.

Fig. 2: An illustration of ProAmGAN training. The training starts with a low image resolution, and the image resolution is increased progressively by adding more layers to the generator and the discriminator. The discriminator is trained to distinguish between the ground-truth and generated reconstructed objects.

Given a training dataset that comprises measured data $\mathbf{g}$, a set of reconstructed objects is computed by applying the operator $\mathcal{O}$ to the measured data: $\mathbf{f}_r = \mathcal{O}\mathbf{g}$. Denote the reconstructed object corresponding to the generator-produced measured data as $\hat{\mathbf{f}}_r = \mathcal{O}(\mathcal{H}G(\mathbf{z};\theta_G) + \mathbf{n})$. The discriminator in the ProAmGAN is trained to distinguish between $\mathbf{f}_r$ and $\hat{\mathbf{f}}_r$, and the generator is trained to generate objects such that the corresponding reconstructed objects $\hat{\mathbf{f}}_r$ are indistinguishable from the objects $\mathbf{f}_r$ that were reconstructed from the provided measurement data (i.e., training data). As with the AmbientGAN, when the distribution of objects $p(\mathbf{f})$ uniquely induces the distribution of reconstructed objects $p(\mathbf{f}_r)$, and the ProAmGAN achieves the global optimum at the final full-resolution stage, the trained generator can be employed to produce object samples drawn from the distribution $p(\mathbf{f})$. In special cases where the imaging operator is full-rank and the measurement noise is absent, ProAmGANs reduce to original ProGANs that are directly trained on objects.
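A minimal numeric sketch of one ProAmGAN forward pass follows; the linear "generator" and the pseudo-inverse used as the reconstruction operator are illustrative stand-ins, not the networks or operators of this work.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, K = 64, 48, 8            # object, measurement, and latent dimensions (illustrative)

H = rng.standard_normal((M, N)) / np.sqrt(N)   # imaging operator
H_pinv = np.linalg.pinv(H)                      # reconstruction operator (pseudo-inverse stand-in)
W = rng.standard_normal((N, K))                 # toy linear "generator" weights

def generator(z):
    return W @ z                                # f_hat = G(z)

def measure(f, noise_std=0.05):
    return H @ f + rng.normal(0.0, noise_std, size=M)   # g_hat = H f_hat + n

# One forward pass: the discriminator would compare real reconstructions f_r
# with f_r_hat below; both live in the object space, so progressive growing applies.
z = rng.standard_normal(K)
f_hat = generator(z)
g_hat = measure(f_hat)
f_r_hat = H_pinv @ g_hat       # reconstructed "fake" object seen by the discriminator
```

The key design point is that both the real and fake inputs to the discriminator pass through the same reconstruction operator, so they share one Hilbert space and can be downsampled consistently at every resolution stage.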

IV Numerical studies

Computer-simulation studies were conducted to demonstrate the ability of the proposed ProAmGAN to establish realistic SOMs from imaging measurements corresponding to different stylized imaging modalities. Details regarding the design of the computer-simulation studies are provided below.

IV-A Idealized direct imaging system

An idealized direct imaging system that acquired chest radiographs, modeled as $\mathbf{g} = \mathbf{f} + \mathbf{n}$, was considered first. By design, it was assumed that the measurement noise was the only source of image degradation. The motivation for this study was to demonstrate the ability of the ProAmGAN to learn an SOM from noisy images.

An NIH database of clinical chest X-ray images [53] was employed to serve as ground truth objects $\mathbf{f}$. Three thousand images were selected from this dataset. These images were centrally cropped, resized, and normalized to the range between 0 and 1. A collection of 3000 simulated measured images $\mathbf{g}$ were produced by adding independent and identically distributed (i.i.d.) zero-mean Gaussian noise to the collection of objects $\mathbf{f}$. An example of the objects and the corresponding noisy imaging measurement are shown in Fig. S. 7 in the Supplementary file.

From the ensemble of simulated measured data, with knowledge of the measurement noise model, a ProAmGAN was trained to establish a SOM that characterizes the distribution of objects $p(\mathbf{f})$. The architecture of the generator and the discriminator employed in the ProAmGAN is described in Table S. 1 in the Supplementary file. Because the idealized planar X-ray imaging system acquires direct representations of objects, the reconstruction operator $\mathcal{O}$ was set to be the identity operator in the ProAmGAN training process.

For comparison, a ProGAN was trained by use of the same ensemble of simulated measured images $\mathbf{g}$. In this case, the generator was trained to learn the distribution of the measured images themselves, which are contaminated by measurement noise, instead of learning the distribution of objects (i.e., the SOM). The ProGAN employed a generator and discriminator with the same architectures as those employed in the ProAmGAN.

The Fréchet Inception Distance (FID) [29] score, a widely employed metric for evaluating the performance of generative models, was computed to evaluate the performance of the original ProGAN and the proposed ProAmGAN. The FID score quantifies the distance between the features extracted by the Inception-v3 network [50] from the ground-truth ("real") and generated ("fake") objects. A lower FID score indicates better quality and diversity of the generated objects. The FID scores were computed by use of 3000 ground-truth objects, 3000 ProGAN-generated objects, and 3000 ProAmGAN-generated objects.
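The FID is the Fréchet distance between Gaussian fits to the Inception-v3 features of the real and generated images. The distance computation itself (with the feature-extraction step omitted) can be sketched as:

```python
import numpy as np

def sqrtm_psd(A):
    """Matrix square root of a symmetric positive-semidefinite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(A)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def frechet_distance(mu1, cov1, mu2, cov2):
    """Squared Frechet distance between N(mu1, cov1) and N(mu2, cov2).

    Uses Tr((S1 S2)^{1/2}) = Tr((S1^{1/2} S2 S1^{1/2})^{1/2}) so that all
    square roots are taken of symmetric PSD matrices.
    """
    s1_half = sqrtm_psd(cov1)
    cross = sqrtm_psd(s1_half @ cov2 @ s1_half)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1 + cov2 - 2.0 * cross))
```

For FID, `mu` and `cov` would be the sample mean and covariance of Inception-v3 features computed over the real and generated image sets, respectively.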

The structural similarity index (SSIM) [54] is a figure-of-merit describing the similarity of two digital images. As another form of evaluation, SSIM values were computed for different pairs of images. First, SSIM values were computed from 500,000 random pairs of ground truth objects. Next, SSIM values were computed from 500,000 random pairs of ProAmGAN-generated and ground truth objects. Finally, as a comparison, SSIM values were computed from 500,000 random pairs of ProGAN-generated and ground truth objects. From these three collections of SSIM values, three histograms were formed. The overlap area between any two of the histograms (i.e., empirical PDFs) and the two-sample Kolmogorov-Smirnov (KS) test statistics [57] were computed.
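The histogram-overlap area and the two-sample KS statistic apply to any two sets of scalar scores; a minimal sketch (with the SSIM computation itself omitted) follows.

```python
import numpy as np

def histogram_overlap(x, y, bins=50):
    """Overlap area of two empirical PDFs estimated on a shared binning."""
    lo, hi = min(x.min(), y.min()), max(x.max(), y.max())
    hx, edges = np.histogram(x, bins=bins, range=(lo, hi), density=True)
    hy, _ = np.histogram(y, bins=bins, range=(lo, hi), density=True)
    width = edges[1] - edges[0]
    return float(np.minimum(hx, hy).sum() * width)

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: maximum gap between the two ECDFs."""
    grid = np.sort(np.concatenate([x, y]))
    ecdf = lambda s: np.searchsorted(np.sort(s), grid, side="right") / len(s)
    return float(np.abs(ecdf(x) - ecdf(y)).max())
```

An overlap near 1 and a KS statistic near 0 indicate that the two empirical SSIM distributions are nearly identical.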

IV-B Computed tomographic imaging system

A stylized tomographic imaging system was investigated next. This imaging system was described as $\mathbf{g} = \mathcal{H}\mathbf{f} + \mathbf{n}$, where $\mathcal{H}$ denotes a 2D discrete Radon transform [31] that maps a 2D object to a sinogram. The angular scanning range was 180 degrees, and the tomographic views were evenly spaced with a 1-degree angular step.

An NIH-sponsored database of clinical chest CT images [56] was employed to serve as ground truth objects $\mathbf{f}$. Three thousand images were selected from this dataset and were normalized to the range between 0 and 1. A collection of 3000 measured data $\mathbf{g}$ were simulated by applying $\mathcal{H}$ to each object and adding i.i.d. zero-mean Gaussian noise. An example of the objects and the corresponding measured imaging data are shown in Fig. S. 8 in the Supplementary file.

From the collection of measured data $\mathbf{g}$, a set of reconstructed objects was generated by use of a filtered back-projection (FBP) reconstruction algorithm that employed a Ram-Lak filter. With knowledge of the imaging operator and the measurement noise model, a ProAmGAN was subsequently trained by use of the reconstructed objects. The ProAmGAN employed a generator and discriminator with the architectures described in Table S. I (a) in the Supplementary file. In the ProAmGAN training process, the Radon transform and the FBP operator were applied to the generated objects as discussed in Sec. III.

As a comparison, a ProGAN was trained by use of the reconstructed objects $\mathbf{f}_r$. The generator in the ProGAN was trained to learn the distribution of $\mathbf{f}_r$ instead of the distribution of $\mathbf{f}$. The ProGAN employed a generator and discriminator with the same architectures as those employed in the ProAmGAN. The FID scores and empirical PDFs of SSIM values were computed as described in Sec. IV-A.

IV-C MR imaging system with complete k-space data

A stylized MR imaging system that acquires fully-sampled k-space data was investigated. This imaging system was described as $\mathbf{g} = \mathcal{H}\mathbf{f} + \mathbf{n}$, where $\mathcal{H}$ denotes a 2D discrete Fourier transform (DFT). A database of clinical brain MR images [15] was employed to serve as ground truth objects $\mathbf{f}$. Three thousand images were selected from this dataset and were normalized to the range between 0 and 1. A collection of 3000 measured image data were simulated by computing the 2D DFT of the objects and adding i.i.d. zero-mean Gaussian noise with a standard deviation of 10 to both the real and imaginary components. An example of the objects and the corresponding magnitude of the measured k-space data are shown in Fig. S. 9 in the Supplementary file.

From the ensemble of measured images, an ensemble of reconstructed images $\mathbf{f}_r$ was generated by applying a 2D inverse discrete Fourier transform (IDFT) to each measured image data $\mathbf{g}$. A ProAmGAN was subsequently trained to establish a SOM that characterizes the distribution of objects by use of the ensemble of reconstructed images $\mathbf{f}_r$. The ProAmGAN employed a generator and discriminator with the architectures described in Table S. I (a) in the Supplementary file. In the training process, the 2D DFT and IDFT were applied to the generator-produced objects as discussed in Sec. III.
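For this fully sampled case, the measurement operator is a 2D DFT and the reconstruction operator is its inverse, which can be sketched directly with NumPy; the image size and noise level below are illustrative, not the study's settings.

```python
import numpy as np

rng = np.random.default_rng(2)

def measure_kspace(f, noise_std=0.0):
    """g = DFT2(f) + i.i.d. Gaussian noise added to the real and imaginary parts."""
    g = np.fft.fft2(f)
    noise = rng.normal(0.0, noise_std, f.shape) + 1j * rng.normal(0.0, noise_std, f.shape)
    return g + noise

def reconstruct(g):
    """IDFT reconstruction; the real part is taken since objects are real-valued."""
    return np.fft.ifft2(g).real

f = rng.random((32, 32))                 # stand-in object
f_r = reconstruct(measure_kspace(f))     # noise-free case: exact recovery
```

Because the full DFT is invertible, the noise-free reconstruction recovers the object exactly; with noise, the IDFT image is the noisy reconstruction that the discriminator would see.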

For comparison, a ProGAN was trained by use of the reconstructed images $\mathbf{f}_r$. The ProGAN employed a generator and discriminator with the same architectures as those employed in the ProAmGAN. The FID score and empirical PDFs of SSIM values were also computed as described in Sec. IV-A.

IV-D MR imaging system with under-sampled k-space data

MR imaging systems sometimes acquire under-sampled k-space data to accelerate the data-acquisition process. In such cases, the imaging operator $\mathcal{H}$ has a non-trivial null space, and only the measurement component $\mathbf{f}_{meas} = \mathcal{H}^{+}\mathcal{H}\mathbf{f}$ can be observed through the imaging system. Here, $\mathcal{H}^{+}$ denotes the Moore-Penrose pseudo-inverse of $\mathcal{H}$, which can be computed by applying a 2D IDFT to the zero-filled k-space data. In this study, the impact of k-space under-sampling on images produced by the ProAmGAN was investigated.

Clinical brain MR images contained in the NYU fastMRI Initiative database [59] (https://fastmri.med.nyu.edu/) were employed to serve as ground truth objects $\mathbf{f}$. Three thousand images were selected from this database for use in this study. These images were resized and normalized to the range between 0 and 1. Five data-acquisition designs corresponding to different k-space sampling ratios were considered. Here, the k-space sampling ratio was defined as the ratio of the number of sampled k-space components to the number of complete k-space components. The sampling patterns are illustrated in the top row of Fig. 3. For each considered design, a collection of 3000 measured data were simulated by computing and sampling the k-space data and adding i.i.d. zero-mean Gaussian noise with a standard deviation of 2 to both the real and imaginary components.

Fig. 3: Top: k-space sampling patterns corresponding to the five considered sampling ratios, from left to right; Bottom: images reconstructed by use of $\mathcal{H}^{+}$ corresponding to the k-space sampling patterns in the top row.

For each data-acquisition design, reconstructed objects were produced by applying the pseudo-inverse operator $\mathcal{H}^{+}$ to the measured image data $\mathbf{g}$. Examples of images reconstructed by use of the pseudo-inverse method corresponding to the considered sampling patterns are shown in the bottom row of Fig. 3. A ProAmGAN was subsequently trained to establish a SOM for each data-acquisition design. The architecture of the generator and the discriminator employed in the ProAmGAN is described in Table S. I (b) in the Supplementary file. In the training process, $\mathcal{H}$ and $\mathcal{H}^{+}$ were applied to the generator-produced objects as discussed in Sec. III. The FID score was computed by use of 3000 ground-truth objects and 3000 ProAmGAN-generated objects for each data-acquisition design. Because only the measurement component can be observed by the imaging system, the ability of ProAmGANs to learn the variation in the measurement components was also investigated. Specifically, the FID score was computed by use of the ground-truth measurement components and the ProAmGAN-generated measurement components for each data-acquisition design.
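The zero-filled IDFT acting as the pseudo-inverse of an under-sampled DFT operator can be sketched as follows; the simple every-other-row Cartesian mask is an illustrative assumption, not one of the paper's sampling patterns.

```python
import numpy as np

rng = np.random.default_rng(3)
shape = (32, 32)

# Illustrative Cartesian mask: keep every other k-space row (50% sampling).
mask = np.zeros(shape, dtype=bool)
mask[::2, :] = True

def H(f):
    """Under-sampled 2D DFT: return only the sampled k-space values."""
    return np.fft.fft2(f)[mask]

def H_pinv(g):
    """Pseudo-inverse: zero-fill the unsampled k-space locations, then apply the IDFT."""
    k = np.zeros(shape, dtype=complex)
    k[mask] = g
    return np.fft.ifft2(k).real

f = rng.random(shape)
f_meas = H_pinv(H(f))   # the measurement component H^+ H f; the null-space part is lost
```

Applying $\mathcal{H}^{+}\mathcal{H}$ a second time leaves the measurement component unchanged, which reflects that it is an orthogonal projection onto the observable subspace.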

As a comparison, an original ProGAN was trained by use of the reconstructed objects for each data-acquisition design. The ProGAN employed a generator and discriminator with the same architectures as those employed in the ProAmGAN. The ProGAN-produced images were compared to the ProAmGAN-produced images.

IV-E Task-based image quality assessment

In this study, the ProAmGAN-established SOMs corresponding to fastMRI brain objects were evaluated by use of objective measures of IQ. Specifically, the ProAmGAN-established SOMs were evaluated by comparing task-specific image quality measures computed by use of generated objects to those computed by use of ground-truth objects. A signal-known-exactly binary classification task was considered in which an observer classifies noisy MR images as satisfying either a signal-absent hypothesis ($H_0$) or a signal-present hypothesis ($H_1$). The imaging processes under these two hypotheses can be described as:

$$H_0: \mathbf{g} = \mathbf{f} + \mathbf{n}, \tag{3a}$$
$$H_1: \mathbf{g} = \mathbf{f} + \mathbf{s} + \mathbf{n}, \tag{3b}$$

where $\mathbf{s}$ denotes a signal image and $\mathbf{n}$ is i.i.d. zero-mean Gaussian noise. Two different noise levels and five different signals were considered. The considered signals are shown in Fig. 4.

Fig. 4: Five signals considered in the signal detection study.

Each considered signal detection task was performed on a region of interest (ROI) centered at the signal location. The signal-to-noise ratio of the Hotelling observer (HO) test statistic, $\mathrm{SNR}_{HO}$, was employed as the figure-of-merit for assessing image quality [7]:

$$\mathrm{SNR}_{HO}^{2} = \mathbf{s}^{T}\boldsymbol{\Sigma}^{-1}\mathbf{s}, \tag{4}$$

where $\mathbf{s}$ denotes the vectorized signal image in the ROI, and $\boldsymbol{\Sigma}$ denotes the covariance matrix corresponding to the ROIs in the noisy MR images. When computing $\mathrm{SNR}_{HO}$, $\boldsymbol{\Sigma}^{-1}$ was calculated by use of a covariance matrix decomposition [7]. The values of $\mathrm{SNR}_{HO}$ computed by use of 3000 ground truth objects and 3000 generated objects were compared.
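Given a signal vector and an ROI covariance matrix, the figure-of-merit can be evaluated directly; a minimal sketch using a linear solve follows (the covariance-matrix decomposition of [7] is not reproduced here).

```python
import numpy as np

rng = np.random.default_rng(4)

def hotelling_snr(s, cov):
    """SNR_HO = sqrt(s^T Sigma^{-1} s) for a known signal s and ROI covariance Sigma."""
    return float(np.sqrt(s @ np.linalg.solve(cov, s)))

# Sanity check against the white-noise case, where SNR_HO = ||s|| / sigma.
d, sigma = 25, 2.0
s = rng.random(d)
snr = hotelling_snr(s, sigma**2 * np.eye(d))
```

Using `np.linalg.solve` avoids forming the explicit inverse of the covariance matrix, which is better conditioned for large ROIs.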

IV-F Training details

All ProAmGANs and ProGANs were trained by use of TensorFlow [1] on 4 NVIDIA Tesla V100 GPUs. The Adam algorithm [35], a stochastic gradient algorithm, was employed as the optimizer in the training process. The ProAmGANs were implemented by modifying the ProGAN code (https://github.com/tkarras/progressive_growing_of_gans) according to the proposed ProAmGAN architecture illustrated in Fig. 2. Specifically, for each considered imaging system, the corresponding measurement operator and reconstruction operator were applied to the generator-produced images, and the output images were subsequently employed by the discriminator. The training of all ProAmGANs and ProGANs started at a low resolution. During the training process, the resolution was doubled by gradually adding more layers to the generator and the discriminator until the final resolution was achieved. More details regarding the progressive training procedure can be found in the literature [32].

V Results

V-A Visual assessments

The ground-truth (top row) and ProAmGAN-generated objects (bottom row) corresponding to chest X-ray images are shown in Fig. 5. The ProAmGAN-generated objects have similar visual appearances to the ground-truth ones. Additional ProAmGAN-generated chest X-ray images are shown in Fig. S. 4 in the Supplementary file.

Fig. 5: Top: Ground-truth chest X-ray objects. Bottom: ProAmGAN-generated chest X-ray objects.

ProGAN-generated and ProAmGAN-generated objects are further compared in Fig. 6. It is clear that the ProAmGAN-produced chest X-ray image contains less noise than the one produced by the ProGAN. This demonstrates the ability of the ProAmGAN to mitigate measurement noise when establishing SOMs.

Fig. 6: A ProGAN-generated (left panel) and ProAmGAN-generated (right panel) chest X-ray object.

The ground-truth (top row) and ProAmGAN-generated objects (bottom row) corresponding to chest CT and brain MR images are shown in Figs. 7 and 9. The ProAmGAN-generated objects have similar visual appearances to ground-truth ones. Additional ProAmGAN-generated chest CT images and brain MR images are shown in Figs. S. 5 and S. 6 in the Supplementary file.

Fig. 7: Top: Ground-truth chest CT objects. Bottom: ProAmGAN-generated chest CT objects.

ProGAN-generated and ProAmGAN-generated objects are shown in more detail in Figs. 8 and 10. It is clear that the ProAmGAN-produced chest CT image in Fig. 8 contains fewer artifacts than the one produced by the ProGAN. This demonstrates the ability of the ProAmGAN to mitigate reconstruction artifacts when establishing SOMs.

Fig. 8: A ProGAN-generated (left panel) and ProAmGAN-generated (right panel) chest CT object.
Fig. 9: Top: Ground-truth brain MR objects. Bottom: ProAmGAN-generated brain MR objects.

The ProAmGAN-produced brain MR image in Fig. 10 contains less noise than the one produced by the ProGAN. This demonstrates the ability of the ProAmGAN to mitigate the noise in the reconstructed images when establishing SOMs.

Fig. 10: A ProGAN-generated (left panel) and ProAmGAN-generated (right panel) brain MR object.

V-B Quantitative assessments

The FID scores corresponding to ProGANs and ProAmGANs for the idealized direct imaging system, computed tomographic imaging system, and MR imaging system with complete k-space data are shown in TABLE I. The ProAmGANs had smaller FID scores than the ProGANs, indicating that the ProAmGAN-generated images more closely matched the distribution of the ground-truth objects.

                                      ProGAN                    ProAmGAN
                              X-ray    CT      MRI      X-ray    CT      MRI
FID score                     65.583   62.385  47.247   28.798   30.616  41.637
SSIM PDF overlap area          0.164    0.523   0.721    0.957    0.960   0.980
Two-sample KS test statistic   0.837    0.477   0.279    0.043    0.038   0.017

TABLE I: FID and metrics that evaluate PDFs of SSIMs. Here, “X-ray”, “CT”, and “MRI” correspond to the idealized direct imaging system, computed tomographic imaging system and MR imaging system with complete k-space data, respectively.

The empirical PDFs of SSIMs corresponding to the idealized direct imaging system, computed tomographic imaging system, and MR imaging system with complete k-space data are shown in Fig. 11, and the corresponding PDF overlap areas and two-sample KS test statistics are summarized in TABLE I. The PDFs of SSIMs corresponding to the ProAmGAN-generated and ground-truth objects largely overlap, while the PDF corresponding to the ProGAN-generated images showed a significant discrepancy from the ground-truth PDF.
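The two SSIM-distribution metrics reported in TABLE I, the PDF overlap area and the two-sample KS statistic, can both be computed from raw SSIM samples. A minimal numpy sketch, using synthetic Gaussian stand-ins for the SSIM samples (the real samples come from image pairs, as described in the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-ins for SSIM samples from two ensembles of image pairs.
ssim_truth = rng.normal(0.80, 0.05, 3000)
ssim_gan = rng.normal(0.78, 0.05, 3000)

# Overlap area of the two empirical PDFs via shared histogram bins.
bins = np.linspace(0.0, 1.0, 101)
p, _ = np.histogram(ssim_truth, bins=bins, density=True)
q, _ = np.histogram(ssim_gan, bins=bins, density=True)
overlap = np.sum(np.minimum(p, q)) * (bins[1] - bins[0])

# Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
# difference between the two empirical CDFs over all sample points.
a, b = np.sort(ssim_truth), np.sort(ssim_gan)
grid = np.concatenate([a, b])
cdf_a = np.searchsorted(a, grid, side="right") / a.size
cdf_b = np.searchsorted(b, grid, side="right") / b.size
ks = np.max(np.abs(cdf_a - cdf_b))

print(overlap, ks)
```

For well-matched distributions the overlap area approaches 1 and the KS statistic approaches 0, which is the pattern TABLE I shows for the ProAmGANs.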

(a) Idealized direct imaging system
(b) Computed tomographic imaging system
(c) MR imaging system with complete k-space data
Fig. 11: Empirical PDFs of SSIMs corresponding to ground-truth image pairs (red curves), ground-truth and ProAmGAN-generated image pairs (blue curves), and ground-truth and ProGAN-generated image pairs (yellow curves).

V-C MR imaging system with under-sampled k-space data

The ground-truth objects (top row) and objects generated by a ProAmGAN trained with under-sampled k-space data (bottom row) are shown in Fig. 12. The ProAmGAN-generated objects have similar visual appearances to the ground-truth objects.

Objects produced by ProAmGANs and ProGANs trained with different data-acquisition designs are shown in Fig. 13. The ProAmGAN-generated objects (top row) are visually plausible for all considered k-space sampling ratios, while noise and aliasing artifacts appear in the ProGAN-generated objects (bottom row).

Fig. 12: Top: Examples of ground-truth objects. Bottom: Examples of ProAmGAN-generated objects corresponding to an under-sampled k-space data-acquisition design.
Fig. 13: ProAmGAN-generated objects (top row) and ProGAN-generated objects (bottom row). From left to right, the ProGAN and ProAmGAN were trained with decreasing k-space sampling ratios.

The FID corresponding to the objects and that corresponding to the measurement components for each data-acquisition design are summarized in TABLE II. The FID corresponding to the objects increased as the k-space sampling ratio decreased, while the FID corresponding to the measurement components did not change significantly. This indicates that the SOMs established by ProAmGANs can be affected by the null space of the imaging operator, while the variation in the measurement components can be reliably learned.

                FID (objects)   FID (measurement components)
Full k-space        30.225
4/5 k-space         38.510              24.033
1/2 k-space         65.478              20.383
1/4 k-space        105.607              19.103
1/8 k-space        144.367              20.122
TABLE II: FID scores corresponding to the objects and the measurement components.
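The object/measurement-component decomposition behind TABLE II can be illustrated concretely. For a k-space-masking operator, the measurement component of an object is the part visible to the imaging system, and the null-space component is invisible to it. A minimal numpy sketch under these assumptions (mask pattern and sampling ratio are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

f = rng.random((64, 64))
mask = rng.random((64, 64)) < 0.25  # illustrative 1/4 k-space sampling

# Measurement component: the part of f visible to the imaging operator.
# (Complex in general for a non-symmetric mask; a conjugate-symmetric
# mask would keep it real.)
f_meas = np.fft.ifft2(np.fft.fft2(f) * mask)

# Null-space component: invisible to the measurements.
f_null = f - f_meas

# Applying the measurement operator to the null component yields
# (numerically) nothing, which is why no training data can constrain it.
residual = np.max(np.abs(np.fft.fft2(f_null) * mask))
print(residual)
```

This is consistent with the trend in TABLE II: as the sampling ratio drops, the null space grows and the object FID degrades, while the FID of the measurement components, which the data do constrain, stays roughly constant.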

V-D Task-based image quality assessment

The Hotelling observer performance SNR_HO was computed according to Eq. (4) and is shown in Fig. 14. It was observed that SNR_HO has a positive bias when the ProAmGAN is trained with imaging systems that have large k-space missing ratios. This is because the ProAmGAN was not able to learn the complete object variation when the imaging system has a large null space. When the noise level was increased, the object variation became relatively less important in terms of limiting the observer performance, and the positive bias of SNR_HO subsequently became less significant. This is consistent with the observation in reference [37].

Fig. 14: Hotelling observer performance corresponding to different tasks with different signals, noise levels, and k-space sampling ratios.
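The argument above, that under-learned object variability inflates SNR_HO, but less so at high noise levels, can be checked with a toy scalar model. Along a single direction, the HO SNR is sqrt(s²/(v + σ²)), where v is the object variance and σ² the noise variance; a deficient SOM with v_def < v_obj yields a positively biased SNR whose relative bias shrinks as σ² grows. All numbers here are illustrative assumptions:

```python
import numpy as np

s2 = 4.0      # s^T s along the relevant direction (illustrative)
v_obj = 1.0   # true object variability
v_def = 0.25  # deficient (under-learned) object variability

def snr(v, sigma2):
    # Scalar Hotelling SNR with covariance v + sigma2 along one direction.
    return np.sqrt(s2 / (v + sigma2))

# Relative positive bias of SNR_HO at increasing noise levels.
for sigma2 in [0.1, 1.0, 10.0]:
    bias = snr(v_def, sigma2) / snr(v_obj, sigma2) - 1.0
    print(sigma2, bias)
```

The printed biases decrease monotonically with σ², mirroring the behavior reported for Fig. 14.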

VI Discussion and Conclusion

Variation in the objects to-be-imaged can significantly limit the performance of an observer. When conducting computer-simulation studies, this variation can be described by SOMs. In this work, a deep learning-based method that employed ProAmGANs was developed and investigated for establishing SOMs from measured image data. The proposed ProAmGAN strategy incorporates the advanced progressive growing training procedure and therefore enables the AmbientGAN to be applied to realistically sized medical image data. To demonstrate this, stylized numerical studies were conducted in which ProAmGANs were trained on different object ensembles corresponding to common medical imaging modalities. Both visual examinations and quantitative analyses including task-specific validations indicate that the proposed ProAmGANs hold promise to establish realistic SOMs from imaging measurements.

In addition to objectively assessing imaging systems and data-acquisition designs, the ProAmGAN-established SOMs can be employed to regularize image reconstruction problems. Recent methods have been developed for regularizing image reconstruction problems based on GANs, such as Compressed Sensing using Generative Models (CSGM) [11] and image-adaptive GAN-based reconstruction methods (IAGAN) [30, 9]. These methods can be readily employed with the SOMs established by use of the proposed ProAmGANs. ProAmGANs can also be used to produce clean reference images for training deep neural networks that solve other image-processing problems such as image denoising [60] and image super-resolution [21].

It is desirable to establish three-dimensional (3D) object models. A preliminary study developed a progressive-growing 3D GAN [22] and demonstrated its ability to generate 3D MR brain images. Our proposed method can be readily extended to establish 3D object models by adopting such 3D GAN training strategies. Establishing a 3D version of the ProAmGAN will be explored in the future.

There remain additional topics for future investigation. It is critical to validate the learned SOMs for specific diagnostic tasks. We have conducted preliminary task-specific validation studies by use of the Hotelling observer [7, 66] and simple binary signal detection tasks. It will be important to validate the learned SOMs for more complicated tasks by use of other observers such as the ideal observer [61, 65, 62, 63] and anthropomorphic observers [41]. Finally, our proposed method can be readily employed with other GAN architectures such as the style-based generator architecture (StyleGAN) [33, 34], which provides the additional ability to control certain features of generated images and can potentially further improve their quality.

References

  • [1] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al. (2016) Tensorflow: a system for large-scale machine learning. In OSDI, Vol. 16, pp. 265–283.
  • [2] F. Ambellan, A. Tack, M. Ehlke, and S. Zachow (2019) Automated segmentation of knee bone and cartilage combining statistical shape knowledge and convolutional neural networks: data from the osteoarthritis initiative. Medical Image Analysis 52, pp. 109–118.
  • [3] M. A. Anastasio, C. Chou, A. M. Zysk, and J. G. Brankov (2010) Analysis of ideal observer signal detectability in phase-contrast imaging employing linear shift-invariant optical systems. JOSA A 27 (12), pp. 2648–2659.
  • [4] M. Arjovsky and L. Bottou (2017) Towards principled methods for training generative adversarial networks. In International Conference on Learning Representations (ICLR 2017).
  • [5] M. Arjovsky, S. Chintala, and L. Bottou (2017) Wasserstein GAN. arXiv preprint arXiv:1701.07875.
  • [6] S. Arora and Y. Zhang (2017) Do GANs actually learn the distribution? An empirical study. CoRR abs/1706.08224.
  • [7] H. H. Barrett and K. J. Myers (2013) Foundations of Image Science. John Wiley & Sons.
  • [8] H. H. Barrett, J. Yao, J. P. Rolland, and K. J. Myers (1993) Model observers for assessment of image quality. Proceedings of the National Academy of Sciences 90 (21), pp. 9758–9765.
  • [9] S. Bhadra, W. Zhou, and M. A. Anastasio (2020) Medical image reconstruction with image-adaptive priors learned by use of generative adversarial networks. In Medical Imaging 2020: Physics of Medical Imaging, Vol. 11312, pp. 113120V.
  • [10] F. O. Bochud, C. K. Abbey, and M. P. Eckstein (1999) Statistical texture synthesis of mammographic images with clustered lumpy backgrounds. Optics Express 4 (1), pp. 33–43.
  • [11] A. Bora, A. Jalal, E. Price, and A. G. Dimakis (2017) Compressed sensing using generative models. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 537–546.
  • [12] A. Bora, E. Price, and A. G. Dimakis (2018) AmbientGAN: generative models from lossy measurements. In International Conference on Learning Representations (ICLR).
  • [13] A. Brock, J. Donahue, and K. Simonyan (2018) Large scale GAN training for high fidelity natural image synthesis. CoRR abs/1809.11096.
  • [14] M. Caon (2004) Voxel-based computational models of real human anatomy: a review. Radiation and Environmental Biophysics 42 (4), pp. 229–235.
  • [15] J. Cheng. Brain tumor dataset.
  • [16] E. Clarkson, M. A. Kupinski, and H. H. Barrett (2002) Transformation of characteristic functionals through imaging systems. Optics Express 10 (13), pp. 536–539.
  • [17] D. L. Collins, A. P. Zijdenbos, V. Kollokian, J. G. Sled, N. J. Kabani, C. J. Holmes, and A. C. Evans (1998) Design and construction of a realistic digital brain phantom. IEEE Transactions on Medical Imaging 17 (3), pp. 463–468.
  • [18] T. Cootes, M. Roberts, K. Babalola, and C. Taylor (2015) Active shape and appearance models. In Handbook of Biomedical Imaging, pp. 105–122.
  • [19] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham (1995) Active shape models: their training and application. Computer Vision and Image Understanding 61 (1), pp. 38–59.
  • [20] E. L. Denton, S. Chintala, A. Szlam, and R. Fergus (2015) Deep generative image models using a Laplacian pyramid of adversarial networks. CoRR abs/1506.05751.
  • [21] C. Dong, C. C. Loy, K. He, and X. Tang (2014) Learning a deep convolutional network for image super-resolution. In European Conference on Computer Vision, pp. 184–199.
  • [22] A. Eklund (2019) Feeding the zombies: synthesizing brain volumes using a 3D progressive growing GAN. arXiv preprint arXiv:1912.05357.
  • [23] V. Ferrari, F. Jurie, and C. Schmid (2010) From images to shape models for object detection. International Journal of Computer Vision 87 (3), pp. 284–303.
  • [24] I. Goodfellow, Y. Bengio, and A. Courville (2016) Deep Learning. The MIT Press.
  • [25] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680.
  • [26] N. Gordillo, E. Montseny, and P. Sobrevilla (2013) State of the art survey on MRI brain tumor segmentation. Magnetic Resonance Imaging 31 (8), pp. 1426–1438.
  • [27] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville (2017) Improved training of Wasserstein GANs. CoRR abs/1704.00028.
  • [28] T. Heimann and H. Meinzer (2009) Statistical shape models for 3D medical image segmentation: a review. Medical Image Analysis 13 (4), pp. 543–563.
  • [29] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter (2017) GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems, pp. 6626–6637.
  • [30] S. A. Hussein, T. Tirer, and R. Giryes (2019) Image-adaptive GAN based reconstruction. arXiv preprint arXiv:1906.05284.
  • [31] A. C. Kak, M. Slaney, and G. Wang (2002) Principles of computerized tomographic imaging. Medical Physics 29 (1), pp. 107–107.
  • [32] T. Karras, T. Aila, S. Laine, and J. Lehtinen (2017) Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196.
  • [33] T. Karras, S. Laine, and T. Aila (2019) A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4401–4410.
  • [34] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila (2019) Analyzing and improving the image quality of StyleGAN. arXiv preprint arXiv:1912.04958.
  • [35] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • [36] M. A. Kupinski and H. H. Barrett (2005) Small-Animal SPECT Imaging. Vol. 233, Springer.
  • [37] M. A. Kupinski, E. Clarkson, and J. Y. Hesterman (2007) Bias in Hotelling observer performance computed from finite data. In Medical Imaging 2007: Image Perception, Observer Performance, and Technology Assessment, Vol. 6515, pp. 65150S.
  • [38] M. A. Kupinski, E. Clarkson, J. W. Hoppin, L. Chen, and H. H. Barrett (2003) Experimental determination of object statistics from noisy images. JOSA A 20 (3), pp. 421–429.
  • [39] C. M. Li, W. P. Segars, G. D. Tourassi, J. M. Boone, and J. T. Dobbins III (2009) Methodology for generating a 3D computerized breast phantom from empirical data. Medical Physics 36 (7), pp. 3122–3131.
  • [40] S. C. Li, B. Jiang, and B. M. Marlin (2019) MisGAN: learning from incomplete data with generative adversarial networks. CoRR abs/1902.09599.
  • [41] F. Massanes and J. G. Brankov (2017) Evaluation of CNN as anthropomorphic model observer. In Medical Imaging 2017: Image Perception, Observer Performance, and Technology Assessment, Vol. 10136, pp. 101360Q.
  • [42] K. J. Myers, R. F. Wagner, and K. M. Hanson (1993) Rayleigh task performance in tomographic reconstructions: comparison of human and machine performance. In Medical Imaging 1993: Image Processing, Vol. 1898, pp. 628–637.
  • [43] A. Radford, L. Metz, and S. Chintala (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.
  • [44] J. P. Rolland and H. H. Barrett (1992) Effect of random background inhomogeneity on observer detection performance. J. Opt. Soc. Am. A 9 (5), pp. 649–658.
  • [45] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen (2016) Improved techniques for training GANs. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS'16, pp. 2234–2242.
  • [46] W. P. Segars and B. M. Tsui (2002) Study of the efficacy of respiratory gating in myocardial SPECT using the new 4-D NCAT phantom. IEEE Transactions on Nuclear Science 49 (3), pp. 675–679.
  • [47] W. P. Segars, M. Mahesh, T. J. Beck, E. C. Frey, and B. M. Tsui (2008) Realistic CT simulation using the 4D XCAT phantom. Medical Physics 35 (8), pp. 3800–3808.
  • [48] K. Shen, J. Fripp, F. Mériaudeau, G. Chételat, O. Salvado, P. Bourgeat, A. D. N. Initiative, et al. (2012) Detecting global and local hippocampal shape changes in Alzheimer's disease using statistical shape models. Neuroimage 59 (3), pp. 2155–2166.
  • [49] A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, and R. Webb (2016) Learning from simulated and unsupervised images through adversarial training. CoRR abs/1612.07828.
  • [50] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna (2016) Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826.
  • [51] S. Tomoshige, E. Oost, A. Shimizu, H. Watanabe, and S. Nawano (2014) A conditional statistical shape model with integrated error estimation of the conditions; application to liver segmentation in non-contrast CT images. Medical Image Analysis 18 (1), pp. 130–143.
  • [52] R. F. Wagner and D. G. Brown (1985) Unified SNR analysis of medical imaging systems. Physics in Medicine & Biology 30 (6), pp. 489.
  • [53] X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. M. Summers (2017) ChestX-ray8: hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2097–2106.
  • [54] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli (2004) Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13 (4), pp. 600–612.
  • [55] X. G. Xu (2014) An exponential growth of computational phantom research in radiation protection, imaging, and radiotherapy: a review of the fifty-year history. Physics in Medicine and Biology 59 (18), pp. R233.
  • [56] K. Yan, X. Wang, L. Lu, and R. M. Summers (2018) DeepLesion: automated mining of large-scale lesion annotations and universal lesion detection with deep learning. Journal of Medical Imaging 5 (3), pp. 036501.
  • [57] I. T. Young (1977) Proof without prejudice: use of the Kolmogorov-Smirnov test for the analysis of histograms from flow systems and other sources. Journal of Histochemistry & Cytochemistry 25 (7), pp. 935–941.
  • [58] M. Zankl and K. Eckerman (2010) The GSF voxel computational phantom family. Handbook of Anatomical Models for Radiation Dosimetry, pp. 65–85.
  • [59] J. Zbontar, F. Knoll, A. Sriram, M. J. Muckley, M. Bruno, A. Defazio, M. Parente, K. J. Geras, J. Katsnelson, H. Chandarana, et al. (2018) FastMRI: an open dataset and benchmarks for accelerated MRI. arXiv preprint arXiv:1811.08839.
  • [60] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang (2017) Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing 26 (7), pp. 3142–3155.
  • [61] W. Zhou and M. A. Anastasio (2018) Learning the Ideal Observer for SKE detection tasks by use of convolutional neural networks. In Medical Imaging 2018: Image Perception, Observer Performance, and Technology Assessment, Vol. 10577, pp. 1057719.
  • [62] W. Zhou and M. A. Anastasio (2019) Learning the ideal observer for joint detection and localization tasks by use of convolutional neural networks. In Medical Imaging 2019: Image Perception, Observer Performance, and Technology Assessment, Vol. 10952, pp. 1095209.
  • [63] W. Zhou and M. A. Anastasio (2020) Markov-chain Monte Carlo approximation of the ideal observer using generative adversarial networks. In Medical Imaging 2020: Image Perception, Observer Performance, and Technology Assessment, Vol. 11316, pp. 113160D.
  • [64] W. Zhou, S. Bhadra, F. Brooks, and M. A. Anastasio (2019) Learning stochastic object model from noisy imaging measurements using AmbientGANs. In Medical Imaging 2019: Image Perception, Observer Performance, and Technology Assessment, Vol. 10952, pp. 109520M.
  • [65] W. Zhou, H. Li, and M. A. Anastasio (2019) Approximating the Ideal Observer and Hotelling Observer for binary signal detection tasks by use of supervised learning methods. IEEE Transactions on Medical Imaging 38 (10), pp. 2456–2468.
  • [66] W. Zhou, H. Li, and M. A. Anastasio (2019) Learning the Hotelling observer for SKE detection tasks by use of supervised learning methods. In Medical Imaging 2019: Image Perception, Observer Performance, and Technology Assessment, Vol. 10952, pp. 1095208.
  • [67] X. G. Zu (2005) The VIP-Man model: a digital human testbed for radiation simulations. SAE Transactions, pp. 779–787.