Improved low-count quantitative PET reconstruction with a variational neural network

06/05/2019 ∙ by Hongki Lim, et al. ∙ University of Michigan 0

Image reconstruction in low-count PET is particularly challenging because gammas from natural radioactivity in Lu-based crystals cause high random fractions that lower the measurement signal-to-noise-ratio (SNR). In model-based image reconstruction (MBIR), using more iterations of an unregularized method may increase the noise, so incorporating regularization into the image reconstruction is desirable to control the noise. New regularization methods based on learned convolutional operators are emerging in MBIR. We modify the architecture of a variational neural network, BCD-Net, for PET MBIR, and demonstrate the efficacy of the trained BCD-Net using XCAT phantom data that simulates the low true coincidence count-rates with high random fractions typical for Y-90 PET patient imaging after Y-90 microsphere radioembolization. Numerical results show that the proposed BCD-Net significantly improves PET reconstruction performance compared to MBIR methods using non-trained regularizers, total variation (TV) and non-local means (NLM), and a non-MBIR method using a single forward pass deep neural network, U-Net. BCD-Net improved activity recovery for a hot sphere significantly and reduced noise, whereas non-trained regularizers had a trade-off between noise and quantification. BCD-Net improved CNR and RMSE by 43.4 (29.1 reconstruction results show that the non-MBIR U-Net over-fits the training data, BCD-Net successfully generalizes to data that differs from training data. Improvements were also demonstrated for the clinically relevant phantom measurement data where we used training and testing datasets having very different activity distribution and count-level.



There are no comments yet.


page 1

page 6

page 7

page 8

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

Image reconstruction in low-count PET is particularly challenging because dominant gammas from natural radioactivity in Lu-based crystals cause low signal-to-noise-ratio (SNR) measurements [1]. To accurately reconstruct images in low-count PET, regularized model-based image reconstruction (MBIR) solves the following variational problem consisting of 1) a data fidelity that models the physical PET imaging model, and 2) a regularization term that penalizes image roughness and controls noise[2]:


Here, is the Poisson negative log-likelihood for measurement

and estimated measurement means

, the matrix denotes the system model, and denotes the mean background events such as scatter and random coincidences. Recently, applying learned regularizers to is emerging for MBIR [3].

While there is much ongoing research on machine learning or deep-learning techniques applied to CT

[4, 5, 6, 7, 8] and MRI [9, 10, 11, 12, 13] reconstruction problems, only a few studies have applied these techniques to PET. Most past PET studies use deep learning in image space without exploiting the physical imaging model in (1). For example, [14] applied a deep neural network (NN) mapping between reconstructed PET images with normal dose and reduced dose and [15]

applied a multilayer perceptron mapping between reconstructed images using maximum a posteriori algorithm and a reference (true) image, however, their framework uses the acquisition data only to form the initial image. Therefore, the reconstruction quality depends greatly on the training dataset and information from atypical imaging situations and variable noise levels (that are not part of the training set) may not be recovered well. A recent work

[16] trained a NN to reconstruct an image directly from sinogram, however, its framework has similar limitation with [14]. Recently [17, 18] proposed a PET MBIR framework using a deep-learning based regularizer. Our proposed MBIR framework, BCD-Net, also uses a regularizer that penalizes differences between the unknown image and “denoised” images given by regression neural network in a recurrent manner. In particular, whereas [17, 18] trained only a single image denoising NN, the proposed method is a recurrent framework that is composed of multiple trained NNs. This recurrent framework enables NNs in the later stages to learn how to recover fine details. Our proposed BCD-Net is also distinct from [17, 18] in that denoising NNs are derived by variational (optimization) formulation with a mathematical motivation (whereas, for the trained regularizer, [17, 18] brought U-Net [19] and DnCNN [20] developed for medical image segmentation and general Gaussian denoising) and characterized by less parameters, thereby avoiding over-fitting and generalizing well to unseen data especially when training samples are limited (see Section IV).

Variational NNs [21, 11, 9, 22, 10, 8] are a broad family of methods that originate from unrolling algorithm for solving an optimization problem and BCD-Net [23] is a specific example of a variational NN. BCD-Net is constructed by unfolding block coordinate descent (BCD) MBIR algorithm using “learned” convolutional analysis operators[24, 25], and significantly improved image recovery accuracy in extreme imaging applications, e.g., highly undersampled MRI, denoising low-SNR images, etc. A preliminary version of this paper was presented at the 2018 Nuclear Science Symposium and Medical Imaging Conference [26]. We significantly extended this work by applying our proposed method to the measurement data with newly developed techniques. We also added detailed analysis on our proposed method as well as comparisons to related works.

To show the efficacy of our proposed method BCD-Net in low-count PET imaging, we performed both digital phantom simulation and experimental measurement studies with activity distributions and count-rates that are relevant to clinical Y-90 PET imaging after liver radioembolization. Novel therapeutic applications have sparked growing interest in quantitative imaging of Y-90, an almost pure beta emitter that is widely used in internal radionuclide therapy. In addition to the FDA approved Y-90 microsphere radioembolization and Y-90 ibritumomab radioimmunotherapy, there are 50 active clinical trials for Y-90 labeled therapies ( However, the lack of gamma photons complicates imaging of Y-90; it involves SPECT via bremsstrahlung photons produced by the betas [27] or PET via a very low abundance positron in the presence of bremsstrahlung that leads to low signal-to-noise [28]. We apply a deep BCD-Net that is trained for realistic low-count PET imaging environments and compare its performance with those of non-trained regularizers and single forward pass denoising NN.

Our proposed BCD-Net applies to PET imaging in general, particularly in other imaging situations that also have low counts. Using shorter scan times and lower tracer activity in diagnostic PET has cost benefits and reduces radiation exposure, but at the expense of reduced counts that makes traditional iterative reconstruction challenging.

Section II develops our proposed BCD-Net architecture for PET MBIR. Section II also explains the simulation studies in the setting of Y-90 radioembolization and provides details on how we perform the physical phantom measurement. Section III presents how the different reconstruction methods perform with the simulation and measurement data. Section IV

discusses what imaging factor affects generalization performance of BCD-Net most, and compares between BCD-Net using shallow convolutional autoencoding denoiser and BCD-Net using deep U-Net denoiser. Section 

V concludes with future works.

Ii Methods

This section presents the problem formulation of the BCD-Net and gives a detailed derivation on how we get the final form of BCD-Net. We also provide several techniques for BCD-Net that we specifically devised for PET measurement data where each measurement has different count-level (and noise-level). Then we review the related works that we compare with BCD-Net such as MBIR methods using conventional non-trained regularizers and single forward pass denoising (non-MBIR) NN. This section also describes the simulation setting and details on measurement data and what evaluation metrics are used to assess the efficacy of each reconstruction algorithm.

Ii-a BCD algorithm for MBIR using “learned” convolutional regularization

Conventional PET regularizers penalize differences between neighboring pixels [29]. That approach is equivalent to assuming that convolving the image with the [1,-1] finite difference filters with different directions produces a sparse output. Using such “hand-crafted” filters is unlikely to be the best approach. A recent trend is to use training data to learn filters that produce sparse outputs when convolved with images of interest [24, 25]. Such learned filters can be used to define a regularizer that prefers images having sparse outputs, as follows[30]:


where is regularization parameter, is a set of convolutional filters, is a set of sparse codes, is a set of thresholding parameters controlling the sparsity of , is the number of image voxels, and and is the size and number of learned filters, respectively. BCD-Net is inspired by this type of “learned” regularizer. Ultimately, we hope that the learned regularizer can better separate true signal from noisy components.

BCD algorithm solves (1) with regularizer (2) by alternatively updating and :


where is the element-wise soft thresholding operator: .

Fig. 1: Architecture of the proposed BCD-Net for PET. The proposed BCD-Net has a recurrent NN architecture: each BCD-Net layer uses three inputs – fixed measurement and mean background , and the image reconstructed at the previous BCD-Net layer – and provides the reconstructed image .

Assuming that learned filters satisfy the tight-frame condition,   [24], we rewrite the updates in (3)-(4) as follows:


where denotes a rotated version of . For efficient image reconstruction (6) in PET, we use EM-surrogate of Poisson log-likelihood function [31]:

where and is the number of rays. Equating to zero is equivalent to finding the root of the following quadratic formula:

where denotes an element of the system model at th row and th column, and denotes th iteration in (6).

Ii-B BCD-Net for PET MBIR and its training

To further improve denoising capability, we extend the convolutional autoencoder in (5) [23], by replacing with separate decoding filters . We define the following updates for each layer:


One can further extend the convolutional autoencoder in (7) to a general regression NN, e.g., deep U-Net [19]. See Section IV.

We train the image denoising module in the form of (7) – specifically, consisting of encoding and decoding filters, and thresholding values – that “best” map between high-quality images (e.g., true images if available) and noisy images in the sense of mean squared error:


where is a set of true images and is a set of images estimated by image reconstruction module in the th layer.

Fig. 1 shows the corresponding BCD-Net architecture. We name the and updates in (7)-(8) as following two modules: 1) image denoising module and 2) image reconstruction module. A layer refers to one loop of denoising and reconstruction module.

Ii-C BCD-Net for measurement data in PET

Ii-C1 Normalization and scaling scheme

Different PET images can have very different intensity values due to variations in scan time and activity, and it is important for trained methods to be able to generalize to a wide range of count levels. Towards this end, we implemented normalization and scaling techniques in BCD-Net. [18] extended [17] by implementing “local linear fitting” to ensure that the denoising NN output has similar intensity as the input patch from the current estimated image. Our approach is different in that we normalize and scale the image with a global approach, not a patch-based approach. In particular, we modify the architecture in (7)-(8) as:


where the normalization function is defined by to satisfy , and the scaling function is defined by with . We solve the optimization problem over using Newton’s method:


We also apply this normalization technique when training the convolutional filters and thresholding values:

Ii-C2 Adaptive regularization parameter scheme

The best regularization parameter value can also vary greatly between scans, depending on the count level. Therefore, instead of choosing one specific value for regularization parameter, we set the value based on evaluation on current gradients of data-fidelity term and regularization term:


where is a constant specifying how we balance between the data-fidelity term and regularization term and denotes th iteration in (11).

Ii-D Conventional MBIR methods: Non-trained regularizers

We compared the proposed BCD-Net with two MBIR methods using standard non-trained regularizers.

Ii-D1 Total-variation (TV)

TV regularization penalizes the sum of absolute value of differences between adjacent voxels:

where is finite differencing matrix. Recent work [32] applied Primal-Dual Hybrid Gradient (PDHG) [33] for PET MBIR using TV regularization and demonstrated that PDHG-TV is superior than clinical reconstruction (e.g., OS-EM) for low-count datasets in terms of several image quality evaluation metrics such as contrast recovery and variability.

Ii-D2 Non-local means (NLM)

NLM regularization minimizes the differences between nearby patches in image:

where is a potential function of a scalar variable , is the search neighborhood around the th voxel, and is a patch extraction operator at the th voxel. We implemented Fair potential function for :

where is a design parameter and is the number of voxels in the patch . Unlike conventional local filters that assume similarity between only adjacent voxels, NLM filters can average image intensities over distant voxels. As in [34], we used ADMM to accelerate algorithmic convergence with an adaptive penalty parameter selection method [35].

Ii-E Conventional non-MBIR method: Denoising deep U-Net

Many related works [5, 36, 14]

use single image denoising (deep) NN (e.g., U-Net) as a post-reconstruction processing. We implemented non-recurrent (single forward pass) 3-D version of U-Net to compare with the proposed BCD-Net. The ‘encoder’ part of U-Net consists of multiple sets of 1) max pooling layer, 2) 3


3 convolutional layer, 3) batch normalization (BN) layer, 4) ReLU layer and the ‘decoder’ part of U-Net consists of multiple sets of 1) upsampling with trilinear interpolation

[17], 2) 333 convolutional layer, 3) batch normalization (BN) layer, 4) ReLU layer. We used ReLU layer as the last step to enforce the non-negativity constraint on image [17].

Ii-F Experimental setup: Digital phantom simulation and experimental measurement

Ii-F1 Y-90 PET/CT XCAT simulations

We used XCAT [37] phantom (Fig. 2) to simulate Y-90 PET following radioembolization. We set the image size to 128128100 with a voxel size (mm) and chose 100 slices ranging from lung to liver. To simulate the extremely low count scan, typical for Y-90 PET, we set total true coincidences and random fraction based on numbers from patient PET imaging performed after radioembolization [38]. To test the generalization capability of trained NNs, we changed all imaging factors between training and testing dataset. Here, imaging factors include activity distribution (shape and size of tumor and liver background, concentration ratio between hot and warm region) and count-level (total true coincidences and random fraction). Fig. 2 and Table I provide details on how we changed the testing dataset from the training dataset.

Training data Testing data
Concentration ratio (hot:warm) 9:1 4:1
Total net trues 200 K 500 K
Random fraction (%) 90.9 87.5
TABLE I: Details on XCAT simulation data: variations between training and testing data.
Sphere Liver-torso
Total activity (GBq) 0.65 1.9
Concentration ratio (hot:warm) 8.9:1 5.4:1
Total prompts 3.2 - 6.3 M 2.3 M
Total randoms 2.9 - 5.7 M 2.1 M
Total net trues 308 - 599 K 220 K
Random fraction(%) 90.3 - 90.5 90.7
TABLE II: Details on phantom measurement data: activity concentration ratio between hot and warm regions and randoms fractions for two phantom studies.
Patient A
Total activity (GBq) 2.55
Total prompts 2.7 M
Total randoms 2.3 M
Total net trues 380 K
Random fraction(%) 85.8
TABLE III: Details on a typical patient measurement data: total trues and randoms fractions.

Ii-F2 Y90 PET/CT physical phantom measurements and patient scan

For training NNs (BCD-Net and U-Net), we used measurement with sphere phantom (Fig. 4) where six ‘hot’ spheres (2,4,8,16,30 and 113 mL, 0.5 MBq/ml) are placed in a ‘warm’ background (0.057 MBq/ml) with total activity of 0.65 GBq. The phantom was scanned for 40 (3 acquisitions) - 80 (1 acquisition) minutes on a Siemens Biograph mCT PET/CT. For testing NNs and other reconstruction algorithms, we used an anthropomorphic liver/lung torso phantom (Fig. 4) with total activity and distribution that is clinically realistic for imaging following radioembolization with Y-90 microspheres: 5% lung shunt, 1.17 MBq/mL in liver, 3 hepatic lesions (4 and 16 mL spheres, 29 mL ovoid) of 6.6 MBq/ml. The phantom with total activity of 1.9 GBq was scanned for 30 minutes on a Siemens Biograph mCT PET/CT. Fig. 4 and Table II provide details on the count-level and activity distribution difference between training (sphere phantom) and testing (liver phantom) dataset. We also tested NNs with an actual Y-90 patient scan and the count-level information is provided in Table III.

We acquired all measurement data with time of flight information. The measurement data size is 40016862113. The last dimension of measurement indicates the number of time bin. The reconstructed image size is 400400112 with a voxel size (mm). To reconstruct the image with measurement data, we used SIEMENS time of flight system model ( in (1)) along with manufacturer given attenuation/normalization correction, PSF modelling, and randoms/scatters estimation.

Ii-G Evaluation metrics

For XCAT phantom simulation, we evaluated each reconstruction with activity recovery (AR) (VOI: hot sphere, background), contrast recovery (CR) (VOI: cold spot), contrast to noise ratio (CNR), and root mean squared error (RMSE). For physical phantom measurement, we used AR, coefficient of variation (CV) [39] (VOI: hot spheres), and CNR averaged over multiple hot spheres. For patient measurement, we used field of view (FOV) activity bias since the total activity in FOV is known (equal to the injected activity because the microspheres are trapped) while the activity distribution is unknown:

where is mean counts in the volume of interest (VOI),

is standard deviation between voxel values in uniform background liver, and

is the total number of voxels in field of view (FOV).

Attenuation map (coronal) Attenuation map (axial) True activity (training) True activity (testing) Zoomed in
Fig. 2: XCAT phantom simulation: (First row) coronal and axial view of attenuation map and true relative activity distribution corresponding to axial attenuation map. Red line of attenuation map indicates the slice shown in axial view. Red lines of zoomed image indicate where the profiles (averaged over 3 voxels) in Fig. 3 are taken. (Second row) reconstructed images of one slice from different reconstruction methods.
Profile at Lesion
Profile at Cold Spot
Fig. 3: XCAT phantom simulation result: Line profile where true image contains hot, warm and cold region.
Sphere phantom attenuation map (coronal) Attenuation map (axial) True activity image of BCD-Net
Liver phantom attenuation map (coronal) Attenuation map (axial) True activity image of BCD-Net
TRUE OSEM w/ post-filtering U-Net BCD-Net
Fig. 4: Y90 PET/CT physical phantom measurement: (First row: training data, Second row: testing data) Attenuation map, true activity, and of BCD-Net of sphere and liver phantom used for training and testing BCD-Net and U-Net. (Third row) Reconstructed images of one slice from different reconstruction methods.

Iii Results

Iii-a Experimental setup: Training BCD-Net and U-Net

Iii-A1 BCD-Net

We trained 3D convolutional filters and thresholding values in each layer with a stochastic gradient descent method using PyTorch

[40] deep-learning library. We trained a thirty-layer BCD-Net where each layer has sets of thresholding values and convolutional encoding/decoding filters for the simulation experiment and twenty-layer BCD-Net with (due to GPU memory capacity) for the measurement experiment. We set the size of each filter as , and the initial thresholding values by sorting the initial estimate of image and getting a 10% largest value of sorted initial image. We used Adam optimization method [41] to train the NN with learning rate of for encoding filters, for decoding filters, and

for thresholding values. We applied the learning rate decay scheme (e.g., decreasing the learning rate as a factor of 0.9 per 20 epochs). Due to the large size of 3D input, we set the batch size as 1. We used 300 epochs to train the denoising NN at each layer.

With the XCAT simulation data, we trained BCD-Net using five pairs () of 3D true image and 3D realizations (1 true image, 5 realizations). We generated multiple realizations to train the denoising NN to deal with the Poisson noise.

With measurement data, we used the sphere phantom data to train the BCD-Net. We used 4 pairs () of 3D true image and 4D measurements (1 true image, 4 scans) for training. We set in (13). In measurement experiment, due to the memory capacity of GPUs, we downsample with a factor of 2 before feeding into the denoising module and upsample before feeding into the image reconstruction module. The final output image is from the reconstruction module.

Iii-A2 U-Net

For training U-Net, we used identical training dataset that we used for training BCD-Net. We also used Adam optimization method to train the NN with learning rate of . We scaled the image generated by U-Net with described in (12). We set the number of convolutional filter channels of first layer of encoder as 48 for simulation dataset and 32 for measurement dataset.

Iii-B Experimental setup: Reconstruction and testing trained NN

With simulation dataset, we compare the proposed method BCD-Net to the standard EM (1 subset), TV-based MBIR with PDHG algorithm (PDHG-TV), NLM-based MBIR with ADMM algorithm (ADMM-NLM), and single forward pass denoising U-Net. For BCD-Net and U-Net, we used 20 EM algorithm iterations to get the initial image. We report averaged evaluation metrics over 3 realizations. With measurement dataset, we compare the proposed method BCD-Net to the standard EM (1 subset), OSEM (1 iteration and 21 subsets) with post-filtering which is identical to reconstruction setting used in our clinic, and denoising U-Net. We used 10 EM algorithm iterations to get the initial image for BCD-Net and U-Net.

Iii-B1 Reconstruction (testing) results on simulation data

We used 50 iterations for EM and 200 iterations for PDHG-TV, ADMM-NLM, and 5 iterations for the reconstruction module (8) at each layer of BCD-Net. When reporting evaluation results, we selected the iteration number for EM to obtain the highest CNR. For each regularizer, we finely tuned the regularization parameter (within range ) to achieve highest CNR and lowest RMSE values. For NLM, we additionally tuned the window and search sizes.

Iii-B2 Reconstruction (testing) results on measurement data

We tested on the liver-torso phantom measurement data and patient measurement data with trained BCD-Net (and U-Net) using sphere phantom data. We used 2 iterations for the reconstruction module (11) at each layer of BCD-Net.

Iii-C Results: Reconstruction (testing) on simulation data

The proposed variational NN, BCD-Net, significantly improves overall reconstruction performance over the other non-trained regularization methods. See Table IV and Fig. 2. Table IV shows that BCD-Net achieves best results in most evaluation metrics. In particular, BCD-Net improves CNR and RMSE by 43.4%/85.7%/8.3% and 12.9%/29.1%/26.9% compared to PDHG-TV/ADMM-NLM/U-Net. Fig. 2 shows that reconstructed image using BCD-Net is closest to the true image whereas PHDG-TV and U-Net exceedingly blur in cold region and ADMM-NLM is noisy in uniform region. The profiles in Fig. 3 also illustrate that PDHG-TV and U-Net underestimate the hot region, whereas profile of BCD-Net is very close to true value in both hot and cold region.

Lesion Liver Cold Spot
EM 77.6 79.5 49.4 7.7 7.0
PDHG-TV 73.8 90.9 64.5 6.9 6.5
ADMM-NLM 77.6 88.2 68.0 5.3 7.9
U-Net 72.3 88.3 35.7 9.1 7.7
BCD-Net 83.7 90.4 70.2 9.9 5.6
TABLE IV: XCAT Phantom Simulation data results: evaluation metrics for reconstructed images from different regularization methods showing superior performance of BCD-Net ( 100 (%) AR/CR, highest CNR, and lowest RMSE).

29ml 16ml 4ml Liver
EM 74.9 66.0 49.1 88.3 35.2 6.1
OSEM w/ post-filtering 83.9 80.1 47.5 90.4 40.9 5.6
U-Net 72.9 62.1 35.9 46.5 12.7 7.2
BCD-Net 97.1 80.3 54.0 87.2 25.6 6.8

We report CV and CNR value using average of three hot lesions.

TABLE V: Liver phantom measurement data results: evaluation metrics for reconstructed images from different methods
FOV bias
OSEM w/ filter -19.2
U-Net -33.0
BCD-Net -10.6
TABLE VI: Patient measurement data results: evaluation metrics for reconstructed images from different reconstruction methods

Iii-D Results: Reconstruction (testing) on measurement data

Iii-D1 Phantom study

Similar to the result of simulation data, BCD-Net improves overall reconstruction performance over the other reconstruction methods. See Fig. 4 and Table V. When reporting evaluation metrics and visualizing the images, we selected the iteration number for EM and BCD-Net to obtain the highest CNR. Table V shows that BCD-Net improves AR/CV/CNR at multiple spheres by 9.3%/37.5%/22.3% compared to OSEM. Even though U-Net improves CNR more than BCD-Net, image generated by U-Net looks unnatural and the shape of liver is round-shape like sphere phantom showing that U-Net is over-fitting to the training data.

Iii-D2 Patient study

Because of the unknown true activity distribution, we quantitatively evaluated each reconstruction method with FOV activity bias. BCD-Net improves the FOV bias by 44.9% and 67.0% compared to OSEM and U-Net. U-Net generates very artificial image that resembles the sphere phantom activity distribution. See Fig. 5.

Patient attenuation map (coronal) Attenuation map (axial) OSEM w/ filter (coronal) OSEM w/ filter (axial)
U-Net (coronal) U-Net (axial) BCD-Net (coronal) BCD-Net (axial)
Fig. 5: Y90 PET/CT patient measurement: Attenuation map and reconstructed images of one slice (coronal and axial view) using OSEM, U-Net, and BCD-Net.

Iv Discussion

In this study we showed the efficacy of trained regularization method BCD-Net on both qualitative and quantitative Y-90 PET/CT imaging and compared between conventional non-trained regularizers and other training-based denoising method. The proposed regularization method uses convolutional filters to lift estimated signals and thresholding operations to remove unwanted signals. Moreover, the recurrent framework of BCD-Net enables one to train the filters and thresholding values to deal with the different image roughness at its each layer. We experimentally demonstrate its capability of generalization with simulation and measurement data. In the XCAT PET/CT simulation with activity distributions and count-rates mimicking Y-90 PET imaging, AR in the volume of interest was significantly underestimated with standard reconstruction and other MBIR methods using non-trained regularization, however approached the true activity with the proposed regularization method. Improvements were also demonstrated for the measurement data where we used training and testing datasets having very different activity distribution and count-level.

We investigated if the recurrent BCD-Net combined with U-Net denoisers [17] (denoising module in (7) is replaced by U-Net) is better than the proposed BCD-Net using convolutional autoencoder denoiser (7). At each BCD-Net layer, U-Net in this experiment has about 22.6 million trainable parameters, whereas the denoiser in (7) has only 7040 ( trainable parameters. In Fig. 6, RMSE value of U-Net-based BCD-Net is lower than that of the proposed BCD-Net using (7) in training dataset, however, it is opposite in testing data. This result demonstrates that the shallow convolutional autoencoders in (7) has better generalization capability over the deep U-Net denoisers, when the training samples are limited. Fig. 6 also compares between reconstructed images from BCD-Net with the convolutional autoencoders (7) and that using U-Nets, and it shows that the visual quality of (7

)-based BCD-Net is better than that of U-Net-based BCD-Net in that U-Net denoisers generates artificial cold region around the hot region. This over-fitting issue with U-Net based regularizer could be moderated if one trained the NN with more diverse and large dataset; however, medical imaging community often finds it hard to collect large enough datasets compared to the computer vision community

[42], therefore it is important for a reconstruction method to ensure the generalization to unseen dataset.

(a) RMSE in training dataset (b) RMSE in testing dataset
(c) TRUE (d) Autoencoder denoiser (e) U-Net denoiser
Fig. 6: (a)-(b) RMSE vs Iterations in training/testing dataset. Plots show that BCD-Net using convolutional autoencoder denoiser achieves higher RMSE in training dataset and lower RMSE in testing dataset compared to BCD-Net using U-Net denoiser, thereby generalizing better to unseen dataset. (d)-(e) Reconstructed images of one slice from BCD-Net with convolutional autoencoder and U-Net denoiser.
(a) Impact of number of filter (b) Impact of size of filter
(Fixed ) (Fixed )
(c) Trained with loss (d) (e)
Fig. 7: (a)-(b) Impact of number/size of filter and training loss on testing dataset RMSE. Plots show that BCD-Net achieves lower training RMSE when using larger number and size of filters, however, it does not decrease testing RMSE compared to smaller number and size of filters and BCD-Net with larger size of filter exceedingly blurs image thereby resulting in higher RMSE. (c) Reconstructed image from BCD-Net with filters and thresholding values trained with -loss. It generates unnaturally piece-wise constant image and ignores details in small cold regions.
(a) at each layer
(b) at each layer
Fig. 8: (a) Thresholding values (largest 32 values are shown) at each layer (b) Sum of the values at each layer. Sum of thresholding value decreases as the algorithm iterates. Plots shows that the noise component of signal is removed by thresholding operations at early layer and fine details of true signal are recovered at later layer.
in training and testing case when
Fig. 9: Efficacy of adaptive selection of regularization parameter .
Changed imaging variable Training Testing RMSE Drop (%)
Identical - 4.74 -
Shape and size See Fig. 2 5.49 15.9
Concentration ratio 9:1 4:1 5.55 17.1
Concentration ratio 1.7:1 4:1 5.81 22.5
Trues Count-level 5.01 5.7
Trues Count-level 5.71 20.5
TABLE VII: Impact of imaging variable on generalization capability of BCD-Net.

We tested which imaging variable impacts most on the generalization performance of the proposed BCD-Net. Table VII shows that how BCD-Net performs when training and testing data had same activity distribution and count-level (only difference is Poisson noise) and how the performance of BCD-Net is degraded when each imaging variable is changed between training and testing dataset. We changed one of three factors (shape and size of tumor and liver, concentration ratio, count-level) in training dataset compared to testing dataset. The result shows that generalization performance of the proposed BCD-Net depends largely on all of imaging variables. However, training with higher contrast and lower count-level dataset (compared to testing dataset) gives less degradation of performance compared to the opposite cases. This result suggests that it is better to have noisier data in training dataset than testing dataset. In other words, training for extra noise reduction than needed is better than less noise reduction than needed.

We investigated how each factor in training of denoising module (7) impacts on the generalization capability of BCD-Net. Fig. 7(a)-(b) show the impact of number and size of filters on performance. Plots show that the proposed BCD-Net achieves lower training RMSE when using larger number and size of filters; however, it does not decrease testing RMSE compared to smaller number and size of filters and BCD-Net with larger size of filter exceedingly blurs image thereby resulting in higher RMSE. See Fig. 7(e). This result well corresponds to the above result related to the parameter dimension of NN. We also tested if replacing training loss (mean squared error) in (9) by loss improves the performance. However, it results in unnaturally piece-wise constant image and details in small cold regions are ignored.

Fig. 8 shows plots of the thresholding values at each layer and sum of the values at each layer. Sum of thresholding value decreases as the algorithm iterates. The result corresponds to the motivation of BCD-Net in that the noise component of signal (unwanted signal) is removed by thresholding operations at early layer and fine details of true signal are recovered at later layer.

Fig. 9 shows how regularization parameter in (13) changes with iterations in training and testing datasets. The value in each layer converges to different limits in training and testing cases. Large discrepancy of values between training and testing cases is mostly from the different intensity of normalization correction factor (sum of factors in training case is times larger than testing case) associated in the system model in (1). These empirical results underscore the importance of such adaptive regularization parameter selection schemes proposed in Section II-C2 in PET imaging.

V Conclusion

It is important for a “learned” regularizer to have generalization capability to guarantee the performance when applying it to unseen dataset. For low-count PET reconstruction, the proposed variational NN, BCD-Net, significantly improves the generalization capability, compared to a deep NN, U-Net, to the unseen dataset, when training dataset is small. The proposed BCD-Net achieves significant qualitative and quantitative improvements over the conventional MBIR methods using “hand-crafted” non-trained regularizers, TV and NLM. In particular, these conventional MBIR methods have a trade-off between noise and recovery accuracy, whereas the proposed BCD-Net improves AR for hot regions and reduced noise at the same time. Visual comparison of reconstructed images also shows that the proposed BCD-Net significantly improves PET image reconstruction performance compared to MBIR methods using non-trained regularizers and non-MBIR denoising U-Net.

Future work includes investigating performance of BCD-Net trained with end-to-end training principles and adaptive selection of trainable parameter numbers depending on the size of training dataset.

Vi Acknowledgment

We acknowledge Se Young Chun (UNIST) for providing NLM regularization codes. We also acknowledge Maurizio Conti and Deepak Bharkhada (SIEMENS Healthcare Molecular Imaging) for providing the forward/back projector for TOF measurement data. This work was supported by NIH-NIBIB grant R01EB022075.


  • [1] T. Carlier, K. P. Willowson, E. Fourkal, D. L. Bailey, M. Doss, and M. Conti, “Y90-PET imaging: exploring limitations and accuracy under conditions of low counts and high random fraction,” Med. Phys., vol. 42, no. 7, pp. 4295–309, Jun. 2015.
  • [2] S. Ahn, S. G. Ross, E. Asma, J. Miao, X. Jin, L. Cheng, S. D. Wollenweber, and R. M. Manjeshwar, “Quantitative comparison of OSEM and penalized likelihood image reconstruction using relative difference penalties for clinical PET,” Physics in Medicine & Biology, vol. 60, no. 15, p. 5733, 2015.
  • [3] G. Wang, J. C. Ye, K. Mueller, and J. A. Fessler, “Image reconstruction is a new frontier of machine learning,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1289–96, Jun. 2018.
  • [4]

    H. Chen, Y. Zhang, M. K. Kalra, F. Lin, Y. Chen, P. Liao, J. Zhou, and G. Wang, “Low-dose CT with a residual encoder-decoder convolutional neural network,”

    IEEE transactions on medical imaging, vol. 36, no. 12, pp. 2524–2535, 2017.
  • [5] K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep convolutional neural network for inverse problems in imaging,” IEEE Transactions on Image Processing, vol. 26, no. 9, pp. 4509–4522, 2017.
  • [6] J. C. Ye, Y. Han, and E. Cha, “Deep convolutional framelets: A general deep learning framework for inverse problems,” SIAM Journal on Imaging Sciences, vol. 11, no. 2, pp. 991–1048, 2018.
  • [7] H. Gupta, K. H. Jin, H. Q. Nguyen, M. T. McCann, and M. Unser, “CNN-based projected gradient descent for consistent CT image reconstruction,” IEEE transactions on medical imaging, vol. 37, no. 6, pp. 1440–1453, 2018.
  • [8] I. Y. Chun, H. Lim, Z. Huang, and J. A. Fessler, “Fast and convergent iterative signal recovery using trained convolutional neural networkss,” in Proc. Allerton Conf. on Commun., Control, and Comput., Allerton, IL, Oct. 2018.
  • [9] H. K. Aggarwal, M. P. Mani, and M. Jacob, “MoDL: model-based deep learning architecture for inverse problems,” IEEE Trans. Med. Imag., vol. 38, no. 2, pp. 394–405, Feb. 2019.
  • [10] K. Hammernik, T. Klatzer, E. Kobler, M. P. Recht, D. K. Sodickson, T. Pock, and F. Knoll, “Learning a variational network for reconstruction of accelerated MRI data,” Magnetic resonance in medicine, vol. 79, no. 6, pp. 3055–3071, 2018.
  • [11] J. Sun, H. Li, Z. Xu et al., “Deep ADMM-Net for compressive sensing MRI,” in Advances in neural information processing systems, 2016, pp. 10–18.
  • [12] M. Mardani, E. Gong, J. Y. Cheng, S. S. Vasanawala, G. Zaharchuk, L. Xing, and J. M. Pauly, “Deep generative adversarial neural networks for compressive sensing MRI,” IEEE transactions on medical imaging, vol. 38, no. 1, pp. 167–179, 2019.
  • [13] G. Yang, S. Yu, H. Dong, G. Slabaugh, P. L. Dragotti, X. Ye, F. Liu, S. Arridge, J. Keegan, Y. Guo et al., “DAGAN: Deep De-Aliasing Generative Adversarial Networks for fast compressed sensing MRI reconstruction,” IEEE transactions on medical imaging, vol. 37, no. 6, pp. 1310–1321, 2018.
  • [14] J. Xu, E. Gong, J. Pauly, and G. Zaharchuk, “200x low-dose PET reconstruction using deep learning,” arXiv preprint arXiv:1712.04119, 2017.
  • [15] B. Yang, L. Ying, and J. Tang, “Artificial Neural Network Enhanced Bayesian PET Image Reconstruction,” IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1297–1309, June 2018.
  • [16] I. Haggstrom, C. R. Schmidtlein, G. Campanella, and T. J. Fuchs, “DeepPET: A deep encoder-decoder network for directly solving the PET image reconstruction inverse problem,” Med. Im. Anal., vol. 54, pp. 253–62, May 2019.
  • [17] K. Gong, J. Guan, K. Kim, X. Zhang, J. Yang, Y. Seo, G. El Fakhri, J. Qi, and Q. Li, “Iterative PET image reconstruction using convolutional neural network representation,” IEEE transactions on medical imaging, vol. 38, no. 3, pp. 675–685, 2019.
  • [18] K. Kim, D. Wu, K. Gong, J. Dutta, J. H. Kim, Y. D. Son, H. K. Kim, G. El Fakhri, and Q. Li, “Penalized PET reconstruction using deep learning prior and local linear fitting,” IEEE transactions on medical imaging, vol. 37, no. 6, pp. 1478–1487, 2018.
  • [19] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention.      Springer, 2015, pp. 234–241.
  • [20] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3142–3155, July 2017.
  • [21] K. Gregor and Y. LeCun, “Learning fast approximations of sparse coding,” in Proceedings of the 27th International Conference on International Conference on Machine Learning.      Omnipress, 2010, pp. 399–406.
  • [22] Y. Chen and T. Pock, “Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration,” IEEE transactions on pattern analysis and machine intelligence, vol. 39, no. 6, pp. 1256–1272, 2017.
  • [23] I. Y. Chun and J. A. Fessler, “Deep BCD-net using identical encoding-decoding CNN structures for iterative image recovery,” in 2018 IEEE 13th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP).      IEEE, 2018, pp. 1–5.
  • [24] ——, “Convolutional analysis operator learning: Acceleration and convergence,” arXiv preprint arXiv:1802.05584, Jan 2018.
  • [25] I. Y. Chun, D. Hong, B. Adcock, and J. A. Fessler, “Convolutional analysis operator learning: Dependence on training data,” submitted, Feb. 2019.
  • [26] H. Lim, Z. Huang, J. A. Fessler, Y. K. Dewaraja, and I. Y. Chun, “Application of trained deep BCD-Net to iterative low-count PET image reconstruction,” in 2018 IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC).      IEEE, 2018.
  • [27] M. Elschot, M. G. Lam, M. A. van den Bosch, M. A. Viergever, and H. W. de Jong, “Quantitative Monte Carlo-based 90Y SPECT reconstruction,” Journal of Nuclear Medicine, vol. 54, no. 9, pp. 1557–1563, 2013.
  • [28] A. S. Pasciak, A. C. Bourgeois, J. M. McKinney, T. T. Chang, D. R. Osborne, S. N. Acuff, and Y. C. Bradley, “Radioembolization and the dynamic role of 90Y PET/CT,” Frontiers in oncology, vol. 4, p. 38, 2014.
  • [29] J. Nuyts, D. Beque, P. Dupont, and L. Mortelmans, “A concave prior penalizing relative differences for maximum-a-posteriori reconstruction in emission tomography,” IEEE Transactions on nuclear science, vol. 49, no. 1, pp. 56–60, 2002.
  • [30] I. Y. Chun and J. A. Fessler, “Convolutional analysis operator learning: Application to sparse-view CT,” in Proc. Asilomar Conf. on Signals, Syst., and Comput., Pacific Grove, CA, Oct. 2018.
  • [31]

    A. R. De Pierro, “A modified expectation maximization algorithm for penalized likelihood estimation in emission tomography,”

    IEEE Trans. Med. Imag., vol. 14, no. 1, pp. 132–7, Mar. 1995.
  • [32] Z. Zhang, S. Rose, J. Ye, A. E. Perkins, B. Chen, C.-M. Kao, E. Y. Sidky, C.-H. Tung, and X. Pan, “Optimization-based image reconstruction from low-count, list-mode TOF-PET data,” IEEE Transactions on Biomedical Engineering, vol. 65, no. 4, pp. 936–946, 2018.
  • [33] A. Chambolle and T. Pock, “An introduction to continuous optimization for imaging,” Acta Numerica, vol. 25, pp. 161–319, 2016.
  • [34] S. Y. Chun, Y. K. Dewaraja, and J. A. Fessler, “Alternating direction method of multiplier for tomography with nonlocal regularizers,” IEEE transactions on medical imaging, vol. 33, no. 10, pp. 1960–1968, 2014.
  • [35] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. & Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2010.
  • [36] E. Kang, J. Min, and J. C. Ye, “A deep convolutional neural network using directional wavelets for low-dose X-ray CT reconstruction,” Medical physics, vol. 44, no. 10, pp. e360–e375, 2017.
  • [37] W. Segars, G. Sturgeon, S. Mendonca, J. Grimes, and B. M. Tsui, “4D XCAT phantom for multimodality imaging research,” Medical physics, vol. 37, no. 9, pp. 4902–4915, 2010.
  • [38] H. Lim, Y. K. Dewaraja, and J. A. Fessler, “A PET reconstruction formulation that enforces non-negativity in projection space for bias reduction in Y-90 imaging,” Phys. Med. Biol., vol. 63, no. 3, p. 035042, Feb. 2018.
  • [39] Quality Assurance for PET and PET/CT Systems, ser. Human Health Series.      Vienna: INTERNATIONAL ATOMIC ENERGY AGENCY, 2009, no. 1. [Online]. Available:
  • [40] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” in NIPS-W, 2017.
  • [41] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  • [42]

    J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A Large-Scale Hierarchical Image Database,” in

    CVPR09, 2009.