Deep Kernel Representation for Image Reconstruction in PET

10/04/2021 · by Siqi Li, et al. · University of California-Davis

Image reconstruction for positron emission tomography (PET) is challenging because of the ill-conditioned tomographic problem and low counting statistics. Kernel methods address this challenge by using a kernel representation to incorporate image prior information in the forward model of iterative PET image reconstruction. Existing kernel methods commonly construct the kernels using an empirical process, which may lead to suboptimal performance. In this paper, we describe the equivalence between the kernel representation and a trainable neural network model. A deep kernel method is proposed that exploits deep neural networks to enable automated learning of an optimized kernel model. The proposed method is directly applicable to single subjects. The training process utilizes available image prior data to form a set of robust kernels optimally rather than empirically. The results from computer simulations and a real patient dataset demonstrate that the proposed deep kernel method can outperform the existing kernel method and a neural-network method for dynamic PET image reconstruction.







1 Introduction

Positron emission tomography (PET) is an imaging modality for quantitatively measuring biochemical and physiological processes in vivo by using a radiotracer. Image reconstruction for PET is challenging due to the ill-conditioned tomographic problem and the low counting statistics (high noise) of PET data, for example in dynamic PET imaging, where short time frames are used to monitor rapid changes in tracer distribution.

Among different methods of PET image reconstruction, the kernel methods (e.g., [1, 2, 3, 4, 5]) address the noise challenge by uniquely integrating image prior information into the forward model of PET image reconstruction through a kernel representation framework [1]. Image prior information may come from composite time frames of a dynamic PET scan [1] or from anatomical images (e.g., MRI images [2, 3] in integrated PET/MRI). The kernel methods can be easily implemented with the existing expectation-maximization (EM) algorithm and have demonstrated substantial image-quality improvement compared with other methods [1, 2, 3].

In the existing kernel methods, a kernel representation is commonly built using an empirical process for defining feature vectors and manually selecting method-associated parameters [1]. However, such experience-based parameter tuning and optimization often leads to suboptimal performance. In this paper, we first describe the equivalence between the kernel representation and a trainable neural network model. Based on this connection, we then propose a deep kernel method that learns the trainable components of the neural network model from available image data, enabling data-driven, automated learning of an optimized kernel model. The learned kernel model is then applied to tomographic image reconstruction and is expected to outperform existing, empirically defined kernel models.

There are many ongoing efforts in the field to explore deep learning with neural networks for PET image reconstruction. Deep neural networks have been proposed for direct mapping from the projection domain to the image domain [6], but such models are so far practical mainly for 2D data. By unrolling an iterative tomographic reconstruction algorithm, model-based deep-learning reconstruction [7] represents a promising direction. One limitation of this approach is that it requires pre-training on a large number of datasets and involves projection data in the iterative training process, which is computationally intensive. Alternatively, neural networks can be used for image representation in iterative reconstruction, e.g., via the deep image prior model [8]. The resulting reconstruction problem, however, is nonlinear and often complex and challenging to optimize.

Different from these methods that use pure neural networks, the proposed deep kernel method integrates deep neural networks into the kernel framework to form a novel tomographic image representation. The method has a unique advantage: once the model is trained with neural networks, the unknown kernel coefficient image remains linear in the model and is therefore easy to reconstruct from PET data. The method does not necessarily require a large dataset for training but is directly applicable to single-subject learning and reconstruction.

Figure 1: The construction of the kernel representation for a PET image described as a series of neural-network modules.

2 Background

2.1 PET Image Reconstruction

PET projection data y can be well modeled as independent Poisson random variables with the log-likelihood function

L(y | x) = \sum_{i=1}^{n_d} ( y_i \log \bar{y}_i - \bar{y}_i - \log y_i! ),   (1)

where i denotes the detector-pair index and n_d is the total number of detector pairs. The expectation of the projection data, \bar{y}, is related to the unknown image x through

\bar{y} = P x + r,   (2)

where P is the detection probability matrix for PET, which includes normalization factors for scanner sensitivity, scan duration, deadtime correction and attenuation correction, and r is the expectation of random and scattered events [9].

The maximum likelihood (ML) estimate of the image x is found by maximizing the Poisson log-likelihood,

\hat{x} = \arg\max_{x \geq 0} L(y | x).   (3)

A common way of seeking the solution of Eq. (3) is to use the EM algorithm [10].
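For illustration, the ML-EM update for Eq. (3) can be sketched in a few lines of NumPy; the system matrix P, data y and background r here are tiny hypothetical stand-ins for a real scanner model, not the actual dimensions used in the paper.

```python
import numpy as np

def mlem(y, P, r, n_iters=100, eps=1e-12):
    """ML-EM update for Poisson data with mean P @ x + r."""
    x = np.ones(P.shape[1])              # uniform initial image
    sens = P.T @ np.ones(P.shape[0])     # sensitivity image P^T 1
    for _ in range(n_iters):
        y_bar = P @ x + r                # forward projection, Eq. (2)
        x = x / np.maximum(sens, eps) * (P.T @ (y / np.maximum(y_bar, eps)))
    return x

# Toy 3-bin, 2-pixel problem (hypothetical numbers)
P = np.array([[1.0, 0.2], [0.3, 1.0], [0.5, 0.5]])
x_true = np.array([4.0, 2.0])
y = P @ x_true                           # noise-free data for illustration
x_hat = mlem(y, P, np.zeros(3), n_iters=500)
```

Each multiplicative update preserves nonnegativity of the image and monotonically increases the Poisson log-likelihood.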

2.2 Kernel Methods for PET Reconstruction

The kernel methods describe the image intensity x_j at pixel j as a linear representation of kernels [1],

x_j = \sum_{k \in N_j} \alpha_k \, \kappa(f_j, f_k), \quad j = 1, \ldots, n_p,   (4)

where N_j defines the neighborhood of pixel j and n_p is the total number of image pixels. f_j and f_k are the feature vectors extracted from image priors for pixel j and pixel k, respectively. \alpha_k is the kernel coefficient at pixel k. \kappa(\cdot, \cdot) is the kernel function that defines a weight between pixel j and pixel k. A popular choice of \kappa is the radial Gaussian kernel,

\kappa(f_j, f_k) = \exp\left( - \frac{\| f_j - f_k \|^2}{2 \sigma^2} \right),   (5)

with \sigma being the kernel parameter. The equivalent matrix-vector form of Eq. (4) is

x = K \alpha,   (6)

with the (j,k)th element of the square kernel matrix K being \kappa(f_j, f_k).

The kernel coefficient image \alpha is then estimated from the projection data y by maximizing the log-likelihood L(y | K\alpha),

\hat{\alpha} = \arg\max_{\alpha \geq 0} L(y | K \alpha),   (7)

which can be solved using the kernelized EM algorithm [1],

\alpha^{n+1} = \frac{\alpha^{n}}{K^T P^T 1_{n_d}} \cdot \left[ K^T P^T \left( \frac{y}{P K \alpha^{n} + r} \right) \right],   (8)

where n denotes the iteration number, the superscript "T" denotes matrix transpose, 1_{n_d} is a vector with all elements being 1, and the vector multiplications and divisions are element-wise. Once \hat{\alpha} is estimated, the final PET activity image is given by \hat{x} = K \hat{\alpha}.

Note that in practice, a normalized kernel matrix

K = D^{-1} K_0   (9)

is commonly used for better performance [1], where K_0 is the unnormalized kernel matrix and D is a diagonal matrix whose jth diagonal element is the sum of the jth row of K_0. The (j,k)th element of K is equivalent to

K_{j,k} = \frac{\kappa(f_j, f_k)}{\sum_{l \in N_j} \kappa(f_j, f_l)}.   (10)

The neighborhood N_j of pixel j can be defined by its k-nearest neighbors (kNN) [14] to make K sparse. The feature vector f_j is usually set to the intensity values of the image prior at pixel j, and the kernel parameter \sigma is chosen empirically, e.g., \sigma = 1.
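As a concrete sketch of Eqs. (4)-(10), the NumPy code below builds a kNN-sparse, row-normalized Gaussian kernel matrix from feature vectors and runs the kernelized EM update of Eq. (8). The sizes, the dense matrix representation, and the random system matrix are simplifications for illustration only.

```python
import numpy as np

def build_normalized_kernel(F, k=4, sigma=1.0):
    """Row-normalized Gaussian kernel matrix (Eqs. 5, 9, 10) with kNN sparsity.
    F is an (n_pixels, n_features) array of feature vectors f_j."""
    n = F.shape[0]
    K = np.zeros((n, n))
    for j in range(n):
        d2 = ((F - F[j]) ** 2).sum(axis=-1)       # ||f_j - f_k||^2
        nbrs = np.argsort(d2)[:k]                 # neighborhood N_j via kNN
        w = np.exp(-d2[nbrs] / (2 * sigma ** 2))  # Gaussian kernel, Eq. (5)
        K[j, nbrs] = w / w.sum()                  # row normalization, Eq. (10)
    return K

def kernel_em(y, P, K, r, n_iters=50, eps=1e-12):
    """Kernelized EM, Eq. (8); returns x = K @ alpha_hat."""
    alpha = np.ones(K.shape[1])
    sens = K.T @ (P.T @ np.ones(P.shape[0]))      # K^T P^T 1
    for _ in range(n_iters):
        y_bar = P @ (K @ alpha) + r
        alpha = alpha / np.maximum(sens, eps) * (K.T @ (P.T @ (y / np.maximum(y_bar, eps))))
    return K @ alpha

# Toy example with random features and a hypothetical positive system matrix
rng = np.random.default_rng(0)
F = rng.normal(size=(8, 2))
K = build_normalized_kernel(F, k=3)
P = np.abs(rng.normal(size=(12, 8))) + 0.1
y = rng.poisson(P @ (np.ones(8) * 5.0)).astype(float)
x_hat = kernel_em(y, P, K, np.zeros(12))
```

Note that with K equal to the identity matrix, kernel_em reduces exactly to the standard ML-EM update.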

3 Proposed Deep Kernel Method

3.1 Kernel Representation as Neural Networks

We first describe the kernel representation using the neural-network formulation illustrated in Fig. 1. The construction of the kernel representation is decomposed into two main modules: (1) feature extraction and (2) pairwise attention.

Denote the image prior data by Z, which consists of n_z prior images. The feature extraction module extracts a feature vector f_j of length n_f for each image pixel j from Z,

F = H(Z),   (11)

where H denotes the feature extraction operator, for example, a convolutional neural network. The extraction of conventional intensity-based features is equivalent to a 1 × 1 convolution operation on Z (1 × 1 × 1 if the images are 3D). This step provides feature data F of size n_p × n_f, where n_f can be equal to n_z.

The pairwise attention module first calculates the similarity s_{j,k} between pixel j and its neighboring pixels k that are specified by a pre-determined neighborhood N_j,

s_{j,k} = - \frac{\| f_j - f_k \|^2}{2 \sigma^2}, \quad k \in N_j.   (12)

Note that only k neighbors are selected for each pixel (due to kNN). This leads to similarity data of size n_p × k. Pairwise weights w_{j,k} are then calculated from s_{j,k} using

w_{j,k} = \frac{\exp(s_{j,k})}{\sum_{l \in N_j} \exp(s_{j,l})},   (13)

generating weight data of size n_p × k. Here \sum_{k \in N_j} w_{j,k} = 1. This type of weight calculation is also called softmax in neural networks and can be directly interpreted as a pairwise attention mechanism [11, 12]: w_{j,k} is the attention weight of the "key" pixels k as compared to the "query" pixel j.

The final step reshapes the weights w_{j,k} using the neighborhood indices defined by {N_j} to generate a sparse matrix, which is equal to the normalized kernel matrix K defined in Eq. (9). Each row of the kernel matrix is of size 1 × n_p and can be displayed as an image, which can also be understood as an attention map for the corresponding pixel j.
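The feature/similarity/softmax pipeline above can be sketched as follows; the random feature array F is a hypothetical stand-in for the output of H(Z), and the loop shows that the softmax of Eq. (13) yields a row-stochastic attention (kernel) matrix.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())            # numerically stabilized softmax
    return e / e.sum()

rng = np.random.default_rng(0)
F = rng.normal(size=(6, 2))            # hypothetical features from H(Z): n_p=6, n_f=2
sigma, k = 1.0, 3

n_p = F.shape[0]
W = np.zeros((n_p, n_p))               # attention weights reshaped into a sparse matrix
for j in range(n_p):
    d2 = ((F - F[j]) ** 2).sum(axis=-1)
    nbrs = np.argsort(d2)[:k]          # neighborhood N_j from kNN
    s = -d2[nbrs] / (2 * sigma ** 2)   # similarity s_{j,k}, Eq. (12)
    W[j, nbrs] = softmax(s)            # pairwise attention w_{j,k}, Eq. (13)
```

Because exp(s_{j,k}) equals the Gaussian kernel κ(f_j, f_k), each row of W is exactly the normalized kernel row of Eq. (10), and the query pixel j (distance zero to itself) always receives the largest weight in its own row.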

3.2 Deep Kernel Model

Integrating all the neural-network components in Fig. 1, we have the following deep kernel model to represent a PET image x,

x = K(Z; \theta) \, \alpha,   (14)

where K(Z; \theta) denotes the equivalent neural-network model of the kernel matrix, with the image prior data Z as the input and \theta collecting any model parameters that are trainable.

The deep kernel model is nonlinear with respect to Z and \theta but remains linear with respect to the kernel coefficient image \alpha. While this model shares the spirit of using attention with the nonlocal neural network [12], the linearity in \alpha makes it unique and more suitable for tomographic image reconstruction problems. Once \theta is determined, \alpha can be easily reconstructed from the projection data y using the kernelized EM algorithm in Eq. (8).

In conventional kernel methods, K is determined empirically, which often leads to suboptimal performance. For example, intensity-based features are commonly used for f_j, yet convolutional-neural-network-derived features can be more informative [13]. In this paper, we exploit the capability of deep learning to train an optimized form of H for generating K from available image prior data, based on the proposed deep kernel model.

3.3 Deep Kernel Learning

The deep kernel learning problem is formulated using the observation that in the kernelized image model Eq. (6), K\alpha is usually a clean version of \alpha if \alpha is noisy. This inspires the following use of the denoising autoencoder framework [15] to train the model parameters \theta of K,

\hat{\theta} = \arg\min_{\theta} \sum_{m=1}^{M} \| x_m - K(Z; \theta) \, \tilde{x}_m \|^2,   (15)

where x_m denotes the mth high-quality image in the training dataset and \tilde{x}_m is a corrupted version of x_m. M is the total number of training image pairs. In PET, x_m and \tilde{x}_m can be obtained from high-count data and low-count data, respectively.
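A minimal sketch of the objective in Eq. (15), assuming a simple linear feature extractor F = Z @ theta in place of the CNN H (the paper actually uses a residual U-net). The function only evaluates the loss for a given theta; real training would minimize it by backpropagation through the kernel construction.

```python
import numpy as np

def normalized_kernel(F, k=4, sigma=1.0):
    """kNN-sparse, row-normalized Gaussian kernel matrix (Eq. 10)."""
    n = F.shape[0]
    K = np.zeros((n, n))
    for j in range(n):
        d2 = ((F - F[j]) ** 2).sum(axis=-1)
        nbrs = np.argsort(d2)[:k]
        w = np.exp(-d2[nbrs] / (2 * sigma ** 2))
        K[j, nbrs] = w / w.sum()
    return K

def dae_loss(theta, Z, pairs, k=4):
    """Denoising-autoencoder objective of Eq. (15):
    sum_m ||x_m - K(Z; theta) x_noisy_m||^2, with a hypothetical
    linear feature extractor F = Z @ theta standing in for H."""
    K = normalized_kernel(Z @ theta, k=k)
    return float(sum(np.sum((x - K @ x_t) ** 2) for x, x_t in pairs))

# Toy data: 20 pixels, 3 prior images, 2 learned features (all hypothetical)
rng = np.random.default_rng(1)
Z = rng.normal(size=(20, 3))
x_clean = Z @ np.ones(3)
x_noisy = x_clean + rng.normal(scale=0.5, size=20)
loss = dae_loss(rng.normal(size=(3, 2)), Z, [(x_clean, x_noisy)])
```
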

The deep kernel model can be pretrained using a large number of patient scans (large M) if such a training dataset is available. It can also be trained online for a single subject (small M) without pretraining, as described below.

3.4 Single-Subject Deep Kernel Method for Dynamic PET

In dynamic PET, the image prior data Z may consist of several composite images, where n_z is the number of composite frames (e.g., n_z = 3). These images are reconstructed from the rebinned long-scan projection data [1] and may have good image quality due to the relatively high count level of a composite frame. The composite image prior has been used in the standard kernel methods to construct the kernel matrix empirically. Here we use it to train an optimized kernel model adapted to a single subject.

The single-subject deep kernel learning problem is formulated as

\hat{\theta} = \arg\min_{\theta} \sum_{m=1}^{n_z} \| x_m - K(Z; \theta) \, \tilde{x}_m \|^2,   (16)

where the corrupted image \tilde{x}_m can be obtained from the reconstruction of low-count projection data that are downsampled from the composite-frame data using a count reduction factor (e.g., 10, as used in our simulation). Once \theta is trained, the learned kernel model is then used to reconstruct all the dynamic frames of the scan frame-by-frame using the kernel EM algorithm in Eq. (8).
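The low-count data can be generated from the original counts by binomial thinning, which preserves Poisson statistics at the reduced count level. The sketch below assumes a count-reduction factor of 10, matching the one-tenth downsampling used in the simulation study.

```python
import numpy as np

def thin_counts(y, factor=10, seed=0):
    """Keep each recorded event with probability 1/factor.
    If y_i ~ Poisson(m_i), the thinned data are Poisson(m_i / factor)."""
    rng = np.random.default_rng(seed)
    return rng.binomial(y.astype(np.int64), 1.0 / factor)

# Toy sinogram: 1000 bins with mean 100 counts each
y_full = np.random.default_rng(0).poisson(100.0, size=1000)
y_low = thin_counts(y_full, factor=10)
```
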

In theory, both the feature-extraction and pairwise-attention modules of the neural-network model (Fig. 1) are trainable. As a proof of concept, we train only the feature extraction operator H in this preliminary work, using a residual U-net architecture [8] for the feature extraction module.

Figure 2: Digital phantom and time activity curves used in the simulation studies. (a) Zubal brain phantom composed of gray matter, white matter, blood pools and a tumor (15 mm in diameter). (b) Regional time activity curves.
Figure 3: Attention maps of two different pixels, one in the tumor region (top row) and the other in the gray matter (bottom row) provided by (a) traditional MLEM, (b) kernel method without deep learning, and (c) kernel method with deep learning. All the attention maps are overlaid on the structural images.

4 Computer Simulation Validation

4.1 Simulation Setup

Dynamic 18F-fluorodeoxyglucose (FDG) PET scans were simulated for a GE 690 PET scanner using the Zubal head phantom shown in Fig. 2a. The phantom is composed of gray matter, white matter, blood pools and a tumor (15 mm in diameter). The scanning schedule consisted of 24 time frames over one hour: 4 × 20 s, 4 × 40 s, 4 × 60 s, 4 × 180 s, and 8 × 300 s. The time activity curve of each region is shown in Fig. 2b. An attenuation map was simulated with a constant linear attenuation coefficient assigned to the whole brain. Dynamic images were first forward projected to generate noise-free sinograms. A 20% uniform background was included to account for mean random and scattered events. Poisson noise was then introduced with the expected total number of events over 60 min set to 8 million.

4.2 Reconstruction Methods

The simulated dynamic data were reconstructed using four different methods: (1) standard ML-EM reconstruction; (2) the existing kernel method without learning [1]; (3) the proposed deep kernel method with online training of the feature extraction; and (4) the deep image prior (DIP) reconstruction method [8] as a recent representative of nonlinear neural-network-based reconstruction methods.

The image priors for the kernel methods and DIP were the composite images obtained from three composite frames, each with a 20-min scan duration, using the approach described in [1]. For the conventional kernel method, the settings were the same as those empirically optimized in [1]. The DIP method was implemented in a way similar to [8] but adapted to use the composite image prior data as the input of the U-net.

In the deep kernel method, the low-count images were obtained using one-tenth of the counts in each composite frame. Five hundred iterations were used for the training, with the learning rate set empirically. The k in kNN for defining the neighborhood N_j was set to 200 for nearly optimal performance.

Figure 4: True activity images and reconstructed images by different reconstruction methods for frame 2 (top row) and frame 24 (bottom row). (a) True images, (b) ML-EM, (c) DIP method, (d) conventional kernel method, and (e) proposed deep kernel method.
Figure 5: Plots of image SNR as a function of iteration number for the 2nd and 24th frames, respectively.
Figure 6: Plot of image SNR across all time frames.
Figure 7: Reconstruction of two high-temporal-resolution frames (2 s/frame), (a) frame 10 and (b) frame 45, by different methods: MLEM, conventional kernel method and proposed deep kernel method.

4.3 Evaluation Metrics

Different image reconstruction methods were compared using the image signal-to-noise ratio (SNR), defined by

SNR(\hat{x}_m) = 10 \log_{10} \frac{\| x_m^{true} \|^2}{\| \hat{x}_m - x_m^{true} \|^2} \ (dB),

where \hat{x}_m is an image estimate of frame m obtained with one of the reconstruction methods and x_m^{true} denotes the ground truth image.
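This metric is straightforward to compute; the toy image below is illustrative only.

```python
import numpy as np

def image_snr(x_hat, x_true):
    """Image SNR in dB: 10 log10(||x_true||^2 / ||x_hat - x_true||^2)."""
    return 10.0 * np.log10(np.sum(x_true ** 2) / np.sum((x_hat - x_true) ** 2))

x_true = np.array([1.0, 2.0, 3.0])
snr_val = image_snr(x_true + 0.1, x_true)   # a uniform error of 0.1 per pixel
```

A smaller reconstruction error yields a higher SNR, so the metric increases as an estimate approaches the ground truth.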

4.4 Demonstration of Attention Map for the Kernel Methods

To understand how the deep kernel method may improve image reconstruction, Fig. 3 shows the attention weight maps for two different pixels, one from the tumor and the other from the gray matter region. These attention maps were generated by reshaping the corresponding row of the kernel matrix K for a pixel j. The traditional MLEM reconstruction can be considered a special pixel-kernel method for which the kernel matrix is the identity matrix. As illustrated in Fig. 3(a), the attention of MLEM focuses only on the current pixel j; no spatial correlation is exploited by this pixel kernel.

The kernel methods are able to exploit spatial correlation from pixels that are considered neighbors of pixel j by kNN. The attention is now not only on the current pixel but also spreads to similar pixels in the image space, which in turn can help image reconstruction. However, neighboring pixels may be falsely identified if the k in kNN is large [1]. Without deep learning, the existing kernel model is unable to exclude the effect of those false neighbors, resulting in worse reconstruction as k increases [1]. As shown in Fig. 3(b), high attention weights also appear in the gray matter region for a tumor pixel, and vice versa.

In comparison, the deep kernel model with training can learn to assign more appropriate weights to irrelevant pixels even if those pixels are initially included in the nearest neighbors. Fig. 3(c) shows that with deep learning, attention concentrates predominantly in the tumor region for a tumor pixel and in the gray matter region for a gray matter pixel.

Figure 8: Time activity curves reconstructed by different algorithms for the pixels in different regions: (a) descending aorta and (b) myocardium.

4.5 Image Quality Comparisons

Fig. 4 shows the reconstructed images of an early frame (#2) and a late frame (#24) by the different reconstruction algorithms after 60 iterations, with the corresponding image SNRs. The proposed deep kernel method achieved better image quality with higher SNR than the other three methods, especially in the blood region in frame 2 and in the tumor region in frame 24. The DIP method generally led to oversmoothed images.

Fig. 5 shows the SNR of frame 2 and frame 24 as a function of iteration number for each reconstruction algorithm. Fig. 6 further shows the SNR of all time frames at iteration 60. The DIP method was only comparable to the conventional kernel method for dynamic PET image reconstruction. In contrast, the deep kernel method demonstrated a significant improvement over the other methods.

5 Application to Patient Data

5.1 Data Acquisition

A cardiac patient scan was performed on a GE Discovery ST PET/CT scanner in 2D mode at the UC Davis Medical Center. The patient received approximately 20 mCi of 18F-FDG via bolus injection, followed immediately by the data acquisition. A low-dose transmission CT scan was performed at the end of the PET scan to provide the CT image for PET attenuation correction. The raw data of the first 90 seconds were binned into 45 dynamic frames of 2-second duration each. The data correction sinograms of each frame, including normalization, attenuation correction, scatter correction and randoms correction, were extracted using the vendor software and used in the reconstruction process.

The dynamic data were reconstructed using the different methods. Five composite frames were used to build the kernel matrix K.

5.2 Results

Fig. 7 shows the images reconstructed by the different algorithms for two high-temporal-resolution (HTR) frames, frame 10 and frame 45. The ML-EM reconstructions were extremely noisy, and the conventional kernel method substantially reduced the noise. The DIP method led to oversmoothed images (results not shown). In comparison, the images from the proposed deep kernel method demonstrated an improvement, with clearer structures and lower noise in the left ventricle, right ventricle and myocardium, though no ground truth is available for the real dataset.

Fig. 8 further shows the HTR time activity curves for two pixels, one in the descending aorta (AOR) and one in the myocardium (MYO). MLEM suffered from heavy noise. While the conventional kernel method achieved a significant noise reduction at high-activity time points, the low-activity time points still suffered from noise. In comparison, the deep kernel method demonstrated effective noise reduction for both early and late time points.

6 Conclusion

In this paper, we have developed a new deep kernel method for PET image reconstruction. The proposed deep kernel model allows the kernel representation to be learned from data optimally rather than defined by an empirical process. Computer simulation and patient results have demonstrated the improvement of the deep kernel method over existing methods for dynamic PET imaging.


  • [1] G. Wang and J. Qi, “PET image reconstruction using kernel method,” IEEE Trans. Med. Imag., vol. 34, no. 1, pp. 61-71, Jan., 2015.
  • [2] W. Hutchcroft, G. B. Wang, K. Chen, C. Catana, and J. Qi, “Anatomically-aided PET reconstruction using the kernel method,” Phys. Med. Biol., vol. 61, no. 18, pp. 6668-6683, 2016.
  • [3] P. Novosad and A. J. Reader, “MR-guided dynamic PET reconstruction with the kernel method and spectral temporal basis functions,” Phys. Med. Biol., vol. 61, no. 12, pp. 4624-4645, 2016.
  • [4] G. Wang, “High temporal-resolution dynamic PET image reconstruction using a new spatiotemporal kernel method,” IEEE Trans. on Med. Imag., vol. 38, no. 3, pp. 664-674, Mar., 2019.
  • [5] J. Bland et al.,“MR-guided kernel EM reconstruction for reduced dose PET imaging,”IEEE Trans. Radiat. Plasma Med. Sci., vol. 2, no. 3, pp. 235-243, May 2018.
  • [6] I. Häggström, C. R. Schmidtlein, G. Campanella, and T. J. Fuchs, “DeepPET: A deep encoder-decoder network for directly solving the PET image reconstruction inverse problem,” Med. Imag. Anal., vol. 54, pp. 253-262, May 2019.
  • [7] A. J. Reader, G. Corda, A. Mehranian, et al., “Deep learning for PET image reconstruction,” IEEE Transactions on Radiation and Plasma Medical Sciences, vol. 5, no. 1, pp. 1-24, Jan. 2021.
  • [8] K. Gong, C. Catana, J. Y. Qi, and Q. Z. Li, “PET image reconstruction using deep image prior,” IEEE Trans. Med. Imag., vol. 38, no. 7, pp. 1655-1665, Jul., 2019.
  • [9] J. Qi and R. M. Leahy, “Iterative reconstruction techniques in emission computed tomography,” Phys. Med. Biol., vol. 51, no. 15, pp. R541-R578, 2006.
  • [10] L. A. Shepp and Y. Vardi, “Maximum likelihood reconstruction for emission tomography,” IEEE Trans. Med. Imag., vol. MI-1, no. 2, pp. 113-122, Oct., 1982.
  • [11] A. Vaswani, N. Shazeer, N. Parmar, et al., “Attention is all you need,” Advances in Neural Information Processing Systems , 2017, pp. 5998–6008.
  • [12] X. Wang, R. Girshick, A. Gupta, et al., “Non-local neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7794-7803.
  • [13] S. Li and G. Wang, “Modified kernel MLAA using autoencoder for PET-enabled dual-energy CT,” Philosophical Transactions of the Royal Society A, vol. 379, no. 2204, 2021.
  • [14] J. H. Friedman, J. L. Bentley, and R. A. Finkel, “An algorithm for finding best matches in logarithmic expected time,” ACM Trans. Math. Software, vol. 3, no. 3, pp. 209-226, 1977.
  • [15] M. A. Kramer, “Nonlinear principal component analysis using autoassociative neural networks,” AIChE Journal, vol. 37, no. 2, pp. 233-243, 1991.
  • [16] R. L. Harrison et al., “Preliminary experience with the photon history generator module of a public-domain simulation system for emission tomography,” in Conf. Rec. IEEE Nucl. Sci. Symp. Med. Imag. Conf., pp. 1154-1158, 1993.