1 Introduction
Positron emission tomography (PET) is a molecular imaging technology where a radioactive tracer is administered to a patient. The tracer is an xray source that emits pairs of photons travelling into opposite directions, and the PET scanner is an arrangement of detectors for detecting such photon pairs (coincidence events). The goal is then to recover the spatial distribution of the tracer (activity map) from these coincidence events.
Acquiring a sufficient amount of coincidence events takes time, typically twenty to forty minutes depending on the detector efficiency and the size of the region being imaged. Organs, such as the heart and lungs, move during the PET data acquisition, so the activity map one seeks to recover in PET imaging is a spatiotemporal quantity. Failure to account for the temporal variability during reconstruction results in a deteriorated PET image.
1.1 Survey of existing works
Most approaches that consider motion in PET image reconstruction assume access to gated PET data. Here, PET data is subdivided into subsets where the coincidence data is from the activity map in a specific temporal state. For cardiac and respiratory motion, gated data would correspond to decomposing the entire dataset into parts that represent different breathing and/or cardiac phases. Hence, the activity associated to each gate can be assumed to be stationary, but data in the gates also suffer from a relatively low signaltonoise ratio since they only contain a small portion of the coincidence events.
A straightforward approach based on gated data is to recover each temporal state of the activity independently of each other (framebyframe reconstruction). This does not account for the temporal dynamics of the activity, instead it reduces the spatiotemporal reconstruction problem into a sequence of independent stationary reconstruction problems, which in PET is done by MLEM [21] (or a variant thereof, like OSEM [14]
). Spatiotemporal reconstruction refers to methods that instead take the temporal dynamics into account. Several approaches have been proposed where most rely on estimating a motion model prior to reconstruction, see
[8, 18, 19, 10] for survey.In this paper, the proposed method falls into the family of algorithms that, contrarily to those based on a priori built motion models, jointly estimate the image and motion, directly from the full set of measured data. An objective function is optimised with respect to two arguments: image and motion. Hence, only one image with the full statistic is reconstructed. Given the close relationship between the image reconstruction and motion estimation steps, a simultaneous method of estimating the two is better able to reduce motion blur and compensate for poor signaltonoise ratios and to improve the accuracy of the estimated motion [11, 12].
In the latter works, one performs a twostep minimisation of a joint energy functional term (which includes both image likelihood and motionmatching terms). The method chosen by Jacobson and Fessler[15, 16], referred to as joint estimation with deformation modelling, is based on maximising the likelihood for a parametric Poisson model for gated PET measurements. Motion (from gate to gate) is defined by a set of deformation parameters. A similar motionaware likelihood function was used by Blume and colleagues [5], although using a distinct optimisation scheme and depicting more convincing results. In this context one may also consult [23], which compares three approaches for joint reconstruction of image and motion.
An alternative is to consider motion models derived from deformations modelled by diffeomorphisms, as obtained from example through the LDDMM framework [22]. Here, one can calculate regularising functionals that incorporate such deformations. Finally, [9] provides an overview of variational shape models as applied to the registration and segmentation problems. These could also be coupled with variational regularisation methods for image reconstruction.
The main drawback of all these methods, however, is the relatively high computational costs involved in the joint reconstruction approach.
1.2 Proposed method
In this paper, we develop a joint reconstruction method based on the minimisation of a suitable functional. The main novelty of our work is the scalability of the resulting algorithm, as its complexity is of the order of the usual MLEM algorithm. Images are indeed estimated using a generalised MLEM algorithm. Motion estimation, with deformations modelled by diffeomorphisms, is based on the unsupervised deep learning framework voxelmorph [6]
. That is, we make use of a pretrained neural network which performs direct image registration, i.e., the network finds a diffeomorphism which, given two images, deforms the first one into the second.
Interestingly, one single outer iteration of our algorithm is close to the recently proposed approach [17]. Thus, it generalises the previous work and shows that it can be interpreted in the framework of an optimisation problem.
The results of the proposed method are tested on the Derenzo phantom, and shown to recover a significant part of the information lost when one uses gatebygate reconstruction.
2 Methods
2.1 Mathematical background
2.1.1 MLEM algorithm [21].
Let us consider the statistical model
where is the unknown image, and
is the acquired data—a vector of
; this models the physics of stationary PET with forward operator .The MLEM algorithm solves the corresponding maximum likelihood problem, which amounts to minimising the divergence , defined for two nonnegative vectors , in by
The MLEM algorithm is an iterative solver with update
(1) 
starting from an initial guess , usually .
2.1.2 Diffeomorphisms acting on images.
Viewing images as elements of , i.e., squareintegrable functions on a compact with or , we model motion as an appropriate group action of diffeomorphisms onto images. In this paper, given a diffeomorphism , we will use the specific definition as the intensitypreserving action
Note that our approach is, however, general, and we could have used the masspreserving action instead, namely
(2) 
We will parameterise diffeomorphisms by exponentials of (stationary) vector fields, i.e., , where the exponential of a vector field is defined as , where solves the differential equation , with initial condition .
2.1.3 Image registration.
The (direct) image registration problem consists in deforming a template into a target , i.e., finding a diffeomorphism such that . This is usually done by minimising a functional of the form
(3) 
where is the distance on , is a regularisation term on diffeomorphisms that is discussed in subsection 2.3, and is a regularisation parameter.
2.2 General approach
2.2.1 Modelling.
We are given gated data in different gates, corrupted by Poisson noise. For denoting the data, the images in each gate and the forward operator, we thus assume
We also assume that for , two consecutive images and are related by the statistical model
where
is the exponential of a vector field following a given probability law (see (
8)) and is avalued random variable.
2.2.2 Variational problem.
We now define the variational problem associated to the inverse problem of finding both the images and diffeomorphisms from the data . It reads
(4) 
where
2.2.3 General algorithm.
We solve the variational problem (4) by an intertwined method, which consists in alternating between estimating the diffeomorphisms (the motion estimation step), and the images (the reconstruction step).
The images are first initialised by solving the maximum likelihood problem , associated to in each gate. This is done by the algorithm MLEM (1), yielding estimates , .
For a given estimate of images , the motion estimation part consists in solving
which in turn can be decomposed into problems of the form
(5) 
Note that each of these becomes an image registration problem, as we are looking for a diffeomorphism matching the template against the target .
For the reconstruction part, we assume for and neglect the corresponding terms. The minimisation problem thus becomes
We then focus on a particular gate, say the zero’th gate, and use to obtain the optimisation problem with as the only variable:
(6) 
where
(7) 
and we have used the notation for . Solving the above yields a next estimate for . All the images are then obtained by , .
2.3 Motion estimation
The motion estimation problem (5), can be rewritten for two generic images and as
(8) 
where we parameterise the diffeomorphisms by exponentials of stationary vector fields .
To solve this direct image registration problem, we use the voxelmorph unsupervised deep learning approach [6], where a neural network parameterises a function . That neural network is itself based on the network architecture Unet [20]. We keep the architecture of voxelmorph
, with the same hyperparameters and specific regularisation functional
given in [6]. Once trained, the network produces a mapping matching any two images , which we denote(9) 
Training. In [6], the network is trained on tuples of images coming from brain MRI scans. We use instead synthetic data: tuples of images generated on the fly.
We generate training images as follows. A random image consists of a Poisson random number of ellipsoids [3, 4]
. The centre of each ellipsoid has uniform distribution inside the central part of the domain
, the principal axes have exponential distribution, and the orientation follows a uniform distribution. We apply a mask vanishing at the boundary to avoid boundary effects. when diffeomorphisms are applied. We generate random vector fields
using a Gaussian random field with radial basis function kernel, with appropriate scale and typical size. The training image
is then . We show in Figure 1 a sample of images , and vector field generated as above.2.4 Reconstruction
We now focus on the reconstruction problem (6) which we solve using a reformulation of MLEM. Given operators , we can simply write MLEM for the compound operator which yields
(10) 
for an initial guess . We call this algorithm “MMLEM” to avoid the confusion with the vanilla MLEM algorithm (1). We use this algorithm with defined in (7). Note that this algorithm has been used in [13] for the particular case of the intensitypreserving action. The computation of requires the computation of . We achieve this by using the identity valid for any diffeomorphism , where denotes the masspreserving action (2).
2.5 Full algorithm
We summarise the algorithm with all the necessary details in Algorithm 1.
2.6 Complexity
Evaluating vector fields with the network is negligible when compared to MLEM or MMLEM iterations. Each iteration is itself controlled by the time required to compute an expression of the form . Since MMLEM sums these quantities times, an iteration of it is of the order of . Note that evaluating the denominator in (10) (which involves sums of ) does not take more time than evaluating the denominator in MLEM since can be computed offline.
3 Results
3.1 Derenzo phantom
We present experiments with the Derenzo phantom, with image size . Although this phantom is made of ellipses, we stress that they are very different from the data used to train the network, compare Figure 1 and (a).
This phantom is then deformed successively with the intensity preserving action by exponentials of vector fields, where each vector field is drawn from the same distribution used to train the network. For the experiments, we use , which amounts to four gates, and we want to recover the image in the initial gate. The resulting four phantoms are presented in Figure 2.
The forward operator is a 2D PET operator with angles (views) and tangential positions. The noisy data is for each image , where is the acquisition time and thus controls the noise level.
For the phantoms in Figure 2, we choose . This noise level gives rise to typical optimal numbers of iterates for MLEM which are of the same order of magnitude as the ones in clinic applications. Note that all images are multiplied by the same time factor, which amounts to assuming that acquisition time is roughly the same in each gate.
3.2 Methods without motion correction
We compare our method with two simple reconstruction methods (simple because without motion correction) for images with gated data:

Either one aggregates the whole data and reconstructs from MLEM, leading to blurry results because of the movement.

Or one tries and limit blur by focusing on one gate (say the first) and reconstructing only from that. Since there is less data, the result is noisier.
In order to quantitatively compare these strategies, we use MLEM for the data obtained from taking gate zero only, aggregating gates zero and one, and so on up until aggregating all the four gates. Finally, we can also estimate the best reconstruction one could hope for, that is, if there were no movement. This amounts to acquiring the phantom in the th gate four times longer.
The results are given in Figure 3, where the PSNR between the estimated image and the real image in gate zero is computed at each iteration.
The results show that aggregating the gates progressively induces a drop in image quality, as measured by the PSNR. Compared to gate zero acquired four times longer, the best possible achievable gain is about .
3.3 Proposed method
We apply Algorithm 1 to the data above. It turns out that a single outer iteration is responsible for most of the improvement, so we focus on that case for presenting experiments. In other words:

we initialise by running some MLEM iterations in each gate,

we then match the resulting images to estimate the diffeomorphisms,

we finally run some MMLEM iterations.
We plot the PSNR between the image reconstructed (in the initial gate zero) and the real image, for a given number em_iter of MLEM iteration followed by a given number diff_iter of MMLEM iterations. These results are presented in Figure 4.
We find that the optimal strategy is to iterate only a few times (six iterations in this specific experiment) with MLEM before estimating the diffeomorphisms through MMLEM ( iterations in this specific experiment). Note that this yields a total of iterations which is higher than the MLEM iterations which would be optimal for reconstructing from the gate zero.
The gain in PSNR is , which makes up for about of the maximal gain of . Reconstructions obtained from the optimal uncorrected ( iterations of MLEM are used on gate zero) and the proposed method with the optimal number of iterations of MLEM and MMLEM are presented in Figure 5. The proposed method seems to give smoother results. The smaller discs towards the middle of the image are also better seen.
We also emphasise that these results (improvement in PSNR and optimal number of iterations) are extremely robust with respect to the randomness involved in the experiments, namely the vector fields drawn randomly as well as the Poisson noise.
3.4 Implementation Details
4 Perspectives
This paper presents a new method for joint motion estimation and image reconstruction in PET. Its main advantage is its cost, similar to that of the usual MLEM algorithm, making it scalable to clinical 4D data.
Our framework also allows for further modelling such as attenuation correction. In a future work, we consider testing this method with clinical data. This would require training the network on appropriate datasets. We also plan to generalise the approach to other group actions, such as the masspreserving one, which is more physically relevant.
Acknowledgements
We acknowledge support from the Swedish Foundation of Strategic Research grant AM13004.
References

[1]
Abadi, M., et al.: TensorFlow: Largescale machine learning on heterogeneous systems (2015),
http://tensorflow.org/, software available from tensorflow.org  [2] Adler, J., Kohr, H., Öktem, O.: ODLa Python framework for rapid prototyping in inverse problems. Royal Institute of Technology (2017)
 [3] Adler, J., Öktem, O.: Solving illposed inverse problems using iterative deep neural networks. Inverse Problems 33(12), 124007 (2017)
 [4] Adler, J., Öktem, O.: Learned primaldual reconstruction. IEEE transactions on medical imaging 37(6), 1322–1332 (2018)
 [5] Blume, M., MartinezMoller, A., Keil, A., Navab, N., Rafecas, M.: Joint reconstruction of image and motion in gated positron emission tomography. IEEE Transactions on Medical Imaging 29(11), 1892—1906 (2010)

[6]
Dalca, A.V., Balakrishnan, G., Guttag, J., Sabuncu, M.R.: Unsupervised learning for fast probabilistic diffeomorphic registration. In: International Conference on Medical Image Computing and ComputerAssisted Intervention. pp. 729–738. Springer (2018)

[7]
Dalca, A.V., Guttag, J., Sabuncu, M.R.: Anatomical priors in convolutional networks for unsupervised biomedical segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 9290–9299 (2018)
 [8] Dawood, M., Jiang, X., Schäfers, K.P. (eds.): Correction Techniques in Emission Tomography. Series in Medical Physics and Biomedical Engineering, CRC Press (2008)
 [9] Farag, A.A., Shalaby, A., El Munim, H.A., Farag, A.: Variational shape representation for modeling, elastic registration and segmentation. In: Li, S., Tavares, J.M.T.S. (eds.) Shape Analysis in Medical Image Analysis, Lecture Notes in Computational Vision and Biomechanics, vol. 14, chap. 95–122. SpringerVerlag (2014)
 [10] Gigengack, F., Jiang, X., Dawood, M., Schäfers, K.P.: Motion Correction in Thoracic Positron Emission Tomography. SpringerVerlag (2015)
 [11] Gilland, D.R., Mair, B.A., Bowsher, J.E., Jaszczak, R.J.: Simultaneous reconstruction and motion estimation for gated cardiac ECT. IEEE Transactions in Nuclear Science 49(5), 2344—2349 (2002)
 [12] Gravier, E., Yang, Y., King, M.A., Jin, M.: Fully 4D motioncompensated reconstruction of cardiac SPECT images. Physics in Medicine and Biology 51(18), 4603—4619. (2006)
 [13] Hinkle, J., Szegedi, M., Wang, B., Salter, B., Joshi, S.: 4d ct image reconstruction with diffeomorphic motion model. Medical image analysis 16(6), 1307–1316 (2012)
 [14] Hudson, H.M., Larkin, R.S.: Accelerated image reconstruction using ordered subsets of projection data. IEEE transactions on medical imaging 13(4), 601–609 (1994)
 [15] Jacobson, M.W., Fessler, J.A.: Joint estimation of image and deformation parameters in motioncorrected PET. In: 2003 IEEE Nuclear Science Symposium Conference Record. pp. 3290—3294 (2003)
 [16] Jacobson, M.W., Fessler, J.A.: Joint estimation of respiratory motion and activity in 4D PET using CT side information. In: 3rd IEEE International Symposium on Biomedical Imaging: Nano to Macro. Arlington, VA, April 6–9, 2006. pp. 275—278 (2006)
 [17] Li, T., Zhang, M., Qi, W., Asma, E., Qi, J.: Motion correction of respiratorygated pet image using deep learning based image registration framework. In: 15th International Meeting on Fully ThreeDimensional Image Reconstruction in Radiology and Nuclear Medicine. vol. 11072, p. 110720Q. International Society for Optics and Photonics (2019)
 [18] Rahmim, A., Tang, J., Zaidi, H.: Fourdimensional image reconstruction strategies in cardiacgated and respiratory gated PET imaging. PET Clinics 8(1), 51–67 (2013)
 [19] Reader, A.J., Verhaeghe, J.: 4D image reconstruction for emission tomography. Physics in Medicine and Biology 59(22), R371–R418 (2014)
 [20] Ronneberger, O., Fischer, P., Brox, T.: Unet: Convolutional networks for biomedical image segmentation. In: International Conference on Medical image computing and computerassisted intervention. pp. 234–241. Springer (2015)
 [21] Shepp, L.A., Vardi, Y.: Maximum likelihood reconstruction for emission tomography. IEEE transactions on medical imaging 1(2), 113–122 (1982)
 [22] Younes, L.: Shapes and Diffeomorphisms, Applied Mathematical Sciences, vol. 171. SpringerVerlag (2010)
 [23] Zhang, Y., Ghodrati, A., Brooks, D.H.: An analytical comparison of three spatiotemporal regularization methods for dynamic linear inverse problems in a common statistical framework. Inverse Problems 21(1), 357–382 (2005)
Comments
There are no comments yet.