Spatiotemporal PET reconstruction using ML-EM with learned diffeomorphic deformation

08/26/2019 ∙ by Ozan Öktem, et al. ∙ 4

Patient movement in emission tomography deteriorates reconstruction quality because of motion blur. Gating the data improves the situation somewhat: each gate contains a movement phase which is approximately stationary. A standard method is to use only the data from a few gates, with little movement between them. However, the corresponding loss of data entails an increase of noise. Motion correction algorithms have been implemented to take into account all the gated data, but they do not scale well, especially not in 3D. We propose a novel motion correction algorithm which addresses the scalability issue. Our approach is to combine an enhanced ML-EM algorithm with deep learning based movement registration. The training is unsupervised, and with artificial data. We expect this approach to scale very well to higher resolutions and to 3D, as the overall cost of our algorithm is only marginally greater than that of a standard ML-EM algorithm. We show that we can significantly decrease the noise corresponding to a limited number of gates.



There are no comments yet.


page 6

page 8

page 10

page 11

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Positron emission tomography (PET) is a molecular imaging technology where a radioactive tracer is administered to a patient. The tracer is an x-ray source that emits pairs of photons travelling into opposite directions, and the PET scanner is an arrangement of detectors for detecting such photon pairs (coincidence events). The goal is then to recover the spatial distribution of the tracer (activity map) from these coincidence events.

Acquiring a sufficient amount of coincidence events takes time, typically twenty to forty minutes depending on the detector efficiency and the size of the region being imaged. Organs, such as the heart and lungs, move during the PET data acquisition, so the activity map one seeks to recover in PET imaging is a spatiotemporal quantity. Failure to account for the temporal variability during reconstruction results in a deteriorated PET image.

1.1 Survey of existing works

Most approaches that consider motion in PET image reconstruction assume access to gated PET data. Here, PET data is subdivided into subsets where the coincidence data is from the activity map in a specific temporal state. For cardiac and respiratory motion, gated data would correspond to decomposing the entire dataset into parts that represent different breathing and/or cardiac phases. Hence, the activity associated to each gate can be assumed to be stationary, but data in the gates also suffer from a relatively low signal-to-noise ratio since they only contain a small portion of the coincidence events.

A straightforward approach based on gated data is to recover each temporal state of the activity independently of each other (frame-by-frame reconstruction). This does not account for the temporal dynamics of the activity, instead it reduces the spatiotemporal reconstruction problem into a sequence of independent stationary reconstruction problems, which in PET is done by ML-EM [21] (or a variant thereof, like OSEM [14]

). Spatiotemporal reconstruction refers to methods that instead take the temporal dynamics into account. Several approaches have been proposed where most rely on estimating a motion model prior to reconstruction, see

[8, 18, 19, 10] for survey.

In this paper, the proposed method falls into the family of algorithms that, contrarily to those based on a priori built motion models, jointly estimate the image and motion, directly from the full set of measured data. An objective function is optimised with respect to two arguments: image and motion. Hence, only one image with the full statistic is reconstructed. Given the close relationship between the image reconstruction and motion estimation steps, a simultaneous method of estimating the two is better able to reduce motion blur and compensate for poor signal-to-noise ratios and to improve the accuracy of the estimated motion [11, 12].

In the latter works, one performs a two-step minimisation of a joint energy functional term (which includes both image likelihood and motion-matching terms). The method chosen by Jacobson and Fessler[15, 16], referred to as joint estimation with deformation modelling, is based on maximising the likelihood for a parametric Poisson model for gated PET measurements. Motion (from gate to gate) is defined by a set of deformation parameters. A similar motion-aware likelihood function was used by Blume and colleagues [5], although using a distinct optimisation scheme and depicting more convincing results. In this context one may also consult [23], which compares three approaches for joint reconstruction of image and motion.

An alternative is to consider motion models derived from deformations modelled by diffeomorphisms, as obtained from example through the LDDMM framework [22]. Here, one can calculate regularising functionals that incorporate such deformations. Finally, [9] provides an overview of variational shape models as applied to the registration and segmentation problems. These could also be coupled with variational regularisation methods for image reconstruction.

The main drawback of all these methods, however, is the relatively high computational costs involved in the joint reconstruction approach.

1.2 Proposed method

In this paper, we develop a joint reconstruction method based on the minimisation of a suitable functional. The main novelty of our work is the scalability of the resulting algorithm, as its complexity is of the order of the usual ML-EM algorithm. Images are indeed estimated using a generalised ML-EM algorithm. Motion estimation, with deformations modelled by diffeomorphisms, is based on the unsupervised deep learning framework voxelmorph [6]

. That is, we make use of a pre-trained neural network which performs direct image registration, i.e., the network finds a diffeomorphism which, given two images, deforms the first one into the second.

Interestingly, one single outer iteration of our algorithm is close to the recently proposed approach [17]. Thus, it generalises the previous work and shows that it can be interpreted in the framework of an optimisation problem.

The results of the proposed method are tested on the Derenzo phantom, and shown to recover a significant part of the information lost when one uses gate-by-gate reconstruction.

2 Methods

2.1 Mathematical background

2.1.1 ML-EM algorithm [21].

Let us consider the statistical model

where is the unknown image, and

is the acquired data—a vector of

; this models the physics of stationary PET with forward operator .

The ML-EM algorithm solves the corresponding maximum likelihood problem, which amounts to minimising the divergence , defined for two non-negative vectors , in by

The ML-EM algorithm is an iterative solver with update


starting from an initial guess , usually .

2.1.2 Diffeomorphisms acting on images.

Viewing images as elements of , i.e., square-integrable functions on a compact with or , we model motion as an appropriate group action of diffeomorphisms onto images. In this paper, given a diffeomorphism , we will use the specific definition as the intensity-preserving action

Note that our approach is, however, general, and we could have used the mass-preserving action instead, namely


We will parameterise diffeomorphisms by exponentials of (stationary) vector fields, i.e., , where the exponential of a vector field is defined as , where solves the differential equation , with initial condition .

2.1.3 Image registration.

The (direct) image registration problem consists in deforming a template into a target , i.e., finding a diffeomorphism such that . This is usually done by minimising a functional of the form


where is the -distance on , is a regularisation term on diffeomorphisms that is discussed in subsection 2.3, and is a regularisation parameter.

2.2 General approach

2.2.1 Modelling.

We are given gated data in different gates, corrupted by Poisson noise. For denoting the data, the images in each gate and the forward operator, we thus assume

We also assume that for , two consecutive images and are related by the statistical model


is the exponential of a vector field following a given probability law (see (

8)) and is a

-valued random variable.

2.2.2 Variational problem.

We now define the variational problem associated to the inverse problem of finding both the images and diffeomorphisms from the data . It reads



2.2.3 General algorithm.

We solve the variational problem (4) by an intertwined method, which consists in alternating between estimating the diffeomorphisms (the motion estimation step), and the images (the reconstruction step).

The images are first initialised by solving the maximum likelihood problem , associated to in each gate. This is done by the algorithm ML-EM (1), yielding estimates , .

For a given estimate of images , the motion estimation part consists in solving

which in turn can be decomposed into problems of the form


Note that each of these becomes an image registration problem, as we are looking for a diffeomorphism matching the template against the target .

For the reconstruction part, we assume for and neglect the corresponding terms. The minimisation problem thus becomes

We then focus on a particular gate, say the zero’th gate, and use to obtain the optimisation problem with as the only variable:




and we have used the notation for . Solving the above yields a next estimate for . All the images are then obtained by , .

It now only remains to explain how the optimisation problems (5) and (6) are solved, which is the topic of the next subsections.

2.3 Motion estimation

The motion estimation problem (5), can be rewritten for two generic images and as


where we parameterise the diffeomorphisms by exponentials of stationary vector fields .

To solve this direct image registration problem, we use the voxelmorph unsupervised deep learning approach [6], where a neural network parameterises a function . That neural network is itself based on the network architecture Unet [20]. We keep the architecture of voxelmorph

, with the same hyperparameters and specific regularisation functional

given in [6]. Once trained, the network produces a mapping matching any two images , which we denote


Training. In [6], the network is trained on tuples of images coming from brain MRI scans. We use instead synthetic data: tuples of images generated on the fly.

We generate training images as follows. A random image consists of a Poisson random number of ellipsoids [3, 4]

. The centre of each ellipsoid has uniform distribution inside the central part of the domain

, the principal axes have exponential distribution, and the orientation follows a uniform distribution. We apply a mask vanishing at the boundary to avoid boundary effects. when diffeomorphisms are applied. We generate random vector fields

using a Gaussian random field with radial basis function kernel, with appropriate scale and typical size. The training image

is then . We show in Figure 1 a sample of images , and vector field generated as above.

Figure 1: Example of a 2D synthetic tuple of images (a) and (b), related by for the intensity-preserving action, with plotted in (c).

2.4 Reconstruction

We now focus on the reconstruction problem (6) which we solve using a reformulation of ML-EM. Given operators , we can simply write ML-EM for the compound operator which yields


for an initial guess . We call this algorithm “M-ML-EM” to avoid the confusion with the vanilla ML-EM algorithm (1). We use this algorithm with defined in (7). Note that this algorithm has been used in [13] for the particular case of the intensity-preserving action. The computation of requires the computation of . We achieve this by using the identity valid for any diffeomorphism , where denotes the mass-preserving action (2).

2.5 Full algorithm

We summarise the algorithm with all the necessary details in Algorithm 1.

Choose the outer number of iterates , the inner number of iterates for M-ML-EM, and , the number of iterates for vanilla ML-EM.

for  do
      Iterates of (1)
end for
for  do
     for  do
          Network registration (9)
     end for
     for  do
     end for
      Iterates of (10)
     for  do
     end for
end for

The outcome is .

Algorithm 1 Full Algorithm

2.6 Complexity

Evaluating vector fields with the network is negligible when compared to ML-EM or M-ML-EM iterations. Each iteration is itself controlled by the time required to compute an expression of the form . Since M-ML-EM sums these quantities times, an iteration of it is of the order of . Note that evaluating the denominator in (10) (which involves sums of ) does not take more time than evaluating the denominator in ML-EM since can be computed off-line.

3 Results

3.1 Derenzo phantom

We present experiments with the Derenzo phantom, with image size . Although this phantom is made of ellipses, we stress that they are very different from the data used to train the network, compare Figure 1 and (a).

This phantom is then deformed successively with the intensity preserving action by exponentials of vector fields, where each vector field is drawn from the same distribution used to train the network. For the experiments, we use , which amounts to four gates, and we want to recover the image in the initial gate. The resulting four phantoms are presented in Figure 2.

Figure 2: Derenzo phantom in four different gates.

The forward operator is a 2D PET operator with angles (views) and tangential positions. The noisy data is for each image , where is the acquisition time and thus controls the noise level.

For the phantoms in Figure 2, we choose . This noise level gives rise to typical optimal numbers of iterates for ML-EM which are of the same order of magnitude as the ones in clinic applications. Note that all images are multiplied by the same time factor, which amounts to assuming that acquisition time is roughly the same in each gate.

3.2 Methods without motion correction

We compare our method with two simple reconstruction methods (simple because without motion correction) for images with gated data:

  • Either one aggregates the whole data and reconstructs from ML-EM, leading to blurry results because of the movement.

  • Or one tries and limit blur by focusing on one gate (say the first) and reconstructing only from that. Since there is less data, the result is noisier.

In order to quantitatively compare these strategies, we use ML-EM for the data obtained from taking gate zero only, aggregating gates zero and one, and so on up until aggregating all the four gates. Finally, we can also estimate the best reconstruction one could hope for, that is, if there were no movement. This amounts to acquiring the phantom in the th gate four times longer.

The results are given in Figure 3, where the PSNR between the estimated image and the real image in gate zero is computed at each iteration.

The results show that aggregating the gates progressively induces a drop in image quality, as measured by the PSNR. Compared to gate zero acquired four times longer, the best possible achievable gain is about .

Figure 3: PSNR for different ML-EM strategies without motion correction, and comparison with ”no-movement” data, reconstructing from the initial gate acquired four times longer.

3.3 Proposed method

We apply Algorithm 1 to the data above. It turns out that a single outer iteration is responsible for most of the improvement, so we focus on that case for presenting experiments. In other words:

  1. we initialise by running some ML-EM iterations in each gate,

  2. we then match the resulting images to estimate the diffeomorphisms,

  3. we finally run some M-ML-EM iterations.

We plot the PSNR between the image reconstructed (in the initial gate zero) and the real image, for a given number em_iter of ML-EM iteration followed by a given number diff_iter of M-ML-EM iterations. These results are presented in Figure 4.

Figure 4: PSNR for various choices of number em_iter of initial ML-EM iterates and number diff_iter of M-ML-EM iterates.

We find that the optimal strategy is to iterate only a few times (six iterations in this specific experiment) with ML-EM before estimating the diffeomorphisms through M-ML-EM ( iterations in this specific experiment). Note that this yields a total of iterations which is higher than the ML-EM iterations which would be optimal for reconstructing from the gate zero.

The gain in PSNR is , which makes up for about of the maximal gain of . Reconstructions obtained from the optimal uncorrected ( iterations of ML-EM are used on gate zero) and the proposed method with the optimal number of iterations of ML-EM and M-ML-EM are presented in Figure 5. The proposed method seems to give smoother results. The smaller discs towards the middle of the image are also better seen.

(a) Optimal ML-EM reconstruction: 29 iterations (using one gate)

(b) Optimal reconstruction with M-ML-EM: 6 ML-EM + 42 M-ML-EM iterations
Figure 5: Optimal reconstructions of the gate zero (measured in PSNR).

We also emphasise that these results (improvement in PSNR and optimal number of iterations) are extremely robust with respect to the randomness involved in the experiments, namely the vector fields drawn randomly as well as the Poisson noise.

3.4 Implementation Details

All computations are run in Python and use Operator Discretization Library (odl) for manipulating operators [2], neuron for warping utilities [7], which itself uses tensorflow [1]. The training was performed with voxelmorph [6].

4 Perspectives

This paper presents a new method for joint motion estimation and image reconstruction in PET. Its main advantage is its cost, similar to that of the usual ML-EM algorithm, making it scalable to clinical 4D data.

Our framework also allows for further modelling such as attenuation correction. In a future work, we consider testing this method with clinical data. This would require training the network on appropriate datasets. We also plan to generalise the approach to other group actions, such as the mass-preserving one, which is more physically relevant.


We acknowledge support from the Swedish Foundation of Strategic Research grant AM13-004.


  • [1]

    Abadi, M., et al.: TensorFlow: Large-scale machine learning on heterogeneous systems (2015),, software available from
  • [2] Adler, J., Kohr, H., Öktem, O.: ODL-a Python framework for rapid prototyping in inverse problems. Royal Institute of Technology (2017)
  • [3] Adler, J., Öktem, O.: Solving ill-posed inverse problems using iterative deep neural networks. Inverse Problems 33(12), 124007 (2017)
  • [4] Adler, J., Öktem, O.: Learned primal-dual reconstruction. IEEE transactions on medical imaging 37(6), 1322–1332 (2018)
  • [5] Blume, M., Martinez-Moller, A., Keil, A., Navab, N., Rafecas, M.: Joint reconstruction of image and motion in gated positron emission tomography. IEEE Transactions on Medical Imaging 29(11), 1892—1906 (2010)
  • [6]

    Dalca, A.V., Balakrishnan, G., Guttag, J., Sabuncu, M.R.: Unsupervised learning for fast probabilistic diffeomorphic registration. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 729–738. Springer (2018)

  • [7]

    Dalca, A.V., Guttag, J., Sabuncu, M.R.: Anatomical priors in convolutional networks for unsupervised biomedical segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 9290–9299 (2018)

  • [8] Dawood, M., Jiang, X., Schäfers, K.P. (eds.): Correction Techniques in Emission Tomography. Series in Medical Physics and Biomedical Engineering, CRC Press (2008)
  • [9] Farag, A.A., Shalaby, A., El Munim, H.A., Farag, A.: Variational shape representation for modeling, elastic registration and segmentation. In: Li, S., Tavares, J.M.T.S. (eds.) Shape Analysis in Medical Image Analysis, Lecture Notes in Computational Vision and Biomechanics, vol. 14, chap. 95–122. Springer-Verlag (2014)
  • [10] Gigengack, F., Jiang, X., Dawood, M., Schäfers, K.P.: Motion Correction in Thoracic Positron Emission Tomography. Springer-Verlag (2015)
  • [11] Gilland, D.R., Mair, B.A., Bowsher, J.E., Jaszczak, R.J.: Simultaneous reconstruction and motion estimation for gated cardiac ECT. IEEE Transactions in Nuclear Science 49(5), 2344—2349 (2002)
  • [12] Gravier, E., Yang, Y., King, M.A., Jin, M.: Fully 4D motion-compensated reconstruction of cardiac SPECT images. Physics in Medicine and Biology 51(18), 4603—4619. (2006)
  • [13] Hinkle, J., Szegedi, M., Wang, B., Salter, B., Joshi, S.: 4d ct image reconstruction with diffeomorphic motion model. Medical image analysis 16(6), 1307–1316 (2012)
  • [14] Hudson, H.M., Larkin, R.S.: Accelerated image reconstruction using ordered subsets of projection data. IEEE transactions on medical imaging 13(4), 601–609 (1994)
  • [15] Jacobson, M.W., Fessler, J.A.: Joint estimation of image and deformation parameters in motion-corrected PET. In: 2003 IEEE Nuclear Science Symposium Conference Record. pp. 3290—3294 (2003)
  • [16] Jacobson, M.W., Fessler, J.A.: Joint estimation of respiratory motion and activity in 4D PET using CT side information. In: 3rd IEEE International Symposium on Biomedical Imaging: Nano to Macro. Arlington, VA, April 6–9, 2006. pp. 275—278 (2006)
  • [17] Li, T., Zhang, M., Qi, W., Asma, E., Qi, J.: Motion correction of respiratory-gated pet image using deep learning based image registration framework. In: 15th International Meeting on Fully Three-Dimensional Image Reconstruction in Radiology and Nuclear Medicine. vol. 11072, p. 110720Q. International Society for Optics and Photonics (2019)
  • [18] Rahmim, A., Tang, J., Zaidi, H.: Four-dimensional image reconstruction strategies in cardiac-gated and respiratory- gated PET imaging. PET Clinics 8(1), 51–67 (2013)
  • [19] Reader, A.J., Verhaeghe, J.: 4D image reconstruction for emission tomography. Physics in Medicine and Biology 59(22), R371–R418 (2014)
  • [20] Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical image computing and computer-assisted intervention. pp. 234–241. Springer (2015)
  • [21] Shepp, L.A., Vardi, Y.: Maximum likelihood reconstruction for emission tomography. IEEE transactions on medical imaging 1(2), 113–122 (1982)
  • [22] Younes, L.: Shapes and Diffeomorphisms, Applied Mathematical Sciences, vol. 171. Springer-Verlag (2010)
  • [23] Zhang, Y., Ghodrati, A., Brooks, D.H.: An analytical comparison of three spatio-temporal regularization methods for dynamic linear inverse problems in a common statistical framework. Inverse Problems 21(1), 357–382 (2005)