Proximal Gradient Descent Network (PGD-Net) for Magnetic Resonance Fingerprinting, MICCAI'2020
Consistency of the predictions with respect to the physical forward model is pivotal for reliably solving inverse problems. This consistency is mostly uncontrolled in the current end-to-end deep learning methodologies proposed for the Magnetic Resonance Fingerprinting (MRF) problem. To address this, we propose ProxNet, a learned proximal gradient descent framework that directly incorporates the forward acquisition and Bloch dynamic models within a recurrent learning mechanism. The ProxNet adopts a compact neural proximal model for de-aliasing and quantitative inference that can be flexibly trained on scarce MRF training datasets. Our numerical experiments show that the ProxNet can achieve a superior quantitative inference accuracy, much smaller storage requirement, and a comparable runtime to the recent deep learning MRF baselines, while being much faster than the dictionary matching schemes. Code has been released at https://github.com/edongdongchen/PGD-Net.
Magnetic resonance fingerprinting (MRF) is an emerging technology that enables simultaneous quantification of a multitude of tissue physical properties in short and clinically feasible scan times. Iterative reconstruction methods based on Compressed Sensing (CS) have proven efficient in helping MRF overcome the challenge of computing accurate quantitative images from the undersampled k-space measurements taken in aggressively short scan times [10, 3, 31, 11]. However, these methods require dictionary matching (DM), which is non-scalable and can create enormous storage and computational overhead. Further, such approaches often do not fully account for the joint spatiotemporal structures of the MRF data, which can lead to poor reconstructions.
Deep learning methodologies have emerged to address DM’s computational bottleneck [9, 15, 28, 25], and in some cases to perform joint spatiotemporal MRF processing using convolutional layers [12, 17, 8, 4, 19, 27, 13]. These models are trained in an end-to-end fashion without an explicit account of the known physical acquisition model (i.e. the forward operator) and without a mechanism for explicitly enforcing measurement consistency according to this sampling model, which can be crucial in safety-first medical applications. Further, ignoring the structure of the forward model could lead to unnecessarily large inference models and possibly overfitted predictions, especially given the extremely scarce labelled anatomical quantitative MRI datasets that are available for training.
Our contributions: we propose PGD-Net, a deep convolutional model that is able to learn and perform robust spatiotemporal MRF processing, and to work with limited access to the ground-truth (i.e. labelled) quantitative maps. Inspired by iterative proximal gradient descent (PGD) methods for CS reconstruction, we adopt learnable, compact and shared convolutional layers within a data-driven proximal step, while explicitly incorporating the acquisition model as a non-trainable gradient step in all iterations. The proximal operator is an auto-encoder network whose decoder embeds the Bloch magnetic responses and whose convolutional encoder embeds a de-aliasing projector to the tissue maps’ quantitative properties. Our work is inspired by recent general CS methodologies [23, 26, 1, 2, 7] that replace traditional hand-crafted image priors by deep data-driven models. To the best of our knowledge, this is the first work to adopt and investigate the feasibility of such an approach for solving the MRF inverse problem.
MRF adopts a linear spatiotemporal compressive acquisition model:
$$y = H(x) + \xi, \qquad (1)$$
where $y$ are the k-space measurements collected at the temporal frames and corrupted by some noise $\xi$, and $x$ is the Time-Series of Magnetisation Images (TSMI) holding all voxels across the timeframes. The forward operator $H$ models Fourier transformations subsampled according to a set of temporally-varying k-space locations in each timeframe. Accelerated MRF acquisition implies working with heavily under-sampled data, which makes the inversion of $H$ ill-posed.
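As a rough illustration of such a forward operator, the following sketch models it as an orthonormal 2D FFT per timeframe followed by a binary sampling mask, together with its adjoint (zero-filling and inverse FFT). The function names, array shapes and the random Cartesian mask are assumptions made for this example and are not the acquisition used in the experiments.

```python
import numpy as np

def forward(x, masks):
    """Apply a masked-FFT operator: per-timeframe orthonormal 2D FFT, then k-space masking.

    x     : complex array, shape (L, N, N), one image per timeframe
    masks : boolean array, shape (L, N, N), sampled k-space locations per frame
    """
    k = np.fft.fft2(x, axes=(-2, -1), norm="ortho")
    return k * masks

def adjoint(y, masks):
    """Apply the adjoint: zero-fill the unsampled locations, then inverse FFT."""
    return np.fft.ifft2(y * masks, axes=(-2, -1), norm="ortho")

rng = np.random.default_rng(0)
L, N = 4, 16                                  # toy sizes (hypothetical)
masks = rng.random((L, N, N)) < 0.2           # a different random mask per timeframe
x = rng.standard_normal((L, N, N)) + 1j * rng.standard_normal((L, N, N))
y = rng.standard_normal((L, N, N)) + 1j * rng.standard_normal((L, N, N))

# Dot-product test: <H x, y> must equal <x, H^H y> for a correct adjoint pair.
lhs = np.vdot(forward(x, masks), y)
rhs = np.vdot(x, adjoint(y, masks))
print(abs(lhs - rhs))
```

The dot-product test is a common sanity check before plugging an operator pair into a gradient step, since a mismatched adjoint silently breaks the data-fidelity gradient.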
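Combining this acquisition model with the per-voxel Bloch response model, after subspace dimensionality reduction, gives the compact form
$$y \approx H(x) \quad \text{such that} \quad x_v = \mathrm{PD}_v\,\mathcal{B}(T1_v, T2_v) \ \text{for every voxel } v, \qquad (2)$$
where $x$ is the dimension-reduced TSMI, and $\mathcal{B}$ denotes the subspace-compressed Bloch solutions (for more details see ).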
Tissue quantification: Given the compressed measurements $y$, the goal of MRF is to solve the inverse problem (2) and to compute the underlying multi-parametric maps $m$ (and $x$ as a by-product). Such problems are typically cast as an optimisation problem of the form:
$$\arg\min_{x,m} \; \|y - H(x)\|^2 + \phi(x, m), \qquad (3)$$
and solved iteratively by the proximal gradient descent (PGD):
$$g^{(t+1)} = x^{(t)} + \alpha^{(t)} H^H\big(y - H(x^{(t)})\big),$$
$$\{x^{(t+1)}, m^{(t+1)}\} = \mathrm{Prox}_{\phi}\big(g^{(t+1)}\big), \qquad (4)$$
where the gradient updates encourage k-space fidelity (the first term of (3)), and the proximal operator $\mathrm{Prox}_{\phi}$ enforces image structure priors through a regularisation term $\phi$ that makes the inverse problem well-posed. The Bloch dynamics in (2) place an important temporal constraint (prior) on the per-voxel trajectories of $x$. Projecting onto this model (i.e. a temporal Prox model) has been suggested via iterative dictionary search schemes [10, 3]. This approach boosts MRF reconstruction accuracy compared to the non-iterative DM; however, DM is non-scalable and can create enormous storage and computational overhead. Further, such an approach processes data independently per voxel and neglects important spatial-domain regularities in the TSMIs and quantitative maps.
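The alternation between a gradient step and a proximal step can be sketched as follows. This is a minimal illustration, not the reconstruction used in the paper: it assumes a single-frame masked-FFT forward operator and substitutes a hand-crafted soft-thresholding prox (a sparsity prior) for the dictionary-based or learned proximal operators discussed in the text. All names and parameter values are hypothetical.

```python
import numpy as np

def soft_threshold(g, lam):
    """Toy hand-crafted prox (complex soft-thresholding); a stand-in, not a learned model."""
    mag = np.abs(g)
    return g * np.maximum(mag - lam, 0.0) / np.maximum(mag, 1e-12)

def objective(x, y, mask, lam):
    """0.5 * ||y - H(x)||^2 + lam * ||x||_1 for the masked-FFT operator H."""
    r = y - mask * np.fft.fft2(x, norm="ortho")
    return 0.5 * np.sum(np.abs(r) ** 2) + lam * np.sum(np.abs(x))

def pgd(y, mask, lam, step=1.0, n_iter=50):
    """Proximal gradient descent: gradient step on the data term, then the prox."""
    x = np.zeros(mask.shape, dtype=complex)
    for _ in range(n_iter):
        residual = y - mask * np.fft.fft2(x, norm="ortho")            # y - H(x)
        g = x + step * np.fft.ifft2(mask * residual, norm="ortho")    # gradient step
        x = soft_threshold(g, step * lam)                             # proximal step
    return x

rng = np.random.default_rng(0)
N, lam = 32, 0.02
x_true = np.zeros((N, N), dtype=complex)
x_true[rng.integers(0, N, 10), rng.integers(0, N, 10)] = 1.0   # a few bright pixels
mask = rng.random((N, N)) < 0.5                                 # keep about half of k-space
y = mask * np.fft.fft2(x_true, norm="ortho")

x_hat = pgd(y, mask, lam)
obj_start = objective(np.zeros_like(x_true), y, mask, lam)
obj_end = objective(x_hat, y, mask, lam)
print(obj_start, obj_end)
```

Because the masked orthonormal FFT has operator norm at most one, a unit step size is admissible for this toy objective; with a different operator scaling the step size would need to be adapted.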
We propose to learn a data-driven proximal operator within the PGD mechanism for solving the MRF problem. Implemented by compact networks with convolutional layers, the neural Prox improves the storage overhead and the sluggish runtime of the DM-based PGD by orders of magnitude. Further, trained on quantitative MR images, the neural Prox network learns to simultaneously enforce spatial- and temporal-domain data structures within PGD iterations.
Prox auto-encoder: We implement the proximal operator through a deep convolutional auto-encoder network:
$$\mathrm{Prox} := \mathrm{Bloch} \circ \mathcal{G}, \qquad (5)$$
consisting of an encoder $\mathcal{G}: g \rightarrow m$ and a decoder $\mathrm{Bloch}: m \rightarrow x$ subnetwork. The information bottleneck in the (neural) Prox auto-encoder corresponds to projecting multichannel TSMIs to the low-dimensional manifold of the tissues’ intrinsic (quantitative) property maps.
Decoder network: The Bloch decoder creates a differentiable model for generating the Bloch magnetic responses. This network uses $1\times1$ filters to process image time-series in a voxel-wise manner. Given the quantitative properties of a voxel, the decoder approximates the (dimension-reduced) Bloch response in that voxel. This network is trained separately from the encoder. Training uses physical (Bloch) simulations for many combinations of the T1, T2 and PD values, which can flexibly produce a rich training dataset.
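To make the voxel-wise nature of such a decoder concrete, the sketch below implements a two-layer network of $1\times1$ convolutions in NumPy and checks that the output at a voxel depends only on that voxel's inputs. The layer widths, the ReLU activation and the random (untrained) weights are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
C_IN, C_HID, C_OUT = 3, 32, 10   # hypothetical sizes: (T1, T2, PD) -> hidden -> TSMI channels

# Random, untrained weights; a real decoder would be fitted to Bloch simulations.
W1, b1 = rng.standard_normal((C_HID, C_IN)), np.zeros(C_HID)
W2, b2 = rng.standard_normal((C_OUT, C_HID)), np.zeros(C_OUT)

def decoder(m):
    """1x1-convolution decoder: m has shape (C_IN, H, W), output has shape (C_OUT, H, W)."""
    h = np.einsum("oc,chw->ohw", W1, m) + b1[:, None, None]
    h = np.maximum(h, 0.0)                                  # ReLU (assumed activation)
    return np.einsum("oc,chw->ohw", W2, h) + b2[:, None, None]

m = rng.random((C_IN, 8, 8))
out = decoder(m)

# Voxel-wise check: decoding a single voxel in isolation gives the same result.
single = decoder(m[:, 2:3, 5:6])
same = np.allclose(out[:, 2, 5], single[:, 0, 0])
print(out.shape, same)
```

A $1\times1$ convolution is equivalent to applying the same small fully connected network at every voxel, which is why such a decoder can be trained on simulated per-voxel responses without any anatomical images.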
Encoder network: The encoder projects $g$, the gradient-updated TSMIs in each iteration (i.e. the first line of (4)), to the quantitative property maps $m$. Thus, the encoder must simultaneously (i) learn to incorporate spatial-domain regularities to de-alias TSMIs from the undersampling artefacts, and (ii) resolve the temporal-domain inverse mapping from the (noisy) TSMIs to the quantitative property maps. For this, and unlike Bloch, which applies pixel-wise temporal-only processing, the encoder uses multichannel convolution filters with wider receptive fields to enable spatiotemporal processing of the TSMIs.
PGD-Net: Fig. 1 shows the recurrent architecture of the proposed learned PGD algorithm, coined the PGD-Net. The trainable parameters within the PGD-Net are those of the encoder network and the step sizes. Other operators such as the forward operator $H$, its adjoint, and Bloch (pre-trained separately) are kept frozen during training. Further, the encoder’s parameters are shared across all iterations. In practice, a truncated number of recurrent iterations is used for training. Supervised training requires the MRF measurements, TSMIs, and the ground-truth property maps to form the training inputs $y$ and the target samples.
Note that there are many engineering choices involved in determining the optimal network architecture, including different ways to encode temporal or spatial-temporal information. These aspects are somewhat orthogonal to the model consistency question. Indeed, such mechanisms could also be incorporated in PGD-Net.
Training loss: Given a training set and the recurrent iterations of the PGD-Net (i.e. the iterations used in PGD), the loss (6) is defined as a weighted sum of three MSE terms. The first term is defined on the reconstructed TSMIs $x$ (which measures the Bloch dynamic consistency), the second on the tissue property maps $m$, and the third on $y$, to maximise k-space data consistency with respect to the (physical) forward acquisition model. In this paper, the relative scaling of the three weights was initialised based on the physics (see 4.3).
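A minimal sketch of such a multi-term loss is given below. The weight names `w_x`, `w_m`, `w_y` are placeholders introduced for this example, and the sketch evaluates the terms on a single set of predictions; how the terms are accumulated over samples and recurrent iterations is not shown here and would follow the paper's definition.

```python
import numpy as np

def mse(a, b):
    """Mean squared error; np.abs makes it valid for complex-valued arrays too."""
    return np.mean(np.abs(a - b) ** 2)

def multi_term_loss(x_hat, m_hat, y_hat, x, m, y, w_x, w_m, w_y):
    """Weighted sum of MSE terms on TSMIs (x), property maps (m) and k-space data (y).

    y_hat is assumed to be the forward operator applied to x_hat.
    The weights w_x, w_m, w_y are hypothetical placeholders.
    """
    return w_x * mse(x_hat, x) + w_m * mse(m_hat, m) + w_y * mse(y_hat, y)

# Toy targets and predictions (hypothetical shapes).
x = np.ones((2, 3)); m = np.zeros((2, 3)); y = np.ones((2, 3))
perfect = multi_term_loss(x, m, y, x, m, y, w_x=0.5, w_m=1.0, w_y=0.1)
off_by_one = multi_term_loss(x + 1.0, m, y, x, m, y, w_x=0.5, w_m=1.0, w_y=0.1)
print(perfect, off_by_one)
```

Setting `w_x = w_y = 0` recovers a loss on the quantitative maps alone, which is the form used by purely end-to-end baselines.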
We construct a dataset of brain scans acquired using the 1.5T GE HDxT scanner with an 8-channel receive-only head RF coil. For setting ground-truth (GT) values for the T1, T2 and PD parameters, gold-standard anatomical maps were acquired using the MAGIC quantification protocol. Ground-truth quantitative maps were acquired from 8 healthy volunteers (16 axial brain slices each, at the spatial resolution of pixels). From these parametric maps, we then construct the TSMIs and MRF measurements using the MRF acquisition protocol mentioned below to form the training/testing tuples $(m, x, y)$. Data from 7 subjects were used for training our models, and one subject was kept for performance testing. We augmented the training data to a total of 224 samples using random rotations (with uniformly distributed angles) and left-right flipping of the GT maps. Training batches at each learning epoch were corrupted by i.i.d. Gaussian noise of 30 dB SNR added to $y$ (we similarly add noise to the k-space test data).
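The noise corruption step can be sketched as follows, assuming the SNR is defined as the ratio of mean signal power to mean noise power in decibels and that the noise is complex with independent real and imaginary parts. The function name and the exact SNR convention are assumptions, since the text does not spell them out.

```python
import numpy as np

def add_noise(y, snr_db, rng):
    """Add i.i.d. complex Gaussian noise so that 10*log10(P_signal / P_noise) = snr_db."""
    p_signal = np.mean(np.abs(y) ** 2)
    p_noise = p_signal / 10 ** (snr_db / 10)
    noise = rng.standard_normal(y.shape) + 1j * rng.standard_normal(y.shape)
    # Each of the real and imaginary parts carries half of the noise power.
    return y + np.sqrt(p_noise / 2) * noise

rng = np.random.default_rng(0)
y = rng.standard_normal(100_000) + 1j * rng.standard_normal(100_000)   # toy k-space data
y_noisy = add_noise(y, 30.0, rng)

measured = 10 * np.log10(np.mean(np.abs(y) ** 2) / np.mean(np.abs(y_noisy - y) ** 2))
print(measured)
```

Drawing fresh noise for every batch, as described above, acts as an additional augmentation because the network never sees the same noisy measurement twice.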
Our experiments use an excitation sequence of 200 repetitions which jointly encodes T1 and T2 values using an inversion pulse followed by a flip angle schedule that linearly ramps up from to , i.e. a truncated sequence compared to [16, 14]. Following , we set acquisition parameters Tinv = 18 msec (inversion time), fixed TR = 10 msec (repetition time), and TE = 0.46 msec (echo time). Spiral readouts subsample the k-space frequencies (the Cartesian FFT grid) across 200 repetition times. We sample spatial frequencies for , which after quantisation to the nearest FFT grid, results in samples per timeframe. In every repetition, similar to , this spiral pattern rotates by in order to sub-sample new k-space frequencies. Given the anatomical T1, T2 and PD maps, we simulate magnetic responses using the Extended Phase Graph (EPG) formalism, construct the TSMI and k-space measurement datasets, and use them for training and retrospective validations.
Two DM baselines, namely the non-iterative Fast Group Matching (FGM) and the model-based iterative algorithm BLIP empowered by the FGM’s fast searches, were used for comparisons. For this, an MRF dictionary of 113’781 fingerprints was simulated over a dense grid of (T1, T2) = [100:10:4000] × [20:2:600] msec values. We implemented FGM searches on GPU using 100 groups for clustering this dictionary. The BLIP algorithm uses a backtracking step size search and runs for a maximum of 20 iterations if it has not converged earlier. Further, we compared against the related deep learning MRF baselines MRFCNN and SCQ. In particular, MRFCNN is a fully convolutional network and SCQ mainly uses 3 U-nets to separately infer T1, T2 and PD maps. The input to these networks is the dimension-reduced back-projected TSMIs, and their training losses only consider quantitative map consistency, i.e. the second term in (6).
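The dictionary size quoted above follows directly from the grid, and the basic (brute-force) matching rule can be sketched in a few lines. In the example below the fingerprints come from a toy signal model introduced purely for illustration; it is not a Bloch or EPG simulation, and the coarse grid, time axis and function names are hypothetical. The fast group matching used by the baselines is not reproduced here.

```python
import numpy as np

# Grid from the text: T1 = 100:10:4000 msec, T2 = 20:2:600 msec.
t1_full = np.arange(100, 4000 + 1, 10)
t2_full = np.arange(20, 600 + 1, 2)
n_atoms_full = t1_full.size * t2_full.size   # 391 * 291 = 113781

def toy_fingerprint(t1, t2, t):
    """Toy signal model (hypothetical); a real dictionary would use Bloch/EPG simulation."""
    return (1.0 - 2.0 * np.exp(-t / t1)) * np.exp(-t / t2)

# Coarser grid to keep the toy example small.
t = np.linspace(10.0, 2000.0, 200)
params = np.array([(a, b) for a in range(100, 4001, 100) for b in range(20, 601, 20)],
                  dtype=float)
D = np.stack([toy_fingerprint(a, b, t) for a, b in params])     # (atoms, timepoints)
D_norm = np.linalg.norm(D, axis=1)

def match(signal):
    """Return (T1, T2, PD) of the atom with the largest normalised correlation."""
    corr = (D @ signal) / D_norm
    k = int(np.argmax(np.abs(corr)))
    return params[k, 0], params[k, 1], corr[k] / D_norm[k]

k_true = 437
t1_hat, t2_hat, pd_hat = match(0.7 * D[k_true])   # noiseless voxel with PD = 0.7
print(n_atoms_full, (t1_hat, t2_hat), params[k_true], pd_hat)
```

The cost of this search grows linearly with the number of atoms and must be repeated for every voxel, which illustrates why dense dictionaries become a storage and runtime bottleneck.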
We trained PGD-Net with recurrent iterations and 5 to learn an appropriate proximal encoder and the step sizes. The architectures of the encoder and Bloch networks are illustrated in Fig. 1. Similar to , the MRF dictionary was used for pre-training the Bloch decoder that embeds a differentiable model for generating Bloch magnetic responses. A compact shallow network with one hidden layer and $1\times1$ filters (for pixel-wise processing) implements our Bloch model. On the other hand, our encoder has two residual blocks with filters (for de-aliasing) followed by three convolutional layers with filters for quantitative inference. The inputs were normalised such that PD ranged in ; smaller weights were used for x and y since they have higher energy than PD; we set since x’s norm is larger than y; values typically exhibit different ranges with , justifying their relative weightings in to balance these terms. The final hyper-parameters were , and selected via a multiscale grid search to minimise the error w.r.t. the ground truth. We used the ADAM optimiser for 2000 epochs, with mini-batch size 4 and learning rate . We pre-trained our encoder using back-projected TSMIs to initialise the recurrent training, and also to compare the encoder-alone predictions to the PGD-Net. All algorithms use a dimensional MRF subspace representation for temporal-domain dimensionality reduction. The input and output channels are respectively 10 and 3 for MRFCNN, SCQ and our encoder. All networks were implemented in PyTorch, and trained and tested on NVIDIA 2080Ti GPUs.
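One common way to obtain a temporal subspace for dimensionality reduction is a truncated SVD (PCA) of the fingerprint dictionary. The sketch below assumes this construction and uses a synthetic low-rank matrix in place of a simulated dictionary; the sizes and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_atoms, L, s = 500, 200, 10     # toy sizes: atoms, timeframes, subspace dimension

# Synthetic dictionary with exactly rank-s structure (hypothetical data).
D = rng.standard_normal((n_atoms, s)) @ rng.standard_normal((s, L))

# The right singular vectors span the temporal subspace; keep the s dominant ones.
_, sing, Vh = np.linalg.svd(D, full_matrices=False)
V = Vh[:s].conj().T              # shape (L, s)

D_compressed = D @ V             # each fingerprint now has s coefficients
D_restored = D_compressed @ V.conj().T
rel_err = np.linalg.norm(D - D_restored) / np.linalg.norm(D)
print(D_compressed.shape, rel_err)
```

For a real dictionary the singular values decay quickly rather than vanishing exactly, so the subspace dimension trades a small approximation error for a large reduction in the number of channels the networks must process.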
Table 1. Columns: NRMSE, SSIM, MAE (msec), time (sec), memory (MB).
Table 1 and Figure 2 compare the performances of the MRF baselines against our proposed PGD-Net using and 5 recurrent iterations. We also include inference results using the proposed encoder alone, without proximal iterations. Reconstruction performances were measured by the Normalised RMSE (NRMSE), the MAE, the Structural Similarity Index Metric (SSIM), the required storage for the MRF dictionary (in DM methods) or the networks, and the algorithm runtimes averaged over the test image slices.
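For reference, two of these error metrics can be computed as in the sketch below. The normalisation used for the NRMSE here (error norm divided by the norm of the ground truth) is an assumption, since several conventions exist and the exact definition is not reproduced in this text.

```python
import numpy as np

def nrmse(est, gt):
    """Normalised RMSE, assuming normalisation by the ground-truth norm."""
    return np.linalg.norm(est - gt) / np.linalg.norm(gt)

def mae(est, gt):
    """Mean absolute error, in the units of the maps (e.g. msec for T1 and T2)."""
    return np.mean(np.abs(est - gt))

gt = np.array([1000.0, 2000.0])     # toy ground-truth T1 values in msec
est = np.array([1100.0, 1900.0])    # toy estimates
print(nrmse(est, gt), mae(est, gt))
```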
The non-iterative FGM results in incorrect maps due to the severe under-sampling artefacts. The model-based BLIP iterations improve this; however, lacking spatial regularisation, BLIP has limited accuracy and cannot fully remove aliasing artefacts (e.g. see T2 maps in Figure 2) despite 20 iterations and a very long runtime. In contrast, all deep learning methods outperform BLIP not only in accuracy but also in having 2 to 3 orders of magnitude faster reconstruction times, an important advantage of the learning-based methods. The proposed PGD-Net consistently outperforms all baselines, including DM and learning-based methods, over all defined accuracy metrics. This is achieved by learning an effective spatiotemporal model (only) for the proximal operator, i.e. the encoder and Bloch networks, by directly incorporating the physical acquisition model H into the recurrent iterations to avoid over-parameterisation of the overall inference model, and by enforcing reconstructions to be consistent with the Bloch dynamics and the k-space data through the multi-term training loss (6). The MRFCNN and SCQ over-parameterise the inference with model sizes that are 1 and 3 orders of magnitude larger (the SCQ requires more memory than DM) and are unable to achieve PGD-Net’s accuracy, e.g. see the corresponding over-smoothed T2 maps in Fig. 2. Finally, we observe that despite having roughly the same model size (storage), the encoder-alone predictions are not as accurate as the results of the PGD-Net’s recurrent iterations. Similar to proximal GD, the PGD-Net is expected to converge to a fixed point. By increasing the number of iterations we observe that the PGD-Net’s accuracy consistently improves, at the cost of an acceptably longer inference time. However, accuracy gains for are marginal, suggesting the method’s fast convergence.
In this work we showed that the consistency of the computed quantitative maps with respect to the physical forward acquisition model and the Bloch dynamics is important for reliably solving the MRF inverse problem using compact deep neural networks. For this, we proposed PGD-Net, a learned model-based iterative reconstruction framework that directly incorporates the forward acquisition and Bloch dynamic models within a recurrent learning mechanism with a multi-term training loss. The PGD-Net adopts a data-driven neural proximal model for spatiotemporal processing of the MRF data, TSMI de-aliasing and quantitative inference. A chief advantage of this model is its compactness (a small number of weights/biases to tune), which might make it particularly suitable for supervised training using scarce quantitative MRI datasets. Through our numerical validations we showed that the proposed PGD-Net achieves a superior quantitative inference accuracy, a much smaller storage requirement, and a comparable runtime to the recent deep learning MRF baselines, while being much faster than the MRF fast dictionary matching schemes. In future work, we plan to evaluate the method on non-simulated scanner datasets with higher diversity and possible pathologies to further validate its potential for clinical usage.
The authors would like to thank Pedro Gomez, Carolin Prikl and Marion Menzel from GE Healthcare in Munich for useful discussions and for the quantitative anatomical maps dataset. DC and MD are supported by the ERC C-SENSE project (ERCADG-2015-694888).
Magnetic resonance fingerprinting reconstruction via spatiotemporal convolutional neural networks. In International Workshop on Machine Learning for Medical Image Reconstruction, pp. 39–46.
RinQ fingerprinting: recurrence-informed quantile networks for magnetic resonance fingerprinting. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 92–100.
Magnetic resonance fingerprinting using recurrent neural networks. In IEEE Intl. Symposium on Biomedical Imaging (ISBI), pp. 1537–1540.
Proceedings of the IEEE International Conference on Computer Vision, pp. 5888–5897.