Magnetic resonance fingerprinting (Ma2013) (MRF) is a concept for simultaneous and fast acquisition of multiple quantitative MR parameters. The MR acquisition relies on temporal variations of MR sequence parameters usually combined with high k-space under-sampling. As a result, a time-series of weighted MR images is acquired, where each tissue has a unique MR signal evolution - or fingerprint. Such fingerprints can be simulated, e.g., by Bloch equations, and a dictionary of expected fingerprints can be built. During image reconstruction, the acquired fingerprints are matched to this dictionary of simulated fingerprints with known MR parameters. The highest correlated fingerprint in the dictionary yields the MR parameters at the given voxel. By repeating this process for all voxels, parametric maps are reconstructed.
MRF has the potential for clinically feasible multiparametric MR imaging, and could enable objective evaluation and comparison for a wide variety of clinical applications (Poorman2019). However, whilst the MRF acquisition is fast, the dictionary matching reconstruction is computationally demanding and lacks scalability as the problem worsens exponentially with the number of reconstructed MR parameters. For instance, as reported in (Marty2019a), the reconstruction of five parametric maps can require minutes to hours depending on the implementation and computational hardware. This long reconstruction is mainly attributed to the large dictionary with approximately 9 million simulated fingerprints for five parametric maps. The reconstruction time will especially proof problematic when acquiring large data sets with many slices in clinically settings. Additionally, the dictionary matching results in discretized parametric maps, which might be undesirable considering continuous-valued parametric maps using gold-standard MR sequences. Therefore, the dictionary matching represents currently a drawback of MRF, which makes a routine clinical application of MRF potentially inappropriate, and calls for accurate and fast reconstruction alternatives.
Several methods attempting to improve the MRF reconstruction have been proposed lately. Acceleration of the dictionary matching (McGivney2014; Cauley2015; Gomez2016), iterative reconstruction (Davies2014; Pierre2016), and low-rank approximations (Mazor2018; Asslander2018; Zhao2018; LimadaCruz2019) were proposed for MRF reconstruction. While some of these methods are promising, both further acceleration of the reconstruction process and continuous values in the parametric maps, as opposed to the discretely sampled dictionary-based reconstruction, are highly desired. Promising in these regards are deep learning-based methods, which offer near real-time reconstructions and produce parametric maps with continuous values. Methods vary from fingerprint-wise reconstruction (Hoppe2017; Virtue2017; Cohen2018; Oksuz2019; Song2019a; Golbabaee2019) to methods considering neighborhoods of fingerprints for the reconstruction (Balsiger2018b; Fang2018; Balsiger2019a; Fang2019). So far, all deep learning-based studies come with either one or several shortcomings: First, all except one study (Balsiger2019a) used simulated, phantom, or healthy volunteer data. Second, only two studies used highly undersampled in vivo MRF with more than one subject (Fang2018; Fang2019). Third, although empirical evidence for the superiority of using spatial over fingerprint-wise reconstruction exists (Balsiger2019a), it has not yet been thoroughly investigated for highly undersampled MRF.
We aimed to overcome the aforementioned shortcomings of existing deep learning-based MRF reconstruction. Our contributions, which build upon our conference article (Balsiger2019a), are manifold: (i) we propose a flexible convolutional neural network (CNN) architecture for parametric map reconstruction from highly undersampled MRF, (ii) we evaluated the method’s performance on a large (n=164) and highly heterogenous patient dataset, (iii) we compared the proposed method to recent deep learning-based MRF reconstruction methods, (iv) we analyzed the influence of both the spatial and temporal MRF data dimensions on the reconstruction, (v) we investigated the influence of the number of training subjects for robustness to highly heterogeneous data, and (vi) we determined the generalization of the method to different anatomical regions not used during training.
2 Materials and Methods
2.1 MRF Acquisition and Dictionary Matching Reconstruction
We used MRF T1-FF (Marty2019a), an MRF sequence for T1 relaxation time of water (T1H2O) and fat fraction (FF) mapping. The acquisition consisted of a 1400 radial spokes FLASH echo train following the golden angle scheme after non-selective inversion. Echo time, repetition time, and nominal flip angle were varied during the echo train. The field of view was set to 350 mm 350 mm with a voxel size of 1.0 mm 1.0 mm 8.0 mm, and five slices were acquired within a total acquisition time of 50 seconds. All acquisitions were performed on a 3 tesla Siemens MAGNETOM Prismafit scanner (Siemens Healthineers, Erlangen, Germany) using a set of 18-channel flexible phase array coils, combined with a 48-channel spine coil.
Five parametric maps were reconstructed after the MRF T1-FF acquisition: FF, T1H2O, T1 relaxation time of fat (T1fat), static magnetic field inhomogeneity (
f), and flip angle efficacy (B1). First, the acquired data was transformed to image space using the non-uniform fast Fourier transform (NUFFT)(Fessler2003a) with eight spokes per temporal frame, resulting in a highly undersampled time series of 175 temporal frames (acceleration factor of 68.7). Second, dictionary matching was conducted using a dictionary simulated by Bloch equations with for FF, ms for T1H2O, ms for T1fat, Hz for f, for B1 (start:increment:stop). Despite fast group matching (Cauley2015) and dictionary compression (McGivney2014), the dictionary matching still required approximately 8 hours using standard desktop computer hardware (2.6 GHz Intel Xenon E5-2630, 48 GB memory) due to the large number of MR parameter combinations.
2.2 Conceptual Formulation
The hypothesis leading to the design of the proposed CNN architecture is that the reconstruction performance depends on 1) the CNN’s receptive field and 2) the number of learnable parameters of the CNN. On one hand, the receptive field determines the number of neighboring fingerprints the CNN will use to predict the value of the parametric maps of the central fingerprint. We argue that it is, especially in the case of highly undersampled MRF, beneficial to leverage the spatial correlation among fingerprints but this spatial correlation is limited to a certain extent due to different tissue properties yielding different fingerprints, especially in lesions. Technically speaking, the extent of spatial correlation limits the number of spatial convolutions, i.e., convolutions with kernel sizes larger or equal than . On the other hand, the reconstruction performance will also be determined by the number of learnable parameters. The CNN requires a certain capacity to extract features that cover the space of possible input fingerprints. Similar as for the receptive field, an appropriate capacity is especially needed when dealing with multiple parametric maps and diverse tissue properties. A simple yet effective way to increase the number of learnable parameters while maintaining the receptive field is to use convolutions.
The schematic overview of the proposed MRF reconstruction is shown in Fig. 1. Let us consider a 2-D+time MRF image slice after NUFFT in the image space with matrix size and temporal frames. We aim to find the mapping from the MRF image space to parametric maps . Here, we propose a 2-D CNN to learn this mapping, which processes the MRF data patch-wise and treats the temporal frames as channels. Therefore, the proposed method learns the mapping
, and estimates the parametric maps by
where the non-linear mapping parametrized by the CNN’s parameters , and and are patches extracted from the 2-D+time MRF image slice and the parametric maps. The CNN reconstructs non-overlapping patches of the parametric maps with size of (the patch-wise processing is mainly motivated by graphics processing unit (GPU) memory limitations, in this study 12 GB). The input patch size is determined with the CNN’s receptive field by . Here, we used a receptive field of , i.e., and . Further for MRF T1-FF, , , and (FF, T1H2O, T1fat, f, B1).
2.3 CNN Architecture
The CNN architecture consists of temporal and spatial blocks, which are interleaved within the architecture as shown in Fig. 2a. An additional 1
1 convolution with linear activation function andchannels for predicting is added after the interleaved blocks. Input to the CNN are real-valued with real and imaginary parts concatenated as channels. A temporal block (Fig. 2b) extracts temporal features from fingerprints and follows the design of dense blocks (Huang2017). They are composed of layers, and each of them is a sequence of 1 Glorot2011)
, and batch normalization (BN)(Ioffe2015). All convolutions have the same number of filters (growth rate), and the feature maps of the preceding layers are concatenated before the next layer to reuse features and facilitate the gradient flow (Huang2017). A spatial block (Fig. 2c) extracts features from neighboring fingerprints, and, therefore, increases the CNN’s receptive field. It consists of a valid 3 3 convolution with
filters followed by ReLU activation function, and BN. All convolutions are performed with a stride of 1.
The use of temporal and spatial blocks allows building the CNN architecture based on the needs of receptive field and number of learnable parameters. The temporal blocks extract a high number of input channels, from which the spatial blocks then extract spatially correlated features with an even higher number of parameters (factor 9 due to 3 3 convolution). Using Algorithm 1, the CNN architecture can be built based on the needs of the MRF sequence to reconstruct. Inputs to the algorithm are the receptive field , the number of parameters , i.e., the number of learnable weights of all convolutional kernels in the CNN, and the number of non-linearities , i.e., the number of ReLU activation functions444Note that the number of non-linearities are equal to the number of convolutions in the temporal and spatial blocks because each convolution is directly followed by a ReLU activation function.. The number of temporal and spatial blocks are determined by the receptive field. The number of channels of the spatial blocks are chosen such that it gradually decreases down to , in steps of , before the last convolutional filter with linear activation, i.e.,
where denotes a sequence (e.g., for , , and ). The number of layers of the temporal blocks are determined by
where is the remaining number of non-linearities and is the indicator function returning if the statement is true and otherwise (e.g., for and ). Given that the number of filters of the temporal blocks gradually decrease down to , in steps of , an ideal number of channels in the first temporal block can be calculated such that the total number of learnable convolutional parameters are as close as possible to the desired number of learnable parameters 555Calculating the number of parameters of a convolution is straightforward, i.e., for a convolution with kernel size , input channels, and output channels.. is calculated by
(e.g., for , , , and ). As we will analyze in Section 3.3, the optimal receptive field of the CNN for the MRF T1-FF sequence is 15
15 with approximately 5 million parameters. Therefore, the architecture consists of seven temporal and spatial blocks. We empirically chose the hyperparameters, , , , and , resulting in number of layers , channels of the temporal blocks , i.e. , and channels of the spatial blocks .
2.4 CNN Training and Implementation
The CNN was trained for 75 epochs with a batch size of 20 randomly selected patches, which we empirically found to be sufficient. The Adam optimizer(Kingma2015) was used to minimize a mean squared error loss () with a learning rate of , , and . Therefore, the learning objective was
Before training, we normalized the data subject-wise: The MRF image was normalized to zero mean and unit standard deviation along the real and imaginary parts and each parametric map was rescaled to the rangeusing the minimum and maximum values in the dictionary (see Section 2.1
). After inference, the predicted parametric maps were rescaled back to the original dictionary range. The CNN was implemented in TensorFlow 1.10.0 (Google Inc., Mountain View, CA, U.S.) with Python 3.6.7 (Python Software Foundation, Wilmington, DA, U.S.). The training was performed with an NVIDIA TITAN Xp (Nvidia Corporation, Santa Clara, CA, U.S.). For reproducibility, the source code is available online666https://github.com/fabianbalsiger/… (link upon publication). Further investigation of some architecture hyperparameters can be found in Section 1 of the supplementary material.
2.5 Evaluation and Comparison
|BMD||45.4 15.0||45.7 16.9||64.0 0.0||42.3 11.3|
|19 / 19 / 19||11 / 11 / 11||1 / 1 / 1||7 / 7 / 7|
|DMD||12.4 2.2||12.5 2.3||12.8 2.5||11.9 2.0|
|25 / 25 / 12||14 / 14 / 8||4 / 4 / 1||7 / 7 / 3|
|IBM||64.5 9.6||65.3 10.4||66.3 10.3||62.6 8.3|
|70 / 30 / 34||37 / 16 / 13||9 / 4 / 7||24 / 10 / 14|
|Other||46.9 14.4||49.2 14.6||45.0 16.0||41.7 12.8|
|50 / 19 / 26||32 / 8 / 15||6 / 6 / 4||12 / 5 / 7|
|Overall||49.0 21.0||49.7 21.2||49.1 23.4||47.6 19.8|
|164 / 93 / 91||94 / 49 / 47||20 / 15 / 13||50 / 29 / 31|
We evaluated the performance of the proposed method on a clinical dataset consisting of 164 patients with various neuromuscular diseases (NMDs). The dataset is highly heterogeneous due to the variable phenotypic appearance of lesions in NMDs, and further comprises thigh and leg images. To evaluate the robustness of the methods, we purposely did not apply any stratification regarding disease type, patient sex, patient age, or anatomical region when splitting the dataset into training, validation, and testing splits (n=94/20/50). Table 1 summarizes clinical and demographic characteristics of the dataset. Multimedia files characterizing the heterogeneity of the dataset can be found online as supplementary material.
The dictionary matching reconstruction served as reference for the parametric maps. Quantitative analysis between the dictionary matching and the predicted parametric maps was done according to (Zbontar2018)
. The normalized root mean squared error (NRMSE), the peak signal-to-noise ratio (PSNR), and the structural similarity index measure (SSIM)(Wang2004) were calculated at the image level. Due to the quantitative nature of parametric maps, we provide further quantitative analysis based on the coefficient of determination (R2
), scatter plots, and Bland-Altman analysis. To this end, we manually segmented regions of interest (ROIs) lying within the major muscles of each subject. The ROIs allowed calculating the mean parametric value within each ROI of each image slice. Then a linear regression between the mean ROI values of the dictionary matching and predicted parametric maps quantified the agreement between the methods. For all evaluation, background voxels were excluded based on an automatically segmented mask generated by thresholding an anatomical image obtained from the MRF image space series (pseudo out-of-phase image(Marty2019a)). Further, voxels and ROIs with a FF higher than 0.7 were excluded from the evaluation of the T1H2O map reconstruction due to low confidence of T1H2O at high FF (Marty2019a). For the SSIM, we used a window size of 7 7, , , and was set to the maximum value of the parametric map.
We compared the proposed method to four other deep learning-based MRF reconstruction, which can be grouped into methods working fingerprint-wise and a method considering spatial neighborhoods of fingerprints. The fingerprint-wise methods comprise Cohen2018, a neural network with two hidden fully-connected layers, Hoppe2017, a CNN with four 1-D convolutional layers, and Oksuz2019Fang2019 works patch-wise using an U-Net-like CNN with pooling operations resulting in a receptive field of 54 54.
3 Experiments and Results
3.1 Parametric Map Reconstructions
Reconstruction results of the dictionary matching, the proposed method, the best fingerprint-wise method (Oksuz2019), and the spatial method of Fang2019 are shown in Fig. 3. Visually, the proposed method achieved the best reconstruction results for all parametric maps. Compared to the dictionary matching, all reconstructions appear to be slightly smoothed. Oksuz2019 resulted in noisier reconstructions and could not capture elevated T1H2O. Fang2019 achieved similar results as the proposed method, but the reconstructions contain artifacts originating from the patch-wise processing, which are not present for the proposed method. Further, the reconstructions of Fang2019 appear to be slightly more smooth than the reconstructions of the proposed method. A zoomed-in region with fatty infiltrated muscle and elevated T1H2O can be found in Section 2 of the supplementary material, showing that Oksuz2019 fails to reconstruct elevated T1H2O and that the reconstructions of Fang2019 contain subtle reconstruction artifacts.
Quantitatively, the proposed method achieved the best reconstruction results for the four metrics (Table 2). The parametric maps T1H2O and T1fat are the most difficult to reconstruct while for FF, f, and B1 the quantitative results are better. The quality of the agreement is further shown by the quantitative analysis of the ROIs in Fig. 4. The correlations between the CNN and the dictionary matching reconstruction are very high with the Pearson correlation coefficient . Only for T1H2O and T1fat, the agreements are slightly decreased with R2s of 0.919 and 0.927. The Bland-Altman plots show small to no bias for all five parametric maps, and the 95 % limits of agreement are smaller than the dictionary sampling increment for FF, T1fat, f, and B1 (cf. Section 2.1). For T1H2O, the agreement between the methods is approximately 6 sampling steps or . Similar plots for the Oksuz2019 and Fang2019 can be found in Section 2 of the supplementary material. Reconstructing the parametric maps of one subject (five image slices) required approximately 1 second with the proposed method, which is considerably faster than the dictionary matching requiring up to minutes or even hours depending on the implementation (McGivney2019; Marty2019a).
|NRMSE||FF||0.027 0.004||0.069 0.007||0.069 0.006||0.063 0.007||0.030 0.004|
|T1H2O||0.048 0.011||0.097 0.021||0.100 0.023||0.090 0.018||0.054 0.012|
|T1fat||0.212 0.076||0.290 0.095||0.294 0.096||0.287 0.094||0.217 0.077|
|f||0.027 0.008||0.062 0.013||0.062 0.009||0.056 0.008||0.030 0.007|
|B1||0.056 0.009||0.107 0.019||0.117 0.020||0.107 0.018||0.062 0.010|
|PSNR (dB)||FF||31.6 1.04||23.3 0.95||23.4 0.78||24.1 0.93||30.7 1.02|
|T1H2O||28.6 1.75||22.4 1.71||22.2 1.76||23.1 1.54||27.6 1.69|
|T1fat||22.7 1.85||19.9 1.35||19.8 1.30||20.0 1.38||22.5 1.78|
|f||25.1 2.11||17.7 1.96||17.6 1.62||18.5 1.63||24.1 2.02|
|B1||27.8 1.06||22.1 1.09||21.3 0.96||22.2 1.12||26.9 1.05|
|SSIM||FF||0.984 0.011||0.933 0.039||0.934 0.038||0.939 0.036||0.980 0.014|
|T1H2O||0.957 0.026||0.867 0.068||0.867 0.068||0.872 0.066||0.954 0.026|
|T1fat||0.940 0.028||0.866 0.060||0.867 0.058||0.869 0.057||0.937 0.030|
|f||0.933 0.030||0.824 0.075||0.819 0.071||0.834 0.069||0.919 0.035|
|B1||0.965 0.018||0.894 0.052||0.893 0.052||0.895 0.051||0.959 0.021|
3.2 Blurriness of the Parametric Map Reconstructions
In a post hoc analysis, we investigated the blurriness (or smoothness) of the reconstructions of the different methods. We analyzed the energy of the high frequencies in the parametric maps as a metric of blurriness, i.e., the ratio between the energy of the high frequencies and the energy of all frequencies (similar to Section 2.1 of the supplementary material of Fang2019). We defined the high frequencies in the spectrum of the parametric maps to be the frequencies above a certain threshold, which we varied from 55 to 95 % because defining one single threshold to separate low and high frequencies was difficult. The energy was defined as the sum of the squared magnitudes. Fig. 5 compares the blurriness of the T1H2O map reconstruction between the methods (see Section 3 of the supplementary material for the FF, T1fat, f, and B1 maps). For T1H2O, all methods clearly produced smoother reconstructions than the dictionary matching. Further, the visually smoother appearance of the reconstructions of Fang2019 can be confirmed quantitatively. For the FF and f maps, Oksuz2019 resulted in less smoothing than the dictionary matching, which also confirms the noisy appearance in Fig. 3.
3.3 Influence of the Spatial Dimension
The influence of the spatial dimension on the reconstruction quality, or in other words, to what extent the correlation of neighboring fingerprints is beneficial for the reconstruction, is to this date not well studied. Therefore, we varied the receptive field of the proposed CNN from 1 1, which corresponds to fingerprint-wise reconstruction, up to a receptive field of 21 21 using Algorithm 1. We further varied the number of parameters from 1 to 5 million in steps of 1 million (see Section 4 of the supplementary material for configurations). Fig. 6 shows the R2 of the T1H2O map reconstruction depending on the receptive field and the number of parameters777This experiment led to the choice of the final architecture with a receptive field of 15 15 and 5 million parameters. Therefore, the results in Fig. 6 are from the validation set because analyzing the test set would violate the independence of architecture design and test set.. It is visible that fingerprint-wise reconstruction results in significantly inferior reconstructions. Receptive fields around 15 15 seem to perform well with little to no added value when incorporating more fingerprints for the reconstruction. There is no significant change in performance with fewer or more number of parameters. The influence on the parametric maps and metrics except for the T1H2O map and R2 were less accentuated for receptive fields above 5 5. Increasing the receptive field beyond 15 15 had in some cases negative influence (see Section 4 of the supplementary material). Therefore, we did not experiment with receptive fields beyond 21 21. Considering all parametric maps and metrics, we chose a receptive field of 15 15 and 5 million parameters.
3.4 Influence of the Temporal Dimension
The influence of the temporal dimension, or in other words, to what extent the temporal frames contribute to the reconstruction, might be of valuable information for the MRF community. To investigate the influence of the temporal dimension, we reformulated the occlusion experiments proposed by Zeiler2014 for MRF. We occluded the -th temporal frame by setting its content to zero and reconstructed the parametric maps using this occluded MRF data as input. The absolute difference in NRMSE to the non-occluded reconstruction is then considered as the importance of the -th temporal frame of the MRF sequence. The importance of all temporal frames for reconstructing the five parametric maps is shown alongside the MRF sequence in Fig. 7. The first few temporal frames after the non-selective inversion pulse, which should be sensitive to T1, have the highest importance for the reconstruction of the T1H2O map. For T1fat the temporal frames after 125 are the most important when water and fat are out of phase. Generally, the temporal frames after the changes in the MRF T1-FF sequence parameters result in high importance for the reconstruction (i.e. at temporal frames 75, 100, 125, and 150).
3.5 Robustness to Heterogeneous Morphometric Variations
The results show that the proposed method reconstructs highly heterogeneous morphometric variations in NMD patients well. However, it is unclear how many training subjects are actually needed to obtain a model with good robustness. To investigate this, we randomly selected subsets of a varying number of training subjects from the training set, trained the proposed method with these subsets, and reconstructed the testing set to assess the robustness. The whole process was repeated five times. Fig. 8 summarizes the results of this experiment. With 40 training subjects, the proposed method reconstructs almost identically as when using the entire training set with 94 subjects. However, we also observe that the number of required training subjects depends on the metric of interest.
3.6 Generalization to Unseen Anatomical Regions
The generalization of deep learning-based MRF reconstruction methods to unseen anatomical regions during training has not been investigated so far, to the best of our knowledge. Therefore, we imaged three NMD patients at three anatomical regions distinctly different to the thigh and leg: the shoulder, the lower abdomen, and the pelvis. MRF acquisition, reconstruction, and evaluation were identical as described in Section 2.1 and Section 2.5. FF and T1H2O map reconstructions of the proposed method are shown in Fig. 9 and the R2s of the ROI analysis for all parametric maps and methods are summarized in Table 3 (see Section 5 of the supplementary material for all metrics). Qualitatively, the proposed method reconstructed the parametric maps with good quality. In tissues other than skeletal muscle and fatty tissue, the dictionary matching and the proposed reconstruction resulted in noisy parametric maps, which was expected due to the MRF T1-FF sequence’s purpose. Quantitatively, the proposed approach resulted in a slight decrease of performance for FF, f, and B1. For T1H2O and T1fat, the decrease is more significant (cf. Table 3). This decrease was even more accentuated for the method of Fang2019. And, despite working fingerprint-wise, the method of Oksuz2019 resulted in the worst reconstructions for unseen anatomical regions.
We have investigated the reconstruction of parametric maps from MRF using CNNs. Driven by the hypothesis that the reconstruction performance depends on the incorporation of neighboring fingerprints, i.e., the receptive field of the CNN, and the number of parameters, i.e., the capacity of the CNN, we have designed an algorithm for flexible architecture building based on the specific requirements of the MRF sequence to reconstruct. The configuration for MRF T1-FF was empirically determined to be a receptive field of 15 15 with five million parameters. With this configuration, we have shown that the proposed method yields accurate parametric map reconstruction, independent of the morphometric heterogeneity of imaged patients as well as unseen anatomical regions. The method is fast, enabling reconstruction of parametric maps in a clinical setting. Further, as shown qualitatively and quantitatively, better reconstruction results were achieved with the proposed method as compared to other deep learning-based methods.
The proposed method yielded an absolute reconstruction error lower than the dictionary sampling increment for all except the T1H2O map (Fig. 4). Therefore, we argue that there is no difference in reconstruction accuracy between the proposed method and the dictionary matching for the FF, T1fat, f, and B1 maps. Based on the observed differences for T1H2O map reconstructions, we concede that the sensitivity of MRF T1-FF to T1H2O might not be optimal. The fingerprints encode T1H2O to some extent, but for nuances in T1H2O
, they contain probably more noise than discriminative patterns. This observation can also be confirmed when comparing simulated fingerprints with close T1H2O values (see Section 6 of the supplementary material). Further, large receptive fields were mainly beneficial for T1H2O with a lower effect on the other parametric maps. The CNN might compensate for the low signal-to-noise ratio by regularizing spatially. We, therefore, also believe that modifications of the CNN architecture will bring limited additional reconstruction performance and that efforts are better invested at optimizing the MRF sequence than the deep learning-based reconstruction such as done recently (Cohen2017a; Zhao2019; Lee2019a).
We have analyzed that the receptive field, i.e., considering a spatial neighborhood of fingerprints, influences the reconstruction performance. On the one hand, fingerprint-wise reconstruction (receptive field of 1 1) is significantly inferior to spatial reconstruction. We attribute this mainly to the potentially high correlation of neighboring fingerprints coupled with the strong undersampling of MRF. On the other hand, spatial reconstruction has improved the reconstruction only to some extent (Fig. 6). Regarding the blurriness of the reconstruction, the optimal receptive field is a difficult choice but it lies likely between fingerprint-wise reconstruction and the large receptive field of Fang2019 (Fig. 5). Further, we have observed a larger decrease in performance for Fang2019 when reconstructing unseen anatomical regions. We attribute this decrease to the method’s large receptive field of 54 54 where fingerprints without valuable information, possibly influenced by susceptibility artifacts, were included in the reconstruction (cf. noisy tissues in the parametric maps). Therefore, we conclude that pooling operations with subsequent deconvolution operations, as e.g. in the U-Net-like architecture of Fang2019, are not needed for MRF reconstruction. Clearly, spatial regularization is superior to fingerprint-wise reconstruction but its extent dependents almost certainly on the MRF sequence due to various factors such as the sensitivity to the MR parameters, k-space sampling, in-plane voxel size, among others. But with the proposed algorithm, investigating this aspect becomes straightforward due to the CNN’s adaptability.
We have studied the influence of the temporal frames on the parametric map reconstruction. Such insights might be useful for further developments of MRF reconstruction, as well as the MRF sequence development itself. As expected, the inversion pulse yields high importance to the first few temporal frames for T1 parameters. The general correlation between abrupt sequence parameter changes and high importances hints at highly sensitive temporal frames to MR parameters, and, therefore, rich information for the reconstruction. Fang2019 proposed to reduce the time of the MRF acquisition by considering only the first fractions of the temporal frames. However, such an approach, although being straightforward, might be suboptimal considering that not all temporal frames might be of equal importance for the MRF reconstruction. For instance, in the case of MRF T1-FF, the temporal frames 50 to 75 as well as the last 25 temporal frames might be useless for the reconstruction, containing maybe redundant or irrelevant information. An acceleration of the MRF acquisition might be achieved by discarding these temporal frames without sacrificing reconstruction performance.
We have demonstrated an excellent robustness of the proposed method to heterogeneous morphometric variations and unseen anatomical regions, a desired key property for image reconstruction (Knoll2019). Previous MRF reconstruction studies were performed on small cohorts of healthy volunteers (Balsiger2018b; Fang2018; Fang2019), limiting their clinical significance. Here, we have presented, to the best of our knowledge, the first study on reconstructing highly undersampled MRF of patient data only (Table 1). To study the robustness of a method, NMDs are an excellent subject given their large phenotypic variability, the broad range of affected anatomical regions as well as patient age distribution. Our approach seems to be rather insensitive to such variability, which we also attribute to the large and heterogeneous training data. We have found that it is also possible to achieve good robustness with fewer training data (Fig. 8). However, the robustness is dependent on the parametric map as well as the desired property of the reconstruction (e.g., structural similarity and signal-to-noise). Further, we have found that the number of parameters of the CNN is a rather insensitive characteristic (Fig. 6). Nor a decrease in performance with fewer neither overfitting with more learnable parameters have been observed. Reproducing the results of Fig. 8 with varying number of parameters might give additional insights into a possible link between robustness and the number of parameters but is computationally unfeasible due to the immense number of training runs needed.
Our study has several limitations, which we plan to address in future work. First and foremost, the absence of a better reference for comparison than the dictionary matching is a significant issue. Ideally, the CNN reconstruction should be compared to parametric maps obtained by gold standard parametric mapping (Balsiger2018b). However, while this is possible for FF (e.g., using 3-point Dixon (Glover1991)), there exists no MR sequence for T1H2O mapping in fatty infiltrated tissue. Second, the optimal MRF data handling is still subject to further research. We have investigated several variants (see Section 1.3 of the supplementary material), but there are certainly open questions such as the complex-valued nature of the MRF data, e.g., reconstruction by complex-valued CNN (Virtue2017; Trabelsi2018). Also, the modeling of the temporal domain, e.g., by RNNs as presented by Oksuz2019 or by 3-D CNNs, needs further research. Finally, we can not make any statements of the performance on other MRF sequences. Ideally, a completely independent dataset acquired with another MRF sequence and with other diseases should be available to demonstrate applicability among MRF sequences.
In conclusion, we proposed an adaptable CNN for accurate and fast reconstruction of parametric maps from MRF. We demonstrated that incorporating a spatial neighborhood of fingerprints during the reconstruction is beneficial and that we achieved excellent reconstruction accuracy and robustness to heterogeneous patient data. The proposed method could enable MRF beyond clinical research studies.
This research was supported by the Swiss National Science Foundation (SNSF) under grant number 184273. The authors thank the Nvidia Corporation for their GPU donation and acknowledge the valuable discussions with Pierre-Yves Baudin.