1 Introduction
Magnetic Resonance Fingerprinting (MRF) recently emerged to accelerate the acquisition of quantitative NMR characteristics such as the T1, T2 and T2* relaxation times, field inhomogeneity and perfusion [1, 2, 3, 4]. As opposed to mainstream qualitative assessments, these absolute physical quantities can be used for tissue or pathology identification independently of the scanner or scanning sequence. Unlike conventional quantitative approaches, MRF uses i) short and often complicated excitation pulses which encode many NMR parameters simultaneously, and ii) significantly undersampled k-space data. To overcome the lack of sufficient spatiotemporal information, MRF incorporates a physical model based on exhaustively simulating a large dictionary of magnetic responses (fingerprints) for all combinations of the quantized NMR parameters. This dictionary is then used for matched-filtering in a model-based reconstruction scheme, e.g. [5].
As with any multiparametric manifold enumeration, the main drawback of such an approach is the size of the dictionary, which grows exponentially with the number of parameters and their quantization resolution; this non-scalability seriously limits the applicability of current methods to the emerging multiparametric MRF applications. Following the widespread adoption of machine learning methodologies, a number of recent empirical studies have proposed a dictionary-free deep learning (DL) approach to address this shortcoming [6, 7, 8, 9], the crux of which is to bypass the DM step by using compact deep neural networks. However, the reasons why DL works so well for this problem are poorly understood.
This paper aims at uncovering the underlying mechanisms by which DL achieves such progress for the MRF framework, from a geometrical point of view. We show that the MRF-Net provides a piecewise affine approximation to the Bloch response manifold projection and that, rather than memorizing the dictionary, the network efficiently clusters this manifold layer by layer and implicitly learns a set of hierarchical matched-filters for affine parameter regression in each segment. In addition, we propose an architecture competitive with the current DL baselines. The proposed MRF-Net features an (unsupervisedly learned) dimensionality-reducing first layer which promotes a low-rank subspace prior during image reconstruction, and requires fewer units and training resources than the uncompressed DL approaches proposed earlier for this problem. Our in-vivo experiment for estimating two NMR quantities (i.e. a small-size MRF problem) indicates that the MRF-Net is capable of saving more than 60x in storage and model-fitting computations compared to a dimension-reduced DM baseline (Figure 1).

2 Problem statement
MRF acquisitions follow a linear spatiotemporal model Y ≈ A(X), where Y denotes the noisy k-space measurements collected at the temporal frames after each excitation. The MRF image sequence X is a complex-valued matrix whose rows correspond to spatial voxels and whose columns correspond to temporal frames. The forward operator A models the multi-coil sensitivity maps and a subsampled Fourier operator representing the k-space acquisition with respect to a set of temporally-varying sampling locations.
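The forward model above can be made concrete with a small sketch. This is an illustrative toy only: the array sizes, the random mask density and the Cartesian FFT are our own simplifying assumptions, not the paper's spiral acquisition.

```python
import numpy as np

def mrf_forward(X, coils, masks):
    """Toy MRF forward operator A: per-frame coil weighting followed by a
    subsampled 2-D FFT (a Cartesian stand-in for the spiral readout).

    X     : (L, N, N) complex image sequence (L temporal frames)
    coils : (C, N, N) complex coil sensitivity maps
    masks : (L, N, N) boolean k-space sampling pattern, one per frame
    Returns a list of L arrays, each (C, n_samples) of sampled k-space data.
    """
    Y = []
    for l in range(X.shape[0]):
        frames = np.fft.fft2(coils * X[l][None])   # (C, N, N) k-space per coil
        Y.append(frames[:, masks[l]])              # keep only sampled locations
    return Y

# usage: 2 coils, 8 frames, a 16x16 grid, ~15% random undersampling per frame
rng = np.random.default_rng(0)
N, L, C = 16, 8, 2
X = rng.standard_normal((L, N, N)) + 1j * rng.standard_normal((L, N, N))
coils = np.ones((C, N, N), dtype=complex)
masks = rng.random((L, N, N)) < 0.15
Y = mrf_forward(X, coils, masks)
```

Each temporal frame gets its own sampling pattern, which is what makes the aliasing artifacts temporally incoherent and the temporal model informative.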
The main source of quantitative measurements is the per-voxel magnetization response of proton dipoles, obtained by dynamically rotating the external magnetic field, i.e. a sequence of Flip Angles (FA) α applied at certain repetition times (TR). Tissues with different NMR characteristics respond distinctively to these excitations. The MRF framework relies on this principle to regularize the underdetermined reconstruction problem with a temporal model and enable parameter estimation. Magnetization trajectories (responses) are distinct solutions of the Bloch differential equations for a given set of intrinsic NMR parameters Θ and excitation sequence [10]. Current MRF approaches discretize the parameter space through a dense sampling and simulate a large dictionary of normalized fingerprints, D_j := B([Θ_j]; TR, α), ∀ j = 1, …, d, for all combinations of the quantized parameters. Under the voxel-purity assumption, each spatial voxel of the MRF image corresponds to a unique set of NMR parameters and approximately matches a temporal trajectory in the fingerprint dictionary, where X_v denotes the normalized v-th row of X, i.e. a multidimensional spatial voxel.
3 Parameter estimation
A popular approach for parameter estimation is to backproject (apply the adjoint operator to) the k-space data and then perform dictionary matching to identify the most correlated atom, and its corresponding NMR parameters, for each (normalized) voxel of the highly aliased backprojected image: [Θ_v] = NNS_D(X̂_v), ∀ v = 1, …, n. Here NNS denotes the nearest-neighbour search, which serves as a Euclidean projection onto the discrete set of fingerprints, i.e. the manifold of Bloch equation solutions. A temporal (subspace) compression can be used to shrink the search dimension, i.e. to search across the dominant principal components of the dictionary [11], and to promote a low-rank subspace prior during image reconstruction [12, 13]. However, enumerating the multiparametric MRF manifold finely enough for the NNS to be an accurate projection introduces an exponentially growing complexity (in the number of parameters) in the storage and computations needed for conducting the NNS. A recent line of research [14, 15] shows that certain tree-search strategies can benefit from the low intrinsic dimensionality of the MRF manifold and significantly accelerate the matching step. However, storing the dictionary or the corresponding tree still remains a major challenge for fine-grid enumerations.
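As a reference point for the complexity discussion, brute-force dictionary matching amounts to one large correlation followed by an arg-max per voxel. A minimal sketch (all dimensions arbitrary):

```python
import numpy as np

def dictionary_match(X, D):
    """Brute-force NNS: for each voxel trajectory, return the index of the
    dictionary fingerprint with the highest absolute correlation."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)   # normalize voxels
    corr = np.abs(Xn @ D.conj().T)                      # (n_voxels, d) matched-filter scores
    return corr.argmax(axis=1)                          # per-voxel best atom

# usage: recover the atom indices of three (scaled) dictionary fingerprints
rng = np.random.default_rng(0)
D = rng.standard_normal((50, 30))
D /= np.linalg.norm(D, axis=1, keepdims=True)           # unit-norm atoms
X = 7.0 * D[[4, 0, 42]]                                 # arbitrary per-voxel scaling
print(dictionary_match(X, D))                           # [ 4  0 42]
```

The per-voxel cost and memory scale linearly with the dictionary size d, which is exactly what grows exponentially with the number of encoded parameters.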
3.1 MRFNet
In this study we propose training a 4-layer fully connected feed-forward network, dubbed the MRF-Net, to approximate the MRF manifold projection by a continuous mapping: Θ_v = f(X̂_v). The first layer of the MRF-Net unsupervisedly learns the best linear projection onto the subspace of clean fingerprints through principal component analysis, and it is kept fixed while the other layers are trained. The three remaining layers include nonlinear ReLU activations in order to approximate the dimension-reduced projection function.
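A minimal numerical sketch of such an architecture follows; the hidden-layer widths and the subspace dimension s are illustrative placeholders, not the paper's exact configuration.

```python
import numpy as np

def pca_layer(D, s):
    """Unsupervised first layer: the top-s right singular vectors of the
    clean dictionary D (d x L) span the fingerprint subspace."""
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    return Vt[:s].T                                     # (L, s) projection

class MRFNetSketch:
    """Minimal forward-pass sketch of the MRF-Net: a fixed PCA projection
    followed by three ReLU layers regressing the NMR parameters.
    Hidden widths here are illustrative, not the paper's choices."""
    def __init__(self, V, widths=(64, 32, 2), seed=0):
        rng = np.random.default_rng(seed)
        self.V = V                                      # fixed during training
        dims = [V.shape[1], *widths]
        self.W = [rng.standard_normal((m, n)) * np.sqrt(2.0 / n)
                  for n, m in zip(dims[:-1], dims[1:])]
        self.b = [np.zeros(m) for m in dims[1:]]

    def forward(self, x):
        h = x @ self.V                                  # identity-activation PCA layer
        for W, b in zip(self.W, self.b):
            h = np.maximum(h @ W.T + b, 0.0)            # ReLU; the last one keeps outputs nonnegative
        return h

# usage: build the PCA layer from a toy dictionary, then map 5 voxels to 2 parameters
rng = np.random.default_rng(1)
D = rng.standard_normal((200, 100))                     # 200 atoms, 100 frames
net = MRFNetSketch(pca_layer(D, s=10))
out = net.forward(rng.standard_normal((5, 100)))        # (5, 2) nonnegative estimates
```

The point of the fixed first layer is that all downstream weights act on s-dimensional inputs rather than on the full temporal length, which is where the savings over uncompressed networks come from.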
The MRF-Net comprises an input layer fed with voxel sequences from the backprojected images and 4 hidden layers, as shown in Figure 2. The dimensions of the input/output and hidden units are customized here for the Steady State Precession (FISP) sequence [2], which encodes the T1 and T2 relaxation times. The MRF dictionary corresponding to the FISP sequence has been shown to be well represented by very few principal components [11], which determines the first layer's dimension accordingly. Thanks to this dimensionality reduction, the MRF-Net requires far fewer units and training resources than the uncompressed DL approaches proposed earlier in [6, 7].

3.1.1 Training the MRF-Net
The fine-grid manifold enumeration, i.e. the MRF dictionary, is only used for training and not during image reconstruction. To avoid losing the discrimination between fingerprints incurred by the magnitude-only data treatment proposed in [6], we adopt a phase-alignment heuristic used in practice [16, 17] to align the dictionary atoms (for training) and the backprojected images (the input). With this treatment we can assume, without loss of generality, that the MRF-Net consists of real-valued parameters and approximates a real-valued mapping.

A fully connected feed-forward network is composed of N (here N = 4) layers, each applying an affine transform followed by a nonlinear activation function. The network can be modelled through a hierarchy of hidden mappings initialized by the input x = h^(0):

h^(n) = g^(n)( W^(n) h^(n-1) + β^(n) ),  n = 1, …, N.    (1)
Here W^(n) and β^(n) are the weights and biases at the n-th layer and g^(n) is an element-wise nonlinear activation function. The first layer of the MRF-Net uses the identity activation and unsupervisedly trained parameters. The remaining layers use ReLU activations. To learn these layers, the MRF-Net is trained in a supervised fashion to minimize the mean-squared regression loss between the predicted and the reference parameters.^1

^1 We use the Adam optimizer [18], where the gradient updates are computed by standard backpropagation on a moderate CPU desktop. Optimization parameters are as follows: batch size 50, 30 epochs, and a step size decaying after each epoch.

Training inputs are dimension-reduced (phase-aligned) atoms of the fine-grid MRF dictionary corrupted by zero-mean independent Gaussian noise with SNRs randomly selected between 40–60 dB. We use the Extended Phase Graph framework [19] to simulate the Bloch equation responses to the FISP sequence for all combinations of T1 = [100:10:4000] (msec) and T2 = [20:2:600] (msec), and build a dictionary for training. After the noise corruption (i.e. data augmentation by a factor of 100) we perform NNS searches to find the correct training labels (and not those that originally generated the fingerprints): Θ_k := NNS_D̃(D̃_k + ξ_k), which enables learning a projection mapping rather than a denoiser.

4 Geometry of deep learning for Bloch response manifold projection
In this part we show that the MRF-Net provides a piecewise affine approximation to the Bloch response manifold projection. Our analysis is inspired by the recent work [20] and relates to the authors' previous works [14, 16] on adopting cover-tree data structures to cluster dictionary atoms into hierarchical segments and using fast approximate NNS for dictionary matching (DM). We show that the MRF-Net also clusters the input space; however, as opposed to the cover trees, the network does not memorize the MRF dictionary but rather efficiently encodes a set of deep (hierarchical) matched-filters for affine regression of the NMR quantities in each segment. Further, the segments here have piecewise affine boundaries.
4.1 Affine spline function approximation
For a network composed of piecewise linear activation functions such as ReLU and the (linear) identity activation, each layer's output is a piecewise affine transformation of its direct input. The composition of such layers gives mappings which are piecewise affine transformations of the input (see e.g. [21]). Further, using continuous activation functions (as above) and bounded weights, the resulting mapping is Lipschitz continuous.
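This piecewise-affine structure is easy to verify numerically on a toy ReLU network with arbitrary random weights: freezing the activation pattern at a point x yields an affine piece A_r x + b_r that reproduces the network output at x exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.standard_normal((6, 3)), rng.standard_normal(6)   # hidden ReLU layer
W2, b2 = rng.standard_normal((2, 6)), rng.standard_normal(2)   # linear output

def z(x):
    """Toy network: one ReLU layer followed by a linear (pre-activation) output."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

x = rng.standard_normal(3)
q = (W1 @ x + b1 > 0).astype(float)          # activation pattern at x
A = W2 @ (q[:, None] * W1)                   # slope A_r of the local affine piece
b = W2 @ (q * b1) + b2                       # offset b_r of the local affine piece
assert np.allclose(z(x), A @ x + b)          # z(x) = A[x] x + b[x] holds exactly
```

Every input with the same activation pattern q lands on the same affine piece (A, b); the patterns therefore induce the disjoint segments discussed next.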
In the MRF-Net the last nonlinearity is mainly used to impose nonnegativity of the estimated parameters, and therefore most of the prediction task is done by the preceding layers. We denote by z^(N)(x) := W^(N) h^(N-1) + β^(N) the weighted outputs before the last nonlinearity. We have the following affine spline representation for the weighted outputs [20]: z^(N)(x) = A[x]x + b[x] := ∑_r ( A_r x + b_r ) 1_{Ω_r}(x), where 1_{Ω_r} is the indicator function with respect to a segment Ω_r, returning 1 if x belongs to the segment and 0 otherwise; the segments form a disjoint partitioning of the input space. The matrices A_r and vectors b_r define the corresponding input-output affine mapping for each segment. We use the shorthands A[x] and b[x] to represent the input-dependent (piecewise affine) mapping: input-dependent offsets b[x] and an input-dependent matrix A[x], each row of which represents a matched-filter (acting on the input and measuring mutual correlation) corresponding to a certain output coordinate. The matched-filters and offsets here are used for regressing the outputs; they are not to be confused with the MRF dictionary matching (DM). In other words, during training the network learns A_r and b_r, or equivalently A[x] and b[x], to provide a continuous and piecewise affine approximation between input and output, e.g. the Bloch manifold projection for the MRF reconstruction problem. The universal approximation theorem [22] states that a 2-layer shallow network with sufficiently many units can approximate any Borel-measurable function arbitrarily closely. Deeper networks, however, are often favoured in practice because they efficiently reduce the number of hidden units [23]. For certain manifold embedding tasks this has been proven, e.g. in [24, 25, 26, 27].
4.2 Visualizing MRFNet’s segments on Bloch manifold
Finite-sized networks with bounded weights and biases introduce piecewise affine boundaries for their corresponding segments [21]. It is easy to verify that each layer in (1) with the aforementioned nonlinearities introduces segments with piecewise affine boundaries in its direct input space, and that each segment corresponds to an affine transformation of that input. Therefore, the composition of such layers results in piecewise affine segments in the input space of the function z^(N).
Continuity of the mapping implies that adjacent segments correspond to distinct matrices A_r. Indeed, if A_r = A_{r'} and the only difference is in the offsets b_r ≠ b_{r'}, then the two affine pieces never intersect on a boundary; such segments cannot be adjacent without contradicting the continuity assumption.
This remark suggests a way of visualizing the segments: for densely sampled input signals x, we compute the gradients of the weighted output z^(N) with respect to the input. The gradients determine the input-dependent slopes in the affine spline formulation (4.1), i.e. the rows of A[x] at a point x are populated as follows:^2

^2 For a matrix A we denote by A_{i,:} its i-th row. For a vector b, b_i denotes its i-th element. We later denote by e_i the coordinate vector whose i-th element is one and zero elsewhere.
(A[x])_{i,:} = ( ∇_x z_i^(N)(x) )^T.    (2)
For a given input x the gradients in (2) can be efficiently calculated using backpropagation. We feed x forward to identify all the activations and then follow the recursion starting at the vector e_i:
u^(N-1) = (W^(N))^T e_i,   u^(n-1) = (W^(n))^T ( u^(n) ⊙ g^(n)′(z^(n)) ),  n = N-1, …, 1,    (3)

so that ∇_x z_i^(N)(x) = u^(0),
where ⊙ denotes the element-wise vector product and g^(n)′ denotes the (element-wise) derivative of the activation function. For the identity activation this derivative is an all-one vector, and for the ReLU activation it returns one for the positive (direct) input coordinates and zero elsewhere. By vector quantization (e.g. k-means clustering) we cluster the regions of the input space which output distinct slopes and thereby identify the segments. We note that, for a classification problem, [20] used a similar idea to separately identify each hidden layer's segments and then intersect them in a hierarchical (layer-by-layer) fashion.

Fine sampling can be used to visualize the intersection of the MRF-Net's segments with the Bloch response manifold. For this purpose we compute (2) for inputs corresponding to a dense sampling of the (T1, T2) grid, i.e. the MRF dictionary. Figure 3 visualizes the dominant MRF-Net segments on the manifold of Bloch responses to the FISP sequence used in our experimental validations.
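A sketch of this procedure on a toy one-hidden-layer ReLU network: the slope at each sampled input is obtained in closed form from the activation mask (equivalent to running the backward recursion), and grouping the distinct slope vectors — by `np.unique` here, or by k-means on larger nets — identifies the segments.

```python
import numpy as np

rng = np.random.default_rng(2)
W1, b1 = rng.standard_normal((4, 2)), rng.standard_normal(4)
W2 = rng.standard_normal((1, 4))             # a single weighted output z(x)

def slope(x):
    """Row of A[x]: gradient of the weighted output w.r.t. the input,
    i.e. W2 diag(g'(z1)) W1, with g' the 0/1 ReLU derivative mask."""
    mask = (W1 @ x + b1 > 0).astype(float)
    return (W2 * mask) @ W1                  # (1, 2) input-dependent slope

# densely sample the input plane and group inputs by their slope vector
xs = rng.uniform(-3.0, 3.0, size=(500, 2))
slopes = np.vstack([slope(x) for x in xs])
segments = np.unique(np.round(slopes, 8), axis=0)
assert len(segments) <= 2 ** 4               # at most one slope per activation pattern
```

With 4 hidden ReLU units there are at most 2^4 activation patterns, so at most 16 distinct slopes; in the MRF setting the same computation is run over the dictionary samples rather than a uniform grid.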
4.3 Deep matchedfiltering
The identity (4.1) interprets how data is treated by our network. Ignoring the offsets, the (two) rows of A[x] correlate with the inputs belonging to a segment and predict the T1 and T2 values. Each segment of the input space has a distinct set of (two) matched-filters, i.e. rows of A[x], whose correlations with the input data (belonging to that segment) linearly regress the outputs. These end-to-end matched-filtering parameters are implicitly learned during training, through the corresponding layer-by-layer affine transformations. In Figure 4 we choose two input segments that include the standard values measured for White Matter (WM) and Gray Matter (GM) in healthy volunteers' brains [28]. We show the clean magnetic responses (i.e. fingerprints) associated with each region together with the matched-filters used for predicting the T1 and T2 quantities. As can be seen, the matched-filters peak at discriminant parts of the fingerprints, that is, where the sequence encodes a significant amount of information (i.e. sensitivity) about the underlying NMR characteristics in that segment. These peaks are visible at the beginning of the FISP sequence, due to the use of an Inversion Recovery pulse (i.e. a 180° rotation), and they repeat because of the periodic pattern of flip angles used for the FISP excitations (see [2, Figure 1.b]).

5 In-vivo experiment
An in-vivo MRF dataset was acquired using the Steady State Precession (FISP) sequence of [2] with spiral readouts which sample a subset of the k-space locations in each time frame, in order to reconstruct parametric T1 and T2 maps. Other scanning parameters are: TE = 2/18 msec, 8 head coils, a 3 Tesla GE HDx MRI system (GE Medical Systems, Milwaukee, WI), variable-density spiral sampling with 89 interleaves, 22.5x22.5 cm FOV, 256x256 voxel spatial resolution with 5 mm slice thickness. As discussed in Section 3.1.1, we simulate a dictionary which finely samples the (T1, T2) grid. The baseline DM scheme requires direct access to this dictionary for the NNS searches, whereas the proposed DL approach only uses it for data augmentation and training the MRF-Net. Figure 1 compares the parametric maps reconstructed by DM with brute-force searches and by the proposed MRF-Net. Note that the computation-memory complexity of dimension-reduced dictionary matching (without a fast tree search) is in this example more than 60 times higher than the requirements of the (dimension-reduced) MRF-Net. This comparison is on a moderate-size MRF dictionary encoding only two parameters, and we expect that for emerging applications with dictionaries encoding a large number of intrinsic NMR characteristics, e.g. field inhomogeneity, perfusion, diffusion, etc., this gap grows substantially. We leave this direction for future investigation.
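To give a sense of scale, the grid stated in Section 3.1.1 already implies a sizeable dictionary for this two-parameter problem. A back-of-the-envelope count (adding further parameters multiplies this number by each new grid size):

```python
# Dictionary size implied by the fine grid of Section 3.1.1
# (T1 = [100:10:4000] msec, T2 = [20:2:600] msec, MATLAB-style ranges).
n_t1 = len(range(100, 4001, 10))   # 391 T1 samples
n_t2 = len(range(20, 601, 2))      # 291 T2 samples
d = n_t1 * n_t2
print(d)                           # 113781 atoms the DM baseline must store and search
```

The MRF-Net, by contrast, only touches these atoms at training time; at reconstruction time its footprint is the (dimension-reduced) network weights.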
6 Conclusion
In this paper we study a dictionary-free deep learning approach for the MRF reconstruction problem. Featuring a subspace compression in its first layer, the proposed MRF-Net is compact, easily trained, and capable of achieving estimation accuracy comparable to a dimension-reduced DM baseline with 60 times less storage and computation. The MRF dictionary is only used for training and not during image reconstruction. We show that the MRF-Net provides a piecewise affine approximation to the Bloch response manifold projection, through which the network efficiently clusters the input space and learns hierarchical matched-filters for affine regression of the quantitative parameters in each segment. Future directions could extend this work to applications with a large number of intrinsic NMR characteristics, as well as incorporating spatial regularities by e.g. using convolutional networks [9].
References
 [1] D. Ma, V. Gulani, N. Seiberlich, K. Liu, J. Sunshine, J. Duerk, and M. Griswold, “Magnetic resonance fingerprinting,” Nature, vol. 495, no. 7440, pp. 187–192, 2013.
 [2] Y. Jiang, D. Ma, N. Seiberlich, V. Gulani, and M. Griswold, “MR fingerprinting using fast imaging with steady state precession (FISP) with spiral readout,” Magnetic resonance in medicine, vol. 74, no. 6, pp. 1621–1631, 2015.
 [3] K. L. Wright, Y. Jiang, D. Ma, D. C. Noll, M. A. Griswold, V. Gulani, and L. Hernandez-Garcia, “Estimation of perfusion properties with MR fingerprinting arterial spin labeling,” Magnetic resonance imaging, vol. 50, pp. 68–77, 2018.
 [4] B. Rieger, F. Zimmer, J. Zapp, S. Weingärtner, and L. R. Schad, “Magnetic resonance fingerprinting using echoplanar imaging: Joint quantification of T1 and T2* relaxation times,” Magnetic resonance in medicine, vol. 78, pp. 1724–1733, 2017.
 [5] M. Davies, G. Puy, P. Vandergheynst, and Y. Wiaux, “A compressed sensing framework for magnetic resonance fingerprinting,” SIAM Journal on Imaging Sciences, vol. 7, no. 4, pp. 2623–2656, 2014.
 [6] O. Cohen, B. Zhu, and M. S. Rosen, “MR fingerprinting deep reconstruction network (DRONE),” Magnetic resonance in medicine, vol. 80, no. 3, pp. 885–894, 2018.
 [7] P. Virtue, S. X. Yu, and M. Lustig, “Better than real: Complex-valued neural nets for MRI fingerprinting,” in Image Processing (ICIP), 2017 IEEE International Conference on. IEEE, 2017, pp. 3953–3957.
 [8] E. Hoppe, G. Körzdörfer, T. Würfl, J. Wetzl, F. Lugauer, J. Pfeuffer, and A. Maier, “Deep learning for magnetic resonance fingerprinting: A new approach for predicting quantitative parameter values from time series.” Studies in health technology and informatics, vol. 243, p. 202, 2017.

 [9] F. Balsiger, A. S. Konar, S. Chikop, V. Chandran, O. Scheidegger, S. Geethanath, and M. Reyes, “Magnetic resonance fingerprinting reconstruction via spatiotemporal convolutional neural networks,” in International Workshop on Machine Learning for Medical Image Reconstruction. Springer, 2018, pp. 39–46.
 [10] E. Jaynes, “Matrix treatment of nuclear induction,” Physical Review, vol. 98, no. 4, p. 1099, 1955.
 [11] D. F. McGivney, E. Pierre, D. Ma, Y. Jiang, H. Saybasili, V. Gulani, and M. A. Griswold, “SVD compression for magnetic resonance fingerprinting in the time domain,” IEEE transactions on medical imaging, vol. 33, no. 12, pp. 2311–2322, 2014.
 [12] J. Assländer, M. A. Cloos, F. Knoll, D. K. Sodickson, J. Hennig, and R. Lattanzi, “Low rank alternating direction method of multipliers reconstruction for MR fingerprinting,” Magnetic resonance in medicine, vol. 79, no. 1, pp. 83–96, 2018.
 [13] B. Zhao, K. Setsompop, E. Adalsteinsson, B. Gagoski, H. Ye, D. Ma, Y. Jiang, P. Ellen Grant, M. A. Griswold, and L. L. Wald, “Improved magnetic resonance fingerprinting reconstruction with lowrank and subspace modeling,” Magnetic resonance in medicine, vol. 79, no. 2, pp. 933–942, 2018.
 [14] M. Golbabaee, Z. Chen, Y. Wiaux, and M. E. Davies, “Cover tree compressed sensing for fast MR fingerprint recovery,” in 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), Sept 2017, pp. 1–6.
 [15] M. Golbabaee and M. E. Davies, “Inexact gradient projection and fast data driven compressed sensing,” IEEE Transactions on Information Theory, vol. 64, no. 10, pp. 6707–6721, 2018.
 [16] M. Golbabaee, Z. Chen, Y. Wiaux, and M. Davies, “CoverBLIP: accelerated and scalable iterative matched-filtering for magnetic resonance fingerprint reconstruction,” arXiv preprint arXiv:1810.01967, 2018.
 [17] C. C. Cline, X. Chen, B. Mailhe, Q. Wang, J. Pfeuffer, M. Nittka, M. A. Griswold, P. Speier, and M. S. Nadar, “AIR-MRF: Accelerated iterative reconstruction for magnetic resonance fingerprinting,” Magnetic resonance imaging, vol. 41, pp. 29–40, 2017.
 [18] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
 [19] M. Weigel, “Extended phase graphs: Dephasing, RF pulses, and echoes - pure and simple,” Journal of Magnetic Resonance Imaging, vol. 41, no. 2, pp. 266–295, 2015.
 [20] R. Balestriero and R. Baraniuk, “A spline theory of deep learning,” in Proceedings of the 35th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, vol. 80. PMLR, 10–15 Jul 2018, pp. 374–383.
 [21] G. F. Montufar, R. Pascanu, K. Cho, and Y. Bengio, “On the number of linear regions of deep neural networks,” in Advances in neural information processing systems, 2014, pp. 2924–2932.

 [22] G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Mathematics of control, signals and systems, vol. 2, no. 4, pp. 303–314, 1989.
 [23] O. Delalleau and Y. Bengio, “Shallow vs. deep sum-product networks,” in Advances in Neural Information Processing Systems, 2011, pp. 666–674.
 [24] R. Basri and D. Jacobs, “Efficient representation of low-dimensional manifolds using deep networks,” arXiv preprint arXiv:1602.04723, 2016.
 [25] U. Shaham, A. Cloninger, and R. R. Coifman, “Provable approximation properties for deep neural networks,” Applied and Computational Harmonic Analysis, 2016.
 [26] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst, “Geometric deep learning: going beyond euclidean data,” IEEE Signal Processing Magazine, vol. 34, no. 4, pp. 18–42, 2017.
 [27] D. Chen, J. Lv, and Y. Zhang, “Unsupervised multimanifold clustering by learning deep representation,” in AAAI Workshops, 2017.
 [28] J. P. Wansapura, S. K. Holland, R. S. Dunn, and W. S. Ball Jr, “NMR relaxation times in the human brain at 3.0 Tesla,” Journal of Magnetic Resonance Imaging, vol. 9, no. 4, pp. 531–538, 1999.