Magnetic Resonance Fingerprinting (MRF) recently emerged to accelerate the acquisition of quantitative NMR characteristics such as the T1, T2 and T2* relaxation times, field inhomogeneity and perfusion [1, 2, 3, 4]. As opposed to mainstream qualitative assessments, these absolute physical quantities can be used for tissue or pathology identification independent of the scanner or scanning sequence. Unlike conventional quantitative approaches, MRF uses i) short and often complicated excitation pulses which encode many NMR parameters simultaneously, and ii) significantly undersampled k-space data. To overcome the lack of sufficient spatio-temporal information, MRF incorporates a physical model based on exhaustively simulating a large dictionary of magnetic responses (fingerprints) for all combinations of the quantized NMR parameters. This dictionary is then used for matched-filtering in a model-based reconstruction scheme, e.g. [5]. As with any multi-parametric manifold enumeration, the main drawback of such an approach is the size of the dictionary, which grows exponentially with the number of parameters and their quantization resolution; this non-scalability seriously limits the applicability of current methods to the emerging multi-parametric MRF applications. Alongside the widespread adoption of machine learning methodologies, a number of recent empirical studies have proposed a dictionary-free deep learning (DL) approach to address this shortcoming [6, 7, 8, 9], the crux of which is to bypass the dictionary matching (DM) step by using compact deep neural networks. However, the reasons why DL works so well for this problem are poorly understood.
This paper aims at uncovering, from a geometrical point of view, the underlying mechanisms by which DL achieves such progress for the MRF framework. We show that the MRF-Net provides a piece-wise affine approximation to the Bloch response manifold projection and that, rather than memorizing the dictionary, the network efficiently clusters this manifold layer-by-layer and implicitly learns a set of hierarchical matched-filters for affine parameter regression in each segment. In addition, we propose an architecture competitive with the current DL baselines. The proposed MRF-Net features an (unsupervisedly learned) dimensionality-reducing first layer which promotes a low-rank subspace prior during image reconstruction, and requires fewer units and training resources than the uncompressed DL approaches earlier proposed for this problem. Our in-vivo experiment for estimating two NMR quantities (i.e. a small-size MRF problem) indicates that the MRF-Net saves more than 60x in storage and model-fitting computations compared to a dimension-reduced DM baseline (Figure 1).
2 Problem statement
MRF acquisitions follow a linear spatio-temporal model Y ≈ A(X), where Y denotes the noisy k-space measurements collected at L temporal frames after each excitation. The MRF image sequence X is a complex-valued matrix of spatio-temporal resolution n × L, i.e. n spatial voxels and L temporal frames. The forward operator A models the multi-coil sensitivity maps and a sub-sampled Fourier operator which represents the k-space acquisition with respect to a set of temporally-varying sampling locations.
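A toy single-coil, 1-D sketch of this acquisition model (illustrative only; the actual operator involves multi-coil sensitivity maps and spiral, non-Cartesian sampling):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-coil 1-D acquisition: n spatial locations, L temporal frames.
n, L = 64, 200
X = rng.standard_normal((n, L)) + 1j * rng.standard_normal((n, L))

# A different random k-space sub-sampling mask for every temporal frame.
masks = rng.random((n, L)) < 0.15          # keep ~15% of locations per frame

def forward(X, masks):
    """A(X): per-frame orthonormal FFT followed by k-space sub-sampling."""
    return np.fft.fft(X, axis=0, norm="ortho") * masks

def adjoint(Y, masks):
    """A^H(Y): masking and zero-filled inverse FFT (the back-projection)."""
    return np.fft.ifft(Y * masks, axis=0, norm="ortho")

Y = forward(X, masks)          # toy measurements, Y ≈ A(X)
X_bp = adjoint(Y, masks)       # heavily aliased back-projected image sequence
```

Since the FFT here is orthonormal and the mask is a 0/1 diagonal, `adjoint` is the exact adjoint of `forward`, which is what the back-projection step in Section 3 assumes.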
The main source of quantitative measurements is the per-voxel magnetization response of proton dipoles obtained from dynamic rotations of the external magnetic field, i.e. a sequence of Flip Angles (FA) α applied at certain repetition times TR. Tissues with different NMR characteristics respond distinctively to these excitations. The MRF framework relies on this principle to regularize the under-determined problem (2) by a temporal model and enable parameter estimation. Magnetization trajectories (responses), denoted by B(Θ; TR, α), are distinct solutions of the Bloch differential equations for a given set of intrinsic NMR parameters Θ and excitation sequence (TR, α). Current MRF approaches discretize the parameter space through a dense sampling and simulate a large dictionary of normalized fingerprints D_j := B([Θ_j]; TR, α), ∀j = 1, …, d, for all combinations of the quantized parameters. Under the voxel purity assumption, each spatial voxel of the MRF image corresponds to a unique NMR parameter and approximately matches a temporal trajectory in the fingerprint dictionary: X_v ≈ D_{j_v}, where X_v denotes the normalized v-th row of X, i.e. a multi-dimensional spatial voxel.
3 Parameter estimation
A popular approach to parameter estimation is to perform back-projection (the adjoint operator) on the k-space data, followed by dictionary matching to identify the highest-correlated atom and its corresponding NMR parameters for each (normalized) voxel of the highly aliased back-projected image ^X: [Θ_v] = NNS_D(^X_v), ∀v=1,…,n. Here NNS denotes the nearest neighbour search, which serves as a Euclidean projection onto the discrete set of fingerprints, i.e. the manifold of Bloch equation solutions. A temporal (subspace) compression can be used to shrink the search dimension, i.e. searching across the s dominant principal components of the dictionary, and to promote a low-rank subspace prior during image reconstruction [12, 13]. However, enumerating the multi-parametric MRF manifold in order for (3) to be an accurate projection introduces an exponentially growing complexity (in the number of parameters) for the storage and computations needed to conduct the NNS. A recent line of research [14, 15] shows that certain tree search strategies can benefit from the low intrinsic dimensionality of the MRF manifold and significantly accelerate the matching step. However, storage of the dictionary or the corresponding tree remains a major challenge for fine-grid enumerations.
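The dimension-reduced matching step can be sketched as follows; the "fingerprints" here are toy inversion-recovery-like exponential decays standing in for Bloch-simulated trajectories, and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy fingerprint dictionary: inversion-recovery-like decays parameterized by
# hypothetical (T1, T2) values -- stand-ins for Bloch-simulated fingerprints.
d, L, s = 500, 200, 10
t = 10.0 * np.arange(L)                                   # msec, toy TR = 10 ms
Theta = rng.uniform([100, 20], [4000, 600], size=(d, 2))  # (T1, T2) samples, msec
D = (1 - 2 * np.exp(-t / Theta[:, :1])) * np.exp(-t / Theta[:, 1:])
D /= np.linalg.norm(D, axis=1, keepdims=True)             # normalized atoms

# Temporal (subspace) compression: keep the s dominant principal components.
_, _, Vt = np.linalg.svd(D, full_matrices=False)
V = Vt[:s].T                                              # L x s subspace basis
D_c = D @ V                                               # compressed dictionary

def dm_match(x):
    """NNS over the compressed dictionary for one voxel trajectory."""
    x = x / np.linalg.norm(x)
    j = int(np.argmax(np.abs(D_c @ (V.T @ x))))           # max-correlation atom
    return Theta[j], j

# A noisy copy of an atom should match a fingerprint highly correlated with it.
j_true = 123
theta_hat, j_hat = dm_match(D[j_true] + 0.001 * rng.standard_normal(L))
```

Note the storage/compute cost of this step grows linearly with the dictionary size d, which itself grows exponentially with the number of encoded parameters.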
In this study we propose training a 4-layer fully connected feed-forward network, dubbed the MRF-Net, to approximate the MRF manifold projection by a continuous mapping f: Θ_v = f(^X_v). The first layer of the MRF-Net unsupervisedly learns the best linear projection onto the subspace of clean fingerprints through principal component analysis (PCA), and is kept fixed while training the other layers. The three remaining layers include nonlinear ReLU activations in order to approximate the dimension-reduced projection function. The MRF-Net comprises an input layer fed with voxel sequences from the back-projected images, and 4 hidden layers, as shown in Figure 2. The dimensions of the input/output and hidden units are customized here for the Steady State Precession (FISP) sequence [2], which encodes the T1 and T2 relaxation times. The MRF dictionary corresponding to the FISP sequence is shown to be well represented by very few principal components [11], which determines the first layer's dimension accordingly. Thanks to this dimensionality reduction, the MRF-Net requires far fewer units and training resources than the uncompressed DL approaches proposed earlier in [6, 7].
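A minimal numpy sketch of this architecture (hypothetical layer widths and random untrained weights, for illustration only): the first layer is a frozen PCA projection with an identity activation, followed by three trainable ReLU layers, the last of which keeps the parameter estimates non-negative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sizes: trajectory length L, PCA dimension s, hidden width h,
# and p output NMR parameters (e.g. T1 and T2).
L, s, h, p = 200, 10, 64, 2

# Layer 1: linear projection learned unsupervisedly (PCA) from clean
# fingerprints (random placeholders here); frozen during supervised training.
D = rng.standard_normal((500, L))
_, _, Vt = np.linalg.svd(D, full_matrices=False)
W1 = Vt[:s]                                  # s x L, identity activation

# Layers 2-4: trainable affine maps with ReLU activations (random init).
W2, b2 = 0.1 * rng.standard_normal((h, s)), np.zeros(h)
W3, b3 = 0.1 * rng.standard_normal((h, h)), np.zeros(h)
W4, b4 = 0.1 * rng.standard_normal((p, h)), np.zeros(p)

relu = lambda u: np.maximum(u, 0)

def mrf_net(x):
    """Map one back-projected voxel trajectory to NMR parameter estimates."""
    h1 = W1 @ x                              # dimensionality reduction only
    h2 = relu(W2 @ h1 + b2)
    h3 = relu(W3 @ h2 + b3)
    return relu(W4 @ h3 + b4)                # final ReLU: non-negative outputs

theta = mrf_net(rng.standard_normal(L))
```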
3.1.1 Training MRF-Net
The fine-grid manifold enumeration, i.e. the MRF dictionary, is only used for training and not during image reconstruction. To avoid losing discrimination between fingerprints through the magnitude-only data treatment proposed in [6], we adopt a phase-alignment heuristic used in practice [16, 17] to align the dictionary atoms (for training) and the back-projected images (the input). With this treatment we can assume, without loss of generality, that the MRF-Net consists of real-valued parameters and approximates a real-valued mapping.
A fully connected feed-forward network is composed of N (here N = 4) layers, each applying an affine transform followed by a non-linear activation function. The network can be modelled through a hierarchy of hidden mappings initialized by the input x:

h^(ℓ)(x) = g^(ℓ)( W^(ℓ) h^(ℓ-1)(x) + β^(ℓ) ),   ℓ = 1, …, N,   h^(0)(x) := x.   (1)
Here W^(ℓ) and β^(ℓ) are the weights and biases at the ℓ-th layer and g^(ℓ) is an element-wise nonlinear activation function. The first layer of the MRF-Net uses the identity activation g(u) = u and unsupervisedly trained parameters (the PCA projection). The remaining layers use ReLU activations g(u) = max(u, 0). To learn these layers, the MRF-Net is supervised-trained to minimize the mean-squared regression loss between the network outputs and the ground-truth NMR parameters.¹

¹We use the Adam optimizer [18], where the gradient updates are computed by standard back-propagation on a moderate CPU desktop. Optimization parameters are as follows: batch size 50, 30 epochs, and a step-size decaying after each epoch.

Training inputs are dimension-reduced (phase-aligned) atoms of the fine-grid MRF dictionary, corrupted by zero-mean independent Gaussian noise with SNRs randomly selected between 40 and 60 dB. We use the Extended Phase Graph framework [19] to simulate Bloch equation responses to the FISP sequence for all combinations of T1 = [100:10:4000] (msec) and T2 = [20:2:600] (msec), and build a dictionary of d fingerprints for training. After noise corruption (i.e. data augmentation by a factor of 100) we perform NNS searches to find the correct training labels (and not those that originally generated the fingerprints), Θ_k := NNS_~D(~D_k + ξ_k), which enables learning a projection mapping rather than a denoiser.
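The label-generation step can be sketched as follows; the dictionary, parameter grid, and SNR handling are toy stand-ins (random unit-norm atoms in the compressed domain rather than simulated fingerprints):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy compressed dictionary ~D and parameter grid (random stand-ins for the
# PCA-compressed fingerprints and their hypothetical (T1, T2) labels).
d, s = 400, 10
D_c = rng.standard_normal((d, s))
D_c /= np.linalg.norm(D_c, axis=1, keepdims=True)
Theta = rng.uniform([100, 20], [4000, 600], size=(d, 2))

def make_training_pair(k, snr_db):
    """Noisy atom as the input; its NNS re-projection (not the generating
    parameters) as the label: Theta_k := NNS_~D(~D_k + xi_k)."""
    sigma = 10 ** (-snr_db / 20) / np.sqrt(s)      # per-coordinate noise level
    x = D_c[k] + sigma * rng.standard_normal(s)
    j = int(np.argmax(np.abs(D_c @ x)))            # re-project onto ~D
    return x, Theta[j]

# Data augmentation: several noisy copies per atom at random SNRs in 40-60 dB.
pairs = [make_training_pair(k % d, rng.uniform(40, 60)) for k in range(4 * d)]
```

Labelling each noisy input with its nearest dictionary atom (rather than the atom that generated it) is what makes the regression target the projection onto the manifold instead of a denoising map.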
4 Geometry of deep learning for Bloch response manifold projection
In this part we show that the MRF-Net provides a piece-wise affine approximation to the Bloch response manifold projection. Our analysis is inspired by the recent work [20] and relates to the authors' previous works [14, 16] on adopting cover tree data structures to cluster dictionary atoms into hierarchical segments and using fast approximate NNS for Dictionary Matching (DM). We show that the MRF-Net also clusters the input space; however, as opposed to the cover trees, the network does not memorize the MRF dictionary but rather efficiently encodes a set of deep (hierarchical) matched-filters for affine regression of the NMR quantities in each segment. Further, the segments here have piece-wise affine boundaries.
4.1 Affine spline function approximation
For a network composed of piecewise linear activation functions, such as the ReLU and the (linear) identity activation, each layer's output h^(ℓ) is a piece-wise affine transformation of its direct input h^(ℓ-1). The composition of such layers gives a mapping which is a piece-wise affine transformation of the input x (see e.g. [20]). Further, using continuous activation functions (as above) and for bounded weights and biases, this mapping is Lipschitz continuous.
In the MRF-Net the last non-linearity is mainly used to impose non-negativity of the estimated parameters, and therefore most of the prediction task is done by the preceding layers. We denote by z^(N)(x) := W^(N) h^(N-1)(x) + β^(N) the weighted outputs before the last non-linearity. We have the following affine spline representation for the weighted outputs:

z^(N)(x) = A[x]x + b[x] := ∑_r ( A_r x + b_r ) 1_Ω_r(x),

where 1_Ω_r is the indicator function with respect to a segment Ω_r, returning 1 if x belongs to the segment and 0 otherwise; the segments form a disjoint partitioning of the input space. The matrices A_r and vectors b_r define the corresponding input-output affine mapping for each segment. We use the shorthands A[x] and b[x] to represent the input-dependent (piece-wise affine) mapping of x: input-dependent offsets represented by b[x] and, similarly, an input-dependent matrix A[x] where each row represents a matched-filter (acting on x to measure the mutual correlation) corresponding to a certain output coordinate.
Matched-filters and offsets here are used for regressing the outputs; they are not to be confused with the MRF dictionary matching (DM). In other words, during training the network learns {A_r, b_r}, or equivalently the layer weights, to provide a continuous and piece-wise affine approximation between input and output, e.g. the Bloch manifold projection for the MRF reconstruction problem. The universal approximation theorem [22] states that a 2-layer shallow network with large enough units can provide an arbitrarily close approximation to any Borel-measurable function. Deeper networks, however, are often more favourable in practice to efficiently reduce the number of hidden units [21, 23]. For certain manifold embedding tasks this has been proven, e.g. in [24, 25, 26, 27].
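As a concrete illustration of this affine spline representation (a minimal sketch with a hypothetical one-hidden-layer ReLU net, not the trained MRF-Net), the slope A[x] and offset b[x] can be read off from the activation pattern and satisfy z(x) = A[x]x + b[x] exactly:

```python
import numpy as np

rng = np.random.default_rng(4)

# A tiny one-hidden-layer ReLU net (hypothetical sizes): within one segment
# the weighted output z(x) is exactly the affine map A[x]x + b[x].
W1, b1 = rng.standard_normal((8, 5)), rng.standard_normal(8)
W2, b2 = rng.standard_normal((2, 8)), rng.standard_normal(2)

def z(x):
    """Weighted outputs before the last non-linearity."""
    return W2 @ np.maximum(W1 @ x + b1, 0) + b2

def affine_piece(x):
    """Slope A[x] and offset b[x] read off from the ReLU activation pattern."""
    m = (W1 @ x + b1 > 0).astype(float)      # which ReLUs fire at x
    A = W2 @ (m[:, None] * W1)               # A[x] = W2 diag(m) W1
    b = W2 @ (m * b1) + b2                   # b[x] = W2 diag(m) b1 + b2
    return A, b

x = rng.standard_normal(5)
A, b = affine_piece(x)                       # z(x) equals A @ x + b
```

All inputs sharing the same activation pattern m share the same (A, b), which is exactly what defines a segment Ω_r.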
4.2 Visualizing MRF-Net’s segments on Bloch manifold
Finite-sized networks with bounded weights and biases introduce piecewise affine boundaries for their corresponding segments Ω_r. It is easy to verify that each layer in (1) with the aforementioned non-linearities introduces segments with piecewise affine boundaries in its direct input space, and that each such segment corresponds to an affine transformation of that input. Therefore, the composition of such layers results in piecewise affine segments Ω_r in the input space of the end-to-end mapping.
Continuity of the mapping implies that adjacent segments correspond to distinct slopes A_r. Indeed, if A_r = A_{r'} and the only difference is in the offsets b_r ≠ b_{r'}, then the two affine pieces won't intersect on their boundary; hence the segments cannot be adjacent without contradicting the continuity assumption.
This remark gives an idea for visualizing the segments as follows: for densely sampled input signals x, we compute the gradients of the weighted outputs z^(N) with respect to the input. The gradients determine the input-dependent slopes in the affine spline formulation (4.1), i.e. the rows of A[x] at a point x are populated as²

(A[x])_i = ( ∇_x z_i^(N)(x) )^T.   (2)

²For a matrix A we denote by A_i its i-th row; for a vector b, b_i denotes its i-th element. We also later denote by e_i the coordinate vector whose i-th element is one and zero elsewhere.
For a given input x, the gradients in (2) can be efficiently calculated using back-propagation. We feed x forward to identify all the activations and then follow the recursion starting at the vector v^(N) := e_i:

v^(ℓ) = g'( z^(ℓ)(x) ) ⊙ ( (W^(ℓ+1))^T v^(ℓ+1) ),   ℓ = N-1, …, 1,   and   ∇_x z_i^(N)(x) = (W^(1))^T v^(1),

where z^(ℓ) := W^(ℓ) h^(ℓ-1) + β^(ℓ) is the pre-activation at layer ℓ, ⊙ denotes the element-wise vector product, and g'(·) denotes the (element-wise) derivative of the activation function. For the identity activation g' is an all-one vector, and for the ReLU activation it returns one for the positive (direct) input coordinates and zero elsewhere. By vector quantization (e.g. k-means clustering) we cluster regions of the input space which output distinct slopes A[x] and thereby identify the segments Ω_r. We note that, for a classification problem, [20] used a similar idea to separately identify each hidden layer's segments and then intersect them in a hierarchical (layer-by-layer) fashion to obtain Ω_r.
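The gradient recursion and the slope-clustering step can be sketched as follows, on a toy 2-D input with a hypothetical two-layer ReLU net (not the trained MRF-Net); the recovered rows of A[x] agree with numerical gradients, and collecting the distinct slopes over a dense input grid identifies the segments:

```python
import numpy as np

rng = np.random.default_rng(5)

# Tiny ReLU net on a 2-D input (hypothetical sizes).
W1, b1 = rng.standard_normal((6, 2)), rng.standard_normal(6)
W2, b2 = rng.standard_normal((2, 6)), rng.standard_normal(2)

def z(x):
    """Weighted outputs before the last non-linearity."""
    return W2 @ np.maximum(W1 @ x + b1, 0) + b2

def slope_rows(x):
    """Rows of A[x] via the back-propagation recursion started at e_i."""
    m = (W1 @ x + b1 > 0).astype(float)          # ReLU derivative g'(.) at x
    rows = []
    for i in range(2):
        e = np.zeros(2); e[i] = 1.0              # start the recursion at e_i
        v = m * (W2.T @ e)                       # g' ⊙ (W^(2)^T v)
        rows.append(W1.T @ v)                    # gradient of z_i w.r.t. x
    return np.array(rows)

# Densely sample the input square and identify segments by distinct slopes.
xs = np.linspace(-2.0, 2.0, 40)
patterns = {tuple(np.round(slope_rows(np.array([a, b])).ravel(), 8))
            for a in xs for b in xs}
num_segments = len(patterns)                     # segments hit by the samples
```

With 6 hidden ReLUs there are at most 2^6 activation patterns, so the grid can reveal at most that many distinct slopes.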
Fine sampling can be used to visualize the intersection of the MRF-Net's segments with the Bloch response manifold. For this purpose we compute (2) for inputs corresponding to a dense sample of the (T1, T2) grid, i.e. the MRF dictionary. Figure 3 visualizes the dominant MRF-Net segments on the manifold of Bloch responses to the FISP sequence used in our experimental validations.
4.3 Deep matched-filtering
The identity (4.1) interprets how data is treated by our network. Ignoring the offsets, the (two) rows of A[x] correlate with inputs belonging to a segment and predict the T1 and T2 values. Each segment of the input space has a distinct set of (two) matched-filters, i.e. rows of A[x], whose correlations with the input data (belonging to that segment) linearly regress the outputs. The end-to-end matched-filtering parameters (i.e. A[x] and b[x]) are implicitly learned during training, through the corresponding layer-by-layer affine transformations. In Figure 4 we choose two input segments that include standard values measured for White Matter (WM) and Gray Matter (GM) in healthy volunteers' brains [28]. We show the clean magnetic responses (i.e. fingerprints) associated with each region together with the matched-filters used for predicting the T1 and T2 quantities. As can be seen, the matched-filters peak at discriminant parts of the fingerprints, that is, where the sequence encodes a significant amount of information (i.e. sensitivity) about the underlying NMR characteristics in that segment. These peaks are visible at the beginning of the FISP sequence due to the use of an Inversion Recovery pulse (i.e. rotating the magnetization by 180°), and they repeat because of the periodic pattern of flip angles used for the FISP excitations (see [2, Figure 1.b]).
5 In-vivo experiment
An in-vivo MRF dataset was acquired using the Steady State Precession (FISP) sequence of [2] with spiral readouts which sub-sample the k-space locations in each of the time-frames used to reconstruct the parametric T1 and T2 maps. Other scanning parameters are: TE=2/18 msec, 8 head-coils, a 3 Tesla GE HDx MRI system (GE Medical Systems, Milwaukee, WI), variable-density spiral sampling with 89 interleaves, 22.5x22.5 cm FOV, and 256x256 voxel spatial resolution with 5 mm slice thickness. As discussed in Section 3.1.1, we simulate a dictionary of d atoms which finely samples the (T1, T2) grid. The baseline DM scheme requires direct access to this dictionary for NNS searches, whereas the proposed DL approach only uses it for data augmentation and training the MRF-Net. Figure 1 compares the reconstructed parametric maps using DM with brute-force searches and the proposed MRF-Net. Note that the computation-memory complexity of dimension-reduced dictionary matching (without a fast tree search) scales with the dictionary size, which in this example is more than 60 times higher than the requirements of the (dimension-reduced) MRF-Net. This comparison is on a moderate-size MRF dictionary encoding only two parameters; for emerging applications with dictionaries encoding a large number of intrinsic NMR characteristics, e.g. field inhomogeneity, perfusion, diffusion, etc., we expect this gap to grow substantially. We leave this direction for future investigation.
In this paper we studied a dictionary-free deep learning approach to the MRF reconstruction problem. Featuring a subspace compression in its first layer, the proposed MRF-Net is compact, easily trained, and capable of achieving estimation accuracy comparable to a dimension-reduced DM baseline with 60 times less storage and computation. The MRF dictionary is only used for training and not during image reconstruction. We showed that the MRF-Net provides a piece-wise affine approximation to the Bloch response manifold projection through which the network efficiently clusters the input space and learns hierarchical matched-filters for affine regression of the quantitative parameters in each segment. Future directions could extend this work to applications with a large number of intrinsic NMR characteristics, as well as incorporating spatial regularities by e.g. using convolutional networks [9].
-  D. Ma, V. Gulani, N. Seiberlich, K. Liu, J. Sunshine, J. L. Duerk, and M. Griswold, “Magnetic resonance fingerprinting,” Nature, vol. 495, no. 7440, pp. 187–192, 2013.
-  Y. Jiang, D. Ma, N. Seiberlich, V. Gulani, and M. Griswold, “MR fingerprinting using fast imaging with steady state precession (FISP) with spiral readout,” Magnetic resonance in medicine, vol. 74, no. 6, pp. 1621–1631, 2015.
-  K. L. Wright, Y. Jiang, D. Ma, D. C. Noll, M. A. Griswold, V. Gulani, and L. Hernandez-Garcia, “Estimation of perfusion properties with mr fingerprinting arterial spin labeling,” Magnetic resonance imaging, vol. 50, pp. 68–77, 2018.
-  B. Rieger, F. Zimmer, J. Zapp, S. Weingärtner, and L. R. Schad, “Magnetic resonance fingerprinting using echo-planar imaging: Joint quantification of T1 and T2* relaxation times,” Magnetic resonance in medicine, vol. 78, pp. 1724–1733, 2017.
-  M. Davies, G. Puy, P. Vandergheynst, and Y. Wiaux, “A compressed sensing framework for magnetic resonance fingerprinting,” SIAM Journal on Imaging Sciences, vol. 7, no. 4, pp. 2623–2656, 2014.
-  O. Cohen, B. Zhu, and M. S. Rosen, “MR fingerprinting deep reconstruction network (DRONE),” Magnetic resonance in medicine, vol. 80, no. 3, pp. 885–894, 2018.
-  P. Virtue, X. Y. Stella, and M. Lustig, “Better than real: Complex-valued neural nets for mri fingerprinting,” in Image Processing (ICIP), 2017 IEEE International Conference on. IEEE, 2017, pp. 3953–3957.
-  E. Hoppe, G. Körzdörfer, T. Würfl, J. Wetzl, F. Lugauer, J. Pfeuffer, and A. Maier, “Deep learning for magnetic resonance fingerprinting: A new approach for predicting quantitative parameter values from time series.” Studies in health technology and informatics, vol. 243, p. 202, 2017.
-  F. Balsiger, A. S. Konar, S. Chikop, V. Chandran, O. Scheidegger, S. Geethanath, and M. Reyes, “Magnetic resonance fingerprinting reconstruction via spatiotemporal convolutional neural networks,” in International Workshop on Machine Learning for Medical Image Reconstruction. Springer, 2018, pp. 39–46.
-  E. Jaynes, “Matrix treatment of nuclear induction,” Physical Review, vol. 98, no. 4, p. 1099, 1955.
-  D. F. McGivney, E. Pierre, D. Ma, Y. Jiang, H. Saybasili, V. Gulani, and M. A. Griswold, “SVD compression for magnetic resonance fingerprinting in the time domain,” IEEE transactions on medical imaging, vol. 33, no. 12, pp. 2311–2322, 2014.
-  J. Assländer, M. A. Cloos, F. Knoll, D. K. Sodickson, J. Hennig, and R. Lattanzi, “Low rank alternating direction method of multipliers reconstruction for mr fingerprinting,” Magnetic resonance in medicine, vol. 79, no. 1, pp. 83–96, 2018.
-  B. Zhao, K. Setsompop, E. Adalsteinsson, B. Gagoski, H. Ye, D. Ma, Y. Jiang, P. Ellen Grant, M. A. Griswold, and L. L. Wald, “Improved magnetic resonance fingerprinting reconstruction with low-rank and subspace modeling,” Magnetic resonance in medicine, vol. 79, no. 2, pp. 933–942, 2018.
-  M. Golbabaee, Z. Chen, Y. Wiaux, and M. E. Davies, “Cover tree compressed sensing for fast MR fingerprint recovery,” in 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), Sept 2017, pp. 1–6.
-  M. Golbabaee and M. E. Davies, “Inexact gradient projection and fast data driven compressed sensing,” IEEE Transactions on Information Theory, vol. 64, no. 10, pp. 6707–6721, 2018.
-  M. Golbabaee, Z. Chen, Y. Wiaux, and M. Davies, “CoverBLIP: accelerated and scalable iterative matched-filtering for magnetic resonance fingerprint reconstruction,” arXiv preprint arXiv:1810.01967, 2018.
-  C. C. Cline, X. Chen, B. Mailhe, Q. Wang, J. Pfeuffer, M. Nittka, M. A. Griswold, P. Speier, and M. S. Nadar, “Air-mrf: Accelerated iterative reconstruction for magnetic resonance fingerprinting,” Magnetic resonance imaging, vol. 41, pp. 29–40, 2017.
-  D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
-  M. Weigel, “Extended phase graphs: Dephasing, RF pulses, and echoes-pure and simple,” Journal of Magnetic Resonance Imaging, vol. 41, no. 2, pp. 266–295, 2015.
-  R. Balestriero and R. Baraniuk, “A spline theory of deep learning,” in Proceedings of the 35th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, vol. 80. PMLR, 10–15 Jul 2018, pp. 374–383.
-  G. F. Montufar, R. Pascanu, K. Cho, and Y. Bengio, “On the number of linear regions of deep neural networks,” in Advances in neural information processing systems, 2014, pp. 2924–2932.
-  G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Mathematics of control, signals and systems, vol. 2, no. 4, pp. 303–314, 1989.
-  O. Delalleau and Y. Bengio, “Shallow vs. deep sum-product networks,” in Advances in Neural Information Processing Systems, 2011, pp. 666–674.
-  R. Basri and D. Jacobs, “Efficient representation of low-dimensional manifolds using deep networks,” arXiv preprint arXiv:1602.04723, 2016.
-  U. Shaham, A. Cloninger, and R. R. Coifman, “Provable approximation properties for deep neural networks,” Applied and Computational Harmonic Analysis, 2016.
-  M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst, “Geometric deep learning: going beyond euclidean data,” IEEE Signal Processing Magazine, vol. 34, no. 4, pp. 18–42, 2017.
-  D. Chen, J. Lv, and Y. Zhang, “Unsupervised multi-manifold clustering by learning deep representation,” in AAAI Workshops, 2017.
-  J. P. Wansapura, S. K. Holland, R. S. Dunn, and W. S. Ball Jr, “Nmr relaxation times in the human brain at 3.0 tesla,” Journal of Magnetic Resonance Imaging, vol. 9, no. 4, pp. 531–538, 1999.