1 Introduction
Parallel imaging (PI) [11, 22] and compressed sensing MRI (CS-MRI) [19] are widely used techniques for acquiring and reconstructing undersampled k-space data, thereby shortening scan times in MRI. CS-MRI is a computational technique that suppresses the incoherent, noise-like artifacts introduced by random undersampling, often via a regularized regression strategy. Combining CS-MRI with PI promises to make MRI much more accessible and affordable, and this combination has therefore been an intense area of research in the past decade [9, 20, 28]. One major task in PI CS-MRI is designing a random undersampling pattern, conventionally controlled by a variable-density probability density function (PDF). However, the design of the 'optimal' undersampling pattern remains an open problem, for which heuristic solutions have been proposed. For example,
[18] generated the sampling pattern based on the power spectrum of an existing reference dataset; [12] combined experimental design with the constrained Cramér-Rao bound to generate context-specific sampling patterns; and [10] designed a parameter-free greedy pattern selection method that finds a sampling pattern performing well on average over the MRI data in a training set. Recently, with the success of learning-based k-space reconstruction methods [1, 13, 24, 30], a data-driven machine learning approach called LOUPE [2] was proposed as a principled and practical solution for optimizing the undersampling pattern in CS-MRI. In LOUPE, fully sampled k-space data were simulated from magnitude MR images and retrospectively undersampled. A sampling pattern optimization network and a modified U-Net [23], serving as the undersampled-image reconstruction network, were trained jointly to optimize both the k-space undersampling pattern and the reconstruction process. In the sampling pattern optimization network, one sigmoid operation mapped the learnable weights into probability values, and a second sigmoid operation approximated the non-differentiable step function used for stochastic sampling, since the gradient must be backpropagated through this layer to update the learnable weights. After training, both the optimized sampling pattern and the reconstruction network were obtained. For a detailed description of LOUPE we refer the reader to
[2]. In this work, we extended LOUPE in three ways. First, in-house multi-coil in-vivo fully sampled T2-weighted k-space data from an MR scanner were used to learn the optimal sampling pattern and reconstruction network. Second, the modified U-Net [23] used as the reconstruction network in LOUPE was replaced with a modified unrolled reconstruction network with a learned regularization term, in order to reconstruct multi-coil data in PI with proper data consistency and to reduce the dependence on training data when training cases are scarce. Third, the approximate stochastic sampling layer was replaced with a binary stochastic sampling layer using the Straight-Through (ST) estimator [3], which avoids zero gradients when backpropagating through this layer. Fully sampled data were acquired in healthy subjects; undersampled data were generated by retrospective undersampling with various sampling patterns; and reconstructions from different methods were compared.
2 Method
In PI CS-MRI, given an undersampling pattern and the corresponding acquired k-space data, a reconstructed image is obtained by minimizing the following objective function:

$$\hat{x} = \arg\min_{x} \sum_{i=1}^{N_c} \frac{1}{2} \left\| U \odot F S_i x - y_i \right\|_2^2 + R(x), \quad (1)$$

where $x$ is the MR image to reconstruct, $S_i$ the coil sensitivity map of the $i$-th coil, $N_c$ the number of receiver coils, $F$ the Fourier transform, $U$ the k-space undersampling pattern, and $y_i$ the acquired undersampled k-space data of the $i$-th coil. $R(\cdot)$ is a regularization term, such as Total Variation (TV) [21] or a wavelet penalty [8]. The minimization in Eq. 1 is performed with iterative solvers, such as the quasi-Newton method [7], the alternating direction method of multipliers (ADMM) [4], or the primal-dual method [5]. Eq. 1 can also be mimicked by learning a parameterized mapping, such as a neural network, from the input $\{y_i\}$ to the output $\hat{x}$. We denote the mapping defined by either an iterative solver or a deep neural network as $A_U$. Our goal is to obtain an optimal undersampling pattern $U$ for a fixed undersampling ratio $\gamma$ from fully sampled data through retrospective undersampling. The mathematical formulation of this problem is:

$$\min_{U \in \mathcal{C}} \sum_{j} L\Big( A_U\big( \{ U \odot F S_i x_j \}_{i=1}^{N_c} \big),\ x_j \Big), \quad (2)$$

where $x_j$ is the $j$-th MR image reconstructed by direct inverse Fourier transform from fully sampled k-space data, $L$ the loss function measuring the similarity between the reconstructed image $A_U(\cdot)$ and the fully sampled label $x_j$, and $\mathcal{C}$ the constraint set of $U$, defining how $U$ is generated with a fixed undersampling ratio $\gamma$. The bi-level optimization problem [6] of Eq. (2) was solved in LOUPE [2] by jointly optimizing a modified U-Net [23] as $A_U$ and an approximate stochastic sampling process generating $U$, on a large volume of k-space data simulated from magnitude MR images. However, for in-vivo k-space data with multi-coil acquisition as in PI, both the U-Net reconstruction architecture and the approximate stochastic sampling for pattern generation could be suboptimal. Specifically, owing to the limited amount of in-vivo training data and the lack of k-space consistency imposed in U-Net, inferior reconstructions could occur on test and even training datasets. Moreover, the approximate stochastic sampling process generated fractional rather than 0-1 binary patterns during training, which might not transfer well to testing, where binary patterns must be used for realistic k-space sampling. In view of the above, we extend and improve LOUPE in terms of both the reconstruction mapping and the sampling pattern's generating process for in-vivo multi-coil k-space data.

2.1 Unrolled Reconstruction Network
A modified residual U-Net [23] was used as the reconstruction network in LOUPE [2], mapping the zero-filled k-space reconstruction to the fully sampled k-space reconstruction. U-Net works well for simulated k-space reconstruction when enough magnitude MR images are available for training; for in-vivo multi-coil k-space data, however, training cases are usually scarce, since fully sampled scans are time-consuming and only a few fully sampled cases can therefore be acquired.
To reduce the dependence on the training dataset and improve the data consistency of deep-learning reconstructions, combining a neural network block for the regularization term in Eq. 1 with an iterative optimization scheme for solving Eq. 1 has been explored in recent years [1, 13, 24]; such architectures are called "unrolled optimization/reconstruction networks" in general. Prior work showed that unrolled networks perform well on the multi-coil k-space reconstruction task by inserting the measured k-space data into the network architecture to solve Eq. 1 with a learning-based regularization. In light of this success, we apply a modified MoDL [1] as the reconstruction network in this work. MoDL unrolls a quasi-Newton optimization scheme to solve Eq. 1 with a neural-network-based denoiser as the regularization term $R$, and a conjugate gradient (CG) descent block is applied in the MoDL architecture to solve the regularized problem. Moreover, we will show that this unrolled architecture also acts as a set of skip connections for updating the sampling pattern weights, since the generated pattern is connected to each intermediate CG block to enforce regularized data consistency (Fig. 1).

2.2 ST Estimator for Binary Pattern
In LOUPE [2], a probabilistic pattern $P$ was defined as $P = \sigma_a(w)$, a sigmoid $\sigma_a$ with slope hyperparameter $a$ applied to trainable weights $w$. The binary k-space sampling pattern $U$ was assumed to follow a Bernoulli distribution $\mathcal{B}(P)$ independently at each k-space location. $U$ was generated from $P$ as $U = \mathbb{1}_{u \le P}$, where $u \sim \mathcal{U}[0, 1]$ and $\mathbb{1}$ is the pointwise indicator function on the truth value of $u \le P$. However, the indicator function has zero gradient almost everywhere when backpropagating through it. LOUPE addressed this issue by approximating $\mathbb{1}_{u \le P}$ using another sigmoid, $U \approx \sigma_b(P - u)$, with slope hyperparameter $b$. Although the gradient issue was solved in LOUPE, $U$ was thereby approximated as a fraction between 0 and 1 at each k-space location instead of the binary pattern deployed in both the test phase and a realistic MR scan. As a result, binary sampling patterns generated in the test phase could yield inferior performance due to this mismatch with the training phase. To address the issue, binary patterns are needed during the training phase as well, while the gradient backpropagating through the binary sampling layer must be handled properly. Such a binary pattern generation layer can be regarded as a layer with stochastic neurons, and several methods have been proposed for backpropagating through it [3, 15]. Here we use the straight-through (ST) estimator [3] in the stochastic sampling layer to generate a binary pattern while addressing the zero-gradient issue during backpropagation. Based on one variant of the ST estimator, $U$ is set to $\mathbb{1}_{u \le P}$ during the forward pass. When backpropagating through the stochastic sampling layer, the ST estimator replaces the derivative of the indicator with the identity:

$$\frac{\partial U}{\partial P} := 1. \quad (3)$$
In other words, the indicator function in the stochastic layer is applied at the forward pass but treated as the identity function during backpropagation. This ST estimator allows the network to make a yes/no decision at each k-space location, picking out the fraction of k-space locations most important for our task.
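As a concrete illustration, such a binary sampling layer with an ST backward pass can be sketched in PyTorch (the framework used in this work) via a custom autograd function; the class and function names here are ours for illustration, not those of the original implementation:

```python
import torch

class BinarySample(torch.autograd.Function):
    """Stochastic binarization 1_{u <= p} with a straight-through gradient."""

    @staticmethod
    def forward(ctx, p):
        u = torch.rand_like(p)        # u ~ Uniform[0, 1) at each k-space location
        return (u <= p).float()       # hard 0/1 mask in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        # ST estimator: treat the indicator as the identity when backpropagating
        return grad_output

def binary_mask(p):
    return BinarySample.apply(p)
```

Because the same hard binarization runs at training and test time, the train/test mismatch of the approximate-sigmoid (AS) variant does not arise.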
2.3 Network Architecture
Fig. 1 shows the proposed network architecture, which consists of two sub-networks: an unrolled reconstruction network and a sampling pattern learning network.
In the sampling pattern learning network (Fig. 1(b)), Renormalize() is a linear scaling operation that ensures the mean value of the probabilistic pattern $P$ equals the desired undersampling ratio $\gamma$. The binary pattern $U$ is sampled anew at every forward pass through the network and, once generated, is used to retrospectively undersample the fully sampled multi-coil k-space data.
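A minimal sketch of one such renormalization, assuming a piecewise linear scaling (shrink the probabilities when their mean exceeds $\gamma$, otherwise shrink their complement) so that values stay in [0, 1]; the exact operation used in LOUPE may differ:

```python
import numpy as np

def renormalize(p, gamma):
    """Linearly rescale probabilities so their mean equals gamma while staying in [0, 1]."""
    m = p.mean()
    if m >= gamma:
        return p * (gamma / m)                                 # scale values toward 0
    return 1.0 - (1.0 - p) * ((1.0 - gamma) / (1.0 - m))       # scale complement toward 0
```

Scaling the complement in the second branch is what keeps rescaled probabilities from exceeding 1 when the mean must be raised.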
The deep quasi-Newton network (MoDL [1]) used as the unrolled reconstruction architecture is illustrated in Fig. 1(a). In deep quasi-Newton, denoiser + data-consistency blocks are replicated $K$ times to mimic $K$ quasi-Newton outer loops for solving Eq. 1, in which a neural network denoiser plays the role of $R$. Five convolutional layers with skip connections [14] and instance normalization [27] form the denoiser, whose weights are shared across blocks. The binary pattern $U$ is used to generate the zero-filled reconstruction that is input to the reconstruction network, and it is also connected to every data-consistency sub-block to deploy the regularized optimization, which additionally acts as a skip connection benefiting the training of the pattern weights $w$.
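To make the alternation concrete, here is a hedged NumPy sketch of the $K$-block denoiser + data-consistency recursion, where each data-consistency step solves $(A^H A + \lambda I)x = A^H y + \lambda z$ by conjugate gradients. The denoiser is passed in as a callable (a learned CNN in practice), and the fixed $\lambda$ is an illustrative value, not the trained regularization weight:

```python
import numpy as np

def sense_forward(x, maps, mask):
    # A x: coil-wise modulation, 2-D FFT, then undersampling by the binary mask
    return mask * np.fft.fft2(maps * x, norm="ortho")

def sense_adjoint(y, maps, mask):
    # A^H y: re-mask, inverse FFT, coil combination with conjugate sensitivities
    return np.sum(np.conj(maps) * np.fft.ifft2(mask * y, norm="ortho"), axis=0)

def cg_solve(rhs, maps, mask, lam, n_iter=10):
    """Solve (A^H A + lam I) x = rhs with plain conjugate gradients."""
    op = lambda v: sense_adjoint(sense_forward(v, maps, mask), maps, mask) + lam * v
    x = np.zeros_like(rhs)
    r = rhs - op(x)
    p = r.copy()
    rs = np.vdot(r, r)
    for _ in range(n_iter):
        Ap = op(p)
        alpha = rs / np.vdot(p, Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = np.vdot(r, r)
        if abs(rs_new) < 1e-14:            # converged; avoid a 0/0 step
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def unrolled_recon(y, maps, mask, denoiser, lam=0.05, n_blocks=5):
    """MoDL-style alternation: denoise, then CG data consistency, K times."""
    aty = sense_adjoint(y, maps, mask)     # A^H y, also the zero-filled initialization
    x = aty
    for _ in range(n_blocks):
        z = denoiser(x)
        x = cg_solve(aty + lam * z, maps, mask, lam)
    return x
```

Note how the mask enters both the initialization and every CG block, which is the "skip connection" effect described above for training the pattern weights.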
3 Experiments
3.1 Dataset and Implementations
Data acquisition and processing.
Fully sampled k-space data were acquired in 6 healthy subjects using a sagittal T2-weighted variable flip angle 3D fast spin echo sequence on a 3T GE scanner with a 32-channel head coil, at isotropic imaging resolution. Coil sensitivity maps of each axial slice were calculated with ESPIRiT [25] using an autocalibration k-space region. From the fully sampled data, a coil-combined single image was computed with the same coil sensitivity maps to provide the ground truth label for both sampling pattern learning and reconstruction performance comparison. The central 100 slices of each subject were extracted to form the training (300 slices), validation (100 slices), and test (200 slices) datasets. K-space undersampling was performed retrospectively in the ky-kz plane for all of the following experiments.
Training parameters. In the sampling pattern learning network, the weights $w$ were initialized randomly, and the slope factor $a$ and the undersampling ratio $\gamma = 10\%$ were fixed. The central k-space region remained fully sampled in every pattern. For the baseline LOUPE, a second slope factor $b$ was used to approximate the binary sampling. The sampling pattern learning networks using binary sampling with the ST estimator and approximated sampling are denoted BS (binary sampling) and AS (approximated sampling) in the following experiments. In the unrolled reconstruction network, $K$ replicated blocks were applied and the denoiser was initialized randomly. For the baseline LOUPE, a residual U-Net was applied. All learnable parameters in Fig. 1 were trained simultaneously with a loss summing the reconstruction error between each intermediate reconstruction and the corresponding ground truth label ($K = 1$ intermediate reconstruction in the U-Net case). Stochastic optimization with the Adam optimizer [16] was used to minimize the loss. The whole training and inference procedure was implemented in PyTorch with Python 3.7.3 on an RTX 2080Ti GPU.
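The multi-output training loss can be sketched as follows; only its structure (a per-block error against the same label, summed over all intermediate reconstructions) comes from the text, and the elementwise l1 distance is an assumption:

```python
import torch

def multi_output_loss(intermediates, target):
    """Sum a reconstruction error over all K intermediate outputs of the unrolled
    network. The l1 distance is an assumption; the text specifies only the
    per-block summation against the ground truth label."""
    return sum((xk - target).abs().mean() for xk in intermediates)
```

For the U-Net baseline the list holds a single output ($K = 1$), so the same loss applies unchanged.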
3.2 Comparison with LOUPE
Fig. 2 shows reconstruction results from one of the test subjects, demonstrating the performance improvement of the extended LOUPE over vanilla LOUPE. Four combinations of reconstruction network and sampling pattern optimization network were tested and compared, with binary sampling patterns generated during the test phase. As shown in Fig. 2, MoDL provided better reconstructions than U-Net, and for both reconstruction networks, BS (binary sampling) gave less noisy reconstructions than AS (approximate sampling) at test time. Quantitative comparisons in terms of PSNR (peak signal-to-noise ratio) and SSIM (structural similarity index measure
[29]) are shown in Table 1, where MoDL + BS had the best performance.

Table 1:
Method   | PSNR (dB)  | SSIM
UNet+AS  | 32.5 ± 1.0 | 0.885 ± 0.016
UNet+BS  | 33.0 ± 0.6 | 0.898 ± 0.012
MoDL+AS  | 41.3 ± 1.2 | 0.963 ± 0.015
MoDL+BS  | ± 1.1      | ± 0.012
Table 2:
Pattern | Method  | PSNR (dB)  | SSIM
VD      | ESPIRiT | 37.5 ± 1.0 | 0.920 ± 0.016
VD      | TGV     | 40.1 ± 0.9 | 0.952 ± 0.014
VD      | MoDL    | 40.4 ± 0.9 | 0.963 ± 0.010
Learned | ESPIRiT | ± 1.1      | ± 0.018
Learned | TGV     | ± 1.1      | ± 0.016
Learned | MoDL    | ± 1.1      | ± 0.012
3.3 Comparison with other patterns
To compare the learned sampling pattern ('Learned pattern' in Fig. 3, generated by MoDL + BS in Section 3.2) with a manually designed one at the same 10% ratio, a variable-density (VD) sampling pattern was generated ('VD pattern' in Fig. 3), following a probability density function given by a polynomial of the k-space radius with tunable parameters [26]. ESPIRiT [25] and TGV [17], two representative iterative methods for solving PI CS-MRI, were also deployed with both sampling patterns, and the corresponding reconstruction results are shown in Fig. 3. For each reconstruction method, the learned sampling pattern captured the image content with lower global errors than the VD pattern, and the zoomed-in structural details were also sharper with the learned sampling pattern. PSNR and SSIM in Table 2 show consistently improved performance of the learned sampling pattern over the VD pattern for every reconstruction method.
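For reference, a hedged sketch of generating such a VD pattern: the $(1 - r)^{\text{power}}$ density, the power, and the calibration-region size below are illustrative choices, not the exact parameterization of [26]:

```python
import numpy as np

def variable_density_mask(shape, ratio=0.10, power=4, calib=12, seed=0):
    """Bernoulli ky-kz mask from a polynomial-in-radius density, center fully sampled."""
    ny, nz = shape
    ky, kz = np.meshgrid(np.linspace(-1, 1, ny), np.linspace(-1, 1, nz), indexing="ij")
    r = np.sqrt(ky**2 + kz**2) / np.sqrt(2.0)   # normalized k-space radius in [0, 1]
    pdf = (1.0 - r) ** power                    # denser sampling near the k-space center
    pdf *= ratio * ny * nz / pdf.sum()          # rescale toward the target mean ratio
    pdf = np.clip(pdf, 0.0, 1.0)
    mask = np.random.default_rng(seed).random(shape) <= pdf
    cy, cz = ny // 2, nz // 2                   # keep a fully sampled calibration region
    mask[cy - calib // 2 : cy + calib // 2, cz - calib // 2 : cz + calib // 2] = True
    return mask
```

The fully sampled center mirrors the calibration region kept in the learned patterns, so both pattern types support ESPIRiT map estimation.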
4 Conclusions
In this work, LOUPE for optimizing the k-space sampling pattern in MRI was extended by training on in-vivo multi-coil k-space data, using an unrolled network for undersampled reconstruction, and using binary stochastic sampling with the ST estimator for sampling pattern optimization. Experimental results show that the extended LOUPE worked better than vanilla LOUPE on in-vivo k-space data and that the learned sampling pattern also performed well with other reconstruction methods. Future work includes implementing the learned sampling pattern in a pulse sequence to prospectively optimize the k-space data acquisition process.
References
 [1] (2018) MoDL: modelbased deep learning architecture for inverse problems. IEEE transactions on medical imaging 38 (2), pp. 394–405. Cited by: §1, §2.1, §2.3.
 [2] (2019) Learningbased optimization of the undersampling pattern in mri. In International Conference on Information Processing in Medical Imaging, pp. 780–792. Cited by: §1, §2.1, §2.2, §2.
 [3] (2013) Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432. Cited by: §1, §2.2.
 [4] (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine learning 3 (1), pp. 1–122. Cited by: §2.
 [5] (2011) A firstorder primaldual algorithm for convex problems with applications to imaging. Journal of mathematical imaging and vision 40 (1), pp. 120–145. Cited by: §2.
 [6] (2007) An overview of bilevel optimization. Annals of operations research 153 (1), pp. 235–256. Cited by: §2.
 [7] (1977) Quasinewton methods, motivation and theory. SIAM review 19 (1), pp. 46–89. Cited by: §2.
 [8] (1995) Nonlinear solution of linear inverse problems by waveletvaguelette decomposition. Applied and computational harmonic analysis 2 (2), pp. 101–126. Cited by: §2.
 [9] (2014) Goldenangle radial sparse parallel mri: combination of compressed sensing, parallel imaging, and goldenangle radial sampling for fast and flexible dynamic volumetric mri. Magnetic resonance in medicine 72 (3), pp. 707–717. Cited by: §1.
 [10] (2018) Learningbased compressive mri. IEEE transactions on medical imaging 37 (6), pp. 1394–1406. Cited by: §1.
 [11] (2002) Generalized autocalibrating partially parallel acquisitions (grappa). Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine 47 (6), pp. 1202–1210. Cited by: §1.
 [12] (2019) OEDIPUS: an experiment design framework for sparsityconstrained mri. IEEE transactions on medical imaging 38 (7), pp. 1545–1558. Cited by: §1.
 [13] (2018) Learning a variational network for reconstruction of accelerated mri data. Magnetic resonance in medicine 79 (6), pp. 3055–3071. Cited by: §1, §2.1.

 [14] (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §2.3.
 [15] (2012) Neural networks for machine learning. Coursera, video lectures 264 (1). Cited by: §2.2.
 [16] (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §3.1.
 [17] (2011) Second order total generalized variation (tgv) for mri. Magnetic resonance in medicine 65 (2), pp. 480–491. Cited by: §3.3.
 [18] (2011) Adapted random sampling patterns for accelerated mri. Magnetic resonance materials in physics, biology and medicine 24 (1), pp. 43–50. Cited by: §1.
 [19] (2007) Sparse mri: the application of compressed sensing for rapid mr imaging. Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine 58 (6), pp. 1182–1195. Cited by: §1.
 [20] (2012) Fast spirit compressed sensing parallel imaging mri: scalable parallel implementation and clinically feasible runtime. IEEE transactions on medical imaging 31 (6), pp. 1250–1262. Cited by: §1.
 [21] (2005) An iterative regularization method for total variationbased image restoration. Multiscale Modeling & Simulation 4 (2), pp. 460–489. Cited by: §2.
 [22] (1999) SENSE: sensitivity encoding for fast mri. Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine 42 (5), pp. 952–962. Cited by: §1.
 [23] (2015) Unet: convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computerassisted intervention, pp. 234–241. Cited by: §1, §1, §2.1, §2.

 [24] (2017) A deep cascade of convolutional neural networks for dynamic mr image reconstruction. IEEE transactions on Medical Imaging 37 (2), pp. 491–503. Cited by: §1, §2.1.
 [25] (2014) ESPIRiT—an eigenvalue approach to autocalibrating parallel mri: where sense meets grappa. Magnetic resonance in medicine 71 (3), pp. 990–1001. Cited by: §3.1, §3.3.
 [26] (2015) Berkeley advanced reconstruction toolbox. In Proc. Intl. Soc. Mag. Reson. Med, Vol. 23. Cited by: §3.3.
 [27] (2016) Instance normalization: the missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022. Cited by: §2.3.
 [28] (2011) Practical parallel imaging compressed sensing mri: summary of two years of experience in accelerating body mri of pediatric patients. In 2011 ieee international symposium on biomedical imaging: From nano to macro, pp. 1039–1043. Cited by: §1.
 [29] (2004) Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing 13 (4), pp. 600–612. Cited by: §3.2.
 [30] (2020) Fidelity imposed network edit (fine) for solving illposed image reconstruction. NeuroImage 211, pp. 116579. Cited by: §1.