Parallel imaging (PI) [11, 22] and Compressed Sensing MRI (CS-MRI)  are widely used technique for acquiring and reconstructing under-sampled k-space data thereby shortening scanning times in MRI. CS-MRI is a computational technique that suppresses incoherent noise-like artifacts introduced by random under-sampling, often via a regularized regression strategy. Combining CS-MRI with PI promises to make MRI much more accessible and affordable. Therefore, this has been an intense area of research in the past decade [9, 20, 28]
. One major task in PI CS-MRI is designing a random under-sampling pattern, conventionally controlled by a variable-density probabilistic density function (PDF). However, the design of the ‘optimal’ under-sampling pattern remains an open problem for which heuristic solutions have been proposed. For example, generated the sampling pattern based on the power spectrum of an existing reference dataset;  combined experimental design with the constrained Cramer-Rao bound to generate the context-specific sampling pattern;  designed a parameter-free greedy pattern selection method to find a sampling pattern that performed well on average for the MRI data in a training set.
, a data-driven machine learning based approach called LOUPE was proposed as a principled and practical solution for optimizing the under-sampling pattern in CS-MRI. In LOUPE, fully sampled k-space data was simulated from magnitude MR images and retrospective under-sampling was deployed on the simulated k-space data. A sampling pattern optimization network and a modified U-Net 
as the under-sampled image reconstruction network were trained together in LOUPE to optimize both the k-space under-sampling pattern and reconstruction process. In the sampling pattern optimization network, one sigmoid operation was used to map the learnable weights into probability values, and a second sigmoid operation was used to approximate the non-differentiable step function for stochastic sampling, as the gradient needed to be back-propagated through such layer to update the learnable weights. After training, both optimal sampling pattern and reconstruction network were obtained. For a detailed description of LOUPE we refer the reader to.
In this work, we extended LOUPE in three ways. Firstly, in-house multi-coil in-vivo fully sampled T2-weighted k-space data from MR scanner was used to learn the optimal sampling pattern and reconstruction network. Secondly, modified U-Net  as the reconstruction network in LOUPE was extended to a modified unrolled reconstruction network with learned regularization term in order to reconstruct multi-coil data in PI with proper data consistency and reduce the dependency on training data when training cases were scarce. Thirdly, approximate stochastic sampling layer was replaced by a binary stochastic sampling layer with Straight-Through (ST) estimator , which was used to avoid zero gradients when back-propagating to this layer. Fully sampled data was acquired in healthy subjects. Under-sampled data was generated by retrospective under-sampling using various sampling patterns. Reconstructions were performed using different methods and compared.
In PI CS-MRI, given an under-sampling pattern and the corresponding acquired k-space data, a reconstructed image is obtained via minimizing the following objective function:
where the MR image to reconstruct, the coil sensitivity map of -th coil, the number of receiver coils,
the Fourier transform,the k-space under-sampling pattern, and the acquired under-sampled k-space data of the -th coil. is a regularization term, such as Total Variation (TV)  or wavelet . The minimization in Eq. 1 is performed using iterative solvers, such as the Quasi-Newton method , the alternating direction method of multipliers (ADMM)  or the primal-dual method . Eq. 1 can also be mimicked by learning a parameterized mapping such as neural network from input to output . We denote the mapping using either iterative solvers or deep neural networks as .
Our goal is to obtain an optimal under-sampling pattern for a fixed under-sampling ratio from fully sampled data through retrospective under-sampling. The mathematical formulation of this problem is:
where the -th MR image reconstructed by direct inverse Fourier transform from fully sampled k-space data ,
the loss function to measure the similarity between reconstructed imageand fully sampled label , the constraint set of to define how is generated with a fixed under-sampling ratio . The bilevel optimization problem  of Eq. (2) was solved in LOUPE  via jointly optimizing a modified U-Net  as and an approximate stochastic sampling process as on a large volume of simulated k-space data from magnitude MR images. However, for in-vivo k-space data with multi-coil acquisition as in PI, both U-Net architecture for reconstruction and approximate stochastic sampling for pattern generation could be sub-optimal. Specifically, due to limited training size of in-vivo data and no k-space consistency imposed in U-Net, inferior reconstructions could happen in test and even training datasets. And the approximate stochastic sampling process generated fractional rather than 0-1 binary patterns during training, which might not work well during test as binary patterns should be used for realistic k-space sampling. In view of the above, we extend and improve LOUPE in terms of both reconstruction mapping and sampling pattern’s generating process when working on in-vivo multi-coil k-space data in this work.
2.1 Unrolled Reconstruction Network
A modified residual U-Net  was used as the reconstruction network in LOUPE  to map from the zero-filled k-space reconstruction input to the fully-sampled k-space reconstruction output. U-Net works fine with simulated k-space reconstruction when enough training data of magnitude MR images are given, but as for in-vivo multi-coil k-space data, training cases are usually scarce, since fully-sampled scans are time consuming and as a result, only a few fully-sampled cases can be acquired.
To reduce the dependency on training dataset and improve the data consistency of deep learning reconstructed images, combining neural network block for the regularization term in Eq.1 with iterative optimization scheme to solve Eq. 1 has been explored in recent years [1, 13, 24], which are called ”unrolled optimization/reconstruction networks” in general. Prior works showed that such unrolled networks performed well for multi-coil k-space reconstruction task by means of inserting measured k-space data into the network architecture to solve Eq. 1 with a learning-based regularization. In light of the success of such unrolled reconstruction networks, we apply a modified MoDL  as the reconstruction network in this work. MoDL unrolled the quasi-Newton optimization scheme to solve Eq. 1 with a neural network based denoiser as the regularization term , and conjugate gradient (CG) descent block was applied in MoDL architecture to solve regularized problem. Besides, we will show that such unrolled network architecture also works as the skip connections for sampling pattern weights’ updating as the generated pattern is connected to each intermediate CG block to perform regularized data consistency (Fig. 1).
2.2 ST Estimator for Binary Pattern
In LOUPE , a probabilistic pattern was defined as with hyper-parameter and trainable weights . The binary k-space sampling pattern
was assumed to follow a Bernoulli distributionindependently on each k-space location. was generated from as , where and the pointwise indicator function on the truth values of . However, indicator function has zero gradient almost everywhere when back-propagating through it. LOUPE addressed this issue by approximating
using another sigmoid function:with hyper-parameter .
Although the gradient issue was solved in LOUPE, was approximated as a fraction between
on each k-space location instead of the binary pattern deployed in both test phase and realistic MR scan. As a result, binary sampling patterns generated in test phase could yield inferior performance due to such mismatch with training phase. To address this issue, binary patterns are also needed during training phase, at the same time gradient back-propagating through binary sampling layer should be properly handled. Such binary pattern generation layer can be regarded as the layer with stochastic neurons in deep learning, and several methods have been proposed to address its back-propagation[3, 15]. Here we use straight through (ST) estimator  in the stochastic sampling layer to generate binary pattern meanwhile addressing the zero gradient issue during back-propagation. Based on one variant of ST estimator, is set as during forward pass. When back-propagating through the stochastic sampling layer, an ST estimator replaces the derivative factor with the following:
In other words, indicator function in the stochastic layer is applied at forward pass but treated as identity function during back-propagation. This ST estimator allows the network to make a yes/no decision, allowing it to picking up the top fraction of k-space locations most important for our task.
2.3 Network Architecture
Fig. 1 shows the proposed network architecture consisting of two sub-networks: one unrolled reconstruction network and one sampling pattern learning network.
In the sampling pattern learning network (Fig. 1(b)), Renormalize() is a linear scaling operation to make sure the mean value of probabilistic pattern is equal to the desired under-sampling ratio . The binary pattern is sampled at every forward pass in the network and once generated, it is used to retrospectively under-sample the fully sampled multi-coil k-space data.
The deep quasi-Newton network (MoDL ) as the unrolled reconstruction network architecture is illustrated in Fig. 1(a). In deep quasi-Newton, Denoiser + Data consistency blocks are replicated times to mimic quasi-Newton outer loops of solving Eq. 1 in which a neural network denoiser for is applied. Five convolutional layers with skip connection  and instance normalization  are used as the denoiser and the weights are shared among blocks. The binary pattern is used to generate zero-filled reconstruction as the input of reconstruction network and also connected to all the data consistency sub-blocks to deploy regularized optimization, which also works as the skip connection to benefit the training of pattern weights .
3.1 Dataset and Implementations
Data acquisition and processing.
Fully sampled k-space data were acquired in 6 healthy subjects ( males and female; age: years) using a sagittal T2-weighted variable flip angle 3D fast spin echo sequence on a 3T GE scanner with a 32-channel head coil. Imaging parameters were: imaging matrix, isotropic resolution. Coil sensitivity maps of each axial slice were calculated with ESPIRiT  using a auto-calibration k-space region. From the fully sampled data, a combined single coil image using the same coil sensitivity maps was computed to provide the ground truth label for both sampling pattern learning and reconstruction performance comparison. The central 100 slices of each subject were extracted for the training (300 slices), validation (100 slices) and test (200 slices) dataset. In addition, k-space under-sampling was performed retrospectively in the ky-kz plane for all the following experiments.
Training parameters. In the sampling pattern learning network, were initialized randomly, the slope factor and the under-sampling ratio . The central k-space region remained fully sampled for each pattern. For the baseline LOUPE, a second slope factor was used to approximate the binary sampling. The sampling pattern learning networks using binary sampling with ST estimator and approximated sampling were denoted as BS (binary sampling) and AS (approximated sampling) in the following experiments. In the unrolled reconstruction network, replicated blocks were applied and the denoiser was initialized randomly. For the baseline LOUPE, a residual U-Net was applied. All of the learnable parameters in Fig. 1 were trained simultaneously using the loss function: , where the -th ground truth label in the training dataset, the -th intermediate reconstruction ( in U-Net). Stochastic optimization with batch size and Adam optimizer (initial learning rate: ) 
was used to minimize the loss function. The number of epochs was
. The whole training and inference procedures were implemented in PyTorch with Python version 3.7.3 on an RTX 2080Ti GPU.
3.2 Comparison with LOUPE
Fig. 2 shows the reconstruction results from one of the test subjects to demonstrate the performance improvement of the extended LOUPE over vanilla LOUPE. Four combinations of reconstruction network and sampling pattern optimization network were tested and compared. Binary sampling patterns were generated during test phase. From Fig. 2
, MoDL provided better reconstruction results compared to U-Net, while for both U-Net and MoDL reconstruction networks, BS (binary sampling) gave less noisy reconstructions than AS (approximate sampling) during test phase. Quantitative comparisons in terms of PSNR (peak signal-to-noise ratio) and SSIM (structural similarity index measure) are shown in Table 1, where MoDL + BS had the best performance.
|U-Net+AS||32.5 1.0||0.885 0.016|
|U-Net+BS||33.0 0.6||0.898 0.012|
|MoDL+AS||41.3 1.2||0.963 0.015|
|VD||ESPIRiT||37.5 1.0||0.920 0.016|
|TGV||40.1 0.9||0.952 0.014|
|MoDL||40.4 0.9||0.963 0.010|
3.3 Comparison with other pattern
To compare the learned sampling pattern (’learned pattern’ in Fig. 3, generated from MoDL + BS in section 3.2) with the manually designed one with 10 % ratio, a variable density (VD) sampling pattern following a probabilistic density function whose formula is a polynomial of the radius in k-space with tunable parameters  was generated (’VD pattern’ in Fig. 3). ESPIRiT  and TGV  as two representative iterative methods for solving PI CS-MRI were also deployed using both sampling patterns, and the corresponding reconstruction results are shown in Fig. 3. For each reconstruction method, the learned sampling pattern captured better image depictions with lower global errors than VD pattern and the structural details as zoomed in were also sharper with the learned sampling pattern. PSNR and SSIM in Table 2 shows consistently improved performance of the learned sampling pattern over the VD pattern for each reconstruction method.
In this work, LOUPE for optimizing the k-space sampling pattern in MRI was extended by training on in-vivo multi-coil k-space data and using the unrolled network for under-sampled reconstruction and binary stochastic sampling with ST estimator for sampling pattern optimization. Experimental results show that the extended LOUPE worked better than vanilla LOUPE on in-vivo k-space data and the learned sampling pattern also performed well on other reconstruction methods. Future work includes implementing the learned sampling pattern in the pulse sequence to optimize the k-space data acquisition process prospectively.
-  (2018) MoDL: model-based deep learning architecture for inverse problems. IEEE transactions on medical imaging 38 (2), pp. 394–405. Cited by: §1, §2.1, §2.3.
-  (2019) Learning-based optimization of the under-sampling pattern in mri. In International Conference on Information Processing in Medical Imaging, pp. 780–792. Cited by: §1, §2.1, §2.2, §2.
-  (2013) Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432. Cited by: §1, §2.2.
-  (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine learning 3 (1), pp. 1–122. Cited by: §2.
-  (2011) A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of mathematical imaging and vision 40 (1), pp. 120–145. Cited by: §2.
-  (2007) An overview of bilevel optimization. Annals of operations research 153 (1), pp. 235–256. Cited by: §2.
-  (1977) Quasi-newton methods, motivation and theory. SIAM review 19 (1), pp. 46–89. Cited by: §2.
-  (1995) Nonlinear solution of linear inverse problems by wavelet-vaguelette decomposition. Applied and computational harmonic analysis 2 (2), pp. 101–126. Cited by: §2.
-  (2014) Golden-angle radial sparse parallel mri: combination of compressed sensing, parallel imaging, and golden-angle radial sampling for fast and flexible dynamic volumetric mri. Magnetic resonance in medicine 72 (3), pp. 707–717. Cited by: §1.
-  (2018) Learning-based compressive mri. IEEE transactions on medical imaging 37 (6), pp. 1394–1406. Cited by: §1.
-  (2002) Generalized autocalibrating partially parallel acquisitions (grappa). Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine 47 (6), pp. 1202–1210. Cited by: §1.
-  (2019) OEDIPUS: an experiment design framework for sparsity-constrained mri. IEEE transactions on medical imaging 38 (7), pp. 1545–1558. Cited by: §1.
-  (2018) Learning a variational network for reconstruction of accelerated mri data. Magnetic resonance in medicine 79 (6), pp. 3055–3071. Cited by: §1, §2.1.
-  (2016) Deep residual learning for image recognition. In , pp. 770–778. Cited by: §2.3.
-  (2012) Neural networks for machine learning. Coursera, video lectures 264 (1). Cited by: §2.2.
-  (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §3.1.
-  (2011) Second order total generalized variation (tgv) for mri. Magnetic resonance in medicine 65 (2), pp. 480–491. Cited by: §3.3.
-  (2011) Adapted random sampling patterns for accelerated mri. Magnetic resonance materials in physics, biology and medicine 24 (1), pp. 43–50. Cited by: §1.
-  (2007) Sparse mri: the application of compressed sensing for rapid mr imaging. Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine 58 (6), pp. 1182–1195. Cited by: §1.
-  (2012) Fast -spirit compressed sensing parallel imaging mri: scalable parallel implementation and clinically feasible runtime. IEEE transactions on medical imaging 31 (6), pp. 1250–1262. Cited by: §1.
-  (2005) An iterative regularization method for total variation-based image restoration. Multiscale Modeling & Simulation 4 (2), pp. 460–489. Cited by: §2.
-  (1999) SENSE: sensitivity encoding for fast mri. Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine 42 (5), pp. 952–962. Cited by: §1.
-  (2015) U-net: convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pp. 234–241. Cited by: §1, §1, §2.1, §2.
A deep cascade of convolutional neural networks for dynamic mr image reconstruction. IEEE transactions on Medical Imaging 37 (2), pp. 491–503. Cited by: §1, §2.1.
ESPIRiT—an eigenvalue approach to autocalibrating parallel mri: where sense meets grappa. Magnetic resonance in medicine 71 (3), pp. 990–1001. Cited by: §3.1, §3.3.
-  (2015) Berkeley advanced reconstruction toolbox. In Proc. Intl. Soc. Mag. Reson. Med, Vol. 23. Cited by: §3.3.
-  (2016) Instance normalization: the missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022. Cited by: §2.3.
-  (2011) Practical parallel imaging compressed sensing mri: summary of two years of experience in accelerating body mri of pediatric patients. In 2011 ieee international symposium on biomedical imaging: From nano to macro, pp. 1039–1043. Cited by: §1.
-  (2004) Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing 13 (4), pp. 600–612. Cited by: §3.2.
-  (2020) Fidelity imposed network edit (fine) for solving ill-posed image reconstruction. NeuroImage 211, pp. 116579. Cited by: §1.