I. Introduction
Magnetic resonance imaging (MRI) is a widely used imaging technique that allows visualization of both anatomical structures and physiological functions. MRI scanners sequentially collect measurements in the frequency domain (or k-space) from which an image is reconstructed. A central challenge in MRI is its time-consuming sequential acquisition process, as the scanner needs to densely sample the underlying k-space for accurate reconstruction. In order to improve patients' comfort and safety and alleviate motion artifacts, reconstructing high-quality images from limited measurements is highly desirable. There are two core parts in the conventional MRI pipeline: a sampling pattern deployed to collect the (e.g., limited/undersampled) data in k-space and a corresponding reconstruction method (reconstructor) that also enables recovering any missing information. In this work, we use machine-learned models to predict the undersampling pattern and reconstruction in a single shot or pass and in an object-adaptive manner.

I-A. Background
Early attempts to improve the sampling efficiency without degrading image quality in MRI leveraged additional structure of the target images to perform reconstruction (e.g., as in compressed sensing [14]). MR images are often structured, e.g., they may be approximated as piecewise smooth functions with relatively few sharp edges. This type of structure leads to sparsity in the wavelet domain, as piecewise smooth functions are compressible when represented in the wavelet basis. The MR image reconstruction problem from subsampled k-space measurements often takes the form of a regularized optimization problem:
$$\hat{x} = \arg\min_{x} \ \frac{1}{2}\,\| y - M F x \|_2^2 + \lambda\, \mathcal{R}(x), \tag{1}$$

where $y$ denotes the partial measurements, $x$ is the underlying MR image to recover, $M$ is a subsampling operator, $F$ is the Fourier transform operator, $\mathcal{R}$ is a regularizer, and $\lambda > 0$ is a regularization parameter that controls the trade-off between the fidelity to the k-space measurements and alignment with the structure imposed by the regularizer $\mathcal{R}$. When representing an MR image as sparse in the wavelet domain (with wavelet transform $W$) and with the total variation penalty (TV), the above image reconstruction problem can be formulated as

$$\hat{x} = \arg\min_{x} \ \frac{1}{2}\,\| y - M F x \|_2^2 + \lambda_1 \| W x \|_1 + \lambda_2\, \mathrm{TV}(x). \tag{2}$$
Compressive sensing theory reveals that when the measurement operator is sufficiently incoherent with respect to the sparsifying operator, one can exactly recover the underlying image from significantly fewer samples [8]. A possible k-space undersampling scheme in MRI is variable density random sampling [15], where a higher probability is allocated for sampling at lower frequencies than at higher frequencies. This sampling scheme accounts for the fact that most of the energy in MR images is concentrated close to the center of k-space. In this classic MRI protocol, measurements are collected sequentially using subsampling patterns that are chosen a priori, and an iterative solver (e.g., the proximal gradient descent method [11] or the alternating direction method of multipliers, i.e., ADMM [6]) is used as the reconstruction method for the regularized optimization problem.

Data-driven approaches can elevate the performance of image reconstruction methods by designing data-specific regularizers or by introducing deep learning tools as learned reconstructors. For regularizer design, one may replace the $\ell_1$ penalty for the wavelet coefficients and the total variation penalty in (2) with a learning-based sparsity penalty, where a sparsifying transform (or dictionary) is learned in an unsupervised fashion from a limited number of unpaired clean image patches and used in the regularizer for reconstruction [30, 32]. Alternatively, plug-and-play regularization replaces the proximal step in the typical iterative optimization process for (1) with a denoising neural network (or generic denoiser) that essentially functions as an implicit data-based regularizer [27]. Another related deep learning-inspired approach for reconstruction involves unrolling iterative algorithms and learning the regularization parameters in a supervised manner from paired training sets. Each learnable block in the chain of the unrolling strategy is realized by a network that simulates a certain optimization operation [31, 9]. Lastly, examples of direct deployment of feedforward neural networks as denoisers include U-Net architecture-based networks [24] and tandem networks that are trained as generator and discriminator in the GAN framework [16]. We refer readers to [12, 23, 18] for more detailed reviews of learning-based reconstruction methods.

In this work, we develop an adaptive sampler that generates object-specific sampling patterns based on highly limited measurements from varying input objects. We propose to leverage neural networks to generate undersampling patterns in a single pass using highly undersampled initial acquisitions as input. The sampling network is trained jointly with a reconstructor to optimize eventual image reconstruction quality. Such training of neural networks to predict sampling patterns is essentially a supervised task. We propose an end-to-end training procedure that takes in fully sampled k-space data and outputs corresponding desired undersampling patterns and image reconstructions. To initiate the process of training a network for sampling pattern or mask prediction, we use a framework that alternates between finding object-adaptive binary masks (and a corresponding reconstructor) exploiting ground truth k-space data, and updating a network for predicting such sampling masks.
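As an illustration of the variable density random sampling scheme described above, a Cartesian line (row) sampler with a Gaussian-shaped density can be sketched in a few lines. This is a toy sketch: the Gaussian density, its width parameter `sigma`, and all function names are our choices for illustration, not the exact density of [15].

```python
import numpy as np

def variable_density_line_mask(n_rows, budget, sigma=0.25, rng=None):
    """Sample `budget` distinct k-space rows, favoring low frequencies.

    Rows are indexed so that row n_rows // 2 is the center (DC) of k-space.
    `sigma` controls how strongly the density concentrates at the center.
    """
    rng = np.random.default_rng(rng)
    freqs = np.arange(n_rows) - n_rows // 2                   # signed frequency index per row
    density = np.exp(-0.5 * (freqs / (sigma * n_rows)) ** 2)  # Gaussian-shaped density
    density /= density.sum()                                  # normalize to a probability vector
    rows = rng.choice(n_rows, size=budget, replace=False, p=density)
    mask = np.zeros(n_rows, dtype=bool)
    mask[rows] = True
    return mask

mask = variable_density_line_mask(320, 80, rng=0)  # 4x acceleration: 80 of 320 rows
```

Lower `sigma` concentrates more of the budget near the center of k-space, mirroring the energy concentration of MR images.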
I-B. Connection to Bilevel and Mixed-integer Optimization Problems
The drawback of the conventional compressive sensing approaches [15, 1] to accelerating MRI scans is that the measurement operator does not adapt to different objects and thus may not result in optimal image reconstructions in general.
An alternative formulation, first explored in [22], is in the form of a bilevel optimization problem:
$$\min_{\theta} \ \sum_{i} \big\| \hat{x}_i(\theta) - x_i \big\|_2^2 \quad \text{s.t.} \quad \hat{x}_i(\theta) = \arg\min_{x} \ \frac{1}{2}\big\| y_i(\theta) - M_\theta F x \big\|_2^2 + \mathcal{R}(x), \tag{3}$$

where $\theta$ denotes the underlying parameters of the measurement/sensing operator $M_\theta$; $x_i$ are the ground truth images reconstructed from raw fully-sampled k-space data $z_i$; $y_i(\theta) = M_\theta z_i$ are the corresponding measurements when using the measurement operator $M_\theta$; and $\mathcal{R}$ is the regularization term for the lower-level optimization problem. The goal is to ultimately generate artifact-free images from partial measurement vectors $y_i(\theta)$, as embodied in the upper-level optimization problem. This particular formulation accommodates variation in the measurement operator, thus rendering the operator adaptive to a training dataset. In the MRI setting, the parameters $\theta$ capture the sampling patterns/masks that we intend to let neural networks predict. A major challenge in tackling the bilevel optimization formulation is the implicit dependence of the upper-level objective on the parameters $\theta$, so that it is not immediately clear, in both theoretical and practical aspects, how to compute the (sub)gradients of the upper-level objective with respect to the pattern parameters. To address this issue, we use a network as the solver for the lower-level problem, which essentially renders $\hat{x}_i(\theta)$ in formulation (3) as the output of a differentiable reconstructor, and therefore the gradient computation and backpropagation from the upper-level loss function to the sampling parameters $\theta$ becomes possible.

In this work, we focus on Cartesian undersampling of k-space. In this case, we can write $M_\theta = \mathrm{diag}(\theta)$, where $\mathrm{diag}(\theta)$ is a binary diagonal matrix, and a 1 in a certain entry of $\theta$ indicates that the corresponding row in k-space is observed. Namely, the parameters in the bilevel formulation (3) are the binary elements of $\theta$. A practitioner can preset a sampling budget $B$, which is reflected as the constraint $\mathbf{1}^\top \theta = B$. With a sampling budget constraint, the optimization problem (3) assumes the flavor of integer programming:
$$\min_{\theta \in \{0,1\}^n} \ \sum_{i} \big\| \hat{x}_i(\theta) - x_i \big\|_2^2 \quad \text{s.t.} \quad \hat{x}_i(\theta) = \arg\min_{x} \ \frac{1}{2}\big\| \mathrm{diag}(\theta)\,(z_i - F x) \big\|_2^2 + \mathcal{R}(x), \quad \mathbf{1}^\top \theta = B. \tag{4}$$
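For intuition, the Cartesian row-undersampling forward model and its zero-filled adjoint can be sketched as follows. This is a toy NumPy sketch; the centered-FFT convention and all function names are our assumptions.

```python
import numpy as np

def undersample_rows(image, row_mask):
    """Apply the forward model: 2-D FFT, then zero out k-space rows where row_mask is 0."""
    kspace = np.fft.fftshift(np.fft.fft2(image))  # centered k-space (DC in the middle)
    return kspace * row_mask[:, None]             # binary diagonal mask acting on rows

def zero_filled_recon(masked_kspace):
    """Adjoint / zero-filled reconstruction: inverse FFT of the masked k-space."""
    return np.fft.ifft2(np.fft.ifftshift(masked_kspace))

n = 64
x = np.random.default_rng(0).standard_normal((n, n))
theta = np.zeros(n, dtype=bool)
theta[n // 2 - 8 : n // 2 + 8] = True             # keep 16 central rows: 4x acceleration
y = undersample_rows(x, theta)
x_zf = zero_filled_recon(y)                       # aliased zero-filled reconstruction
```

With the full mask (all rows observed) the adjoint inverts the forward model exactly; with a partial mask, `x_zf` exhibits the aliasing artifacts that a learned reconstructor must remove.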
We note that there has been work devoted to leveraging neural networks to construct solvers for mixed-integer programming (MIP). In [17], a mixed-integer program is represented as a bipartite graph (nodes are the variables and constraints of the MIP, and edges connect variables to their corresponding constraints). One neural network is trained to predict integer values in the solution, satisfying the integer constraints, in a generative style, and another is trained to learn the policy of branching. Classic solvers are used to tackle subproblems of the original MIP to provide training data for each network.
The target of predicting a sampling pattern echoes the integer-valued constraint in the optimization problem (4). We address the integer-valued constraint in our approach through a binarization step, which can be considered a post-processing action applied to the output of the proposed mask-predicting MNet. While the way we deploy the network for mask prediction does not directly constitute a solver for the MIP, it suggests a potential new direction for using networks for integer-constrained problems.

I-C. Prior Art in Sampling Design
We briefly review several existing sampling design strategies in the literature. An early work optimizes k-space trajectories using an information gain criterion [25]. Another work [22] directly optimized a k-space undersampling pattern based on a reconstruction error-type loss on a training set. With the recent success of deep learning methods, there has been growing interest in the topic of adaptive sampling. A classic recent nonparametric sampling adaptation method for (4) is the greedy approach [10], which chooses one k-space line or phase encode to add to the sampled set at each step, depending on which line, in combination with those already added, leads to the best reconstruction loss (upper-level loss). The greedy approach terminates when the sampling budget is reached. The learned sampling mask is population-adaptive (learned on a dataset) but does not vary with different objects (i.e., it is not object-adaptive). The greedy sampling process can be accelerated by considering random subsets of all candidate high frequencies as a cohort and selecting multiple high frequencies into the sampling set at each step.
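To illustrate the structure of the greedy selection loop (not the actual implementation of [10], which evaluates the full reconstruction loss), here is a toy sketch in which the upper-level loss is replaced by a simple proxy of our choosing: the energy left unsampled in k-space.

```python
import numpy as np

def greedy_line_selection(kspace_mag, budget, loss_fn=None):
    """Greedily add one k-space row at a time, picking the row that most
    reduces the loss of the current selection, until the budget is met.

    kspace_mag: (n_rows, n_cols) magnitude of fully sampled k-space.
    By default the loss is the energy left unsampled -- a stand-in for the
    reconstruction loss used in the actual greedy algorithm.
    """
    n_rows = kspace_mag.shape[0]
    row_energy = (kspace_mag ** 2).sum(axis=1)
    if loss_fn is None:
        loss_fn = lambda m: row_energy[~m].sum()
    mask = np.zeros(n_rows, dtype=bool)
    for _ in range(budget):
        best_row, best_loss = None, np.inf
        for r in np.flatnonzero(~mask):        # candidate rows not yet chosen
            trial = mask.copy()
            trial[r] = True
            loss = loss_fn(trial)
            if loss < best_loss:
                best_row, best_loss = r, loss
        mask[best_row] = True                  # commit the best row for this step
    return mask
```

Swapping `loss_fn` for an actual reconstruction loss recovers the (much more expensive) structure of the greedy method, whose cost motivates learning a sampler instead.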
LOUPE is a parametric model that characterizes the probability of sampling each pixel or row/column in the frequency domain or k-space with underlying parameters that are learned simultaneously with the parameters of a reconstructor [4]. The drawback of this approach is that the learned sampling pattern is generic with respect to the training set (population-adaptive) rather than being adaptive to each individual object's characteristics. A similar work [26] also assumes that the training data are representative enough of new data acquisitions and learns a fixed undersampling pattern. This latter approach combines the sampling and image reconstruction problems as a bilevel optimization problem, parametrizes the sampling pattern via probability variables to optimize, and resorts to iterative optimization solvers to find the solution. J(oint)MoDL [2] assumes the sampling pattern to be rows/columns sampled in k-space and learns a common undersampling pattern or mask along with a neural network reconstructor from a training set. Another way of parametrizing sampling patterns is to assign each frequency a logit as the natural logarithm of its unnormalized class probability, so that a categorical distribution over all frequency candidates can be constructed by applying a softmax function to the associated logits; this enables gradient backpropagation, and the output mask contains the frequencies with the top-$M$ weights [13]. PILOT parametrizes the sampling patterns as a matrix to coordinate with the non-uniform FFT measurements, and adds an additional round of optimization to enforce hardware measurement constraints [29]. The BJORK method parametrizes sampling patterns as linear combinations of quadratic B-spline kernels, and attempts to reduce the number of parameters needed to characterize sampling patterns through a scheduling plan that gradually reduces the ratio between the total number of k-space samples and the number of B-spline interpolation kernels
[28]. The approach can handle non-Cartesian sampling but does not learn an object-adaptive sampler.

Sequential decision processes have also been exploited as a path to build samplers. One attempt in the reinforcement learning style formulates the sampling problem as a partially observable Markov decision process and uses policy gradient and double deep Q-network methods to build the sampling policy [19]. Another attempt builds a neural network as the sampler to be repeatedly applied in the sampling process and trained simultaneously with the reconstructor [33]. The sequential methods consume additional time for each prediction/decision (idle scanner time), which can introduce artifacts in real-time imaging.

I-D. Contributions
We propose an adaptive sampler, MNet, a neural network that takes very limited low-frequency information as input per scan or frame and outputs corresponding desired undersampling patterns adaptive to different input objects in a single pass. The core advantage of the proposed sampling-pattern predictor MNet is that it outputs object-specific masks for different input objects; thus, the sampling pattern differs from object to object. This property gives our sampler more potential than several earlier proposed samplers, as the mask output by our sampler is not only data-driven but also object-specific. Moreover, compared to sequential-type samplers, our sampling approach determines the entirety of the higher frequency samples to acquire at once, thus significantly reducing the processing time for practical deployment. While the proposed approach is applied to static MRI sampling in this work, it can be readily used for dynamic MRI sampling as well, where variations such as predicting the sampling pattern for a frame based on low-frequency samples of previous frame(s) could also be used to enable more rapid implementation.
We propose an alternating training framework to update the parameters of the sampler network and the reconstruction network. This framework makes training an MNet possible without resorting to computationally expensive greedy methods to provide training labels for MNet; instead, labels are generated internally as part of the alternating training framework. Our numerical experiments show the superior performance of the MNet framework relative to several alternative schemes for undersampled single-coil acquisitions based on the fastMRI [34] dataset.
I-E. Structure of this Paper
The rest of this paper is organized as follows. Section II elaborates on the design of the mask-backward training framework. Section III discusses the architecture of MNet, the implementation details, and experimental results. We summarize our findings and potential new directions for future work in Section IV.
II. Methodology
MRI scanners acquire measurements in the frequency domain (k-space). We let $x$ denote an underlying (ground truth) image and $z$ denote its Fourier transform. We focus on Cartesian (or line) sampling of k-space and single-coil data in the experiments of this work, where we represent the sampling mask parameters (when subsampling rows in k-space) via the vector $m \in \{0,1\}^n$. In this case, the fully-sampled k-space (including noise $\varepsilon$) is related to the corresponding underlying image as $z = Fx + \varepsilon$. The subsampling of k-space, with zeros at nonsampled locations, can be represented as $\mathrm{diag}(m)\,z$. More generally, for multi-coil data, the coil sensitivity maps would scale the image in this process [20]. We define the acceleration factor as the ratio between the number of all available rows (alternatively columns) in k-space and the total number of subsampled rows (i.e., $n / \|m\|_0$).
In this work, we aim to train a neural network that can generate a subsampling pattern for k-space based on very limited initial low-frequency measurements. An immediate challenge for training such a network in the supervised style is the lack of labels denoting ideal/best subsampling patterns. One plausible option is to generate object-specific masks for each training image using the greedy algorithm [10] for (4) and use them as labels. However, the greedy algorithm's labels are not necessarily optimal (i.e., global minimizers of (4)), and obtaining such labels for a large number of training images (from fully-sampled k-space) is computationally expensive or infeasible.
Another natural strategy to circumvent the difficulty of obtaining adaptive mask labels is an end-to-end pipeline comprised of an encoder, which maps limited (e.g., low-frequency) information in k-space to intermediate adaptive masks, and a decoder, which attempts to reconstruct the artifact-free image from the k-space information sampled according to the adaptive masks from the encoder. After training of the end-to-end pipeline is concluded, one extracts the encoder part from the pipeline, which can function as the desired mask prediction network. We choose not to proceed with this obvious end-to-end approach for two reasons. The first concern is that binarizing intermediate masks to meet the sampling budget target of the mask prediction network is incompatible with gradient-based methods. While gradient computation techniques can still get around the hard binarization threshold, there are no binary labels in the end-to-end training framework to supervise the mask prediction network, which may incur unexpected behavior in its performance. The second concern is that a deep end-to-end training procedure without intermediate intervention can cause vanishing gradients, which results in slow progress in training. In addition, an end-to-end approach makes it harder to track the performance of each component and thus offers less interpretability of the outcome of each component in this framework. We alleviate these issues by introducing a split variable that represents the output of MNet and satisfies the explicit mask constraints in (4). A penalty is then introduced to measure the fidelity of the MNet output to the split binary variables/labels.
II-A. Proposed Training Scheme
We propose a mask-backward inferring method (see Subroutine 1) to obtain binary label masks to feed the supervised training process of the mask-predicting network (MNet). The key component of the mask-backward method is a parametric reconstructor that serves as a bridge connecting mask parameters and ground truth images. With the mask as a parameter to tune, we first compute a zero-filled image reconstruction via the inverse Fourier transform (IFFT) of the observed low-frequency k-space data, with zeros occupying the unobserved k-space. The reconstructor receives this crude initial reconstruction and refines it to be free of aliasing artifacts. The loss function in (5) compares the refined image with the ground truth image using specific reconstruction quality metrics, and is optimized with respect to the object-adaptive sampling masks and the reconstructor. We mainly use a standard 8-block U-Net as the reconstructor with additional residual connections between the downsampling and upsampling blocks. Later, we consider other reconstructors in combination with MNet.

In order to train the MNet with guidance from the mask-backward inferring process, we use the following alternating training framework. Let $z^{\mathrm{low}}_i$ denote the initially observed low-frequency information of image $x_i$, $\Theta$ denote the parametrization of the MNet, and $w$ denote the parametrization of the reconstructor Recon. The optimization problems involved in this training framework are the following, which are solved in an alternating manner (Fig. 1):
$$\min_{\{m_i\},\, w} \ \sum_i \Big[ \mathrm{NRMSE}\big( \mathrm{Recon}_w\big( F^{-1} \mathrm{diag}(m_i)\, z_i \big),\, x_i \big) + \alpha \|m_i\|_1 + \beta\, \mathrm{BCE}\big( m_i,\, \mathrm{MNet}_\Theta(z^{\mathrm{low}}_i) \big) \Big] \tag{5}$$

$$\min_{\Theta} \ \sum_i \mathrm{BCE}\big( \mathrm{MNet}_\Theta(z^{\mathrm{low}}_i),\, \widetilde{m}_i \big) \tag{6}$$
In the loss function (5), the first term characterizes the fidelity between the reconstructed image and the ground truth image using the normalized root mean square error (NRMSE), the second term is the regularization term that implicitly controls the sparsity of the returned mask, and the third term imposes consistency between the returned mask and the mask predicted by the current MNet. We add the third term to accelerate the convergence of this alternating training framework. We use the $\ell_1$ norm to enforce sparsity on the object-adaptive masks. The $\ell_1$ norm allows computing (sub)gradients, and the exact binary constraint and desired sampling budget (as in (4)) are enforced on the masks after the gradient step, as shown in Subroutine 1. The function BCE denotes the binary cross entropy loss in both loss functions (5) and (6). The nonnegative parameters $\alpha$ and $\beta$ are the weights of the sparsity regularization and consistency regularization terms, respectively.
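The per-image terms of the loss in (5) can be sketched as follows. This is a NumPy sketch; the exact NRMSE/BCE conventions and the argument order in the consistency term are our assumptions.

```python
import numpy as np

def nrmse(recon, gt):
    """Normalized root mean square error between a reconstruction and ground truth."""
    return np.linalg.norm(recon - gt) / np.linalg.norm(gt)

def bce(pred, target, eps=1e-12):
    """Binary cross entropy between a soft mask `pred` and a target mask."""
    pred = np.clip(pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def mask_backward_loss(recon, gt, mask, mnet_mask, alpha, beta):
    """Per-image loss (5): reconstruction fidelity + mask sparsity + MNet consistency."""
    return nrmse(recon, gt) + alpha * np.abs(mask).sum() + beta * bce(mask, mnet_mask)
```

Setting `alpha = beta = 0` reduces the loss to pure reconstruction fidelity; the two regularization weights trade off sparsity of the refined mask and agreement with the current MNet prediction.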
In this training process, we separate the training of MNet and the image reconstructor Recon into two alternating steps to enable easier training. We first use the mask-backward method to find the ideal mask for each given image while updating the reconstructor. We then use the ideal binary masks to update the parametrization of the MNet. These two steps are executed alternately. The complete training algorithm is summarized in Algorithm 2.
II-B. Remarks

Algorithm 2 follows a co-design approach to identify a sampler and a reconstructor simultaneously with respect to a training dataset. Alternatively, one can also take a trained MNet sampler from the output of Algorithm 2 and train a reconstructor in a separate process. We have investigated the performance of both of these practices in Section III.

Computing the gradient with respect to the mask parametrization: when implementing (5), we let $m_i = \sigma(v_i)$, where $\sigma$ is the standard sigmoid function and $v_i$ is the parameter vector characterizing the mask for image $x_i$. Compared to parametrizing the adaptive mask corresponding to the image directly via nonnegative real numbers between 0 and 1 (ideally binary), using the sigmoid improves numerical stability when updates are made to the free parameters $v_i$.
Regarding binarization: the final output sampling masks should be binary for practical use. The threshold we use for binarizing vectors is 0.5. During the training process, each entry of the mask characterization is in $[0,1]$. We point out that the reconstructor in Subroutine 1 (in our experiments, the U-Net and the MoDL [3] model) sees the subsampled k-space information filtered by the binarized mask, while the gradient with respect to the unbinarized mask in (5) can still be computed via the definition of a backward function in, e.g., the PyTorch implementation of binarization (see [5, 35, 33]).
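The binarization with a pass-through backward function can be sketched framework-free as follows. The straight-through convention shown here (gradient passed unchanged through the threshold) is a common choice and our assumption; in PyTorch one would wrap the two functions in a custom autograd function.

```python
import numpy as np

def binarize_forward(p, threshold=0.5):
    """Forward pass: hard-threshold the soft mask to {0, 1}."""
    return (p >= threshold).astype(p.dtype)

def binarize_backward(grad_output):
    """Backward pass (straight-through): treat binarization as the identity,
    so the upstream gradient flows unchanged to the soft mask parameters."""
    return grad_output

p = np.array([0.1, 0.5, 0.9])
b = binarize_forward(p)                              # hard mask seen by the reconstructor
g = binarize_backward(np.array([0.3, -0.2, 0.7]))    # gradient passes through unchanged
```

This is what lets the reconstructor consume a strictly binary mask while gradients still reach the real-valued mask characterization.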
Regarding sampling budget control: we use the normalization trick in [4] to ensure that the output mask from MNet, as well as the refined mask returned by the mask-backward method, has the exact target sampling ratio. Assume the sampling budget is $B$ lines out of $n$, so that the target sampling ratio is $B/n$. Let $p \in [0,1]^n$ be the mask characterization before normalization, and define $\bar{p} = \frac{1}{n}\sum_j p_j$, which is the pre-normalization sampling ratio, i.e., the average value of $p$. The normalization is done by

$$p_j \leftarrow \begin{cases} \dfrac{B/n}{\bar{p}}\, p_j, & \text{if } \bar{p} \geq B/n, \\[6pt] 1 - \dfrac{1 - B/n}{1 - \bar{p}}\,(1 - p_j), & \text{otherwise.} \end{cases} \tag{7}$$

The sampling budget control and binarization are both applied before a refined mask is returned by the mask-backward method to update the MNet model, thus ensuring the labels for training the MNet have the pre-specified sampling ratio.
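The rescaling step can be sketched as follows (NumPy; the two-branch form follows the normalization trick of [4], with variable names of our choosing):

```python
import numpy as np

def rescale_to_budget(p, target_ratio):
    """Rescale a soft mask p in [0,1]^n so its mean equals target_ratio.

    If the current mean is too high, shrink values toward 0; otherwise
    shrink the complement (1 - p) toward 0, i.e., push values toward 1.
    Both branches keep every entry inside [0, 1].
    """
    p = np.asarray(p, dtype=float)
    mean = p.mean()
    if mean >= target_ratio:
        return p * (target_ratio / mean)
    return 1.0 - (1.0 - p) * ((1.0 - target_ratio) / (1.0 - mean))

p = np.array([0.9, 0.8, 0.1, 0.2])     # mean 0.5
q = rescale_to_budget(p, 0.25)         # mean becomes exactly 0.25
```

After rescaling, the mean of the soft mask matches the target sampling ratio exactly, so binarization at the 0.5 threshold tends to select close to the budgeted number of lines.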
For the mask initialization step in Subroutine 1, the input mask is binarized, and we set the corresponding entries of the parameter vector $v$ to a large positive value for frequencies with mask value 1 and to a large negative value for frequencies with mask value 0.

The quality function in Algorithm 2 evaluates how good a mask is, together with a reconstructor, for the image $x$, in terms of reconstruction quality. We start this alternating training framework with a warmed-up reconstructor (e.g., a U-Net) that is pretrained with its inputs (initial reconstructions) set to zero-filled reconstructions obtained with variable density random undersampling of k-space.
The condition for accepting a refined mask in Algorithm 2 is that its quality should be better than that of the pre-refinement mask (the direct output from MNet) and that of random masks, as we need to ensure that the refined mask is improved in terms of contributing more to image reconstruction and that it is nontrivial. The quality of the random mask is always evaluated with the initial reconstructor pretrained for random-masked observations, while the quality of adaptive masks and their refined counterparts is evaluated by the reconstructor that is updated during training.

One can replace the U-Net with other parametric reconstructors, such as MoDL or unrolled iterative blocks, in the mask-backward method in our joint training framework, as long as the reconstructor is differentiable so that the gradient can propagate through.

In Step 4 of Algorithm 2, if the refined masks are repetitive for more than half of the images in a batch, then we use random masks as the input for the mask-backward process to avoid any degeneracies.
III. Experiments
Here, we first discuss the network architectures in our framework along with datasets used, hyperparameter choices, and the general experiments, before presenting results and comparisons.
III-A. Implementation Details
MNet structure
The MNet mask predictor starts with a double-convolution block that is followed by four downsampling blocks, as used in the classic U-Net, and ends with four fully-connected layers. The input to the MNet is the initially collected, appropriately zero-padded, limited (low-frequency) k-space information, with real and imaginary parts separated into two channels, and the output is the sampling decision with respect to the unobserved (higher) frequencies. The design of MNet consists of two parts: encoding convolutional layers that emulate the encoder part of the U-Net structure, and fully-connected feedforward layers, which originated in classification networks.

Reconstructors
For U-Net reconstructors, we adopt a standard 8-block U-Net architecture following [24]. When carrying out the mask-backward training, both the input to and output from the reconstructor are single-channel real-valued images (corresponding to magnitudes of reconstructions). For the input, we take the magnitude of the inverse FFT initial reconstruction. In the separate training process for individual reconstructors, the input images have 2 channels (for a complex-valued image, the real and imaginary parts are separated into 2 channels), which led to slightly better performance, and the outputs are single-channel real-valued images. The U-Net reconstructor contains four downsampling blocks and four upsampling blocks, each consisting of two convolutions separated by ReLU and instance normalization. We let the first downsampling block have 64 channels, expanded from the 1 or 2 input channels. The U-Net used in the mask-backward process has a residual structure with skip connections, while the U-Net used in the separate training cases does not.

We also briefly review the idea of the MoDL reconstructor [3]. The MoDL reconstructor consists of a neural-network denoiser and a data-consistency-enforcing block. The image reconstruction problem the MoDL reconstructor tackles is of the form $\min_x \|Ax - y\|_2^2 + \lambda \|x - D_w(x)\|_2^2$, where $A$ is the measurement operator and $D_w$ is a neural-network denoiser. The two alternating steps in implementing the MoDL reconstructor are $x_{k+1} = (A^H A + \lambda I)^{-1}(A^H y + \lambda z_k)$ (the data-consistency-enforcing block) and $z_k = D_w(x_k)$ (the denoising step), where $A^H$ is the Hermitian of the operator $A$. One typically uses the conjugate gradient (CG) algorithm to compute the inverse in the data-consistency-enforcing block. In our implementation of MoDL, we set the MoDL reconstructor to have 4 MoDL blocks owing to computational complexity considerations on our hardware, and the error tolerance of the CG algorithm is set to a small fixed value. The neural-network denoiser used in the MoDL reconstructor is a U-Net as previously described, with 64 channels in the first downsampling block and without the residual structure.
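The data-consistency block amounts to solving $(A^H A + \lambda I)x = A^H y + \lambda z$; for the single-coil Cartesian case it can be sketched with a plain conjugate gradient solver as follows (a toy NumPy sketch; the FFT conventions, variable names, and parameter values are our assumptions):

```python
import numpy as np

def data_consistency(z, y, row_mask, lam, n_iter=20):
    """Solve (A^H A + lam*I) x = A^H y + lam*z by conjugate gradient,
    where A x = row_mask * fft2(x) (single-coil Cartesian undersampling)."""
    A = lambda x: row_mask[:, None] * np.fft.fft2(x, norm="ortho")
    AH = lambda k: np.fft.ifft2(row_mask[:, None] * k, norm="ortho")
    normal_op = lambda x: AH(A(x)) + lam * x     # Hermitian positive definite
    rhs = AH(y) + lam * z
    x = np.zeros_like(rhs)
    r = rhs - normal_op(x)
    p = r.copy()
    rs = np.vdot(r, r)
    for _ in range(n_iter):
        Ap = normal_op(p)
        step = rs / np.vdot(p, Ap)
        x += step * p
        r -= step * Ap
        rs_new = np.vdot(r, r)
        if np.sqrt(abs(rs_new)) < 1e-10:         # residual small enough: stop early
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

n = 16
row_mask = np.zeros(n, dtype=bool)
row_mask[:4] = True
rng = np.random.default_rng(0)
z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
y = np.fft.fft2(rng.standard_normal((n, n)), norm="ortho") * row_mask[:, None]
x_dc = data_consistency(z, y, row_mask, lam=1e6)  # huge lam: x stays close to the denoised z
```

With a large $\lambda$ the solution stays near the denoiser output $z$; with a small $\lambda$ it is pulled toward agreement with the measured k-space rows.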
Dataset
We used the single-coil data in the NYU fastMRI dataset [34] for training and testing. From each file, we selected the middle 6 slices containing the most prominent image patterns in the knee scan, making up a training dataset of 12649 images; 1287 images were used for validation and 1300 images for testing (we follow the setup of [19] and split the original fastMRI validation set into a new validation set and a test set with the same number of volumes). Each slice has dimension $320 \times 320$, and so does its corresponding k-space.
Baselines
With the same sampling ratio and low-frequency base (initial) observations, we benchmark our adaptive sampling MNet framework against the recent LOUPE sampler + reconstructor scheme, a random sampler, and an equidistance sampler. We compare their performance for 4× and 8× acceleration of k-space sampling, respectively. For a fair comparison to the random and equidistance samplers, corresponding reconstructors are trained for both of these cases.
Hyperparameters
We used the normalized reconstruction loss and the structural similarity index measure (SSIM) for measuring image reconstruction quality; SSIM is computed using a fixed window size and standard hyperparameters. When training a reconstructor (U-Net or MoDL) in a separate manner (with respect to inputs sampled with MNet masks, random masks, and equidistance masks), the loss function is set to the normalized reconstruction loss. The reconstruction loss function of the LOUPE model includes the unnormalized reconstruction error to be consistent with the original work.
For the alternating training framework, recall that we solve the optimization problem (5) to provide labels (adaptive masks with respect to a particular input) to train MNet. We set the consistency parameter $\beta$ to a fixed value. In practice, it is necessary to adjust the choice of the sparsity control parameter $\alpha$ with respect to different image inputs during training, as an improper choice of $\alpha$ may cause the mask-backward protocol to output degenerate masks (masks for multiple different objects being the same) or fail to improve the initial masks. Hence, we pre-specify a grid of $\alpha$ values and use a common $\alpha$ as the initial setting for each batch in the training set. If the returned masks from the mask-backward protocol fail to demonstrate sufficient adaptivity with respect to the given image batch or do not get updated from the input, we check the sampling ratio of the failed refined masks. If the failed refined masks have a higher (lower) sampling ratio than the targeted value before they go through the sampling-budget control, we increase (decrease) $\alpha$ to the next value in the preassigned grid and let the image batch go through the mask-backward process again. For each acceleration factor (4× and 8× of k-space), the grid is chosen as a geometric sequence with appropriately chosen endpoints and quotient.
We use the RMSprop optimizer in PyTorch throughout all training instances. In the alternating training framework, for each batch, we carry out a fixed number of iterations inside the mask-backward process to generate labels of adaptive masks, with separate learning rates for the U-Net and for the mask characterization parameters. The learning rate for MNet is fixed, and 40 steps of updates are made to the MNet weights with respect to each batch of adaptive mask labels output by mask-backward. We run the alternating training framework for 10 epochs and obtain the corresponding MNet and co-trained U-Net.

When training separate reconstructors (both U-Net and MoDL), we used the ReduceLROnPlateau scheduler in PyTorch to regulate the learning rate, with patience 5, reduction factor 0.8, and a minimal learning rate floor. Separate training uses 40 epochs for each reconstructor.
When training the LOUPE framework, we use separate learning rates to update the mask parameters and the U-Net in the LOUPE framework. The slope parameter is set to 1. The LOUPE model is trained for 40 epochs. These settings led to good performance of the LOUPE masks.
Regarding the sampling setting, in the 8× acceleration case, we set the base low-frequency initially observed lines to 8 rows, with the remaining sampling budget for high-frequency information being 32 rows; in the 4× acceleration case, the base low-frequency observation budget was 16 rows and the remaining sampling budget for high-frequency information was 64 rows.
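These budgets are consistent with the slice dimensions, as a quick arithmetic check shows (assuming 320 k-space rows per slice):

```python
n_rows = 320                       # k-space rows per slice
accel_8x = n_rows // (8 + 32)      # 8 base rows + 32 budgeted rows = 40 rows sampled
accel_4x = n_rows // (16 + 64)     # 16 base rows + 64 budgeted rows = 80 rows sampled
```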
III-B. Results
In Figure 3, we visualize the adaptive masks output by the adaptive sampler MNet trained with Algorithm 2. The knee images in Figure 3 are the ground truth images. The MNet sampler sees the low-frequency information collected in k-space and outputs the corresponding full subsampling patterns for sampling high-frequency information in a single pass; these patterns are shown for each image and are object-adaptive.
Figure 4 shows an example of images reconstructed by several different reconstructors from information collected in k-space according to different masks; the corresponding masks are shown in Figure 5. We consider the following sampler-reconstructor pairs: MNet with co-trained U-Net (MNet-UnetCO), MNet with a follow-up separately trained U-Net (MNet-UnetSEP), MNet with a follow-up separately trained MoDL reconstructor (MNet-MoDL), the LOUPE co-trained sampler and reconstructor, a random sampling mask with separately trained U-Net (rand.-Unet), and an equidistant mask with separately trained U-Net (equidist.-Unet). SSIM and NMSE values are noted above the shown images.
One should note that the adaptive MNet mask shown in Figure 5 is specific to the ground truth object in Figure 4, whereas the underlying probabilistic parametrization of the LOUPE mask in Figure 5 is fixed once training on the training dataset completes. The random mask is regenerated for each input during training and testing, so the mask used in the sampling process can differ from one object to another. The equidistant mask is fixed for the entire training and testing dataset, given the preassigned number of observed low frequencies and the sampling budget for the remaining high frequencies.
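For concreteness, the two baseline samplers can be generated as sketched below. The helper names are hypothetical, and we assume the low-frequency rows form a centered block in k-space:

```python
import numpy as np

def random_mask(n_lines, n_low, n_high, rng):
    """Random baseline: a centered low-frequency block plus uniformly
    random high-frequency rows (redrawn for every object)."""
    mask = np.zeros(n_lines, dtype=bool)
    start = n_lines // 2 - n_low // 2
    mask[start:start + n_low] = True
    high = rng.choice(np.flatnonzero(~mask), size=n_high, replace=False)
    mask[high] = True
    return mask

def equidistant_mask(n_lines, n_low, n_high):
    """Equidistant baseline: the same low-frequency block, with the
    remaining rows placed on a uniform grid (fixed for all objects)."""
    mask = np.zeros(n_lines, dtype=bool)
    start = n_lines // 2 - n_low // 2
    mask[start:start + n_low] = True
    free = np.flatnonzero(~mask)
    picks = np.linspace(0, free.size - 1, n_high).round().astype(int)
    mask[free[picks]] = True
    return mask
```

Both helpers return a boolean row-selection vector; the random baseline needs a fresh draw per object, while the equidistant mask can be computed once and reused.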
The complete reconstruction accuracy comparison for 8× acceleration is shown in Figure 6 using box plots. We consider four criteria to characterize reconstruction quality relative to the ground truth image: relative error, normalized mean squared error (NMSE), structural similarity index (SSIM), and high-frequency error norm (HFEN) [21].
Reconstruction accuracy comparison, 8× acceleration:

Method         | rel. er. | NMSE   | HFEN   | SSIM
---------------|----------|--------|--------|-------
MNet-UnetCO    | 0.1778   | 0.0415 | 0.4150 | 0.6603
MNet-UnetSEP   | 0.1949   | 0.0529 | 0.4274 | 0.6965
MNet-MoDL      | 0.1701   | 0.0379 | 0.4087 | 0.6662
LOUPE          | 0.1879   | 0.0451 | 0.4330 | 0.6180
rand.-Unet     | 0.2512   | 0.1124 | 0.5735 | 0.6237
equidist.-Unet | 0.2611   | 0.1128 | 0.5980 | 0.6034
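The relative error, NMSE, and HFEN criteria can be computed as sketched below. For HFEN we use Laplacian-of-Gaussian filtering with sigma 1.5, following common practice after [21], though the exact kernel and normalization are our assumptions here; SSIM is left to an existing implementation such as `skimage.metrics.structural_similarity`:

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def rel_error(x, xhat):
    """Relative error ||xhat - x||_2 / ||x||_2."""
    return np.linalg.norm(xhat - x) / np.linalg.norm(x)

def nmse(x, xhat):
    """Normalized mean squared error ||xhat - x||_2^2 / ||x||_2^2."""
    return rel_error(x, xhat) ** 2

def hfen(x, xhat, sigma=1.5):
    """High-frequency error norm: l2 distance between LoG-filtered
    images, normalized by the filtered ground truth (our convention)."""
    lx = gaussian_laplace(x, sigma)
    lxhat = gaussian_laplace(xhat, sigma)
    return np.linalg.norm(lxhat - lx) / np.linalg.norm(lx)
```

All three metrics vanish for a perfect reconstruction and are scale-normalized by the ground truth, so they are comparable across images of different intensity ranges.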
Figure 6 shows the advantages of different combinations of samplers and reconstructors under different criteria. The MNet-MoDL pair outperformed the other combinations in global accuracy (relative error and NMSE) and in reconstruction quality of high-frequency content (HFEN), while the MNet-UnetSEP pair performed best in feature characterization (SSIM). Although the separately trained, more sophisticated unrolled-network reconstructor (MoDL) demonstrates higher reconstruction accuracy under multiple criteria, we highlight the comparable performance of the MNet-UnetCO pair, given the convenience of directly using a co-trained reconstructor without a separate training effort. Both our object-adaptive sampling approach via MNet and the learning-based LOUPE approach are consistently better than the baseline methods that use random and equidistant sampling of k-space, respectively.
The performance of the proposed sampler-reconstructor pairs in this work, characterized by NMSE and SSIM, is on par with the results reported on the fastMRI leaderboard. The small number of outliers in the box plots indicates better stability of the co-trained U-Net reconstructor and the separately trained MoDL reconstructor compared to the separately trained U-Net reconstructor.
Another aspect of comparing these samplers is reproducibility. After training concludes, the MNet mask prediction is deterministic and therefore reproducible, while LOUPE and other probability-based samplers give varying output due to their stochastic nature.
Reconstruction accuracy comparison, 4× acceleration:

Method         | rel. er. | NMSE   | HFEN   | SSIM
---------------|----------|--------|--------|-------
MNet-UnetCO    | 0.1480   | 0.0294 | 0.2673 | 0.7510
MNet-UnetSEP   | 0.1654   | 0.0378 | 0.3113 | 0.7660
MNet-MoDL      | 0.1437   | 0.0274 | 0.2663 | 0.7584
LOUPE          | 0.1540   | 0.0304 | 0.3356 | 0.7381
rand.-Unet     | 0.2026   | 0.0646 | 0.4448 | 0.7348
equidist.-Unet | 0.2084   | 0.0650 | 0.4680 | 0.7261
Figures 7, 8, and 9 show reconstruction examples, the adaptive and compared mask examples, and the reconstruction accuracy comparison, respectively, for the 4× acceleration case. The performance trend among methods in the 4× acceleration setting is similar to that in the 8× case discussed above.
We observe that the adaptive masks concentrate more in lower frequencies in the 4× acceleration setting. This follows from the experimental setup: we start with more low frequencies in the 4× setting than in the 8× setting, which may create a certain frequency-attraction effect. It may also stem from the specific hyperparameter settings used for training in the 4× setting.
IV Conclusions
In this work, we proposed an object-adaptive sampler for undersampling k-space in MRI while maintaining high-quality image reconstructions. The sampler is realized by a convolutional neural network that takes as input very limited k-space measurements (e.g., low frequencies) and, in a single pass, outputs the corresponding remaining sampling patterns at the desired sampling budget. The training labels for the sampler network are generated internally in our framework by the mask-backward training protocol. We implemented the proposed sampler and alternating training framework on the single-coil knee fastMRI dataset and presented examples of adaptive masks for various input image objects and undersampling factors. Our results show significant improvements in image quality with our single-pass, object-adaptive sampling and reconstruction framework compared to other schemes.
Future directions: A central issue in all learning-based adaptive sampling work is how to coordinate the parametrization of the mask with the challenge that the requisite mask binarization poses for gradient computation. We resorted to numerical techniques in this work to address this issue; an alternative approach is to exploit carefully designed encoding schemes. We used 1D line sampling in this work to demonstrate the capability of our framework, but it can readily be extended to subsampling phase encodes or shots in non-Cartesian settings by replacing the initial reconstructor with a nonuniform-FFT-based one [7]. To apply the MNet method to predict general high-dimensional sampling patterns, encoding schemes need to be introduced to reduce the complexity of the parametrization. For instance, the parametrization in BJORK [28] could be the target of the MNet output; in other words, MNet would predict the interpolation coefficients in BJORK, which would immediately render the 2D sampling trajectory object-specific. Meanwhile, developing training algorithms with theoretical guarantees remains important future work.
In this work, an MNet is trained with a fixed amount of initial (low-frequency) observations and a fixed target sampling ratio. The generalizability of such a trained MNet to a different subsampling target remains to be investigated. Although modifying the sampling-budget control procedure can automatically adjust the output of a trained MNet to different sampling ratio targets, it is unclear whether the co-trained reconstructor can perform well on input images processed at a subsampling ratio different from the one it was trained for. This could perhaps be partially alleviated by retraining the reconstructor by itself with sampling patterns (post-binarization) output by MNet at sampling ratios different from those used in training.
We also consider the setting of dynamic data acquisition to be of high interest for future work. When the object evolves over time, it would be ideal to have a sampling scheme that copes with the changing object rather than directly applying a method developed for static MRI; for example, the input to the MNet could include data from previous frames. Similarly, one could replace various terms in the mask-backward objective (5) with domain-knowledge-motivated priors or other penalty terms (e.g., reconstruction errors or contrast in regions of interest) that may help the reconstructors better capture certain features in the underlying data. Finally, while our experiments focused on single-coil data, the proposed MNet method generalizes directly to the multi-coil setting: one simply revises the input format of the networks without changing the rest of the training framework.
Acknowledgments
We thank Dr. Michael McCann at Los Alamos National Laboratory, and Shijun Liang and Siddhant Gautam at Michigan State University for helpful discussions.
References
 [1] (2013) Breaking the coherence barrier: a new theory for compressed sensing. arXiv preprint arXiv:1302.0561. Cited by: §IB.
 [2] (2020) J-MoDL: joint model-based deep learning for optimized sampling and reconstruction. IEEE Journal of Selected Topics in Signal Processing 14 (6), pp. 1151–1162. External Links: Document Cited by: §IC, §IIIA.
 [3] (2018) Model based image reconstruction using deep learned priors (MoDL). In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), Vol. , pp. 671–674. External Links: Document Cited by: item 3.
 [4] (2020) Deep-learning-based optimization of the undersampling pattern in MRI. IEEE Transactions on Computational Imaging 6 (), pp. 1139–1152. External Links: Document Cited by: §IC, item 4.
 [5] (2013-08) Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation. arXiv e-prints, pp. arXiv:1308.3432. External Links: 1308.3432 Cited by: item 3.
 [6] (2011-01) Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 3 (1), pp. 1–122. External Links: ISSN 1935-8237, Document Cited by: §IA.
 [7] (2007-10) On NUFFT-based gridding for non-Cartesian MRI. J. Mag. Res. 188 (2), pp. 191–5. External Links: Document Cited by: §IV.
 [8] (2013) A mathematical introduction to compressive sensing. Birkhäuser, New York, NY. External Links: Document, ISBN 0817649476 Cited by: §IA.
 [9] (2020) Neumann networks for linear inverse problems in imaging. IEEE Transactions on Computational Imaging 6 (), pp. 328–343. External Links: Document Cited by: §IA.
 [10] (2018) Learning-based compressive MRI. IEEE Transactions on Medical Imaging 37 (6), pp. 1394–1406. Cited by: §IC, §II.
 [11] (2020) Perturbed proximal descent to escape saddle points for nonconvex and nonsmooth objective functions. In Recent Advances in Big Data and Deep Learning, L. Oneto, N. Navarin, A. Sperduti, and D. Anguita (Eds.), Cham, pp. 58–77. External Links: ISBN 9783030168414 Cited by: §IA.
 [12] (2021-03) Model-based Reconstruction with Learning: From Unsupervised to Supervised and Beyond. arXiv e-prints, pp. arXiv:2103.14528. External Links: 2103.14528 Cited by: §IA.
 [13] (2020) Learning sampling and model-based signal recovery for compressed sensing MRI. In ICASSP 2020 – 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. , pp. 8906–8910. External Links: Document Cited by: §IC.
 [14] (2008) Compressed sensing MRI. IEEE Signal Processing Magazine 25 (2), pp. 72–82. Cited by: §IA.
 [15] (2007) Sparse MRI: the application of compressed sensing for rapid MR imaging. Magnetic resonance in medicine 58 (6), pp. 1182–1195. Cited by: §IA, §IB.
 [16] (2019) Deep generative adversarial neural networks for compressive sensing MRI. IEEE Transactions on Medical Imaging 38 (1), pp. 167–179. External Links: Document Cited by: §IA.
 [17] (2020-12) Solving Mixed Integer Programs Using Neural Networks. arXiv e-prints, pp. arXiv:2012.13349. External Links: 2012.13349 Cited by: §IB.
 [18] (2020-05) Deep learning techniques for inverse problems in imaging. IEEE Journal on Selected Areas in Information Theory 1 (1), pp. 39–56. External Links: Document Cited by: §IA.
 [19] (2020) Active MR k-space sampling with reinforcement learning. In International Conference on Medical Image Computing and Computer-Assisted Intervention, Cited by: §IC, §IIIA.
 [20] (2006) Encoding and reconstruction in parallel MRI. NMR in Biomedicine 19 (3), pp. 288–299. Cited by: §II.
 [21] (2011) MR image reconstruction from highly undersampled k-space data by dictionary learning. IEEE Transactions on Medical Imaging 30 (5), pp. 1028–1041. Cited by: §IIIB.
 [22] (2011) Adaptive sampling design for compressed sensing MRI. In 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 3751–3755. Cited by: §IB, §IC.
 [23] (2020) Image reconstruction: from sparsity to dataadaptive methods and machine learning. Proceedings of the IEEE 108 (1), pp. 86–109. Cited by: §IA.
 [24] (2015) U-Net: convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi (Eds.), Cham, pp. 234–241. External Links: ISBN 9783319245744 Cited by: §IA, §IIIA.
 [25] (2010) Optimization of k-space trajectories for compressed sensing by Bayesian experimental design. Magn. Reson. Med. 63 (1), pp. 116–26. Cited by: §IC.
 [26] (2020) Learning the sampling pattern for MRI. IEEE Transactions on Medical Imaging 39 (12), pp. 4310–4321. External Links: Document Cited by: §IC.
 [27] (2013-12) Plug-and-play priors for model based reconstruction. In 2013 IEEE Global Conference on Signal and Information Processing, External Links: Document Cited by: §IA.
 [28] (2021-01) B-spline Parameterized Joint Optimization of Reconstruction and K-space Trajectories (BJORK) for Accelerated 2D MRI. arXiv e-prints, pp. arXiv:2101.11369. External Links: 2101.11369 Cited by: §IC, §IV.
 [29] (2019) PILOT: PhysicsInformed Learned Optimal Trajectories for Accelerated MRI. arXiv eprints. External Links: 1909.05773 Cited by: §IC.
 [30] (2012) Low-dose X-ray CT reconstruction via dictionary learning. IEEE Transactions on Medical Imaging 31 (9), pp. 1682–1697. Cited by: §IA.
 [31] (2016) Deep ADMM-Net for compressive sensing MRI. In Proceedings of the 30th International Conference on Neural Information Processing Systems, pp. 10–18. Cited by: §IA.
 [32] (2021) Unified supervised-unsupervised (SUPER) learning for X-ray CT image reconstruction. IEEE Transactions on Medical Imaging 40 (11), pp. 2986–3001. External Links: Document Cited by: §IA.
 [33] (2021-05) End-to-End Sequential Sampling and Reconstruction for MR Imaging. arXiv e-prints, pp. arXiv:2105.06460. External Links: 2105.06460 Cited by: §IC, item 3.
 [34] (2018) fastMRI: an open dataset and benchmarks for accelerated MRI. External Links: 1811.08839 Cited by: §ID, §IIIA.
 [35] (2020-07) Extending LOUPE for K-space Undersampling Pattern Optimization in Multi-coil MRI. arXiv e-prints, pp. arXiv:2007.14450. External Links: 2007.14450 Cited by: item 3.