Single-pass Object-adaptive Data Undersampling and Reconstruction for MRI

11/17/2021
by   Zhishen Huang, et al.
Michigan State University

There is much recent interest in techniques to accelerate the data acquisition process in MRI by acquiring limited measurements. Often sophisticated reconstruction algorithms are deployed to maintain high image quality in such settings. In this work, we propose a data-driven sampler using a convolutional neural network, MNet, to provide object-specific sampling patterns adaptive to each scanned object. The network observes very limited low-frequency k-space data for each object and rapidly predicts the desired undersampling pattern in one go, achieving high image reconstruction quality. We propose an accompanying alternating-type training framework with a mask-backward procedure that efficiently generates training labels for the sampler network and jointly trains an image reconstruction network. Experimental results on the fastMRI knee dataset demonstrate the ability of the proposed learned undersampling network to generate object-specific masks at fourfold and eightfold acceleration that achieve superior image reconstruction performance compared with several existing schemes. The source code for the proposed joint sampling and reconstruction learning framework is available at https://github.com/zhishenhuang/mri.


I Introduction

Magnetic resonance imaging (MRI) is a widely used imaging technique that allows visualization of both anatomical structures and physiological functions. MRI scanners sequentially collect measurements in the frequency domain (or k-space) from which an image is reconstructed. A central challenge in MRI is its time-consuming sequential acquisition process, as the scanner needs to densely sample the underlying k-space for accurate reconstruction. In order to improve patients' comfort and safety and to alleviate motion artifacts, reconstructing high-quality images from limited measurements is highly desirable. There are two core parts in the conventional MRI pipeline: a sampling pattern deployed to collect the (e.g., limited/undersampled) data in k-space and a corresponding reconstruction method (reconstructor) that also enables recovering any missing information. In this work, we use machine-learned models to predict the undersampling pattern and reconstruction in a single shot or pass and in an object-adaptive manner.

I-A Background

Early attempts to improve the sampling efficiency without degrading image quality in MRI leveraged additional structure of the target images to perform reconstruction (e.g., as in compressed sensing [14]). MR images are often structured, e.g., they may be approximated as piecewise smooth functions with relatively few sharp edges. This type of structure leads to sparsity in the wavelet domain, as piecewise smooth functions are compressible when represented in the wavelet basis. The MR image reconstruction problem from subsampled k-space measurements often takes the form of a regularized optimization problem:

$$\hat{x} \;=\; \arg\min_{x}\ \tfrac{1}{2}\,\|\, y - M\mathcal{F}x \,\|_2^2 \;+\; \lambda\,\mathcal{R}(x) \qquad (1)$$

where $y$ denotes the partial measurements, $x$ is the underlying MR image to recover, $M$ is a subsampling operator, $\mathcal{F}$ is the Fourier transform operator, $\mathcal{R}$ is a regularizer, and $\lambda$ is a regularization parameter that controls the trade-off between fidelity to the k-space measurements and alignment with the structure imposed by the regularizer $\mathcal{R}$. When representing an MR image as sparse in the wavelet domain (with wavelet transform $W$) and with the total variation penalty (TV), the above image reconstruction problem can be formulated as

$$\hat{x} \;=\; \arg\min_{x}\ \tfrac{1}{2}\,\|\, y - M\mathcal{F}x \,\|_2^2 \;+\; \lambda_1\,\| W x \|_1 \;+\; \lambda_2\,\mathrm{TV}(x) \qquad (2)$$

Compressive sensing theory reveals that when the measurement operator is sufficiently incoherent with respect to the sparsifying operator, one can exactly recover the underlying image from significantly fewer samples [8]. A possible k-space undersampling scheme in MRI is variable density random sampling [15], where a higher probability is allocated for sampling at lower frequencies than higher frequencies. This sampling scheme accounts for the fact that most of the energy in MR images is concentrated close to the center of k-space. In this classic MRI protocol, measurements are collected sequentially using subsampling patterns that are chosen a priori, and an iterative solver (e.g., the proximal gradient descent method [11] or the alternating direction method of multipliers, i.e., ADMM [6]) is used as the reconstruction method for the regularized optimization problem.
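
To make the classic pipeline concrete, the following minimal sketch runs proximal gradient (ISTA) iterations for an l1-regularized variant of (2); for brevity the l1 penalty is applied directly to image pixels rather than to wavelet coefficients, and the function names are illustrative rather than taken from the cited works.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*||.||_1, applied entrywise to a complex array."""
    mag = np.abs(z)
    scale = np.maximum(mag - t, 0.0) / np.maximum(mag, 1e-12)
    return scale * z

def ista_recon(y, mask, lam=1e-3, n_iter=200):
    """Proximal gradient (ISTA) for min_x 0.5*||M F x - y||_2^2 + lam*||x||_1.

    y    : measured k-space with zeros at unsampled locations (complex 2D array)
    mask : binary sampling pattern of the same shape as y
    The l1 penalty acts on the image itself here for brevity; in practice it
    would act on wavelet coefficients of x, as in (2).
    """
    F = lambda im: np.fft.fft2(im, norm="ortho")       # unitary Fourier transform
    Fh = lambda ks: np.fft.ifft2(ks, norm="ortho")     # its inverse / adjoint
    x = Fh(y)                                          # zero-filled initialization
    for _ in range(n_iter):
        grad = Fh(mask * F(x) - y)                     # gradient of the data-fidelity term
        x = soft_threshold(x - grad, lam)              # step size 1: ||M F|| <= 1 here
    return x
```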

Data-driven approaches can elevate the performance of image reconstruction methods by designing data-specific regularizers or by introducing deep learning tools as learned reconstructors. For regularizer design, one may replace the penalty for the wavelet coefficients and the total variation penalty in (2) with a learning-based sparsity penalty, where a sparsifying transform (or dictionary) is learned in an unsupervised fashion from a limited number of unpaired clean image patches and used in the regularizer for reconstruction [30, 32]. Alternatively, plug-and-play regularization replaces the proximal step in the typical iterative optimization process for (1) by a denoising neural network (or generic denoiser) that essentially functions as an implicit data-based regularizer [27]. Another related deep learning-inspired approach for reconstruction involves unrolling iterative algorithms and learning the regularization parameters in a supervised manner from paired training sets; each learnable block in the unrolled chain is implemented by a network that simulates a particular optimization operation [31, 9]. Lastly, examples of direct deployment of feed-forward neural networks as denoisers include U-Net architecture-based networks [24] and tandem networks trained as generator and discriminator in the GAN framework [16]. We refer readers to [12, 23, 18] for more detailed reviews of learning-based reconstruction methods.

In this work, we develop an adaptive sampler that generates object-specific sampling patterns based on highly limited measurements from varying input objects. We propose to leverage neural networks to generate undersampling patterns in a single pass using highly undersampled initial acquisitions as input. The sampling network is trained jointly with a reconstructor to optimize eventual image reconstruction quality. Training neural networks to predict sampling patterns in this manner is essentially a supervised task. We propose an end-to-end training procedure that takes in fully sampled k-space data and outputs corresponding desired undersampling patterns and image reconstructions. To bootstrap the training of a network for sampling pattern (mask) prediction, we use an alternating framework that alternates between finding object-adaptive binary masks (and a corresponding reconstructor) exploiting ground-truth k-space data, and updating a network that predicts such sampling masks.

I-B Connection to Bilevel and Mixed-integer Optimization Problems

The drawback of the conventional compressive sensing approaches [15, 1] to accelerating MRI scans is that the measurement operator does not adapt to different objects and thus may not result in optimal image reconstructions in general.

An alternative formulation, first explored in [22], is in the form of a bilevel optimization problem:

$$\min_{\theta}\ \sum_i \big\| x_i^\star(\theta) - x_i \big\|_2^2 \quad \text{s.t.} \quad x_i^\star(\theta) = \arg\min_{x}\ \tfrac{1}{2}\,\big\| y_i(\theta) - M_\theta \mathcal{F} x \big\|_2^2 + \mathcal{R}(x) \qquad (3)$$

where $\theta$ denotes the underlying parameters of the measurement/sensing operator $M_\theta$; $x_i$ are the ground truth images reconstructed from raw fully-sampled k-space data; $y_i(\theta)$ are the corresponding measurements when using the measurement operator $M_\theta$; and $\mathcal{R}$ is the regularization term for the lower-level optimization problem. The goal is to ultimately generate artifact-free images from partial measurement vectors $y_i(\theta)$, as embodied in the upper-level optimization problem. This particular formulation accommodates variation in the measurement operator, thus rendering the operator adaptive to a training dataset. In the MRI setting, the parameters $\theta$ capture the sampling patterns/masks we intend to let neural networks predict. A major challenge of tackling the bilevel optimization formulation is the implicit dependence of the upper-level objective on the parameters $\theta$, so that it is not immediately clear, in both theoretical and practical aspects, how to compute the (sub)gradients of the upper-level objective with respect to the pattern parameters. To address this issue, we use a network as the solver for the lower-level problem, which essentially renders the lower-level solution $x_i^\star(\theta)$ in formulation (3) as the output of a reconstruction network, and therefore the gradient computation and back-propagation from the upper-level loss function to the sampling parameters $\theta$ becomes possible.

In this work, we focus on Cartesian undersampling of k-space. In this case, we can write $M_\theta = \mathrm{diag}(\theta)$, where $\mathrm{diag}(\theta)$ is a binary diagonal matrix, and a 1 in a certain entry of $\theta$ indicates that the corresponding row in k-space is observed. Namely, the parameters in the bilevel formulation (3) are the binary elements of $\theta$. A practitioner can pre-set a sampling budget $B$, which is reflected as $\|\theta\|_0 \le B$. With a sampling budget constraint, the optimization problem (3) assumes the flavor of integer programming:

$$\min_{\theta \in \{0,1\}^n,\ \|\theta\|_0 \le B}\ \sum_i \big\| x_i^\star(\theta) - x_i \big\|_2^2 \quad \text{s.t.} \quad x_i^\star(\theta) = \arg\min_{x}\ \tfrac{1}{2}\,\big\| y_i(\theta) - \mathrm{diag}(\theta)\,\mathcal{F} x \big\|_2^2 + \mathcal{R}(x) \qquad (4)$$

We note that there has been work devoted to leveraging neural networks to construct solvers for mixed-integer programming (MIP). In [17], a mixed-integer program is represented as a bipartite graph (nodes are variables and constraints of the MIP, and edges connect variables and their corresponding constraints). One neural network is trained to predict integer values in the solution to cater for the integer constraints in a generative style, and another one is trained to learn the policy of branching. Classic solvers are used to tackle sub-problems of the original MIP to provide training data for each network.

The target of predicting a sampling pattern echoes the integer-valued constraint in the optimization problem (4). We address the integer-valued constraint in our approach through a binarization step, which can be considered as a post-processing action applied to the output of the proposed mask-predicting MNet. While the way we deploy the network for mask prediction does not directly constitute a solver for the MIP, it suggests a potential new direction for using networks on integer-constrained problems.

I-C Prior Art in Sampling Design

We briefly review several existing sampling design strategies in the literature. An early work optimizes k-space trajectories using an information gain criterion [25]. Another work [22] directly optimized a k-space undersampling pattern based on a reconstruction error-type loss on a training set. With the recent success of deep learning methods, there has been growing interest in adaptive sampling. A notable recent non-parametric sampling adaptation method for (4) is the greedy approach [10], which at each step chooses one k-space line (phase encode) to add to the sampled set depending on which line, in combination with those already added so far, leads to the best reconstruction loss (upper-level loss). The greedy approach terminates when the sampling budget is reached. The learned sampling mask is population-adaptive (learned on a dataset) but does not vary with different objects (i.e., it is not object-adaptive). The greedy sampling process can be accelerated by considering random subsets of all candidate high frequencies as a cohort and selecting multiple high frequencies into the sampling set at each step.

LOUPE is a parametric model that characterizes the probability of sampling each pixel or row/column in the frequency domain (k-space) with underlying parameters that are learned simultaneously with the parameters of a reconstructor [4]. The drawback of this approach is that the learned sampling pattern is generic with respect to the training set (population-adaptive) rather than adaptive to each individual object's characteristics. A similar work [26] also assumes that the training data are representative enough of new data acquisitions and learns a fixed undersampling pattern. This latter approach formulates sampling and image reconstruction jointly as a bilevel optimization problem, parametrizes the sampling pattern via probability variables, and resorts to iterative optimization solvers to find the solution.

J(oint)-MoDL [2] assumes the sampling pattern to be rows/columns sampled in k-space and learns a common undersampling pattern or mask along with a neural network reconstructor from a training set. Another way of parametrizing sampling patterns is to assign each frequency a logit (the natural logarithm of an unnormalized class probability), so that a categorical distribution over all frequency candidates can be constructed by applying a softmax function to the logits; this enables gradient back-propagation, and the output mask contains the frequencies with the top-M weights [13]. PILOT parametrizes the sampling patterns as a matrix to coordinate with the non-uniform FFT measurements, and adds an additional round of optimization to enforce hardware measurement constraints [29]. The BJORK method parametrizes sampling patterns as linear combinations of quadratic B-spline kernels, and attempts to reduce the number of parameters needed to characterize sampling patterns through a scheduling plan that gradually reduces the ratio between the total number of k-space samples and the number of B-spline interpolation kernels [28]. The approach can handle non-Cartesian sampling but does not learn an object-adaptive sampler.

Sequential decision processes have also been exploited as a path to build samplers. One attempt in the reinforcement learning style formulates the sampling problem as a partially observable Markov decision process and uses policy gradient and double deep Q-network methods to build the sampling policy [19]. Another attempt builds a neural network as the sampler to be repeatedly applied in the sampling process and trained simultaneously with the reconstructor [33]. The sequential methods consume additional time for each prediction/decision (idle scanner time), which can introduce artifacts in real-time imaging.

I-D Contributions

We propose an adaptive sampler, MNet, as a neural network that takes very limited low-frequency information as input per scan or frame and outputs, in a single pass, corresponding undersampling patterns adaptive to different input objects. The core advantage of the proposed sampling-pattern predictor MNet is that it outputs object-specific masks for different input objects, so the sampling pattern differs from object to object. This property gives our sampler more flexibility than several previously proposed samplers, as the mask output by our sampler is not only data-driven but also object-specific. Moreover, compared to sequential-type samplers, our sampling approach determines the entirety of the higher-frequency samples to acquire at once, thus significantly reducing the processing time for practical deployment. While the proposed approach is applied to static MRI sampling in this work, it can be readily used for dynamic MRI sampling as well, where variations such as predicting the sampling pattern for a frame based on low-frequency samples of previous frame(s) could also be used to enable more rapid implementation.

We propose an alternating training framework to update the parameters of the sampler network and the reconstruction network. This training framework makes it possible to train an MNet without resorting to computationally expensive greedy methods to provide training labels for MNet; instead, labels are generated internally as part of our alternating training framework. Our numerical experiments show the superior performance of the MNet framework with respect to several alternative schemes for undersampled single-coil acquisitions based on the fastMRI [34] dataset.

I-E Structure of this paper

The rest of this paper is organized as follows. Section II elaborates on the design of the mask-backward training framework. Section III discusses the architecture of MNet, the implementation details, and the experimental results. We summarize our findings and potential directions for future work in Section IV.

II Methodology

Fig. 1: Diagram of the proposed alternating training framework.

MRI scanners acquire measurements in the frequency domain (k-space). We let $x$ denote an underlying (ground truth) image and $z$ denote its Fourier transform. We focus on Cartesian (or line) sampling of k-space and single-coil data in the experiments of this work, where we represent the sampling mask parameters (when subsampling rows of k-space) via the vector $m \in \{0,1\}^n$. In this case, the fully-sampled k-space (including noise $\varepsilon$) is related to the corresponding underlying image as $z = \mathcal{F}x + \varepsilon$. The subsampling of k-space with zeros at non-sampled locations can be represented as $\mathrm{diag}(m)\,z$. More generally, for multi-coil data, the coil sensitivity maps would scale the image in this process [20]. We define the acceleration factor as the ratio between the number of all available rows (alternatively columns) in k-space and the total number of subsampled rows (i.e., $n/\|m\|_0$).
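
As a minimal illustration of this measurement model and of the acceleration-factor bookkeeping (assuming a single complex-valued 2D slice; the helper names are ours, not the paper's):

```python
import torch

def apply_row_mask(kspace, mask_rows):
    """Zero out unsampled phase-encode rows of Cartesian k-space.

    kspace    : complex tensor of shape (H, W), the fully sampled k-space z = F x + noise
    mask_rows : float tensor of shape (H,), 1.0 for an acquired row and 0.0 otherwise
    """
    return kspace * mask_rows[:, None]          # diag(m) z, broadcast along the readout direction

def zero_filled_recon(kspace, mask_rows):
    """Zero-filled IFFT reconstruction from row-undersampled k-space."""
    return torch.fft.ifft2(apply_row_mask(kspace, mask_rows), norm="ortho")

def acceleration_factor(mask_rows):
    """Ratio of all available rows to the number of sampled rows, n / ||m||_0."""
    return mask_rows.numel() / int(mask_rows.sum())

# Example: with 320 rows, sampling 8 low-frequency rows plus 32 further rows gives
# 40 sampled rows in total, i.e. an acceleration factor of 320 / 40 = 8.
```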

In this work, we aim to train a neural network that can generate a subsampling pattern for k-space based on very limited initial low-frequency measurements. An immediate challenge for training such a network in the supervised style is the lack of labels denoting ideal/best subsampling patterns. One plausible option is to generate object-specific masks for each training image using the greedy algorithm [10] for (4) and use them as labels. However, the greedy algorithm's labels are not necessarily optimal (i.e., global minimizers of (4)), and obtaining such labels for a large number of training images (from fully-sampled k-space) is computationally expensive or infeasible.

Another natural strategy to circumvent the difficulty of obtaining adaptive mask labels is an end-to-end pipeline comprised of an encoder, which maps limited (e.g., low-frequency) information in k-space to intermediate adaptive masks, and a decoder that attempts to reconstruct the artifact-free image from the k-space information sampled according to the adaptive masks from the encoder. After training of the end-to-end pipeline is concluded, one extracts the encoder part of the pipeline, which can function as the desired mask prediction network. We choose not to proceed with this obvious end-to-end approach for two reasons. The first concern is that binarizing intermediate masks to meet the sampling budget target of the mask prediction network is incompatible with gradient-based methods. While numerical techniques can still get around the hard binarization threshold in gradient computation, there are no binary labels in the end-to-end training framework to supervise the mask prediction network, which may lead to unexpected behavior in its performance. The second concern is that a deep end-to-end training procedure without intermediate intervention can cause vanishing gradients, which results in slow training progress. In addition, an end-to-end approach makes it harder to track the performance of each component and thus offers less interpretability of the outcome of each component in the framework. We alleviate these issues by introducing a split variable that represents the output of MNet and satisfies the explicit mask constraints in (4). A penalty is then introduced to measure the fidelity of the MNet output to the split binary variables/labels.

II-A Proposed Training Scheme

We propose a mask-backward inferring method (see Subroutine 1) to obtain binary label masks to feed the supervised training process of the mask-predicting network (MNet). The key component of the mask-backward method is a parametric reconstructor that serves as a bridge connecting mask parameters and ground truth images. With the mask as a parameter to tune, we first compute a zero-filled image reconstruction via the inverse Fourier transform (IFFT) of the observed low-frequency k-space data with zeros occupying the unobserved k-space. The reconstructor receives this crude initial reconstruction and refines it to be free of aliasing artifacts. The loss function in (5) compares the refined image with the ground truth image using specific reconstruction quality metrics, and is optimized with respect to the object-adaptive sampling masks and the reconstructor. We mainly use a standard 8-block U-Net as the reconstructor with additional residual connections between the downsampling and upsampling blocks. Later, we consider other reconstructors in combination with MNet.

1: Input: ground truth image $x$, an initial mask $m_0$, a reconstructor $\mathrm{Recon}_\Phi$ with parameters $\Phi$, maximal iteration steps $T$, sparsity control parameter $\alpha$, consistency control parameter $\beta$, sampling budget $B$
2: Initialize $m \leftarrow m_0$
3: for iteration count $t = 1, \dots, T$ do
4:     Sample information in $z = \mathcal{F}x$ according to the binarized $m$ to obtain the adaptive observations $y$.
5:     Compute the zero-filled IFFT image $\tilde{x} = \mathcal{F}^{-1} y$.
6:     Compute the refined image $\hat{x} = \mathrm{Recon}_\Phi(\tilde{x})$.
7:     Evaluate the loss function in (5), compute the gradient with respect to the mask parametrization and the reconstructor parameters $\Phi$, and make an update step. See Remark 2 for implementation details.
8: end for
9: Apply sampling budget control and binarize $m$ to obtain $\hat{m}$.
10: return $\hat{m}$ and the image reconstructor $\mathrm{Recon}_\Phi$. (This subroutine can readily handle a batch of images.)
Algorithm 1 Subroutine: mask-backward
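
The following is a simplified single-image sketch of what the mask-backward subroutine might look like in PyTorch; the sigmoid parametrization, straight-through binarization, and top-B budget control are stand-ins for the procedures detailed in the remarks below, and the variable names and learning rate are assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def mask_backward(x, w, recon, mnet_mask, alpha, beta, budget, steps, lr=1e-3):
    """Single-image sketch of the mask-backward subroutine.

    x         : ground truth image, complex tensor of shape (H, W)
    w         : pre-sigmoid mask logits for the H rows (leaf tensor, requires_grad=True)
    recon     : differentiable reconstructor mapping a zero-filled image to a refined one
    mnet_mask : current MNet prediction for this image (per-row probabilities in [0, 1])
    alpha, beta : sparsity and consistency weights in loss (5)
    budget    : total number of rows to keep after binarization
    """
    opt = torch.optim.RMSprop([w] + list(recon.parameters()), lr=lr)
    z = torch.fft.fft2(x, norm="ortho")                       # fully sampled k-space
    for _ in range(steps):
        m = torch.sigmoid(w)                                  # soft mask in [0, 1] (Remark 2)
        m_bin = (m > 0.5).float() + m - m.detach()            # straight-through binarization (Remark 3)
        y = z * m_bin[:, None]                                # row-subsampled k-space
        x_zf = torch.fft.ifft2(y, norm="ortho").abs()         # zero-filled IFFT magnitude image
        x_hat = recon(x_zf[None, None]).squeeze()             # refined reconstruction
        loss = (torch.norm(x_hat - x.abs()) / torch.norm(x.abs())       # NRMSE term
                + alpha * m.sum()                                        # sparsity term
                + beta * F.binary_cross_entropy(m, mnet_mask.detach()))  # consistency term
        opt.zero_grad(); loss.backward(); opt.step()
    m_final = torch.sigmoid(w).detach()
    keep = torch.topk(m_final, budget).indices                # budget control (cf. Remark 4)
    m_hat = torch.zeros_like(m_final)
    m_hat[keep] = 1.0                                         # binarized refined mask (the MNet label)
    return m_hat, recon
```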

In order to train the MNet with guidance from the mask-backward inferring process, we use the following alternating training framework. Let $\tilde{y}_i$ denote the initially observed low-frequency information of image $x_i$, let $\Theta$ denote the parametrization of the MNet, and let $\Phi$ denote the parametrization of the reconstructor Recon. The optimization problems involved in this training framework, which are solved in an alternating manner (Fig. 1), are

$$\min_{\{m_i\},\,\Phi}\ \sum_i \left[ \frac{\big\| \mathrm{Recon}_\Phi\!\big(\mathcal{F}^{-1}\mathrm{diag}(m_i)\,\mathcal{F}x_i\big) - x_i \big\|_2}{\| x_i \|_2} \;+\; \alpha\,\|m_i\|_1 \;+\; \beta\,\mathrm{BCE}\!\big(m_i,\ \mathrm{MNet}_\Theta(\tilde{y}_i)\big) \right] \qquad (5)$$

$$\min_{\Theta}\ \sum_i \mathrm{BCE}\!\big(\mathrm{MNet}_\Theta(\tilde{y}_i),\ \hat{m}_i\big) \qquad (6)$$

In loss function (5), the first term characterizes the fidelity between the reconstructed image and the ground truth image using the normalized root mean square error (NRMSE), the second term is the regularization term that implicitly controls the sparsity of the returned mask, and the third term imposes consistency between the returned mask and the mask predicted by the current MNet. We add the third term to accelerate the convergence of the alternating training framework. We take the $\ell_1$ norm to enforce sparsity on the object-adaptive masks. The $\ell_1$ norm allows computing (sub)gradients, and the exact binary constraint and desired sampling budget (as in (4)) are enforced on $m_i$ after the gradient step, as shown in Subroutine 1. The function $\mathrm{BCE}(\cdot,\cdot)$ denotes the binary cross entropy loss in both loss functions (5) and (6), and $\hat{m}_i$ in (6) denotes the refined binary mask returned by mask-backward. The non-negative parameters $\alpha$ and $\beta$ are the weights of the sparsity regularization and consistency regularization terms, respectively.

In this training process, we separate the training of the MNet and the image reconstructor Recon into two alternating steps to enable easier training. We first use the mask-backward method to find the ideal mask for each given image while updating the reconstructor. We then use the refined binary masks $\hat{m}_i$ to update the parametrization of the MNet. These two steps are executed alternately. The complete training algorithm is summarized in Algorithm 2.

1: Input: a mask-predicting network MNet with parameters $\Theta$, a reconstructor $\mathrm{Recon}_\Phi$, ground truth images $\{x_i\}$, initial (low-frequency) measurement count, sampling budget $B$, maximal iteration steps $T$ for mask-backward, sparsity control parameter $\alpha$, consistency control parameter $\beta$, number of iterations $K$ for MNet updates.
2: for epoch $= 1, \dots,$ maximal epoch do
3:     for each image batch $\{x_i\}$ do
4:         Compute $z_i = \mathcal{F}x_i$ and extract the k-space information $\tilde{y}_i$ corresponding to the lowest frequencies.
5:         Generate the adaptive masks $m_i = \mathrm{MNet}_\Theta(\tilde{y}_i)$ for the image batch.
6:         Apply sampling budget control and binarization on $m_i$. See Remarks 3 and 4.
7:         Run mask-backward (Subroutine 1) initialized with the binarized $m_i$ to obtain refined masks $\hat{m}_i$ and an updated reconstructor.
8:         if the refined masks $\hat{m}_i$ pass the quality check based on $\zeta$ (see Remark 5) then
9:             Make $K$ steps of supervised training of MNet with the refined masks $\hat{m}_i$ as labels with respect to objective (6).
10:        else
11:            Skip the MNet update for this batch.
12:        end if
13:    end for
14: end for
15: return MNet, image reconstructor $\mathrm{Recon}_\Phi$
Algorithm 2 Alternating training framework for sampler network MNet and reconstructor
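
A compact sketch of the alternating loop of Algorithm 2 is shown below; it reuses the mask_backward sketch above and assumes, for simplicity, that MNet outputs a probability for every k-space row. low_freq_rows and binarize_with_budget are hypothetical helpers (initial acquisition and Remarks 3-4), and quality stands in for the acceptance test of Remark 5; learning rates are assumptions.

```python
import torch
import torch.nn.functional as F

def alternating_training(mnet, recon, images, n_low, budget, alpha, beta,
                         mb_steps, mnet_steps, epochs, quality):
    """Single-image sketch of the alternating loop in Algorithm 2."""
    mnet_opt = torch.optim.RMSprop(mnet.parameters(), lr=1e-4)       # learning rate assumed
    for _ in range(epochs):
        for x in images:                                             # x: complex (H, W) ground truth
            z = torch.fft.fft2(x, norm="ortho")
            idx = low_freq_rows(z.shape[0], n_low)                   # indices of the initial rows
            y_low = torch.zeros_like(z); y_low[idx] = z[idx]         # initially observed k-space
            net_in = torch.stack([y_low.real, y_low.imag])[None]     # 2-channel MNet input
            m = mnet(net_in).squeeze()                               # per-row sampling probabilities
            m_bin = binarize_with_budget(m, n_low + budget)          # budget control + binarization
            w = torch.logit(m.clamp(1e-4, 1 - 1e-4)).detach().requires_grad_()
            m_hat, recon = mask_backward(x, w, recon, m.detach(), alpha, beta,
                                         n_low + budget, mb_steps)   # refined mask label
            if quality(m_hat, recon, x) > quality(m_bin, recon, x):  # simplified Remark 5 test
                for _ in range(mnet_steps):                          # supervised update, objective (6)
                    pred = mnet(net_in).squeeze()
                    loss = F.binary_cross_entropy(pred, m_hat)
                    mnet_opt.zero_grad(); loss.backward(); mnet_opt.step()
    return mnet, recon
```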

II-B Remarks

  1. Algorithm 2 follows a co-design approach to identify a sampler and a reconstructor simultaneously with respect to a training dataset. Alternatively, one can also take a trained MNet sampler from the output of Algorithm 2 and train a reconstructor in a separate process. We investigate the performance of both of these practices in Section III.

  2. Computing the gradient with respect to the mask parametrization: when implementing (5), we let $m_i = \sigma(v_i)$, where $\sigma(\cdot)$ is the standard sigmoid function and $v_i$ is the real-valued parameter vector characterizing the mask for image $x_i$. Compared to parametrizing the adaptive mask corresponding to the image directly with nonnegative real numbers between 0 and 1 (ideally binary), the sigmoid parametrization improves numerical stability when updates are made to the unconstrained $v_i$'s.

  3. Regarding binarization: the ultimate output sampling masks should be binary for practical use. The threshold we use for binarizing vectors is 0.5. During the training process, each entry of the mask characterization lies in $[0,1]$. We point out that the reconstructor in Subroutine 1 (in our experiments, the Unet and the MoDL [3] model) sees the subsampled k-space information filtered by the binarized mask, while the gradient with respect to the unbinarized $m_i$ in (5) can still be computed by defining a custom backward function, as in, e.g., the PyTorch implementation of binarization (see [5, 35, 33] and the sketch after this list).

  4. Regarding sampling budget control: we use the normalization trick from [4] to ensure that the output mask from MNet, as well as the refined mask returned by the mask-backward method, has the exact target sampling ratio (see the sketch after this list). Assume the sampling budget is $B$ lines out of $n$; the target sampling ratio is then $r = B/n$ (equivalently, for an acceleration factor $A$, it is $1/A$). Let $m$ be the mask characterization before normalization, so that $m_j \in [0,1]$. We define $\bar{r} = \frac{1}{n}\sum_{j=1}^{n} m_j$, the pre-normalization sampling ratio, which is the average value of $m$. The normalization is done by

    $$\tilde{m}_j = \begin{cases} \dfrac{r}{\bar{r}}\, m_j, & \text{if } \bar{r} \ge r, \\[1ex] 1 - \dfrac{1-r}{1-\bar{r}}\,(1 - m_j), & \text{otherwise.} \end{cases} \qquad (7)$$

    The sampling budget control and binarization are both applied before a refined mask is returned by the mask-backward method to update the MNet model, thus rendering the labels for training the MNet to have the pre-specified sampling ratio.

    For the mask initialization step in Subroutine 1, the input mask is binarized, and we set the corresponding pre-sigmoid parameters to a fixed positive value for frequencies with mask value 1 and to a fixed negative value for frequencies with mask value 0.

  5. $\zeta$ in Algorithm 2 is a quality function that evaluates how good a mask is, together with a reconstructor, for a given image: it measures the quality of the image reconstructed by the given reconstructor from k-space sampled with that mask, compared against the ground truth. We start this alternating training framework with a warmed-up reconstructor (e.g., a Unet) that is pre-trained with its inputs (initial reconstructions) set to zero-filled reconstructions obtained with variable-density random undersampling of k-space.

    The condition to accept a refined mask in Algorithm 2 is that its quality should be better than that of the pre-refinement mask (the direct output from MNet) and that of random masks, as we need to ensure that the refined mask contributes more to image reconstruction and is non-trivial. The quality of the random mask is always evaluated with the initial reconstructor pre-trained for random-masked observations, whereas the quality of adaptive masks and their refined counterparts is evaluated by the reconstructor that is updated during training.

  6. One can replace the Unet in the mask-backward method of our joint training framework with other parametric reconstructors, such as MoDL or unrolled iterative blocks, as long as the reconstructor is differentiable so that gradients can propagate through it.

  7. In the mask-prediction step of Algorithm 2, if the masks predicted by MNet are repetitive (identical) for more than half of the images in a batch, then we use random masks as the input for the mask-backward process to avoid any degeneracies.
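
The two numerical devices referenced in Remarks 3 and 4 can be sketched as follows: a straight-through binarization and a LOUPE-style rescaling toward the target sampling ratio as in (7). This is an illustrative PyTorch rendering, not the authors' exact code.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Hard 0.5 threshold in the forward pass; straight-through gradient in the backward pass."""
    @staticmethod
    def forward(ctx, m):
        return (m > 0.5).float()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output                     # pass the gradient through unchanged

def rescale_to_budget(m, target_ratio):
    """Rescale soft mask values so their mean equals the target sampling ratio, as in (7)."""
    r_bar = m.mean()
    if r_bar >= target_ratio:
        return m * (target_ratio / r_bar)
    return 1.0 - (1.0 - m) * (1.0 - target_ratio) / (1.0 - r_bar)

# Usage: probabilities from MNet -> budget-controlled, binarized mask with usable gradients.
m = torch.rand(320, requires_grad=True)        # hypothetical per-row probabilities
m_bin = BinarizeSTE.apply(rescale_to_budget(m, target_ratio=0.25))
m_bin.sum().backward()                         # gradients reach m despite the hard threshold
```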

III Experiments

Here, we first discuss the network architectures in our framework along with datasets used, hyperparameter choices, and the general experiments, before presenting results and comparisons.

III-A Implementation Details

MNet structure

The MNet mask predictor starts with a double-convolution block, which is followed by four downsampling blocks as used in the classic Unet, and ends with four fully-connected layers. The input to the MNet is the initially collected, appropriately zero-padded, limited (low-frequency) k-space information with real and imaginary parts separated into two channels, and the output is the sampling decision with respect to the unobserved (higher) frequencies. The design of MNet thus consists of two parts: encoding convolutional layers that emulate the encoder part of the Unet structure, and fully connected feedforward layers as originally used in classification networks.

Fig. 2: Structure of MNet used in training. The 2D convolutions use no padding, the Double-Conv layers preserve the spatial size of their input, and the downsampling path uses MaxPool and AvgPool layers.
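
A hedged PyTorch sketch of an MNet-like architecture consistent with the description above (a double-convolution stem, four downsampling blocks, and four fully-connected layers); channel widths, kernel sizes, padding, and the hidden width are assumptions, since the exact values are not reproduced here.

```python
import torch
import torch.nn as nn

def double_conv(c_in, c_out):
    """Two 3x3 convolutions with instance norm and ReLU (kernel and padding sizes assumed)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.InstanceNorm2d(c_out), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.InstanceNorm2d(c_out), nn.ReLU(inplace=True),
    )

class MNetSketch(nn.Module):
    """Convolutional encoder (double-conv stem + four downsampling blocks) followed by
    four fully-connected layers scoring the unobserved k-space rows."""
    def __init__(self, n_rows=320, n_low=8, base_ch=64):
        super().__init__()
        self.stem = double_conv(2, base_ch)                              # real/imag input channels
        self.down = nn.ModuleList(
            nn.Sequential(nn.MaxPool2d(2), double_conv(base_ch * 2**i, base_ch * 2**(i + 1)))
            for i in range(4)                                            # four downsampling blocks
        )
        self.pool = nn.AdaptiveAvgPool2d(1)                              # global average pooling
        self.head = nn.Sequential(                                       # four fully-connected layers
            nn.Linear(base_ch * 16, 512), nn.ReLU(inplace=True),
            nn.Linear(512, 512), nn.ReLU(inplace=True),
            nn.Linear(512, 512), nn.ReLU(inplace=True),
            nn.Linear(512, n_rows - n_low),                              # decisions for unobserved rows
        )

    def forward(self, y_low_two_channel):
        h = self.stem(y_low_two_channel)
        for block in self.down:
            h = block(h)
        h = self.pool(h).flatten(1)
        return torch.sigmoid(self.head(h))                               # per-row sampling probabilities
```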

Reconstructors

For Unet reconstructors, we adopt a standard 8-block Unet architecture following [24]. When carrying out the mask-backward training, both the input to and output from the reconstructor are single-channel real-valued images (corresponding to the magnitude of reconstructions). For the input, we take the magnitude of the inverse-FFT initial reconstruction. In the separate training process for individual reconstructors, the input images have 2 channels (for a complex-valued image, the real and imaginary parts are separated into 2 channels), which led to slightly better performance, and the outputs are single-channel real-valued images. The Unet reconstructor contains four downsampling blocks and four upsampling blocks, each consisting of two convolution layers separated by ReLU and instance normalization. We let the first downsampling block have 64 channels, expanded from the 1 or 2 input channels. The Unet used in the mask-backward process has a residual structure with skip connections, while the Unet used in the separate training cases does not.

We also briefly review the idea of the MoDL reconstructor [2]. The MoDL reconstructor consists of a neural-network denoiser and a data-consistency-enforcing block. The image reconstruction problem the MoDL reconstructor tackles is of the form $\hat{x} = \arg\min_x \|Ax - y\|_2^2 + \lambda \|x - D_w(x)\|_2^2$, where $D_w$ is a neural-network denoiser and $A$ is the measurement operator. The two successive or alternating steps in implementing the MoDL reconstructor are $x^{k+1} = (A^H A + \lambda I)^{-1}(A^H y + \lambda z^k)$ (the data-consistency-enforcing block) and $z^k = D_w(x^k)$ (the denoising step), where $A^H$ is the Hermitian of the operator $A$. One typically uses the conjugate gradient (CG) algorithm to compute the inverse in the data-consistency-enforcing block. In our implementation of MoDL, we set the MoDL reconstructor to have 4 MoDL blocks owing to the computational complexity on our hardware, and the CG algorithm is run to a small fixed error tolerance. The neural-network denoiser used in the MoDL reconstructor is a Unet as previously described, with 64 channels in the first downsampling block and without a residual structure.
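
The data-consistency and denoising alternation can be sketched as follows for the single-coil Cartesian case; the regularization weight, the number of CG iterations, and the assumption that the denoiser accepts complex images directly are ours, not values from the paper.

```python
import torch

def modl_recon(y, mask_rows, denoiser, lam=0.05, n_blocks=4, cg_iters=10):
    """MoDL-style reconstruction for row-undersampled single-coil Cartesian k-space.

    Alternates a learned denoiser with a conjugate-gradient (CG) solve of
    (A^H A + lam I) x = A^H y + lam z.  The denoiser is assumed to accept and return
    complex images; lam and cg_iters are illustrative values.
    """
    A = lambda x: torch.fft.fft2(x, norm="ortho") * mask_rows[:, None]     # forward operator
    AH = lambda k: torch.fft.ifft2(k * mask_rows[:, None], norm="ortho")   # its Hermitian adjoint

    def cg_solve(rhs, x0):
        """CG on the Hermitian positive-definite operator B(v) = A^H A v + lam v."""
        B = lambda v: AH(A(v)) + lam * v
        x, r = x0.clone(), rhs - B(x0)
        p = r.clone()
        rs_old = torch.vdot(r.flatten(), r.flatten()).real
        for _ in range(cg_iters):
            Bp = B(p)
            a = rs_old / torch.vdot(p.flatten(), Bp.flatten()).real
            x, r = x + a * p, r - a * Bp
            rs_new = torch.vdot(r.flatten(), r.flatten()).real
            p = r + (rs_new / rs_old) * p
            rs_old = rs_new
        return x

    x = AH(y)                                  # zero-filled initialization
    for _ in range(n_blocks):
        z = denoiser(x)                        # denoising step z^k = D_w(x^k)
        x = cg_solve(AH(y) + lam * z, x)       # data-consistency-enforcing block
    return x
```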

Dataset

We used the single-coil data from the NYU fastMRI dataset [34] for training and testing. From each file, we selected the middle 6 slices, which contain the most prominent image patterns in the knee scan, yielding a training dataset of 12649 images; 1287 images were used for validation and 1300 images for testing (we follow the setup of [19] and split the original fastMRI validation set into a new validation set and a test set with the same number of volumes). Each slice has dimension 320×320, and so does its corresponding k-space.

Baselines

With the same sampling ratio and low-frequency base (initial) observations, we benchmark our adaptive sampling MNet framework against the recent LOUPE sampler + reconstructor scheme, a random sampler, and an equi-distance sampler. We compare their performance at 4× and 8× acceleration of k-space sampling, respectively. For a fair comparison with the random and equi-distance samplers, corresponding reconstructors are trained for both of these cases.

Hyper-parameters

We used the normalized reconstruction error and the structural similarity index measure (SSIM) to measure image reconstruction quality. SSIM is computed using a fixed window size and the standard stabilization constants. When training a reconstructor (Unet or MoDL) in a separate manner (with respect to inputs sampled with MNet masks, random masks, and equi-distance masks), the loss function is set as a combination of these two metrics. The reconstruction loss function of the LOUPE model includes an unnormalized error term to be consistent with the original work.

For the alternating training framework, recall that we solve the optimization problem (5) to provide labels (adaptive masks for particular inputs) to train MNet. We set the consistency parameter $\beta$ to a fixed value. In practice, it is necessary to adjust the sparsity control parameter $\alpha$ for different image inputs during training, as an improper choice of $\alpha$ may cause the mask-backward protocol to output degenerate masks (identical masks for multiple different objects) or fail to improve the initial masks. Hence, we pre-specify an $\alpha$-grid and use a common $\alpha$ as the initial setting for each batch in the training set. If the masks returned by the mask-backward protocol fail to demonstrate sufficient adaptivity with respect to the given image batch or are not updated from the input, we check the sampling ratio of the failed refined masks. If the failed refined masks have a higher (lower) sampling ratio than the targeted value before they go through the sampling-budget control, we increase (decrease) $\alpha$ to the next value in the pre-assigned grid and pass the image batch through mask-backward again. For the 8× and 4× acceleration settings of k-space, the $\alpha$-grid is chosen as a geometric sequence over a separately tuned range for each setting.

We use the RMSprop optimizer in PyTorch for all training instances. In the alternating training framework, for each batch, we carry out a fixed number of iteration steps inside the mask-backward process to generate labels of adaptive masks, with separate learning rates for the Unet and for the mask characterization parameters. The MNet uses its own learning rate, and 40 update steps are made to the MNet weights with respect to each batch of adaptive mask labels output by mask-backward. We run the alternating training framework for 10 epochs and obtain the corresponding MNet and co-trained Unet.

When training separate reconstructors (both Unet and MoDL), we used the ReduceLROnPlateau scheduler in PyTorch to regulate the learning rate, with patience 5, a reduction factor of 0.8, and a lower bound on the learning rate. Separate training of each reconstructor uses 40 epochs.
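
A configuration sketch matching the stated scheduler settings (patience 5, factor 0.8, 40 epochs); the learning-rate values and the helper functions build_reconstructor and train_one_epoch are assumptions.

```python
import torch

# Hypothetical separate-training setup for a reconstructor, matching the settings above.
recon = build_reconstructor()                                    # e.g., the Unet or MoDL model
optimizer = torch.optim.RMSprop(recon.parameters(), lr=1e-4)     # initial learning rate assumed
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.8, patience=5, min_lr=1e-6)  # min_lr value assumed

for epoch in range(40):
    val_loss = train_one_epoch(recon, optimizer)                 # returns validation loss
    scheduler.step(val_loss)                                     # reduce the rate on a plateau
```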

When training the LOUPE framework, we use separate learning rates to update the mask parameters and the Unet within LOUPE. The slope parameter is set to 1. The LOUPE model is trained for 40 epochs. These settings led to good performance of the LOUPE masks.

Regarding the sampling setting, in the 8× acceleration case, we set the number of initially observed low-frequency lines to 8 rows and the remaining sampling budget for high-frequency information to 32 rows; in the 4× acceleration case, the base low-frequency observation budget was 16 rows and the remaining sampling budget for high-frequency information was 64 rows.

(a) 8× acceleration masks (second row) predicted by MNet for different training slices (first row): the base low-frequency information contains the central 8 rows, and the sampling budget for high-frequency information is 32 rows.
(b) 4× acceleration masks (second row) predicted by MNet for different training slices (first row): the base low-frequency information contains the central 16 rows, and the sampling budget for high-frequency information is 64 rows.
Fig. 3: Adaptively predicted masks from MNet for slice examples in the single-coil FastMRI knee dataset.

III-B Results

In Figure 3, we visualize the adaptive masks output by the adaptive sampler MNet trained with Algorithm 2. The knee images in Figure 3 are the ground truth images. The MNet sampler sees the low-frequency information collected in k-space and outputs, in a single pass, the corresponding full subsampling patterns for sampling high-frequency information; these are shown for each image and are object-adaptive.

Figure 4 shows an example of the images reconstructed by several different reconstructors based on information collected from k-space according to different masks. The corresponding masks are shown in Figure 5. We consider the following combinations of sampler-reconstructor pairs: MNet and co-trained Unet (Unet-CO), MNet and a follow-up separately-trained Unet (Unet-SEP), MNet and a follow-up separately-trained MoDL reconstructor (MNet-MoDL), the LOUPE co-trained sampler and reconstructor, a random sampling mask with a separately-trained Unet (rand.-Unet), and an equi-distance mask with a separately-trained Unet (equidist.-Unet). We note the SSIM and NMSE values above the shown images.

Fig. 4: Reconstruction results from various combinations of sampler-reconstructor pairs with 8× acceleration of k-space.

One should note that the adaptive MNet mask shown in Figure 5 is for the ground truth object in Figure 4, while the underlying probabilistic parametrization of the LOUPE mask in Figure 5 is fixed once the training process on the training dataset completes. The random mask is re-generated for each different input during the training and testing process, so the random mask used in the sampling process can differ from one object to another. The equi-distance mask is fixed for the entire training and testing dataset, given the pre-assigned number of observed low frequencies and the sampling budget for the remaining high frequencies.

Fig. 5: Comparison of different masks with respect to the object in Figure 4 at 8× acceleration.

The complete reconstruction accuracy comparison for 8× acceleration is shown in Figure 6 using box plots. We consider four accuracy criteria to characterize reconstruction quality relative to the ground truth image: relative error, normalized mean square error (NMSE), structural similarity index (SSIM), and high-frequency error norm (HFEN) [21].
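
These four criteria might be computed on magnitude images roughly as follows; the LoG kernel width and the exact normalizations used in the paper are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace
from skimage.metrics import structural_similarity

def relative_error(x_hat, x):
    """||x_hat - x||_2 / ||x||_2 on magnitude images."""
    return np.linalg.norm(x_hat - x) / np.linalg.norm(x)

def nmse(x_hat, x):
    """Normalized mean squared error, the square of the relative error."""
    return np.sum((x_hat - x) ** 2) / np.sum(x ** 2)

def hfen(x_hat, x, sigma=1.5):
    """High-frequency error norm: relative error after Laplacian-of-Gaussian filtering."""
    log_hat, log_gt = gaussian_laplace(x_hat, sigma), gaussian_laplace(x, sigma)
    return np.linalg.norm(log_hat - log_gt) / np.linalg.norm(log_gt)

def ssim(x_hat, x):
    """Structural similarity index on magnitude images."""
    return structural_similarity(x_hat, x, data_range=x.max() - x.min())
```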

Fig. 6: Comparison via box plots of the reconstruction accuracy of different sampler-reconstructor combinations with 8× acceleration of k-space. The triangle-shaped dot inside each box is the mean of the data in the box. Relative error is defined as $\|\hat{x} - x\|_2 / \|x\|_2$, for $\hat{x}$ denoting the reconstruction and $x$ the ground truth image.
Method rel. er. NMSE HFEN SSIM
MNet-Unet-CO 0.1778 0.0415 0.4150 0.6603
MNet-Unet-SEP 0.1949 0.0529 0.4274 0.6965
MNet-MoDL 0.1701 0.0379 0.4087 0.6662
LOUPE 0.1879 0.0451 0.4330 0.6180
rand.-Unet 0.2512 0.1124 0.5735 0.6237
equidist.-Unet 0.2611 0.1128 0.5980 0.6034
TABLE I: Reconstruction accuracy comparisons at 8× acceleration between various sampler-reconstructor combinations. The values in the table are the mean values of each box in Figure 6.

Figure 6 shows the advantages of different combinations of samplers and reconstructors under different criteria. The MNet-MoDL pair outperformed the other combinations in terms of global accuracy (relative error and NMSE) and reconstruction quality of high-frequency content (HFEN), while the MNet-Unet-SEP pair showed better performance in terms of feature characterization (SSIM). Although the separately trained, more sophisticated unrolled-network reconstructor (MoDL) demonstrates higher reconstruction accuracy under multiple criteria, we highlight the comparable performance of the MNet-Unet-CO pair, given the convenience of directly using a co-trained reconstructor without initiating a separate training effort. Both our object-adaptive sampling approach through MNet and the learning-based approach LOUPE are consistently better than the baseline methods that conduct random sampling and equi-distance sampling in k-space, respectively.

The performance of the proposed sampler-reconstructor pairs in this work, characterized by NMSE and SSIM, is on par with the results reported on the fastMRI leaderboard. The smaller number of outliers in the boxplots indicates better stability of the co-trained Unet reconstructor and the separately trained MoDL reconstructor compared to the separately trained Unet reconstructor.

One aspect of comparing these samplers is reproducibility. We point out that after training is concluded, the MNet mask prediction is deterministic and thus reproducible, whereas LOUPE and other probability-based samplers give varying outputs due to their stochastic nature.

Fig. 7: Reconstruction results from various combinations of sampler-reconstructor pairs with 4× acceleration of k-space.
Fig. 8: Comparison of different masks with respect to the object in Figure 7 at 4× acceleration.
Method rel. er. NMSE HFEN SSIM
MNet-Unet-CO 0.1480 0.0294 0.2673 0.7510
MNet-Unet-SEP 0.1654 0.0378 0.3113 0.7660
MNet-MoDL 0.1437 0.0274 0.2663 0.7584
LOUPE 0.1540 0.0304 0.3356 0.7381
rand.-Unet 0.2026 0.0646 0.4448 0.7348
equidist.-Unet 0.2084 0.0650 0.4680 0.7261
TABLE II: Reconstruction accuracy comparisons at 4× acceleration for various sampler-reconstructor combinations. The values in the table are the mean values of each box in Figure 9.
Fig. 9: Comparison of the reconstruction accuracy of different sampler-reconstructor combinations with 4× acceleration of k-space.

Figures 7, 8, and 9 show the reconstruction examples, the adaptive and compared mask examples, and the reconstruction accuracy comparison results, respectively, for the 4× acceleration case. The performance trend between methods in the 4× acceleration setting is similar to that in the 8× acceleration case discussed above.

We observe that the adaptive masks concentrate more in the lower frequencies in the 4× acceleration setting. This is because the 4× setting starts with more low-frequency lines than the 8× setting, which may create a certain frequency-attraction effect; it may also be due to the specific hyperparameter settings used for training in the 4× setting.

IV Conclusions

In this work, we propose an object-adaptive sampler for undersampling k-space in MRI while maintaining high image reconstruction quality. The sampler is realized by a convolutional neural network that takes as its input very limited k-space measurements (e.g., low frequencies) and outputs the corresponding remaining sampling patterns at the desired sampling budget in a single pass. The training labels for the sampler network are generated internally in our framework using the mask-backward training protocol. We implemented the proposed sampler and alternating training framework on the single-coil knee fastMRI dataset and presented examples of adaptive masks with respect to various input image objects and undersampling factors. Our results show significant improvements in image quality with our single-pass object-adaptive sampling and reconstruction framework compared to other schemes.

Future directions: One central issue in all learning-based adaptive sampling work is how to reconcile the parametrization of the mask with the challenge that the requisite mask binarization poses for gradient computation. We have resorted to numerical techniques in this work to address this issue. An alternative approach is to exploit careful encoding schemes. We used 1D line sampling in this work to demonstrate the capability of our framework, but it can be readily extended to subsampling phase encodes or shots in non-Cartesian settings by replacing the initial reconstructor with non-uniform FFT-based ones [7]. To apply the MNet method to predict general high-dimensional sampling patterns, encoding schemes need to be introduced to reduce the complexity of the parametrization. For instance, the parametrization format in BJORK [28] can be the target of the MNet output; in other words, the MNet could predict the interpolation coefficients in BJORK, which would immediately render the 2D sampling trajectory object-specific. Meanwhile, developing training algorithms with theoretical guarantees remains important for future work.

In this work, an MNet is trained with respect to a fixed amount of initial (low-frequency) observations and a fixed target sampling ratio. The generalizability of such a trained MNet to a different subsampling target remains to be investigated. Although modifying the sampling budget control procedure can automatically adjust the output of the trained MNet to cater to different sampling ratio targets, it is unclear whether the co-trained reconstructor can perform well with input images processed at a different subsampling ratio than the one the reconstructor was trained for. This could perhaps be partially alleviated by re-training the reconstructor by itself with sampling patterns (post-binarization) output by MNet at sampling ratios different from those used in training.

We also consider the setting of dynamic data acquisition to be of high interest for our future work. When the object evolves dynamically over time, it would be ideal to have a corresponding sampling scheme that copes with the changing object rather than directly applying the method developed for static MRI. For example, the input to the MNet could include data from the previous frame(s). Similarly, one can replace various terms in the mask-backward objective function (5) with domain-knowledge motivated priors or other penalty terms (e.g., reconstruction errors or contrast in regions of interest) that may boost the performance of the reconstructors to better capture certain features in the underlying data. Finally, while our experiments focused on single-coil data, the proposed MNet method can directly generalize to the multi-coil setting, as one simply needs to revise the input format for the networks without changing the rest of the training framework.

Acknowledgments

We thank Dr. Michael McCann at Los Alamos National Laboratory, and Shijun Liang and Siddhant Gautam at Michigan State University for helpful discussions.

References

  • [1] B. Adcock, A. C. Hansen, C. Poon, and B. Roman (2013) Breaking the coherence barrier: a new theory for compressed sensing. arXiv preprint arXiv:1302.0561. Cited by: §I-B.
  • [2] H. K. Aggarwal and M. Jacob (2020) J-modl: joint model-based deep learning for optimized sampling and reconstruction. IEEE Journal of Selected Topics in Signal Processing 14 (6), pp. 1151–1162. External Links: Document Cited by: §I-C, §III-A.
  • [3] H. K. Aggarwal, M. P. Mani, and M. Jacob (2018) Model based image reconstruction using deep learned priors (modl). In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), Vol. , pp. 671–674. External Links: Document Cited by: item 3.
  • [4] C. D. Bahadir, A. Q. Wang, A. V. Dalca, and M. R. Sabuncu (2020) Deep-learning-based optimization of the under-sampling pattern in mri. IEEE Transactions on Computational Imaging 6 (), pp. 1139–1152. External Links: Document Cited by: §I-C, item 4.
  • [5] Y. Bengio, N. Léonard, and A. Courville (2013-08) Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation. arXiv e-prints, pp. arXiv:1308.3432. External Links: 1308.3432 Cited by: item 3.
  • [6] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein (2011-01) Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 3 (1), pp. 1–122. External Links: ISSN 1935-8237, Document Cited by: §I-A.
  • [7] J. A. Fessler (2007-10) On NUFFT-based gridding for non-Cartesian MRI. J. Mag. Res. 188 (2), pp. 191–5. External Links: Document Cited by: §IV.
  • [8] S. Foucart and H. Rauhut (2013) A mathematical introduction to compressive sensing. Birkhäuser, New York, NY. External Links: Document, ISBN 0817649476 Cited by: §I-A.
  • [9] D. Gilton, G. Ongie, and R. Willett (2020) Neumann networks for linear inverse problems in imaging. IEEE Transactions on Computational Imaging 6 (), pp. 328–343. External Links: Document Cited by: §I-A.
  • [10] B. Gözcü, R. K. Mahabadi, Y. Li, E. Ilıcak, T. Çukur, J. Scarlett, and V. Cevher (2018) Learning-based compressive MRI. IEEE Transactions on Medical Imaging 37 (6), pp. 1394–1406. Cited by: §I-C, §II.
  • [11] Z. Huang and S. Becker (2020) Perturbed proximal descent to escape saddle points for non-convex and non-smooth objective functions. In Recent Advances in Big Data and Deep Learning, L. Oneto, N. Navarin, A. Sperduti, and D. Anguita (Eds.), Cham, pp. 58–77. External Links: ISBN 978-3-030-16841-4 Cited by: §I-A.
  • [12] Z. Huang, S. Ye, M. T. McCann, and S. Ravishankar (2021-03) Model-based Reconstruction with Learning: From Unsupervised to Supervised and Beyond. arXiv e-prints, pp. arXiv:2103.14528. External Links: 2103.14528 Cited by: §I-A.
  • [13] I. A.M. Huijben, B. S. Veeling, and R. J.G. van Sloun (2020) Learning sampling and model-based signal recovery for compressed sensing mri. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. , pp. 8906–8910. External Links: Document Cited by: §I-C.
  • [14] M. Lustig, D. L. Donoho, J. M. Santos, and J. M. Pauly (2008) Compressed sensing MRI. IEEE Signal Processing Magazine 25 (2), pp. 72–82. Cited by: §I-A.
  • [15] M. Lustig, D. Donoho, and J. M. Pauly (2007) Sparse MRI: the application of compressed sensing for rapid MR imaging. Magnetic resonance in medicine 58 (6), pp. 1182–1195. Cited by: §I-A, §I-B.
  • [16] M. Mardani, E. Gong, J. Y. Cheng, S. S. Vasanawala, G. Zaharchuk, L. Xing, and J. M. Pauly (2019) Deep generative adversarial neural networks for compressive sensing mri. IEEE Transactions on Medical Imaging 38 (1), pp. 167–179. External Links: Document Cited by: §I-A.
  • [17] V. Nair, S. Bartunov, F. Gimeno, I. von Glehn, P. Lichocki, I. Lobov, B. O’Donoghue, N. Sonnerat, C. Tjandraatmadja, P. Wang, R. Addanki, T. Hapuarachchi, T. Keck, J. Keeling, P. Kohli, I. Ktena, Y. Li, O. Vinyals, and Y. Zwols (2020-12) Solving Mixed Integer Programs Using Neural Networks. arXiv e-prints, pp. arXiv:2012.13349. External Links: 2012.13349 Cited by: §I-B.
  • [18] G. Ongie, A. Jalal, C. A. Metzler, R. G. Baraniuk, A. G. Dimakis, and R. Willett (2020-05) Deep learning techniques for inverse problems in imaging. IEEE Journal on Selected Areas in Information Theory 1 (1), pp. 39–56. External Links: Document Cited by: §I-A.
  • [19] L. Pineda, S. Basu, A. Romero, R. Calandra, and M. Drozdzal (2020) Active mr k-space sampling with reinforcement learning. In International Conference on Medical Image Computing and Computer-Assisted Intervention, Cited by: §I-C, §III-A.
  • [20] K. P. Pruessmann (2006) Encoding and reconstruction in parallel MRI. NMR in Biomedicine 19 (3), pp. 288–299. Cited by: §II.
  • [21] S. Ravishankar and Y. Bresler (2011) MR image reconstruction from highly undersampled k-space data by dictionary learning. IEEE Transactions on Medical Imaging 30 (5), pp. 1028–1041. Cited by: §III-B.
  • [22] S. Ravishankar and Y. Bresler (2011) Adaptive sampling design for compressed sensing mri. In 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 3751–3755. Cited by: §I-B, §I-C.
  • [23] S. Ravishankar, J. C. Ye, and J. A. Fessler (2020) Image reconstruction: from sparsity to data-adaptive methods and machine learning. Proceedings of the IEEE 108 (1), pp. 86–109. Cited by: §I-A.
  • [24] O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi (Eds.), Cham, pp. 234–241. External Links: ISBN 978-3-319-24574-4 Cited by: §I-A, §III-A.
  • [25] M. Seeger, H. Nickisch, R. Pohmann, and B. Schölkopf (2010) Optimization of k-space trajectories for compressed sensing by bayesian experimental design. Magn. Reson. Med. 63 (1), pp. 116–26. Cited by: §I-C.
  • [26] F. Sherry, M. Benning, J. C. De los Reyes, M. J. Graves, G. Maierhofer, G. Williams, C. Schönlieb, and M. J. Ehrhardt (2020) Learning the sampling pattern for mri. IEEE Transactions on Medical Imaging 39 (12), pp. 4310–4321. External Links: Document Cited by: §I-C.
  • [27] S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg (2013-12) Plug-and-play priors for model based reconstruction. In 2013 IEEE Global Conference on Signal and Information Processing, External Links: Document Cited by: §I-A.
  • [28] G. Wang, T. Luo, J. Nielsen, D. C. Noll, and J. A. Fessler (2021-01) B-spline Parameterized Joint Optimization of Reconstruction and K-space Trajectories (BJORK) for Accelerated 2D MRI. arXiv e-prints, pp. arXiv:2101.11369. External Links: 2101.11369 Cited by: §I-C, §IV.
  • [29] T. Weiss, O. Senouf, S. Vedula, O. Michailovich, M. Zibulevsky, and A. Bronstein (2019) PILOT: Physics-Informed Learned Optimal Trajectories for Accelerated MRI. arXiv e-prints. External Links: 1909.05773 Cited by: §I-C.
  • [30] Q. Xu, H. Yu, X. Mou, L. Zhang, J. Hsieh, and G. Wang (2012) Low-dose X-ray CT reconstruction via dictionary learning. IEEE Transactions on Medical Imaging 31 (9), pp. 1682–1697. Cited by: §I-A.
  • [31] Y. Yang, J. Sun, H. Li, and Z. Xu (2016) Deep ADMM-Net for compressive sensing MRI. In Proceedings of the 30th International Conference on Neural Information Processing Systems, pp. 10–18. Cited by: §I-A.
  • [32] S. Ye, Z. Li, M. T. McCann, Y. Long, and S. Ravishankar (2021) Unified supervised-unsupervised (super) learning for x-ray ct image reconstruction. IEEE Transactions on Medical Imaging 40 (11), pp. 2986–3001. External Links: Document Cited by: §I-A.
  • [33] T. Yin, Z. Wu, H. Sun, A. V. Dalca, Y. Yue, and K. L. Bouman (2021-05) End-to-End Sequential Sampling and Reconstruction for MR Imaging. arXiv e-prints, pp. arXiv:2105.06460. External Links: 2105.06460 Cited by: §I-C, item 3.
  • [34] J. Zbontar, F. Knoll, A. Sriram, T. Murrell, Z. Huang, M. J. Muckley, A. Defazio, R. Stern, P. Johnson, M. Bruno, M. Parente, K. J. Geras, J. Katsnelson, H. Chandarana, Z. Zhang, M. Drozdzal, A. Romero, M. Rabbat, P. Vincent, N. Yakubova, J. Pinkerton, D. Wang, E. Owens, C. L. Zitnick, M. P. Recht, D. K. Sodickson, and Y. W. Lui (2018) fastMRI: an open dataset and benchmarks for accelerated MRI. External Links: 1811.08839 Cited by: §I-D, §III-A.
  • [35] J. Zhang, H. Zhang, A. Wang, Q. Zhang, M. Sabuncu, P. Spincemaille, T. D. Nguyen, and Y. Wang (2020-07) Extending LOUPE for K-space Under-sampling Pattern Optimization in Multi-coil MRI. arXiv e-prints, pp. arXiv:2007.14450. External Links: 2007.14450 Cited by: item 3.