Learning the Sampling Pattern for MRI

06/20/2019 ∙ by Ferdia Sherry et al. ∙ University of Cambridge

The discovery of the theory of compressed sensing brought the realisation that many inverse problems can be solved even when measurements are "incomplete". This is particularly interesting in magnetic resonance imaging (MRI), where long acquisition times can limit its use. In this work, we consider the problem of learning a sparse sampling pattern that can be used to optimally balance acquisition time versus quality of the reconstructed image. We use a supervised learning approach, making the assumption that our training data is representative enough of new data acquisitions. We demonstrate that this is indeed the case, even if the training data consists of just 5 training pairs of measurements and ground-truth images; with a training set of brain images of size 192 by 192, for instance, one of the learned patterns samples only 32% of k-space and still yields high-quality reconstructions on a test set of similar images. The proposed framework is general enough to learn arbitrary sampling patterns, including common patterns such as Cartesian, spiral and radial sampling.

Code Repositories

Learning-Sampling-Pattern

Learning sampling pattern with L-BFGS-B algorithm and reconstructions with PDHG.



I Introduction

The field of compressed sensing is founded on the realisation that in inverse problems it is often possible to recover signals from incomplete measurements. To do so, the inherent structure of signals and images is exploited. Finding a sparse representation for the unknown signal reduces the number of unknowns and consequently the number of measurements required for reconstruction. This is of great interest in many applications, where external constraints (such as cost or time) typically imply that one should take only as many measurements as are required to obtain an adequate reconstruction. A specific example of such an application is magnetic resonance imaging (MRI). In MRI, measurements are modelled as samples of the Fourier transform (points in so-called k-space) of the signal that is to be recovered, and taking measurements is a time-intensive procedure. Keeping acquisition times short is important to ensure patient comfort and to mitigate motion artefacts, and it increases patient throughput, thus making MRI effectively cheaper. Hence, MRI is a natural candidate for the application of compressed sensing methodology. While the original theoretical results of compressed sensing (as in [1], in which exact recovery results are proven for uniform random sampling strategies) do not apply well to MRI, three underlying principles were identified that enable the success of compressed sensing [2], [3]: 1) sparsity or compressibility of the signal to be recovered (in some sparsifying transform, such as a wavelet transform), 2) incoherent measurements (with respect to the aforementioned sparsifying transform) and 3) a nonlinear reconstruction algorithm that takes advantage of the sparsity structure in the true signal. The nonlinear reconstruction algorithm often takes the form of a variational regularisation problem:

\min_{x}\; \frac{1}{2}\big\| S \mathcal{F} x - y \big\|_2^2 + \alpha\, \mathcal{R}(x),    (1)

with $S$ the subsampling operator, $\mathcal{F}$ the Fourier transform, $y$ the subsampled measurements, $\mathcal{R}$ a regularisation functional that encourages the reconstruction to have a sparsity structure and $\alpha > 0$ the regularisation parameter that controls the trade-off between the fit to the measurements and the structure imposed by $\mathcal{R}$. Many previous efforts made towards accelerating MRI have focused on improving how these aspects are treated. The reconstruction algorithm can be changed to more accurately reflect the true structure of the signal: the typical convex reconstruction problem can be replaced by a dictionary learning approach [4]; in multi-contrast imaging, structural information obtained from one contrast can be used to inform a regularisation functional to use in the other contrasts [5]; and in dynamic MRI additional low rank structure can be exploited to improve reconstruction quality [6, 7].
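To make the structure of Problem (1) concrete, the following sketch (in Python, not the authors' MATLAB code) simulates subsampled, noisy Fourier measurements of a toy image, forms the zero-filled reconstruction and evaluates an objective of the same fidelity-plus-regulariser form; the phantom, mask, noise level and the Tikhonov regulariser are illustrative stand-ins, not the choices used in the paper.

import numpy as np

rng = np.random.default_rng(0)
n = 192
x = np.zeros((n, n)); x[48:144, 48:144] = 1.0           # toy "image"

mask = rng.random((n, n)) < 0.3                          # keep roughly 30% of k-space
F = lambda u: np.fft.fft2(u, norm="ortho")               # unitary DFT
Fh = lambda k: np.fft.ifft2(k, norm="ortho")             # its inverse

noise = 0.01 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
y = mask * (F(x) + noise)                                # subsampled noisy measurements
x_zf = np.real(Fh(y))                                    # zero-filled reconstruction

def objective(u, alpha):
    # data fidelity + regulariser, mirroring the structure of Problem (1)
    return 0.5 * np.linalg.norm(mask * F(u) - y) ** 2 + alpha * np.linalg.norm(u) ** 2

print(objective(x_zf, alpha=1e-3))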

Fig. 1: The importance of a good choice of sampling pattern. Left: uniform random pattern and reconstruction on a test image. Right: an equally sparse pattern learned by our algorithm and reconstruction for the same test image.

It is well known that sampling uniformly at random in k-space (as the original compressed sensing theory suggests [1]) does not work well in practice; using a variable density sampling pattern greatly improves reconstruction quality [2], see Figure 1. The works [8, 9, 10, 11, 12] study subsampling strategies that can be used in practice. On the theoretical side, the compressed sensing assumptions have been refined to prove bounds on reconstruction errors for variable density sampling [13, 14] and exact recovery results for Cartesian line sampling with anisotropic total variation regularisation [15].

The sampling pattern can be optimised in a given setting to improve reconstruction quality. There are works on fine-tuning sampling patterns [16, 17], on greedy algorithms that pick a suitable pattern for a given reconstruction method [18, 19, 20] and on jointly learning a Cartesian line pattern and a neural network reconstruction algorithm [21]; optimal patterns for zero-filling reconstructions can moreover be computed from a training set with little computational effort [22]. We consider the problem of learning an optimal sparse sampling pattern from scratch for a given variational reconstruction method and class of images by solving a bilevel optimisation problem. A similar approach has been used to learn regularisation parameters for variational regularisation models [23], among other things.

I-A Our Contributions

In this work, we propose a novel bilevel learning approach to learn sparse sampling patterns for MRI. We do this within a supervised learning framework, using training sets of ground truth images with the corresponding measurements.

Our approach can accommodate arbitrary sampling patterns and sampling densities. We demonstrate that the parametrisation of the sampling pattern can be chosen to learn a pattern consisting of a scattered set of points as well as Cartesian lines, but other parametrisations can also be designed that result in radial or spiral sampling, for instance. By using a sparsity promoting penalty on the sampling pattern, we can also vary the sampling rates of our learned patterns.

Besides this, it is also possible to use a wide variety of variational reconstruction algorithms, that is, various choices of the regularisation functional $\mathcal{R}$ in Problem (1), and we can simultaneously learn the sampling pattern and the optimal regularisation parameter for reconstruction. This obviates the need to tune the parameters of the reconstruction method separately.

Our optimal sampling patterns confirm empirically the validity of variable density sampling patterns: the optimal patterns tend to sample more densely around the low frequencies and more sparsely at high frequencies. We investigate the dependence of the shape of the sampling density on the sampling rate and the choice of regularisation functional $\mathcal{R}$.

By focusing on a particular region within the body, our approach can be used with very small training sets to learn optimal patterns that nevertheless generalise well to unseen MRI data. We demonstrate this on a set of brain images; indeed, in this setting we find that a training set of just five pairs of images and measurements is sufficient.

II Model and Methods

In the bilevel learning framework, the free parameters of a variational regularisation method are learned to optimise a given measure of reconstruction quality. We assume that we are given a variational regularisation method to perform the reconstruction, of a form such as Problem (1). Furthermore, we assume that we are given a training set of pairs $(x_i, y_i)$, $i = 1, \dots, N$, of clean images $x_i$ and fully sampled noisy k-space data $y_i$. With these ingredients and a parametrisation of the sampling pattern and regularisation parameter by a vector $\theta$, we set up a bilevel optimisation problem that can be solved to learn the optimal parameters:

\min_{\theta}\; \frac{1}{N}\sum_{i=1}^{N} \mathcal{L}_i\big(\hat{x}_i(\theta)\big) + \mathcal{P}(\theta) \quad \text{subject to} \quad \hat{x}_i(\theta) = \arg\min_{x} E_\theta(x; y_i), \; i = 1, \dots, N.    (2)

We will refer to Problem (2) as the upper level problem and will call the variational regularisation problems that make up its constraints the lower level problems. Each $\mathcal{L}_i$ is a loss function that quantifies the discrepancy between the reconstruction from subsampled measurements, $\hat{x}_i(\theta)$, and the corresponding ground truth $x_i$, and $\mathcal{P}$ is a penalty on the sampling pattern that encourages its sparsity. Hence, the objective function in Problem (2) is a penalised empirical loss function, the minimiser of which trades off the reconstruction quality against the sparsity of the sampling pattern in an optimal manner. As we show in Section II-C2, it is possible to differentiate the solution maps $\theta \mapsto \hat{x}_i(\theta)$ in our setting, so that Problem (2) is amenable to treatment by first order optimisation methods.

In this section, we describe in more detail the various aspects that make up Problem (2) in our setting, starting with the lower level problems, followed by the upper level problem, after which we describe the methods that can be applied to solve the problem.

II-A Variational regularisation models

The lower level problems in Problem (2) are variational regularisation problems. In this section, we specify the class of variational regularisation problems that will be considered. In our application, an image of resolution $n_1 \times n_2$ is modelled as a nonnegative vector $x \in \mathbb{R}^n$, $n = n_1 n_2$, by concatenating its columns. The subsampled measurements corresponding to a given image $x$ are modelled as $y = S_p(\mathcal{F}x + \varepsilon)$. Here $\operatorname{Re}$ takes the real part of a complex signal componentwise, $\mathcal{F}$ is the unitary discrete Fourier transform,

S_p = \sum_{j=1}^{n} p_j\, e_j e_j^{T}    (3)

is the sampling operator (with $p \in [0, 1]^n$ and $e_1, \dots, e_n$ being the standard basis of $\mathbb{R}^n$), which selects the points in k-space that are included in the measurements (and whose entries $p_j$ can be used as a weight on those measurements), and $\varepsilon$ is complex Gaussian white noise.
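As a small illustration of the weighted sampling operator defined in (3), the following sketch applies it both as an elementwise weighting and as an explicit (tiny) diagonal matrix; the weights and k-space values are made up for the example.

import numpy as np

def apply_S_p(p, k):
    # apply the weighted sampling operator: elementwise weighting of k-space samples
    return p * k

def S_p_matrix(p):
    # the equivalent (tiny, dense) matrix sum_j p_j e_j e_j^T, for illustration only
    return np.diag(p)

p = np.array([1.0, 0.0, 0.5, 1.0])                        # illustrative weights in [0, 1]
k = np.array([2.0 + 1.0j, -1.0, 0.5j, 3.0])               # illustrative k-space samples
assert np.allclose(apply_S_p(p, k), S_p_matrix(p) @ k)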

The variational regularisation approach to estimating the true image $x$ from measurements $y$ proceeds by solving an optimisation problem that balances fitting the measurements with fitting prior knowledge that is available about the image. In this work we consider problems of the following form:

\min_{x \geq 0}\; \frac{1}{2}\big\| S_p \mathcal{F} x - y \big\|_2^2 + \alpha \sum_{j=1}^{n} \rho\big( \|(Bx)_j\|_2 \big),    (4)

where $B = (B_1, \dots, B_k)$ is a collection of linear operators $B_l \colon \mathbb{R}^n \to \mathbb{R}^n$, $(Bx)_j = \big((B_1 x)_j, \dots, (B_k x)_j\big) \in \mathbb{R}^k$, and $\rho \colon [0, \infty) \to [0, \infty)$ is convex. Note that this includes a number of common regularisation functionals:

  • $k = 2$, $B = (D_1, D_2)$ (with $D_1, D_2$ discretisations of the partial differential operators along orthogonal directions) and $\rho(t) = t$ gives the isotropic total variation as regularisation term; its use in variational regularisation problems has been studied since [24],

  • $k = 1$ and $B = W$ for some sparsifying transform $W$, such as a wavelet or shearlet transform, with $\rho(t) = t$, gives a sparsity penalty on the transform coefficients of the image as regularisation term.

In the above form, these problems do not satisfy the regularity conditions required to apply first order optimisation methods to Problem (2). To get around this issue, we slightly modify the objective function in Problem (4): the nonnegativity constraint is removed and emulated by adding a smooth convex barrier function $b$, weighted by a large constant $\nu > 0$, and a strongly convex quadratic penalty $\frac{\mu}{2}\|x\|_2^2$ with small $\mu > 0$ is added. Furthermore, $\rho$ is assumed to be increasing and such that the resulting regularisation term is twice continuously differentiable. These assumptions exclude the exact total variation and $\ell^1$ penalties above, but $\rho$ can be chosen to approximate the absolute value function: for instance, take a smoothed function $\rho_\gamma$ with smoothing parameter $\gamma > 0$ that agrees with the absolute value away from a neighbourhood of the origin.

This choice of $\rho_\gamma$ can be thought of as a version of the Huber loss function [25]. Taking $B = (D_1, D_2)$ and this choice of $\rho$, the regularisation term in Problem (4) approximates the total variation regularisation.
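The exact smoothed function used in the paper is specified in the original; the sketch below uses the pseudo-Huber function, one standard smooth surrogate for the absolute value, and applies it to form a smoothed total variation. It is an assumption for illustration, not the paper's definition of $\rho_\gamma$.

import numpy as np

def rho(t, gamma=1e-2):
    # pseudo-Huber surrogate: smooth, increasing on [0, inf), close to |t| for |t| >> gamma
    return np.sqrt(t ** 2 + gamma ** 2) - gamma

def rho_prime(t, gamma=1e-2):
    return t / np.sqrt(t ** 2 + gamma ** 2)

def smoothed_tv(x, gamma=1e-2):
    # smoothed isotropic total variation of a 2D image using forward differences
    dx = np.diff(x, axis=0, append=x[-1:, :])
    dy = np.diff(x, axis=1, append=x[:, -1:])
    return np.sum(rho(np.sqrt(dx ** 2 + dy ** 2), gamma))

x = np.zeros((8, 8)); x[2:6, 2:6] = 1.0
print(smoothed_tv(x))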

With these modifications to the objective function, we define the lower level energy functional $E_\theta$, given fully sampled training measurements $y_i$, as follows:

E_\theta(x; y_i) = \frac{1}{2}\big\| S_p(\mathcal{F}x - y_i) \big\|_2^2 + \alpha \sum_{j=1}^{n} \rho\big( \|(Bx)_j\|_2 \big) + \nu\, b(x) + \frac{\mu}{2}\|x\|_2^2,    (5)

with the sampling weights $p$ and the regularisation parameter $\alpha$ (collected in the parameter vector $\theta$) defined in the next section.

II-B The upper level problem

In the upper level problem, we parametrise the sampling pattern and the lower level regularisation parameter by a vector $\theta$: we let $p \in [0, 1]^n$ and $\alpha \geq 0$ be components of $\theta$, so that $\theta = (p, \alpha)$. This parametrisation allows us to learn a sampling pattern of scattered points in k-space, though it is worth noting that the parametrisation can be generalised to constrain the learned pattern. To prevent the notation from becoming overly cumbersome, we do not consider this generalisation here, but refer the reader to Section A of the Appendix for the details.

With this parametrisation, we choose a sparsity penalty $\mathcal{P}$ on the sampling weights $p$, weighted by a parameter $\beta > 0$ that decides how reconstruction quality is traded off against sparsity of the sampling pattern. Besides encouraging a sparse sampling pattern, this penalty encourages the weights in the sampling pattern to take either the value 0 or 1. For the loss functions $\mathcal{L}_i$, we choose the squared $\ell^2$ distance to the ground truth, $\mathcal{L}_i(x) = \frac{1}{2}\|x - x_i\|_2^2$, but it is straightforward to replace this by any other smooth loss function. For instance, if one is interested in optimising the quality of the recovered edges one could use the smoothed total variation of the error as a loss function, with the smoothed penalty function $\rho_\gamma$ as defined in Section II-A.
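The following sketch spells out the ingredients of the upper level objective. The squared-error loss matches the choice described above; the concave power penalty is only an illustrative stand-in for the sparsity penalty (such penalties drive small weights in [0, 1] towards 0 while penalising weights near 1 only mildly), not necessarily the penalty used in the paper.

import numpy as np

def loss(x_rec, x_true):
    # squared L2 discrepancy between reconstruction and ground truth
    return 0.5 * np.sum((x_rec - x_true) ** 2)

def sparsity_penalty(p, beta=1.0, q=0.5, eps=1e-8):
    # illustrative concave penalty beta * sum((p + eps)^q): a hypothetical choice,
    # favouring sparse, near-binary weights over the box [0, 1]
    return beta * np.sum((p + eps) ** q)

p = np.random.default_rng(0).random(192 * 192)
print(sparsity_penalty(p), loss(p, np.round(p)))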

II-C Methods

As was mentioned in Section II, first order optimisation methods can be used to solve problems like Problem (2), provided that the solution maps of the lower level problems, $\theta \mapsto \hat{x}_i(\theta)$, can be computed and differentiated. In this section we describe the approach taken to computing the solution maps and their derivatives and then describe how these steps are combined to apply first order optimisation methods to Problem (2).

II-C1 Computing the solution maps of the lower level problems

In this and the next subsection, we will consider the lower level problem for a fixed training index $i$, so for the sake of notational clarity, we will drop the subscript and write $y = y_i$ and $\hat{x}(\theta) = \hat{x}_i(\theta)$. The lower level energy functional is convex in $x$ and takes the saddle-point structure that is used in the primal-dual hybrid gradient algorithm (PDHG) of Chambolle and Pock [26]. Indeed, we can write $E_\theta(x; y) = f(Kx) + g(x)$ for a suitable linear operator $K$ and convex functions $f$ and $g$ whose proximal operators can be evaluated cheaply (we think of the complex quantities involved as elements of a real vector space). With this splitting, the parameter choices from Section C2 of the Appendix and an arbitrary initialisation (we can take it to be the zero-filling reconstruction, or warm start the solver), the following iterative algorithm solves the lower level problem with a linear convergence rate:

for $k = 0$ to maxit do
     $z^{k+1} \leftarrow \mathrm{prox}_{\sigma_k f^*}\big(z^k + \sigma_k K \bar{x}^k\big)$
     $x^{k+1} \leftarrow \mathrm{prox}_{\tau_k g}\big(x^k - \tau_k K^* z^{k+1}\big)$
     update the step sizes $\tau_{k+1}, \sigma_{k+1}$ and the extrapolation parameter $\omega_{k+1}$
     $\bar{x}^{k+1} \leftarrow x^{k+1} + \omega_{k+1}\,(x^{k+1} - x^k)$
     if the relative change in the iterates is below the chosen tolerance then
          break the loop
     end if
end for
Algorithm 1 Solving the lower level problem with PDHG
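A hedged sketch of the PDHG iteration in the generic form min_x f(Kx) + g(x) follows. For concreteness it is instantiated on a small 1D total variation denoising problem with an explicit difference matrix; the lower level problem above has a Fourier data term and a smoothed regulariser, so its operators, proximal maps and step size rules differ.

import numpy as np

rng = np.random.default_rng(0)
n, lam = 100, 0.5
b = np.repeat([0.0, 1.0, 0.3], [30, 40, 30]) + 0.05 * rng.standard_normal(n)

K = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)              # forward difference matrix
L = np.linalg.norm(K, 2)                                  # operator norm of K
tau = sigma = 0.99 / L                                    # step sizes with tau*sigma*L^2 < 1

x = b.copy(); x_bar = x.copy(); z = np.zeros(n - 1)
for it in range(500):
    z = np.clip(z + sigma * (K @ x_bar), -lam, lam)       # prox of sigma*f*: projection onto [-lam, lam]
    x_old = x
    x = (x - tau * (K.T @ z) + tau * b) / (1.0 + tau)     # prox of tau*g for g(x) = 0.5*||x - b||^2
    x_bar = x + (x - x_old)                               # extrapolation step (omega = 1)
    if np.linalg.norm(x - x_old) < 1e-8 * (np.linalg.norm(x_old) + 1e-12):
        break                                             # stop when the iterates stagnate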

II-C2 Differentiating the solution map

In the previous subsection, we saw that we can compute the solution maps of the lower level problems. To apply first order optimisation methods to Problem (2), we still need to be able to differentiate these solution maps. To this end, note that the solution map of $E_\theta$ can be defined equivalently by its first order optimality condition,

\nabla_x E_\theta\big(\hat{x}(\theta); y\big) = 0,

and that $E_\theta$ is twice continuously differentiable in our setting. To ease notation, let us write $\hat{x} = \hat{x}(\theta)$ in this subsection. Since $E_\theta$ is strongly convex in $x$, its Hessian $\partial^2_{xx} E_\theta$ is positive definite and hence invertible. As a consequence, the implicit function theorem tells us that the solution map is continuously differentiable and its derivative satisfies the linear system below:

\partial^2_{xx} E_\theta(\hat{x}; y)\, \mathrm{D}_\theta\hat{x}(\theta) = -\,\partial^2_{x\theta} E_\theta(\hat{x}; y),

so that

\mathrm{D}_\theta\hat{x}(\theta) = -\big(\partial^2_{xx} E_\theta(\hat{x}; y)\big)^{-1}\, \partial^2_{x\theta} E_\theta(\hat{x}; y).

In fact, we do not need the full derivative of the solution map in our application, but just the gradient of a scalar function of the solution map, namely $\theta \mapsto \frac{1}{2}\|\hat{x}(\theta) - x^\dagger\|_2^2$ for some ground truth image $x^\dagger$. The chain rule and the formula above give us a formula for this gradient:

\nabla_\theta \Big[\tfrac{1}{2}\|\hat{x}(\theta) - x^\dagger\|_2^2\Big] = -\,\partial^2_{x\theta} E_\theta(\hat{x}; y)^{*}\, \big(\partial^2_{xx} E_\theta(\hat{x}; y)\big)^{-1}\, \big(\hat{x}(\theta) - x^\dagger\big).    (6)

It is worth noting that this expression for the gradient can also be derived using the Lagrangian formulation of Problem (2), through the adjoint equation, and this is the way in which it is usually derived when an optimal control perspective is taken [23]. To implement this formula in practice, we do not compute the Hessian matrix of $E_\theta$ and invert it exactly (since the Hessian is very large; it has as many rows and columns as the images we are dealing with have pixels). Instead, we emphasise that the Hessian is symmetric positive definite, so that it is suitable to solve the linear system with an iterative solver such as the conjugate gradient method. For this, we just need to compute the action of the Hessian, for which we can give explicit expressions. These computations have been done in Section D of the appendix. The expressions derived in the appendix for the actions of $\partial^2_{xx} E_\theta$ and $\partial^2_{x\theta} E_\theta$ can be implemented efficiently in practice and are then used in the conjugate gradient method (CG) to compute the desired gradients.
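A minimal sketch of the gradient computation behind Equation (6): solve a linear system with the Hessian using CG (via Hessian-vector products only) and then apply the adjoint of the mixed second derivative. The small dense matrices below merely stand in for the Hessian and mixed derivative of the lower level energy.

import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
n, m = 50, 3
A = rng.standard_normal((n, n))
H = A @ A.T + n * np.eye(n)                 # stand-in for the (SPD) Hessian d^2E/dx^2
J = rng.standard_normal((n, m))             # stand-in for the mixed derivative d^2E/(dx dtheta)
x_hat = rng.standard_normal(n)              # stand-in for the lower level solution
x_true = rng.standard_normal(n)             # stand-in for the ground truth

grad_loss = x_hat - x_true                  # gradient of 0.5*||x_hat - x_true||^2

H_op = LinearOperator((n, n), matvec=lambda v: H @ v)     # only Hessian-vector products are needed
w, info = cg(H_op, grad_loss)
assert info == 0

grad_theta = -J.T @ w                       # gradient with respect to the parameters, as in (6)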

II-C3 Solving the bilevel problem using L-BFGS-B

Recall that the problem that we are trying to solve (Problem (2)) takes the following form:

\min_{\theta \in \Theta}\; \frac{1}{N}\sum_{i=1}^{N} \mathcal{L}_i\big(\hat{x}_i(\theta)\big) + \mathcal{P}(\theta), \qquad \hat{x}_i(\theta) = \arg\min_{x} E_\theta(x; y_i),

where $\Theta$ is a box in parameter space. By the previous sections, we know that the objective function of this problem is continuously differentiable, and the constraints that we impose on the parameters form a box constraint, so the optimisation problem that we consider is amenable to treatment by the L-BFGS-B algorithm [27, 28]. Since the objective function splits as a sum over the training set, it is completely straightforward to parallelise the computations of the solution maps and desired gradients over the training set (to emphasise this we refer to the loop over the training set as a parfor loop rather than just a for loop):

$J \leftarrow \mathcal{P}(\theta)$, $g \leftarrow \nabla\mathcal{P}(\theta)$
parfor $i = 1$ to $N$ do
     Run Algorithm 1 to obtain $\hat{x}_i(\theta)$
     Solve the system in Equation (6) with CG to obtain $\nabla_\theta\big[\mathcal{L}_i\big(\hat{x}_i(\theta)\big)\big]$
     $J \leftarrow J + \tfrac{1}{N}\mathcal{L}_i\big(\hat{x}_i(\theta)\big)$, $g \leftarrow g + \tfrac{1}{N}\nabla_\theta\big[\mathcal{L}_i\big(\hat{x}_i(\theta)\big)\big]$
end parfor
Algorithm 2 Computing the objective function value and gradient of the bilevel problem

Algorithm 2 can be plugged into an implementation of the L-BFGS-B algorithm such as the one in [29] to solve Problem (2).
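A sketch of how an objective/gradient routine in the spirit of Algorithm 2 can be handed to an off-the-shelf L-BFGS-B implementation with box constraints; SciPy is used here for illustration, whereas the paper uses the Fortran code of [28] through the interface of [29], and the toy quadratic stands in for the expensive bilevel objective.

import numpy as np
from scipy.optimize import minimize

def objective_and_gradient(theta):
    # stand-in for Algorithm 2: return the bilevel objective value and its gradient
    target = np.array([0.0, 1.0, 0.3])      # hypothetical "optimal" parameters
    r = theta - target
    return 0.5 * r @ r, r

theta0 = np.full(3, 0.5)                    # illustrative starting parameter vector
bounds = [(0.0, 1.0)] * 3                   # box constraints on the parameters
res = minimize(objective_and_gradient, theta0, jac=True, method="L-BFGS-B", bounds=bounds)
print(res.x)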

III Experiments

Our methods have been implemented in MATLAB, using a publicly available version of the L-BFGS-B algorithm [29]. Since the learning problem is non-convex, care must be taken with the choice of initialisation. We initialise the learning with a full sampling pattern and the corresponding optimal regularisation parameter. This optimal regularisation parameter is learned using our method, keeping the sampling pattern fixed to fully sample k-space; it is typically found in fewer than 10 iterations of the L-BFGS-B algorithm.

In this section, we present experiments in which we look at:

- varying the sparsity parameter to control the sparsity of the learned pattern,

- learning Cartesian line patterns with our method,

- using different lower level regularisations,

- varying the size of the training set,

- comparing the learned patterns to other sampling patterns,

- learning sampling patterns for high resolution imaging.

Except in the subsection in which we consider different lower level regularisations, we use a total variation type regularisation in all experiments. That is, we take $B = (D_1, D_2)$ and $\rho = \rho_\gamma$ for a small $\gamma > 0$ in the lower level problem. We refer the reader to the supporting document for figures that may be of interest, but are not crucial to the understanding of the results.

III-A Data

The brain images are of size 192 × 192, taken as slices from 7 separate 3D scans. The corresponding noisy measurements are simulated by taking discrete Fourier transforms of these slices and adding Gaussian white noise. The scans were acquired on a Siemens PrismaFit scanner. For all scans except one, TE = 2.97 ms, TR = 2300 ms and the inversion time was 1100 ms. For the other scan, TE = 2.98 ms, TR = 2300 ms and the inversion time was 900 ms.

The high resolution images are taken as slices from a 3D scan of a test phantom. Again, the noisy measurements are simulated by taking discrete Fourier transforms of these slices and adding Gaussian white noise. The scan was acquired on a GE 3T scanner using spoiled gradient recalled acquisition with TE = 12 ms and TR = 37 ms.

III-B Varying the sparsity parameter

Learning with a training set of 7 brain images, we consider the effect of varying the sparsity parameter $\beta$. Increasing this parameter tends to make the learned patterns sparser, although we do see a slight deviation from this monotone behaviour for large $\beta$. Figure 2 shows examples of the learned patterns and reconstructions on a test image and in Figure 3, we see the performance of the learned patterns, evaluated on a test set of 70 brain images.

Fig. 2: Learned sampling patterns and the corresponding reconstructions on a test image with TV regularisation in the lower level problem.
Fig. 3: Performance of the learned patterns (measured using the SSIM index) on the test set, and the lower level regularisation parameter that was learned, against the fraction of k-space that is sampled.

We use a Gaussian kernel density estimator to estimate a sampling distribution corresponding to each pattern. That is, we convolve the learned pattern with a Gaussian filter with a small bandwidth and normalise the resulting image to sum to 1. The results of doing this can be seen in Figure 4: we see that the distributions become more strongly peaked around the origin as the patterns become sparser.
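A short sketch of the kernel density estimate described above: convolve the learned (binary or weighted) pattern with a small-bandwidth Gaussian and normalise the result to sum to 1; the pattern and bandwidth below are illustrative.

import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
pattern = (rng.random((192, 192)) < 0.1).astype(float)    # stand-in for a learned pattern

density = gaussian_filter(pattern, sigma=3.0)             # convolve with a small-bandwidth Gaussian
density /= density.sum()                                  # normalise to sum to 1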

Fig. 4: Gaussian kernel density estimates of the sampling distributions for reconstruction with TV regularisation.

III-C Cartesian line sampling

As described in Section A of the Appendix, we can restrict the learned pattern to sample along Cartesian lines. Similarly to the case of learning scattered points in k-space, we have some control over the sparsity of the learned pattern through the parameter $\beta$. As is seen in Figure 5, the sparsity penalty does not seem to be as effective in this setting at encouraging the weights of the pattern to be binary.

Fig. 5: Learned Cartesian line sampling patterns and the corresponding reconstructions on a test image with TV regularisation in the lower level problem.

III-D Other lower level regularisations

III-D1 Wavelet regularisation

Instead of the TV type regularisation, we use a sparsity penalty on the wavelet coefficients of the image. We accomplish this by choosing $k = 1$ and $B = W$ for an orthogonal wavelet transform $W$ (we use Daubechies 4 wavelets), keeping the smoothed penalty function $\rho_\gamma$. This gives results that are, at first glance, very similar to those for the total variation regularisation. On closer inspection, when comparing two patterns from the TV and wavelet regularisation with the same sparsity, we find that the pattern for the wavelet regularisation is more strongly peaked around the origin. We can see this in Figures 6 and 7, where we have estimated the sampling distributions for two learned patterns with TV and wavelet regularisation, both of which sample 27% of k-space.
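A sketch of the wavelet sparsity penalty, using PyWavelets for illustration; 'db4' is one possible reading of "Daubechies 4", and the smoothing follows the pseudo-Huber surrogate sketched earlier rather than the paper's exact choice.

import numpy as np
import pywt

def wavelet_penalty(x, gamma=1e-2, wavelet="db4", level=3):
    # orthogonal wavelet transform of the image followed by a smoothed absolute value
    coeffs = pywt.wavedec2(x, wavelet=wavelet, level=level, mode="periodization")
    flat, _ = pywt.coeffs_to_array(coeffs)
    return np.sum(np.sqrt(flat ** 2 + gamma ** 2) - gamma)

x = np.random.default_rng(0).random((192, 192))
print(wavelet_penalty(x))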

Fig. 6: Gaussian kernel density estimates of the sampling distributions for reconstruction with wavelet and TV regularisation (for the same sparsity in k-space).
Fig. 7: Diagonal slices from each of the distributions in Figure 6, which shows clearly that the sampling distribution for reconstruction with wavelet regularisation is more strongly peaked around the centre.

III-D2 $H^1$ regularisation

We obtain the squared $H^1$ seminorm as lower level regularisation if we take $B = (D_1, D_2)$ and $\rho(t) = t^2$ in the lower level problem. With this choice, we find that the learned regularisation parameter $\alpha$ equals 0 and that the learned pattern does not take on just binary values: the weights of the learned pattern are lower at higher frequencies, as can be seen in Figure 8.

Fig. 8: Learned sampling patterns and the corresponding reconstructions on a test image with $H^1$ regularisation in the lower level problem.

III-D3 No regularisation

Taking no regularisation in the lower level problem, i.e. fixing the regularisation term to zero by setting $\alpha = 0$, we find essentially the same results as when we considered the $H^1$ regularisation: the weights in the learned pattern show a decay away from the origin as in Figure 8.

III-D4 Comparison of the different regularisations

We compare the performance of the learned patterns with the different lower level regularisations. In Table I, we list the performance of three of these patterns on the test set of brain images, each pattern sampling roughly the same proportion of k-space.

SSIM PSNR
TV regularisation
Wavelet regularisation
$H^1$ regularisation / no regularisation
TABLE I: Performance of the learned patterns with different lower level regularisation functionals.

The TV regularisation is seen to outperform wavelet regularisation, which in turn outperforms the $H^1$ (equivalently, no) regularisation. Figure 9 shows the three patterns that we are comparing and the corresponding reconstructions on a test image. We note that this method can easily be extended to other regularisation functionals (such as the total generalised variation) that have been used in the context of MRI [30, 31].

Fig. 9: A comparison of learned sampling patterns for the different lower level regularisations that we have considered.

III-E Varying the size of the training set

To investigate the effect of the size of the training set, we ran our method on different training sets of slices of brain images, of sizes 1, 3, 5, 10, 20 and 30, to obtain sampling patterns of roughly the same sparsity. As we see in Figure 10, the learned patterns perform similarly well for training sets of size 5 or larger.

Fig. 10: The performance of the learned pattern as it depends on the size of the training set.

III-F Comparing with other patterns

We compare the performance of the learned patterns to that of other sampling patterns. Here we consider a radial line sampling pattern and a low-pass sampling pattern, both with a learned optimal regularisation parameter. Table II shows the performance of the learned patterns and the other patterns on the test set of brain images.

SSIM PSNR
Learned free pattern
Learned Cartesian line pattern
Radial line pattern
Low-pass pattern
TABLE II: Comparison of the performance of the learned patterns to that of other sampling patterns.

The learned free pattern is found to perform better than the others. Figure 11 shows the patterns used in the comparison and the corresponding reconstructions on a test image. The arrows in the zoomed views point to a black dot that is better resolved using the learned patterns.

Fig. 11: A comparison of different sampling patterns. From left to right: the learned pattern, learned Cartesian line pattern, radial line pattern, and a low-pass sampling pattern.

III-G High resolution example

Up to this point, the experiments have been run on brain images of size 192 × 192. For this experiment, we used a training set of 5 slices taken from a high resolution scan of a phantom. In Figure 12, we consider another slice from this scan to see how well the learned pattern performs. We see fine scale details that are resolved better by sampling according to the learned pattern than by using a low-pass sampling pattern.

Fig. 12: A comparison of the learned pattern and a low-pass sampling pattern in the high resolution setting with TV regularisation in the lower level problem.

IV Discussion and Outlook

All our experiments were carried out on 2D images. With minor modifications, the proposed method can be applied to learn sampling patterns for 3D MRI. To accelerate MRI in practice, it is necessary to take into account the physical constraints imposed on sampling. The scattered pattern of points learned by our method is not immediately useful for accelerating 2D MRI, but it can be used for accelerated 3D MRI. If our method is extended to 3D MRI, the problem of efficiently sampling along these patterns in practice comes up again. In [32], a method is proposed (which has been implemented in practice at NeuroSpin [33]) that can be used to generate practical sampling strategies from a given target density. We can estimate a target density from our learned pattern, and use it as an input to this method.
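As a crude illustration of this last step, one can estimate a density from a learned pattern and draw candidate k-space locations from it; the method of [32] goes further and produces physically feasible trajectories, which the independent draws below do not.

import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
pattern = (rng.random((192, 192)) < 0.1).astype(float)    # stand-in for a learned pattern

density = gaussian_filter(pattern, sigma=3.0)             # estimated target density
density /= density.sum()

idx = rng.choice(density.size, size=2000, replace=True, p=density.ravel())
ky, kx = np.unravel_index(idx, density.shape)             # candidate k-space sample locations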

Besides these extensions to our method, one can consider more general lower level regularisation functionals and allow for more flexibility to learn a custom regulariser as was done for denoising in [34], or unroll the lower level algorithm and use an approach similar to that of the variational network [35].

As we saw in Table II, radial line patterns perform well even without being tuned using the training data. With an appropriate choice of the parametrisation, our method can be used to learn optimal radial line patterns, or other physically feasible optimal sampling patterns.

In our framework, we made smoothness assumptions on the lower level problems in order to differentiate their solution maps. Similar results can be derived assuming partial smoothness of the regularisation functionals [36], which covers total variation regularisation and wavelet regularisation without needing to smooth them. The resulting non-smooth lower level problems will be harder to solve, but it might be possible to handle them directly with this approach. Alternatively, one could consider optimality conditions for bilevel optimisation problems with non-smooth lower level problems [37] and attempt to solve these optimality conditions.

Despite being a smooth optimisation problem, the learning procedure is computationally intensive, since the lower level problems have to be solved to high accuracy in each iteration; we found that it can take 10 hours to learn a sampling pattern on our computers with 12-core Intel Xeon E5 CPUs. This cost is alleviated by warm-starting the lower level solvers, and it may be possible to do something similar with the iterative solver used to compute the gradients. The learning problem is non-convex, which might explain some of the issues that were found: the relationship between the sparsity of the learned pattern and $\beta$ was observed to be not quite monotone, and with $H^1$ regularisation, the method learns an inferior pattern and a non-zero regularisation parameter if the initial regularisation parameter is chosen too large. It may be possible to circumvent these issues by replacing L-BFGS-B with an optimisation method that is better suited to the learning problem.

V Conclusions

We have proposed a supervised learning approach to learn high quality sampling patterns for accelerated MRI for a given variational reconstruction method. We have demonstrated that this approach is highly flexible, allowing for a wide variety of regularisation functionals to be used and allowing constraints to be imposed on the learned sampling patterns. Furthermore, we have shown that the method can be used successfully with small training sets. The learned patterns perform favourably compared to standard sampling patterns such as low-pass patterns and radial line patterns, both quantitatively (measured by SSIM and PSNR on a test set) and qualitatively (by comparing the resolution of fine scale details).

This work shows that it is feasible to learn sampling patterns by applying continuous optimisation methods to a bilevel optimisation problem and suggests multiple ways in which this methodology can be extended to work in different settings.

Acknowledgments

CBS acknowledges support from the Leverhulme Trust project on Breaking the non-convexity barrier, the Philip Leverhulme Prize, the EPSRC grant Nr. EP/M00483X/1, the EPSRC Centre Nr. EP/N014588/1, the European Union Horizon 2020 research and innovation programmes under the Marie Skłodowska-Curie grant agreement No 777826 NoMADS and No 691070 CHiPS and the Alan Turing Institute. GM acknowledges support from the LMS grant URB 16-40. MB acknowledges support from the Leverhulme Trust Early Career Fellowship ’Learning from mistakes: a supervised feedback-loop for imaging applications’. FS and CBS acknowledge support from the Cantab Capital Institute for the Mathematics of Information.

References

  • [1] E. J. Candès, J. K. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Communications on Pure and Applied Mathematics, vol. 59, no. 8, pp. 1207–1223, 2006.
  • [2] M. Lustig, D. Donoho, and J. M. Pauly, “Sparse MRI: The application of compressed sensing for rapid MR imaging,” MRM, vol. 58, no. 6, pp. 1182–1195, 2007.
  • [3] D. K. Sodickson et al., “The rapid imaging renaissance: sparser samples, denser dimensions, and glimmerings of a grand unified tomography,” in Proceedings of SPIE, B. Gimi and R. C. Molthen, Eds., vol. 9417.   International Society for Optics and Photonics, 2015, p. 94170G.
  • [4] S. Ravishankar and Y. Bresler, “MR Image Reconstruction From Highly Undersampled k-Space Data by Dictionary Learning,” IEEE TMI, vol. 30, no. 5, pp. 1028–1041, 2011.
  • [5] M. J. Ehrhardt and M. M. Betcke, “Multicontrast MRI Reconstruction with Structure-Guided Total Variation,” SIIMS, vol. 9, no. 3, pp. 1084–1106, 2016.
  • [6] S. G. Lingala et al., “Accelerated dynamic MRI exploiting sparsity and low-rank structure: k-t SLR,” IEEE TMI, vol. 30, no. 5, pp. 1042–1054, 2011.
  • [7] B. Trémoulhéac et al., “Dynamic MR Image Reconstruction-Separation From Undersampled (k, t)-Space via Low-Rank Plus Sparse Prior,” IEEE TMI, vol. 33, no. 8, pp. 1689–1701, 2014.
  • [8] D. Piccini et al., “Spiral phyllotaxis: The natural way to construct a 3D radial trajectory in MRI,” MRM, vol. 66, no. 4, pp. 1049–1056, 2011.
  • [9] L. Feng et al., “Golden-angle radial sparse parallel MRI: Combination of compressed sensing, parallel imaging, and golden-angle radial sampling for fast and flexible dynamic volumetric MRI,” MRM, vol. 72, no. 3, pp. 707–717, 2014.
  • [10] J. Liu and D. Saloner, “Accelerated MRI with CIRcular Cartesian UnderSampling (CIRCUS): a variable density Cartesian sampling strategy for compressed sensing and parallel imaging.” QIMS, vol. 4, no. 1, pp. 57–67, 2014.
  • [11] M. Paquette et al., “Comparison of sampling strategies and sparsifying transforms to improve compressed sensing diffusion spectrum imaging,” MRM, vol. 73, no. 1, pp. 401–416, 2015.
  • [12] M. Usman and P. G. Batchelor, “Optimized Sampling Patterns for Practical Compressed MRI,” SampTA’09, 2009.
  • [13] B. Adcock et al., “Breaking the coherence barrier: a new theory for compressed sensing,” Forum of Mathematics, Sigma, vol. 5, p. e4, 2017.
  • [14] F. Krahmer and R. Ward, “Stable and Robust Sampling Strategies for Compressive Imaging,” IEEE TIP, vol. 23, no. 2, pp. 612–622, 2014.
  • [15] C. Poon, “On Cartesian line sampling with anisotropic total variation regularization,” 2016, arXiv:1602.02415.
  • [16] S. Ravishankar and Y. Bresler, “Adaptive sampling design for compressed sensing MRI,” in 2011 Annual International Conference of the IEEE EMBS.   IEEE, 2011, pp. 3751–3755.
  • [17] M. Seeger et al., “Optimization of k-space trajectories for compressed sensing by Bayesian experimental design,” MRM, vol. 63, no. 1, pp. 116–126, 2009.
  • [18] B. Gozcu et al., “Learning-Based Compressive MRI,” IEEE TMI, vol. 37, no. 6, pp. 1394–1406, 2018.
  • [19] B. Gözcü, T. Sanchez, and V. Cevher, “Rethinking Sampling in Parallel MRI: A Data-Driven Approach,” 2019.
  • [20] J. P. Haldar and D. Kim, “OEDIPUS: An Experiment Design Framework for Sparsity-Constrained MRI,” IEEE TMI, pp. 1–1, 2019.
  • [21] T. Weiss et al., “Learning Fast Magnetic Resonance Imaging,” 2019, arXiv:1905.09324.
  • [22] Y.-H. Li and V. Cevher, “Learning data triage: Linear decoding works for compressive MRI,” in 2016 IEEE ICASSP.   IEEE, 2016, pp. 4034–4038.
  • [23] J. C. De los Reyes, C.-B. Schönlieb, and T. Valkonen, “Bilevel Parameter Learning for Higher-Order Total Variation Regularisation Models,” JMIV, vol. 57, no. 1, pp. 1–25, 2017.
  • [24] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D, vol. 60, pp. 259–268, 1992.
  • [25] P. J. Huber, “Robust Estimation of a Location Parameter,” The Annals of Mathematical Statistics, vol. 35, no. 1, pp. 73–101, 1964.
  • [26] A. Chambolle and T. Pock, “A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging,” JMIV, vol. 40, no. 1, pp. 120–145, 2011.
  • [27] R. H. Byrd et al., “A Limited Memory Algorithm for Bound Constrained Optimization,” SISC, vol. 16, no. 5, pp. 1190–1208, 1995.
  • [28] C. Zhu et al., “Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization,” ACM Transactions on Mathematical Software, vol. 23, no. 4, pp. 550–560, 1997.
  • [29] S. Becker, “github.com/stephenbeckr/L-BFGS-B-C,” 2018.
  • [30] F. Knoll et al., “Second order total generalized variation (TGV) for MRI,” MRM, vol. 65, no. 2, pp. 480–491, 2011.
  • [31] M. Benning et al., “Phase reconstruction from velocity-encoded MRI measurements - A survey of sparsity-promoting variational approaches,” JMR, vol. 238, pp. 26–43, 2014.
  • [32] C. Boyer et al., “On the Generation of Sampling Schemes for Magnetic Resonance Imaging,” SIIMS, vol. 9, no. 4, pp. 2039–2072, 2016.
  • [33] C. Lazarus et al., "SPARKLING: variable-density k-space filling curves for accelerated T2*-weighted MRI," MRM, vol. 81, no. 6, pp. 3643–3661, 2019.
  • [34] Y. Chen, “Learning fast and effective image restoration models,” Ph.D. dissertation, Graz University of Technology, 2014.
  • [35] K. Hammernik et al., “Learning a variational network for reconstruction of accelerated MRI data,” MRM, vol. 79, no. 6, pp. 3055–3071, 2018.
  • [36] S. Vaiter et al., "The degrees of freedom of partly smooth regularizers," AISM, vol. 69, no. 4, pp. 791–832, Aug. 2017.
  • [37] S. Dempe and A. B. Zemkoho, "The Generalized Mangasarian-Fromowitz Constraint Qualification and Optimality Conditions for Bilevel Programs," JOTA, vol. 148, no. 1, pp. 46–68, Jan. 2011.
  • [38] A. Chambolle, “An Algorithm for Total Variation Minimization and Applications,” JMIV, vol. 20, pp. 89–97, 2004.