The discovery of the theory of compressed sensing brought the realisation that many inverse problems can be solved even when measurements are "incomplete". This is particularly interesting in magnetic resonance imaging (MRI), where long acquisition times can limit its use. In this work, we consider the problem of learning a sparse sampling pattern that can be used to optimally balance acquisition time versus quality of the reconstructed image. We use a supervised learning approach, making the assumption that our training data is representative enough of new data acquisitions. We demonstrate that this is indeed the case, even if the training data consists of just 5 training pairs of measurements and ground-truth images; with a training set of brain images of size 192 by 192, for instance, one of the learned patterns samples only 32% of k-space, yet performs well on a test set of similar images. The proposed framework is general enough to learn arbitrary sampling patterns, including common patterns such as Cartesian, spiral and radial sampling.
[Figure: the sampling pattern is learned with the L-BFGS-B algorithm; reconstructions are computed with PDHG.]
The field of compressed sensing is founded on the realisation that in inverse problems it is often possible to recover signals from incomplete measurements. To do so, the inherent structure of signals and images is exploited. Finding a sparse representation for the unknown signal reduces the number of unknowns and consequently the number of measurements required for reconstruction. This is of great interest in many applications, where external reasons (such as cost or time constraints) typically imply that one should take as few measurements as are required to obtain an adequate reconstruction. A specific example of such an application is magnetic resonance imaging (MRI). In MRI, measurements are modelled as samples of the Fourier transform (points in so-called k-space) of the signal that is to be recovered, and taking measurements is a time-intensive procedure. Keeping acquisition times short is important to ensure patient comfort and to mitigate motion artefacts, and it increases patient throughput, thus making MRI effectively cheaper. Hence, MRI is a natural candidate for the application of compressed sensing methodology. While the original theoretical results of compressed sensing (in which exact recovery is proven for uniform random sampling strategies) do not apply well to MRI, three underlying principles were identified that enable the success of compressed sensing: 1) sparsity or compressibility of the signal to be recovered (in some sparsifying transform, such as a wavelet transform), 2) incoherent measurements (with respect to the aforementioned sparsifying transform) and 3) a nonlinear reconstruction algorithm that takes advantage of the sparsity structure in the true signal. The nonlinear reconstruction algorithm often takes the form of a variational regularisation problem:
$$\min_u \tfrac{1}{2}\,\|SFu - y\|_2^2 + \alpha R(u), \qquad (1)$$
with $S$ the subsampling operator, $F$ the Fourier transform, $y$ the subsampled measurements, $R$ a regularisation functional that encourages the reconstruction to have a sparsity structure and $\alpha$ the regularisation parameter that controls the trade-off between the fit to the measurements and the fit to the structure imposed by $R$. Many previous efforts towards accelerating MRI have focused on improving how these aspects are treated. The reconstruction algorithm can be changed to more accurately reflect the true structure of the signal: the typical convex reconstruction problem can be replaced by a dictionary learning approach; in multi-contrast imaging, structural information obtained from one contrast can be used to inform a regularisation functional to use in the other contrasts; and in dynamic MRI additional low rank structure can be exploited to improve reconstruction quality [6, 7].
It is well known that sampling uniformly at random in k-space (as the original compressed sensing theory suggests) does not work well in practice; using a variable density sampling pattern greatly improves reconstruction quality, see Figure 1. In the works [8, 9, 10, 11, 12], subsampling strategies are studied that can be used in practice. On the theoretical side, the compressed sensing assumptions have been refined to prove bounds on reconstruction errors for variable density sampling [13, 14] and exact recovery results for Cartesian line sampling with anisotropic total variation regularisation.
The sampling pattern can be optimised in a given setting to improve reconstruction quality. There are works on fine-tuning sampling patterns [16, 17], greedy algorithms that pick a suitable pattern for a given reconstruction method [18, 19, 20], approaches that jointly learn a Cartesian line pattern and a neural network reconstruction algorithm, and methods that compute optimal patterns for zero-filling reconstructions from a training set with little computational effort. We consider the problem of learning an optimal sparse sampling pattern from scratch for a given variational reconstruction method and class of images by solving a bilevel optimisation problem. A similar approach has been used to learn regularisation parameters for variational regularisation models, among other things.
In this work, we propose a novel bilevel learning approach to learn sparse sampling patterns for MRI. We do this within a supervised learning framework, using training sets of ground truth images with the corresponding measurements.
Our approach can accommodate arbitrary sampling patterns and sampling densities. We demonstrate that the parametrisation of the sampling pattern can be chosen to learn a pattern consisting of a scattered set of points as well as Cartesian lines, but other parametrisations can also be designed that result in radial or spiral sampling, for instance. By using a sparsity promoting penalty on the sampling pattern, we can also vary the sampling rates of our learned patterns.
Besides this, it is also possible to use a wide variety of variational reconstruction algorithms, that is, various choices of the regularisation $R$ in Problem (1), and we can simultaneously learn the sampling pattern and the optimal regularisation parameter for reconstruction. This obviates the need to separately tune the parameters of the reconstruction method.
Our optimal sampling patterns confirm empirically the validity of variable density sampling: the optimal patterns tend to sample more densely around the low frequencies and more sparsely at high frequencies. We investigate the dependence of the shape of the sampling density on the sampling rate and the choice of regularisation functional.
By focusing on a particular region within the body, our approach can be used with very small training sets to learn optimal patterns that nevertheless generalise well to unseen MRI data. We demonstrate this on a set of brain images; indeed, in this setting we find that a training set of just five (image, measurement) pairs is sufficient.
In the bilevel learning framework, the free parameters of a variational regularisation method are learned to optimise a given measure of reconstruction quality. We assume that we are given a variational regularisation method to perform the reconstruction, of a form such as Problem (1). Furthermore, we assume that we are given a training set of pairs $(x_i, y_i)$, $i = 1, \dots, N$, of clean images $x_i$ and fully sampled noisy k-space data $y_i$. With these ingredients and a choice of parametrisation $\theta$ of the sampling pattern and regularisation parameter, we set up a bilevel optimisation problem that can be solved to learn the optimal parameters:
$$\min_{\theta} \sum_{i=1}^{N} L\big(u_i(\theta), x_i\big) + P(\theta) \quad \text{s.t.} \quad u_i(\theta) = \operatorname*{arg\,min}_u E\big(u; \theta, y_i\big), \quad i = 1, \dots, N. \qquad (2)$$
We will refer to Problem (2) as the upper level problem and will call the variational regularisation problems that make up its constraints the lower level problems. Each $L$ is a loss function that quantifies the discrepancy between the reconstruction from subsampled measurements, $u_i(\theta)$, and the corresponding ground truth $x_i$, and $P$ is a penalty on the sampling pattern that encourages its sparsity. Hence, the objective function in Problem (2) is a penalised empirical loss function, the minimiser of which trades off the reconstruction quality against the sparsity of the sampling pattern in an optimal manner. As we show in Section II-C2, it is possible to differentiate the solution maps $\theta \mapsto u_i(\theta)$ in our setting, so that Problem (2) is amenable to treatment by first order optimisation methods.
In this section, we describe in more detail the various aspects that make up Problem (2) in our setting, starting with the lower level problems, followed by the upper level problem, after which we describe the methods that can be applied to solve the problem.
The lower level problems in Problem (2) are variational regularisation problems. In this section, we specify the class of variational regularisation problems that will be considered. In our application, an image of resolution $n_1 \times n_2$ is modelled as a nonnegative vector in $\mathbb{R}^{n_1 n_2}$ by concatenating its columns. The subsampled measurements corresponding to a given image $x$ are modelled as $y = S(Fx + \varepsilon)$. Here $\operatorname{Re}$ takes the real part of a complex signal componentwise, $F$ is the unitary discrete Fourier transform, $S = \sum_j s_j e_j e_j^{\top}$ is the sampling operator (with $s_j \in [0, 1]$ and $(e_j)_j$ the standard basis of $\mathbb{C}^{n_1 n_2}$), which selects the points in k-space that are included in the measurements (and can be used as a weight on those measurements), and $\varepsilon$ is complex Gaussian white noise.
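For illustration, this measurement model is straightforward to simulate; the following Python/NumPy sketch (the function name and default noise level are our own choices, not from the paper) generates subsampled k-space data $y = S(Fx + \varepsilon)$ for a given image and set of sampling weights:

```python
import numpy as np

def simulate_measurements(x, s, noise_std=0.01, rng=None):
    """Simulate subsampled k-space measurements y = S(F x + eps).

    x         : real image, shape (n1, n2)
    s         : sampling weights in [0, 1], shape (n1, n2)
    noise_std : standard deviation of the complex Gaussian noise (illustrative)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Unitary discrete Fourier transform (norm="ortho")
    k = np.fft.fft2(x, norm="ortho")
    # Complex Gaussian white noise
    eps = noise_std * (rng.standard_normal(x.shape)
                       + 1j * rng.standard_normal(x.shape)) / np.sqrt(2)
    # The sampling operator S is diagonal: it weights/selects k-space points
    return s * (k + eps)
```

With all weights equal to 1 and no noise, the image is recovered exactly by the inverse transform.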
The variational regularisation approach to estimating the true image $x$ from measurements $y$ proceeds by solving an optimisation problem that balances fitting the measurements with fitting prior knowledge that is available about the image. In this work we consider problems of the following form:
$$\min_{u \ge 0} \tfrac{1}{2}\,\|SFu - y\|_2^2 + \alpha \sum_{j} g\big((Bu)_j\big), \qquad (4)$$
where $B$ is a collection of linear operators $B_j$ and $g$ is some convex function. Note that this includes a number of common regularisation functionals:
- Taking $B$ to be the discrete gradient (with $D_x$ and $D_y$ discretisations of the partial differential operators along orthogonal directions) and $g = \|\cdot\|_2$ gives the isotropic total variation as regularisation term; its use in variational regularisation problems has long been studied.
- Taking $B = W$ and $g = |\cdot|$ for some sparsifying transform $W$, such as a wavelet or shearlet transform, gives a sparsity penalty on the transform coefficients of the image as regularisation term.
In the above form, these problems do not satisfy the regularity conditions required to apply first order optimisation methods to Problem (2). To get around this issue, we slightly modify the objective function in Problem (4): the nonnegativity constraint is removed and emulated by adding a convex barrier function $b$, scaled by some large constant, and a strongly convex penalty is added. Furthermore, $g$ is assumed to satisfy the following conditions: 1) $g$ is increasing, 2) $g$ is twice continuously differentiable and 3) $g(t)/t \to 1$ as $t \to \infty$. Note that these conditions exclude $g = |\cdot|$, but $g$ can be chosen to approximate the absolute value function: for instance, one can take $g = g_\xi$ for some $\xi > 0$, where $g_\xi(t) = \sqrt{t^2 + \xi^2}$ is a smooth approximation of the absolute value.
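As a quick numerical sanity check of such a smoothing (a sketch, using the example choice $g_\xi(t) = \sqrt{t^2 + \xi^2}$), one can verify that the approximation error to the absolute value is bounded by the smoothing parameter $\xi$:

```python
import numpy as np

def smoothed_abs(t, xi):
    # A smooth, convex approximation of |t|: increasing for t >= 0,
    # twice continuously differentiable, and g(t)/t -> 1 as t -> infinity.
    return np.sqrt(t * t + xi * xi)
```

Since $|t| \le \sqrt{t^2 + \xi^2} \le |t| + \xi$, the approximation error is controlled directly by $\xi$.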
With these modifications to the objective function, we define the lower level energy functional $E$, given fully sampled training measurements $y_i$, as follows:
$$E(u; \theta, y_i) = \tfrac{1}{2}\,\big\|S(\theta)(Fu - y_i)\big\|_2^2 + \alpha(\theta) \sum_j g\big((Bu)_j\big) + b(u) + \tfrac{\mu}{2}\,\|u\|_2^2,$$
with $S(\theta)$ and $\alpha(\theta)$ defined in the next section, and $\mu > 0$ a small strong convexity parameter.
In the upper level problem, we parametrise the sampling pattern and the lower level regularisation parameter by a vector $\theta$: the first components of $\theta$ determine the sampling weights $s_j$ of $S(\theta)$ and the final component determines $\alpha(\theta)$. This parametrisation allows us to learn a sampling pattern of scattered points in k-space, though it is worth noting that the parametrisation can be generalised to constrain the learned pattern. To prevent the notation from becoming overly cumbersome, we do not consider this generalisation here, but refer the reader to Section A of the Appendix for the details.
With this parametrisation, a natural choice of the sparsity penalty $P$ is a sparsity-promoting penalty on the sampling weights, scaled by a parameter $\beta$ that decides how reconstruction quality is traded off against sparsity of the sampling pattern. Besides encouraging a sparse sampling pattern, this penalty encourages the weights in the sampling pattern to take either the value 0 or 1. For the loss function $L$, we choose the squared distance $L(u, x) = \tfrac{1}{2}\|u - x\|_2^2$, but it is straightforward to replace this by any other smooth loss function. For instance, if one is interested in optimising the quality of the recovered edges, one could use the smoothed total variation as a loss function: $L(u, x) = \sum_j g\big((B(u - x))_j\big)$, with $B$ and $g$ as defined in Section II-A.
As was mentioned in Section II, first order optimisation methods can be used to solve problems like Problem (2), provided that the solution maps $\theta \mapsto u_i(\theta)$ of the lower level problems can be computed and differentiated. In this section we describe the approach taken to computing the solution maps and their derivatives, and then describe how these steps are combined to apply first order optimisation methods to Problem (2).
In this and the next subsection, we will consider the lower level problem for a fixed $i$, so for the sake of notational clarity, we will drop the subscript and write $u(\theta)$ and $E(u; \theta)$. The lower level energy functional is convex in $u$ and takes the saddle-point structure that is used in the primal-dual hybrid gradient algorithm (PDHG) of Chambolle and Pock. Indeed, we can write the objective in the form $f(Ku) + h(u)$, with $K$ a linear operator that collects the Fourier transform $F$ and the operators $B$, $f$ a convex function acting on the transformed variables and $h$ a strongly convex function containing the remaining terms (we think of the complex quantities involved as elements of a real vector space).
With this splitting, the parameter choices from Section C2 of the Appendix and an arbitrary initialisation (we can take it to be the zero-filling reconstruction, or warm start the solver), the resulting PDHG iteration solves the lower level problem at a linear convergence rate.
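The following Python sketch shows the PDHG iteration on a toy one-dimensional problem, $\min_u \tfrac{1}{2}\|u - y\|_2^2 + \alpha\|Du\|_1$, standing in for the lower level problem; the operators, proximal maps and step size choices of the actual implementation differ:

```python
import numpy as np

def pdhg_tv_denoise(y, alpha, tau=0.3, sigma=0.3, iters=2000):
    """Chambolle-Pock / PDHG sketch for min_u 0.5*||u - y||^2 + alpha*||D u||_1,
    a toy stand-in for the lower level problem (tau*sigma*||D||^2 < 1)."""
    D = lambda u: np.diff(u)                                        # K
    Dt = lambda p: np.concatenate(([-p[0]], -np.diff(p), [p[-1]]))  # K^T
    u = y.copy()                    # warm start (stands in for zero-filling)
    u_bar = u.copy()
    p = np.zeros(len(y) - 1)
    for _ in range(iters):
        # dual step: prox of f* is the projection onto [-alpha, alpha]
        p = np.clip(p + sigma * D(u_bar), -alpha, alpha)
        # primal step: prox of h(u) = 0.5*||u - y||^2
        u_new = (u - tau * Dt(p) + tau * y) / (1.0 + tau)
        u_bar = 2.0 * u_new - u                                     # extrapolation
        u = u_new
    return u
```

With $\alpha = 0$ the iteration reproduces the data $y$; for $\alpha > 0$ it reduces the total variation while preserving the mean, as expected for this fidelity term.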
In the previous subsection, we saw that we can compute the solution maps of the lower level problems. To apply first order optimisation methods to Problem (2), we still need to be able to differentiate these solution maps. To this end, note that the solution map $u(\theta)$ can be defined equivalently by its first order optimality condition:
$$\nabla_u E\big(u(\theta); \theta\big) = 0,$$
and that $E$ is twice continuously differentiable in our setting. To ease notation, let us write $u = u(\theta)$ in this subsection. Since $E$ is strongly convex in $u$, its Hessian $H = \nabla_u^2 E(u(\theta); \theta)$ is positive definite and hence invertible. As a consequence, the implicit function theorem tells us that the solution map is continuously differentiable and its derivative satisfies the linear system below:
$$H\, u'(\theta) = -\,\partial_\theta \nabla_u E\big(u(\theta); \theta\big).$$
In fact, we do not need the full derivative of the solution map in our application, but just the gradient of a scalar function of the solution map, namely $\theta \mapsto L(u(\theta), x)$ for some ground truth $x$. The chain rule and the formula above give us a formula for this gradient:
$$\nabla_\theta L\big(u(\theta), x\big) = -\,\big(\partial_\theta \nabla_u E(u(\theta); \theta)\big)^{\top} H^{-1} \nabla_u L\big(u(\theta), x\big).$$
It is worth noting that this expression for the gradient can also be derived from the Lagrangian formulation of Problem (2), through the adjoint equation, and this is the way in which it is usually derived when an optimal control perspective is taken. To implement this formula in practice, we do not compute the Hessian matrix of $E$ and invert it exactly (the Hessian is very large; it has as many rows and columns as the images we are dealing with have pixels). Instead, we note that the Hessian is symmetric positive definite, so that the linear system can be solved with an iterative solver such as the conjugate gradient method (CG). For this, we just need to compute the action of the Hessian, for which we can give explicit expressions. These computations have been done in Section D of the Appendix. The expressions derived there for the action of $H$ and of $(\partial_\theta \nabla_u E)^{\top}$ can be implemented efficiently in practice and are then used in CG to compute the desired gradients.
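This CG-based gradient computation can be sketched in a few lines of Python. In the toy check below, the lower level problem is $\min_u \tfrac{1}{2}\|u - \theta\|_2^2 + \tfrac{\lambda}{2}\|u\|_2^2$ (our stand-in, with closed-form solution $u(\theta) = \theta/(1+\lambda)$), so the result of the implicit-differentiation formula can be compared against the analytic gradient $(u - x)/(1+\lambda)$:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def implicit_gradient(u, x, hess_mv, mixed_mv):
    """Gradient of L(u(theta)) = 0.5*||u(theta) - x||^2 via the adjoint route:
    solve H q = grad_u L by CG (H is symmetric positive definite), then
    return -(d_theta grad_u E)^T q."""
    n = u.size
    H = LinearOperator((n, n), matvec=hess_mv)   # action of the Hessian
    q, info = cg(H, u - x, atol=1e-12, maxiter=500)
    assert info == 0                             # CG converged
    return -mixed_mv(q)                          # action of (d_theta grad_u E)^T

# Toy check: E(u; theta) = 0.5*||u - theta||^2 + 0.5*lam*||u||^2, so that
# u(theta) = theta/(1+lam), H = (1+lam)*I and d_theta grad_u E = -I.
lam = 0.5
theta = np.array([1.0, -2.0, 3.0])
x_true = np.array([0.5, 0.0, 1.0])
u = theta / (1.0 + lam)
g = implicit_gradient(u, x_true, lambda v: (1.0 + lam) * v, lambda q: -q)
```

Here `g` agrees with the analytic gradient $(u - x)/(1+\lambda)$ of the loss with respect to $\theta$.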
Recall that the problem that we are trying to solve, Problem (2), is to minimise the penalised empirical loss $\sum_i L(u_i(\theta), x_i) + P(\theta)$ over the parameters $\theta$, subject to the lower level problems.
By the previous sections, we know that the objective function of this problem is continuously differentiable, and the constraints that we impose on the parameters form a box constraint, so the optimisation problem that we consider is amenable to treatment by the L-BFGS-B algorithm [27, 28]. Since the objective function splits as a sum over the training set, it is completely straightforward to parallelise the computations of the solution maps and desired gradients over the training set (to emphasise this, we refer to the loop over the training set as a parfor loop rather than just a for loop).
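As a sketch of the outer loop, the following Python example drives SciPy's L-BFGS-B on a deliberately simplified stand-in for Problem (2): the "reconstruction" is just an elementwise shrinkage $u_i(\theta) = \theta \odot y_i$, an $\ell^1$-style penalty plays the role of $P$, and box constraints $[0, 1]$ are imposed on the weights (the actual implementation is in MATLAB and solves the true lower level problems in the parfor loop):

```python
import numpy as np
from scipy.optimize import minimize

def upper_level(theta, train, beta):
    """Penalised empirical loss and its gradient for a toy stand-in problem:
    u_i(theta) = theta * y_i replaces the variational reconstruction."""
    loss = beta * theta.sum()              # sparsity penalty P(theta)
    grad = beta * np.ones_like(theta)
    for x, y in train:                     # parallelisable over the training set
        r = theta * y - x                  # reconstruction error on pair (x, y)
        loss += 0.5 * np.dot(r, r)
        grad += r * y
    return loss, grad

# Two hypothetical training pairs; the second measurement component carries
# no signal, so a sparse theta should switch it off.
train = [(np.array([1.0, 0.0]), np.array([1.0, 1.0])),
         (np.array([2.0, 0.0]), np.array([2.0, 1.0]))]
res = minimize(upper_level, x0=np.ones(2), args=(train, 0.01),
               method="L-BFGS-B", jac=True, bounds=[(0.0, 1.0)] * 2)
```

On this toy problem, the weight attached to the uninformative component is driven to the boundary value 0, illustrating how the box constraint and the sparsity penalty interact.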
Our methods have been implemented in MATLAB, using a publicly available version of the L-BFGS-B algorithm. Since the learning problem is non-convex, care must be taken with the choice of initialisation. We initialise the learning with a full sampling pattern and the corresponding optimal regularisation parameter. This optimal regularisation parameter is itself learned using our method, keeping the sampling pattern fixed to fully sample k-space; it is typically found in fewer than 10 iterations of the L-BFGS-B algorithm.
In this section, we present experiments in which we look at:
- varying the sparsity parameter to control the sparsity of the learned pattern,
- learning Cartesian line patterns with our method,
- using different lower level regularisations,
- varying the size of the training set,
- comparing the learned patterns to other sampling patterns,
- learning sampling patterns for high resolution imaging.
Except in the subsection in which we consider different lower level regularisations, we use a total variation type regularisation in all experiments. That is, we take $B$ to be the discrete gradient and $g$ a smoothed absolute value with a small smoothing parameter in the lower level problem. We refer the reader to the supporting document for figures that may be of interest, but are not crucial to the understanding of the results.
The brain images are of size 192 × 192, taken as slices from 7 separate 3D scans. The corresponding noisy measurements are simulated by taking discrete Fourier transforms of these slices and adding Gaussian white noise. The scans were acquired on a Siemens PrismaFit scanner. For all scans except one, TE = 2.97 ms, TR = 2300 ms and the inversion time was 1100 ms. For the other scan, TE = 2.98 ms, TR = 2300 ms and the inversion time was 900 ms.
The high resolution images are taken as slices from a 3D scan of a test phantom. Again, the noisy measurements are simulated by taking discrete Fourier transforms of these slices and adding Gaussian white noise. The scan was acquired on a GE 3T scanner using spoiled gradient recalled acquisition with TE = 12 ms and TR = 37 ms.
Learning with a training set of 7 brain images, we consider the effect of varying the sparsity parameter $\beta$. Increasing this parameter tends to make the learned patterns sparser, although we do see a slight deviation from this monotone behaviour for large $\beta$. Figure 2 shows examples of the learned patterns and reconstructions on a test image, and in Figure 3 we see the performance of the learned patterns, evaluated on a test set of 70 brain images.
We use a Gaussian kernel density estimator to estimate a sampling distribution corresponding to each pattern. That is, we convolve the learned pattern with a Gaussian filter with a small bandwidth and normalise the resulting image to sum to 1. The results of doing this can be seen in Figure 4: we see that the distributions become more strongly peaked around the origin as the patterns become sparser.
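This density estimation step can be reproduced directly; a minimal sketch (the function name and bandwidth value are illustrative):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def estimate_sampling_density(pattern, bandwidth=3.0):
    """Kernel density estimate of a sampling distribution: convolve the
    (possibly weighted) pattern with a small-bandwidth Gaussian and
    normalise the result to sum to 1."""
    density = gaussian_filter(np.asarray(pattern, dtype=float), sigma=bandwidth)
    return density / density.sum()
```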
As described in Section A of the Appendix, we can restrict the learned pattern to sample along Cartesian lines. Similarly to the case of learning scattered points in k-space, we have some control over the sparsity of the learned pattern using the parameter $\beta$. As is seen in Figure 5, the sparsity penalty does not seem to work as well in this situation in encouraging the weights of the pattern to be binary.
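One way to realise such a constrained parametrisation (our sketch; the exact construction is given in the Appendix) is to attach a single weight to each phase-encoding line and expand the weights into a full 2D pattern:

```python
import numpy as np

def cartesian_line_mask(line_weights, n_cols):
    """Expand one weight per k-space row (Cartesian phase-encoding line)
    into a full 2D sampling pattern with constant weight along each line."""
    w = np.clip(np.asarray(line_weights, dtype=float), 0.0, 1.0)
    return np.repeat(w[:, None], n_cols, axis=1)
```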
Instead of the TV type regularisation, we use a sparsity penalty on the wavelet coefficients of the image. We accomplish this by choosing $B = W$ and $g$ a smoothed absolute value, for an orthogonal wavelet transform $W$ (we use Daubechies 4 wavelets). This gives results that are, at first glance, very similar to those for the total variation regularisation. On closer inspection, when comparing two patterns from the TV and wavelet regularisation with the same sparsity, we find that the pattern for the wavelet regularisation is more strongly peaked around the origin. We can see this in Figures 6 and 7, where we have estimated the sampling distributions for two learned patterns with TV and wavelet regularisation, both of which sample 27% of k-space.
We use a squared seminorm as lower level regularisation, by taking $B$ to be the discrete gradient and $g(t) = t^2$ in the lower level problem. With this choice, we find that the learned $\alpha$ equals 0 and that the learned pattern does not take on just binary values: the weights of the learned pattern are lower at higher frequencies, as can be seen in Figure 8.
Taking no regularisation in the lower level problem, i.e. fixing $\alpha = 0$, we find essentially the same results as when we considered the squared seminorm regularisation: the weights in the learned pattern show a decay away from the origin as in Figure 8.
We compare the performance of the learned patterns with the different lower level regularisations. In Table I, we list the performance of three of these patterns on the test set of brain images, each pattern sampling roughly the same proportion of k-space.
The TV regularisation is seen to outperform the wavelet regularisation, which in turn outperforms the squared seminorm regularisation. Figure 9 shows the three patterns that we are comparing and the corresponding reconstructions on a test image. We note that this method can easily be extended to other regularisation functionals (such as the total generalised variation) that have been used in the context of MRI [30, 31].
To investigate the effect of the size of the training set, we ran our method on different training sets of slices of brain images, of sizes 1, 3, 5, 10, 20, 30 to obtain sampling patterns of roughly the same sparsity. As we see in Figure 10, the learned patterns perform similarly well for training sets of size 5 or larger.
We compare the performance of the learned patterns to that of other sampling patterns. Here we consider a radial line sampling pattern and a low-pass sampling pattern, both with a learned optimal regularisation parameter. Table II shows the performance of the learned patterns and the other patterns on the test set of brain images.
[Table II rows: learned free pattern, learned Cartesian line pattern, radial line pattern.]
The learned free pattern is found to perform better than the others. Figure 11 shows the patterns used in the comparison and the corresponding reconstructions on a test image. The arrows in the zoomed views point to a black dot that is better resolved using the learned patterns.
Up to this point, the experiments have been run on brain images of size 192 × 192. For this experiment, we used a training set of 5 slices taken from a high resolution scan of a phantom. In Figure 12, we consider another slice from this scan to see how well the learned pattern performs. We see fine scale details that are resolved better by sampling according to the learned pattern than by using a low-pass sampling pattern.
All our experiments were carried out on 2D images. With minor modifications, the proposed method can be applied to learn sampling patterns for 3D MRI. To accelerate MRI in practice, it is necessary to take into account the physical constraints imposed on sampling. The scattered pattern of points learned by our method is not immediately useful for accelerating 2D MRI, but it can be used for accelerated 3D MRI. If our method is extended to 3D MRI, the problem of efficiently sampling along these patterns in practice comes up again. A method has been proposed (and implemented in practice at NeuroSpin) that can be used to generate practical sampling strategies from a given target density. We can estimate a target density from our learned pattern and use it as an input to this method.
Besides these extensions to our method, one can consider more general lower level regularisation functionals and allow for more flexibility to learn a custom regulariser, as has been done for denoising, or unroll the lower level algorithm and use an approach similar to that of the variational network.
As we saw in Table II, radial line patterns perform well even without being tuned using the training data. With an appropriate choice of the parametrisation, our method can be used to learn optimal radial line patterns, or other physically feasible optimal sampling patterns.
In our framework, we made smoothness assumptions on the lower level problems in order to differentiate their solution maps. Similar results can be derived assuming only partial smoothness of the regularisation functionals, which covers the total variation regularisation and the wavelet regularisation without needing to smooth them. The resulting non-smooth lower level problems will be harder to solve, but it might be possible to deal directly with them using this approach. Alternatively, one could consider optimality conditions for bilevel optimisation problems with non-smooth lower level problems and attempt to solve the optimality conditions.
Despite being a smooth optimisation problem, the learning procedure is computationally intensive, since the lower level problems have to be solved to high accuracy in each iteration; we found that it can take 10 hours to learn a sampling pattern on images of the sizes considered in this work on our computers with 12-core Intel Xeon E5 CPUs. These issues are alleviated by warm-starting the lower level solvers, and it may be possible to do something similar with the iterative solver used to compute gradients. The learning problem is non-convex, which might explain some of the issues that were found: the relationship between the sparsity of the learned pattern and the sparsity parameter was observed to be not quite monotone, and with the squared seminorm regularisation, the method learns an inferior pattern and a non-zero regularisation parameter if the sparsity parameter is chosen too large. It may be possible to circumvent these issues by replacing L-BFGS-B with an optimisation method that is better suited to the learning problem.
We have proposed a supervised learning approach to learn high quality sampling patterns for accelerated MRI for a given variational reconstruction method. We have demonstrated that this approach is highly flexible, allowing for a wide variety of regularisation functionals to be used and allowing constraints to be imposed on the learned sampling patterns. Furthermore, we have shown that the method can be used successfully with small training sets. The learned patterns perform favourably compared to standard sampling patterns such as low-pass patterns and radial line patterns, both quantitatively (measured by SSIM and PSNR on a test set) and qualitatively (by comparing the resolution of fine scale details).
This work shows that it is feasible to learn sampling patterns by applying continuous optimisation methods to a bilevel optimisation problem and suggests multiple ways in which this methodology can be extended to work in different settings.
CBS acknowledges support from the Leverhulme Trust project on Breaking the non-convexity barrier, the Philip Leverhulme Prize, the EPSRC grant Nr. EP/M00483X/1, the EPSRC Centre Nr. EP/N014588/1, the European Union Horizon 2020 research and innovation programmes under the Marie Skłodowska-Curie grant agreement No 777826 NoMADS and No 691070 CHiPS and the Alan Turing Institute. GM acknowledges support from the LMS grant URB 16-40. MB acknowledges support from the Leverhulme Trust Early Career Fellowship ’Learning from mistakes: a supervised feedback-loop for imaging applications’. FS and CBS acknowledge support from the Cantab Capital Institute for the Mathematics of Information.