Log In Sign Up

Self-Supervised Deep Active Accelerated MRI

We propose to simultaneously learn to sample and reconstruct magnetic resonance images (MRI) to maximize the reconstruction quality given a limited sample budget, in a self-supervised setup. Unlike existing deep methods that focus only on reconstructing given data, thus being passive, we go beyond the current state of the art by considering both the data acquisition and the reconstruction process within a single deep-learning framework. As our network learns to acquire data, the network is active in nature. In order to do so, we simultaneously train two neural networks, one dedicated to reconstruction and the other to progressive sampling, each with an automatically generated supervision signal that links them together. The two supervision signals are created through Monte Carlo tree search (MCTS). MCTS returns a better sampling pattern than what the current sampling network can give and, thus, a better final reconstruction. The sampling network is trained to mimic the MCTS results using the previous sampling network, thus being enhanced. The reconstruction network is trained to give the highest reconstruction quality, given the MCTS sampling pattern. Through this framework, we are able to train the two networks without providing any direct supervision on sampling.


page 1

page 2

page 4

page 5

page 10

page 11

page 13


Learning Optimal K-space Acquisition and Reconstruction using Physics-Informed Neural Networks

The inherent slow imaging speed of Magnetic Resonance Image (MRI) has sp...

SANTIS: Sampling-Augmented Neural neTwork with Incoherent Structure for MR image reconstruction

Deep learning holds great promise in the reconstruction of undersampled ...

Non-Learning based Deep Parallel MRI Reconstruction (NLDpMRI)

Fast data acquisition in Magnetic Resonance Imaging (MRI) is vastly in d...

Self-supervised deep learning MRI reconstruction with Noisier2Noise

In recent years, there has been attention on leveraging the statistical ...

DSFormer: A Dual-domain Self-supervised Transformer for Accelerated Multi-contrast MRI Reconstruction

Multi-contrast MRI (MC-MRI) captures multiple complementary imaging moda...

Efficient Reflectance Capture with a Deep Gated Mixture-of-Experts

We present a novel framework to efficiently acquire near-planar anisotro...

1 Introduction

Magnetic resonance imaging (MRI) is an important noninvasive way to investigate the human body, for example to diagnose cancer. To obtain data, one performs scans in the frequency domain which is referred to as k-space in the biomedical-imaging literature [1]. Once this k-space data are acquired, they are then converted into the spatial domain, which provides images that physicians can use for diagnosis. When performing these scans, it is important that one makes the most out of each scan, as patients can remain still and hold their breadth for only a limited amount of time.

Fig. 1: Reconstruction using (top) the method of variable density sampling (VDS) [2] , (middle) our method, compared to (bottom) the ground-truth image. (left) sampling patterns, (middle) reconstruction result, (right) zoom-in of the region denoted with the rectangle in the middle. By learning to jointly sample and reconstruct, our method gives a PSNR of 33.52 dB, to be compared to 27.73 dB in [2].

To obtain high-quality MRI within a limited sampling budget, researchers have investigated compressed-sensing-based methods [1, 3]. This problem, often referred to as accelerated MRI, is traditionally tackled by exploiting the fact that MRI images, even when sampled fully, are sparse in nature once transformed into, for example, wavelets [1, 4], and by considering the statistics of the target data [5]. Recently, as in many other areas, deep-learning-based methods have shown promising results [6, 7, 8, 9]. However, these methods mainly focus on the reconstruction part of the problem and assume a fixed sampling pattern. In other words, they do not consider how data should be acquired and are therefore passive in nature.

As the acquisition pattern plays a critical role in the final reconstruction quality, researchers have also focused on optimizing the sampling pattern in k-space. For example, compressive-sensing-based techniques have shown promising outcomes with variable-density sampling [10]. The work in [11] goes beyond variable-density patterns by taking advantage of the statistics of the data. However, when learning or deriving these sampling patterns, existing methods consider the reconstruction process as fixed and, therefore, disregard the intertwined nature of the problem.

A possible reason why existing methods do not consider the two problems together is that it is not straightforward to incorporate the sampling design in an optimization framework. Sampling is a categorical variable and is not differentiable, which prevents the use of simple gradient-based methods. An exhaustive search is inaccessible due to the size of as the solution space. Greedy methods like

[11] are not viable either because they would require one to investigate too many cases, considering that both sampling and reconstruction are optimized jointly.

In this paper, we incorporate data acquisition (sampling) into our learning framework and to learn to jointly sample and reconstruct. By doing so, our method learns to actively obtain data for high-quality reconstruction and is able to go beyond what can be done by doing either separately. Specifically, we propose to simultaneously train two neural networks, each dedicated to reconstruction and to sampling. To incorporate the sampling process in our training setup, we take inspiration from [12]. We learn a sampling network that determines the next sampling position, in other words, we learn a progressive sampler. The training supervision for this network is self-generated through Monte Carlo tree search (MCTS) [13, 14], where the search is guided by the current sampling network and results in a sampling pattern that is better than what is possible with the sampling network alone. However, unlike [12], where a network is used in place of a rollout, we perform rollout as in the original MCTS and use the peak signal to noise ratio (PSNR) given by the reconstruction network linked with the performance of the reconstruction network. In case of the reconstruction network, we train it with the signal sampled through MCTS as input and the ground-truth fully sampled image as the desired output. At test time, the policy network progressively generates the sampling pattern by looking at the current reconstruction of the network. As shown in Fig. 1, this allows our network to outperform the state of the art. To the best of our knowledge, our work is the first successful attempt at learning the accelerated MRI pipeline as a whole.

The remainder of the paper is as follows: We first discuss related works in Section 2. We then formalize the problem in Section 3. In Section 4, we explain our method, and in Section 5 we provide implementation details. We report experimental results in Section 6 and conclude in Section 7.

2 Related Works

As shown in Fig. 2, for accelerated MRI, one samples k-space partially, but then reconstructs the images to higher resolution. Since the pioneering work of [1], a large body of works exists for accelerated MRI. Here, we first present the compressive sensing for accelerated MRI, then works related to learning optimal sampling patterns, and, finally, works that focus on the reconstruction phase of the pipeline.

2.1 Compressive Sensing in Accelerated MRI

In [1], compressed sensing [15, 16] is applied to reconstruct sub-Nyquist measurements. The premise behind the reconstruction process is that typical MRI images have sparse coefficients in some transform domain, such as wavelets [1], and can therefore be iteratively reconstructed with a sparsity constraint as prior [15]. By doing so, the authors achieve a faster acquisition time and an alias-free image reconstruction under random sampling. Specifically, in [1]

, random variable-density sampling is used to sample the low-frequency components more densely and improve the image quality.

Since its first introduction, several variable-density sampling methods have been proposed. In [17], a Poisson-disc distribution is exploited to reduce correlated effects caused by closely located samples while retaining a uniform distance between samples. In [18], continuous trajectories have been proposed. In [19], the authors argue that taking into account both shared and non-shared components when sparsifying is important when reconstructing.

Those sampling and reconstruction methods have all shown superior quality over [1]. However, they all suffer from the fact that the optimal choice of the variable density depends highly on the target dataset. Therefore, manual intervention per dataset and, sometimes, per image, is necessary if one is to obtain good reconstruction quality, making it less practical.

Fig. 2: Accelerated MRI. (a) Fully sampled MRI collects all k-space lines to obtain a target image. (b) Accelerated MRI acquires fewer k-space lines and then reconstructs downsampled acquisitions from an aliased image. Compressed-sensing (CS) reconstructions rely on the sparsity of the image domain.

2.2 Learned Sampling patterns for Accelerated MRI

Beyond handcrafted sampling patterns, learning-based methods have shown promising results. One of the very first method to do so successfully is [20], where the authors define finite non-overlapped cells with a certain width in k-space and then switch their locations according to reconstruction performances in a greedy fashion based on a sorted list from infinite-p norm. To evaluate the reconstruction, they rely on a dictionary-based reconstruction  [5]. Their method performs better than previous compressed-sensing methods but suffers from two drawbacks. One is that the optimal choice of p for the infinite-p norm depends highly on the dataset. The other is that there is no guarantee that the pretrained dictionary is applicable to the data in question.

Instead of an infinite-p norm and pretrained dictionaries, researchers have also considered the benefit of working directly with the k-space power spectrum [21, 22]. A standard assumption is that the k-space components that hold more power are more important than the ones with less power. It leads to a selection of the sampling pattern based on the k-space energy distribution from the reference. Again, these methods suffer from the fact that there is no guarantee that the power spectrum of a reference would be similar to testset, as the selection is performed off-line. Furthermore, as we show later, a simple sampling of the high-energy signals does not necessarily yield better reconstructions.

As another direction of research, the relationship between the forward model matrix and trusted sampling positions [10, 23] has been investigated. A performance guarantee based on basis components of the forward model has been derived. Still, these approaches have a limited consideration on signal energy distribution of each dataset.

Recently, methods based on statistical theory have gained interest. The authors of [11] use the PSNR or the structural similarity index (SSIM) [24]

, which are typical image-quality measures, and greedily search for the sampling pattern that maximizes either one. Their method is greedy and progressive. It looks at how the measure of interest evolves when a new sample position is added to the sampling pool. Statistical error bounds for the optimal estimation of sampling patterns are provided in the form of theorems. They require the reconstruction pipeline to be fixed throughout the greedy selection process; but there is no guarantee that the greedy selection would lead to an optimal solution.

2.3 Neural Networks for Accelerated MRI

Recently, as in many other areas, deep learning has become a popular approach for accelerated MRI reconstruction. Typically, convolutional neural networks (CNN) are trained in a fully supervised manner to regress to high-resolution reconstructions [6, 7, 25, 26]. In [7], multi-resolution CNN with residual learning enhances the quality of accelerated MRI by encapsulating physical models before input is fed. In [6] and [26], cascaded chains of CNN with data consistency constraints yield performance gain over dictionary-based reconstructions [5]. In [25], inherently complex MRI are split into magnitude and phase components. For each component, a distinct network is trained to remove distortions from acceleration. The author of [25] have shown promising results and achieved higher reconstruction quality over traditional methods. Beyond simple regression, generative adversarial networks have also been recently exploited to avoid over-fitting [27] .

While effective, existing methods still rely on fixed sampling patterns for data acquisition. As we illustrate in Section 6 through experiments, this limits the capabilities of these advanced reconstruction methods.

3 Problem Formulation

Before we detail our method, we first formalize the accelerated MRI problem. Throughout the formalization, for simplicity, we assume a finite, discrete, and complete signal model. We write the vectorized full-resolution signal in the image domain as

, where is the number of pixels in the image.

We denote the discrete Fourier transform by the matrix

. For accelerated MRI, not all frequencies are sampled. Denoting the sampling process by the Boolean matrix , where is the number of samples, the raw measurements satisfy


Given Eq. (1), we want to obtain an accurate estimate of by using the observation . Formally, if we denote the reconstruction process as , where is the vector of parameters that define the reconstruction process, and a quality measure, for example the PSNR, Euclidean distance, or SSIM [24], as , then the problem of learning a system for accelerated MRI can be formulated as the problem of finding and such that


where denotes expectation over . This formulation involves both and . It is therefore a problem that involves both sampling and reconstruction. What makes it difficult is that

is Boolean, which leads to a combinatorial optimization.

Existing works can also be understood in the formulation Eq. (2). In the traditional compressive-sensing setup [1], would be a fixed realization of a random-variable density. For the reconstruction part driven by , it would be a nonparametric process that involves the solution of a sparse optimization problem. Formally,


where is a sparsifying transform, for example the wavelet transform or the finite differences corresponding to total variation (TV). In the case of deep-learning-based methods [6, 7, 25], would still remain fixed and would be a deep network with now being the parameters of the network. Thus, one would only optimize for in Eq. (2). Conversely, in works related to finding the optimal sampling pattern [11, 20] represented by the optimal sampling matrix , the reconstruction process remains fixed and, therefore, one optimizes only for .

Unlike the existing methods, we optimize jointly and . This allows our method to outperform what can be done by training them separately.

4 Method

Fig. 3: Overall framework of our method. We train two deep neural networks. One learns to reconstruct a high-resolution MRI, and the other learns to estimate the policy for determining the position of the next sample. We progressively sample, with SampleNet, based on the reconstructed outcome of ReconNet using collected data.

We first provide an overview of the architecture and the training framework. We further detail how self-supervision signals are obtained through MCTS.

4.1 Overall Framework

In order to learn to jointly sample and reconstruct, as shown in Fig. 3, we learn two deep networks that are tied together. One network learns to reconstruct a high-resolution image given the sampled data. The other learns to estimate the position of the next sample, given the previous reconstruction result. We refer to the former as ReconNet and to the latter as SampleNet.

4.1.1 ReconNet

In more details, in case of ReconNet, as in other recent deep-learning-based methods for reconstruction [7, 8], we operate in the image domain instead of the k-space domain, starting from the backprojection .If we denote the reconstruction network as , then the reconstruction achieved by ReconNet is


A is a matrix of ones and zeros that represents sampling, the multiplication with

undoes the sampling process by padding the unobserved signal with zeros.

4.1.2 SampleNet

For SampleNet, we opt for a progressive-sampling strategy where we draw one sample at a time using all data acquired up to the instant we sample. In other words, the learning is such that SampleNet outputs a probabilistic policy for determining the next sampling point given the current reconstruction. SampleNet would be called a policy network in reinforcement-learning literature [12, 28]. At sampling time , if we denote SampleNet with parameter and the probabilistic policy as , then, using Eq. (4), we write that


We can then write a recursive formula for the sampling pattern at time as


where denotes concatenation in column direction between and , and is a Boolean vector with all values except for a single element that is at .

With Eqs. (4), (5), and (6), we can now rewrite the learning problem in Eq. (2) as an optimization problem that involves the training of two networks. We write that


The expectation is now also on , as we are performing progressive sampling, and the two networks should also be able to deal with various sampling times.

This formulation, however, cannot be solved with conventional gradient-based methods because Eq. (7) is non-differentiable. Instead, we propose to train the two networks in an alternative way through self-supervision.

4.2 Self-Supervised Learning

Fig. 4:

Our training self-supervised learning framework. At each sampling round, MCTS returns a policy distribution using both the reconstruction network (ReconNet) and the sampling network (SampleNet). A sample position is chosen from this distribution and sampled. We then proceed to the next sampling round. A single search tree is kept throughout the entire sampling process. Afterwards, we train the ReconNet to reconstruct high-quality signals using these sampling patterns and data, and SampleNet to emulate the final outcome of MCTS.

To train ReconNet and SampleNet, we draw inspiration from AlphaGo [12, 28]. We propose to train our two networks with their own separate objectives, which will eventually lead to satisfying the original objective. Conceptually, we train ReconNet so that it learns to improve on a given sampling pattern, and also learn SampleNet so that it learns to provide better sampling patterns given the reconstruction, which will then result in consistently better final reconstruction quality.

In more detail, as shown in Fig. 4, we rely on MCTS [13, 14] for the improvement of these networks through the optimization rounds. We explain in detail how we use MCTS in Section 4.3. For now, assume that MCTS is a black-box component that gives a sample policy that improves on , the policy from SampleNet with current network parameters and , where is the round of optimization. Formally, we write that


Then, with MCTS, we create sampling patterns recursively by performing


which leads to better reconstruction results


With , , , and , we train both ReconNet and SampleNet.

For ReconNet, we train to obtain high-quality reconstructions, given the sampling patterns from MCTS. In other words, we simply replace with in Eq. (7). By doing so, we remove the dependency of , leading to the per-round optimization


where is the starting point of the optimization. This can now be trained with a gradient-based solver such as ADAM [29]. The standard PSNR measure in Eq. (14) is used for .

We train SampleNet to emulate the outcome of MCTS. Thus, with Eq. (5), and denoting cross entropy as , we write that


where is the initial value of for the optimization. This objective can also be optimized through gradient descent.

In practice, to avoid overfitting, we optimize Eq. (11) and Eq. (12) for only steps per round. We also apply experience replay, by creating the training batch from randomly selected past examples. This further prevents overfitting and ensures smooth optimization [12, 28]. A summary of the process is provided in Algorithm 1, with subscripts for the optimization round and the reconstruction episode added for clarity.

1:: ground-truth data, : experience memory, : initial sampling pattern, : number of optimization rounds, : number of reconstruction episode per optimization round, : sampling budget, and : number of optimization steps per round
2:function SelfTrain(, , )
3:     for  to  do
4:         for  to  do
5:              // Initialize
7:               Initialize search tree
8:              // Build self-supervision through MCTS
9:              for  to  do
10:                   (from Eq. (8))
11:                   (from Eq. (9))
13:                                            (from Eq. (10))
15:              end for
16:         end for
17:         // Train
18:          Random examples from
19:          Optimize K times with (from Eq. (11))
20:          Optimize K times with (from Eq. (12))
21:     end for
22:     return ,
23:end function
Algorithm 1 Self-supervised learning with Monte Carlo tree search and experience replay.

4.3 Monte Carlo Tree Search for Sampling

Fig. 5: Our Monte Carlo tree search. Each node in the tree denotes a sample pattern, while a move down the tree involves sampling a new position. We traverse the tree until a leaf node has been reached, where we simulate sampling with the sampling network (SampleNet) until some sampling budget has been reached. We then use the actual reconstruction performance as the reward, which is then backed up to all parent nodes. When performing the backup, we save the average and the maximum to consider the best reconstruction within all child nodes as well as the average reconstruction quality.

The key insight that allows the learning strategy of Section 4.2 to work is that, during training, the sampling results with MCTS should always be better than what the two networks could provide without, thus providing guidance. This is possible because MCTS simulates plays according to a search policy (in our case, sampling using SampleNet and ReconNet) and finds out how good a random move is by looking at the ultimate outcomes of the simulations—in our case, the reconstruction results. With this look-ahead behavior, MCTS arrives at a decision that is better than that of the original search policy.

As shown in Fig. 5, we perform MCTS in the traditional four stages: selection, expansion, simulation, and backup. The selection follows the upper-confidence-bound strategy of typical MCTS setups, and the expansion is guided by SampleNet and ReconNet. We perform the simulation until we reach the sampling budget , so that the final reconstruction quality becomes the reward that is backed up to update the node statistic in the search tree.

In more details, as we are focusing on retrieving the best reconstruction, we use implicit minimax backups [30] with SampleNet and ReconNet together guiding the selection and exploration, as in [12, 28]. Specifically, for each node in the search tree identified with the sampling pattern , we store , the average of the rewards of all child nodes, , the maximum reward of all child nodes, and , the number of visits to the node. Then, if we denote the policy for a certain movement as , where is from Eq. (5), then we define the upper confidence bound for movement at state as


where and are hyper-parameters that control whether the search favors the average or the maximum and control the degree of exploration performed by MCTS, and is the Dirichlet noise to encourage exploration as in [12], while is the hyper-parameter that controls the degree of exploration. Note that the Dirichlet noise is applied whenever is used. Algorithm 2 is a summary of the entire MCTS process. We detail the hyper-parameter settings in Section 5.3.

In [12], a deep network that estimates the reward—referred to as the value network—was used instead of simulation due to the long nature of Go games. However, we found that the use of a value network gives worse reconstructions as shown in the experiments of Section 6.

1: : sampling pattern, : ground-truth data, : parameters of the reconstruction network, : parameters of the sampling network, : average reward of all child nodes, : maximum reward of all child nodes, : number of visits to a node, : sampling budget, and : number of MCTS rounds
2:function Search(, , , , , , )
3:     // Depending on the availability of sampling budget
4:     if  then
5:         // Select next sample
7:         if  then
8:              // New node, simulate
12:         else
13:              // Existing node, traverse down tree
14:               (from Eq. (13))
15:         end if
16:         // Recursively evaluate next sample
17:         Search(, , , , , , )
18:     else
20:         // Use actual reconstruction result
23:     end if
24:     // Backup
28:     return
29:end function
30:function MCTS(, , , )
31:     // Search tree multiple times
32:     for  to  do
33:         Search(, , , , , , )
34:     end for
36:     // Return policy from search outcomes
38:     return
39:end function
Algorithm 2 Monte Carlo tree search with implicit minimax backup.

5 Implementation Details

We now discuss the details required to implement the proposed framework into an actual system for accelerated MRI. An implementation of the method is available online.111We shall provide a link to the repository once the paper gets accepted.

5.1 Sampling with 1D Readout

Up until now, for the sake of ease in explanation, we motivated our method with the case where a single position was sampled in the k-space domain. In practice, MRI with Cartesian trajectory [1] is typically performed by sampling one line of frequency in the k-space, through a process called readout. Therefore, in our implementation, in Eq. (5) is in fact a vector in , and is a matrix in , whose rows each correspond to a sampling vector that samples elements in the row or column of .

5.2 Network Architectures

Fig. 6: Architecture of the two networks. Note that the sampling network (SampleNet) takes as input the output of the reconstruction network (ReconNet). Both of our networks are based on residual blocks [2].

The overall architecture of the two coupled networks is shown in Fig. 6. As discussed in Section 4.1

, ReconNet takes images in the spatial domain as input, and SampleNet takes the output of ReconNet as input. Here, the input image is in tensor form to take advantage of existing deep-learning libraries such as TensorFlow 


5.2.1 ReconNet

In order to recover high-quality MRI, we use a fully convolutional network, as in other recent deep-learning-based reconstructors [6, 7, 8, 25], but with a simpler architecture. In our early experiments (not shown), we also employed a more complex U-Net [32] type of architecture as in [7], but achieved similar performance with significantly higher computation cost. We therefore use the simpler architecture.

We use eight residual blocks [2]. Before the residual blocks, we employ a () linear convolutional layer to allow the network to easily adapt to the input data range, if needed. Each block contains a (

) convolutional layer followed by batch normalization and leaky-ReLU 

[33] activation. All convolutions are zero-padded to have the same output size as the input, and have 64 output channels. After the residual blocks, a () linear convolutional layer is used to convert the output back to the same number of channels as input.

5.2.2 SampleNet

To obtain a probabilistic policy on where to sample next, we use an architecture that is similar to the policy network in [12] for our SampleNet. As in ReconNet, each convolutional block contains a zero-padded () convolutional layer followed by batch normalization and leaky-ReLU activation. After each convolutional block, we apply (

) max-pooling, as well as double the number of channels until it becomes 256. The first convolutional block has 64 channels. We repeat this structure until the output of the network becomes (

). We then flatten and apply a fully connected layer with neurons, again followed by batch normalization and leaky-ReLU activation. Finally, another fully connected layer with softmax activation is used to turn the output into a probability. As noted in Section 5.1, each element in this probability represents a line.

5.3 Training Setup

We use ADAM [29] to optimize the networks in Algorithm 1, with the learning rate of . We apply weight decay constraining the -norm of the weight parameters of the network, where the hyper-parameter for the decay is empirically set to

. To prevent the networks from overfitting, we train at max 1000 iterations per round, and at max 3 epochs for the samples in current experience memory. In addition, we only keep examples from the ten most recent rounds of optimization.

For MCTS, the number of repetitions of simulation is 10. The minimax-backup parameter is set to and is set to . The parameter for exploration extent is .

To reduce the time taken to generate supervision, we create these examples in parallel with 16 threads. We train our model for rounds. It takes around four days to train our model on an Intel i7-7820X (3.60GHz) CPU and an NVidia Titan X (Pascal) GPU, or an Intel Xeon E5-2690 v3 (2.60GHz) CPU and an NVidia GeForce GTX 1080Ti GPU.

6 Results

6.1 Datasets

We evaluate our method on two datasets. The first one is composed of cardiac images, and the second one composed of knee images.

The cardiac dataset is from the cardiac atlas project [34]. From this dataset, we discard images that are corrupted with severe noise. Since the dataset is provided in the spatial domain, we bring it into the Fourier domain numerically. From the original image size of , we retain the central () crop to avoid the artificial image boundaries that exist due to the characteristics of the capture devices. We split the training and test sets based on patients, where 10 patients are assigned to train, and another 10 patients assigned to test. This results in a total of 4999 training images and 6963 test images.

The knee dataset is from an open data platform222 The dataset is measured as a 3D volume, and we use the central 2D slice in our experiments. The size of the original images is (), which we again crop to their central () region to accelerate simulations. The original data is collected from 8 coils, therefore producing an 8-channel image in k-space, which we compress down to a single channel to keep the framework identical for both datasets. The knee dataset contains fewer images than the cardiac dataset. Among the 20 patients, we use 17 patients for training and the remaining 3 for testing. This amounts to 2550 training images and 450 for testing. We again keep a small portion of the training data for validation.

For the cardiac dataset, we use our dual-channel approach to train our networks while, for the knee dataset, we use a square-root of sum-of-squares operation. We empirically found that, for the cardiac dataset composed of real-valued images, the dual-channel representation helps: for the knee dataset consists of complex-valued images, it does not.

6.2 Baselines and Evaluation Metric

6.2.1 Baselines

To validate the effectiveness of our method, both SampleNet and ReconNet, as well as the joint training framework, we compare our method against three baseline methods:

  • VDS+TV [35]: Variable density sampling (VDS) [1] with reconstruction using TV minimization [35]. This is a standard baseline for accelerated MRI.

  • LCS [11]+TV [35]: Learning-based compressive sensing (LCS) from [11], with their total-variation reconstruction [35]. We use the publicly available source code with tuned parameters.

  • VDS+FBPConv [7]: A state-of-the-art reconstruction pipeline that uses deep CNNs. It uses variable density sampling for the sampling pattern.

We also include variants using our reconstruction network instead of the original reconstruction strategies to demonstrate the effectiveness of joint training.

  • LPF + Our Recon

    : Lowpass filtering (LPF) with our reconstruction network. A simple baseline to demonstrate the performance of ReconNet on a super-resolution setup.

  • VDS + Our Recon: VDS with our reconstruction network. We use this baseline to demonstrate the importance of a sampling pattern when using a deep reconstructor.

  • LCS [11] + Our Recon: We use the sampling pattern from [11] and train ReconNet. We use this baseline to show that the pattern learned with a different reconstruction method is not the optimal pattern for deep reconstructors.

All methods were trained with the same training configurations. The downsampling factor equals 4 for all subsequent experiments.

6.2.2 Evaluation Protocol

To evaluate the effectiveness of each method, we observe the reconstruction quality in a scenario where the sample budget is one-fourth of the full-resolution signal. In other words, we evaluate the case where the reconstruction needs to upsample by a factor of four.

As evaluation metric, we use the standard PSNR measure


where and are the reconstructed and reference signals, respectively.

6.3 Quantitative Results

Sampling VDS LCS LPF Ours
Dataset Recon
zf-IFT 25.42 28.99 30.19 27.54
TV 32.33 33.72 31.49 -
FBPConv 30.21 33.77 23.79 -
Cardiac Ours 27.91 32.56 29.00 34.22
zf-IFT 22.51 26.96 27.41 26.79
TV 26.17 28.93 27.9 -
FBPConv 26.55 28.95 28.04 -
Knee Ours 25.04 28.43 28.11 29.1
TABLE I: Quantitative results in terms of PSNR for the cardiac and knee datasets. PSNR of both zero-filled inverse Fourier transform on the sampled data and that of using the reconstruction methods are shown. Best results are shown in bold. We show both the results of the original method and using our reconstruction network (ReconNet) trained with respective sampling patterns. Our method performs best for both datasets. On the other hand, our method performs well with a single hyper-parameter setup.

We provide the quantitative results in Table I. In addition to the reconstruction performance, we report the results of simple reconstructions through an inverse Fourier transform of zero-filled measurements (zf-IFT) to single out the effect of the sampling pattern. Our method outperforms all other compared methods.

As shown by the comparisons with LCS and FBPConv, learning only the sampling pattern or, simply, the reconstruction network alone, does not provide optimal performances. Furthermore, by comparing with the variants that use the sampling patterns discovered by LCS, we see that the simultaneous learning of both components favorably impacts the performance. Note that our method does not require per-dataset parameter tuning and is able to outperform the state of the art with a single hyper-parameter setup. By contrast, methods relying on TV [35] require per-dataset tuning.

One interesting thing here is that LPF, which is sampling just the low-frequency part, gives the highest reconstruction quality. However, training with this sampling pattern results in over-fitting, as demonstrated by a decrease in performance when the reconstruction network is added. This is expected as this sampling pattern does not give any information about the high-frequency parts of the signal. In a nutshell, the reconstruction network learns to interpolate the components that were not observed, which there is nothing to interpolate for this sampling pattern. However, this is not the case of the other sampling patterns, as demonstrated by the fact that ReconNet always improves performance.

Besides outperforming the other baselines, our method is also easily scalable as it is based on a stochastic gradient-based optimization.

6.4 Qualitative Results

Fig. 7: Qualitative examples for the cardiac dataset. (First row) reconstruction results; (second row) sampling patterns; (third row) color-coded residual between reconstruction and ground-truth, where red denotes positive and blue denotes negative errors. See text for details.
Fig. 8: Qualitative examples for the knee dataset. (First row) reconstruction results; (second row) sampling patterns; (third row) color-coded residual between reconstruction and ground-truth, where red denotes positive and blue denotes negative errors.

Qualitative examples for the cardiac dataset are shown in Fig. 7, and for the knee dataset in Fig. 8. In all examples, the proposed method gives the highest reconstruction quality.

In Fig. 7, as shown from the reconstruction examples of TV (VDS+TV) and LCS (LCS+TV), cartoon-like textures are apparent. However, these artifacts are well removed by our method. This is most apparent in Fig. 8 bottom. Similarly, in TV and LCS of Fig. 8 (top), the textures of the knee is blurry and appears as flat textures. In FBPConv (VDS+FBPConv) and V-Net (Ours+V-Net) , folded artifacts still remain. Our method is able to clearly reconstruct the inner structures.

6.5 t-SNE Analysis

Fig. 9: t-SNE [36] embedding of the reconstructions for the testset of the cardiac dataset.

For biomedical applications, it is critical that the reconstructed data indeed represents the original signal. We therefore perform a t-SNE analysis [36] to show that the reconstructed signals indeed overlap the ground truth, distribution-wise. We use the cardiac dataset for this analysis. It provides the largest corpus of test data for a meaningful t-SNE analysis.

We show in Fig. 9 the t-SNE map of the test set of the cardiac

dataset. Each point in the t-SNE map corresponds to an image reconstructed from a different configuration. To avoid clutter, we only show the results of one-fifth images, chosen randomly from a uniform distribution. As shown in Fig. 

9, the distribution and structure of our reconstructions are very close to the ground truth.

6.6 Sampling Evolution

Fig. 10: Evolution of the sampling patterns across optimization. The vertical axis corresponds to sampling positions. White pixels denote that the position is sampled, whereas black ones denote it is not. The sampling pattern starts from random noise at the beginning, and converges to a main sampling pattern accompanied with minor changes depending on the image.

In Fig. 10, we show how the final sampling pattern evolves as learning proceeds. At first, it is essentially random, eventually converging to a main sampling pattern that is shared for all training images plus minor components that are image-dependent. The fact that the main component is dominant is not surprising, given that the MRI data have a similar appearance.

6.7 Ablation Study

We conduct ablation studies to motivate our design choices. We first show that approximating the simulation phase with a deep network does not provide the best performance. We then demonstrate the importance of implicit minimax backup and, finally, the effectiveness of linking SampleNet with ReconNet.

6.7.1 Importance of MCTS Rollout

Dataset With V-Net [12] Our Method
cardiac 31.48 34.22
knee 26.49 29.10
TABLE II: Quantitative results in terms of PSNR, with (V-Net) and without (our method) the deep network approximation for the reward in [12]. Best results are shown in bold.

In [12] it was suggested that using a value network, referred to as the V-Net, that approximates the simulation outcomes can speed-up the learning and lead to better results. We have also tried that strategy. We provide in Table II the results compared to using a full simulation, also dubbed roll-out. As the deployment of V-net gives significantly worse results, we have not retained this strategy in our method.

6.7.2 Effectiveness of Implicit Minimax Backup

minimax backup cardiac
TABLE III: Quantitative results in terms of PSNR, with different MCTS values. Best results are shown in bold.

In Table III, we run our method with the backup parameter . In principle, lies between 0 and 1. We observed that our method performs best.

6.7.3 Using ReconNet Output as SampleNet Input

Dataset Without ReconNet Our Method
cardiac 33.53 34.22
TABLE IV: Quantitative results in terms of PSNR, with ReconNet (our method) and without ReconNet. Best results are shown in bold.

In Table IV, we compare two results. On one hand, we let the output of ReconNet be the input the SampleNet. On the other hand, we bypass the ReconNet process when training SampleNet. We use the cardiac dataset for this experiment. We see a faster333The time to convergence was four days. convergence rate when ReconNet is used, as well as better final results. Thus, we conclude that linking the two networks is helpful when learning.

7 Conclusion

We have proposed a self-supervised deep-learning framework that learns to jointly sample and reconstruct accelerated MRI. Our method is composed of two networks, namely, SampleNet and ReconNet. They learn to sample and reconstruct progressively. The two networks are trained with supervision signals that are generated through Monte Carlo tree search. This method provides on-the-fly sampling patterns that improve the quality of reconstruction. ReconNet also uses these sampling patterns to learn to reconstruct. By learning to do jointly sampling and reconstruction, our framework is able to outperform the state of the art.

To the best of our knowledge, our framework is the first to incorporate data acquisition in the training process of deep networks in the context of MRI. We believe this is a promising direction that goes beyond the current deep-learning-based methods. In the future, we hope to implement our method in an actual MRI device, for high-speed data acquisition.


This work was supported in part by the European Research Council under Grant 692726 (H2020-ERC Project GlobalBioIm), by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant “Deep Visual Geometry Machines” (RGPIN-2018-03788), and by systems supplied by Compute Canada.


  • [1] M. Lustig, D. Donoho, and J. M. Pauly, “Sparse MRI: The application of compressed sensing for rapid MR imaging,” Magnetic Resonance in Medicine, vol. 58, no. 6, pp. 1182–1195, 2007.
  • [2] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in

    Conference on Computer Vision and Pattern Recognition

    , 2016, pp. 770–778.
  • [3] J. A. Fessler, “Model-based image reconstruction for MRI,” IEEE Signal Processing Magazine, vol. 27, no. 4, pp. 81–89, 2010.
  • [4] M. Guerquin-Kern, M. Haberlin, K. P. Pruessmann, and M. Unser, “A fast wavelet-based reconstruction method for magnetic resonance imaging,” IEEE Transactions on Medical Imaging, vol. 30, no. 9, pp. 1649–1660, 2011.
  • [5] S. Ravishankar and Y. Bresler, “MR image reconstruction from highly undersampled k-space data by dictionary learning,” IEEE Transactions on Medical Imaging, vol. 30, no. 5, pp. 1028–1041, 2011.
  • [6] J. Schlemper, J. Caballero, J. V. Hajnal, A. N. Price, and D. Rueckert, “A deep cascade of convolutional neural networks for dynamic MR image reconstruction,” IEEE Transactions on Medical Imaging, vol. 37, no. 2, pp. 491–503, 2018.
  • [7] K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep convolutional neural network for inverse problems in imaging,” IEEE Transactions on Image Processing, vol. 26, no. 9, pp. 4509–4522, 2017.
  • [8] D. Lee, J. Yoo, and J. C. Ye, “Deep residual learning for compressed sensing MRI,” in International Symposium on Biomedical Imaging.   IEEE, 2017, pp. 15–18.
  • [9] H. K. Aggarwal, M. P. Mani, and M. Jacob, “MoDL: Model based deep learning architecture for inverse problems,” IEEE Transactions on Medical Imaging, pp. 1–1, 2018.
  • [10] K. Lee, Y. Li, K. H. Jin, and J. C. Ye, “Unified theory for recovery of sparse signals in a general transform domain,” IEEE Transactions on Information Theory, vol. 64, no. 8, pp. 5457–5477, 2018.
  • [11] B. Gözcü, R. K. Mahabadi, Y.-H. Li, E. Ilıcak, T. Cukur, J. Scarlett, and V. Cevher, “Learning-based compressive MRI,” IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1394–1406, 2018.
  • [12] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel, and D. Hassabis, “Mastering the game of Go without human knowledge,” Nature, vol. 550, no. 7676, pp. 354–359, 2017.
  • [13] L. Kocsis and C. Szepesvári, “Bandit based Monte-Carlo planning,” in

    European Conference on Machine Learning

    .   Springer, 2006, pp. 282–293.
  • [14] R. Coulom, “Efficient selectivity and backup operators in Monte-Carlo tree search,” in International Conference on Computers and Games.   Springer, 2006, pp. 72–83.
  • [15] E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, 2006.
  • [16] D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
  • [17] S. Vasanawala, M. Murphy, M. T. Alley, P. Lai, K. Keutzer, J. M. Pauly, and M. Lustig, “Practical parallel imaging compressed sensing MRI: Summary of two years of experience in accelerating body MRI of pediatric patients,” in International Symposium on Biomedical Imaging.   IEEE, 2011, pp. 1039–1043.
  • [18] N. Chauffert, P. Ciuciu, J. Kahn, and P. Weiss, “Variable density sampling with continuous trajectories,” SIAM Journal on Imaging Sciences, vol. 7, no. 4, pp. 1962–1992, 2014.
  • [19] B. Adcock, A. C. Hansen, and B. Roman, “The quest for optimal sampling: Computationally efficient, structure-exploiting measurements for compressed sensing,” in Compressed Sensing and its Applications.   Springer, 2015, pp. 143–167.
  • [20] S. Ravishankar and Y. Bresler, “Adaptive sampling design for compressed sensing MRI,” in Annual International Conference of the IEEE, Engineering in Medicine and Biology Society, EMBC, 2011, pp. 3751–3755.
  • [21] F. Knoll, C. Clason, C. Diwoky, and R. Stollberger, “Adapted random sampling patterns for accelerated MRI,” Magnetic Resonance Materials in Physics, Biology and Medicine, vol. 24, no. 1, pp. 43–50, 2011.
  • [22] J. Vellagoundar and R. R. Machireddy, “A robust adaptive sampling method for faster acquisition of MR images,” Magnetic Resonance Imaging, vol. 33, no. 5, pp. 635–643, 2015.
  • [23] F. Krahmer and R. Ward, “Stable and robust sampling strategies for compressive imaging,” IEEE Transactions on Image Processing, vol. 23, no. 2, pp. 612–622, 2014.
  • [24] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
  • [25] D. Lee, J. Yoo, S. Tak, and J. C. Ye, “Deep residual learning for accelerated MRI using magnitude and phase networks,” IEEE Transactions on Biomedical Engineering, vol. 65, no. 9, pp. 1985–1995, Sept 2018.
  • [26] K. Hammernik, T. Klatzer, E. Kobler, M. P. Recht, D. K. Sodickson, T. Pock, and F. Knoll, “Learning a variational network for reconstruction of accelerated MRI data,” Magnetic Resonance in Medicine, vol. 79, no. 6, pp. 3055–3071.
  • [27] G. Yang, S. Yu, H. Dong, G. Slabaugh, P. L. Dragotti, X. Ye, F. Liu, S. Arridge, J. Keegan, Y. Guo, and F. David, “Dagan: Deep de-aliasing generative adversarial networks for fast compressed sensing MRI reconstruction,” IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1310–1321, 2018.
  • [28] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016.
  • [29] D. Kingma and J. Ba, “Adam: A method for stochastic optimisation,” in International Conference on Learning Representations, vol. 5, 2015.
  • [30]

    M. Lanctot, M. H. M. Winands, T. Pepels, and N. R. Sturtevant, “Monte Carlo tree search with heuristic evaluations using implicit minimax backups,”

    IEEE Conference on Computational Intelligence and Games, pp. 1–8, 2014.
  • [31] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng, “Tensorflow: A system for large-scale machine learning,” in USENIX Conference on Operating Systems Design and Implementation, vol. 16, 2016, pp. 265–283.
  • [32] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Conference on Medical Image Computing and Computer Assisted Intervention.   Springer, 2015, pp. 234–241.
  • [33] A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in International Conference on Machine Learning, vol. 30, no. 1, 2013, p. 3.
  • [34] P. Radau, Y. Lu, K. Connelly, G. Paul, A. Dick, and G. Wright, “Evaluation framework for algorithms segmenting short axis cardiac MRI,” The MIDAS Journal-Cardiac MR Left Ventricle Segmentation Challenge, vol. 49, 2009.
  • [35] S. Becker, J. Bobin, and E. J. Candès, “NESTA: A fast and accurate first-order method for sparse recovery,” SIAM Journal on Imaging Sciences, vol. 4, no. 1, pp. 1–39, 2011.
  • [36] L. v. d. Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of machine learning research, vol. 9, pp. 2579–2605, 2008.