1 Introduction
Magnetic resonance imaging (MRI) is an important noninvasive way to investigate the human body, for example, to diagnose cancer. To obtain data, one performs scans in the frequency domain, which is referred to as k-space in the biomedical-imaging literature [1]. Once the k-space data are acquired, they are converted into the spatial domain, which provides images that physicians can use for diagnosis. When performing these scans, it is important to make the most out of each scan, as patients can remain still and hold their breath for only a limited amount of time.
To obtain high-quality MRI within a limited sampling budget, researchers have investigated compressed-sensing-based methods [1, 3]. This problem, often referred to as accelerated MRI, is traditionally tackled by exploiting the fact that MRI images, even when sampled fully, are sparse in nature once transformed into, for example, wavelets [1, 4], and by considering the statistics of the target data [5]. Recently, as in many other areas, deep-learning-based methods have shown promising results [6, 7, 8, 9]. However, these methods mainly focus on the reconstruction part of the problem and assume a fixed sampling pattern. In other words, they do not consider how data should be acquired and are therefore passive in nature.
As the acquisition pattern plays a critical role in the final reconstruction quality, researchers have also focused on optimizing the sampling pattern in k-space. For example, compressive-sensing-based techniques have shown promising outcomes with variable-density sampling [10]. The work in [11] goes beyond variable-density patterns by taking advantage of the statistics of the data. However, when learning or deriving these sampling patterns, existing methods consider the reconstruction process as fixed and, therefore, disregard the intertwined nature of the problem.
A possible reason why existing methods do not consider the two problems together is that it is not straightforward to incorporate the sampling design in an optimization framework. Sampling is a categorical variable and is not differentiable, which prevents the use of simple gradient-based methods. An exhaustive search is inaccessible due to the size of the solution space. Greedy methods like [11] are not viable either, because they would require one to investigate too many cases when both sampling and reconstruction are optimized jointly.

In this paper, we incorporate data acquisition (sampling) into our learning framework and learn to jointly sample and reconstruct. By doing so, our method learns to actively obtain data for high-quality reconstruction and is able to go beyond what can be done by doing either separately. Specifically, we propose to simultaneously train two neural networks, dedicated to reconstruction and to sampling, respectively. To incorporate the sampling process in our training setup, we take inspiration from [12]. We learn a sampling network that determines the next sampling position; in other words, we learn a progressive sampler. The training supervision for this network is self-generated through Monte Carlo tree search (MCTS) [13, 14], where the search is guided by the current sampling network and results in a sampling pattern that is better than what is possible with the sampling network alone. However, unlike [12], where a network is used in place of a rollout, we perform rollouts as in the original MCTS and use the peak signal-to-noise ratio (PSNR) given by the reconstruction network as the reward, which links the search to the performance of the reconstruction network. As for the reconstruction network, we train it with the signal sampled through MCTS as input and the ground-truth fully sampled image as the desired output. At test time, the policy network progressively generates the sampling pattern by looking at the current output of the reconstruction network. As shown in Fig. 1, this allows our network to outperform the state of the art. To the best of our knowledge, our work is the first successful attempt at learning the accelerated MRI pipeline as a whole.
2 Related Work
As shown in Fig. 2, for accelerated MRI, one samples k-space partially, but then reconstructs the images to higher resolution. Since the pioneering work of [1], a large body of work exists for accelerated MRI. Here, we first present compressive sensing for accelerated MRI, then works related to learning optimal sampling patterns, and, finally, works that focus on the reconstruction phase of the pipeline.
2.1 Compressive Sensing in Accelerated MRI
In [1], compressed sensing [15, 16] is applied to reconstruct sub-Nyquist measurements. The premise behind the reconstruction process is that typical MRI images have sparse coefficients in some transform domain, such as wavelets [1], and can therefore be iteratively reconstructed with a sparsity constraint as prior [15]. By doing so, the authors achieve a faster acquisition time and an alias-free image reconstruction under random sampling. Specifically, in [1], random variable-density sampling is used to sample the low-frequency components more densely and improve the image quality.
Since its first introduction, several variable-density sampling methods have been proposed. In [17], a Poisson-disc distribution is exploited to reduce the correlated effects caused by closely located samples while retaining a uniform distance between samples. In [18], continuous trajectories have been proposed. In [19], the authors argue that taking into account both shared and non-shared components when sparsifying is important for reconstruction.
These sampling and reconstruction methods have all shown superior quality over [1]. However, they all suffer from the fact that the optimal choice of the variable density depends highly on the target dataset. Therefore, manual intervention per dataset and, sometimes, per image is necessary if one is to obtain good reconstruction quality, which makes these methods less practical.
2.2 Learned Sampling Patterns for Accelerated MRI
Beyond handcrafted sampling patterns, learning-based methods have shown promising results. One of the very first methods to do so successfully is [20], where the authors define finite non-overlapping cells of a certain width in k-space and then switch their locations according to the reconstruction performance, in a greedy fashion based on a list sorted by the infinite-p norm. To evaluate the reconstruction, they rely on a dictionary-based reconstruction [5]. Their method performs better than previous compressed-sensing methods but suffers from two drawbacks. One is that the optimal choice of p for the infinite-p norm depends highly on the dataset. The other is that there is no guarantee that the pretrained dictionary is applicable to the data in question.
Instead of an infinite-p norm and pretrained dictionaries, researchers have also considered the benefit of working directly with the k-space power spectrum [21, 22]. A standard assumption is that the k-space components that hold more power are more important than the ones with less power. This leads to a selection of the sampling pattern based on the k-space energy distribution of a reference. Again, these methods suffer from the fact that there is no guarantee that the power spectrum of a reference is similar to that of the test set, as the selection is performed offline. Furthermore, as we show later, simply sampling the high-energy signals does not necessarily yield better reconstructions.
As another direction of research, the relationship between the forward-model matrix and trusted sampling positions [10, 23] has been investigated, and a performance guarantee based on the basis components of the forward model has been derived. Still, these approaches give only limited consideration to the signal-energy distribution of each dataset.
Recently, methods based on statistical theory have gained interest. The authors of [11] use the PSNR or the structural similarity index (SSIM) [24], which are typical image-quality measures, and greedily search for the sampling pattern that maximizes either one. Their method is greedy and progressive: it looks at how the measure of interest evolves when a new sample position is added to the sampling pool. Statistical error bounds for the optimal estimation of sampling patterns are provided in the form of theorems. These methods require the reconstruction pipeline to be fixed throughout the greedy selection process, and there is no guarantee that the greedy selection leads to an optimal solution.
2.3 Neural Networks for Accelerated MRI
Recently, as in many other areas, deep learning has become a popular approach for accelerated MRI reconstruction. Typically, convolutional neural networks (CNN) are trained in a fully supervised manner to regress to high-resolution reconstructions [6, 7, 25, 26]. In [7], a multi-resolution CNN with residual learning enhances the quality of accelerated MRI by encapsulating physical models before the input is fed. In [6] and [26], cascaded chains of CNNs with data-consistency constraints yield a performance gain over dictionary-based reconstructions [5]. In [25], the inherently complex MRI signals are split into magnitude and phase components, and a distinct network is trained for each component to remove the distortions caused by acceleration. The authors of [25] have shown promising results and achieved higher reconstruction quality than traditional methods. Beyond simple regression, generative adversarial networks have also been exploited recently to avoid overfitting [27].
While effective, existing methods still rely on fixed sampling patterns for data acquisition. As we illustrate in Section 6 through experiments, this limits the capabilities of these advanced reconstruction methods.
3 Problem Formulation
Before we detail our method, we first formalize the accelerated MRI problem. Throughout the formalization, for simplicity, we assume a finite, discrete, and complete signal model. We write the vectorized full-resolution signal in the image domain as x ∈ C^N, where N is the number of pixels in the image. We denote the discrete Fourier transform by the matrix F ∈ C^(N×N). For accelerated MRI, not all frequencies are sampled. Denoting the sampling process by the Boolean matrix P ∈ {0, 1}^(M×N), where M is the number of samples, the raw measurements y ∈ C^M satisfy

y = P F x. (1)
Given Eq. (1), we want to obtain an accurate estimate of x by using the observation y. Formally, if we denote the reconstruction process as f_θ, where θ is the vector of parameters that define the reconstruction process, and a quality measure, for example the PSNR, Euclidean distance, or SSIM [24], as q, then the problem of learning a system for accelerated MRI can be formulated as the problem of finding P and θ such that

{P*, θ*} = arg max_{P, θ} E_x [ q( f_θ(P F x), x ) ], (2)

where E_x denotes expectation over x. This formulation involves both P and θ. It is therefore a problem that involves both sampling and reconstruction. What makes it difficult is that P is Boolean, which leads to a combinatorial optimization.
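A toy sketch may help make the forward model concrete. The code below builds a unitary DFT matrix and a Boolean sampling matrix for a small 1-D signal; all names and sizes are illustrative and do not come from the actual implementation.

```python
import numpy as np

# Minimal sketch of the accelerated-MRI forward model y = P F x,
# assuming a 1-D signal for simplicity; names and sizes are illustrative.
N = 8                                  # number of "pixels"
rng = np.random.default_rng(0)
x = rng.standard_normal(N)             # ground-truth signal (image domain)

F = np.fft.fft(np.eye(N), axis=0) / np.sqrt(N)   # unitary DFT matrix

mask = np.zeros(N, dtype=bool)
mask[[0, 1, 4]] = True                 # sample M = 3 of the N frequencies
P = np.eye(N)[mask]                    # Boolean sampling matrix, shape (M, N)

y = P @ F @ x                          # raw k-space measurements, shape (M,)
```

Multiplying by P simply keeps the rows of the full spectrum F x that correspond to sampled frequencies.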
Existing works can also be understood in the formulation of Eq. (2). In the traditional compressive-sensing setup [1], P would be a fixed realization of a random variable density. The reconstruction driven by θ would be a nonparametric process that involves the solution of a sparse optimization problem. Formally,

f(y) = arg min_x ||Ψ x||_1 subject to y = P F x, (3)

where Ψ is a sparsifying transform, for example the wavelet transform or the finite differences corresponding to total variation (TV). In the case of deep-learning-based methods [6, 7, 25], P would still remain fixed and f_θ would be a deep network, with θ now being the parameters of the network. Thus, one would only optimize for θ in Eq. (2). Conversely, in works related to finding the optimal sampling pattern [11, 20], represented by the optimal sampling matrix P*, the reconstruction process remains fixed and, therefore, one optimizes only for P.
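As an illustration of such a sparse optimization, the following toy sketch runs ISTA (iterative soft-thresholding) on a small real-valued compressed-sensing problem with an identity sparsifying transform; the measurement operator, sizes, and constants are illustrative assumptions, not the setup of any cited method.

```python
import numpy as np

# Toy ISTA sketch in the spirit of sparsity-constrained reconstruction:
# a gradient step on the data term followed by soft-thresholding.
rng = np.random.default_rng(0)
N, M = 32, 16
x_true = np.zeros(N)
x_true[[3, 10, 20]] = [1.0, -0.5, 0.8]          # sparse ground truth
A = rng.standard_normal((M, N)) / np.sqrt(M)    # illustrative measurement operator
y = A @ x_true

x = np.zeros(N)
step, lam = 0.1, 0.01                           # step size below 1/L; small sparsity weight
for _ in range(500):
    x = x + step * A.T @ (y - A @ x)            # gradient step on the data term
    x = np.sign(x) * np.maximum(np.abs(x) - step * lam, 0.0)  # soft-threshold
```

In practice the identity transform would be replaced by a wavelet or finite-difference operator, and the operator A by the subsampled Fourier transform P F.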
Unlike the existing methods, we optimize P and θ jointly. This allows our method to outperform what can be done by training them separately.
4 Method
We first provide an overview of the architecture and of the training framework. We then detail how the self-supervision signals are obtained through MCTS.
4.1 Overall Framework
In order to learn to jointly sample and reconstruct, as shown in Fig. 3, we learn two deep networks that are tied together. One network learns to reconstruct a highresolution image given the sampled data. The other learns to estimate the position of the next sample, given the previous reconstruction result. We refer to the former as ReconNet and to the latter as SampleNet.
4.1.1 ReconNet
In more detail, for ReconNet, as in other recent deep-learning-based methods for reconstruction [7, 8], we operate in the image domain instead of the k-space domain, starting from the back-projection F^H P^T y. If we denote the reconstruction network as f_θ, then the reconstruction achieved by ReconNet is

x̂ = f_θ( F^H P^T y ). (4)

As P is a matrix of ones and zeros that represents sampling, the multiplication with P^T undoes the sampling process by padding the unobserved signal with zeros.
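The zero-filled back-projection can be sketched as follows in a toy 1-D setup: multiplying by the transpose of the sampling matrix pads the unobserved frequencies with zeros before the inverse transform is applied (names and sizes are illustrative).

```python
import numpy as np

# Sketch of the zero-filled back-projection F^H P^T y that feeds ReconNet.
N = 8
rng = np.random.default_rng(0)
x = rng.standard_normal(N)

mask = np.zeros(N, dtype=bool)
mask[[0, 1, 4]] = True
P = np.eye(N)[mask]                      # (M, N) Boolean sampling matrix
F = np.fft.fft(np.eye(N), axis=0) / np.sqrt(N)  # unitary DFT matrix

y = P @ F @ x                            # measurements
k_zero_filled = P.T @ y                  # pad unobserved frequencies with zeros
x_backproj = F.conj().T @ k_zero_filled  # back-projection to the image domain
```

The result is simply the inverse FFT of the masked spectrum, which is what a reconstruction network then refines.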
4.1.2 SampleNet
For SampleNet, we opt for a progressive-sampling strategy where we draw one sample at a time, using all data acquired up to the instant we sample. In other words, the learning is such that SampleNet outputs a probabilistic policy for determining the next sampling point given the current reconstruction. SampleNet would be called a policy network in the reinforcement-learning literature [12, 28]. At sampling time t, if we denote SampleNet with parameters φ as g_φ and the probabilistic policy as π_t, then, using Eq. (4), we write that

π_t = g_φ( f_θ( F^H P_t^T y_t ) ). (5)
We can then write a recursive formula for the sampling pattern at time t + 1 as

P_{t+1} = [ P_t ; e_{τ_t}^T ], with τ_t = arg max_τ π_t(τ), (6)

where [· ; ·] denotes concatenation in the column direction between P_t and e_{τ_t}^T, and e_τ is a Boolean vector with all values 0 except for a single element that is 1 at index τ.
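The recursion can be sketched as below, with simple stand-ins for SampleNet and ReconNet (a uniform policy over unsampled positions and a zero-filled inverse FFT); everything here is illustrative.

```python
import numpy as np

# Sketch of the progressive-sampling recursion: the policy picks the next
# position, a one-hot row is appended to the sampling matrix, and the
# reconstruction is refreshed. `policy` and `recon` are mere placeholders.
N, budget = 8, 3
rng = np.random.default_rng(1)

def recon(P, y):
    # placeholder reconstruction: zero-filled inverse FFT
    return np.fft.ifft(P.T @ y)

def policy(x_hat, sampled):
    # placeholder policy: uniform over unsampled positions
    p = np.where(sampled, 0.0, 1.0)
    return p / p.sum()

x = rng.standard_normal(N)
k = np.fft.fft(x)
P = np.zeros((0, N))                 # empty sampling matrix
sampled = np.zeros(N, dtype=bool)

for t in range(budget):
    y = P @ k                        # measurements under the current pattern
    x_hat = recon(P, y) if t > 0 else np.zeros(N)
    pi = policy(x_hat, sampled)      # policy from the current reconstruction
    tau = int(np.argmax(pi))         # next sampling position
    P = np.vstack([P, np.eye(N)[tau]])  # append the one-hot row
    sampled[tau] = True
```

In the actual framework, `policy` and `recon` would be the trained SampleNet and ReconNet.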
With Eqs. (4), (5), and (6), we can now rewrite the learning problem in Eq. (2) as an optimization problem that involves the training of two networks. We write that

{θ*, φ*} = arg max_{θ, φ} E_{x, t} [ q( f_θ( F^H P_t^T y_t ), x ) ]. (7)
The expectation is now also on , as we are performing progressive sampling, and the two networks should also be able to deal with various sampling times.
This formulation, however, cannot be solved with conventional gradient-based methods because Eq. (7) is nondifferentiable. Instead, we propose to train the two networks in an alternating fashion through self-supervision.
4.2 SelfSupervised Learning
To train ReconNet and SampleNet, we draw inspiration from AlphaGo [12, 28]. We propose to train the two networks with their own separate objectives, which eventually leads to satisfying the original objective. Conceptually, we train ReconNet so that it learns to improve on a given sampling pattern, and we train SampleNet so that it learns to provide better sampling patterns given the reconstruction, which then results in consistently better final reconstruction quality.
In more detail, as shown in Fig. 4, we rely on MCTS [13, 14] for the improvement of these networks through the optimization rounds. We explain in detail how we use MCTS in Section 4.3. For now, assume that MCTS is a black-box component that gives a sampling policy π̂_t^(r) that improves on π_t^(r), the policy from SampleNet with the current network parameters θ^(r) and φ^(r), where r is the round of optimization. Formally, we write that

π̂_t^(r) = MCTS( π_t^(r); θ^(r), φ^(r) ). (8)

Then, with MCTS, we create sampling patterns recursively by performing

P̂_{t+1} = [ P̂_t ; e_{τ̂_t}^T ], with τ̂_t = arg max_τ π̂_t(τ), (9)

which leads to better reconstruction results

x̂_t = f_θ( F^H P̂_t^T ŷ_t ). (10)
With x̂_t, ŷ_t, P̂_t, and π̂_t, we train both ReconNet and SampleNet.
For ReconNet, we train f_θ to obtain high-quality reconstructions, given the sampling patterns from MCTS. In other words, we simply replace P_t with P̂_t in Eq. (7). By doing so, we remove the dependency on φ, leading to the per-round optimization

θ^(r+1) = arg max_θ E_{x, t} [ q( f_θ( F^H P̂_t^T ŷ_t ), x ) ], (11)

where θ^(r) is the starting point of the optimization. This can now be trained with a gradient-based solver such as ADAM [29]. The standard PSNR measure in Eq. (14) is used for q.
We train SampleNet to emulate the outcome of MCTS. Thus, with Eq. (5), and denoting the cross entropy as H(·, ·), we write that

φ^(r+1) = arg min_φ E_{x, t} [ H( π̂_t, g_φ( f_θ( F^H P̂_t^T ŷ_t ) ) ) ], (12)

where φ^(r) is the initial value of φ for the optimization. This objective can also be optimized through gradient descent.
In practice, to avoid overfitting, we optimize Eq. (11) and Eq. (12) for only a limited number of steps per round. We also apply experience replay by creating the training batch from randomly selected past examples. This further prevents overfitting and ensures a smooth optimization [12, 28]. A summary of the process is provided in Algorithm 1, with subscripts for the optimization round and the reconstruction episode added for clarity.
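The experience-replay bookkeeping can be sketched as follows; the window size, batch size, and example tuples are illustrative assumptions.

```python
import random
from collections import deque

# Sketch of experience replay: examples from recent optimization rounds are
# pooled, and each training batch is drawn at random from that pool.
KEEP_ROUNDS = 10
memory = deque(maxlen=KEEP_ROUNDS)    # one list of examples per round

def add_round(examples):
    memory.append(list(examples))     # oldest round is evicted automatically

def sample_batch(batch_size):
    pool = [ex for round_examples in memory for ex in round_examples]
    return random.sample(pool, min(batch_size, len(pool)))

# usage: after a round, store (input, MCTS pattern, policy target) tuples
add_round([("x1", "P1", "pi1"), ("x2", "P2", "pi2")])
batch = sample_batch(2)
```

The `deque` with `maxlen` keeps exactly the most recent rounds, matching the windowed-memory behavior described above.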
4.3 Monte Carlo Tree Search for Sampling
The key insight that allows the learning strategy of Section 4.2 to work is that, during training, the sampling results obtained with MCTS should always be better than what the two networks could provide without it, thus providing guidance. This is possible because MCTS simulates plays according to a search policy (in our case, sampling using SampleNet and ReconNet) and finds out how good a move is by looking at the ultimate outcomes of the simulations, in our case, the reconstruction results. With this look-ahead behavior, MCTS arrives at a decision that is better than that of the original search policy.
As shown in Fig. 5, we perform MCTS in the traditional four stages: selection, expansion, simulation, and backup. The selection follows the upper-confidence-bound strategy of typical MCTS setups, and the expansion is guided by SampleNet and ReconNet. We perform the simulation until we reach the sampling budget, so that the final reconstruction quality becomes the reward that is backed up to update the node statistics in the search tree.
In more detail, as we focus on retrieving the best reconstruction, we use implicit minimax backups [30], with SampleNet and ReconNet together guiding the selection and exploration, as in [12, 28]. Specifically, for each node in the search tree identified with the sampling pattern P_t, we store r̄(P_t), the average of the rewards of all child nodes, r_max(P_t), the maximum reward of all child nodes, and n(P_t), the number of visits to the node. Then, if we denote the policy for a certain move τ as π_t(τ), where π_t is from Eq. (5), we define the upper confidence bound for move τ at state P_t as

U(P_t, τ) = (1 − β) r̄ + β r_max + c ( (1 − ε) π_t(τ) + ε η_τ ) √( Σ_{τ'} n(τ') ) / ( 1 + n(τ) ), (13)

where β and c are hyperparameters that control, respectively, whether the search favors the average or the maximum and the degree of exploration performed by MCTS, η is the Dirichlet noise that encourages exploration as in [12], and ε is the hyperparameter that controls the amount of noise. Note that the Dirichlet noise is applied whenever π_t is used. Algorithm 2 is a summary of the entire MCTS process. We detail the hyperparameter settings in Section 5.3.
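The selection score can be sketched as below; the blend of average and maximum child rewards follows the implicit-minimax idea, and all hyperparameter values here are illustrative, not the actual settings.

```python
import numpy as np

# Sketch of the node-selection score with implicit minimax backups: the value
# term blends average and maximum child rewards, and the SampleNet prior is
# perturbed with Dirichlet noise to encourage exploration.
def ucb_scores(q_avg, q_max, n_visits, prior, beta=0.5, c=1.0,
               eps=0.25, alpha=0.3, rng=np.random.default_rng(0)):
    value = (1.0 - beta) * q_avg + beta * q_max      # implicit minimax blend
    noisy_prior = (1.0 - eps) * prior + eps * rng.dirichlet(
        alpha * np.ones(len(prior)))                 # Dirichlet exploration noise
    explore = c * noisy_prior * np.sqrt(n_visits.sum()) / (1.0 + n_visits)
    return value + explore

q_avg = np.array([0.4, 0.6])      # average child rewards per move
q_max = np.array([0.7, 0.9])      # maximum child rewards per move
n = np.array([10.0, 2.0])         # visit counts per move
prior = np.array([0.8, 0.2])      # SampleNet policy prior
scores = ucb_scores(q_avg, q_max, n, prior)
```

During selection, the move with the largest score would be expanded next; rarely visited moves receive a larger exploration bonus.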
5 Implementation Details
We now discuss the details required to implement the proposed framework in an actual system for accelerated MRI. An implementation of the method is available online; we shall provide a link to the repository once the paper is accepted.
5.1 Sampling with 1D Readout
Up until now, for ease of explanation, we have motivated our method with the case where a single position is sampled in the k-space domain. In practice, MRI with a Cartesian trajectory [1] is typically performed by sampling one line of frequencies in k-space through a process called readout. Therefore, in our implementation, the policy in Eq. (5) is defined over k-space lines, and each row of the sampling matrix corresponds to a sampling vector that samples all elements in a row or column of k-space.
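Line-based sampling can be sketched with a 2-D mask that keeps entire rows of the spectrum; the sizes and line indices below are illustrative.

```python
import numpy as np

# Sketch of line-based (readout) sampling: selecting one k-space line keeps an
# entire row of the 2-D spectrum, so the Boolean mask is constant along the
# readout direction.
H, W = 8, 8
k = np.fft.fft2(np.random.default_rng(0).standard_normal((H, W)))

lines = [0, 3, 4]                 # indices of sampled phase-encode lines
mask = np.zeros((H, W), dtype=bool)
mask[lines, :] = True             # each selected line keeps a full row

y = np.where(mask, k, 0)          # zero-filled line-sampled k-space
```

A policy over lines thus has one entry per row (or column) of k-space rather than one entry per pixel.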
5.2 Network Architectures
The overall architecture of the two coupled networks is shown in Fig. 6. As discussed in Section 4.1, ReconNet takes images in the spatial domain as input, and SampleNet takes the output of ReconNet as input. Here, the input image is in tensor form to take advantage of existing deep-learning libraries such as TensorFlow [31].

5.2.1 ReconNet
In order to recover high-quality MRI, we use a fully convolutional network, as in other recent deep-learning-based reconstructors [6, 7, 8, 25], but with a simpler architecture. In early experiments (not shown), we also employed a more complex U-Net [32] type of architecture, as in [7], but achieved similar performance at a significantly higher computational cost. We therefore use the simpler architecture.
We use eight residual blocks [2]. Before the residual blocks, we employ a linear convolutional layer to allow the network to easily adapt to the input data range, if needed. Each block contains a convolutional layer followed by batch normalization and a leaky-ReLU [33] activation. All convolutions are zero-padded so that the output has the same size as the input, and have 64 output channels. After the residual blocks, a linear convolutional layer is used to convert the output back to the same number of channels as the input.

5.2.2 SampleNet
To obtain a probabilistic policy on where to sample next, we use an architecture that is similar to the policy network in [12] for our SampleNet. As in ReconNet, each convolutional block contains a zero-padded convolutional layer followed by batch normalization and a leaky-ReLU activation. After each convolutional block, we apply max-pooling and double the number of channels until it reaches 256; the first convolutional block has 64 channels. We repeat this structure until the spatial size of the output is fully reduced. We then flatten the output and apply a fully connected layer, again followed by batch normalization and a leaky-ReLU activation. Finally, another fully connected layer with a softmax activation is used to turn the output into a probability. As noted in Section 5.1, each element of this probability represents a line.

5.3 Training Setup
We use ADAM [29] to optimize the networks in Algorithm 1 with a fixed learning rate. We apply weight decay constraining the norm of the weight parameters of the networks, with the decay hyperparameter set empirically. To prevent the networks from overfitting, we train for at most 1000 iterations per round and at most 3 epochs over the samples in the current experience memory. In addition, we only keep examples from the ten most recent rounds of optimization.

For MCTS, the number of repetitions of the simulation is 10. The minimax-backup parameter and the exploration parameters are set empirically.

To reduce the time taken to generate supervision, we create these examples in parallel with 16 threads. We train our model for a fixed number of rounds. It takes around four days to train our model on an Intel i7-7820X (3.60 GHz) CPU and an NVidia Titan X (Pascal) GPU, or on an Intel Xeon E5-2690 v3 (2.60 GHz) CPU and an NVidia GeForce GTX 1080 Ti GPU.
6 Results
6.1 Datasets
We evaluate our method on two datasets. The first one is composed of cardiac images and the second one of knee images.
The cardiac dataset is from the cardiac atlas project [34]. From this dataset, we discard images that are corrupted by severe noise. Since the dataset is provided in the spatial domain, we bring it into the Fourier domain numerically. We retain the central crop of each original image to avoid the artificial image boundaries that exist due to the characteristics of the capture devices. We split the training and test sets based on patients, where 10 patients are assigned to training and another 10 patients to testing. This results in a total of 4999 training images and 6963 test images.
The knee dataset is from an open data platform (http://mridata.org/). The dataset is measured as a 3D volume, and we use the central 2D slice in our experiments. We again crop the original images to their central region to accelerate the simulations. The original data are collected from 8 coils, therefore producing an 8-channel image in k-space, which we compress down to a single channel to keep the framework identical for both datasets. The knee dataset contains fewer images than the cardiac dataset. Among the 20 patients, we use 17 patients for training and the remaining 3 for testing. This amounts to 2550 training images and 450 test images. We again keep a small portion of the training data for validation.
For the cardiac dataset, we use our dual-channel approach to train our networks while, for the knee dataset, we use a square-root of sum-of-squares operation. We empirically found that the dual-channel representation helps for the cardiac dataset, which is composed of real-valued images; for the knee dataset, which consists of complex-valued images, it does not.
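The dual-channel representation can be sketched as follows: the complex image is stored as a two-channel real tensor (real and imaginary parts) that standard CNN layers can process; sizes are illustrative.

```python
import numpy as np

# Sketch of a dual-channel representation of a complex image: stack the real
# and imaginary parts along a trailing channel axis.
img = np.fft.ifft2(np.random.default_rng(0).standard_normal((4, 4)))

dual = np.stack([img.real, img.imag], axis=-1)   # shape (4, 4, 2)
```

The two channels losslessly encode the complex values, so the original image can be recovered as `dual[..., 0] + 1j * dual[..., 1]`.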
6.2 Baselines and Evaluation Metric
6.2.1 Baselines
To validate the effectiveness of our method, both SampleNet and ReconNet, as well as the joint training framework, we compare our method against three baseline methods:

- VDS+FBPConv [7]: A state-of-the-art reconstruction pipeline that uses deep CNNs. It uses variable-density sampling (VDS) for the sampling pattern.

We also include variants that use our reconstruction network instead of the original reconstruction strategies, to demonstrate the effectiveness of joint training:

- LPF + Our Recon: Low-pass filtering (LPF) with our reconstruction network. A simple baseline to demonstrate the performance of ReconNet in a super-resolution setup.

- VDS + Our Recon: VDS with our reconstruction network. We use this baseline to demonstrate the importance of the sampling pattern when using a deep reconstructor.

All methods were trained with the same training configurations. The downsampling factor equals 4 for all subsequent experiments.
6.2.2 Evaluation Protocol
To evaluate the effectiveness of each method, we observe the reconstruction quality in a scenario where the sample budget is one-fourth of the full-resolution signal. In other words, we evaluate the case where the reconstruction needs to upsample by a factor of four.
As evaluation metric, we use the standard PSNR measure

PSNR(x̂, x) = 10 log10( max(x)^2 / ( (1/N) ||x̂ − x||^2 ) ), (14)

where x̂ and x are the reconstructed and reference signals, respectively.
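This measure can be computed directly as below; taking the peak value from the reference signal is a common convention and is assumed here.

```python
import numpy as np

# Direct implementation of the standard PSNR measure: peak power of the
# reference over the mean squared error of the reconstruction.
def psnr(x_hat, x):
    mse = np.mean(np.abs(x_hat - x) ** 2)
    peak = np.max(np.abs(x))
    return 10.0 * np.log10(peak ** 2 / mse)

x = np.ones(4)                            # toy reference signal
x_hat = np.array([1.0, 1.0, 1.0, 0.9])    # toy reconstruction
assert psnr(x_hat, x) > 20.0              # small error gives a high PSNR
```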
6.3 Quantitative Results
Table I. PSNR (dB) for each combination of sampling pattern (columns) and reconstruction method (rows).

Dataset   Recon     VDS     LCS     LPF     Ours
Cardiac   zf-IFT    25.42   28.99   30.19   27.54
Cardiac   TV        32.33   33.72   31.49   --
Cardiac   FBPConv   30.21   33.77   23.79   --
Cardiac   Ours      27.91   32.56   29.00   34.22
Knee      zf-IFT    22.51   26.96   27.41   26.79
Knee      TV        26.17   28.93   27.90   --
Knee      FBPConv   26.55   28.95   28.04   --
Knee      Ours      25.04   28.43   28.11   29.10
We provide the quantitative results in Table I. In addition to the reconstruction performance, we report the results of simple reconstructions through an inverse Fourier transform of the zero-filled measurements (zf-IFT) to single out the effect of the sampling pattern. Our method outperforms all compared methods.
As shown by the comparisons with LCS and FBPConv, learning only the sampling pattern or only the reconstruction network does not provide optimal performance. Furthermore, by comparing with the variants that use the sampling patterns discovered by LCS, we see that the simultaneous learning of both components favorably impacts the performance. Note that our method does not require per-dataset parameter tuning and is able to outperform the state of the art with a single hyperparameter setup. By contrast, methods relying on TV [35] require per-dataset tuning.
Interestingly, among the zf-IFT results, LPF, which samples only the low-frequency part, gives the highest reconstruction quality. However, training with this sampling pattern results in overfitting, as demonstrated by the decrease in performance when the reconstruction network is added. This is expected, as this sampling pattern gives no information about the high-frequency parts of the signal. In a nutshell, the reconstruction network learns to interpolate the components that were not observed, but with this sampling pattern there is nothing to interpolate from. This is not the case for the other sampling patterns, as demonstrated by the fact that ReconNet always improves their performance.
Besides outperforming the other baselines, our method is also easily scalable, as it is based on a stochastic gradient-based optimization.
6.4 Qualitative Results
Qualitative examples for the cardiac dataset are shown in Fig. 7, and for the knee dataset in Fig. 8. In all examples, the proposed method gives the highest reconstruction quality.
In Fig. 7, as shown in the reconstruction examples of TV (VDS+TV) and LCS (LCS+TV), cartoon-like textures are apparent. These artifacts are well removed by our method, which is most apparent in Fig. 8 (bottom). Similarly, in the TV and LCS results of Fig. 8 (top), the textures of the knee are blurry and appear flat. In FBPConv (VDS+FBPConv) and VNet (Ours+VNet), folding artifacts still remain. Our method is able to clearly reconstruct the inner structures.
6.5 t-SNE Analysis

For biomedical applications, it is critical that the reconstructed data indeed represent the original signal. We therefore perform a t-SNE analysis [36] to show that the reconstructed signals indeed overlap with the ground truth, distribution-wise. We use the cardiac dataset for this analysis, as it provides the largest corpus of test data for a meaningful t-SNE analysis.
We show in Fig. 9 the t-SNE map of the test set of the cardiac dataset. Each point in the t-SNE map corresponds to an image reconstructed from a different configuration. To avoid clutter, we only show the results of one-fifth of the images, chosen randomly from a uniform distribution. As shown in Fig. 9, the distribution and structure of our reconstructions are very close to those of the ground truth.

6.6 Sampling Evolution
In Fig. 10, we show how the final sampling pattern evolves as the learning proceeds. At first, it is essentially random; it eventually converges to a main sampling pattern that is shared by all training images, plus minor components that are image-dependent. The fact that the main component is dominant is not surprising, given that the MRI data have a similar appearance.
6.7 Ablation Study
We conduct ablation studies to motivate our design choices. We first show that approximating the simulation phase with a deep network does not provide the best performance. We then demonstrate the importance of implicit minimax backup and, finally, the effectiveness of linking SampleNet with ReconNet.
6.7.1 Importance of MCTS Rollout
Dataset  With VNet [12]  Our Method 

cardiac  31.48  34.22 
knee  26.49  29.10 
In [12], it was suggested that using a value network, referred to as the VNet, that approximates the simulation outcomes can speed up the learning and lead to better results. We have tried that strategy as well. In Table II, we compare it against using a full simulation, also dubbed rollout. As the deployment of the VNet gives significantly worse results, we have not retained this strategy in our method.
6.7.2 Effectiveness of Implicit Minimax Backup
minimax backup  cardiac 

32.99  
34.22  
28.96 
In Table III, we report the results of our method for different values of the backup parameter, which in principle lies between 0 and 1. We observed that the setting used in our method performs best.
6.7.3 Using ReconNet Output as SampleNet Input
Dataset  Without ReconNet  Our Method 

cardiac  33.53  34.22 
In Table IV, we compare two setups. In the first, we let the output of ReconNet be the input of SampleNet. In the second, we bypass ReconNet when training SampleNet. We use the cardiac dataset for this experiment. We see a faster convergence rate when ReconNet is used (the time to convergence was four days), as well as better final results. Thus, we conclude that linking the two networks is helpful for learning.
7 Conclusion
We have proposed a self-supervised deep-learning framework that learns to jointly sample and reconstruct for accelerated MRI. Our method is composed of two networks, namely SampleNet and ReconNet, which learn to sample and to reconstruct progressively. The two networks are trained with supervision signals that are generated through Monte Carlo tree search. This method provides on-the-fly sampling patterns that improve the quality of the reconstruction, and ReconNet uses these sampling patterns to learn to reconstruct. By learning to sample and reconstruct jointly, our framework is able to outperform the state of the art.
To the best of our knowledge, our framework is the first to incorporate data acquisition in the training process of deep networks in the context of MRI. We believe that this is a promising direction that goes beyond the current deep-learning-based methods. In the future, we hope to implement our method in an actual MRI device for high-speed data acquisition.
Acknowledgments
This work was supported in part by the European Research Council under Grant 692726 (H2020-ERC Project GlobalBioIm), by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant “Deep Visual Geometry Machines” (RGPIN-2018-03788), and by systems supplied by Compute Canada.
References
 [1] M. Lustig, D. Donoho, and J. M. Pauly, “Sparse MRI: The application of compressed sensing for rapid MR imaging,” Magnetic Resonance in Medicine, vol. 58, no. 6, pp. 1182–1195, 2007.

 [2] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
 [3] J. A. Fessler, “Model-based image reconstruction for MRI,” IEEE Signal Processing Magazine, vol. 27, no. 4, pp. 81–89, 2010.
 [4] M. Guerquin-Kern, M. Haberlin, K. P. Pruessmann, and M. Unser, “A fast wavelet-based reconstruction method for magnetic resonance imaging,” IEEE Transactions on Medical Imaging, vol. 30, no. 9, pp. 1649–1660, 2011.
 [5] S. Ravishankar and Y. Bresler, “MR image reconstruction from highly undersampled k-space data by dictionary learning,” IEEE Transactions on Medical Imaging, vol. 30, no. 5, pp. 1028–1041, 2011.
 [6] J. Schlemper, J. Caballero, J. V. Hajnal, A. N. Price, and D. Rueckert, “A deep cascade of convolutional neural networks for dynamic MR image reconstruction,” IEEE Transactions on Medical Imaging, vol. 37, no. 2, pp. 491–503, 2018.
 [7] K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep convolutional neural network for inverse problems in imaging,” IEEE Transactions on Image Processing, vol. 26, no. 9, pp. 4509–4522, 2017.
 [8] D. Lee, J. Yoo, and J. C. Ye, “Deep residual learning for compressed sensing MRI,” in International Symposium on Biomedical Imaging. IEEE, 2017, pp. 15–18.
 [9] H. K. Aggarwal, M. P. Mani, and M. Jacob, “MoDL: Model-based deep learning architecture for inverse problems,” IEEE Transactions on Medical Imaging, pp. 1–1, 2018.
 [10] K. Lee, Y. Li, K. H. Jin, and J. C. Ye, “Unified theory for recovery of sparse signals in a general transform domain,” IEEE Transactions on Information Theory, vol. 64, no. 8, pp. 5457–5477, 2018.
 [11] B. Gözcü, R. K. Mahabadi, Y.-H. Li, E. Ilıcak, T. Cukur, J. Scarlett, and V. Cevher, “Learning-based compressive MRI,” IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1394–1406, 2018.
 [12] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel, and D. Hassabis, “Mastering the game of Go without human knowledge,” Nature, vol. 550, no. 7676, pp. 354–359, 2017.

 [13] L. Kocsis and C. Szepesvári, “Bandit based Monte-Carlo planning,” in European Conference on Machine Learning. Springer, 2006, pp. 282–293.
 [14] R. Coulom, “Efficient selectivity and backup operators in Monte-Carlo tree search,” in International Conference on Computers and Games. Springer, 2006, pp. 72–83.
 [15] E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, 2006.
 [16] D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
 [17] S. Vasanawala, M. Murphy, M. T. Alley, P. Lai, K. Keutzer, J. M. Pauly, and M. Lustig, “Practical parallel imaging compressed sensing MRI: Summary of two years of experience in accelerating body MRI of pediatric patients,” in International Symposium on Biomedical Imaging. IEEE, 2011, pp. 1039–1043.
 [18] N. Chauffert, P. Ciuciu, J. Kahn, and P. Weiss, “Variable density sampling with continuous trajectories,” SIAM Journal on Imaging Sciences, vol. 7, no. 4, pp. 1962–1992, 2014.
 [19] B. Adcock, A. C. Hansen, and B. Roman, “The quest for optimal sampling: Computationally efficient, structure-exploiting measurements for compressed sensing,” in Compressed Sensing and its Applications. Springer, 2015, pp. 143–167.
 [20] S. Ravishankar and Y. Bresler, “Adaptive sampling design for compressed sensing MRI,” in Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2011, pp. 3751–3755.
 [21] F. Knoll, C. Clason, C. Diwoky, and R. Stollberger, “Adapted random sampling patterns for accelerated MRI,” Magnetic Resonance Materials in Physics, Biology and Medicine, vol. 24, no. 1, pp. 43–50, 2011.
 [22] J. Vellagoundar and R. R. Machireddy, “A robust adaptive sampling method for faster acquisition of MR images,” Magnetic Resonance Imaging, vol. 33, no. 5, pp. 635–643, 2015.
 [23] F. Krahmer and R. Ward, “Stable and robust sampling strategies for compressive imaging,” IEEE Transactions on Image Processing, vol. 23, no. 2, pp. 612–622, 2014.
 [24] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
 [25] D. Lee, J. Yoo, S. Tak, and J. C. Ye, “Deep residual learning for accelerated MRI using magnitude and phase networks,” IEEE Transactions on Biomedical Engineering, vol. 65, no. 9, pp. 1985–1995, Sept 2018.
 [26] K. Hammernik, T. Klatzer, E. Kobler, M. P. Recht, D. K. Sodickson, T. Pock, and F. Knoll, “Learning a variational network for reconstruction of accelerated MRI data,” Magnetic Resonance in Medicine, vol. 79, no. 6, pp. 3055–3071, 2018.
 [27] G. Yang, S. Yu, H. Dong, G. Slabaugh, P. L. Dragotti, X. Ye, F. Liu, S. Arridge, J. Keegan, Y. Guo, and D. Firmin, “DAGAN: Deep de-aliasing generative adversarial networks for fast compressed sensing MRI reconstruction,” IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1310–1321, 2018.
 [28] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016.
 [29] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in International Conference on Learning Representations, vol. 5, 2015.

 [30] M. Lanctot, M. H. M. Winands, T. Pepels, and N. R. Sturtevant, “Monte Carlo tree search with heuristic evaluations using implicit minimax backups,” in IEEE Conference on Computational Intelligence and Games, 2014, pp. 1–8.
 [31] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: A system for large-scale machine learning,” in USENIX Conference on Operating Systems Design and Implementation, vol. 16, 2016, pp. 265–283.
 [32] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Conference on Medical Image Computing and Computer Assisted Intervention. Springer, 2015, pp. 234–241.
 [33] A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in International Conference on Machine Learning, vol. 30, no. 1, 2013, p. 3.
 [34] P. Radau, Y. Lu, K. Connelly, G. Paul, A. Dick, and G. Wright, “Evaluation framework for algorithms segmenting short axis cardiac MRI,” The MIDAS Journal, Cardiac MR Left Ventricle Segmentation Challenge, vol. 49, 2009.
 [35] S. Becker, J. Bobin, and E. J. Candès, “NESTA: A fast and accurate first-order method for sparse recovery,” SIAM Journal on Imaging Sciences, vol. 4, no. 1, pp. 1–39, 2011.
 [36] L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, pp. 2579–2605, 2008.