Magnetic Resonance Imaging (MRI) is a powerful diagnostic tool for a variety of disorders, but its utility is often limited by its slow speed compared to competing modalities like CT or X-rays. Reducing the time required for a scan would decrease the cost of MR imaging and allow scans to be acquired in situations where a patient cannot stay still for the current minimum scan duration. One approach to accelerating MRI acquisition, called Parallel Imaging (PI) [13, 8, 3], uses multiple receiver coils to simultaneously acquire multiple views of the underlying anatomy, which are then combined in software. Multi-coil imaging is widely used in current clinical practice. A complementary approach to accelerating MRI acquires only a subset of measurements and uses Compressed Sensing (CS) [1, 7] methods to reconstruct the final image from these undersampled measurements. The combination of PI and CS, which involves acquiring undersampled measurements from multiple views of the anatomy, has the potential to allow faster scans than either method alone. Reconstructing MRIs from such undersampled multi-coil measurements has remained an active area of research.
MRI reconstruction can be viewed as an inverse problem, and previous research has proposed neural networks whose design is inspired by the optimization procedure used to solve such a problem [4, 9, 10, 6]. A limitation of such an approach is that it assumes the forward process is completely known, which is an unrealistic assumption for the multi-coil reconstruction problem. In this paper, we present a novel technique for reconstructing MRI images from undersampled multi-coil data that does not make this assumption. We extend previously proposed variational methods by learning the forward process in conjunction with reconstruction, alleviating this limitation. We show through experiments on the fastMRI dataset that such an approach yields higher fidelity reconstructions. Our contributions are as follows: 1) we extend the previously proposed variational network model by learning completely end-to-end; 2) we explore the design space for variational networks to determine the optimal intermediate representations and neural network architectures for better reconstruction quality; and 3) we perform extensive experiments using our model on the fastMRI dataset and obtain new state-of-the-art results for both knee and brain MRIs.
2 Background and Related Work
2.1 Accelerated MRI acquisition
An MR scanner images a patient’s anatomy by acquiring measurements in the frequency domain, called k-space, using a measuring instrument called a receiver coil. The image can then be obtained by applying an inverse multidimensional Fourier transform $\mathcal{F}^{-1}$ to the measured k-space samples. The underlying image $x$ is related to the measured k-space samples $k$ as

$$k = \mathcal{F}(x) + \epsilon, \tag{1}$$

where $\epsilon$ is the measurement noise and $\mathcal{F}$ is the Fourier transform operator. Most modern scanners contain multiple receiver coils. Each coil acquires k-space samples that are modulated by the sensitivity of the coil to the MR signal arising from different regions of the anatomy. Thus, the $i$-th coil measures:

$$k_i = \mathcal{F}(S_i x) + \epsilon_i, \qquad i = 1, 2, \ldots, N, \tag{2}$$

where $S_i$ is a complex-valued diagonal matrix encoding the position-dependent sensitivity map of the $i$-th coil and $N$ is the number of coils. The sensitivity maps are normalized to satisfy:

$$\sum_{i=1}^{N} S_i^* S_i = 1. \tag{3}$$
The speed of MRI acquisition is limited by the number of k-space samples obtained. This acquisition process can be accelerated by obtaining undersampled k-space data,

$$\tilde{k}_i = M k_i, \tag{4}$$

where $M$ is a binary mask operator that selects a subset of the k-space points and $\tilde{k}_i$ denotes the measured k-space data. The same mask is used for all coils. Naively applying an inverse Fourier transform to this undersampled k-space data results in aliasing artifacts. Parallel Imaging can be used to accelerate imaging by exploiting redundancies in the k-space samples measured by different coils. The sensitivity maps $S_i$ can be estimated using the central region of k-space corresponding to low frequencies, called the Auto-Calibration Signal (ACS), which is typically fully sampled. To estimate these sensitivity maps accurately, the ACS must be sufficiently large, which limits the maximum possible acceleration.
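To make the multi-coil forward model and the undersampling step concrete, here is a minimal NumPy sketch. Everything in it is an illustrative assumption rather than the fastMRI data pipeline: the centered orthonormal FFT helpers `fft2c`/`ifft2c`, the synthetic Gaussian-shaped sensitivity maps from `synthetic_sens_maps`, and the hypothetical `column_mask` with a fully sampled central ACS band are ours, and measurement noise is omitted.

```python
import numpy as np

def fft2c(x):
    """Centered, orthonormal 2D FFT over the last two axes (our stand-in for F)."""
    x = np.fft.ifftshift(x, axes=(-2, -1))
    return np.fft.fftshift(np.fft.fft2(x, norm="ortho"), axes=(-2, -1))

def ifft2c(k):
    """Inverse of fft2c (our stand-in for F^{-1})."""
    k = np.fft.ifftshift(k, axes=(-2, -1))
    return np.fft.fftshift(np.fft.ifft2(k, norm="ortho"), axes=(-2, -1))

def synthetic_sens_maps(num_coils, h, w):
    """Hypothetical smooth coil sensitivities, normalized so sum_i |S_i|^2 = 1."""
    yy, xx = np.mgrid[0:h, 0:w]
    maps = []
    for i in range(num_coils):
        angle = 2 * np.pi * i / num_coils
        cy = h / 2 + 0.5 * h * np.sin(angle)  # coil centers placed around the FOV
        cx = w / 2 + 0.5 * w * np.cos(angle)
        magnitude = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (0.5 * h * w))
        maps.append(magnitude * np.exp(1j * angle))  # constant phase per coil
    maps = np.stack(maps)
    return maps / np.sqrt(np.sum(np.abs(maps) ** 2, axis=0, keepdims=True))

def column_mask(w, num_acs, accel):
    """Hypothetical 1D mask over k-space columns: every `accel`-th column
    plus a fully sampled central ACS band of `num_acs` columns."""
    mask = np.zeros(w, dtype=bool)
    mask[::accel] = True
    c = w // 2
    mask[c - num_acs // 2 : c + (num_acs + 1) // 2] = True
    return mask

rng = np.random.default_rng(0)
h, w, num_coils = 64, 64, 4
x = rng.standard_normal((h, w)) + 1j * rng.standard_normal((h, w))  # stand-in image
S = synthetic_sens_maps(num_coils, h, w)
k = fft2c(S * x)                        # k_i = F(S_i x), noise omitted
M = column_mask(w, num_acs=8, accel=4)  # one mask shared by every coil
k_tilde = k * M                         # k~_i = M k_i (mask broadcasts over rows)
zero_filled = ifft2c(k_tilde)           # naive inverse FFT: aliased coil images
```

Because the same 1D mask multiplies every coil's k-space, broadcasting over the column axis is one simple way to express that the mask is shared across coils; the zero-filled coil images differ from $S_i x$, which is the aliasing described above.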
2.2 Compressed Sensing for Parallel MRI Reconstruction
Compressed Sensing [2] enables reconstruction of images from fewer k-space measurements than classical signal processing methods require, by enforcing suitable priors. Classical compressed sensing methods solve the following optimization problem:

$$\hat{x} = \arg\min_{x} \frac{1}{2} \left\| A(x) - \tilde{k} \right\|^2 + \Psi(x), \tag{5}$$

where $\Psi$ is a regularization function that enforces a sparsity constraint, $A$ is the linear forward operator that multiplies by the sensitivity maps, applies the 2D Fourier transform and then undersamples the data, and $\tilde{k}$ is the vector of masked k-space data from all coils. This problem can be solved by iterative gradient descent methods. In the $t$-th step, the image is updated from $x^t$ to $x^{t+1}$ using:

$$x^{t+1} = x^t - \eta^t \left( A^*\left(A(x^t) - \tilde{k}\right) + \nabla \Psi(x^t) \right), \tag{6}$$

where $\eta^t$ is the learning rate, $\nabla \Psi(x^t)$ is the gradient of $\Psi$ with respect to $x$, and $A^*$ is the Hermitian adjoint of the forward operator $A$.
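The gradient update can be sketched in NumPy as follows. This is a hedged illustration, not a classical CS solver: the sparsity-promoting $\Psi$ is replaced by a smooth Tikhonov stand-in $\Psi(x) = \tfrac{\lambda}{2}\|x\|^2$ (hypothetical parameter `lam`) so that $\nabla\Psi$ exists everywhere, and the step size $\eta^t$ is held fixed. The helpers `A` and `A_adj` implement the forward operator and its Hermitian adjoint using an orthonormal FFT; the maps, mask, and image are random placeholders.

```python
import numpy as np

def fft2c(x):
    """Centered, orthonormal 2D FFT over the last two axes (stand-in for F)."""
    x = np.fft.ifftshift(x, axes=(-2, -1))
    return np.fft.fftshift(np.fft.fft2(x, norm="ortho"), axes=(-2, -1))

def ifft2c(k):
    """Inverse of fft2c; also its adjoint because the transform is orthonormal."""
    k = np.fft.ifftshift(k, axes=(-2, -1))
    return np.fft.fftshift(np.fft.ifft2(k, norm="ortho"), axes=(-2, -1))

def A(x, S, M):
    """Forward operator: multiply by sensitivity maps, 2D FFT, then mask."""
    return M * fft2c(S * x)

def A_adj(k, S, M):
    """Hermitian adjoint A^*: mask, inverse FFT, multiply by conj(S_i), sum over coils."""
    return np.sum(np.conj(S) * ifft2c(M * k), axis=0)

def objective(x, S, M, k_tilde, lam):
    """0.5 * ||A(x) - k~||^2 + Psi(x), with the stand-in Psi(x) = 0.5 * lam * ||x||^2."""
    resid = A(x, S, M) - k_tilde
    return 0.5 * np.sum(np.abs(resid) ** 2) + 0.5 * lam * np.sum(np.abs(x) ** 2)

def gradient_descent(k_tilde, S, M, lam=0.01, eta=0.5, steps=50):
    """Repeat x <- x - eta * (A^*(A(x) - k~) + grad Psi(x)) with a fixed step size."""
    x = A_adj(k_tilde, S, M)  # zero-filled initialization
    for _ in range(steps):
        grad = A_adj(A(x, S, M) - k_tilde, S, M) + lam * x  # grad Psi(x) = lam * x
        x = x - eta * grad
    return x

rng = np.random.default_rng(0)
C, H, W = 4, 32, 32
S = rng.standard_normal((C, H, W)) + 1j * rng.standard_normal((C, H, W))
S /= np.sqrt(np.sum(np.abs(S) ** 2, axis=0, keepdims=True))  # sum_i |S_i|^2 = 1
x_true = rng.standard_normal((H, W)) + 1j * rng.standard_normal((H, W))
M = rng.random(W) < 0.4  # hypothetical random column mask
k_tilde = A(x_true, S, M)
x0 = A_adj(k_tilde, S, M)
x_hat = gradient_descent(k_tilde, S, M)
```

With normalized maps and an orthonormal FFT, $\|A\| \le 1$, so a fixed step of 0.5 is small enough for this smooth objective to decrease at every iteration. A genuinely sparsity-promoting $\Psi$, such as an $\ell_1$ norm, is not differentiable everywhere and would typically call for a proximal step instead of a plain gradient.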
2.3 Deep Learning for Parallel MRI Reconstruction
In the past few years, there has been rapid development of deep learning based approaches to MRI reconstruction [4, 10, 9, 5, 6, 12]. A comprehensive survey of recent developments in using deep learning for parallel MRI reconstruction can be found in [5]. Our work builds upon the Variational Network (VarNet) [4], which consists of multiple layers, each modeled after a single gradient update step in equation 6. Thus, the $t$-th layer of the VarNet takes $x^t$ as input and computes $x^{t+1}$ using:

$$x^{t+1} = x^t - \eta^t A^*\left(A(x^t) - \tilde{k}\right) + \text{CNN}(x^t), \tag{7}$$

where CNN is a small convolutional neural network that maps complex-valued images to complex-valued images of the same shape. The $\eta^t$ values as well as the parameters of the CNNs are learned from data. The $A$ and $A^*$ operators involve the use of sensitivity maps, which are computed using a traditional PI method and fed in as additional inputs. As noted in Section 2.1, these sensitivity maps cannot be estimated accurately when the number of auto-calibration lines is small, which is necessary to achieve higher acceleration factors. As a result, the performance of the VarNet degrades significantly at higher accelerations. We alleviate this problem in our model by learning to predict the sensitivity maps from data as part of the network.
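A single VarNet layer can be sketched the same way. Here `toy_cnn` is a hypothetical stand-in for the learned CNN (a scaled 3×3 box-blur residual), and the per-layer `etas` and `weights` are made-up values; in the VarNet both would be learned from data, and the sensitivity maps would come from a PI method rather than being random.

```python
import numpy as np

def fft2c(x):
    """Centered, orthonormal 2D FFT over the last two axes (stand-in for F)."""
    x = np.fft.ifftshift(x, axes=(-2, -1))
    return np.fft.fftshift(np.fft.fft2(x, norm="ortho"), axes=(-2, -1))

def ifft2c(k):
    """Inverse (and adjoint) of fft2c."""
    k = np.fft.ifftshift(k, axes=(-2, -1))
    return np.fft.fftshift(np.fft.ifft2(k, norm="ortho"), axes=(-2, -1))

def A(x, S, M):
    """Forward operator: maps, FFT, mask."""
    return M * fft2c(S * x)

def A_adj(k, S, M):
    """Adjoint: mask, inverse FFT, conj(maps), sum over coils."""
    return np.sum(np.conj(S) * ifft2c(M * k), axis=0)

def toy_cnn(x, weight):
    """Hypothetical stand-in for the learned CNN: weight * (3x3 box blur of x - x),
    with circular boundaries. The VarNet learns this image-to-image map instead."""
    blurred = sum(
        np.roll(x, (dy, dx), axis=(0, 1)) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
    ) / 9.0
    return weight * (blurred - x)

def varnet_layer(x, k_tilde, S, M, eta, weight):
    """One layer: x^{t+1} = x^t - eta^t A^*(A(x^t) - k~) + CNN(x^t)."""
    return x - eta * A_adj(A(x, S, M) - k_tilde, S, M) + toy_cnn(x, weight)

rng = np.random.default_rng(0)
C, H, W = 4, 32, 32
S = rng.standard_normal((C, H, W)) + 1j * rng.standard_normal((C, H, W))
S /= np.sqrt(np.sum(np.abs(S) ** 2, axis=0, keepdims=True))
x_true = rng.standard_normal((H, W)) + 1j * rng.standard_normal((H, W))
M = rng.random(W) < 0.4
k_tilde = A(x_true, S, M)

etas = [1.0, 0.8, 0.6]     # hypothetical per-layer step sizes (learned in practice)
weights = [0.1, 0.1, 0.1]  # hypothetical CNN scales (learned in practice)
x = np.zeros((H, W), dtype=complex)
for eta, weight in zip(etas, weights):
    x = varnet_layer(x, k_tilde, S, M, eta, weight)
```

Note that `A` and `A_adj` take the sensitivity maps `S` as a fixed input, which is exactly the dependence on precomputed maps that the paragraph above identifies as a weakness at high acceleration.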
3 End-to-End Variational Network
Let $\tilde{k}$ be the vector of masked multi-coil k-space data. Similar to the VarNet, our model takes this masked k-space data as input and applies a number of refinement steps of the same form. We refer to each of these steps as a cascade (following [12]) to avoid overloading the term "layer", which is already heavily used. Unlike the VarNet, however, our model uses k-space intermediate quantities rather than image-space quantities. We call the resulting method the End-to-End Variational Network, or E2E-VarNet.
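To simplify notation, we first define two operators: the expand operator ($\mathcal{E}$) and the reduce operator ($\mathcal{R}$). The expand operator takes the image $x$ and the sensitivity maps as input and computes the corresponding image seen by each coil in the idealized noise-free case:

$$\mathcal{E}(x) = (x_1, \ldots, x_N) = (S_1 x, \ldots, S_N x), \tag{8}$$

where $S_i$ is the sensitivity map of coil $i$. We do not explicitly represent the sensitivity maps as inputs for the sake of readability. The inverse operator, called the reduce operator ($\mathcal{R}$), combines the individual coil images:

$$\mathcal{R}(x_1, \ldots, x_N) = \sum_{i=1}^{N} S_i^* x_i. \tag{9}$$

By the normalization in equation 3, $\mathcal{R}(\mathcal{E}(x)) = x$. Using the expand and reduce operators, $A$ and $A^*$ can be written succinctly as $A = M \circ \mathcal{F} \circ \mathcal{E}$ and $A^* = \mathcal{R} \circ \mathcal{F}^{-1} \circ M$.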
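The expand and reduce operators, and the factorizations of $A$ and $A^*$ built from them, can be checked numerically. The NumPy sketch below uses hypothetical helper names (`expand_op`, `reduce_op`) and random data to verify two properties that follow from the definitions: $\mathcal{R}(\mathcal{E}(x)) = x$ when the maps satisfy equation 3, and $\langle A x, y \rangle = \langle x, A^* y \rangle$, i.e. $\mathcal{R} \circ \mathcal{F}^{-1} \circ M$ is the adjoint of $M \circ \mathcal{F} \circ \mathcal{E}$ when $\mathcal{F}$ is orthonormal.

```python
import numpy as np

def fft2c(x):
    """Centered, orthonormal 2D FFT over the last two axes (stand-in for F)."""
    x = np.fft.ifftshift(x, axes=(-2, -1))
    return np.fft.fftshift(np.fft.fft2(x, norm="ortho"), axes=(-2, -1))

def ifft2c(k):
    """Inverse (and adjoint) of fft2c."""
    k = np.fft.ifftshift(k, axes=(-2, -1))
    return np.fft.fftshift(np.fft.ifft2(k, norm="ortho"), axes=(-2, -1))

def expand_op(x, S):
    """E(x) = (S_1 x, ..., S_N x): stack of per-coil images (noise-free)."""
    return S * x[np.newaxis]

def reduce_op(coil_images, S):
    """R(x_1, ..., x_N) = sum_i conj(S_i) x_i."""
    return np.sum(np.conj(S) * coil_images, axis=0)

def A(x, S, M):
    """A = M o F o E."""
    return M * fft2c(expand_op(x, S))

def A_adj(k, S, M):
    """A^* = R o F^{-1} o M."""
    return reduce_op(ifft2c(M * k), S)

rng = np.random.default_rng(1)
C, H, W = 6, 16, 16
S = rng.standard_normal((C, H, W)) + 1j * rng.standard_normal((C, H, W))
S /= np.sqrt(np.sum(np.abs(S) ** 2, axis=0, keepdims=True))  # equation 3
x = rng.standard_normal((H, W)) + 1j * rng.standard_normal((H, W))
y = rng.standard_normal((C, H, W)) + 1j * rng.standard_normal((C, H, W))
M = rng.random(W) < 0.5

roundtrip_error = np.max(np.abs(reduce_op(expand_op(x, S), S) - x))
lhs = np.vdot(A(x, S, M), y)      # <A x, y>
rhs = np.vdot(x, A_adj(y, S, M))  # <x, A^* y>
```

Without the normalization step, `reduce_op(expand_op(x, S), S)` would return $x$ scaled pixelwise by $\sum_i |S_i|^2$, which is why equation 3 matters for calling $\mathcal{R}$ an inverse.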
Each cascade in our model applies a refinement step similar to the gradient descent step in equation 7, except that the intermediate quantities are in k-space. Applying $\mathcal{F} \circ \mathcal{E}$ to both sides of equation 7 gives the corresponding update equation in k-space:

$$k^{t+1} = k^t - \eta^t M\left(k^t - \tilde{k}\right) + G(k^t), \tag{10}$$

where $G$ is the refinement module given by:

$$G(k^t) = \mathcal{F} \circ \mathcal{E} \circ \text{CNN}\left(\mathcal{R} \circ \mathcal{F}^{-1}(k^t)\right). \tag{11}$$

Here, we use the fact that $x^t = \mathcal{R} \circ \mathcal{F}^{-1}(k^t)$. CNN can be any parametric function that takes a complex image as input and maps it to another complex image. Since the CNN is applied after combining all coils into a single complex image, the same network can be used for MRIs with different numbers of coils. Each cascade applies the function represented by equation 10 to refine the k-space. In our experiments, we use a U-Net [11] for the CNN.
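The k-space cascade of equations 10 and 11 can be sketched as below. The refinement module `toy_cnn` is a hypothetical placeholder for the U-Net, and `eta`, the mask, and the sensitivity maps are random or made-up values. The loop over coil counts illustrates the point made above: because the CNN only ever sees the single combined image $\mathcal{R} \circ \mathcal{F}^{-1}(k^t)$, the same module handles 4-coil and 8-coil inputs.

```python
import numpy as np

def fft2c(x):
    """Centered, orthonormal 2D FFT over the last two axes (stand-in for F)."""
    x = np.fft.ifftshift(x, axes=(-2, -1))
    return np.fft.fftshift(np.fft.fft2(x, norm="ortho"), axes=(-2, -1))

def ifft2c(k):
    """Inverse of fft2c (stand-in for F^{-1})."""
    k = np.fft.ifftshift(k, axes=(-2, -1))
    return np.fft.fftshift(np.fft.ifft2(k, norm="ortho"), axes=(-2, -1))

def expand_op(x, S):
    """E(x) = (S_1 x, ..., S_N x)."""
    return S * x[np.newaxis]

def reduce_op(coil_images, S):
    """R(x_1, ..., x_N) = sum_i conj(S_i) x_i."""
    return np.sum(np.conj(S) * coil_images, axis=0)

def cascade(k, k_tilde, S, M, eta, cnn):
    """k^{t+1} = k^t - eta * M (k^t - k~) + G(k^t), with G = F o E o CNN o R o F^{-1}."""
    x = reduce_op(ifft2c(k), S)               # x^t = R o F^{-1}(k^t)
    refinement = fft2c(expand_op(cnn(x), S))  # G(k^t)
    return k - eta * M * (k - k_tilde) + refinement

def toy_cnn(x):
    """Hypothetical complex image-to-image module (the paper uses a U-Net)."""
    return 0.05 * (np.roll(x, 1, axis=0) + np.roll(x, -1, axis=0) - 2 * x)

def random_maps(rng, C, H, W):
    """Random complex maps normalized so that sum_i |S_i|^2 = 1."""
    S = rng.standard_normal((C, H, W)) + 1j * rng.standard_normal((C, H, W))
    return S / np.sqrt(np.sum(np.abs(S) ** 2, axis=0, keepdims=True))

rng = np.random.default_rng(0)
H, W = 32, 32
M = rng.random(W) < 0.3
outputs = {}
for C in (4, 8):  # the same toy_cnn serves any number of coils
    S = random_maps(rng, C, H, W)
    k_tilde = M * (rng.standard_normal((C, H, W)) + 1j * rng.standard_normal((C, H, W)))
    k = k_tilde.copy()  # initialize k^0 with the measured k-space
    for _ in range(3):
        k = cascade(k, k_tilde, S, M, eta=0.5, cnn=toy_cnn)
    outputs[C] = k
```

Setting `eta=1` together with a CNN that returns zeros reduces the cascade to hard data consistency: sampled k-space locations are replaced by $\tilde{k}$ and unsampled ones are left untouched.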
3.3 Learned sensitivity maps
The expand and reduce operators in equation 11 take sensitivity maps as inputs. In the original VarNet model, these sensitivity maps are computed using the ESPIRiT algorithm [15] and fed into the model as additional inputs. In our model, however, we estimate the sensitivity maps as part of the reconstruction network using a Sensitivity Map Estimation (SME) module:

$$S = \text{dSS}\left(\text{CNN}\left(\mathcal{F}^{-1}\left(M_{\text{center}} \circ \tilde{k}\right)\right)\right). \tag{12}$$
The $M_{\text{center}}$ operator zeros out all lines except for the autocalibration or ACS lines (described in Section 2.1). This is similar to classical parallel imaging approaches, which estimate sensitivity maps from the ACS lines. The CNN follows the same architecture as the CNN in the cascades, except with fewer channels and thus fewer parameters in intermediate layers. This CNN is applied to each coil image independently. Finally, the dSS operator normalizes the estimated sensitivity maps to ensure that the property in equation 3 is satisfied.
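A sketch of the SME module under stated assumptions: the hypothetical `center_mask` plays the role of $M_{\text{center}}$, the CNN is replaced by an identity function, and dSS is implemented as division by the root-sum-of-squares over coils, which is one straightforward way to enforce the normalization of equation 3 (the tiny `eps` only guards against division by zero).

```python
import numpy as np

def ifft2c(k):
    """Centered, orthonormal inverse 2D FFT over the last two axes (stand-in for F^{-1})."""
    k = np.fft.ifftshift(k, axes=(-2, -1))
    return np.fft.fftshift(np.fft.ifft2(k, norm="ortho"), axes=(-2, -1))

def center_mask(w, num_acs):
    """Hypothetical M_center: keep only the `num_acs` central (ACS) columns."""
    mask = np.zeros(w, dtype=bool)
    c = w // 2
    mask[c - num_acs // 2 : c + (num_acs + 1) // 2] = True
    return mask

def dss(maps, eps=1e-12):
    """Divide each coil map by the root-sum-of-squares over coils (our reading of dSS),
    so that sum_i |S_i|^2 = 1 at every pixel."""
    rss = np.sqrt(np.sum(np.abs(maps) ** 2, axis=0, keepdims=True))
    return maps / (rss + eps)

def estimate_sens_maps(k_tilde, num_acs, cnn):
    """S = dSS(CNN(F^{-1}(M_center o k~))); `cnn` sees one coil image at a time."""
    acs_only = k_tilde * center_mask(k_tilde.shape[-1], num_acs)
    coil_images = ifft2c(acs_only)                         # low-resolution coil images
    refined = np.stack([cnn(img) for img in coil_images])  # per-coil refinement
    return dss(refined)

rng = np.random.default_rng(0)
k_tilde = rng.standard_normal((8, 32, 32)) + 1j * rng.standard_normal((8, 32, 32))
S_hat = estimate_sens_maps(k_tilde, num_acs=8, cnn=lambda img: img)  # identity stand-in
```

Applying the CNN coil by coil, as in the list comprehension above, keeps the module independent of the number of coils, mirroring the cascades.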
3.4 E2E-VarNet model architecture
As previously described, our model takes the masked multi-coil k-space $\tilde{k}$ as input. First, we apply the SME module to $\tilde{k}$ to compute the sensitivity maps. Next, we apply a series of $T$ cascades, each of which applies the function in equation 10, to the input k-space to obtain the final k-space representation $k^T$. This final k-space representation is converted to image space by applying an inverse Fourier transform followed by a root-sum-of-squares (RSS) reduction for each pixel:

$$\hat{x} = \text{RSS}(\hat{x}_1, \ldots, \hat{x}_N) = \sqrt{\sum_{i=1}^{N} \left|\hat{x}_i\right|^2}, \tag{13}$$

where $\hat{x}_i = \mathcal{F}^{-1}(k_i^T)$ and $k_i^T$ is the k-space representation for coil $i$. The model is illustrated in figure 1. All of the parameters of the network, including the parameters of the CNN in the SME module and the parameters of the CNN in each cascade along with the $\eta^t$ values, are estimated from the training data by minimizing the structural similarity loss $J(\hat{x}, x^*) = -\text{SSIM}(\hat{x}, x^*)$, where SSIM is the Structural Similarity index [17] and $\hat{x}$, $x^*$ are the reconstruction and ground truth images, respectively.
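The final RSS reduction and the SSIM training loss can be sketched in NumPy as follows. The SSIM here is a simplified assumption: a uniform 7×7 window with the common constants $K_1 = 0.01$ and $K_2 = 0.03$, the data range taken from the ground-truth image, and the score averaged over all valid windows. It is not necessarily the exact variant used to train the model, and the k-space and "ground truth" arrays are random placeholders. `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def ifft2c(k):
    """Centered, orthonormal inverse 2D FFT over the last two axes (stand-in for F^{-1})."""
    k = np.fft.ifftshift(k, axes=(-2, -1))
    return np.fft.fftshift(np.fft.ifft2(k, norm="ortho"), axes=(-2, -1))

def rss(coil_images):
    """Root-sum-of-squares over the coil axis: sqrt(sum_i |x_i|^2) per pixel."""
    return np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=0))

def ssim(a, b, data_range, win=7, k1=0.01, k2=0.03):
    """Simplified SSIM: uniform win x win windows, averaged over all valid positions."""
    pa = sliding_window_view(a, (win, win))
    pb = sliding_window_view(b, (win, win))
    mu_a, mu_b = pa.mean(axis=(-2, -1)), pb.mean(axis=(-2, -1))
    var_a, var_b = pa.var(axis=(-2, -1)), pb.var(axis=(-2, -1))
    cov = (pa * pb).mean(axis=(-2, -1)) - mu_a * mu_b
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return float(np.mean(num / den))

def ssim_loss(x_hat, x_star):
    """J(x_hat, x*) = -SSIM(x_hat, x*), with the data range taken from x*."""
    return -ssim(x_hat, x_star, data_range=float(x_star.max()))

rng = np.random.default_rng(0)
k_final = rng.standard_normal((4, 32, 32)) + 1j * rng.standard_normal((4, 32, 32))
x_hat = rss(ifft2c(k_final))                             # per-coil inverse FFT, then RSS
x_star = x_hat + 0.1 * rng.standard_normal(x_hat.shape)  # placeholder "ground truth"
loss = ssim_loss(x_hat, x_star)
```

Because the loss is the negated SSIM, a perfect reconstruction attains the minimum value of $-1$; any mismatch pushes the loss above that.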
4.1 Experimental setup
We designed and validated our method using the multi-coil track of the fastMRI dataset [18], which is a large, open dataset of knee and brain MRIs. To validate the various design choices we made, we evaluated the following models on the knee dataset:
Variational network [4] (VN)
Variational network with the shallow CNNs replaced with U-Nets (VNU)
Similar to VNU, but with k-space intermediate quantities (VNU-K)
Our proposed end-to-end variational network model (E2E-VN)
The VN model employs shallow convolutional networks with RBF kernels that have about 150K parameters in total. VNU replaces these shallow networks with U-Nets to ensure a fair comparison with our model. VNU-K is similar to our proposed model but uses fixed sensitivity maps computed using classical parallel imaging methods. The difference in reconstruction quality between VNU and VNU-K shows the value of using k-space intermediate quantities for reconstruction, while the difference between VNU-K and E2E-VN shows the importance of learning sensitivity maps as part of the network. We used the same model architecture and training procedure for the VN model as in the original VarNet [4] paper. For each of the other models, we used a stack of cascades containing a total of about 29.5M parameters. The E2E-VN model contained an additional 0.5M parameters in the SME module, bringing the total number of parameters to 30M. We trained these models using the Adam optimizer with a learning rate of 0.0003 for 50 epochs, without using any regularization or data augmentation techniques. We used two types of undersampling masks: equispaced masks $M_e(a, l)$, which sample $l$ low-frequency lines from the center of k-space and every $a$-th line from the remaining k-space; and random masks $M_r(a, f)$, which sample a fraction $f$ of the full width of k-space for the ACS lines in addition to a subset of higher frequency lines, selected uniformly at random, to make the overall acceleration equal to $a$. These random masks are identical to those used in [18]. We also use equispaced masks because they are easier to implement on MRI machines.
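The two mask families can be sketched as column masks over the phase-encoding direction. These are simplified, hypothetical implementations written for illustration: `equispaced_mask` keeps `num_low` central columns plus every `accel`-th column, and `random_mask` keeps a central band covering `center_fraction` of the width and then draws exactly enough additional columns uniformly at random to keep `w // accel` columns in total. The fastMRI reference masks may differ in details such as the offset of the equispaced lines and how the random lines are drawn; the width 368 below is only an example value.

```python
import numpy as np

def equispaced_mask(w, accel, num_low):
    """Hypothetical M_e(a, l): `num_low` central columns plus every `accel`-th column."""
    mask = np.zeros(w, dtype=bool)
    mask[::accel] = True
    c = w // 2
    mask[c - num_low // 2 : c + (num_low + 1) // 2] = True
    return mask

def random_mask(w, accel, center_fraction, rng):
    """Hypothetical M_r(a, f): a central ACS band covering `center_fraction` of the
    width, plus randomly chosen outer columns so that w // accel columns are kept."""
    num_low = int(round(w * center_fraction))
    mask = np.zeros(w, dtype=bool)
    c = w // 2
    mask[c - num_low // 2 : c + (num_low + 1) // 2] = True
    num_extra = max(w // accel - num_low, 0)
    outer = np.flatnonzero(~mask)  # candidate higher-frequency columns
    mask[rng.choice(outer, size=num_extra, replace=False)] = True
    return mask

rng = np.random.default_rng(0)
m_eq = equispaced_mask(368, accel=4, num_low=30)
m_rand = random_mask(368, accel=4, center_fraction=0.08, rng=rng)
```

For the equispaced mask the overall acceleration ends up slightly below `accel`, because the central band adds columns on top of the regular grid; the random variant instead fixes the total column count directly.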
Tables 1 and 2 show the results of our experiments for equispaced and random masks, respectively, over a range of undersampling mask parameters. The VNU model outperforms the baseline VN model by a large margin due to its larger capacity and the multi-scale modeling ability of the U-Nets. VNU-K outperforms VNU, demonstrating the value of using k-space intermediate quantities. E2E-VN outperforms VNU-K, showing the importance of learning sensitivity maps as part of the network. It is worth noting that the relative ordering of the models does not depend on the type of mask or the mask parameters. Some example reconstructions are shown in figure 2.
4.2.1 Significance of learning sensitivity maps
Figure 3 shows the SSIM values for each model with various equispaced mask parameters. In all cases, learning the sensitivity maps improves the SSIM score. Notably, this improvement in SSIM is larger when the number of low-frequency lines is smaller. As previously stated, the quality of the estimated sensitivity maps tends to be poor when there are few ACS lines, which degrades the final reconstruction quality. The E2E-VN model is able to overcome this limitation and generate good reconstructions even with a small number of ACS lines.
4.2.2 Experiments on test data
Table 3 (columns: Dataset, Model, 4× Acceleration, 8× Acceleration)
Table 3 shows our results on the test datasets for both brain and knee MRIs compared with the best models on the fastMRI leaderboard (http://fastmri.org/leaderboards). To obtain these results, we used the same training procedure as in our previous experiments, except that we trained on both the training and validation sets for 100 epochs. We used the same type of masks as the fastMRI paper. Our model outperforms all other models published on the fastMRI leaderboard for both anatomies.
In this paper, we introduced End-to-End Variational Networks for multi-coil MRI reconstruction. While MRI reconstruction can be posed as an inverse problem, multi-coil MRI reconstruction is particularly challenging because the forward process (which is determined by the sensitivity maps) is not completely known. We alleviate this problem by estimating the sensitivity maps within the network and learning fully end-to-end. Further, we explored the architecture space to identify the best neural network layers and intermediate representation for this problem, which allowed our model to obtain new state-of-the-art results on both brain and knee MRIs. The quantitative measures we have used only provide a rough estimate of the quality of the reconstructions. Many clinically important details tend to be subtle and limited to small regions of the MR image. Rigorous clinical validation needs to be performed before such methods can be used in clinical practice to ensure that there is no degradation in the quality of diagnosis.
[1] (2006) Compressive sampling. In Proceedings of the International Congress of Mathematicians, Vol. 3, pp. 1433–1452.
[2] (2006) Compressed sensing. IEEE Transactions on Information Theory 52 (4), pp. 1289–1306.
[3] (2002) Generalized autocalibrating partially parallel acquisitions (GRAPPA). Magnetic Resonance in Medicine 47 (6), pp. 1202–1210.
[4] (2018) Learning a variational network for reconstruction of accelerated MRI data. Magnetic Resonance in Medicine.
[5] (2019) Deep learning methods for parallel magnetic resonance image reconstruction. CoRR abs/1904.01112.
[6] (2019) Deep MRI reconstruction: unrolled optimization algorithms meet neural networks. arXiv preprint arXiv:1907.11711.
[7] (2007) Sparse MRI: the application of compressed sensing for rapid MR imaging. Magnetic Resonance in Medicine 58 (6).
[8] (1999) SENSE: sensitivity encoding for fast MRI. Magnetic Resonance in Medicine 42 (5).
[9] (2019) i-RIM applied to the fastMRI challenge. arXiv preprint arXiv:1910.08952.
[10] (2019) Invert to learn to invert. In Advances in Neural Information Processing Systems, pp. 444–454.
[11] (2015) U-Net: convolutional networks for biomedical image segmentation. In MICCAI 2015: Medical Image Computing and Computer-Assisted Intervention, Lecture Notes in Computer Science, Vol. 9351, pp. 234–241.
[12] (2018) A deep cascade of convolutional neural networks for dynamic MR image reconstruction. IEEE Transactions on Medical Imaging 37 (2).
[13] (1997) Simultaneous acquisition of spatial harmonics (SMASH): fast imaging with radiofrequency coil arrays. Magnetic Resonance in Medicine 38 (4).
[14] (2019) GrappaNet: combining parallel imaging with deep learning for multi-coil MRI reconstruction. arXiv preprint arXiv:1910.12325.
[15] ESPIRiT - an eigenvalue approach to autocalibrating parallel MRI: where SENSE meets GRAPPA. Magnetic Resonance in Medicine 71 (3).
[16] (2004) Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13 (4), pp. 600–612.
[17] (2003) Multiscale structural similarity for image quality assessment. In Asilomar Conference on Signals, Systems & Computers.
[18] (2018) fastMRI: an open dataset and benchmarks for accelerated MRI. CoRR abs/1811.08839.
6 Supplementary Materials
6.1 Dithering as post-processing
The Structural Similarity (SSIM) loss [16] we used to train our models tends to produce overly smooth reconstructions even when all of the diagnostic content is preserved. We noticed similar behavior with other frequently used loss functions such as mean squared error and mean absolute error. Sriram et al. [14] found that dithering the image by adding a small amount of random Gaussian noise helped enhance the perceived sharpness of their reconstructions. We found that the same kind of dithering helped improve the sharpness of our reconstructions, but we tuned the scale of the noise by manual inspection. Similar to [14], we adjusted the scale of the added noise to the brightness of the image around each pixel to avoid obscuring dark areas of the reconstruction. Specifically, we first normalize the image by dividing each pixel by the maximum pixel intensity. Then we add zero-mean random Gaussian noise to each pixel. The standard deviation of the noise at a given pixel location is equal to $\sigma$ times the square root of the local median computed over a patch of pixels around that pixel location. We use one value of $\sigma$ for the brain images and non-fat-suppressed knee images, and another for the fat-suppressed knee images. Example reconstructions with and without noise are shown in Figures 4, 5, 6 and 7. The dithered images look more natural, especially at higher accelerations.
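The brightness-adaptive dithering can be sketched as below. The function name `dither`, its `sigma` and `patch` arguments, the reflect padding at the borders, and the example values are assumptions for illustration; `sigma` and `patch` are free parameters here. The sketch follows the steps described above: normalize by the maximum intensity, compute a local median over a patch around each pixel, and add zero-mean Gaussian noise whose standard deviation is $\sigma$ times the square root of that median. `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def dither(image, sigma, patch=11, rng=None):
    """Add brightness-adaptive Gaussian noise to a non-negative magnitude image.

    `sigma` and the odd `patch` size are hypothetical tuning parameters."""
    rng = np.random.default_rng() if rng is None else rng
    img = image / image.max()                      # normalize by the max intensity
    pad = patch // 2
    padded = np.pad(img, pad, mode="reflect")      # so every pixel gets a full patch
    windows = sliding_window_view(padded, (patch, patch))
    local_median = np.median(windows, axis=(-2, -1))
    std = sigma * np.sqrt(local_median)            # per-pixel noise scale
    return img + rng.normal(0.0, 1.0, size=img.shape) * std

rng = np.random.default_rng(0)
magnitude = np.abs(rng.standard_normal((64, 64)))  # placeholder magnitude image
dithered = dither(magnitude, sigma=0.05, patch=11, rng=rng)  # arbitrary example values
```

Because the noise scale follows the local median, uniformly dark regions (median 0) receive no noise at all, which is how this scheme avoids obscuring dark areas of the reconstruction.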