I. Introduction
Dynamic MR imaging is a noninvasive imaging technique that provides both spatial and temporal information about the underlying anatomy. Nevertheless, physiological and hardware constraints make it suffer from slow imaging speed and long scan times, which may cause patient discomfort and sometimes severe motion artifacts. Therefore, accelerating dynamic MR imaging is of great necessity.
To accelerate dynamic MR scans, there have been three main directions of effort: developing physics-based fast imaging sequences [1], hardware-based parallel imaging techniques [2], and signal-processing-based MR image reconstruction methods from incomplete k-space data. Our specific focus here is undersampled MR image reconstruction, which requires prior information to remove the aliasing artifacts caused by violating the Nyquist sampling theorem. Specifically, the reconstruction task is normally formulated as an optimization problem with two terms, i.e., data fidelity and prior regularization. A popular prior is sparsity, which prompts the image to be sparsely represented in a certain transform domain while being reconstructed from incoherently undersampled k-space data. These techniques are well known as compressed sensing MRI (CS-MRI) [3, 4]. For example, k-t FOCUSS [5] is asymptotically optimal from the perspective of compressed sensing theory, using a FOCUSS algorithm with a random k-t sampling pattern; it encompasses the celebrated k-t BLAST and k-t SENSE [6] as special cases. Liang et al. [7] proposed a k-t iterative support detection (k-t ISD) method to improve CS reconstruction for dynamic MR. Low-rankness is another popular prior regularization: low-rank and incoherence conditions can be used to complete missing or corrupted entries of a matrix. A typical low-rank example is L+S [8], which takes each temporal frame as a column of the space-time matrix for dynamic MRI, and k-t SLR [9] exploits the correlations in the dynamic imaging dataset by modeling the data to have a compact representation in the Karhunen-Loève transform (KLT) domain. Dictionary learning (DL) has also been investigated: DLTG [10] learns redundancy in the data via an auxiliary constraint on temporal gradient (TG) sparsity, and Wang et al. [11] employed a patch-based 3D spatiotemporal dictionary for sparse representations of dynamic image sequences.
These methods have made great progress in dynamic imaging and achieved improved results. Nevertheless, they draw prior knowledge only from limited samples. Furthermore, the reconstruction is iterative and sometimes time-consuming.
On the other hand, deep learning has shown great potential for accelerating MR imaging. Quite a few methods have been proposed recently, which can be roughly categorized into two types: model-based unrolling methods [12, 13, 14] and end-to-end learning methods [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]. Model-based unrolling methods formulate the iterative procedure of a traditional optimization algorithm as network learning, adaptively learning all the parameters of the regularization terms and transforms in the model by network training. For example, in VN-Net [13], generalized compressed sensing reconstruction formulated as a variational model is embedded in an unrolled gradient descent scheme. ADMM-Net [12] is defined over a data-flow graph derived from the iterative procedure of the Alternating Direction Method of Multipliers (ADMM) algorithm for optimizing a CS-based MRI model. The other type utilizes big data to learn a network that maps between undersampled and fully sampled data pairs. Wang et al. [15] trained a deep convolutional neural network (CNN) to learn the mapping between undersampled and fully sampled brain MR images. AUTOMAP [18] learns a mapping between the sensor domain and the image domain from appropriate training data. Despite these successes, only two works have specifically targeted dynamic MR imaging [22, 23]. Both use a cascade of neural networks to learn the mapping between undersampled and fully sampled cardiac MR images; for example, a deep cascade of convolutional neural networks (DCCNN) [22] is designed for dynamic MR reconstruction. Both works make great contributions to dynamic MR imaging. Nevertheless, the reconstruction results can still be improved based on two observations. Firstly, they adopt a single supervised loss function that only considers the fidelity between the final output and the ground truth; the intermediate results are not utilized for supervision. Secondly, only a data-consistency layer is considered for direct k-space correction. Previous research has shown that combining k-space-domain networks and image-domain networks is superior to single-domain CNNs
[19]. There is still more valuable prior knowledge regarding k-space and different levels of reconstruction to be utilized for accurate MR image reconstruction. In this work, we propose a DynamIc MR imaging method with both kspacE aNd Spatial prior knowledge integrated via multIsupervised netwOrk traiNing, dubbed DIMENSION. The improvements are mainly reflected in the cross-domain network structure and the multi-supervised loss function strategy. Our contributions can be summarized as follows:

1) We propose a dynamic MR imaging method with both k-space and spatial prior knowledge integrated via multi-supervised network training, which sufficiently combines frequency-domain and spatial-domain information.

2) We propose a multi-supervised loss function strategy that constrains the frequency-domain information and the reconstruction results at different levels. This strategy yields a better network-predicted k-space through frequency-domain learning and also prompts the reconstruction results at different levels of the spatial-domain learning to be closer to the fully sampled MR images.

3) Experimental results show that the proposed method is superior to conventional CS-based methods such as k-t FOCUSS, k-t SLR and L+S, as well as the state-of-the-art CNN-based method DCCNN. These results demonstrate the effectiveness of cross-domain learning and the multi-supervised loss function strategy for cardiac MR imaging.
II. Methodology
II-A. CS-MRI and CNN-MRI
According to compressed sensing (CS) theory [3, 4], MR images with a sparse representation in some transform domain can be reconstructed from randomly undersampled k-space data. Let $S$ denote a complex-valued dynamic MR image. The acquisition can be described by the following formula:
$$ y = F_u S + n \qquad (1) $$
where $y$ denotes the undersampled measurements in k-space, in which the unsampled points are filled with zeros, $F_u$ is an undersampled Fourier encoding matrix, and $n$ is the acquisition noise. We want to reconstruct $S$ by solving the inverse problem of Eq. 1. However, this inverse problem is ill-posed, so the reconstruction is not unique. In order to reconstruct $S$, we constrain the inverse problem with prior knowledge and solve the following optimization problem:
$$ \min_S \; \|F_u S - y\|_2^2 + \lambda R(S) \qquad (2) $$
The first term is the data fidelity, which ensures that the k-space of the reconstruction is consistent with the actual measurements. The second term $R(S)$ is often referred to as the prior regularization. In CS methods, $R(S)$ is usually a sparse prior on $S$ in some transform domain, e.g., finite differences, the wavelet transform or the discrete cosine transform.
In CNN-based methods, $R(S)$ is a CNN prior on $S$, which forces $S$ to match the output of the network:
$$ \min_S \; \|F_u S - y\|_2^2 + \lambda \, \|S - f_{\mathrm{CNN}}(S_u \mid \theta)\|_2^2 \qquad (3) $$
where $S_u$ is the undersampled image and $f_{\mathrm{CNN}}(S_u \mid \theta)$ is the output of the network with parameters $\theta$. The training process of the network finds the optimal parameters $\theta$. Once the network is trained, its output is the desired reconstruction.
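As a minimal illustration of the forward model in Eq. (1), the following NumPy sketch simulates Cartesian undersampling and the zero-filled baseline reconstruction that the prior in Eq. (2)/(3) is meant to improve upon. The function names and the toy image are ours, not from the paper.

```python
import numpy as np

def undersample(image, mask):
    """Simulate Eq. (1): measure k-space as mask * F(S); unsampled
    entries are left as zeros (zero-filling)."""
    return np.fft.fft2(image, norm="ortho") * mask

def zero_filled_recon(y):
    """Baseline reconstruction: inverse FFT of the zero-filled k-space."""
    return np.fft.ifft2(y, norm="ortho")

# Toy demonstration with a random "image" and a ~30% random sampling mask.
rng = np.random.default_rng(0)
S = rng.standard_normal((64, 64))
mask = rng.random((64, 64)) < 0.3
S_zf = zero_filled_recon(undersample(S, mask))  # aliased approximation of S
```

With a fully sampled mask the round trip is exact (up to floating point), which is why all of the error comes from the missing k-space lines.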
II-B. The Proposed Method
II-B.1 The Proposed DIMENSION Network
In this work, we propose a convolutional neural network termed DIMENSION for cardiac MR image reconstruction, shown in Fig. 1. The DIMENSION network consists of two main parts: a frequency-domain network, termed FDN, which updates the k-space with its network prediction, and a spatial-domain network, termed SDN, which extracts high-level features of the images. The FDN and the SDN are connected by a Fourier inversion (see Inverse Fast Fourier Transform (IFFT) in Fig. 1). Specifically, the FDN consists of $N_k$ frequency-domain blocks. Each block contains $L$ 3D convolutional layers and a k-space domain data consistency (KDC) layer. The forward pass starts from the undersampled k-space $k_0$. The forward-pass equations of the first block in the FDN can be described as:
$$ H_1^{(1)} = \sigma\big(W_1^{(1)} * k_0 + b_1^{(1)}\big), \quad H_l^{(1)} = \sigma\big(W_l^{(1)} * H_{l-1}^{(1)} + b_l^{(1)}\big),\; l = 2, \dots, L-1, \quad \bar{k}_1 = W_L^{(1)} * H_{L-1}^{(1)} + b_L^{(1)}, \quad \hat{k}_1 = \mathrm{KDC}(\bar{k}_1) \qquad (4) $$
The forward-pass equations of the later blocks ($m = 2, \dots, N_k$) in the FDN can be described as:
$$ H_1^{(m)} = \sigma\big(W_1^{(m)} * \hat{k}_{m-1} + b_1^{(m)}\big), \quad H_l^{(m)} = \sigma\big(W_l^{(m)} * H_{l-1}^{(m)} + b_l^{(m)}\big),\; l = 2, \dots, L-1, \quad \bar{k}_m = W_L^{(m)} * H_{L-1}^{(m)} + b_L^{(m)}, \quad \hat{k}_m = \mathrm{KDC}(\bar{k}_m) \qquad (5) $$
where the KDC is defined as the following equation:
$$ \hat{k}_m(j) = \begin{cases} \dfrac{\bar{k}_m(j) + v\, k_0(j)}{1 + v}, & j \in \Omega \\[4pt] \bar{k}_m(j), & j \notin \Omega \end{cases} \qquad (6) $$
$W_l^{(m)}$ and $b_l^{(m)}$ are the convolution filters and biases of the $l$-th layer in the $m$-th block of the FDN, with $l = 1, \dots, L$. $H_l^{(m)}$ denotes the output of the $l$-th convolutional layer in the $m$-th block of the FDN. Each convolutional layer is followed by an activation function $\sigma$ for nonlinearity, except for the last layer, which projects the extracted features back to the k-space domain. After the convolution operations, the k-space data consistency layer corrects the network-predicted k-space with the actually sampled k-space as shown in Eq. 6, where $\hat{k}_m$ denotes the correction of $\bar{k}_m$. The set of indices sampled in k-space is denoted $\Omega$. If a k-space index $j$ is in $\Omega$, the prediction $\bar{k}_m(j)$ is corrected with the actually sampled value $k_0(j)$. The parameter $v$ controls the degree of data consistency; if $v \to \infty$, the predictions at sampled locations are replaced directly by the actually sampled points. The final output of the FDN is $\hat{k}_{N_k}$. The inverse Fourier transform of $\hat{k}_{N_k}$ is then performed to obtain the MR image, which is also the input of the SDN:
$$ \tilde{S}_0 = \mathcal{F}^{-1}\big(\hat{k}_{N_k}\big) \qquad (7) $$
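The KDC rule of Eq. (6) can be sketched in a few lines of NumPy; this is our illustrative version (the real layer operates on TensorFlow tensors inside the network), with the function name `kdc` and argument names being our own:

```python
import numpy as np

def kdc(k_pred, k0, mask, v=np.inf):
    """k-space data consistency, Eq. (6).
    k_pred: network-predicted k-space; k0: measured (zero-filled) k-space;
    mask: True at sampled indices (the set Omega); v: consistency weight.
    v -> inf replaces the sampled entries directly with the measurements."""
    if np.isinf(v):
        return np.where(mask, k0, k_pred)
    # Finite v blends prediction and measurement at sampled locations only.
    return np.where(mask, (k_pred + v * k0) / (1.0 + v), k_pred)
```

Unsampled locations always keep the network prediction, so the layer never destroys the information the CNN has filled in.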
The SDN consists of $N_s$ image-domain blocks, each of which contains $L$ 3D convolutional layers, a residual connection [26] and an image-domain data consistency (IDC) layer. The forward-pass equations of the first block in the SDN are:

$$ G_1^{(1)} = \sigma\big(V_1^{(1)} * \tilde{S}_0 + c_1^{(1)}\big), \quad G_l^{(1)} = \sigma\big(V_l^{(1)} * G_{l-1}^{(1)} + c_l^{(1)}\big),\; l = 2, \dots, L-1, \quad \bar{S}_1 = \tilde{S}_0 + V_L^{(1)} * G_{L-1}^{(1)} + c_L^{(1)}, \quad \tilde{S}_1 = \mathrm{IDC}(\bar{S}_1) \qquad (8) $$
The forward-pass equations of the later blocks ($n = 2, \dots, N_s$) in the SDN can be described as:
$$ G_1^{(n)} = \sigma\big(V_1^{(n)} * \tilde{S}_{n-1} + c_1^{(n)}\big), \quad G_l^{(n)} = \sigma\big(V_l^{(n)} * G_{l-1}^{(n)} + c_l^{(n)}\big),\; l = 2, \dots, L-1, \quad \bar{S}_n = \tilde{S}_{n-1} + V_L^{(n)} * G_{L-1}^{(n)} + c_L^{(n)}, \quad \tilde{S}_n = \mathrm{IDC}(\bar{S}_n) \qquad (9) $$
where the IDC is defined as the following equations:
$$ m_n = \mathcal{F}(\bar{S}_n), \qquad \hat{m}_n(j) = \begin{cases} \dfrac{m_n(j) + v\, k_0(j)}{1 + v}, & j \in \Omega \\[4pt] m_n(j), & j \notin \Omega \end{cases} \qquad \tilde{S}_n = \mathcal{F}^{-1}(\hat{m}_n) \qquad (10) $$
$V_l^{(n)}$ and $c_l^{(n)}$ are the convolution filters and biases of the $l$-th layer in the $n$-th block of the SDN, with $l = 1, \dots, L$. $G_l^{(n)}$ denotes the output of the $l$-th convolutional layer in the $n$-th block. As in the FDN, each convolutional layer is followed by the activation function $\sigma$, except for the last convolutional layer, which projects the extracted features back to the image domain. After the convolution operations, a residual connection follows, which sums the output of each block with its input; $\bar{S}_n$ is the result of this residual learning. Then the image-domain data consistency (IDC) is performed on $\bar{S}_n$ to obtain the corrected $\tilde{S}_n$, as shown in Eq. 10. There are three steps, which respectively represent the fast Fourier transform (FFT), the k-space correction, and the inverse fast Fourier transform (IFFT). Note that for the KDC layer no such frequency/spatial-domain transformations are necessary, because its input is already a prediction of the k-space. Through the above processes, we obtain the reconstruction results $\tilde{S}_1, \dots, \tilde{S}_{N_s}$ of the different blocks. They come from different network depths, so they correspond to different reconstruction levels. For example, $\tilde{S}_1$ comes from the shallowest block, so its reconstruction level is the lowest; $\tilde{S}_{N_s}$ is the final output of the SDN, which is at the highest reconstruction level.
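The three-step IDC layer of Eq. (10) differs from the KDC only in the Fourier round trip around the correction. A hedged NumPy sketch (function and argument names are ours):

```python
import numpy as np

def idc(x_pred, k0, mask, v=np.inf):
    """Image-domain data consistency, Eq. (10), in three steps:
    (1) FFT the predicted image, (2) correct the sampled k-space entries
    against the measurements k0 (same rule as the KDC layer), (3) IFFT back."""
    k_pred = np.fft.fft2(x_pred, norm="ortho")           # step 1: FFT
    if np.isinf(v):
        k_corr = np.where(mask, k0, k_pred)              # step 2: correction
    else:
        k_corr = np.where(mask, (k_pred + v * k0) / (1.0 + v), k_pred)
    return np.fft.ifft2(k_corr, norm="ortho")            # step 3: IFFT
```

With an all-True mask and $v \to \infty$ the output is exactly the measured image $\mathcal{F}^{-1}(k_0)$; with an all-False mask the layer is an identity on the prediction.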
In [22], the DCCNN model was proposed to reconstruct undersampled cardiac MR images. It consists of several image-domain blocks, each of which contains convolutional layers, a residual connection and an image-domain data consistency (IDC) layer. Although favorable reconstruction quality has been achieved, there is still valuable prior knowledge regarding k-space that can be learned and utilized to improve MR image reconstruction. The DIMENSION model introduces k-space learning into the network, so that the network not only extracts features in the spatial domain but also makes better use of k-space prior knowledge.
II-B.2 The Proposed Multi-Supervised Loss
In this work, we introduce a multi-supervised loss function strategy. The multi-supervised loss is composed of a primary loss, a k-space loss and a spatial loss, as shown in Fig. 2. In the usual CNN-MRI models, only the primary loss exists, which measures the distance between the reconstruction and the ground truth. In this work, the k-space loss and the spatial loss are proposed to constrain the frequency-domain information and the reconstruction results at different levels. Let $k_{gt}$ be the fully sampled k-space. The k-space loss can be expressed as:
$$ \mathcal{L}_{k\text{-}space} = \alpha \, \big\| \hat{k}_{N_k} - k_{gt} \big\|_2^2 \qquad (11) $$
And the spatial loss can be expressed as the formula:
$$ \mathcal{L}_{spatial} = \sum_{n=1}^{N_s - 1} \tau_n \big\| \tilde{S}_n - S \big\|_2^2 \qquad (12) $$
where $\alpha$ and $\tau_n$ are the respective nonnegative weights of each loss. The primary loss is the mean square error (MSE) between the final reconstruction $\tilde{S}_{N_s}$ and the corresponding fully sampled image $S$:
$$ \mathcal{L}_{primary} = \big\| \tilde{S}_{N_s} - S \big\|_2^2 \qquad (13) $$
Finally, the total training loss is the sum of these three terms:
$$ \mathcal{L}_{total} = \mathcal{L}_{primary} + \mathcal{L}_{k\text{-}space} + \mathcal{L}_{spatial} \qquad (14) $$
The weights in Eq. 14 will be discussed in a later section. To facilitate a better understanding of this model, we provide an intuitive explanation of the k-space loss and the spatial loss.
The k-space loss: the k-space points can be divided into two subsets: if a point $k(j)$ has already been sampled, $j$ is a member of the set $\Omega$; otherwise $j \notin \Omega$. The purpose of the FDN is to update the k-space with its network prediction. The quality of this prediction directly affects the image-domain feature extraction in the SDN and the final reconstruction. Therefore, it is necessary to enforce fidelity between the network-predicted k-space and the fully sampled k-space. The combination of the KDC and the k-space loss improves the quality of the predicted k-space: if $j \in \Omega$, the KDC layer ensures that the predicted k-space is consistent with the actually sampled k-space; if $j \notin \Omega$, the k-space loss makes the unsampled k-space as close as possible to the fully sampled k-space.

The spatial loss: the outputs of the spatial blocks at different network depths can be regarded as reconstructions at different levels, and the final output of the entire network can be seen as the final-level reconstruction. However, previous studies only constrain the final-level reconstruction and use it as the supervision of the entire network. Such approaches do not take full advantage of the reconstruction information at the other levels. Here, we put forward a spatial loss, which constrains the reconstructions at different levels to be closer to the ground truth. As can be seen in Eq. 14, the reconstruction at each level contributes to the final result.
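The composition of Eqs. (11)-(14) can be written compactly as below. This is our illustrative NumPy sketch: the function name and the default weights are placeholders, not the tuned values of the paper.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two (possibly complex) arrays."""
    return np.mean(np.abs(a - b) ** 2)

def total_loss(k_pred, k_gt, inter_recons, final_recon, S,
               alpha=1.0, taus=(1.0, 1.0, 1.0)):
    """Multi-supervised loss, Eq. (14): primary MSE on the final output
    (Eq. 13) + alpha-weighted k-space MSE (Eq. 11) + tau-weighted MSEs on
    the intermediate reconstructions (Eq. 12). alpha/taus defaults here
    are placeholders; the paper tunes them (Section IV-A)."""
    primary = mse(final_recon, S)
    k_loss = alpha * mse(k_pred, k_gt)
    spatial = sum(t * mse(x, S) for t, x in zip(taus, inter_recons))
    return primary + k_loss + spatial
```

Setting `alpha = 0` and `taus = (0, 0, 0)` recovers the usual single-supervised primary loss of Eq. (18).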
III. Experimental Results
III-A. Setup
III-A.1 Data Acquisition
We collected 101 fully sampled cardiac MR datasets on a 3T scanner (SIEMENS MAGNETOM Trio) with a T1-weighted FLASH sequence. Written informed consent was obtained from all human subjects. Each scan contains a single-slice FLASH acquisition with 25 temporal frames. The following parameters were used for the FLASH scans: FOV mm, acquisition matrix , slice thickness = 6 mm, TR = 3 ms, TE = 50 ms and 24 receiving coils. The raw multi-coil data of each frame were combined by the adaptive coil combination method [27] to produce a single-channel complex-valued image. We randomly selected 90% of the entire dataset for training and 10% for testing. Deep learning has a high demand for data volume [28]; therefore, some data augmentation strategies were applied. We sheared the original images along the x, y and temporal directions with strides of 7, 7 and 5, respectively. Finally, we obtained 17500 3D complex-valued cardiac MR data of size 117×120×6. For each frame, the original k-space was retrospectively undersampled with 6 ACS lines. Specifically, we fully sample the frequency encodes (along $k_x$) and randomly undersample the phase encodes (along $k_y$) according to a zero-mean Gaussian variable density function [5].
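A Cartesian variable-density mask of this kind can be generated as follows. This is an illustrative sketch under our own parameterization (`sigma_frac` and the exact line budget are assumptions, not the paper's settings):

```python
import numpy as np

def vd_mask(ny, nx, accel=4, acs=6, sigma_frac=0.25, seed=0):
    """Cartesian variable-density mask: fully sample the frequency encodes
    (kx), randomly pick phase encodes (ky) with a zero-mean Gaussian density,
    and always keep `acs` central auto-calibration lines."""
    rng = np.random.default_rng(seed)
    center = ny // 2
    # Gaussian sampling density peaked at the k-space center.
    prob = np.exp(-0.5 * ((np.arange(ny) - center) / (sigma_frac * ny)) ** 2)
    prob /= prob.sum()
    n_lines = max(ny // accel - acs, 0)
    chosen = rng.choice(ny, size=n_lines, replace=False, p=prob)
    mask1d = np.zeros(ny, dtype=bool)
    mask1d[chosen] = True
    mask1d[center - acs // 2 : center + (acs + 1) // 2] = True  # ACS block
    return np.repeat(mask1d[:, None], nx, axis=1)  # same lines for every kx
```

Because every selected phase-encode line spans all of kx, the mask is constant along the frequency-encode direction, matching the fully-sampled-readout scheme described above.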
III-A.2 Network Training
For network training, we divide each datum into two channels that store the real and imaginary parts of the data, so the inputs of the network are the undersampled k-spaces and the outputs are the reconstructed images. In this work, we focus on a D5C5 configuration, which works well for the DCCNN model: the D5C5 model consists of five blocks (C5), each with five convolutional layers (D5). In order to match the number of parameters and make a fair comparison with the D5C5 model, the FDN contains one frequency-domain block ($N_k = 1$) and the SDN consists of four image-domain blocks ($N_s = 4$); every block contains five convolutional layers ($L = 5$). Therefore, both the proposed model and the D5C5 model have 25 convolutional layers in total. The details of the FDN and the SDN are shown in Tables I and II, respectively. He initialization [29] was used to initialize the network weights. Rectified Linear Units (ReLU) [30] were selected as the nonlinear activation functions. The mini-batch size was 20. An exponentially decaying learning rate [31] was used in all CNN-based experiments; the initial learning rate was set to 0.0001 with a decay of 0.95. All models were trained with the Adam optimizer [32] with its momentum parameters $\beta_1$ and $\beta_2$.

Layer of K5  Input size  Number of filters  Filter size  Stride  Activation  Output
Complex conv1  117*120*6*2  64  3*3*3  [1, 1, 1]  ReLU  117*120*6*64 
Complex conv2  117*120*6*64  64  3*3*3  [1, 1, 1]  ReLU  117*120*6*64 
Complex conv3  117*120*6*64  64  3*3*3  [1, 1, 1]  ReLU  117*120*6*64 
Complex conv4  117*120*6*64  64  3*3*3  [1, 1, 1]  ReLU  117*120*6*64 
Complex conv5  117*120*6*64  2  3*3*3  [1, 1, 1]  None  117*120*6*2 
Kspace data consistency  117*120*6*2  /  /  /  /  117*120*6*2 
Layer of D5C4  Input size  Number of filters  Filter size  Stride  Activation  Output 
Complex conv1  117*120*6*2  64  3*3*3  [1, 1, 1]  ReLU  117*120*6*64 
Complex conv2  117*120*6*64  64  3*3*3  [1, 1, 1]  ReLU  117*120*6*64 
Complex conv3  117*120*6*64  64  3*3*3  [1, 1, 1]  ReLU  117*120*6*64 
Complex conv4  117*120*6*64  64  3*3*3  [1, 1, 1]  ReLU  117*120*6*64 
Complex conv5  117*120*6*64  2  3*3*3  [1, 1, 1]  None  117*120*6*2 
Residual  117*120*6*2  /  /  /  /  117*120*6*2 
Kspace data consistency  117*120*6*2  /  /  /  /  117*120*6*2 
The models were implemented on an Ubuntu 16.04 LTS (64-bit) operating system equipped with an Intel Xeon E5-2640 Central Processing Unit (CPU) and a Tesla TITAN Xp Graphics Processing Unit (GPU, 12 GB memory) in the open framework TensorFlow [33] with CUDA and cuDNN support.

III-A.3 Performance Evaluation
For quantitative evaluation, the mean square error (MSE), peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) [34] were measured as follows:
$$ \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \big| \hat{x}(i) - x(i) \big|^2 \qquad (15) $$
$$ \mathrm{PSNR} = 10 \log_{10} \frac{\max(x)^2}{\mathrm{MSE}} \qquad (16) $$
$$ \mathrm{SSIM}(\hat{x}, x) = \frac{(2\mu_{\hat{x}}\mu_x + c_1)(2\sigma_{\hat{x}x} + c_2)}{(\mu_{\hat{x}}^2 + \mu_x^2 + c_1)(\sigma_{\hat{x}}^2 + \sigma_x^2 + c_2)} \qquad (17) $$
where $\hat{x}$ is the reconstructed image, $x$ denotes the reference image and $N$ is the total number of image pixels. The SSIM index is a multiplicative combination of a luminance term, a contrast term and a structural term (details in [34]).
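The two simpler metrics, Eqs. (15) and (16), can be computed directly in NumPy (SSIM needs local statistics and is typically taken from an image library, so it is omitted from this sketch):

```python
import numpy as np

def mse(rec, ref):
    """Mean squared error over all N pixels, Eq. (15)."""
    return np.mean(np.abs(rec - ref) ** 2)

def psnr(rec, ref):
    """Peak signal-to-noise ratio in dB, Eq. (16),
    using the peak magnitude of the reference image."""
    peak = np.abs(ref).max()
    return 10 * np.log10(peak ** 2 / mse(rec, ref))
```

For example, a reconstruction that is uniformly off by 0.1 from a unit-valued reference has MSE 0.01 and PSNR 20 dB.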
III-B. Does the Frequency-Domain Network Work?
To demonstrate the efficacy of the FDN, we compare the DIMENSION model with the state-of-the-art CNN method D5C5 [22], setting the weights of the k-space loss and the spatial loss to zero. The total loss function in this section is therefore:
$$ \mathcal{L} = \mathcal{L}_{primary} = \big\| \tilde{S}_{N_s} - S \big\|_2^2 \qquad (18) $$
The two models have the same number of network parameters, and for a fair comparison the networks' hyperparameters are also set to be the same. The reconstruction results of the D5C5 and DIMENSION models on the test datasets are shown in Fig. 3. At 4-fold acceleration, one can see that the DIMENSION reconstruction outperforms the D5C5 reconstruction in terms of artifact removal and detail preservation (see the arrows in Fig. 3 (e) and (h)). To explore the reconstruction along the temporal direction, we show the slices along the y and temporal dimensions (y-t). From the y-t images (Fig. 3 (f)), we can see that the reconstruction of the DIMENSION model has a smaller error map, which is consistent with the above conclusion. We also show the quantitative evaluations of the DIMENSION and D5C5 models in Table III.
Models  MSE  PSNR  SSIM
Zero-filling  —  —  —
D5C5  —  —  —
DIMENSION  0.000176  37.5857  0.9846
(Bold values indicate better performance.)
One can see that the DIMENSION model achieves better quantitative results than the D5C5 model (0.000128 lower MSE, 2.4163 dB higher PSNR and 0.071 higher SSIM). The DIMENSION model is therefore superior to the D5C5 model in both visual results and quantitative indicators. This indicates that the DIMENSION model can effectively learn cross-domain information and improve MR reconstruction by using both k-space and spatial prior knowledge. Subsequent experiments are all constructed on the basis of the DIMENSION model.
III-C. Does the K-space Loss Work?
This section explores whether the k-space loss can improve the reconstruction results. The purpose of introducing the k-space loss is to further constrain the FDN to obtain a better network-predicted k-space. The k-space loss is shown in Eq. 11: we select the MSE between the predicted k-space and the fully sampled k-space. The total loss function in this section is:
$$ \mathcal{L} = \mathcal{L}_{primary} + \mathcal{L}_{k\text{-}space} = \big\| \tilde{S}_{N_s} - S \big\|_2^2 + \alpha \big\| \hat{k}_{N_k} - k_{gt} \big\|_2^2 \qquad (19) $$
where $\alpha$ is a hyperparameter, chosen as discussed in Section IV-A. We refer to the DIMENSION model with the k-space loss as DIMENSION-KLoss. The comparison results of the DIMENSION and DIMENSION-KLoss models are shown in Fig. 4.
Obviously, the DIMENSION-KLoss model obtains better results, especially in removing artifacts. We can draw the same conclusion from the y-t images (Fig. 4 (f)). The quantitative measurements can be found in Table IV.
Models  MSE  PSNR  SSIM
Zero-filling  —  —  —
DIMENSION  —  —  —
DIMENSION-KLoss  0.000061  41.8101  0.9939
(Bold values indicate better performance.)
Introducing the k-space loss improves the MSE, PSNR and SSIM (0.000044 lower MSE, 2.3390 dB higher PSNR and 0.0029 higher SSIM). Therefore, the k-space loss can effectively improve cardiac MR reconstruction: under the joint action of the KDC and the k-space loss, a more accurate predicted k-space is obtained.
III-D. Does the Spatial Loss Work?
In this section, we examine whether the spatial loss can improve the MR reconstruction. The spatial loss (Eq. 12) constrains the reconstruction results of the different blocks with individual weights: the outputs of the shallow blocks can be regarded as preliminary reconstructions, and the output of the last block as the final reconstruction. To demonstrate the effectiveness of the spatial loss, we use the combination of the primary loss and the spatial loss as the total loss in this section:
$$ \mathcal{L} = \mathcal{L}_{primary} + \mathcal{L}_{spatial} = \big\| \tilde{S}_{N_s} - S \big\|_2^2 + \sum_{n=1}^{N_s - 1} \tau_n \big\| \tilde{S}_n - S \big\|_2^2 \qquad (20) $$
Here, we choose the weights $\tau_n$ as discussed in Section IV-A and refer to the DIMENSION model with the spatial loss as DIMENSION-SLoss. Fig. 5 shows the reconstruction results of the DIMENSION and DIMENSION-SLoss models.
From the error maps, we can clearly see that the DIMENSION-SLoss model has fewer artifacts and retains details better (as shown by the red and yellow arrows in Fig. 5 (e, h, f5, f7)).
Models  MSE  PSNR  SSIM
Zero-filling  —  —  —
DIMENSION  —  —  —
DIMENSION-SLoss  0.000214  36.6892  0.9764
(Bold values indicate better performance.)
The quantitative measurements can be found in Table V, from which we can see that the spatial loss effectively improves the MSE, PSNR and SSIM (0.000047 lower MSE, 0.8588 dB higher PSNR and 0.0026 higher SSIM). Therefore, the spatial loss can further improve cardiac MR reconstruction.
III-E. Comparison to State-of-the-art Methods
Based on the above sections, the network we use is the DIMENSION model (as shown in Fig. 1) and the multi-supervised loss we select is:
$$ \mathcal{L} = \mathcal{L}_{primary} + \mathcal{L}_{k\text{-}space} + \mathcal{L}_{spatial} \qquad (21) $$
In this section, we demonstrate the effectiveness of the proposed method by comparing it with traditional CS methods (k-t FOCUSS [5], k-t SLR [9], L+S [8]) and the state-of-the-art CNN method (D5C5 [22]). All the CS-based methods use their single-channel versions. For fair comparisons, we tuned the parameters of the CS-MRI methods to their best performance. The reconstruction results of these methods at 4-fold acceleration are shown in Fig. 6. The k-t SLR removes artifacts better than k-t FOCUSS and L+S; however, all three CS-based methods lose more structural details than the CNN-based methods. Compared with the D5C5 method, the proposed method not only retains more details but also removes more artifacts.
We show the evaluation indexes in Table VI.
Models  MSE  PSNR  SSIM
Zero-filling  —  —  —
k-t FOCUSS  —  —  —
k-t SLR  —  —  —
L+S  —  —  —
D5C5  —  —  —
The proposed  0.000086  40.6256  0.9655
(Bold values indicate better performance.)
Note that the CNN-based methods outperform the CS-based methods in all three performance indexes, and the proposed DIMENSION model achieves the best MSE, PSNR and SSIM among all the methods. The reconstruction times of the different methods are shown in Table VII; for fair comparison, all methods were run on the same Intel Xeon E5-2640 CPU. It can be seen that the reconstruction time of the CNN-based methods is significantly shorter than that of the CS-based methods.
Methods  k-t SLR  L+S  D5C5  The proposed 

Running time/s  4.8845 

(Bold values indicate better performance.)
At 8-fold acceleration, the reconstruction results of the different methods are shown in Fig. 7. At high acceleration factors, the single-channel CS-based methods have difficulty reconstructing high-quality images, whereas the CNN-based methods still obtain better reconstructions. Compared with the D5C5 method, the proposed method not only retains the details but also removes the artifacts better; the improvements of the CNN-based methods are more obvious at high acceleration factors. The evaluation indexes are shown in Table VIII. We observe that the proposed method achieves the best MSE, PSNR and SSIM among all the methods.
Models  MSE  PSNR  SSIM
Zero-filling  —  —  —
k-t FOCUSS  —  —  —
k-t SLR  —  —  —
D5C5  —  —  —
DIMENSION  0.000381  32.7019  0.9690
(Bold values indicate better performance.)
IV. Discussion
IV-A. Hyperparameter Selection
The choice of the weights of the k-space loss and the spatial loss influences the reconstruction [28]. In this section, we discuss the selection of $\alpha$ and $\tau_n$.
IV-A.1 The Selection of $\alpha$
In order to find an appropriate $\alpha$, we trained a series of DIMENSION models with different values of $\alpha$ and tested them on the same test set. The average MSE, PSNR and SSIM on the test set are shown in Fig. 8. We can observe the value of $\alpha$ for which the DIMENSION model attains the lowest MSE and the highest PSNR and SSIM, and we pick this value as the appropriate $\alpha$.
IV-A.2 The Selection of $\tau_n$
There are two ways to select $\tau_n$: one is to take the same value for all three weights; the other is to take increasing or decreasing values. Firstly, we consider the case in which every $\tau_n$ takes the same value. We trained a series of DIMENSION models with $\tau_1 = \tau_2 = \tau_3$ set, in turn, to a range of values, and tested all of them on the same test set. The quantitative results are shown in Fig. 9. One setting achieves the best performance on the test set, with the lowest MSE and the highest PSNR and SSIM; we refer to the DIMENSION model with this setting as DIMENSION-SLoss1.
Secondly, we consider the case in which the three weights take increasing or decreasing values. The values of $\tau_n$ are shown in Table IX.
Case 1  Case 2  Case 3  Case 4  Case 5  Case 6  Case 7  Case 8  Case 9  Case 10  

We considered a total of 10 cases, in which $\tau_1, \tau_2, \tau_3$ are increasing in the first five and decreasing in the last five. The quantitative results are shown in Fig. 10.
We find that the ninth case gives the best quantitative results, with the lowest MSE and the highest PSNR and SSIM. We therefore choose the weights of the ninth case as our preferred weights and refer to the DIMENSION model with this setting as DIMENSION-SLoss2.
It is then natural to ask which of these two ways of selecting $\tau_n$ works better. The quantitative and reconstruction results of the DIMENSION-SLoss1 and DIMENSION-SLoss2 models are respectively shown in Fig. 11 and Table X.
Models  MSE  PSNR  SSIM
Zero-filling  —  —  —
DIMENSION-SLoss1  0.000083  40.8160  0.9918
DIMENSION-SLoss2  —  —  —
(Bold values indicate better performance.)
The two models produce similar reconstructions, but the quantitative results of DIMENSION-SLoss1 are slightly better. We therefore choose DIMENSION-SLoss1 as the final DIMENSION-SLoss model.
IV-B. The Limitations of the Proposed Work
Although our proposed method achieves improved reconstruction results for dynamic MR imaging compared with the other methods, a certain degree of smoothing remains in the reconstructed images at high acceleration factors. Part of the reason may be the loss functions used in this work: the MSE loss only measures the mean squared difference between the reconstructed image and the ground truth and cannot perceive image structure. DAGAN [35] couples an adversarial loss with an innovative content loss for CS-MRI reconstruction, which preserves perceptual image details. This inspires us to use loss functions related to structural information in future work. Furthermore, other network structures could improve the reconstruction and will be explored and investigated; for example, dense networks [36] may be utilized to make full use of hierarchical features.
V. Conclusion and Outlook
In this work, we propose a dynamic MR imaging method with both k-space and spatial prior knowledge integrated via multi-supervised network training, dubbed DIMENSION. Our contributions are mainly reflected in the cross-domain network structure and the multi-supervised loss function strategy. The cross-domain network structure helps us make better use of both frequency-domain and spatial-domain prior knowledge. The multi-supervised loss function strategy avoids single-supervised training and provides different levels of supervision for the entire network. Specifically, we introduce the k-space loss, which forces the predicted k-space of the frequency-domain network to be as close as possible to the fully sampled k-space, and the spatial loss in the spatial-domain network, which constrains the reconstruction results at different levels. We compared the proposed approach with k-t FOCUSS, k-t SLR, L+S and the state-of-the-art CNN-based method. Experimental results show that the proposed network structure and loss function strategy improve dynamic MR reconstruction accuracy in a shorter reconstruction time. In future work, perceptual loss functions and more advanced network structures may be studied.
References
 [1] W. Kaiser and E. Zeitler, “MR imaging of the breast: fast imaging sequences with and without GdDTPA. preliminary observations.” Radiology, vol. 170, no. 3, pp. 681–686, 1989.
 [2] D. K. Sodickson and W. J. Manning, “Simultaneous acquisition of spatial harmonics (SMASH): fast imaging with radiofrequency coil arrays,” Magnetic Resonance in Medicine, vol. 38, no. 4, pp. 591–603, 1997.
 [3] D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
 [4] M. Lustig, D. Donoho, and J. M. Pauly, “Sparse MRI: The application of compressed sensing for rapid MR imaging,” Magnetic Resonance in Medicine, vol. 58, no. 6, pp. 1182–1195, 2007.
 [5] H. Jung, J. C. Ye, and E. Y. Kim, “Improved k–t BLAST and k–t SENSE using FOCUSS,” Physics in Medicine & Biology, vol. 52, no. 11, p. 3201, 2007.
 [6] J. Tsao, P. Boesiger, and K. P. Pruessmann, “k-t BLAST and k-t SENSE: dynamic MRI with high frame rate exploiting spatiotemporal correlations,” Magnetic Resonance in Medicine, vol. 50, no. 5, pp. 1031–1042, 2003.
 [7] D. Liang, E. V. DiBella, R.-R. Chen, and L. Ying, “k-t ISD: dynamic cardiac MR imaging using compressed sensing with iterative support detection,” Magnetic Resonance in Medicine, vol. 68, no. 1, pp. 41–53, 2012.
 [8] R. Otazo, E. Candès, and D. K. Sodickson, “Lowrank plus sparse matrix decomposition for accelerated dynamic MRI with separation of background and dynamic components,” Magnetic Resonance in Medicine, vol. 73, no. 3, pp. 1125–1136, 2015.
 [9] S. G. Lingala, Y. Hu, E. DiBella, and M. Jacob, “Accelerated dynamic MRI exploiting sparsity and low-rank structure: k-t SLR,” IEEE Transactions on Medical Imaging, vol. 30, no. 5, pp. 1042–1054, 2011.
 [10] J. Caballero, A. N. Price, D. Rueckert, and J. V. Hajnal, “Dictionary learning and time sparsity for dynamic MR data reconstruction,” IEEE Transactions on Medical Imaging, vol. 33, no. 4, pp. 979–994, 2014.
 [11] Y. Wang and L. Ying, “Compressed sensing dynamic cardiac cine MRI using learned spatiotemporal dictionary,” IEEE Transactions on Biomedical Engineering, vol. 61, no. 4, pp. 1109–1120, 2014.
 [12] J. Sun, H. Li, Z. Xu et al., “Deep ADMM-Net for compressive sensing MRI,” in Advances in Neural Information Processing Systems, 2016, pp. 10–18.
 [13] K. Hammernik, T. Klatzer, E. Kobler, M. P. Recht, D. K. Sodickson, T. Pock, and F. Knoll, “Learning a variational network for reconstruction of accelerated MRI data,” Magnetic Resonance in Medicine, vol. 79, no. 6, pp. 3055–3071, 2018.
 [14] F. Knoll, K. Hammernik, E. Kobler, T. Pock, M. P. Recht, and D. K. Sodickson, “Assessment of the generalization of learned image reconstruction and the potential for transfer learning,” Magnetic Resonance in Medicine, 2018.
 [15] S. Wang, Z. Su, L. Ying, X. Peng, S. Zhu, F. Liang, D. Feng, and D. Liang, “Accelerating magnetic resonance imaging via deep learning,” in Biomedical Imaging (ISBI), 2016 IEEE 13th International Symposium on. IEEE, 2016, pp. 514–517.
 [16] K. Kwon, D. Kim, and H. Park, “A parallel MR imaging method using multilayer perceptron,” Medical Physics, vol. 44, no. 12, pp. 6209–6224, 2017.
 [17] Y. Han, J. Yoo, H. H. Kim, H. J. Shin, K. Sung, and J. C. Ye, “Deep learning with domain adaptation for accelerated projection-reconstruction MR,” Magnetic Resonance in Medicine, vol. 80, no. 3, pp. 1189–1205, 2018.
 [18] B. Zhu, J. Z. Liu, S. F. Cauley, B. R. Rosen, and M. S. Rosen, “Image reconstruction by domain-transform manifold learning,” Nature, vol. 555, no. 7697, p. 487, 2018.
 [19] T. Eo, Y. Jun, T. Kim, J. Jang, H.-J. Lee, and D. Hwang, “KIKI-net: cross-domain convolutional neural networks for reconstructing undersampled magnetic resonance images,” Magnetic Resonance in Medicine, 2018.
 [20] L. Sun, Z. Fan, Y. Huang, X. Ding, and J. Paisley, “Compressed sensing MRI using a recursive dilated network,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
 [21] T. M. Quan, T. Nguyen-Duc, and W.-K. Jeong, “Compressed sensing MRI reconstruction using a generative adversarial network with a cyclic loss,” IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1488–1497, 2018.
 [22] J. Schlemper, J. Caballero, J. V. Hajnal, A. N. Price, and D. Rueckert, “A deep cascade of convolutional neural networks for dynamic MR image reconstruction,” IEEE Transactions on Medical Imaging, vol. 37, no. 2, pp. 491–503, 2018.
 [23] C. Qin, J. V. Hajnal, D. Rueckert, J. Schlemper, J. Caballero, and A. N. Price, “Convolutional recurrent neural networks for dynamic MR image reconstruction,” IEEE Transactions on Medical Imaging, 2018.
 [24] G. Wang, J. C. Ye, K. Mueller, and J. A. Fessler, “Image reconstruction is a new frontier of machine learning,” IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1289–1296, 2018.
 [25] H. K. Aggarwal, M. P. Mani, and M. Jacob, “MoDL: Model based deep learning architecture for inverse problems,” IEEE Transactions on Medical Imaging, 2018.
 [26] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
 [27] D. O. Walsh, A. F. Gmitro, and M. W. Marcellin, “Adaptive reconstruction of phased array MR imagery,” Magnetic Resonance in Medicine, vol. 43, no. 5, pp. 682–690, 2000.
 [28] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, p. 436, 2015.
 [29] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1026–1034.
 [30] X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011, pp. 315–323.
 [31] M. D. Zeiler, “ADADELTA: an adaptive learning rate method,” arXiv preprint arXiv:1212.5701, 2012.
 [32] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
 [33] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard et al., “TensorFlow: a system for large-scale machine learning,” in OSDI, vol. 16, 2016, pp. 265–283.
 [34] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
 [35] G. Yang, S. Yu, H. Dong, G. Slabaugh, P. L. Dragotti, X. Ye, F. Liu, S. Arridge, J. Keegan, Y. Guo et al., “DAGAN: Deep de-aliasing generative adversarial networks for fast compressed sensing MRI reconstruction,” IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1310–1321, 2018.
 [36] Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu, “Residual dense network for image super-resolution,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.