Magnetic resonance imaging (MRI) is a frequently used medical imaging modality in clinical practice, as it is an excellent non-invasive source of structural and anatomical information. A major shortcoming of the MRI acquisition process is its considerably long scan time, which is due to the sequential acquisition of large volumes of data, not in the image domain but in k-space, i.e. the Fourier domain. Such a prolonged scan time can cause significant artefacts owing to physiological motion and patient movement during the scan. It may also hinder the use of MRI in time-critical diagnosis.
A possible way to speed up the imaging process is through parallel imaging techniques. However, the number and arrangement of the receiver coils severely limit the acceleration factor of such techniques, and they can introduce imaging artefacts of their own. Another widely used route to fast acquisition is undersampling based on compressive sensing (CS) theory. However, undersampling renders the inverse problem ill-posed, making the recovery of high-quality MR images extremely challenging. Moreover, the presence of noise during acquisition may further degrade the reconstruction quality.
Conventional approaches for CS-MRI reconstruction rely extensively on sparse representations to impose prior knowledge on the structure of the MR image to be reconstructed. Sparse representations can be obtained using predefined transforms such as total variation, the discrete cosine transform, the discrete wavelet transform, and the discrete Fourier transform. Alternatively, dictionary learning based methods learn sparse representations from the subspace spanned by the data. Both types of approaches suffer from long computation times due to the iterative nature of their optimization processes. Moreover, universally applicable sparsifying transforms may fail to completely capture the fine details observed in biological tissues.
Deep learning frameworks have enjoyed great success in similar inverse problems such as single-image super-resolution and denoising, where the methods aim to recover missing information from incomplete or noisy data. Yang et al. used the alternating direction method of multipliers (ADMM) algorithm to train a deep network for CS-MRI reconstruction. With the recent advances in generative adversarial networks (GANs) [10, 18], CS-MRI reconstruction has also been addressed using adversarial learning frameworks. In a recent study, Bora et al. have shown that pretrained generative models such as variational autoencoders and GANs can be used for recovery of CS signals without making use of sparsity at all. Yang et al. proposed a U-net based generator, following a refinement learning approach, with mean square error (MSE) and perceptual losses to reconstruct the images. In , the authors proposed a fully residual network using addition-based skip connections; they used a cyclic loss as a data consistency constraint in the training process to achieve better reconstruction quality. Deora et al. proposed a U-net based generator paired with a patch discriminator to perform the reconstruction task. Along with mean absolute error (MAE) and structural similarity (SSIM) losses, the authors used the Wasserstein loss to improve the adversarial learning. Mardani et al. introduced an affine projection operator between the generator and the discriminator to improve the data consistency of the reconstructed images. Though deep adversarial networks have significantly improved the quality of CS-MRI reconstruction, one of their biggest shortcomings is that they work on real-valued inputs even though the input is inherently complex-valued. The other limitation is that almost all reconstruction networks depend on pixel-based $\ell_1$ or $\ell_2$ losses. These efficiently reconstruct the low frequency components of the data, but often fail to generate the middle and high frequency information which depicts the fine textural and contextual parts of an image.
Motivation of the work: The complex parameter space provides several benefits over the real parameter space. Apart from its biological inspiration and significance, complex-valued representation not only increases the representational capacity of the network, but can also be more stable than real-valued representation in various applications [36, 5]. Complex-valued operations can be performed with simple optimization techniques without sacrificing the generalization ability of the network. Several researchers have reported that complex-valued neural networks exhibit faster learning with better robustness to noise [5, 2, 38]. Complex-valued operations also preserve the phase information, which encodes fine structural details of an image [36, 28]. Even with these striking benefits, complex-valued deep networks are not widely explored. In fact, to the best of our knowledge, a complex-valued GAN (Co-VeGAN) model has never been explored for any reconstruction problem.
Contributions: The contributions of this work are as follows.
Complex-valued operations are widely unexplored for GAN architectures, and they have never been explored for the CS-MRI reconstruction problem. This work not only exploits complex-valued operations to achieve better reconstruction, but also documents the stability and quality of a Co-VeGAN approach for a reconstruction problem with various losses such as MAE and SSIM.
We propose a novel generator architecture by modifying the existing U-net model .
As the CS-MRI problem formulation is closely related to super-resolution problems, taking inspiration from , we introduce a wavelet loss in our model to better reconstruct the mid-frequency components of the MR image. To the best of our knowledge, a wavelet loss has never been integrated with any complex-valued neural network before.
The acquisition model of MRI can be described as follows:

$y = M F \operatorname{vec}(x) + \eta$

where $x$ is the desired image, $y$ denotes the observed data vector, and the vector $\eta$ captures the noise. $F$ denotes the matrix to compute the 2D Fourier transform, $M$ describes the matrix for undersampling, and $\operatorname{vec}(\cdot)$ denotes vectorization. Given an observation $y$, the aim of reconstruction is to recover $x$ in the presence of a non-zero noise vector $\eta$. We attempt the recovery of $x$ by using a GAN model.
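The acquisition model above can be sketched numerically with a toy example; the image, the random 30% mask, and the noise level are illustrative assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 32
x = rng.standard_normal((N, N))            # toy "image" (real-valued for simplicity)

# F vec(x): the 2D Fourier transform of the image
k_full = np.fft.fft2(x)

# M: undersampling mask keeping ~30% of k-space samples at random
mask = rng.random((N, N)) < 0.3

# eta: complex measurement noise
eta = 0.01 * (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))

# Observed data y = M F vec(x) + eta (non-zero only at sampled locations)
y = mask * (k_full + eta)

# Zero-filled reconstruction: inverse transform of the zero-filled k-space
x_zfr = np.fft.ifft2(y)
```

Reconstructing `x` from `y` is the ill-posed inverse problem the rest of the paper addresses.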
2.1 Complex-valued GAN
A GAN comprises two networks, namely a generator $G$ and a discriminator $D$. In order to generate images which are similar to the samples of the distribution of true data, the generator attempts to map an input vector $z$ to the output $G(z)$. On the other hand, the discriminator aims to distinguish the generated samples $G(z)$ from the samples drawn from the distribution of true data.

We propose the use of a complex-valued GAN, where the generator $G$ and the discriminator $D$ can be complex-valued networks. However, since both the generated and ground truth (GT) images are real-valued, we stick to the use of a real-valued discriminator.
2.2 Complex-valued operations
The complex-valued equivalent of real-valued 2D convolution is discussed below. The convolution of a complex-valued kernel $W = W_R + i\,W_I$ with complex-valued feature maps $h = h_R + i\,h_I$ can be represented as:

$W * h = (W_R * h_R - W_I * h_I) + i\,(W_R * h_I + W_I * h_R)$

In these notations, the subscripts $R$ and $I$ denote the real and imaginary parts, respectively. In order to implement the aforementioned complex-valued convolution, we use real-valued tensors, where $h$ is stored such that the imaginary part $h_I$ is concatenated to the real part $h_R$. The resultant includes four real-valued 2D convolutions as mentioned in (2), and is stored in a similar manner by concatenating the imaginary part of the output to its real part.
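The decomposition into four real-valued convolutions can be sketched in numpy; `conv2d_real` and `conv2d_complex` are hypothetical helper names, and valid-mode cross-correlation is used for brevity:

```python
import numpy as np

def conv2d_real(h, w):
    """Valid-mode 2D cross-correlation of a real feature map h with kernel w."""
    H, W = h.shape
    kh, kw = w.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(h[i:i + kh, j:j + kw] * w)
    return out

def conv2d_complex(h, w):
    """Complex convolution assembled from four real convolutions, as in Eq. (2)."""
    hr, hi = h.real, h.imag
    wr, wi = w.real, w.imag
    real = conv2d_real(hr, wr) - conv2d_real(hi, wi)
    imag = conv2d_real(hi, wr) + conv2d_real(hr, wi)
    return real + 1j * imag
```

With a 1x1 kernel this reduces to complex multiplication, which gives a quick sanity check on the four-convolution bookkeeping.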
Backpropagation can be performed on a function that is non-holomorphic as long as it is differentiable with respect to its real and imaginary parts. Since all the loss functions considered in this work are real-valued, we consider the loss $\mathcal{L}$ to be a real-valued function of the weight vector $w = w_R + i\,w_I$. The update rule of $w$ using gradient descent can be written as:

$w^{t+1} = w^{t} - \alpha \left( \nabla_w \mathcal{L} \right)^{*}$

where $\alpha$ is the learning rate, $(\cdot)^{*}$ denotes the complex conjugate, and the gradient of $\mathcal{L}$ is calculated as follows:

$\nabla_w \mathcal{L} = \frac{1}{2} \left( \frac{\partial \mathcal{L}}{\partial w_R} - i\, \frac{\partial \mathcal{L}}{\partial w_I} \right)$
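Conjugating the gradient simply recovers simultaneous descent on the real and imaginary parts. A minimal illustration on the toy loss $\mathcal{L}(w) = |w - c|^2$ (the target $c$, step size, and iteration count are arbitrary choices for the sketch):

```python
import numpy as np

# Minimise L(w) = |w - c|^2 over a complex scalar w by descending on the
# real and imaginary parts, i.e. w <- w - alpha * (dL/dw_R + i * dL/dw_I).
c = 1.0 - 2.0j
w = 0.0 + 0.0j
alpha = 0.1
for _ in range(200):
    # dL/dw_R = 2 (w_R - c_R), dL/dw_I = 2 (w_I - c_I)
    grad = 2 * (w.real - c.real) + 2j * (w.imag - c.imag)
    w = w - alpha * grad
```

After the loop, `w` has converged to the minimiser `c`, as expected for a convex loss.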
We make use of the complex BN formulation, applicable to complex numbers, proposed in prior work on deep complex networks. To ensure that the complex data is scaled in such a way that the distribution of the real and imaginary components is circular, the 2D complex vector $z$ can be whitened as follows:

$\tilde{z} = V^{-\frac{1}{2}} \left( z - \mathbb{E}[z] \right)$

where $V$ denotes the covariance matrix, and $\mathbb{E}[\cdot]$ denotes the expectation operator. It can be represented as:

$V = \begin{pmatrix} \operatorname{Cov}(z_R, z_R) & \operatorname{Cov}(z_R, z_I) \\ \operatorname{Cov}(z_I, z_R) & \operatorname{Cov}(z_I, z_I) \end{pmatrix}$

Learnable parameters $\gamma$ and $\beta$ are used to scale and shift the aforementioned standardized vector as follows:

$z_{BN} = \gamma \tilde{z} + \beta$

where $\gamma$ is a $2 \times 2$ matrix, and $\beta$ is a complex number.
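The whitening step can be sketched in numpy; `complex_whiten` is an illustrative helper, and the learnable $\gamma$, $\beta$ scaling is omitted:

```python
import numpy as np

def complex_whiten(z, eps=1e-5):
    """Whiten 1D complex samples z so real/imag parts have identity covariance."""
    v = np.stack([z.real, z.imag])             # shape (2, n): the 2D real view
    v = v - v.mean(axis=1, keepdims=True)      # centre: z - E[z]
    V = np.cov(v) + eps * np.eye(2)            # 2x2 covariance matrix (regularised)
    # Inverse matrix square root V^{-1/2} via eigendecomposition (V is symmetric)
    eigval, eigvec = np.linalg.eigh(V)
    V_inv_sqrt = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    w = V_inv_sqrt @ v
    return w[0] + 1j * w[1]
```

After whitening, the joint covariance of the real and imaginary parts is (approximately) the identity, i.e. the distribution is circular.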
In order to work with complex-valued entities, several activations have been proposed in previous works . In this work, the following complex-valued activations are considered:
Complex parametric ReLU (ℂPReLU): The complex equivalent of PReLU is obtained when PReLU is applied separately to the real and imaginary parts of an element:

$\mathbb{C}\mathrm{PReLU}(z) = \mathrm{PReLU}(z_R) + i\,\mathrm{PReLU}(z_I), \qquad \mathrm{PReLU}(x) = \max(0, x) + a \min(0, x)$

where the slopes $a$ applied to the real and imaginary parts are trainable parameters.

ℂReLU: This activation allows a complex element to pass only if both its real and imaginary parts are positive. It is given by:

$\mathbb{C}\mathrm{ReLU}(z) = \begin{cases} z, & \text{if } z_R > 0 \text{ and } z_I > 0 \\ 0, & \text{otherwise} \end{cases}$
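Both activations can be sketched in a few lines of numpy; the slope values are illustrative defaults, not the trained parameters:

```python
import numpy as np

def c_prelu(z, a_r=0.25, a_i=0.25):
    """Complex PReLU: apply PReLU separately to the real and imaginary parts."""
    re = np.where(z.real > 0, z.real, a_r * z.real)
    im = np.where(z.imag > 0, z.imag, a_i * z.imag)
    return re + 1j * im

def c_relu(z):
    """Pass z only when both Re(z) > 0 and Im(z) > 0, otherwise output zero."""
    keep = (z.real > 0) & (z.imag > 0)
    return np.where(keep, z, 0)
```

Note that `c_relu` zeroes three of the four quadrants of the complex plane, which is one intuition for its weaker performance in the later ablation.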
2.3 Network Architecture
The generator architecture of the proposed model is shown in Fig. 1. It is based on the U-net architecture. The left side is a contracting path, where each step creates downsampled feature maps using a convolutional layer with a stride of two, followed by BN and activation. The right side is an expanding path, where each step consists of upsampling (by a factor of two) and a convolutional layer to create new feature maps, followed by BN and activation layers.
In order to provide richer context about the low-level features for superior reconstruction, the low-level feature maps from the contracting path are concatenated to the high-level feature maps of the same size in the expanding path. In this work, we propose the use of dense connections between the steps (layers) within the contracting as well as the expanding path. These dense connections help to improve the flow of information between the layers and encourage features from the preceding layers to be reused. Concatenating feature maps has the added benefit of increasing the variety of available information. Since the feature maps at various layers are not of the same size, average pooling and upsampling (with bilinear interpolation) operations have been introduced in the connections between the layers of the contracting path and the expanding path, respectively. However, using these operations to change the size of the feature maps by a factor greater than $2^{t}$ (or less than $2^{-t}$) not only increases the computational and memory requirement, but also reduces the quality of the information available to the subsequent layers, so such long-range connections are avoided. As shown in Fig. 1, the parameter $t$ is set as 3 in this work.
Further, residual-in-residual dense blocks (RRDBs) are incorporated at the lowest layer of the generator, where the smallest feature maps are present. Each block uses residual learning across each dense block, as well as across a group of three dense blocks. At both levels, residual scaling is used, i.e. the residuals are scaled by a constant factor before they are added to the identity mapping, as shown in Fig. 1. These RRDBs not only make the network length variable, because the residual connections make identity mappings easier to learn, but also make a rich amount of information accessible to the deeper layers through the dense connections.
At the output of the generator, a hyperbolic tangent activation is applied, which brings the real as well as imaginary parts of the final feature map into the range [-1, 1]. These are then linearly mapped to the range [0, 1]. In order to make the output real-valued, the absolute value is obtained (which lies in the range [0, √2]), and then brought back to the range [0, 1].
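This output mapping can be sketched as follows, assuming a linear map from [-1, 1] to [0, 1] and rescaling of the magnitude by 1/√2; the helper name and exact rescaling are illustrative:

```python
import numpy as np

def to_real_output(z):
    """Map a tanh-activated complex feature map to a real image in [0, 1]."""
    zr = (np.tanh(z.real) + 1) / 2     # real part: [-1, 1] -> [0, 1]
    zi = (np.tanh(z.imag) + 1) / 2     # imaginary part: [-1, 1] -> [0, 1]
    mag = np.abs(zr + 1j * zi)         # magnitude lies in [0, sqrt(2)]
    return mag / np.sqrt(2)            # rescale back to [0, 1]
```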
The discriminator architecture is based on a standard convolutional neural network. It has 11 convolutional layers, each of which is followed by BN and activation. We use a patch based discriminator to increase the focus on the reconstruction of high frequency content. The patch based discriminator scores each patch of the image separately, and its output is the mean of the individual patch scores. This framework makes the network insensitive to the input size.
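The size-insensitivity of patch-based scoring can be illustrated with a toy single-kernel critic (not the actual 11-layer discriminator): each patch receives its own score, and the image-level output is just the mean of the score map, which is a scalar for any input size.

```python
import numpy as np

def conv_valid(img, k):
    """Valid 2D cross-correlation: one response per kernel-sized patch."""
    H, W = img.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def patch_discriminator_score(img, k):
    """Score each patch with a toy convolutional critic, then average."""
    return float(conv_valid(img, k).mean())
```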
2.4 Training Losses
2.4.1 Adversarial Loss
In order to constrain the generator to produce the MR image corresponding to the samples acquired in the k-space, it is conditioned over the zero-filled reconstruction (ZFR) given by:

$x_{ZFR} = \operatorname{mat}\!\left( F^{H} M^{H} y \right)$

where $(\cdot)^{H}$ denotes the Hermitian operator, and $\operatorname{mat}(\cdot)$ denotes the conversion of a vector to a square matrix. Conventionally, the solution to the min-max game between the generator and the discriminator is obtained by using a binary cross-entropy based loss. However, it causes the problem of vanishing and exploding gradients, which makes the training of the GAN model unstable. In order to prevent this, a Wasserstein distance based loss is used. Mathematically, the training process of this conditional GAN using the Wasserstein loss is formulated as follows:

$\min_{G} \max_{D} \; \mathbb{E}_{x \sim p_{r}}\left[ D(x) \right] - \mathbb{E}_{x_{ZFR} \sim p_{z}}\left[ D\big(G(x_{ZFR})\big) \right]$

where $p_{r}$ is the distribution of the GT images, and $p_{z}$ is the distribution of the aliased ZFR images.
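The Wasserstein objective with weight clipping can be sketched as follows; the function names are illustrative, and the clipping value 0.05 matches the training settings reported later:

```python
import numpy as np

def critic_loss(d_real, d_fake):
    """Critic maximises D(real) - D(fake), so it minimises the negative."""
    return -(np.mean(d_real) - np.mean(d_fake))

def generator_loss(d_fake):
    """Generator minimises -D(fake): pushes critic scores on generated images up."""
    return -np.mean(d_fake)

def clip_weights(weights, c=0.05):
    """Weight clipping crudely enforces the Lipschitz constraint on the critic."""
    return [np.clip(w, -c, c) for w in weights]
```

Weight clipping is the original WGAN constraint; later variants replace it with a gradient penalty, but the clipped form is what the training settings here describe.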
2.4.2 Content Loss
Besides the adversarial loss, other losses are required to bring the reconstructed output closer to the corresponding GT image. In order to do so, we incorporate an MAE based loss, so that the pixel-wise difference between the GT image $x$ and the generated image $\hat{x}$ is minimized. It is given by:

$\mathcal{L}_{MAE} = \left\lVert \operatorname{vec}(x) - \operatorname{vec}(\hat{x}) \right\rVert_{1}$

where $\operatorname{vec}(\cdot)$ denotes vectorization, and $\lVert \cdot \rVert_{1}$ denotes the $\ell_1$ norm.
2.4.3 Structural Similarity Loss
As the high frequency details in the MR image help in distinguishing various regions of the brain, it is extremely important to improve their reconstruction. SSIM quantifies the similarity between local patches of two images on the basis of luminance, contrast and structure. It is calculated as follows:

$\operatorname{SSIM}(p_1, p_2) = \dfrac{(2\mu_{1}\mu_{2} + c_1)(2\sigma_{12} + c_2)}{(\mu_{1}^{2} + \mu_{2}^{2} + c_1)(\sigma_{1}^{2} + \sigma_{2}^{2} + c_2)}$

where $p_1$ and $p_2$ represent two patches from an image, $\mu_{1}$ and $\sigma_{1}^{2}$ denote the mean and variance of $p_1$, $\mu_{2}$ and $\sigma_{2}^{2}$ denote the mean and variance of $p_2$, and $\sigma_{12}$ denotes the covariance of $p_1$ and $p_2$. $c_1$ and $c_2$ are slack values to avoid division by zero.
In order to improve the perceptual quality of the reconstructed MR image and preserve the structural details, a mean SSIM (mSSIM) based loss is incorporated in the training of the generator. It maximizes the patch-wise SSIM between the generated image $\hat{x}$ and the corresponding GT image $x$, as follows:

$\mathcal{L}_{mSSIM} = 1 - \frac{1}{P} \sum_{i=1}^{P} \operatorname{SSIM}(p_{i}, \hat{p}_{i})$

where $P$ is the number of patches in the image, and $p_i$, $\hat{p}_i$ denote the $i$-th patches of $x$ and $\hat{x}$, respectively.
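A minimal patch-wise SSIM and mSSIM loss sketch; the patch size and slack constants are illustrative choices, not the paper's settings:

```python
import numpy as np

def ssim_patch(p1, p2, c1=1e-4, c2=9e-4):
    """SSIM between two image patches (luminance/contrast/structure form)."""
    mu1, mu2 = p1.mean(), p2.mean()
    var1, var2 = p1.var(), p2.var()
    cov = ((p1 - mu1) * (p2 - mu2)).mean()
    return ((2 * mu1 * mu2 + c1) * (2 * cov + c2)) / \
           ((mu1 ** 2 + mu2 ** 2 + c1) * (var1 + var2 + c2))

def mssim_loss(img1, img2, patch=8):
    """1 - mean SSIM over non-overlapping patches (the loss to minimise)."""
    scores = []
    H, W = img1.shape
    for i in range(0, H - patch + 1, patch):
        for j in range(0, W - patch + 1, patch):
            scores.append(ssim_patch(img1[i:i + patch, j:j + patch],
                                     img2[i:i + patch, j:j + patch]))
    return 1.0 - float(np.mean(scores))
```

For identical images the loss is zero; any structural mismatch increases it.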
2.4.4 Wavelet Loss
In order to further enhance the textural details in the generated image, a weighted version of MAE in the wavelet domain is included as another loss term. In order to decompose the image into sets of wavelet coefficients which are equal in size and correspond to an even division of bands in the frequency domain, the wavelet packet transform is used. Fig. 2 depicts one step of the recursive process which is followed to obtain the sets of wavelet coefficients. For an $l$-level decomposition, which produces $4^{l}$ sets of wavelet coefficients, the wavelet loss is formulated as follows:

$\mathcal{L}_{wav} = \sum_{j=1}^{4^{l}} \lambda_{j} \left\lVert \operatorname{vec}\big(c_{j}(x)\big) - \operatorname{vec}\big(c_{j}(\hat{x})\big) \right\rVert_{1}$

where $\lambda_{j}$ denotes the weight of the $j$-th set of coefficients $c_{j}(\cdot)$. Since the pixel-wise MAE loss contributes more towards the improvement of low frequency details, and the mSSIM loss focuses more on preserving the high frequency content in the reconstructed image, higher weights are assigned to the wavelet coefficients corresponding to the band-pass components to improve their reconstruction. This is done by setting the weights according to the probability density function of a Gaussian distribution whose mean is centred on the mid-frequency sets of coefficients.
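The idea can be sketched with a single-level Haar wavelet packet split and Gaussian band weights; the paper uses a deeper decomposition and its own mean/variance values, so everything below is illustrative:

```python
import numpy as np

def haar_packet_level(x):
    """One level of a 2D Haar wavelet packet: split into 4 equal-size subbands."""
    a = (x[0::2, 0::2] + x[0::2, 1::2] + x[1::2, 0::2] + x[1::2, 1::2]) / 2
    h = (x[0::2, 0::2] - x[0::2, 1::2] + x[1::2, 0::2] - x[1::2, 1::2]) / 2
    v = (x[0::2, 0::2] + x[0::2, 1::2] - x[1::2, 0::2] - x[1::2, 1::2]) / 2
    d = (x[0::2, 0::2] - x[0::2, 1::2] - x[1::2, 0::2] + x[1::2, 1::2]) / 2
    return [a, h, v, d]   # ordered roughly from low to high frequency

def gaussian_weights(n, mu, sigma2):
    """Weights from a Gaussian pdf over subband indices, peaking near band mu."""
    idx = np.arange(n)
    w = np.exp(-(idx - mu) ** 2 / (2 * sigma2))
    return w / w.sum()

def wavelet_loss(x, y, mu=1.5, sigma2=1.0):
    """Weighted MAE over wavelet-packet subbands, emphasising band-pass bands."""
    cx, cy = haar_packet_level(x), haar_packet_level(y)
    w = gaussian_weights(len(cx), mu, sigma2)
    return sum(wi * np.abs(a - b).mean() for wi, a, b in zip(w, cx, cy))
```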
2.4.5 Overall Loss
The overall loss which is used to train the generator is formulated as a weighted sum of the losses presented above:

$\mathcal{L}_{G} = \lambda_{a}\,\mathcal{L}_{adv} + \lambda_{m}\,\mathcal{L}_{MAE} + \lambda_{s}\,\mathcal{L}_{mSSIM} + \lambda_{w}\,\mathcal{L}_{wav}$

where $\lambda_{a}$, $\lambda_{m}$, $\lambda_{s}$ and $\lambda_{w}$ are scalar weights.
| No. of generator parameters | 2M | 1.2M | 1.2M | 1.5M | 1.5M |
| PSNR (dB) / mSSIM | 39.640 / 0.9823 | 40.048 / 0.9866 | 41.418 / 0.9879 | 43.798 / 0.9902 | 45.044 / 0.9919 |
2.5 Training settings
For implementing the model, the Keras framework with TensorFlow backend is used. The model is trained using 4 NVIDIA GeForce GTX 1080 Ti GPUs. In this work, the batch size is set as 16. In the generator, each layer produces 32 feature maps. The growth rate for the dense blocks in the RRDBs is set as 8, the residual scaling factor is 0.2, and 4 RRDBs are used. The absolute value of the discriminator weights is clipped at 0.05. For each generator update, the discriminator is updated thrice. For training the model, we use the Adam optimizer. The initial learning rate is decayed over training, so that it is reduced to a fixed fraction of its initial value after 5 epochs.
3 Results and Discussion
We evaluate our models on T1-weighted brain MR images from the MICCAI 2013 grand challenge dataset, as well as on T2-weighted brain MR images from the IXI dataset (https://brain-development.org/ixi-dataset/).
For the MICCAI 2013 dataset, 20 787 images are used for training, after randomly choosing images from the train set, and then applying data augmentation using images with 10% and 20% additive Gaussian noise. 1-D Gaussian undersampling is used in this case to generate 20% and 30% undersampled data using the masks shown in Fig. 4(a) and (b), respectively. For testing, 2000 images are chosen randomly from the test set of the dataset. For the IXI dataset, 7500 images are randomly chosen for training and 100 non-overlapping images are chosen for testing. In this case, radial undersampling is used, with the mask shown in Fig. 4(c).
|PSNR (dB)/ mSSIM||45.044/ 0.9919||35.9912/ 0.9690||45.377/ 0.9930|
Table 1 and Fig. 3 show the quantitative and qualitative results, respectively, for the ablation study of the model, highlighting the importance of its various components. These results are reported for 30% undersampled images from the MICCAI 2013 dataset. In the first case, a real-valued GAN model is considered, comprising a U-net based generator without RRDBs, without the dense connections in the contracting and expanding paths, with ReLU activation, and with the wavelet-loss weight set to zero in Eq. 15. In the next case, the complex-valued equivalent of the previous model is considered. In the third case, the effect of adding RRDBs in the last layer of the complex-valued U-net is observed, without the wavelet loss. In the fourth case, the addition of dense connections in the complex-valued U-net is considered along with the RRDBs, still ignoring the wavelet loss. In the last case, the effect of including the wavelet loss is observed. Each step results in a significant improvement in peak signal-to-noise ratio (PSNR) as well as mSSIM. For the rest of the results, we use the network settings of the last stage.
Table 2 and Fig. 5 show the quantitative and qualitative results, respectively, for comparing various activations. These results are also reported for 30% undersampled images from the MICCAI 2013 dataset. It is observed that ℂReLU has the worst performance, while ℂPReLU has the best. The rest of the results are reported using the ℂPReLU activation.
Fig. 6 shows the qualitative results of our final model for reconstruction of two 20% and 30% undersampled images from the MICCAI 2013 dataset. It is observed that the proposed model is able to reconstruct high-quality images by preserving most of the structural details. Also, we observe that use of noisy images during training helps in reducing the impact of noise on the reconstruction quality.
| Method | Noise-free images | 10% noise added | 20% noise added |
| | PSNR (dB) / mSSIM | PSNR (dB) | PSNR (dB) |
| ZFR | 35.61 / 0.7735 | 24.42 | 18.56 |
| DeepADMM | 37.36 / 0.8534 | 25.19 | 19.33 |
| DLMRI | 38.24 / 0.8020 | 24.52 | 18.85 |
| Noiselet | 39.01 / 0.8588 | 25.29 | 19.42 |
| BM3D | 39.93 / 0.9125 | 24.71 | 18.65 |
| DAGAN | 40.20 / 0.9681 | 37.40 | 34.23 |
Table 3 illustrates the comparison of the proposed method with the state-of-the-art approaches. These results are reported for 30% undersampled images, from the MICCAI 2013 dataset. It can be observed that there is a significant boost in both the PSNR and mSSIM values when the proposed method is compared with existing approaches.
Table 4 shows the comparison of the proposed method with ZFR and with methods like FBPCNet, DeepADMM, DLMRI, and DeepCascade. These results are reported for 30% undersampled images from the IXI dataset. It can be observed that although the PSNR is marginally lower, there is a significant boost in mSSIM for the proposed method. Also, the inference step takes only 0.0227 seconds.
The model trained on 30% undersampled brain images from the MICCAI 2013 dataset is also tested for reconstruction of 30% undersampled images of canine legs from the MICCAI 2013 challenge. Fig. 7 shows the qualitative results of this zero-shot inference. The proposed model achieves an average PSNR of 42.584 dB and mSSIM of 0.9857 when inferred on 2000 test images.
In this paper, we introduced a novel complex-valued generator architecture that is trained using SSIM and wavelet losses to achieve fast yet efficient reconstruction. The fast inference step opens up the possibility for a real-time implementation of the method. Detailed analyses have shown that the proposed method significantly improves the quality of the reconstruction compared to state-of-the-art CS-MRI reconstruction methods. It has also been shown that the present model can be trained with noise based data augmentation, and it outperforms all the existing methods when reconstruction is attempted using noisy data.
- (2017) Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning, Vol. 70, Sydney, Australia, pp. 214–223.
- (2015) Unitary evolution recurrent neural networks. CoRR abs/1511.06464.
- (2017) Compressed sensing using generative models. In Proceedings of the 34th International Conference on Machine Learning, Vol. 70, pp. 537–546.
- (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 3 (1), pp. 1–122.
- (2016) Associative long short-term memory. CoRR abs/1602.03032.
- (2019) Robust compressive sensing MRI reconstruction using generative adversarial networks. CoRR abs/1910.06067.
- (2016) Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (2), pp. 295–307.
- (2006) Compressed sensing. IEEE Transactions on Information Theory 52 (4), pp. 1289–1306.
- (2016) Decoupled algorithm for MRI reconstruction using nonlocal block matching model: BM3D-MRI. Journal of Mathematical Imaging and Vision 56 (3), pp. 430–440.
- (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems 27, pp. 2672–2680.
- (2002) Generalized autocalibrating partially parallel acquisitions (GRAPPA). Magnetic Resonance in Medicine 47 (6), pp. 1202–1210.
- (2016) On complex valued convolutional neural networks. CoRR abs/1602.09046.
- (2015) Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), pp. 1026–1034.
- (2012) Generalization characteristics of complex-valued feedforward neural networks in relation to signal coherence. IEEE Transactions on Neural Networks and Learning Systems 23 (4), pp. 541–551.
- (2017) Densely connected convolutional networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2261–2269.
- (2017) Wavelet-SRNet: a wavelet-based CNN for multi-scale face super resolution. In 2017 IEEE International Conference on Computer Vision (ICCV), pp. 1698–1706.
- (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML), pp. 448–456.
- (2017) Image-to-image translation with conditional adversarial networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5967–5976.
- (2017) Deep convolutional neural network for inverse problems in imaging. IEEE Transactions on Image Processing 26 (9), pp. 4509–4522.
- (2015) Adam: a method for stochastic optimization. In International Conference on Learning Representations (ICLR).
- (2009) The complex gradient operator and the CR-calculus. CoRR abs/0906.4835.
- (2013) Diencephalon standard challenge.
- (2015) Balanced sparse model for tight frames in compressed sensing magnetic resonance imaging. PLOS ONE 10 (4), pp. 1–19.
- (2007) Sparse MRI: the application of compressed sensing for rapid MR imaging. Magnetic Resonance in Medicine 58 (6), pp. 1182–1195.
- (2019) Deep generative adversarial neural networks for compressive sensing MRI. IEEE Transactions on Medical Imaging 38 (1), pp. 167–179.
- (2014) Conditional generative adversarial nets. CoRR abs/1411.1784.
- (2002) On the critical points of the complex-valued neural network. In Proceedings of the 9th International Conference on Neural Information Processing (ICONIP), Vol. 3, pp. 1099–1103.
- (1981) The importance of phase in signals. Proceedings of the IEEE 69 (5), pp. 529–541.
- (2015) Multichannel compressive sensing MRI using noiselet encoding. PLOS ONE 10 (5), pp. 1–27.
- (1999) SENSE: sensitivity encoding for fast MRI. Magnetic Resonance in Medicine 42 (5), pp. 952–962.
- (2018) Compressed sensing MRI reconstruction using a generative adversarial network with a cyclic loss. IEEE Transactions on Medical Imaging 37 (6), pp. 1488–1497.
- (2011) MR image reconstruction from highly undersampled k-space data by dictionary learning. IEEE Transactions on Medical Imaging 30 (5), pp. 1028–1041.
- (2013) Neuronal synchrony in complex-valued deep networks. CoRR abs/1312.6115.
- (2015) U-Net: convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 234–241.
- (2018) A deep cascade of convolutional neural networks for dynamic MR image reconstruction. IEEE Transactions on Medical Imaging 37 (2), pp. 491–503.
- (2018) Deep complex networks. In International Conference on Learning Representations (ICLR).
- (2018) ESRGAN: enhanced super-resolution generative adversarial networks. In The European Conference on Computer Vision (ECCV) Workshops, pp. 63–79.
- (2016) Full-capacity unitary recurrent neural networks. In Advances in Neural Information Processing Systems, pp. 4880–4888.
- (2012) Image denoising and inpainting with deep neural networks. In Advances in Neural Information Processing Systems 25, pp. 341–349.
- (2018) DAGAN: deep de-aliasing generative adversarial networks for fast compressed sensing MRI reconstruction. IEEE Transactions on Medical Imaging 37 (6), pp. 1310–1321.
- (2016) Deep ADMM-Net for compressive sensing MRI. In Advances in Neural Information Processing Systems 29, pp. 10–18.
- (2004) Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13 (4), pp. 600–612.