Over the years, interest in Diffusion Weighted MRI (DW-MRI) has continuously increased thanks to its versatile ability in diverse clinical applications (tamada2017prostate; razek2018assessment; razek2019multi). DW-MRI quantifies the degree of Brownian motion of protons and can detect diffusion restriction in areas with high cellularity (i.e. malignancies) and cell membrane density. Detecting changes in diffusion at the cellular level means that DW-MRI can be used not only to detect and characterize lesions, but also to monitor and predict therapeutic responses. (taouli2010diffusion; pereira2019diffusion) Another great advantage of DW-MRI comes from its ability to quantify the amount of diffusion via calculating the apparent diffusion coefficient (ADC) map, which further enhances the prognostic power.
In order to achieve DW-MRI, diffusion sensitizing bi-polar gradient is used to capture the small molecular movement. While sensitizing for microscopic movement of molecules, macroscopic patient motion can also be captured in the image, which introduces one of the greatest technical difficulties in DW-MRI. Motion artifacts arising from bulk patient motion can be reduced by cardiac gating, pulse triggering, or designing a specific pulse sequence for motion artifact removal (bammer2003basic; chilla2015diffusion). However, these approaches cannot completely remove the artifacts, leaving much room for improvement.
Yet another problem in DW-MRI comes from its inherently low resolution and signal-to-noise (SNR) ratio owing to the typical usage of fast imaging schemes such as echo planar imaging (EPI) (baliyan2016diffusion) with small number of excitations (NEX). The low spatial resolution provides further challenges in DW-MRI, making it hard to visualize small lesions of interest, or exactly quantify the diffusion property. A typical example of (relatively) high quality and low quality DW-MRI images are depicted in Fig. 1. Although the four different images in Fig. 1 were scanned with the same parameters in the same scanner, compared to images in (a-b), the images in (c-d) have lower resolution, and motion artifacts make the image quality worse.
Over the years, researchers have sought to overcome the challenges in DW-MRI with various methods. Super-resolution (SR) algorithms in DW-MRI were proposed to increase the spatial resolution of DW-MRI initially via simple interpolation based methods. Later on, model-based optimization methods(scherrer2011super) were adopted. Based on the theory of compressed sensing (CS), regularized optimization methods, most of which utilizes variants of total variation (TV) regularization, were proposed in the context of diffusion magnetic resonance imaging (ning2016joint; teh2020improved). Other advanced methods such as the Metropolis-Hastings algorithm was proposed in (yap2013generative). In parallel, motion artifact correction methods in DW-MRI were developed. (ozaki2013motion) proposed to design a velocity-compensated diffusion gradients in order to compensate for cardiac motion. Independent of the weighting scheme, CS-based MRI motion artifact correction methods (vasanawala2010improved; yang2013sparse; jin2017mri) were also proposed.
The downside of model-based optimization methods and CS-based methods come from their high computational complexity and cumbersome hyper-parameter tuning. Moreover, these algorithms typically require raw -space data, which are often hard to collect. Moreover, most of the algorithms developed aim at a specific task, either acceleration or super-resolution, which cannot handle arbitrary degradation in the image.
Lately, deep learning based methods for DW-MRI super-resolution (albay2018diffusion; tanno2017bayesian; fan2020generative) were developed, thanks to its ability to learn from data distribution. Motion artifact correction methods based on deep learning were also extensively studied (duffy2018retrospective; pawar2018motion)
, but most of these studies were restricted to the supervised learning situation, where the network learns from matched target data.
This greatly shrinks the applicability of these algorithms, since there are many situations in DW-MRI where such supervised training is unavailable. For instance, matching pairs of high-resolution (HR) and low-resolution (LR) DW-MRI scans are unobtainable. Moreover, exactly matching motion-free images are impossible to obtain. To this end, unsupervised learning-based methods such as Cycle-MedGAN (armanious2019unsupervised; armanious2020unsupervised) and CycleGAN with bootstrap aggregation (oh2020unsupervised) have been proposed. However, to the best of our knowledge, a method which simultaneously corrects for motion artifacts and performs super-resolution at once, has not been reported in literature.
To address the problem in an unsupervised fashion to be able to directly apply in real-world situation, and to solve the problem of super-resolution and artifact correction simultaneously, here we propose an unsupervised learning to solve it all. Our idea is based on the investigation of using bootstrap aggregation for probabilistic -space correction (oh2020unsupervised), which is further extended in this study to also account for super-resolution, and adapted to the quality enhancement of DW-MRI. Specifically, in the training of MR reconstruction network for -space correction, we add a stochastic degradation block as a forward mapping, which models the degradation process of the resolution in the image acquisition. By this simple extension, the -space correction network learns not only to correct for under-sampled -space, but also to enhance the resolution. Once the network is trained, bootstrap sub-sampling and aggregation is performed at the inference stage, where -space phase encoding lines are randomly sub-sampled, and the corrected images are aggregated to form a final high-quality reconstructed image.
The applicability of our novel deep learning framework was explored using both simulated data, and in vivo liver DW-MRI scans. Results confirm that the proposed method can effectively remove motion artifacts present in DW-MRI and increase the resolution/SNR altogether. Interestingly, the same reconstruction steps can also be directly applied to enhance the quality of the ADC maps, showing the versatility of the proposed method.
Consider the following forward measurement model:
where is the -space data,
corresponds to Fourier transform, andis the low-quality image in which the resolution is limited by the system, formally defined as:
is the latent high-quality image that we would like to estimate,is the stochastic degradation process parametrized by a random sample
from a probability distribution, which generates the low-quality image . Moreover, it is often the case the transient patient motion introduces
-space outliers along the phase encoding direction(oh2020unsupervised):
In Eq. (3), is the motion-corrupted -space measurement, is the clean data in (1), is the frequency-encoding direction, is the phase-encoding direction, and . Here, note that motion artifacts are induced by the phase shift along the phase-encoding direction , where contains the indices of errors. Eqs. (1), (2) and (3) tells us that the forward model for the measurement is given by
Hence, the correction of -space outliers, together with the inversion of the degradation process by blur operation is necessary in order to recover clean images at the desired resolution.
2.1 Bootstrap Aggregation for Artifact Correction and Super Resolution
Recently in our companion paper (oh2020unsupervised), we demonstrated that the -space outliers can be corrected by a bootstrap aggregation approach in the context of MRI reconstruction (cha2020geometric), which can be formulated as follows:
where is the random down-sampling operator which probabilistically masks out phase-encoding lines, is the -space measurement, is the weight applied to the reconstruction, refers to the inverse Fourier transform, and
is the trained neural network which can reconstruct high quality images from the aliased input image due to under-sampled-space measurement. In other words, the measured -space is randomly sub-sampled different ways, reconstructed via the reconstruction network , and subsequently aggregated to form the final image.
The reason why the approach in (5) is efficient for motion artifact correction comes from the following observation
for certain , where the operator is defined in (3). In another words, when the motion corrupted -space outliers in the phase-encoding lines are removed, the sub-sampled image becomes similar to the sub-sampled image from the clean image. Hence, for certain sampling pattern , the motion-corrupted -space outlier can be effectively removed. Subsequently, the bootstrap aggreagtion scheme introduced in Eq. (5) corrects for the missing -space components to form a clean image, since the network was trained to reconstruct the clean images.
Unfortunately, this is not enough in the case where the image to be reconstructed through bootstrap aggregation is also degraded by (2), and hence has lower resolution. To account for this additional caveat, we extend the training framework to incorporate the image degradation process in (1),(2), and (3). Specifically, we model the degradation process as follows:
where represents the Gaussian blur kernel parameterized with , and the noise is modeled with white Gaussian noise , so that is the set containing all the parameters of degradation. Here, we further assume that is fixed, and sample . Furthermore, we also assume so that the inherent noise do not get emphasized while the super-resolution process. Pictorially, can be described as Fig. 3. Notably, although this might not capture the true degradation process, it closely resembles the truth, as we empirically show in the Results section. Moreover, when the forward model is exactly known, our proposed method can better approximate the desired clean image, which we show in the simulation study.
All in all, the resulting forward model for the Fourier inversion from the random -space subsampling is given by
where is defined in (7), and the last approximation is due to (6). Therefore, we need to train the reconstruction network such that the network learns the inverse mapping with respect to the forward operation . For training, we utilize physics-informed OT-cycleGAN, which was shown to be especially effective for MRI reconstruction (oh2020unpaired; chung2021two; cha2020unpaired).
Specifically, let and denote the domains for the motion artifact free high resolution images and the aliased images from the forward operation in (2.1), respectively. We further assume that and
are associated with probability measureand , respectively. We define the transport cost between the two probability spaces and :
where is the neural network generator to perform inverse operator. Then, the corresponding optimal transport problem is given by
denotes the set of the joint distribution whose marginals areand
. The geometric meaning of (10) was discussed in detail in (oh2020unpaired; chung2021two; cha2020unpaired), which aims to minimize the statistical distances between transported and the empirical measures in and simultaneously.
Furthermore, the primal formulation of the unsupervised learning in (10) can be represented by a dual formulation:
Here, is the hyper-parameter, and the cycle-consistency term is given by
whereas the second term is the discriminator term:
where is a 1-Lipschitz function. Note that only a single generator/discriminator pair is used for distribution mapping, since the mapping from to is replaced by a deterministic forward operator in (2.1). As discussed in (oh2020unpaired; chung2021two; cha2020unpaired), the discriminator term in (14) can be replaced by LS-GAN (mao2017least) loss.
The resulting OT-cycleGAN framework is illustrated in Fig. 2(a), where in the upper part of the cycle, clean image is blurred via the stochastic degradation operation , Fourier transformed to -space for sub-sampling, then transformed back to image space, after which follows the generator , which learns to invert the forward mapping. The lower branch starts from the unpaired artifact image, and does the opposite. During the learning process, the discriminator is used for learning the distribution of the clean image.
At test stage, as shown in Fig. 2(b), the low-quality image is Fourier transformed to -space, and it is randomly subsampled using different masks, creating images containing aliasing artifacts. These images are reconstructed in parallel with the trained network , which is aggregated by Eq. (5). For simplicity, for all the following experiments, we set .
Note that the reconstruction need not be started from raw -space. DICOM images can be directly utilized, and hence our method can be applied in a wide variety of clinical situations, where the -space raw data is not available. Another great advantage is that the same neural network can be used with different sub-sampling factors at the test stage, adapting to user needs. When the sub-sampling factor is set to low values (e.g. 1.5, 2), reconstructed images are kept sharp without the presence of aliasing artifacts. When the factor is set to high values (e.g. 3, 4), bulk motion is best removed, albeit minor aliasing artifacts are introduced. For all the reported results, we set . The results and consequences will further be discussed in Section 5.1.
3.1 Training Dataset
To demonstrate the efficacy of our method, we initially performed a simulation study using human connectome project (HCP) dataset, containing magnitude only MR brain images. Specific parameters are shown in Table 1. Out of 190 patient volume scans, 150 volumes (3000 slices) were used for training, and 40 volumes (800 slices) for testing. To simulate the situation where the images are contaminated with motion artifacts along with blurring, we specify the parameters of the degradation to , and Procedure for motion artifact generation can be found in Section 3.5.
Moreover, another simulation study using the liver DW-MRI data was performed. Specifically, out of 100 high quality DWI patient scans (4831 slices), 80 patients were selected as training set and 20 patients were selected as test set. Simulated degradation was set identical to the HCP simulation study, and the motion artifact generation process is elaborated in 3.5.
|Scanner||Siemens 3T scanner|
|Echo train duration||1105|
|Time of Repetition(TR) [ms]||3200|
|Time of Echo (TE) [ms]||565|
|Matrix size||320 320|
|Resolution [mm]||0.7 0.7 0.7|
For in vivo study, we use 2D diffusion weighted MRI (DW-MRI) of liver acquired through single shot echo planar imaging (EPI) with specific parameters as specified in Table 2. All scans were taken as breath-hold MR scans. Moreover, scans were acquired via varying to quantify the ADC map.
|Scanner||Siemens Skyra 3T scanner|
|Time of Repetition(TR) [ms]||1800|
|Time of Echo (TE) [ms]||56|
|Echo Train Length (ETL)||37|
|Matrix size||232 180|
|Num. Excitation (NEX)||2|
|Resolution [mm]||2.0 2.0|
|Slice Thickness [mm]||6.0|
Among 201 DWI patient scans, 100 (4831 slices) were manually classified asbad quality, and 101 as good quality (3815 slices) by an attending radiologist. Consequently, the high quality images were used as training in Fig. 2, and the low quality images were used for inference. Notably, all the patient scans share the same scan parameters as in Table 2, and thus there are no clear distinctions between the images in two different distributions. In other words, the high quality images do not have much higher image resolution than the low quality images, which could hamper the training since target distribution is not well-defined. Nevertheless, our proposed method is still able to enhance the quality of DW-MRI images in a fully unsupervised fashion, as will be shown in the results section.
3.2 Network Architecture
Network architecture for the generator and discriminator used in Fig. 2 (a),(b) are illustrated in Fig. 4. The architecture of CNN generator depicted in Fig. 4(a) was adopted from U-Net (ronneberger2015u), with 4 stages of pooling/unpooling, the number of filter channels initially set to 64 at stage 0, which doubles at every stage, reaching 1024 at stage 4. At each stage, blocks are made up of 3 3 convolution, group normalization with group
, and ReLU activation. The last feature activation of the encoder stage are concatenated to the decoder layer of the corresponding stage, and at the last layer, 11 convolution is used.
For the discriminator, 2D PatchGAN architecture was adopted from (isola2017image), as shown in Fig. 4(b). Specifically, each block consists of 4
4 convolution with stride, Instance normalization, and LeakyReLU activation. For additional stability, we also used spectral normalization at every convolution block. Last layer of the network consists of 11 convolution to get the final output.
3.3 Training Details
For training, the hyper-parameter for the loss function was set to
, and the networks were trained for 150 epochs with Adam optimizer using the default parameterswith the initial learning rate of , which was decayed by a factor of 10 at the 100th
epoch. Batch size for training was set to 1. The discriminator was updated once at every update step of the generator. For pre-processing, we normalize the image by the standard deviation of each slice. For-space sub-sampling, we use 1D Gaussian random sampling with the acceleration factor of , selected randomly at each update step. The masks generated for the acceleration factor also contains central autocalibration signal (ACS) region which take up 6% and 11% of the
-space central lines, respectively. All experiments were performed using PyTorch. Training took about a day using NVIDIA GeForce GTX 2080-Ti GPU.
3.4 Comparative Algorithms
We performed comparative study using the state-of-the-art methods for MR super-resolution and motion artifact correction: ESRGAN (wang2018esrgan), cycleGAN (zhu2017unpaired), and Oh et al. (oh2020unsupervised).
First, we compared our method to ESRGAN, since this is the method utilized in many variants of MR super-resolution (albay2018diffusion; fan2020generative). Since ESRGAN is based on supervised training, we generated the training dataset by retrospectively down-sampling the training dataset by the size factor of , and up-sampling the data by bicubic upsampling. Subsequently, the network was trained using the same network architectures described in Section 3.2 for fair comparison, using the parameters .
CycleGAN (zhu2017unpaired) is also chosen as one of the comparative algorithms, which is widely used when forward/backward mapping between two arbitrary distributions is needed (sim2020optimal). The networks utilized in cycleGAN are set to the same as the ones described in Section 3.2. The only difference is that the vanilla cycleGAN architecture requires two sets of generator/discriminator, which requires double the parameters needed compared to the proposed method.
We finally compare our method to the most recent Oh et al. (oh2020unsupervised), which is one of the state-of-the-art algorithms for motion correction, and also the most similar to the proposed method. The network architectures were kept as the same with the proposed method. Parameters in the training/testing phase were all set to the same parameters advised in the original paper.
|Image type||Metric||Input||ESRGAN (wang2018esrgan)||cycleGAN (zhu2017unpaired)||Oh et al. (oh2020unpaired)||Proposed|
3.5 Motion Artifact Simulation
For the performed simulation study with HCP dataset, we simulated the motion artifact, following the procedures in (tamada2020motion), and the specific parameters used in (oh2020unsupervised). More specifically, the phase shift in eq. (3) is defined as
where is the random phase variation sampled as at each phase-encoding line, and the threshold is statically set to .
4 Experimental Results
4.1 Simulation Study
For quantitative evaluation of the proposed method, we first perform a simulation study using the HCP dataset, which is illustrated in Fig. 5. We can see both quantitatively and qualitatively the superiority of our method compared to other methods. Namely, ESRGAN is able to boost the resolution of the artifact image as shown in Fig. 5(b), but it also sharpens the motion artifacts along with it. CycleGAN in Fig. 5(c) does remove the artifacts and sharpens the image to some extent, but it also introduces unwanted artifacts, and has low quantitative metric. Oh et al. in Fig. 5 efficiently removes the motion artifacts, but produces blurry images due to the degradation. On the other hand, the proposed method in Fig. 5
(e) successfuly removes motion artifacts while sharpening the image overall, closely approximating the target image. Quantitative metrics depicting peak signal-to-noise-ratio (PSNR) and structural similarity index (SSIM) in Table3 also show that the results are in fact superior.
Furthermore, we present a simulation study using simulated test set retrospectively acquired from DW-MRI scans, as elaborated in Section 3.1. Results can be seen in Fig. 6 (a-e), where we constantly see that the proposed method is able to closely estimate the label image, devoid of artifacts. Moreover, also in terms of quantitative metrics Table 3, the proposed method outperforms Oh et al. (oh2020unsupervised) by more than 1 db in PSNR. In contrast, other methods other than the proposed method show their own weaknesses. ESRGAN sharpens the motion artifacts along with the boundaries. CycleGAN makes the overall image sharper, but often breaks down as can be seen in the second row of Fig. 6, and has low PSNR values. Oh et al. is excellent at removing the motion artifacts, but the details are omitted.
4.2 In Vivo Study
Experiments with real DW-MRI data can be seen in Fig. 7. Consistent with the results from the simulation study, we again observe that our method in Fig. 7 outperforms all the other methods, effectively removing the motion artifacts while making the image sharper. In contrast, reconstruction results with cycleGAN in Fig. 7(b) does sharpen the image, but introduces unwanted artifacts, and changes the content of the original image, best seen in the third row of the figure. ESRGAN in Fig. 7(c) sharpens the image by a little margin, but the artifact remains intact. Oh et al. (oh2020unsupervised) in Fig. 7(d) does remove the motion artifacts, but the image still looks blurry.
Moreover, zooming into the images shown in the second row of Fig. 7, we see the region of motion artifact enclosed in the yellow box. We observe that the boundaries are blurry, and thus the structure is not clearly seen in all methods except ESRGAN and the proposed method. However, the proposed method is able to remove the artifacts so that we can visualize the sharp boundaries again. ESRGAN is also able to sharpen the edges, but it also sharpens the artifacts.
4.3 Quality Enhancement of ADC Maps
The calculation of ADC map via conventional methods often results in low resolution, noisy maps, mostly due to the pixel-wise regression from the different images of varying values. When the images are not perfectly aligned, this also contributes to the error in the ADC maps, making it harder to estimate the high quality version of the ADC map. We conjectured that since our network learned the ability to enhance the resolution and decrease noise while compensating for motion artifacts, our network could also be applied to ADC maps directly. The results are demonstrated in Fig. 8. The network used for DW-MRI reconstruction used in Section 4.2 was used to enhance the quality of ADC maps acquired from the vendor using conventional method (rosenkrantz2011diffusion; koh2007diffusion). We see in Fig. 8(a) that the original maps are quite noisy, which could hamper clinical utility. On the other hand, the reconstruction results with the proposed method in Fig. 8 removes noise and enhances the resolution, creating a cleaner version of the ADC maps.
5.1 Effect of varying sub-sampling factor
It should be noted that the same network can be utilized differently, by changing the bootstrap sub-sampling factor at the inference stage, depicted in Fig. 4(b). Specifically, the sub-sampling operator can be chosen freely, as long as the acceleration factor does not exceed beyond the maximum factor that was used in the training step. In our case, since at the training stage the factor was chosen as , we performed reconstruction using , and the results are reported in 9. With both examples, we see that on low acceleration factors, images tend to be cleaner and sharper. As the acceleration factor increases however, we observe that motion artifacts are better corrected, but minor aliasing artifacts are introduced along with it, making the image less clean. Hence, radiologists would be able to control the factor, switching the sub-sampling factor to higher values when more motion correction is desired.
5.2 Ablation study
One might question the importance of the degradation block being stochastic. Many factors account for this: for one, the amount of degradation varies across the low quality images, and hence the network needs to learn the variability in the degradation process. Furthermore, sampling the degradation block from certain distribution also has augmentation effect, which could also enhance the expressivity of the network. To strengthen our statement, we performed an ablation study using simulated DW-MRI dataset, as shown in Table 4.
|kernel: ✗, noise: ✗||35.94||0.926|
|kernel: ✗, noise: ✓||38.68||0.959|
|kernel: ✓, noise: ✗||39.97||0.960|
|kernel: ✓, noise: ✓||40.04||0.961|
Notably, we can see that keeping the degradation block stochastic, along with injecting random noise produce the best results. Without the noise factor, the metrics slightly decrease, possibly due to noise enhancement in the super-resolution process. Using fixed kernel drops the metrics even more, and finally, when the kernel is fixed and no noise is injected, the performance drops heavily, and we can clearly see the importance of the stochasticity in .
5.3 Limitations in Acquiring High Quality Target Distribution
Unsupervised learning, especially in the context of image enhancement, is related to learning the optimal transport mapping from the low-quality input distribution to the high-quality target distribution (sim2020optimal). Consequently, a crucial factor which controls the ability of the network is the quality of the training target distribution. When the networks are trained with well-maintained, superior quality images constituting the target distribution, the network learns well, and also performs well at test stage. Unfortunately, the high-quality target distribution utilized in the in vivo study in Section 4.2 share the same scan acquisition parameters with the low-quality input distribution. For example, the number of excitations (NEX) in EPI sequence is the decisive factor of image quality, which is set to 2 in both input and target images. Although better quality images are manually curated by experienced radiologists for use as a target dataset, this still places limitations on the degree of quality improvement that can be achieved by the trained network. If we were to acquire unpaired higher quality scans with higher NEX as the target distribution, which is in fact possible in DW-MRI, we conjecture that our network will be able to enhance the quality of DW-MRI even further. Having said that, our novel method, which is a proof-of-concept, can further be enhanced in a future study using a well controlled clinical dataset.
In this work, we proposed a novel unsupervised deep learning method for simultaneous super-resolution and motion artifact correction. The network is trained with the physics-informed OT-cycleGAN which learns to invert the blurring process and reconstruct the under-sampled -space. Subsequently, the trained network can be flexibly used in bootstrap sub-sampling and aggregation process to enhance the low quality DW-MRI images. With extensive experiments using both simulated and in vivo studies, we showed clear superiority of our method over existing methods. Our framework is flexible in that it can be extended to other imaging situations and modalities, not to mention arbitrary artifacts that are hard to define or solve in a supervised fashion. Applying our method of quality enhancement to other domains could be an interesting directory of study. Finally, the proposed work does not fully analyze the impact and influence of the quality enhancement in clinical perspective, which could also be a promising direction of research.
This work was supported by the National Research Foundation of Korea under Grant NRF-2020R1A2B5B03001980