
Simultaneous super-resolution and motion artifact removal in diffusion-weighted MRI using unsupervised deep learning

Diffusion-weighted MRI is nowadays performed routinely due to its prognostic ability, yet the quality of the scans is often unsatisfactory, which can subsequently hamper clinical utility. To overcome these limitations, here we propose a fully unsupervised quality enhancement scheme, which boosts the resolution and removes the motion artifact simultaneously. This is done by first training the network using an optimal transport driven cycleGAN with a stochastic degradation block, which learns to remove aliasing artifacts and enhance the resolution, and then using the trained network at the test stage with bootstrap subsampling and aggregation for motion artifact suppression. We further show that we can control the trade-off between the amount of artifact correction and resolution by controlling the bootstrap subsampling ratio at the inference stage. To the best of our knowledge, the proposed method is the first to tackle super-resolution and motion artifact correction simultaneously in the context of MRI using unsupervised learning. We demonstrate the efficiency of our method by applying it both to quantitative evaluation using a simulation study and to in vivo diffusion-weighted MR scans, which shows that our method is superior to the current state-of-the-art methods. The proposed method is flexible in that it can be applied to various quality enhancement schemes in other types of MR scans, and also directly to the quality enhancement of apparent diffusion coefficient maps.





1 Introduction

Over the years, interest in Diffusion Weighted MRI (DW-MRI) has continuously increased thanks to its versatile ability in diverse clinical applications (tamada2017prostate; razek2018assessment; razek2019multi). DW-MRI quantifies the degree of Brownian motion of protons and can detect diffusion restriction in areas with high cellularity (i.e. malignancies) and cell membrane density. Detecting changes in diffusion at the cellular level means that DW-MRI can be used not only to detect and characterize lesions, but also to monitor and predict therapeutic responses (taouli2010diffusion; pereira2019diffusion). Another great advantage of DW-MRI comes from its ability to quantify the amount of diffusion via calculating the apparent diffusion coefficient (ADC) map, which further enhances the prognostic power.

In order to achieve DW-MRI, a diffusion-sensitizing bipolar gradient is used to capture the small molecular movement. While sensitizing for microscopic movement of molecules, macroscopic patient motion can also be captured in the image, which introduces one of the greatest technical difficulties in DW-MRI. Motion artifacts arising from bulk patient motion can be reduced by cardiac gating, pulse triggering, or designing a specific pulse sequence for motion artifact removal (bammer2003basic; chilla2015diffusion). However, these approaches cannot completely remove the artifacts, leaving much room for improvement.

Figure 1: Illustration of variability in image quality existent in diffusion-weighted MRI. (a-b) High quality images with high resolution and no artifacts. (c-d) Low quality images with low resolution and motion artifacts. Yellow arrows indicate areas where motion artifacts are observed.

Yet another problem in DW-MRI comes from its inherently low resolution and signal-to-noise ratio (SNR), owing to the typical usage of fast imaging schemes such as echo planar imaging (EPI) (baliyan2016diffusion) with a small number of excitations (NEX). The low spatial resolution poses further challenges in DW-MRI, making it hard to visualize small lesions of interest or to exactly quantify the diffusion property. Typical examples of (relatively) high quality and low quality DW-MRI images are depicted in Fig. 1. Although the four images in Fig. 1 were scanned with the same parameters in the same scanner, compared to the images in (a-b), the images in (c-d) have lower resolution, and motion artifacts make the image quality worse.

Over the years, researchers have sought to overcome the challenges in DW-MRI with various methods. Super-resolution (SR) algorithms in DW-MRI were proposed to increase the spatial resolution, initially via simple interpolation-based methods. Later on, model-based optimization methods (scherrer2011super) were adopted. Based on the theory of compressed sensing (CS), regularized optimization methods, most of which utilize variants of total variation (TV) regularization, were proposed in the context of diffusion magnetic resonance imaging (ning2016joint; teh2020improved). Other advanced methods, such as the Metropolis-Hastings algorithm, were proposed in (yap2013generative). In parallel, motion artifact correction methods in DW-MRI were developed: (ozaki2013motion) proposed velocity-compensated diffusion gradients in order to compensate for cardiac motion. Independent of the diffusion weighting scheme, CS-based MRI motion artifact correction methods (vasanawala2010improved; yang2013sparse; jin2017mri) were also proposed.

The downside of model-based optimization methods and CS-based methods comes from their high computational complexity and cumbersome hyper-parameter tuning. Moreover, these algorithms typically require raw k-space data, which are often hard to collect. Furthermore, most of the developed algorithms aim at a specific task, either acceleration or super-resolution, and cannot handle arbitrary degradations in the image.

Lately, deep learning based methods for DW-MRI super-resolution (albay2018diffusion; tanno2017bayesian; fan2020generative) were developed, thanks to their ability to learn from the data distribution. Motion artifact correction methods based on deep learning were also extensively studied (duffy2018retrospective; pawar2018motion), but most of these studies were restricted to the supervised learning setting, where the network learns from matched target data.

This greatly shrinks the applicability of these algorithms, since there are many situations in DW-MRI where such supervised training is unavailable. For instance, matched pairs of high-resolution (HR) and low-resolution (LR) DW-MRI scans are unobtainable. Moreover, exactly matching motion-free images are impossible to obtain. To this end, unsupervised learning-based methods such as Cycle-MedGAN (armanious2019unsupervised; armanious2020unsupervised) and CycleGAN with bootstrap aggregation (oh2020unsupervised) have been proposed. However, to the best of our knowledge, a method which simultaneously corrects for motion artifacts and performs super-resolution at once has not been reported in the literature.

To address the problem in an unsupervised fashion that is directly applicable to real-world situations, and to solve super-resolution and artifact correction simultaneously, here we propose a single unsupervised learning framework that handles both. Our idea builds on the bootstrap aggregation approach for probabilistic k-space correction (oh2020unsupervised), which is extended in this study to also account for super-resolution, and adapted to the quality enhancement of DW-MRI. Specifically, in the training of the MR reconstruction network for k-space correction, we add a stochastic degradation block as a forward mapping, which models the degradation of resolution in the image acquisition process. By this simple extension, the k-space correction network learns not only to correct for under-sampled k-space, but also to enhance the resolution. Once the network is trained, bootstrap sub-sampling and aggregation are performed at the inference stage, where k-space phase encoding lines are randomly sub-sampled, and the corrected images are aggregated to form a final high-quality reconstructed image.

The applicability of our novel deep learning framework was explored using both simulated data, and in vivo liver DW-MRI scans. Results confirm that the proposed method can effectively remove motion artifacts present in DW-MRI and increase the resolution/SNR altogether. Interestingly, the same reconstruction steps can also be directly applied to enhance the quality of the ADC maps, showing the versatility of the proposed method.

2 Theory

Figure 2: Pictorial description of the overall method. (a) Unsupervised training scheme for the proposed method. At each step, the degradation parameter $\theta$ is sampled from $p(\theta)$, and $\mathcal{B}_\theta$ blurs the image and adds noise. Then, k-space under-sampling is performed and the network is trained to remove the blurring and aliasing artifacts. (b) Bootstrap aggregation for inference. Following (oh2020unsupervised), the degraded images are randomly sub-sampled in k-space and reconstructed with the trained network $G$, after which follows averaging of the images.

Consider the following forward measurement model:

$\hat{x} = \mathcal{F} x, \qquad (1)$

where $\hat{x}$ is the k-space data, $\mathcal{F}$ corresponds to the Fourier transform, and $x$ is the low-quality image in which the resolution is limited by the system, formally defined as:

$x = \mathcal{B}_\theta(y), \qquad (2)$

where $y$ is the latent high-quality image that we would like to estimate, and $\mathcal{B}_\theta$ is the stochastic degradation process parametrized by a random sample $\theta$ from a probability distribution $p(\theta)$, which generates the low-quality image $x$. Moreover, it is often the case that transient patient motion introduces k-space outliers along the phase encoding direction:

$\hat{x}^w(k_x, k_y) = \begin{cases} e^{-j\phi(k_y)}\,\hat{x}(k_x, k_y), & k_y \in \mathcal{I} \\ \hat{x}(k_x, k_y), & \text{otherwise}. \end{cases} \qquad (3)$

In Eq. (3), $\hat{x}^w$ is the motion-corrupted k-space measurement, $\hat{x}$ is the clean data in (1), $k_x$ is the frequency-encoding direction, $k_y$ is the phase-encoding direction, and $j = \sqrt{-1}$. Here, note that motion artifacts are induced by the phase shift $e^{-j\phi(k_y)}$ along the phase-encoding direction $k_y$, where $\mathcal{I}$ contains the indices of errors. Eqs. (1), (2) and (3) tell us that the forward model for the measurement is given by

$\hat{x}^w = \mathcal{T}\big(\mathcal{F}\,\mathcal{B}_\theta(y)\big), \qquad (4)$

where $\mathcal{T}$ denotes the outlier-generating operation in (3). Hence, the correction of k-space outliers, together with the inversion of the degradation process by the blur operation, is necessary in order to recover clean images at the desired resolution.
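As a concrete illustration, the k-space outlier operation of Eq. (3) can be sketched in a few lines of NumPy. This is a sketch under assumptions: the uniform phase range and the outlier set (all lines with $|k_y| \geq k_0$) are illustrative placeholders, since the exact simulation parameters appear only later in Section 3.5.

```python
import numpy as np

def add_kspace_phase_errors(khat, k0, rng):
    """T in Eq. (3): multiply a random phase e^{-j*phi(ky)} onto the
    phase-encoding lines with |ky| >= k0 (the outlier index set I).
    khat is assumed fftshift-ed so that ky = 0 sits at the center row."""
    ny = khat.shape[0]
    ky = np.arange(ny) - ny // 2                      # phase-encoding index
    # illustrative choice: uniform phase on outlier lines, zero elsewhere
    phi = rng.uniform(-np.pi, np.pi, ny) * (np.abs(ky) >= k0)
    return khat * np.exp(-1j * phi)[:, None]
```

Note that the operation only perturbs the phase, so the k-space magnitude is unchanged; the artifact appears after the inverse Fourier transform.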

2.1 Bootstrap Aggregation for Artifact Correction and Super Resolution

Recently in our companion paper (oh2020unsupervised), we demonstrated that the k-space outliers can be corrected by a bootstrap aggregation approach in the context of MRI reconstruction (cha2020geometric), which can be formulated as follows:

$y^* = \sum_{n=1}^{N} \alpha_n\, G\big(\mathcal{F}^{-1} M_n \hat{x}^w\big), \qquad (5)$

where $M_n$ is the random down-sampling operator which probabilistically masks out phase-encoding lines, $\hat{x}^w$ is the k-space measurement, $\alpha_n$ is the weight applied to the $n$-th reconstruction, $\mathcal{F}^{-1}$ refers to the inverse Fourier transform, and $G$ is the trained neural network which can reconstruct high quality images from the input image aliased by the under-sampled k-space measurement. In other words, the measured k-space is randomly sub-sampled in $N$ different ways, reconstructed via the reconstruction network $G$, and subsequently aggregated to form the final image.

The reason why the approach in (5) is efficient for motion artifact correction comes from the following observation:

$M_n \mathcal{T}(\hat{x}) \simeq M_n \hat{x} \qquad (6)$

for certain $M_n$, where the operator $\mathcal{T}$ is defined in (3). In other words, when the motion-corrupted k-space outliers in the phase-encoding lines are removed, the sub-sampled image becomes similar to the sub-sampled image from the clean data. Hence, for certain sampling patterns $M_n$, the motion-corrupted k-space outliers can be effectively removed. Subsequently, the bootstrap aggregation scheme introduced in Eq. (5) corrects for the missing k-space components to form a clean image, since the network $G$ was trained to reconstruct clean images.
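The aggregation step in Eq. (5) is straightforward to sketch. Here `recon_net` is a stand-in callable for the trained generator $G$, and uniform weights $\alpha_n = 1/N$ are assumed (consistent with the averaging described in Fig. 2(b)); the mask shape and magnitude handling are illustrative simplifications.

```python
import numpy as np

def bootstrap_aggregate(khat, recon_net, masks):
    """Eq. (5): average the network reconstructions of randomly
    sub-sampled k-space (uniform weights alpha_n = 1/N)."""
    outs = []
    for M in masks:                 # M: binary mask over phase-encoding lines
        aliased = np.fft.ifft2(np.fft.ifftshift(khat * M[:, None]))
        outs.append(recon_net(np.abs(aliased)))
    return np.mean(outs, axis=0)
```

In the degenerate case where every mask keeps all lines and `recon_net` is the identity, the aggregate reduces to the plain inverse Fourier reconstruction, which is a useful sanity check.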

Figure 3: Pictorial representation of the degradation block $\mathcal{B}_\theta$. $\theta = \{\sigma_B, \sigma_N\}$ are the parameters defining $\mathcal{B}_\theta$. Subsequently, the convolution kernel and noise are stochastically sampled.

Unfortunately, this is not enough in the case where the image to be reconstructed through bootstrap aggregation is also degraded by (2), and hence has lower resolution. To account for this additional caveat, we extend the training framework to incorporate the image degradation process in (1), (2), and (3). Specifically, we model the degradation process as follows:

$\mathcal{B}_\theta(y) = k_{\sigma_B} \ast y + n, \qquad (7)$

where $k_{\sigma_B}$ represents the Gaussian blur kernel parameterized with $\sigma_B$, and the noise $n$ is modeled with white Gaussian noise $n \sim \mathcal{N}(0, \sigma_N^2)$, so that $\theta = \{\sigma_B, \sigma_N\}$ is the set containing all the parameters of degradation. Here, we further assume that $\sigma_N$ is fixed, and sample $\sigma_B$ at every iteration. Furthermore, we also assume a small $\sigma_N$ so that the inherent noise does not get amplified during the super-resolution process. Pictorially, $\mathcal{B}_\theta$ can be described as in Fig. 3. Notably, although this might not capture the true degradation process, it closely resembles the truth, as we empirically show in the Results section. Moreover, when the forward model is exactly known, our proposed method can better approximate the desired clean image, which we show in the simulation study.
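A minimal sketch of the stochastic degradation block $\mathcal{B}_\theta$ follows. The sampling range for $\sigma_B$ and the value of $\sigma_N$ below are illustrative placeholders, not the paper's settings, which are not reproduced here.

```python
import numpy as np

def sample_degradation(rng, sigma_b_range=(0.5, 2.0), sigma_n=0.01):
    """Draw theta = {sigma_B, sigma_N}: sigma_B is sampled per iteration,
    sigma_N is kept fixed and small so noise is not amplified during
    super-resolution. Ranges here are illustrative assumptions."""
    return rng.uniform(*sigma_b_range), sigma_n

def degradation_block(y, sigma_b, sigma_n, rng, radius=4):
    """B_theta(y) of Eq. (7): separable Gaussian blur followed by
    additive white Gaussian noise."""
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t**2 / (2 * sigma_b**2))
    k /= k.sum()                                    # normalized 1-D kernel
    x = np.apply_along_axis(lambda r: np.convolve(r, k, "same"), 0, y)
    x = np.apply_along_axis(lambda r: np.convolve(r, k, "same"), 1, x)
    return x + rng.normal(0.0, sigma_n, y.shape)
```

Since the kernel is normalized, interior pixels of a constant image are preserved by the blur, and only the noise term perturbs them.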

All in all, the resulting forward model for the Fourier inversion from the random k-space subsampling is given by

$\mathcal{A}_n(y) := \mathcal{F}^{-1} M_n \mathcal{T}\big(\mathcal{F}\,\mathcal{B}_\theta(y)\big) \simeq \mathcal{F}^{-1} M_n \mathcal{F}\,\mathcal{B}_\theta(y), \qquad (8)$

where $\mathcal{B}_\theta$ is defined in (7), and the last approximation is due to (6). Therefore, we need to train the reconstruction network $G$ such that the network learns the inverse mapping with respect to the forward operation $\mathcal{A}_n$. For training, we utilize the physics-informed OT-cycleGAN, which was shown to be especially effective for MRI reconstruction (oh2020unpaired; chung2021two; cha2020unpaired).

Specifically, let $\mathcal{Y}$ and $\mathcal{X}$ denote the domains for the motion-artifact-free high resolution images and the aliased images from the forward operation in (8), respectively. We further assume that $\mathcal{X}$ and $\mathcal{Y}$ are associated with probability measures $\mu$ and $\nu$, respectively. We define the transport cost between the two probability spaces $(\mathcal{X}, \mu)$ and $(\mathcal{Y}, \nu)$:

$c(x, y; G) = \|x - \mathcal{A}(y)\| + \|y - G(x)\|, \qquad (9)$

where $G$ is the neural network generator performing the inverse operation, and $\mathcal{A}$ is the forward operator in (8). Then, the corresponding optimal transport problem is given by

$\min_{G}\; \min_{\pi \in \Pi(\mu, \nu)} \int_{\mathcal{X} \times \mathcal{Y}} c(x, y; G)\, d\pi(x, y), \qquad (10)$

where $\Pi(\mu, \nu)$ denotes the set of the joint distributions whose marginals are $\mu$ and $\nu$, respectively. The geometric meaning of (10) was discussed in detail in (oh2020unpaired; chung2021two; cha2020unpaired); it aims to minimize the statistical distances between the transported and the empirical measures in $\mathcal{X}$ and $\mathcal{Y}$ simultaneously.

Furthermore, the primal formulation of the unsupervised learning in (10) can be represented by a dual formulation:

$\min_{G} \max_{D}\; \ell(G, D), \qquad (11)$

where

$\ell(G, D) = \gamma\, \ell_{cycle}(G) + \ell_{disc}(G; D). \qquad (12)$

Here, $\gamma$ is the hyper-parameter, and the cycle-consistency term is given by

$\ell_{cycle}(G) = \int_{\mathcal{X}} \big\| x - \mathcal{A}\big(G(x)\big) \big\|\, d\mu(x), \qquad (13)$

whereas the second term is the discriminator term:

$\ell_{disc}(G; D) = \int_{\mathcal{Y}} D(y)\, d\nu(y) - \int_{\mathcal{X}} D\big(G(x)\big)\, d\mu(x), \qquad (14)$

where $D$ is a 1-Lipschitz function. Note that only a single generator/discriminator pair is used for the distribution mapping, since the mapping from $\mathcal{Y}$ to $\mathcal{X}$ is replaced by the deterministic forward operator in (8). As discussed in (oh2020unpaired; chung2021two; cha2020unpaired), the discriminator term in (14) can be replaced by the LS-GAN (mao2017least) loss.
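To make the training objective concrete, the cycle term and an LS-GAN surrogate for the discriminator term can be sketched as below. `G`, `A`, and `D` are stand-in callables for the generator, the deterministic forward operator, and the discriminator; the L1 norm and the LS-GAN targets (1 for real, 0 for fake) are common conventions assumed here, not settings taken from the paper.

```python
import numpy as np

def cyclegan_losses(x_alias, y_clean, G, A, D):
    """One-generator OT-cycleGAN terms (sketch):
    l_cycle = ||x - A(G(x))||_1 plus LS-GAN surrogates for the
    discriminator term; G, A, D are stand-in callables."""
    l_cycle = np.mean(np.abs(x_alias - A(G(x_alias))))
    l_gen = np.mean((D(G(x_alias)) - 1.0) ** 2)        # generator side
    l_disc = np.mean((D(y_clean) - 1.0) ** 2) + np.mean(D(G(x_alias)) ** 2)
    return l_cycle, l_gen, l_disc
```

In an actual training loop, the generator would minimize a weighted sum of `l_cycle` and `l_gen`, while the discriminator minimizes `l_disc`, alternating updates as described in Section 3.3.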

The resulting OT-cycleGAN framework is illustrated in Fig. 2(a): in the upper part of the cycle, the clean image is blurred via the stochastic degradation operation $\mathcal{B}_\theta$, Fourier transformed to k-space for sub-sampling, then transformed back to the image space, after which follows the generator $G$, which learns to invert the forward mapping. The lower branch starts from the unpaired artifact image and does the opposite. During the learning process, the discriminator $D$ is used for learning the distribution of the clean images.

At the test stage, as shown in Fig. 2(b), the low-quality image is Fourier transformed to k-space and randomly subsampled using $N$ different masks, creating $N$ images containing aliasing artifacts. These images are reconstructed in parallel with the trained network $G$ and aggregated by Eq. (5). For simplicity, for all the following experiments, we set the aggregation weights uniformly, i.e. $\alpha_n = 1/N$.

Note that the reconstruction need not start from raw k-space. DICOM images can be directly utilized, and hence our method can be applied in a wide variety of clinical situations where the k-space raw data are not available. Another great advantage is that the same neural network can be used with different sub-sampling factors at the test stage, adapting to user needs. When the sub-sampling factor is set to low values (e.g. 1.5, 2), reconstructed images stay sharp without the presence of aliasing artifacts. When the factor is set to high values (e.g. 3, 4), bulk motion is best removed, albeit at the cost of minor aliasing artifacts. For all the reported results, we fix the sub-sampling factor across experiments. The consequences of varying the factor are further discussed in Section 5.1.

3 Methods

3.1 Training Dataset

To demonstrate the efficacy of our method, we initially performed a simulation study using the Human Connectome Project (HCP) dataset, containing magnitude-only MR brain images. Specific parameters are shown in Table 1. Out of 190 patient volume scans, 150 volumes (3000 slices) were used for training, and 40 volumes (800 slices) for testing. To simulate the situation where the images are contaminated with motion artifacts along with blurring, we specify the degradation parameters accordingly; the procedure for motion artifact generation can be found in Section 3.5.

Moreover, another simulation study using the liver DW-MRI data was performed. Specifically, out of 100 high quality DWI patient scans (4831 slices), 80 patients were selected as the training set and 20 patients as the test set. The simulated degradation was identical to that of the HCP simulation study, and the motion artifact generation process is elaborated in Section 3.5.

Parameter                    Specification
Scanner                      Siemens 3T scanner
Echo train duration          1105
Repetition Time (TR) [ms]    3200
Echo Time (TE) [ms]          565
Matrix size                  320 × 320
Resolution [mm]              0.7 × 0.7 × 0.7
Table 1: Specifications for the 3D HCP dataset.

For the in vivo study, we use 2D diffusion-weighted MRI (DW-MRI) of the liver acquired through single-shot echo planar imaging (EPI), with the specific parameters listed in Table 2. All scans were acquired as breath-hold MR scans. Moreover, scans were acquired with varying b-values to quantify the ADC map.

Parameter                    Specification
Scanner                      Siemens Skyra 3T scanner
Repetition Time (TR) [ms]    1800
Echo Time (TE) [ms]          56
Echo Train Length (ETL)      37
Matrix size                  232 × 180
Num. Excitations (NEX)       2
Resolution [mm]              2.0 × 2.0
Slice Thickness [mm]         6.0
Table 2: Specifications for the 2D In Vivo DWI acquisition.

Among 201 DWI patient scans, 100 (4831 slices) were manually classified as bad quality, and 101 (3815 slices) as good quality by an attending radiologist. Consequently, the high quality images were used for training in Fig. 2, and the low quality images were used for inference. Notably, all the patient scans share the same scan parameters as in Table 2, and thus there is no clear distinction between the images in the two different distributions. In other words, the high quality images do not have much higher resolution than the low quality images, which could hamper the training since the target distribution is not well-defined. Nevertheless, our proposed method is still able to enhance the quality of DW-MRI images in a fully unsupervised fashion, as will be shown in the Results section.

3.2 Network Architecture

Figure 4: Network architecture used for training and bootstrap aggregation. (a) Generator architecture adopted from U-Net (ronneberger2015u), (b) PatchGAN Discriminator architecture adopted from (isola2017image).
Figure 5: Reconstruction results of the HCP data simulation study: (a) input images with artifact, and reconstruction results with (b) ESRGAN (wang2018esrgan), (c) cycleGAN (zhu2017unpaired), (d) Oh et al. (oh2020unsupervised), and (e) the proposed method. (f) shows the label images. Images in the yellow box show the magnified version of the specific patch, and the images in the blue box represent the difference. Numbers in white indicate PSNR and SSIM values, respectively.
Figure 6: Reconstruction results of the DW-MRI simulation study: (a) input images with artifact, and reconstruction results with (b) ESRGAN (wang2018esrgan), (c) cycleGAN (zhu2017unpaired), (d) Oh et al. (oh2020unsupervised), and (e) the proposed method. (f) shows the label images. Images in the yellow box show the magnified version of the specific patch, and the images in the blue box represent the difference. Numbers in white indicate PSNR and SSIM values, respectively.
Figure 7: Reconstruction results of the in vivo DW-MRI acquisition: (a) input images with artifact, and reconstruction results with (b) ESRGAN (wang2018esrgan), (c) cycleGAN (zhu2017unpaired), (d) Oh et al. (oh2020unsupervised), and (e) the proposed method. Images in the yellow and blue boxes show the magnified regions of each image. The yellow arrow indicates the region where the motion artifact blurs the boundary, while the proposed method removes the artifact successfully.

Network architectures for the generator and discriminator used in Fig. 2(a),(b) are illustrated in Fig. 4. The architecture of the CNN generator depicted in Fig. 4(a) was adopted from U-Net (ronneberger2015u), with 4 stages of pooling/unpooling; the number of filter channels is initially set to 64 at stage 0 and doubles at every stage, reaching 1024 at stage 4. At each stage, blocks are made up of 3 × 3 convolution, group normalization, and ReLU activation. The last feature activation of each encoder stage is concatenated to the decoder layer of the corresponding stage, and at the last layer, a 1 × 1 convolution is used.

For the discriminator, the 2D PatchGAN architecture was adopted from (isola2017image), as shown in Fig. 4(b). Specifically, each block consists of a strided 4 × 4 convolution, instance normalization, and LeakyReLU activation. For additional stability, we also use spectral normalization at every convolution block. The last layer of the network consists of a 1 × 1 convolution to produce the final output.

3.3 Training Details

For training, the hyper-parameter $\gamma$ for the loss function was fixed, and the networks were trained for 150 epochs with the Adam optimizer using the default parameters, with the initial learning rate decayed by a factor of 10 at the 100th epoch. The batch size for training was set to 1. The discriminator was updated once at every update step of the generator. For pre-processing, we normalize the image by the standard deviation of each slice. For k-space sub-sampling, we use 1D Gaussian random sampling with the acceleration factor selected randomly at each update step. The generated masks also contain a central autocalibration signal (ACS) region, which takes up 6% and 11% of the k-space central lines for the respective acceleration factors. All experiments were performed using PyTorch. Training took about a day on an NVIDIA GeForce RTX 2080 Ti GPU.
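The 1D Gaussian random sampling with a fully sampled ACS block can be sketched as below; the Gaussian width and the exact ACS placement are illustrative choices, not the training code, and the paper's specific acceleration factors are not reproduced here.

```python
import numpy as np

def gaussian_1d_mask(ny, accel, acs_frac, rng):
    """1-D Gaussian random phase-encoding mask with a fully sampled
    central ACS region; accel is the target acceleration factor."""
    n_keep = ny // accel
    n_acs = max(int(round(acs_frac * ny)), 1)
    center = ny // 2
    mask = np.zeros(ny, bool)
    mask[center - n_acs // 2: center - n_acs // 2 + n_acs] = True  # ACS lines
    # Gaussian-weighted sampling of the remaining phase-encoding lines
    ky = np.arange(ny) - center
    p = np.exp(-ky.astype(float) ** 2 / (2 * (ny / 6) ** 2))
    p[mask] = 0
    p /= p.sum()
    extra = rng.choice(ny, size=max(n_keep - n_acs, 0), replace=False, p=p)
    mask[extra] = True
    return mask
```

Excluding the ACS lines from the random draw (by zeroing their probabilities) guarantees that exactly `n_keep` distinct lines are kept, matching the nominal acceleration factor.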

3.4 Comparative Algorithms

We performed comparative study using the state-of-the-art methods for MR super-resolution and motion artifact correction: ESRGAN (wang2018esrgan), cycleGAN (zhu2017unpaired), and Oh et al. (oh2020unsupervised).

First, we compared our method to ESRGAN, since it is the method utilized in many variants of MR super-resolution (albay2018diffusion; fan2020generative). Since ESRGAN is based on supervised training, we generated the training dataset by retrospectively down-sampling the training data and up-sampling them with bicubic interpolation. Subsequently, the network was trained using the same network architectures described in Section 3.2 for a fair comparison.

CycleGAN (zhu2017unpaired) is also chosen as a comparative algorithm, as it is widely used when forward/backward mappings between two arbitrary distributions are needed (sim2020optimal). The networks utilized in cycleGAN are the same as the ones described in Section 3.2. The only difference is that the vanilla cycleGAN architecture requires two sets of generators/discriminators, and hence double the parameters of the proposed method.

We finally compare our method to the recent Oh et al. (oh2020unsupervised), which is one of the state-of-the-art algorithms for motion correction, and also the most similar to the proposed method. The network architectures were kept the same as the proposed method, and the parameters in the training/testing phase were all set to the values advised in the original paper.


Image type     Metric      Input   ESRGAN (wang2018esrgan)   cycleGAN (zhu2017unpaired)   Oh et al. (oh2020unsupervised)   Proposed
Brain HCP      PSNR [dB]   28.33   29.56                     30.03                        32.21                            34.17
               SSIM        0.651   0.657                     0.883                        0.922                            0.946
Liver DW-MRI   PSNR [dB]   35.00   35.68                     30.75                        38.90                            40.04
               SSIM        0.763   0.802                     0.780                        0.938                            0.961

Table 3: Quantitative evaluation of the simulation studies using Brain HCP data and liver DW-MRI.

3.5 Motion Artifact Simulation

For the simulation study with the HCP dataset, we simulated the motion artifact following the procedures in (tamada2020motion) and the specific parameters used in (oh2020unsupervised). More specifically, the phase shift $\phi(k_y)$ in Eq. (3) is defined as

$\phi(k_y) = \begin{cases} \delta, & |k_y| \geq k_0 \\ 0, & \text{otherwise}, \end{cases} \qquad (16)$

where $\delta$ is the random phase variation sampled at each phase-encoding line, and the threshold $k_0$ is statically set.

Motion artifact simulation for the liver DW-MRI data follows (tamada2020motion) for breathing motion. Namely, different from Eq. (16), the phase shift emerges as a sinusoidal function:

$\phi(k_y) = \Delta \sin(\alpha k_y + \beta), \qquad (17)$

where $\Delta$, $\alpha$, and $\beta$ determine the amplitude, frequency, and phase of the simulated motion. The phase shift in Eq. (17) induces periodic motion artifacts in liver DW-MRI.
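The periodic phase error of Eq. (17) can be sketched as follows; the amplitude, frequency, and phase values passed in are illustrative, since the paper's exact simulation settings are not reproduced here.

```python
import numpy as np

def sinusoidal_phase_corrupt(khat, amp, freq, shift):
    """Eq. (17)-style periodic phase error phi(ky) = amp*sin(freq*ky + shift)
    applied along the phase-encoding direction of an fftshift-ed k-space."""
    ny = khat.shape[0]
    ky = np.arange(ny) - ny // 2
    phi = amp * np.sin(freq * ky + shift)
    return khat * np.exp(-1j * phi)[:, None]
```

As with the random-phase model, the corruption is magnitude-preserving in k-space; the sinusoidal structure is what produces the periodic ghosting in the image domain.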

4 Experimental Results

4.1 Simulation Study

For quantitative evaluation of the proposed method, we first perform a simulation study using the HCP dataset, illustrated in Fig. 5. We can see both quantitatively and qualitatively the superiority of our method over the others. Namely, ESRGAN is able to boost the resolution of the artifact image as shown in Fig. 5(b), but it sharpens the motion artifacts along with it. CycleGAN in Fig. 5(c) does remove the artifacts and sharpen the image to some extent, but it also introduces unwanted artifacts and has low quantitative metrics. Oh et al. in Fig. 5(d) efficiently removes the motion artifacts, but produces blurry images due to the degradation. On the other hand, the proposed method in Fig. 5(e) successfully removes motion artifacts while sharpening the image overall, closely approximating the target image. Quantitative metrics reporting the peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) in Table 3 confirm that the results are indeed superior.

Furthermore, we present a simulation study using a simulated test set retrospectively acquired from DW-MRI scans, as elaborated in Section 3.1. Results can be seen in Fig. 6(a-e), where we consistently see that the proposed method is able to closely estimate the label image, devoid of artifacts. Moreover, in terms of the quantitative metrics in Table 3, the proposed method outperforms Oh et al. (oh2020unsupervised) by more than 1 dB in PSNR. In contrast, the other methods show their own weaknesses: ESRGAN sharpens the motion artifacts along with the boundaries; cycleGAN makes the overall image sharper, but often breaks down, as can be seen in the second row of Fig. 6, and has low PSNR values; Oh et al. is excellent at removing the motion artifacts, but the details are lost.

4.2 In Vivo Study

Experiments with real DW-MRI data can be seen in Fig. 7. Consistent with the results from the simulation study, we again observe that our method in Fig. 7(e) outperforms all the other methods, effectively removing the motion artifacts while making the image sharper. In contrast, the reconstruction results with cycleGAN in Fig. 7(c) do sharpen the image, but introduce unwanted artifacts and change the content of the original image, best seen in the third row of the figure. ESRGAN in Fig. 7(b) sharpens the image by a small margin, but the artifact remains intact. Oh et al. (oh2020unsupervised) in Fig. 7(d) does remove the motion artifacts, but the image still looks blurry.

Moreover, zooming into the images shown in the second row of Fig. 7, we see the region of motion artifact enclosed in the yellow box. The boundaries are blurry, and thus the structure is not clearly seen in any method except ESRGAN and the proposed method. However, only the proposed method removes the artifacts so that the sharp boundaries can be visualized again; ESRGAN also sharpens the edges, but sharpens the artifacts along with them.

4.3 Quality Enhancement of ADC Maps

The calculation of the ADC map via conventional methods often results in low-resolution, noisy maps, mostly due to the pixel-wise regression from the different images of varying b-values. When the images are not perfectly aligned, this also contributes to errors in the ADC maps, making it harder to estimate a high-quality version of the ADC map. We conjectured that since our network has learned to enhance the resolution and decrease noise while compensating for motion artifacts, it could also be applied to ADC maps directly. The results are demonstrated in Fig. 8. The network used for DW-MRI reconstruction in Section 4.2 was used to enhance the quality of ADC maps acquired from the vendor using the conventional method (rosenkrantz2011diffusion; koh2007diffusion). We see in Fig. 8(a) that the original maps are quite noisy, which could hamper clinical utility. On the other hand, the reconstruction results with the proposed method in Fig. 8(b) remove noise and enhance the resolution, creating a cleaner version of the ADC maps.
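For reference, the conventional pixel-wise ADC estimate that the proposed method post-processes can be sketched as a log-linear least-squares fit of the textbook mono-exponential model $S(b) = S_0\, e^{-b \cdot \mathrm{ADC}}$; this is a generic sketch, not the vendor's exact implementation.

```python
import numpy as np

def adc_map(images, bvals, eps=1e-8):
    """Pixel-wise ADC via log-linear least squares over the acquired
    b-values: ADC is the negated slope of log S(b) versus b."""
    logs = np.log(np.maximum(np.stack(images, axis=-1), eps))  # (H, W, nb)
    b = np.asarray(bvals, float)
    bc = b - b.mean()                     # centered b-values for the fit
    slope = (logs * bc).sum(-1) / (bc ** 2).sum()
    return -slope
```

Because the fit is performed independently at every pixel, any misalignment between the b-value images corrupts the estimate, which is exactly the failure mode discussed above.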

Figure 8: Proposed method directly applied to ADC maps. (a) ADC maps calculated with the conventional method (acquired from vendor) (b) ADC maps reconstructed via the proposed method.
Figure 9: Effect on the reconstructed results by varying the acceleration factor at test stage. (a) Input artifact images with two different artifact i),ii). (b) Reconstruction results at different acceleration factors. Yellow arrows indicate the area where bulk motion artifacts are observed. Green arrows point where minor aliasing artifacts are introduced.

5 Discussion

5.1 Effect of varying sub-sampling factor

It should be noted that the same network can be utilized differently by changing the bootstrap sub-sampling factor at the inference stage, depicted in Fig. 2(b). Specifically, the sub-sampling operator can be chosen freely, as long as the acceleration factor does not exceed the maximum factor used in the training step. In our case, we performed reconstruction using factors ranging from 1.5 to 4, and the results are reported in Fig. 9. With both examples, we see that at low acceleration factors, images tend to be cleaner and sharper. As the acceleration factor increases, however, we observe that motion artifacts are better corrected, but minor aliasing artifacts are introduced along with it, making the image less clean. Hence, radiologists would be able to control the factor, switching the sub-sampling factor to higher values when more motion correction is desired.

5.2 Ablation study

One might question the importance of the degradation block being stochastic. Several factors account for this: for one, the amount of degradation varies across the low quality images, and hence the network needs to learn the variability in the degradation process. Furthermore, sampling the degradation block from a distribution also has an augmentation effect, which could enhance the expressivity of the network. To support this claim, we performed an ablation study using the simulated DW-MRI dataset, as shown in Table 4.

Type of $\mathcal{B}_\theta$         PSNR [dB]   SSIM
kernel: ✗,   noise: ✗               35.94       0.926
kernel: ✗,   noise: ✓               38.68       0.959
kernel: ✓,   noise: ✗               39.97       0.960
kernel: ✓,   noise: ✓               40.04       0.961
Table 4: Ablation study on the stochasticity of the degradation block $\mathcal{B}_\theta$. For the kernel, ✗: fixed $\sigma_B$, ✓: stochastically sampled $\sigma_B$. For the noise, ✗: no noise added, ✓: noise $n \sim \mathcal{N}(0, \sigma_N^2)$ added.

Notably, we can see that keeping the degradation block stochastic, along with injecting random noise, produces the best results. Without the noise factor, the metrics slightly decrease, possibly due to noise enhancement in the super-resolution process. Using a fixed kernel drops the metrics further, and finally, when the kernel is fixed and no noise is injected, the performance drops heavily; we can clearly see the importance of the stochasticity in $\mathcal{B}_\theta$.

5.3 Limitations in Acquiring High Quality Target Distribution

Unsupervised learning, especially in the context of image enhancement, is related to learning the optimal transport mapping from the low-quality input distribution to the high-quality target distribution (sim2020optimal). Consequently, a crucial factor controlling the ability of the network is the quality of the training target distribution. When the networks are trained with well-maintained, superior quality images constituting the target distribution, the network learns well and also performs well at the test stage. Unfortunately, the high-quality target distribution utilized in the in vivo study in Section 4.2 shares the same scan acquisition parameters with the low-quality input distribution. For example, the number of excitations (NEX) in the EPI sequence is a decisive factor of image quality, which is set to 2 in both input and target images. Although better quality images were manually curated by experienced radiologists for use as the target dataset, this still places limitations on the degree of quality improvement that can be achieved by the trained network. If we were to acquire unpaired higher quality scans with higher NEX as the target distribution, which is in fact possible in DW-MRI, we conjecture that our network would be able to enhance the quality of DW-MRI even further. Having said that, our novel method, which is a proof-of-concept, can be further enhanced in a future study using a well controlled clinical dataset.

6 Conclusion

In this work, we proposed a novel unsupervised deep learning method for simultaneous super-resolution and motion artifact correction. The network is trained with the physics-informed OT-cycleGAN, which learns to invert the blurring process and reconstruct the under-sampled k-space. Subsequently, the trained network can be flexibly used in the bootstrap sub-sampling and aggregation process to enhance low quality DW-MRI images. With extensive experiments using both simulated and in vivo studies, we showed clear superiority of our method over existing methods. Our framework is flexible in that it can be extended to other imaging situations and modalities, not to mention arbitrary artifacts that are hard to define or solve in a supervised fashion. Applying our method of quality enhancement to other domains could be an interesting direction of study. Finally, the proposed work does not fully analyze the impact and influence of the quality enhancement from a clinical perspective, which could also be a promising direction of research.


This work was supported by the National Research Foundation of Korea under Grant NRF-2020R1A2B5B03001980.