A Learning-from-noise Dilated Wide Activation Network for denoising Arterial Spin Labeling (ASL) Perfusion Images

05/15/2020 ∙ by Danfeng Xie, et al. ∙ 37

Arterial spin labeling (ASL) perfusion MRI provides a non-invasive way to quantify cerebral blood flow (CBF), but it still suffers from a low signal-to-noise ratio (SNR). Using deep machine learning (DL), several groups have shown encouraging denoising results. Interestingly, the improvement was obtained when the deep neural network was trained using a noise-contaminated surrogate reference, because gold-standard high-quality ASL CBF images are not available. More strikingly, the output of these DL ASL denoising networks (ASLDN) showed even higher SNR than the surrogate reference. This phenomenon indicates a learning-from-noise capability of deep networks for ASL CBF image denoising, which can be further enhanced by network optimization. In this study, we proposed a new ASLDN to test whether similar or even better ASL CBF image quality can be achieved when the training reference is highly noisy. Different experiments were performed to validate the learning-from-noise hypothesis. The results showed that the learning-from-noise strategy produced better output quality than ASLDN trained with a relatively high-SNR reference.




1 Introduction

Arterial spin labeling (ASL) perfusion MRI provides a non-invasive way to quantify cerebral blood flow (CBF) [3, 16]. In ASL, the arterial spin labeled image (L image) and the spin untagged image (the control image or C image) are subtracted pairwise to generate CBF maps using an appropriate compartment model [1]. The SNR of ASL CBF maps is inherently low due to the longitudinal relaxation (T1) of blood water and the post-labeling transit process. To improve the SNR of the mean CBF maps, multiple pairs of L/C images are often acquired. Because total scan time is limited, only 10-50 L/C pairs can be acquired, resulting in a moderate SNR gain from signal averaging. Many post-processing methods have been proposed to denoise ASL MRI [2, 15, 9], but often with only minor to moderate improvement. A main reason is that those methods are based on signal and noise models, which are often incomplete or inaccurate for physiological measurements at high noise levels.

The major focus of denoising has increasingly shifted to deep machine learning (DL), given its strong performance in capturing nonlinear and complex data relationships [7]. The most widely used deep neural networks consist of multiple layers of receptive-field-constrained local filters, which are trained layer by layer by error backpropagation and are often called convolutional neural networks (CNNs). The local feature extraction, hierarchical abstraction, and step-wise backpropagation of CNNs, together with training strategies such as dropout, batch normalization, skip connections, and residual learning, make CNNs highly flexible and capable of modeling the very complex and nonlinear functions buried in real-world data such as medical images [12, 11].

Several groups have used DL in ASL MRI denoising [6, 17]. Unlike other denoising applications, a DL-based ASL denoising network (ASLDN) does not have noise-free training references. Accordingly, its denoising performance might be bounded above by the SNR of the reference image. Interestingly, that potential upper bound does not seem to exist, as several studies [6, 13, 17, 19, 18] showed that ASLDN could produce CBF images with even higher SNR than the reference. Lehtinen et al. [8] further investigated this learning-from-noise (LFN) phenomenon in more general settings. The main purpose of this study is to validate that DL-based ASL denoising models can be trained using only noisy image pairs. We show that this new learning-from-noise ASLDN (dubbed ASLDN-LFN) does not require any quasi-noise-free reference during training, yet it can achieve similar or even better denoising performance than the previous ASLDN, which was trained using quasi-noise-free reference images.

2 Methods

2.1 Problem formulation

Similar to [8], ASLDN-LFN assumes that both the noisy reference and the noisy input CBF maps, $\hat{y}_i$ and $\hat{x}_i$, are drawn from the same data distribution. When minimizing the L2 loss function

$$\operatorname*{argmin}_{\theta} \sum_i \left\| f_\theta(\hat{x}_i) - \hat{y}_i \right\|_2^2 ,$$

a CNN regressor $f_\theta$ finds its optimum at the arithmetic mean of the observations, given enough training samples [8]. This training process therefore converges to the same result as averaging all of one subject’s CBF maps to generate a quasi-noise-free mean CBF map. Thus, it is not necessary to obtain quasi-noise-free references to train ASLDN.

However, because ASL CBF maps contain excessive outliers, training with the L1 loss is preferred. Training with the L1 loss,

$$\operatorname*{argmin}_{\theta} \sum_i \left\| f_\theta(\hat{x}_i) - \hat{y}_i \right\|_1 ,$$

finds the median of the observations, which is more robust to outliers than the arithmetic mean [5]. We also conducted experiments to compare the effects of training with the L1 loss versus the L2 loss.
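The mean-versus-median argument can be checked numerically on a single voxel. The sketch below is only an illustration on synthetic numbers (the true value of 60, the noise level, and the outlier value of 500 are all made up, not taken from the study): it searches over constant predictions and confirms that the L2 loss is minimized near the sample mean, whereas the L1 loss is minimized near the sample median.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical repeated noisy measurements of one voxel (synthetic values).
true_cbf = 60.0
samples = true_cbf + rng.normal(0.0, 5.0, size=200)
samples[:10] = 500.0  # a few gross outliers, e.g. caused by motion

# Evaluate both losses for every candidate constant prediction z.
candidates = np.linspace(0.0, 200.0, 20001)
residuals = samples[None, :] - candidates[:, None]
l2_loss = (residuals ** 2).mean(axis=1)
l1_loss = np.abs(residuals).mean(axis=1)

z_l2 = candidates[l2_loss.argmin()]  # lands next to the sample mean
z_l1 = candidates[l1_loss.argmin()]  # lands next to the sample median

print(f"L2 minimizer {z_l2:.2f} vs. mean   {samples.mean():.2f}")
print(f"L1 minimizer {z_l1:.2f} vs. median {np.median(samples):.2f}")
```

With 5% of the samples corrupted, the L2 minimizer is pulled far away from the true value while the L1 minimizer stays close to it, which is the motivation for preferring the L1 loss when outliers are frequent.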

2.2 Network architecture

ASLDN-LFN uses the Dilated Wide Activation Network (DWAN) proposed in [18]. As shown in Figure 1, DWAN has two pathways. The local and global pathways differ in that the first convolutional layers of the 4 wide activation residual blocks in the global pathway use dilation rates of 2, 4, 8, and 16, respectively. The local pathway extracts local features, and the global pathway uses dilated convolutions to extract global data patterns. Furthermore, the wide activation residual blocks in DWAN expand the data features and pass more information through the network, improving performance on low-level computer vision tasks without additional parameters or computation [21, 4]. By combining the two-pathway structure with the wide activation residual block, this new CNN structure (DWAN) improves the denoising performance of ASLDN-LFN.

Figure 1: Schematic illustration of the architecture of the proposed DWAN network. The first layer consists of 3×3×32 convolutional filters applied to the input image. The output of the first layer is fed to both the local pathway and the global pathway. Each pathway contains 4 consecutive wide activation residual blocks. Each wide activation residual block contains two convolutional layers (3×3×128 and 3×3×32) and one activation function layer. The 3×3×128 convolutional layers in the global pathway are dilated convolutional layers with dilation rates of 2, 4, 8, and 16, respectively. The outputs of the local pathway and the global pathway are concatenated and fed to a 3×3×1 convolutional layer, which produces the predicted output image with an additional contribution from the input image passed through a 3×3×1 convolution. (a×b×c indicates the property of a convolution: a×b is the kernel size of one filter and c is the number of filters.)
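To make the "local" versus "global" distinction concrete, the following sketch computes the receptive field of each pathway along one image axis from the layer list in the caption. It assumes stride-1 convolutions throughout, that only the first convolution of each global block is dilated (as the caption states), and that the residual connections do not enlarge the receptive field; the helper name `receptive_field` is hypothetical.

```python
def receptive_field(dilations, kernel=3):
    """Receptive field (pixels along one axis) of stacked stride-1 convolutions."""
    rf = 1
    for d in dilations:
        # A k x k convolution with dilation d widens the field by (k - 1) * d.
        rf += (kernel - 1) * d
    return rf

first_layer = [1]                      # 3x3x32 input convolution
local_blocks = [1, 1] * 4              # four blocks, two undilated convolutions each
global_blocks = [d for rate in (2, 4, 8, 16) for d in (rate, 1)]
final_layer = [1]                      # 3x3x1 output convolution

local_rf = receptive_field(first_layer + local_blocks + final_layer)
global_rf = receptive_field(first_layer + global_blocks + final_layer)

print(local_rf, global_rf)  # 21 73
```

Under these assumptions the global pathway sees a 73-pixel window, a large part of a 109 × 91 slice, while the local pathway is limited to 21 pixels.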

2.3 Data preparation and model training

ASL data were pooled from 280 subjects in a local database. The data were acquired with a pseudo-continuous ASL sequence (40 control/label image pairs with labeling time = 1.5 sec, post-labeling delay = 1.5 sec, FOV = 22×22 cm, matrix = 64×64, TR/TE = 4000/11 ms, 20 slices with a thickness of 5 mm plus a 1 mm gap). ASLtbx [14] was used to preprocess the ASL images using the following updated procedures: 1) an ASL-specific motion correction method was applied to the raw ASL images (C/L images) to correct the spurious motions induced by the systematic label/control labeling [15]; 2) the average of all 40 C/L image pairs was calculated and used as a template for registering the ASL C/L images to the high-resolution T1 image, with registration performed in SPM12; 3) simple regression was used to regress out residual motions, mean CSF signal, and global signal; 4) adjacent C and L images were subtracted using simple subtraction to generate perfusion-weighted images, which were then converted into quantitative CBF using the same method as in [14]. M0 was approximated by the control image in each label/control image pair, and M0 calibration was performed at each voxel separately using the value at the corresponding voxel location of the control image. Outlier CBF image timepoints were identified and removed using the priors-guided slice-wise adaptive outlier cleaning algorithm [9]; 5) each subject’s structural MRI was spatially normalized to the Montreal Neurological Institute (MNI) standard brain using SPM12. The same transform was then applied to the CBF image series.
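The simple-subtraction step (procedure 4) and the effect of averaging over pairs can be sketched as follows. This is not the ASLtbx implementation: the data are synthetic, the C − L sign convention is an assumption, and the conversion to quantitative CBF is deliberately left out because it depends on the model in [14].

```python
import numpy as np

def perfusion_weighted_series(control, label):
    """Pairwise simple subtraction: one difference image per C/L pair."""
    control = np.asarray(control, dtype=float)
    label = np.asarray(label, dtype=float)
    if control.shape != label.shape:
        raise ValueError("control and label series must have the same shape")
    return control - label  # C - L sign convention is an assumption here

# Synthetic stand-in data (not real ASL): 40 pairs of 64 x 64 x 20 volumes.
rng = np.random.default_rng(0)
n_pairs, shape = 40, (64, 64, 20)
true_difference = 1.0  # hypothetical perfusion-weighted signal
noise_sd = 5.0         # hypothetical noise level of each raw image
control = 100.0 + rng.normal(0.0, noise_sd, size=(n_pairs, *shape))
label = 100.0 - true_difference + rng.normal(0.0, noise_sd, size=(n_pairs, *shape))

diff = perfusion_weighted_series(control, label)
mean_diff = diff.mean(axis=0)

# Spatial standard deviation of a single difference image vs. the 40-pair mean.
single_pair_sd = diff[0].std()
mean_map_sd = mean_diff.std()
print(f"noise SD: single pair {single_pair_sd:.2f}, mean of 40 pairs {mean_map_sd:.2f}")
```

For independent noise, averaging N pairs lowers the noise standard deviation by roughly the square root of N, which is why a limited number of pairs yields only a moderate SNR gain.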

CBF image slices from 200 subjects were used as the training dataset. CBF images from 20 different subjects were used for validation. The remaining 60 subjects were used as the testing set. The input to ASLDN-LFN was the axial slice. All CBF maps were spatially normalized into the Montreal Neurological Institute (MNI) space. For each subject, we extracted one out of every three axial slices from slice 36 to slice 60 of the 3D CBF maps in MNI space; the resulting 2D CBF maps were 109 × 91 pixels. The 40 ASL CBF images of each subject were divided into 4 time segments, each with 10 successively acquired images. The mean maps of the 1st and 2nd segments were taken as the input and the corresponding reference for DL model training. Another set of input-reference image pairs was obtained from the mean CBF maps of the 3rd and 4th segments. During model testing, the mean CBF image slices of the first 10 L/C pairs (the first time segment) were used as the input.
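The construction of noisy input-reference pairs described above can be sketched in a few lines. The array layout (time first, axial axis last), the treatment of slices 36 to 60 as inclusive array indices, and the function names are assumptions made for illustration only.

```python
import numpy as np

def segment_means(cbf_series, n_segments=4):
    """Split a (time, x, y, z) CBF series into equal time segments and average each."""
    segments = np.array_split(cbf_series, n_segments, axis=0)
    return [segment.mean(axis=0) for segment in segments]

def training_pairs(cbf_series, slice_indices):
    """Return (noisy input, noisy reference) 2D slice pairs for one subject."""
    m1, m2, m3, m4 = segment_means(cbf_series)
    pairs = []
    # Segment 1 -> segment 2 and segment 3 -> segment 4, as in the text.
    for noisy_input, noisy_reference in ((m1, m2), (m3, m4)):
        for z in slice_indices:
            pairs.append((noisy_input[:, :, z], noisy_reference[:, :, z]))
    return pairs

# One out of every three axial slices from 36 to 60 (inclusive is assumed).
slice_indices = list(range(36, 61, 3))

# Toy series: image t is filled with the value t, so segment means are easy to check.
time_values = np.arange(40, dtype=float)[:, None, None, None]
series = np.broadcast_to(time_values, (40, 109, 91, 91))

pairs = training_pairs(series, slice_indices)
print(len(slice_indices), len(pairs), pairs[0][0].shape)
```

Each subject therefore contributes two input-reference pairs per selected slice, and neither member of a pair is a quasi-noise-free image.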

Due to the intrinsically low SNR of ASL MRI, the input and reference CBF maps were already contaminated with severe noise (as Fig. 2.A shows). Therefore, no additional artificial noise was added to the input and reference CBF maps. Mean CBF maps of the entire 40 L/C image pairs with Gaussian smoothing (FWHM = 3 mm) and state-of-the-art outlier cleaning [9] were used as the pseudo gold standard. In contrast to the previous ASLDN method [17], which used the pseudo gold standard as the training reference, the proposed ASLDN-LFN used only noisy data as the training reference. U-Net [20] and DilatedNet [6], two popular CNN structures widely used in medical imaging, were implemented for comparison with our DWAN-based ASLDN-LFN. In all networks, batch normalization (BN) was removed to avoid the potential errors demonstrated in [10] (we also observed these errors in additional experiments). Additional experiments were conducted to compare the effects of different loss functions (L1 and L2) on denoising performance.

We used the Keras and TensorFlow platforms to implement all the DL algorithms. Networks were trained with the adaptive moment estimation (ADAM) algorithm with a learning rate of 0.001 and a batch size of 64. All experiments were performed on a PC with an Intel(R) Core(TM) i7-5820k CPU @3.30GHz and an Nvidia GeForce Titan Xp GPU.
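A minimal Keras sketch of a DWAN-like model, following the layer sizes given in the Figure 1 caption and the optimizer settings above, might look as follows. It is not the authors' code: the ReLU activation, the additive residual and input skip connections, "same" padding, and all function names are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

def wide_activation_block(x, dilation_rate=1):
    """Expand to 128 channels, activate, project back to 32, add the skip."""
    y = layers.Conv2D(128, 3, padding="same", dilation_rate=dilation_rate)(x)
    y = layers.Activation("relu")(y)  # activation type is an assumption
    y = layers.Conv2D(32, 3, padding="same")(y)
    return layers.Add()([x, y])

def build_dwan(input_shape=(109, 91, 1)):
    inputs = keras.Input(shape=input_shape)
    features = layers.Conv2D(32, 3, padding="same")(inputs)

    # Local pathway: four undilated wide activation residual blocks.
    local_path = features
    for _ in range(4):
        local_path = wide_activation_block(local_path)

    # Global pathway: the first convolution of each block is dilated.
    global_path = features
    for rate in (2, 4, 8, 16):
        global_path = wide_activation_block(global_path, dilation_rate=rate)

    merged = layers.Concatenate()([local_path, global_path])
    merged = layers.Conv2D(1, 3, padding="same")(merged)
    # Additional contribution from the input image through a 3x3x1 convolution.
    skip = layers.Conv2D(1, 3, padding="same")(inputs)
    outputs = layers.Add()([merged, skip])
    return keras.Model(inputs, outputs, name="dwan_sketch")

model = build_dwan()
# No batch normalization anywhere; L1 loss and ADAM with learning rate 0.001.
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss="mean_absolute_error",
)
```

Training would then call `model.fit(noisy_inputs, noisy_references, batch_size=64, ...)`, where both arrays are hypothetical stacks of 109 × 91 slices with a trailing channel axis.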

2.4 Evaluation metrics

We used peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) to quantitatively compare the performance of DWAN with U-Net and DilatedNet. When computing PSNR and SSIM, the pseudo gold standard (mean CBF from the entire 40 L/C pairs) was used as the ground truth.
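As a reference for how PSNR is typically computed, a small NumPy sketch is given below. The choice of peak value (here the intensity range of the reference image) is an assumption, since the text does not state which peak was used. SSIM is more involved and is available, for example, as `skimage.metrics.structural_similarity` in scikit-image.

```python
import numpy as np

def psnr(reference, test, data_range=None):
    """Peak signal-to-noise ratio in dB between a reference and a test image."""
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)
    if data_range is None:
        # Assumption: use the intensity range of the reference as the peak.
        data_range = reference.max() - reference.min()
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

# Toy example: range 100 and a constant error of 10 give MSE = 100, i.e. 20 dB.
reference = np.array([[0.0, 100.0], [100.0, 0.0]])
test_image = reference + 10.0
value = psnr(reference, test_image)
```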

SNR and Grey Matter/White Matter (GM/WM) contrast were calculated to measure the image quality of ASL CBF. The SNR was calculated by using the mean signal of a grey matter (GM) region-of-interest (ROI) divided by the standard deviation of a white matter (WM) ROI in slice 50. The GM/WM contrast was calculated as the mean value of GM masked area divided by the mean value of WM masked area.
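The two image-quality measures defined above reduce to a few array operations. In the sketch below, the masks, the use of the same masks for both measures, and the population standard deviation (`ddof=0`) are assumptions; the study computes SNR from ROIs in slice 50 and contrast from GM and WM masks.

```python
import numpy as np

def snr_and_contrast(cbf_slice, gm_mask, wm_mask):
    """Return (SNR, GM/WM contrast) for one CBF slice.

    SNR      = mean signal in the GM ROI / standard deviation in the WM ROI
    contrast = mean value in the GM mask / mean value in the WM mask
    """
    gm_values = cbf_slice[gm_mask]
    wm_values = cbf_slice[wm_mask]
    snr = gm_values.mean() / wm_values.std()
    contrast = gm_values.mean() / wm_values.mean()
    return snr, contrast

# Toy slice: GM voxels equal 60, WM voxels alternate between 20 and 30.
cbf_slice = np.array([[60.0, 60.0, 20.0, 30.0],
                      [60.0, 60.0, 30.0, 20.0]])
gm_mask = np.zeros_like(cbf_slice, dtype=bool)
gm_mask[:, :2] = True
wm_mask = ~gm_mask

snr, contrast = snr_and_contrast(cbf_slice, gm_mask, wm_mask)
print(snr, contrast)  # 12.0 2.4
```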

The correlation coefficient between the DL-produced CBF values and the pseudo gold standard was calculated to measure the similarity of the DL-produced CBF values to those processed with non-DL methods. This calculation was performed at each voxel for ASLDN and ASLDN-LFN separately. The correlation coefficient maps were thresholded at r = 0.3 for the purpose of comparison and display.
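A voxel-wise correlation map of this kind can be sketched as below. It assumes that the correlation at each voxel is the Pearson coefficient computed across subjects (the text does not spell out the sample dimension) and that values below the threshold are set to 0 for display; the function name is hypothetical.

```python
import numpy as np

def voxelwise_correlation(a, b, threshold=0.3):
    """Pearson r across the first axis (subjects) at every voxel.

    a, b: arrays of shape (n_subjects, ...). Returns (r, r_display), where
    r_display has values below `threshold` set to 0.
    """
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    numerator = (a * b).sum(axis=0)
    denominator = np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))
    # Voxels with zero variance get r = 0 instead of a division error.
    r = np.divide(numerator, denominator,
                  out=np.zeros_like(numerator), where=denominator > 0)
    r_display = np.where(r < threshold, 0.0, r)
    return r, r_display

# Synthetic check: a perfectly correlated map with one anti-correlated voxel.
rng = np.random.default_rng(0)
network_output = rng.normal(size=(60, 4, 4))
pseudo_gold = 2.0 * network_output + 1.0
pseudo_gold[:, 0, 0] = -network_output[:, 0, 0]

r, r_display = voxelwise_correlation(network_output, pseudo_gold)
```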

3 Results

Figure 2: Mean CBF images of a representative subject. The rows from top to bottom are: A. mean CBF maps generated from 10 L/C pairs (input to ASLDN-LFN); B. mean CBF maps from all 40 L/C pairs with Gaussian smoothing (FWHM = 3 mm) and outlier cleaning (pseudo gold standard); C. output of ASLDN; and D. output of ASLDN-LFN. Only 5 axial slices are shown in each row.

Figure 2 shows the mean CBF maps produced by different algorithms. Compared to the pseudo gold standard (Fig. 2.B) and the output of ASLDN (Fig. 2.C), the CBF maps produced by ASLDN-LFN (Fig. 2.D) showed substantially improved quality in terms of suppressed noise and better perfusion contrast between tissues. Figure 3 shows notched box plots of the SNR and GM/WM contrast of the 60 test subjects’ mean CBF maps processed with different methods. The average SNRs of the pseudo gold standard, the output of ASLDN, and the output of ASLDN-LFN were 5.87, 6.36, and 8.06, respectively. The average GM/WM contrasts of the pseudo gold standard, the output of ASLDN, and the output of ASLDN-LFN were 2.14, 2.15, and 2.32, respectively. ASLDN-LFN achieved better SNR and GM/WM contrast than ASLDN and the pseudo gold standard (paired t-test).


Figure 3: The notched box plot of the SNR (left) and GM/WM contrast (right) from 60 test subjects’ CBF maps with different processing methods.
Figure 4: Correlation coefficient maps of ASLDN (top) and ASLDN-LFN (bottom). Only 5 axial slices were shown. Correlation coefficients less than 0.3 were thresholded to be 0.

Figure 4 shows the correlation coefficient maps of ASLDN and ASLDN-LFN. The correlation coefficient at each voxel was calculated between the pseudo gold standard and the network output. The outputs of ASLDN and ASLDN-LFN correlated strongly with the pseudo gold standard, indicating that both networks can preserve individual subjects’ CBF patterns while suppressing noise. The output of ASLDN-LFN showed less correlation to the input in WM because ASLDN-LFN removed more noise in WM than ASLDN.

Training scheme              Model        PSNR    SSIM
ASLDN                        U-Net        24.53   0.796
ASLDN                        DilatedNet   24.92   0.793
ASLDN                        DWAN         25.26   0.802
ASLDN Learning-from-noise    U-Net        24.84   0.798
ASLDN Learning-from-noise    DilatedNet   25.06   0.797
ASLDN Learning-from-noise    DWAN         25.28   0.803

Table 1: The average PSNR and SSIM of mean CBF maps produced by different CNN architectures in different training schemes.

Table 1 lists the PSNR and SSIM performance of ASLDN [17] and the proposed ASLDN-LFN with and without the DWAN network structure. ASLDN-LFN showed higher PSNR and SSIM than the previous ASLDN. Using DWAN in ASLDN and ASLDN-LFN provided higher PSNR and SSIM than U-Net and DilatedNet.

Figure 5: Mean CBF maps from one representative subject (only 5 axial slices are shown in each row). The rows from top to bottom are: A. mean CBF maps generated from 10 L/C pairs (input to ASLDN-LFN); B. mean CBF maps from all 40 L/C pairs with Gaussian smoothing (FWHM = 3 mm) and outlier cleaning (pseudo gold standard); C. outputs of ASLDN-LFN trained with L2 loss; and D. outputs of ASLDN-LFN trained with L1 loss.

Figure 5 shows the results of ASLDN-LFN trained with different loss functions. When the input mean CBF maps contained large numbers of outliers, ASLDN-LFN trained with L2 loss was affected, resulting in deteriorated perfusion in grey matter areas. ASLDN-LFN trained with L1 loss, in contrast, remained unaffected due to its robustness to outliers. PSNR and SSIM were used to quantitatively measure the denoising performance of ASLDN-LFN trained with L1 loss and L2 loss. PSNR and SSIM were 24.40 and 0.677 when ASLDN-LFN was trained with L2 loss, whereas they were 25.28 and 0.803 when ASLDN-LFN was trained with L1 loss.

4 Conclusion

In this study, we propose ASLDN-LFN to show that DL-based ASL denoising models can be trained using only noisy image pairs. The experimental results demonstrate that ASLDN-LFN can reliably denoise ASL CBF images and even achieve better image quality than ASLDN in terms of SNR and GM/WM contrast. In addition, we show that training with L1 loss is more robust to outliers than training with L2 loss for ASLDN-LFN. Moreover, ASLDN-LFN allows more training data to be generated because it requires fewer L/C pairs to produce reference mean CBF maps, which is particularly useful when ASL CBF data are limited.


This work was supported by NIH/NIA grant: 1 R01 AG060054-01A1


  • [1] D. C. Alsop et al. (2015) Recommended implementation of arterial spin-labeled perfusion MRI for clinical applications. Magnetic resonance in medicine 73 (1), pp. 102–116. Cited by: §1.
  • [2] Y. Behzadi, K. Restom, J. Liau, and T. T. Liu (2007) A component based noise correction method (compcor) for bold and perfusion based fMRI. Neuroimage 37 (1), pp. 90–101. Cited by: §1.
  • [3] J. A. Detre, J. S. Leigh, D. S. Williams, and A. P. Koretsky (1992) Perfusion imaging. Magnetic resonance in medicine 23 (1), pp. 37–45. Cited by: §1.
  • [4] Y. Fan, J. Yu, and T. S. Huang (2018) Wide-activated deep residual networks based restoration for bpg-compressed images. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, pp. 2621–2624. Cited by: §2.2.
  • [5] P. J. Huber (1992) Robust estimation of a location parameter. In Breakthroughs in statistics, pp. 492–518. Cited by: §2.1.
  • [6] K. H. Kim, S. H. Choi, and S. Park (2017) Improving arterial spin labeling by using deep learning. Radiology 287 (2), pp. 658–666. Cited by: §1, §2.3.
  • [7] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pp. 1097–1105. Cited by: §1.
  • [8] J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila (2018) Noise2noise: learning image restoration without clean data. arXiv preprint arXiv:1803.04189. Cited by: §1, §2.1.
  • [9] Y. Li, S. Dolui, D. Xie, Z. Wang, A. D. N. Initiative, et al. (2018) Priors-guided slice-wise adaptive outlier cleaning for arterial spin labeling perfusion MRI. Journal of neuroscience methods 307, pp. 248–253. Cited by: §1, §2.3, §2.3.
  • [10] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee (2017) Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 136–144. Cited by: §2.3.
  • [11] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. van der Laak, B. van Ginneken, and C. I. Sánchez (2017) A survey on deep learning in medical image analysis. Medical image analysis 42, pp. 60–88. Cited by: §1.
  • [12] D. Shen, G. Wu, and H. Suk (2017) Deep learning in medical image analysis. Annual review of biomedical engineering 19, pp. 221–248. Cited by: §1.
  • [13] C. Ulas, G. Tetteh, S. Kaczmarz, C. Preibisch, and B. H. Menze (2018) DeepASL: kinetic model incorporated loss for denoising arterial spin labeled MRI via deep residual learning. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 30–38. Cited by: §1.
  • [14] Z. Wang, G. K. Aguirre, H. Rao, J. Wang, M. A. Fernández-Seara, A. R. Childress, and J. A. Detre (2008) Empirical optimization of asl data analysis using an asl data processing toolbox: asltbx. Magnetic resonance imaging 26 (2), pp. 261–269. Cited by: §2.3.
  • [15] Z. Wang (2012) Improving cerebral blood flow quantification for arterial spin labeled perfusion mri by removing residual motion artifacts and global signal fluctuations. Magnetic resonance imaging 30 (10), pp. 1409–1415. Cited by: §1, §2.3.
  • [16] D. S. Williams, J. A. Detre, J. S. Leigh, and A. P. Koretsky (1992) Magnetic resonance imaging of perfusion using spin inversion of arterial water. Proceedings of the National Academy of Sciences 89 (1), pp. 212–216. Cited by: §1.
  • [17] D. Xie, L. Bai, and Z. Wang (2018) Denoising arterial spin labeling cerebral blood flow images using deep learning. arXiv preprint arXiv:1801.09672. Cited by: §1, §2.3, §3.
  • [18] D. Xie, Y. Li, H. Yang, L. Bai, T. Wang, F. Zhou, L. Zhang, and Z. Wang (2020) Denoising arterial spin labeling perfusion mri with deep machine learning. Magnetic Resonance Imaging. Cited by: §1, §2.2.
  • [19] D. Xie, Y. Li, H. Yang, D. Song, Y. Shang, Q. Ge, L. Bai, and Z. Wang (2019) BOLD fmri-based brain perfusion prediction using deep dilated wide activation networks. In International Workshop on Machine Learning in Medical Imaging, pp. 373–381. Cited by: §1.
  • [20] J. Xu, E. Gong, J. Pauly, and G. Zaharchuk (2017) 200x low-dose pet reconstruction using deep learning. arXiv preprint arXiv:1712.04119. Cited by: §2.3.
  • [21] J. Yu, Y. Fan, J. Yang, N. Xu, Z. Wang, X. Wang, and T. Huang (2018) Wide activation for efficient and accurate image super-resolution. arXiv preprint arXiv:1808.08718. Cited by: §2.2.