Machine Friendly Machine Learning: Interpretation of Computed Tomography Without Image Reconstruction

12/03/2018 · by Hyunkwang Lee, et al.

Recent advancements in deep learning for automated image processing and classification have accelerated many new applications for medical image analysis. However, most deep learning applications have been developed using reconstructed, human-interpretable medical images. While image reconstruction from raw sensor data is required to create medical images, the reconstruction process uses only a partial representation of all the data acquired. Here we report the development of a system to directly process raw computed tomography (CT) data in sinogram-space, bypassing the intermediary step of image reconstruction. Two classification tasks were evaluated for their feasibility in sinogram-space machine learning: body region identification and intracranial hemorrhage (ICH) detection. Our proposed SinoNet performed favorably compared to conventional reconstructed image-space-based systems for both tasks, regardless of the scanning geometry in terms of projections or detectors. Furthermore, SinoNet performed significantly better than conventional networks operating in image-space when using sparsely sampled sinograms. As a result, sinogram-space algorithms could be used in field settings for binary diagnostic testing and triage, and in clinical settings where a low radiation dose is desired. These findings also demonstrate another strength of deep learning: it can analyze and interpret sinograms, a task that is virtually impossible for human experts.


1 Introduction

Continued rapid advancements in algorithms and computer hardware have accelerated progress in automated computer vision and natural language processing. By combining these advances with the availability of large, well-annotated datasets, significant progress has emerged in automated medical image interpretation for the detection of disease and critical findings Esteva et al. (2017); Gulshan et al. (2016); Chilamkurthy et al. (2018). The application of deep learning has the potential to increase diagnostic accuracy and reduce delays in diagnosis and treatment for better patient outcomes Thrall et al. (2018). Deep learning techniques are not limited to image analysis; they can also improve image reconstruction for magnetic resonance imaging (MRI) Wang et al. (2016); Zhu et al. (2018), computed tomography (CT) Xie et al. (2018); Jin et al. (2017), and photoacoustic tomography (PAT) Antholzer et al. (2018). Deep learning is now a feasible alternative to well-established analytic and iterative methods of image reconstruction Wang et al. (2018); Do et al. (2014); Do and Karl (2014); Do et al. (2013, 2011).

However, most prior work using deep learning algorithms has focused on image analysis of reconstructed images or on alternative approaches to image reconstruction. Despite this human-centric approach, there is no reason that deep learning algorithms must operate in image-space. Since all the information in the reconstructed images is present in the raw measurement data, deep learning models could potentially derive features directly from raw data in sinogram-space without intermediary image reconstruction, possibly with even better performance than models trained in image-space. In this study, we determined the feasibility of analyzing computed tomography (CT) projection data - sinograms - with a deep learning approach for human anatomy identification and pathology detection. We proposed a customized convolutional neural network (CNN), called SinoNet, optimized for interpreting sinograms, and demonstrated its potential by comparing its performance with systems based on existing CNN architectures using reconstructed CT images. This approach facilitates edge computing by making it possible to identify critical findings rapidly from the raw data without time-consuming image reconstruction. In addition, it could enable simplified scanner hardware designed for the direct detection of critical findings through SinoNet alone.

2 Results

2.1 Experimental design

With IRB approval, we retrieved 200 contiguous whole body CT datasets from combined positron emission tomography-computed tomography (PET/CT) examinations for body part recognition and 720 non-contrast head CT scans for intracranial hemorrhage (ICH) detection from the picture archiving and communication system at our quaternary referral hospital. Axial slices in the 200 whole body scans were annotated as one of sixteen body regions by a physician, and slices of the 720 head scans were annotated for the presence of hemorrhage by consensus of a panel of five neuroradiologists (Methods). We evaluated twelve different classification models developed by training Inception-v3 Szegedy et al. (2016) on reconstructed CT images and SinoNet on sinograms (Table 1, Methods). The reconstructed CT images, containing Hounsfield units (HU), were converted to scaled linear attenuation coefficients (LACs). A two-dimensional (2D) parallel-beam Radon transform was applied to the LAC slices (512x512 pixels) to generate fully sampled sinograms with 360 projections and 729 detector pixels (sino360x729), which were then uniformly subsampled in the horizontal direction (projection views) and averaged in the vertical direction (detector pixels) by factors of 3 and 9 to obtain moderately sampled sinograms with 120 views by 240 pixels (sino120x240) and sparsely sampled sinograms with 40 views by 80 pixels (sino40x80).

Original CT images were used as fully sampled reconstructed images (recon360x729), and images reconstructed from the sparse sinograms (recon120x240 and recon40x80) were generated using a deep learning approach (FBPConvNet Jin et al. (2017)) followed by a conversion from LAC to HU. Reconstructed CT images and sinograms with predefined window-level settings were created to evaluate the effect of windowing: wrecon360x729, wrecon120x240, wrecon40x80; and wsino360x729, wsino120x240, wsino40x80 (Methods). Based on the scanning geometries and window-level settings described above, 12 CNN models were evaluated: 6 were developed by training Inception-v3 Szegedy et al. (2016) with reconstructed CT images and the other 6 were obtained by training SinoNet with sinograms (Table 1, Methods). Data for body part recognition was randomly split into training, validation, and test sets with balanced genders: 140 scans in training, 30 in validation, and 30 in testing. A similar dataset breakdown was performed for ICH detection with 478 scans in training, 121 in validation, and 121 in testing. Details of data preparation, CNN architecture, sinogram generation, and image reconstruction are described in Methods.

Fully sampled Moderately sampled Sparsely sampled
360 projections and 729 detectors 120 projections and 240 detectors 40 projections and 80 detectors
I1: recon360x729 (original CT) I3: recon120x240 I5: recon40x80
S1: sino360x729 S3: sino120x240 S5: sino40x80
I2: wrecon360x729 (windowed original CT) I4: wrecon120x240 I6: wrecon40x80
S2: wsino360x729 S4: wsino120x240 S6: wsino40x80
Table 1: Summary of the 12 different models evaluated in this study.
Figure 1: Performance of 12 different models trained on reconstructed images and sinograms with varying numbers of projections and detectors for body part recognition. 95% confidence intervals (CIs) are indicated by black error bars. The purple and blue bars (I1-I6) show the test accuracy of Inception-v3 trained with full dynamic range reconstructed images and with abdominal-window reconstructed images (window-level=40HU, window-width=400HU), respectively. The green and red bars (S1-S6) show the performance of SinoNet models trained with sinograms generated from full-range and windowed reconstructed images, respectively.

2.2 Results of body part recognition

Figure 1 shows the test performance of the twelve different models for body part recognition. Models trained on fully sampled data had accuracies of 97.4% in image-space, 96.6% in sinogram-space, 97.9% in windowed-image-space, and 97.4% in windowed-sinogram-space. Models trained on moderately sampled data had accuracies of 97.4% in image-space, 96.3% in sinogram-space, 97.9% in windowed-image-space, and 97.4% in windowed-sinogram-space. Models trained on sparsely sampled data had accuracies of 97.1% in image-space, 96.2% in sinogram-space, 97.2% in windowed-image-space, and 97.1% in windowed-sinogram-space. These results indicate that models operating in image-space performed slightly better than sinogram-space (SinoNet) models for body part recognition, regardless of scanning geometry. Additionally, models trained on windowed inputs consistently outperformed those trained on full-range images or sinograms.

Figure 2: ROC curves showing the performance of the 12 different models trained with reconstructed images and sinograms with varying numbers of projections and detectors. The purple and blue curves (I1-I6) correspond to the performance of Inception-v3 trained with reconstructed images with a full dynamic range of HU values and with the brain window setting (window-level=50HU, window-width=100HU), respectively. The green and red curves (S1-S6) show the performance of SinoNet models trained with sinograms generated from full-range and windowed reconstructed images, respectively. The areas under the curve (AUCs) for the 12 models are presented in the legend with their 95% CIs. Statistical significance of the difference between the AUCs of paired models (Ix - Sx) was evaluated. n.s., p>0.05; * p<0.05; ** p<0.01.

2.3 Results of intracranial hemorrhage detection

Figure 2 depicts receiver operating characteristic (ROC) curves and the corresponding areas under the ROC curves (AUCs) for the twelve different models for ICH detection. Models trained on fully sampled data had AUCs of 0.898 in image-space, 0.918 in sinogram-space, 0.972 in windowed-image-space, and 0.951 in windowed-sinogram-space. Models trained on moderately sampled data had AUCs of 0.893 in image-space, 0.915 in sinogram-space, 0.953 in windowed-image-space, and 0.947 in windowed-sinogram-space. Models trained on sparsely sampled data had AUCs of 0.885 in image-space, 0.899 in sinogram-space, 0.909 in windowed-image-space, and 0.942 in windowed-sinogram-space.

2.4 Comparison of SinoNet and Inception-v3 for analyzing sinograms

Table 2 compares the performance of Inception-v3 and SinoNet when both networks are trained directly on full-range sinograms at all three sampling densities for body part recognition and ICH detection. SinoNet models significantly outperformed Inception-v3 models in both tasks.

Body part recognition (Accuracy) ICH detection (AUC)
Input Inception-v3 SinoNet Inception-v3 SinoNet
sino360x729 93.9% (93.4%-94.4%) 96.6% (96.2%-96.9%) 0.873 (0.849-0.895) 0.918* (0.899-0.935)
sino120x240 93.5% (93.0%-94.0%) 96.3% (95.9%-96.7%) 0.874 (0.851-0.896) 0.915* (0.897-0.932)
sino40x80 93.4% (92.9%-93.9%) 96.2% (95.8%-96.6%) 0.852 (0.828-0.876) 0.899* (0.879-0.917)
Table 2: Comparison of Inception-v3 and SinoNet performance when both networks are trained on full-range sinograms at varying sampling densities for body part recognition and intracranial hemorrhage (ICH) detection. Body part recognition is reported as accuracy and ICH detection as AUC, with 95% CIs in parentheses. * p<0.0001.
Figure 3: Examples of reconstructed images and sinograms with different labels for a body part recognition and b ICH detection. From left to right: original CT images, windowed CT images, sinograms with 360 projections by 729 detector pixels, and windowed sinograms (360x729). In the last row, an example CT with hemorrhage is annotated with a dotted circle in image-space, and the region of interest is converted into the sinogram domain using the Radon transform; this area is highlighted in red on the sinogram in the fifth column.

3 Discussion

We have demonstrated that models trained on sinograms can achieve performance similar to models using conventional reconstructed images for body part recognition and ICH detection in all three scanning geometries, despite the fact that the measurement data are not interpretable by humans. SinoNet trained with sinograms has performance comparable to that of Inception-v3 trained with reconstructed CT images for body part recognition, regardless of the number of projection views or detectors. For ICH detection, SinoNet trained with full-range sinograms outperformed Inception-v3 trained with full dynamic range reconstructed images for all three scanning geometries, and SinoNet significantly outperformed Inception-v3 when using windowed, sparsely sampled data. By applying window settings similar to what a radiologist would use, network performance increased significantly due to the improved target-to-background contrast (Figure 3) in both reconstructed images and in sinogram-space. As depicted in Figure 3 (b), the key features relevant to hemorrhage are enhanced not only in the windowed CT image but also in the windowed sinogram.

SinoNet, a customized convolutional neural network, was developed for analyzing sinograms through customized Inception modules with multi-scale convolutional and pooling layers Szegedy et al. (2016). In SinoNet, the square convolutional filters of the original Inception module were replaced by rectangular convolutional filters of various sizes, including width-wise (projection dominant) and height-wise (detector dominant) filters. This customized architecture allowed for significantly improved performance in both body part recognition and ICH detection when compared with Inception-v3 models trained with sinograms, regardless of sampling density. These results imply that non-square filters may be effective in enabling models to learn the interplay between projection views and detector pixels from sinusoidal curves and to extract salient features from the sinogram domain for classification, a task thought to be virtually impossible for human experts. This approach is similar to one proposed for learning temporal and frequency features using rectangular convolutional filters on spectrograms Pons et al. (2016).

SinoNet, by operating in sinogram-space, can accelerate image interpretation for pathology detection because complex computations for image reconstruction are not required. SinoNet also excels when the projection data are moderately or sparsely sampled, maintaining an AUC of 0.942 on the hemorrhage detection task while Inception-v3 dropped from 0.972 to 0.909. The sparsely sampled results suggest that radiation dose could be markedly decreased with only a slight degradation in performance for sinogram-space algorithms. The number of projections correlates linearly with radiation dose, theoretically allowing 67% and 89% dose reductions for moderately and sparsely sampled data, respectively. Similarly, by reducing the size and number of detectors required for diagnostic CT data, cheaper and simpler CT scanners could be built. At our institution, the average head CT has a CTDIvol of 50 mGy; the sparser acquisitions described here would correspond to a CTDIvol between 6 and 16 mGy. One possible use of this technique would be as a first-line screening tool in the field, without image reconstruction, to prioritize a patient for potential stroke therapy when there is no evidence of intracranial hemorrhage. A subsequent full-dose CT could then confirm the interpretation obtained with the sinogram method. Another possible use would be to create “smart-scanners” that adjust the protocol and field of view based on the intended region of the body.

Although these results demonstrate the power of the sinogram-based approach, several important areas of future investigation remain. Because raw measurement data were unavailable, the sinograms used in this study were simulated by applying the 2D parallel-beam Radon transform to reconstructed CT images rather than obtained directly from CT scanners. Improved simulation data could be generated by accounting for more advanced projection geometries - cone-beam or fan-beam - and by modeling Poisson noise in the projection data. Although SinoNet trained with windowed sinograms achieved comparable or better performance than models using windowed reconstructed images, the windowed sinograms were generated from reconstructed images that had been postprocessed with predefined window settings; generating windowed sinograms directly from CT measurement data is not straightforward, but it could be implemented by using energy-resolving, photon-counting detectors from multi-energy CT imaging to acquire measurements in multiple energy bins McCollough et al. (2015). Our work will need to be further validated using raw data from clinical scanners, including actual low-dose acquisitions, to confirm that performance remains robust despite increased image noise.

4 Methods

This HIPAA-compliant retrospective study was conducted with the approval of our institutional review board and under a waiver of informed consent.

4.1 Data collection and annotation

Body part recognition: a total of 200 contrast-enhanced PET/CT examinations of the head, neck, chest, abdomen, and pelvis for 100 female and 100 male patients were retrieved from our institutional Picture Archiving and Communication System (PACS) between May 2012 and July 2012. 56,334 axial slices in the CT scans were annotated as one of sixteen body regions by a physician (Figure 6). 15% of the total slices were randomly selected for use as validation data for hyperparameter tuning and model selection, 15% as test data for performance evaluation, and the rest as training data for model development (Table 3).

Intracranial hemorrhage (ICH) detection: a total of 720 5-mm non-contrast head CT scans were identified and retrieved from our PACS between June 2013 and July 2017. Every 5-mm-thick axial slice (3,151 slices without ICH and 2,895 slices with ICH) was annotated for the presence of ICH by consensus of five board-certified neuroradiologists (blinded for review, 9 to 34 years of experience). The examinations included 201 cases without ICH and 519 cases with ICH, which were randomly split into train, validation, and test datasets at the case level to ensure that slices from the same case were not split across different datasets (Table 4). A minimal sketch of such a case-level split is shown below; the function name, split sizes, and fixed seed are illustrative assumptions rather than details from the paper.
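```python
# Illustrative case-level split: every slice of a given scan stays in exactly
# one of train/validation/test. Names and the fixed seed are assumptions.
import random

def split_cases(case_ids, n_val, n_test, seed=0):
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    val, test, train = ids[:n_val], ids[n_val:n_val + n_test], ids[n_val + n_test:]
    return train, val, test

# Example with the ICH dataset sizes reported above (720 cases total).
train_ids, val_ids, test_ids = split_cases(range(720), n_val=121, n_test=121)
```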

Train Validation Test
No. Cases 140 (70F, 70M) 30 (15F, 15M) 30 (15F, 15M)
No. Images 39,472 8,383 8,479
L1: Head 1,980 483 435
L2: Eye lens 878 189 188
L3: Nose 1,449 309 323
L4: Salivary gland 1,803 361 349
L5: Thyroid 1,508 312 333
L6: Upper lung 1,632 345 392
L7: Thymus 3,213 727 672
L8: Heart 3,360 707 762
L9: Chest 4,647 914 935
L10: Upper abdomen 4,943 1,008 1,103
L11: Lower abdomen 1,736 342 368
L12: Upper pelvis 2,524 617 545
L13: Lower pelvis 2,230 563 422
L14: Bladder 3,144 609 766
L15: Upper leg 2,607 563 532
L16: Lower leg 1,818 334 354
Table 3: Distribution of training, validation, and test datasets for body part recognition. F, Female; M, Male
Train Validation Test
No. Cases No. Images No. Cases No. Images No. Cases No. Images
No ICH 141 2,202 30 474 30 475
ICH 337 1,915 91 490 91 475
Total 478 4,117 121 964 121 950
Table 4: Distribution of training, validation, and test datasets for ICH detection.

4.2 Sinogram generation

Simulated sinograms were utilized in this study instead of raw data from commercial CT scanners because this was a retrospective analysis and raw projection data from patient CT scans could not be retrieved. To generate simulated sinograms, the pixel values of the 512x512 CT images stored in DICOM files were first converted into scaled linear attenuation coefficients (LACs). Any negative LAC was set to zero under the assumption that negative LACs are physically impossible and must therefore represent random noise. Subsequently, three different sinograms were generated from the scaled LAC images. First, we computed sinograms with 360 projection views over 180 degrees and 729 detectors (sino360x729) using the 2D parallel-beam Radon transform. sino360x729 was then used to produce sparser sinograms by uniformly subsampling projection views (in the horizontal direction) and averaging projection data from adjacent detectors (in the vertical direction) by factors of 3 and 9 to obtain sinograms with 120 projection views and 240 detectors (sino120x240) and sinograms with 40 projection views and 80 detectors (sino40x80), respectively (Figure 4). The sparser sinograms (sino120x240, sino40x80) were resized to 360x729 pixels using bilinear interpolation to match the resolution of the corresponding full-view sinograms (sino360x729). A minimal Python sketch of this pipeline is shown below, using scikit-image's Radon transform in place of the MATLAB radon function mentioned in Section 4.10; the exact detector count therefore differs slightly from 729, and the helper names are ours.
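```python
# Sketch of sinogram simulation and sparsification, assuming a 512x512 image of
# scaled LACs. scikit-image's radon() stands in for MATLAB's radon, so the
# detector count is the padded diagonal length rather than exactly 729 bins.
import numpy as np
from skimage.transform import radon, resize

def simulate_sinograms(lac_image):
    lac_image = np.clip(lac_image, 0, None)                  # negative LACs set to zero
    theta = np.linspace(0.0, 180.0, 360, endpoint=False)     # 360 views over 180 degrees
    sino_full = radon(lac_image, theta=theta, circle=False)  # shape: (detectors, views)

    def sparsify(sino, factor):
        sparse = sino[:, ::factor]                            # subsample projection views
        n_det = (sparse.shape[0] // factor) * factor
        # average groups of adjacent detector bins
        sparse = sparse[:n_det].reshape(-1, factor, sparse.shape[1]).mean(axis=1)
        # resize back to the full-view grid (bilinear), as done for sino120x240 / sino40x80
        return resize(sparse, sino_full.shape, order=1, preserve_range=True)

    return sino_full, sparsify(sino_full, 3), sparsify(sino_full, 9)
```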

Figure 4: a Schematic of sinogram generation with 360 projection views and 729 detectors (sino360x729) from original CT images (converted into linear attenuation coefficients). b Sparse sinograms were created from sino360x729 by downsampling in the horizontal dimension and signal averaging in the vertical dimension to simulate the effect of acquiring an image with 120 projection views and 240 detectors (sino120x240) or an image with 40 projection views and 80 detectors (sino40x80)

4.3 Image reconstruction

Reconstructed images were generated from the synthetic sinograms for models I1-I6. Original CT images were used as the reconstructed images for recon360x729 because fully sampled sinogram data can be completely reconstructed using filtered back projection (FBP). However, more complex algorithms, such as model-based iterative reconstruction, are needed to reconstruct high-quality images from sparser datasets. Rather than employing complex iterative algorithms, we implemented a deep learning approach to reconstruct sparsely sampled sinograms, as this technique has been demonstrated to compare favorably with state-of-the-art iterative algorithms for sparse-view image reconstruction Jin et al. (2017); Xie et al. (2018). We implemented FBPConvNet, a modified U-net Ronneberger et al. (2015) with multiresolution decomposition and residual learning as proposed in prior work Jin et al. (2017). FBPConvNet takes FBP images reconstructed from the sparser sinograms (sino120x240 or sino40x80) as inputs and is trained to regress the original CT image (converted into LACs) with mean square error (MSE) as the loss function (Figure 7). Since the output images of FBPConvNet are LACs, they were converted into HU to obtain the final reconstructed images. The sparser sinograms were resized to 360x729 pixels using bilinear interpolation so that the corresponding FBP images have a uniform resolution of 512x512 pixels, resulting in final reconstructed images of 512x512 pixels. The FBPConvNet models with the lowest RMSE on the validation data were applied to sino120x240 and sino40x80 to generate recon120x240 and recon40x80, respectively. The root mean square error (RMSE) of images reconstructed by FBPConvNet on the validation dataset is much smaller than that of conventional FBP images (Table 5). A compact Keras sketch of an FBPConvNet-style residual U-net is given below; the depth, channel counts, and normalization choices are illustrative assumptions rather than the exact configuration used here, and the paper's Keras 2.1.1/TensorFlow 1.3 setup is replaced with the tf.keras API.
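```python
# FBPConvNet-style residual U-net (illustrative): input is the FBP image of a
# sparse sinogram, output is a refined LAC image trained against the original
# CT slice with an MSE loss.
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
        x = layers.BatchNormalization()(x)
    return x

def fbpconvnet(input_shape=(512, 512, 1), base=32, depth=4):
    inp = layers.Input(input_shape)
    x, skips = inp, []
    for d in range(depth):                        # contracting path (multiresolution decomposition)
        x = conv_block(x, base * 2 ** d)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)
    x = conv_block(x, base * 2 ** depth)
    for d in reversed(range(depth)):              # expanding path with skip connections
        x = layers.Conv2DTranspose(base * 2 ** d, 2, strides=2, padding='same')(x)
        x = layers.concatenate([x, skips[d]])
        x = conv_block(x, base * 2 ** d)
    residual = layers.Conv2D(1, 1, padding='same')(x)
    out = layers.Add()([inp, residual])           # residual learning: FBP input + learned correction
    model = Model(inp, out)
    model.compile(optimizer='adam', loss='mse')   # regression to fully sampled LAC images
    return model
```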

Figure 5: a Overall network architecture of SinoNet. b Detailed network diagram of the Inception modules, which include rectangular convolutional filters and pooling layers. The modified Inception module contains multiple rectangular convolutional filters of varying sizes: height-wise rectangular filters (detector dominant) in red and width-wise rectangular filters (projection dominant) in orange; “Conv3x3/s2” indicates a convolutional layer with 3x3 filters and a stride of 2, and “Conv3x2” indicates a convolutional layer with 3x2 filters and a stride of 1. c Dense-Inception blocks contain two densely connected Inception modules. d Transition modules situated between Dense-Inception modules reduce the size of the feature maps. Conv = convolution layer, MaxPool = max pooling layer, AvgPool = average pooling layer

4.4 Windowed images and sinograms

We utilized full-range 12-bit grayscale images and windowed 8-bit grayscale images with different window-levels (WL) and window-widths (WW) suitable for each task: abdominal window (WL=40HU, WW=400HU) for body part recognition and brain window (WL=50HU, WW=100HU) for ICH detection. The windowed sinograms were generated from corresponding windowed CT images. Examples of windowed images and sinograms are shown in Figure 8.
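The windowing step can be sketched as standard HU clipping and rescaling, as below; the function and the exact 8-bit conversion are illustrative, since the paper does not specify the scaling details.

```python
# Hedged sketch of window-level / window-width conversion of an HU image to an
# 8-bit grayscale image; the precise scaling used in the paper may differ.
import numpy as np

def apply_window(hu_image, level, width):
    lo, hi = level - width / 2.0, level + width / 2.0
    windowed = np.clip(hu_image, lo, hi)
    return np.uint8(np.round((windowed - lo) / (hi - lo) * 255.0))

hu_slice = np.zeros((512, 512))                         # placeholder for a real HU slice
abdomen = apply_window(hu_slice, level=40, width=400)   # body part recognition
brain = apply_window(hu_slice, level=50, width=100)     # ICH detection
```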

4.5 Convolutional neural network for sinograms: SinoNet

A customized convolutional neural network, SinoNet, was designed for analyzing sinograms using customized Inception modules with multiple convolutional and pooling layers and dense connection for efficient use of model parameters Szegedy et al. (2016); Huang et al. (2017). As shown in Figure 5, the Inception module was modified with various sized rectangular convolutional filters in SinoNet. The non-square filters include height-wise (detector dominant) and width-wise (projection dominant) filters to enable efficient extraction of features from sinusoidal curves. Two Inception modules were densely connected to form a Dense-Inception block, which was followed by a Transition block to reduce the number and dimension of feature maps for computational efficiency, as suggested in the original report Huang et al. (2017). In this study, SinoNet was used only for interpreting sinograms.
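To make the rectangular-filter idea concrete, the following Keras sketch shows an Inception-style module mixing width-wise and height-wise filters with a pooling branch; the filter counts and kernel shapes are illustrative assumptions, not the published SinoNet configuration.

```python
# Illustrative Inception-style module with rectangular filters in the spirit of
# SinoNet. Assumes sinograms are laid out with projection views along the width
# and detector pixels along the height; shapes and filter counts are ours.
from tensorflow.keras import layers

def rectangular_inception_module(x, filters=32):
    wide = layers.Conv2D(filters, (3, 7), padding='same', activation='relu')(x)   # width-wise: projection dominant
    tall = layers.Conv2D(filters, (7, 3), padding='same', activation='relu')(x)   # height-wise: detector dominant
    square = layers.Conv2D(filters, (3, 3), padding='same', activation='relu')(x)
    pooled = layers.MaxPooling2D(pool_size=3, strides=1, padding='same')(x)
    pooled = layers.Conv2D(filters, 1, padding='same', activation='relu')(pooled)
    return layers.concatenate([wide, tall, square, pooled])
```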

4.6 Baseline convolutional neural network: Inception-v3

Inception-v3 Szegedy et al. (2016), a validated CNN for object recognition in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) Russakovsky et al. (2015), was selected as the network architecture for the classification models trained on reconstructed images. We modified Inception-v3 by replacing the last fully-connected layers with a sequence of a global average pooling (GAP) layer, a fully-connected layer, and a softmax layer with the number of outputs matching the number of categories: 16 outputs for body part recognition and a binary output for ICH detection. Inception-v3 was also used to classify sinograms as a baseline when evaluating SinoNet's performance on body part recognition and ICH detection with sinograms as the input data. A sketch of this head replacement with the Keras InceptionV3 application is shown below; the input shape and channel handling (grayscale slices replicated to three channels) are assumptions.
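```python
# Sketch of Inception-v3 with the top replaced by GAP + fully connected softmax.
# Assumption: grayscale CT slices are replicated across 3 channels; weights are
# trained from scratch here (see the initialization section).
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras import layers, Model

def build_inception_classifier(num_classes, input_shape=(512, 512, 3)):
    base = InceptionV3(include_top=False, weights=None, input_shape=input_shape)
    x = layers.GlobalAveragePooling2D()(base.output)
    out = layers.Dense(num_classes, activation='softmax')(x)  # 16 body parts or 2 ICH classes
    return Model(base.input, out)
```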

4.7 Weight initialization

All models developed using Inception-v3 and SinoNet for the body part recognition task were initialized with He normal initialization He et al. (2015). For the ICH detection task, models were initialized with the weights of the corresponding model pre-trained on body part recognition with the fully sampled scanning geometry. For example, the Inception-v3 model trained with recon360x729 for body part recognition provided the initial weights for all Inception-v3 ICH detection models trained with reconstructed images, across all scanning geometries and window settings. Similarly, SinoNet ICH detection models were initialized using the weights of the body part recognition SinoNet model trained with sino360x729. One hedged way to express this transfer, reusing the builder sketched in the previous section, is to copy the body part model's weights layer by layer into the ICH model, skipping the final classifier whose size differs; the layer indexing below is illustrative.
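```python
# Transfer of pretrained body-part-recognition weights into an ICH model
# (illustrative). All layers except the final Dense classifier (16 vs. 2
# outputs) share identical shapes, so their weights can be copied directly.
bodypart_model = build_inception_classifier(num_classes=16)   # assume: already trained on recon360x729
ich_model = build_inception_classifier(num_classes=2)

for src, dst in zip(bodypart_model.layers[:-1], ich_model.layers[:-1]):
    dst.set_weights(src.get_weights())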

4.8 Performance evaluation and statistical analysis

Test accuracy was used as the performance metric for comparing body part recognition models, and ROC curves with AUCs were used to evaluate the performance of models for detection of ICH. All performance metrics were calculated using scikit-learn 0.19.2 in Python 2.7.12. The non-parametric DeLong method DeLong et al. (1988) was used to assess the statistical significance of the difference between AUCs of ICH detection models trained with reconstructed images and with sinograms, using Stata version 15.1 (StataCorp, College Station, Texas, USA). We employed a non-parametric bootstrap with 2,000 iterations to compute 95% CIs for the metrics, including test accuracy and AUC Efron and Tibshirani (1994). The bootstrap step can be sketched as below with scikit-learn; the resampling routine and seed are ours, and the DeLong test itself (run in Stata) is not reimplemented here.
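```python
# Non-parametric bootstrap (2,000 resamples) for the 95% CI of an AUC,
# mirroring the evaluation described above; function name and seed are ours.
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    rng = np.random.RandomState(seed)
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    aucs = []
    for _ in range(n_boot):
        idx = rng.randint(0, len(y_true), len(y_true))   # resample slices with replacement
        if len(np.unique(y_true[idx])) < 2:              # AUC needs both classes present
            continue
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return roc_auc_score(y_true, y_score), (lo, hi)
```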

4.9 Network training

Classification models for body part recognition and ICH detection were trained for 45 epochs using the Adam optimizer with default settings Kingma and Ba (2014) and a mini-batch size of 80. FBPConvNet models were trained for 100 epochs using the Adam optimizer with default settings and a mini-batch size of 20. The base learning rate of 0.001 was decayed by a factor of 10 every 15 epochs for the classification models and every 33 epochs for FBPConvNet. The best classification and FBPConvNet models were selected based on the validation loss. A hedged Keras sketch of the step-decay schedule for the classification models is shown below; the callback-based formulation is ours.
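```python
# Step decay: base learning rate 0.001, divided by 10 every 15 epochs, applied
# with the Adam optimizer over 45 epochs and a mini-batch size of 80.
from tensorflow.keras.callbacks import LearningRateScheduler

def step_decay(epoch, lr):
    return 1e-3 * (0.1 ** (epoch // 15))

lr_schedule = LearningRateScheduler(step_decay, verbose=1)
# model.fit(x_train, y_train, epochs=45, batch_size=80,
#           validation_data=(x_val, y_val), callbacks=[lr_schedule])
```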

4.10 Infrastructure

We used the radon and iradon functions in MATLAB 2018a for generating sinograms and obtaining FBP reconstructed images, respectively. We used Keras (version 2.1.1) with a TensorFlow backend (version 1.3.0) as the framework for developing deep learning models, and performed experiments on an NVIDIA Devbox (Santa Clara, CA) equipped with four TITAN X GPUs with 12GB of memory per GPU.

Supplementary Information

Figure 6: a A coronal view of a whole-body CT scan image with regions of each body part (annotated in green); b Representative CT images of 16 different body parts in axial view: L1=Brain, L2=Eye lens, L3=Nose, L4=Salivary gland, L5=Thyroid, L6=Upper lung, L7=Thymus, L8=Heart, L9=Chest, L10=Upper abdomen, L11=Lower abdomen, L12=Upper pelvis, L13=Lower pelvis, L14=Bladder, L15=Upper leg, L16=Lower leg.
Figure 7: Network architecture of FBPConvNet for sparse image reconstruction. FBPConvNet is a modified U-net that employs multilevel decomposition and multichannel filtering with a skip connection between input and output for residual learning. FBP, filtered backprojection; LAC, linear attenuation coefficients
Figure 8: Example images for sinograms (sino360x729, sino120x240, sino40x80) and reconstructed images (recon360x729, recon120x240, recon40x80) for a body part recognition and b ICH detection tasks. Windowed reconstruction images were generated by applying abdomen window (window-level = 40 HU, window-width = 400 HU) for body part recognition and brain window (window-level = 50 HU, window-width = 100 HU) for ICH detection. All reconstructed images and sinograms are normalized to the same resolution for this figure.
Body part recognition ICH detection
Input FBP FBPConvNet FBP FBPConvNet
sino120x240 1155.4 ± 19.3 28.6 ± 6.2 1251.5 ± 35.6 26.8 ± 8.5
sino40x80 1147.2 ± 19.4 66.9 ± 16.6 1238.1 ± 34.2 66.1 ± 21.2
Table 5: RMSE computed on the validation dataset between scaled LACs converted from original CT images and reconstructed images (recon120x240, recon40x80) obtained through FBP and FBPConvNet from sparse sinograms. RMSE values are expressed as mean ± standard deviation.

References

  • Esteva et al. [2017] Andre Esteva, Brett Kuprel, Roberto A Novoa, Justin Ko, Susan M Swetter, Helen M Blau, and Sebastian Thrun. Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639):115, 2017.
  • Gulshan et al. [2016] Varun Gulshan, Lily Peng, Marc Coram, Martin C Stumpe, Derek Wu, Arunachalam Narayanaswamy, Subhashini Venugopalan, Kasumi Widner, Tom Madams, Jorge Cuadros, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. Jama, 316(22):2402–2410, 2016.
  • Chilamkurthy et al. [2018] Sasank Chilamkurthy, Rohit Ghosh, Swetha Tanamala, Mustafa Biviji, Norbert G Campeau, Vasantha Kumar Venugopal, Vidur Mahajan, Pooja Rao, and Prashant Warier. Deep learning algorithms for detection of critical findings in head ct scans: a retrospective study. The Lancet, 2018.
  • Thrall et al. [2018] James H Thrall, Xiang Li, Quanzheng Li, Cinthia Cruz, Synho Do, Keith Dreyer, and James Brink. Artificial intelligence and machine learning in radiology: opportunities, challenges, pitfalls, and criteria for success. Journal of the American College of Radiology, 15(3):504–508, 2018.
  • Wang et al. [2016] Shanshan Wang, Zhenghang Su, Leslie Ying, Xi Peng, Shun Zhu, Feng Liang, Dagan Feng, and Dong Liang. Accelerating magnetic resonance imaging via deep learning. In Biomedical Imaging (ISBI), 2016 IEEE 13th International Symposium on, pages 514–517. IEEE, 2016.
  • Zhu et al. [2018] Bo Zhu, Jeremiah Z Liu, Stephen F Cauley, Bruce R Rosen, and Matthew S Rosen. Image reconstruction by domain-transform manifold learning. Nature, 555(7697):487, 2018.
  • Xie et al. [2018] Shipeng Xie, Xinyu Zheng, Yang Chen, Lizhe Xie, Jin Liu, Yudong Zhang, Jingjie Yan, Hu Zhu, and Yining Hu. Artifact removal using improved googlenet for sparse-view ct reconstruction. Scientific reports, 8, 2018.
  • Jin et al. [2017] Kyong Hwan Jin, Michael T McCann, Emmanuel Froustey, and Michael Unser. Deep convolutional neural network for inverse problems in imaging. IEEE Transactions on Image Processing, 26(9):4509–4522, 2017.
  • Antholzer et al. [2018] Stephan Antholzer, Markus Haltmeier, and Johannes Schwab. Deep learning for photoacoustic tomography from sparse data. Inverse Problems in Science and Engineering, pages 1–19, 2018.
  • Wang et al. [2018] Ge Wang, Jong Chu Ye, Klaus Mueller, and Jeffrey A Fessler. Image reconstruction is a new frontier of machine learning. IEEE transactions on medical imaging, 37(6):1289–1296, 2018.
  • Do et al. [2014] Synho Do, William Clem Karl, Sarabjeet Singh, Mannudeep Kalra, Tom Brady, Ellie Shin, and Homer Pien. High fidelity system modeling for high quality image reconstruction in clinical ct. PloS one, 9(11):e111625, 2014.
  • Do and Karl [2014] S Do and C Karl. Sinogram sparsified metal artifact reduction technology (ssmart). In The Third International Conference on Image Formation in X-ray Computed Tomography, pages 798–802, 2014.
  • Do et al. [2013] Synho Do, Janne J Näppi, and Hiroyuki Yoshida. Iterative reconstruction for ultra-low-dose laxative-free ct colonography. In International MICCAI Workshop on Computational and Clinical Challenges in Abdominal Imaging, pages 99–106. Springer, 2013.
  • Do et al. [2011] Synho Do, W Clem Karl, Zhuangli Liang, Mannudeep Kalra, Thomas J Brady, and Homer H Pien. A decomposition-based ct reconstruction formulation for reducing blooming artifacts. Physics in Medicine & Biology, 56(22):7109, 2011.
  • Szegedy et al. [2016] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2818–2826, 2016.
  • Pons et al. [2016] Jordi Pons, Thomas Lidy, and Xavier Serra. Experimenting with musically motivated convolutional neural networks. In Content-Based Multimedia Indexing (CBMI), 2016 14th International Workshop on, pages 1–6. IEEE, 2016.
  • McCollough et al. [2015] Cynthia H McCollough, Shuai Leng, Lifeng Yu, and Joel G Fletcher. Dual-and multi-energy ct: principles, technical approaches, and clinical applications. Radiology, 276(3):637–653, 2015.
  • Ronneberger et al. [2015] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pages 234–241. Springer, 2015.
  • Huang et al. [2017] Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In CVPR, volume 1, page 3, 2017.
  • Russakovsky et al. [2015] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
  • He et al. [2015] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision, pages 1026–1034, 2015.
  • DeLong et al. [1988] Elizabeth R DeLong, David M DeLong, and Daniel L Clarke-Pearson. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, pages 837–845, 1988.
  • Efron and Tibshirani [1994] Bradley Efron and Robert J Tibshirani. An introduction to the bootstrap. CRC press, 1994.
  • Kingma and Ba [2014] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.