Continued rapid advancements in algorithms and computer hardware have accelerated progress in automated computer vision and natural language processing. Combined with the availability of large, well-annotated datasets, these factors have produced significant advances in automated medical image interpretation for the detection of disease and critical findings Esteva et al. (2017); Gulshan et al. (2016); Chilamkurthy et al. (2018). The application of deep learning has the potential to increase diagnostic accuracy and reduce delays in diagnosis and treatment for better patient outcomes Thrall et al. (2018). Deep learning techniques are not limited to image analysis; they can also improve image reconstruction for magnetic resonance imaging (MRI) Wang et al. (2016); Zhu et al. (2018), computed tomography (CT) Xie et al. (2018); Jin et al. (2017), and photoacoustic tomography (PAT) Antholzer et al. (2018). Deep learning is now a feasible alternative to well-established analytic and iterative methods of image reconstruction Wang et al. (2018); Do et al. (2014); Do and Karl (2014); Do et al. (2013, 2011).
However, most prior work using deep learning algorithms has focused on image analysis of reconstructed images or on deep learning as an alternative approach to image reconstruction. Despite this human-centric approach, there is no reason that deep learning algorithms must function in image-space. Since all the information in the reconstructed images is present in the raw measurement data, deep learning models could potentially derive features directly from raw data in sinogram-space without intermediary image reconstruction, with possibly even better performance than models trained in image-space. In this study, we determined the feasibility of analyzing computed tomography (CT) projection data, or sinograms, through a deep learning approach for human anatomy identification and pathology detection. We proposed a customized convolutional neural network (CNN) called SinoNet, optimized it for interpreting sinograms, and demonstrated its potential by comparing its performance to pre-existing systems based on other CNN architectures using reconstructed CT images. This approach accelerates edge computing by making it possible to identify critical findings rapidly from the raw data without time-consuming image reconstruction processes. In addition, this could enable us to develop simplified scanner hardware for the direct detection of critical findings through SinoNet alone.
2.1 Experimental design
We retrieved 200 contiguous whole body CT datasets from combined positron emission tomography-computed tomography (PET/CT) examinations for body part recognition and 720 non-contrast head CT scans for intracranial hemorrhage (ICH) detection with IRB approval from the picture archiving and communication systems at our quaternary referral hospital. Axial slices in the 200 whole body scans were annotated as sixteen different body regions by a physician, and slices of the 720 head scans were annotated with the presence of hemorrhage by a panel of five neuroradiologists by consensus (Methods). We evaluated twelve different classification models developed by training Inception-v3 Szegedy et al. (2016) on reconstructed CT images and SinoNet with sinograms (Table 1, Methods). The reconstructed CT images containing Hounsfield units (HU) were converted to scaled linear attenuation coefficients (LAC). Two-dimensional (2D) parallel-beam Radon transform was applied to the LAC slices (512x512 pixels) to generate a fully-sampled sinogram with 360 projections and 729 detector pixels (sino360x729), which was then uniformly subsampled in the horizontal direction (projection views) and averaged in vertical direction (detector pixels) by factors of 3 and 9 to obtain moderately sampled sinograms with 120 views by 240 pixels (sino120x240) and sparsely sampled sinograms with 40 views by 80 pixels (sino40x80).
Original CT images were used as fully sampled reconstructed images (recon360x729), and images reconstructed from the sparse sinograms (recon120x240 and recon40x80) were generated using a deep learning approach (FBPConvNet Jin et al. (2017)) followed by a conversion from LAC to HU. Reconstructed CT images and sinograms with predefined window-level settings were created to evaluate the effect of windowing: wrecon360x729, wrecon120x240, wrecon40x80; and wsino360x729, wsino120x240, wsino40x80 (Methods). Based on the scanning geometries and window-level settings described above, 12 CNN models were evaluated: 6 were developed by training Inception-v3 Szegedy et al. (2016) with reconstructed CT images and the other 6 were obtained by training SinoNet with sinograms (Table 1, Methods). Data for body part recognition was randomly split into training, validation, and test sets with balanced genders: 140 scans in training, 30 in validation, and 30 in testing. A similar dataset breakdown was performed for ICH detection with 478 scans in training, 121 in validation, and 121 in testing. Details of data preparation, CNN architecture, sinogram generation, and image reconstruction are described in Methods.
|Fully sampled||Moderately sampled||Sparsely sampled|
|360 projections and 729 detectors||120 projections and 240 detectors||40 projections and 80 detectors|
|I1: recon360x729 (original CT)||I3: recon120x240||I5: recon40x80|
|S1: sino360x729||S3: sino120x240||S5: sino40x80|
|I2: wrecon360x729 (windowed original CT)||I4: wrecon120x240||I6: wrecon40x80|
|S2: wsino360x729||S4: wsino120x240||S6: wsino40x80|
2.2 Results of body part recognition
Figure 1 shows test performance of the twelve different models for body part recognition. Models trained on fully sampled data had accuracies of 97.4% in image-space, 96.6% in sinogram-space, 97.9% in windowed-image-space, and 97.4% in windowed-sinogram-space. Models trained on moderately sampled data had accuracies of 97.4% in image-space, 96.3% in sinogram-space, 97.9% in windowed-image-space, and 97.4% in windowed-sinogram-space. Models trained on sparsely sampled data had accuracies of 97.1% in image-space, 96.2% in sinogram-space, 97.2% in windowed-image-space, and 97.1% in windowed-sinogram-space. These results imply that models trained and operating in image-space performed slightly better than sinogram-space (SinoNet) models for body part recognition, regardless of scanning geometry. Additionally, models trained on windowed inputs consistently outperformed those trained on full-range images or sinograms.
2.3 Results of intracranial hemorrhage detection
Figure 2 depicts receiver operating characteristic (ROC) curves and the corresponding areas under the ROC curves (AUC) for the twelve different models of ICH detection. Models trained on fully sampled data had AUCs of 0.898 in image-space, 0.918 in sinogram-space, 0.972 in windowed-image-space, and 0.951 in windowed-sinogram-space. Models trained on moderately sampled data had AUCs of 0.893 in image-space, 0.915 in sinogram-space, 0.953 in windowed-image-space, and 0.947 in windowed-sinogram-space. Models trained on sparsely sampled data had AUCs of 0.885 in image-space, 0.899 in sinogram-space, 0.909 in windowed-image-space, and 0.942 in windowed-sinogram-space.
2.4 Comparison of SinoNet and Inception-v3 for analyzing sinograms
Table 2 details performance comparisons of Inception-v3 and SinoNet for interpreting fully-sampled sinograms (360 projection views and 729 detector pixels) for both body part recognition and ICH detection. SinoNet models significantly outperformed Inception-v3 models in both tasks.
| ||Body part recognition (Accuracy)||ICH detection (AUC)|
|Sinogram||Inception-v3||SinoNet||Inception-v3||SinoNet|
|sino360x729||93.9% (93.4%-94.4%)||96.6% (96.2%-96.9%)||0.873 (0.849-0.895)||0.918* (0.899-0.935)|
|sino120x240||93.5% (93.0%-94.0%)||96.3% (95.9%-96.7%)||0.874 (0.851-0.896)||0.915* (0.897-0.932)|
|sino40x80||93.4% (92.9%-93.9%)||96.2% (95.8%-96.6%)||0.852 (0.828-0.876)||0.899* (0.879-0.917)|
We have demonstrated that models trained on sinograms can achieve performance similar to that of models using conventional reconstructed images for body part recognition and ICH detection in all three scanning geometries, despite the fact that the measurement data are not interpretable to humans. SinoNet, when trained with sinograms, performs comparably to Inception-v3 trained with reconstructed CT images for body part recognition, regardless of the number of projection views or detectors. For ICH detection, SinoNet trained with full-range sinograms outperformed Inception-v3 trained with full dynamic range reconstructed images for all three scanning geometries, with SinoNet significantly outperforming Inception-v3 when using windowed, sparsely sampled images. Applying window settings similar to what a radiologist would use significantly increased network performance due to the improved target-to-background contrast (Figure 3) in both reconstructed images and sinogram-space. As depicted in Figure 3 (b), the key features relevant to hemorrhage are enhanced not only in the windowed CT image but also in the windowed sinogram.
SinoNet, a customized convolutional neural network, was developed for analyzing sinograms through customized Inception modules with multi-scale convolutional and pooling layers Szegedy et al. (2016). In SinoNet, the square convolutional filters in the original Inception module were replaced by rectangular convolutional filters of various sizes, which include width-wise (projection dominant) and height-wise (detector dominant) filters. The customized architecture of SinoNet allowed for significantly improved performance in both body part recognition and ICH detection when compared with Inception-v3 models trained with sinograms, regardless of sampling density. These results imply that non-square filters may be effective in enabling models to learn the interplay between projection views and detector pixels from sinusoidal curves and to extract salient features from the sinogram domain for classification, a task thought to be impossible for human experts to grasp. This approach is similar to the one proposed for learning temporal and frequency features using rectangular convolution filters in spectrograms Pons et al. (2016).
SinoNet, by operating in sinogram-space, can accelerate image interpretation for pathology detection as complex computations for image reconstruction are not required. SinoNet also excels when the projection data are moderately or sparsely sampled: with windowed inputs, its AUC on the hemorrhage detection task fell only from 0.951 to 0.942 between fully and sparsely sampled data, while that of Inception-v3 dropped from 0.972 to 0.909. The results on sparsely sampled data suggest that radiation dose could be markedly decreased with only a slight degradation in performance for sinogram-space algorithms. The number of projections linearly correlates with radiation dose, so retaining one third and one ninth of the projections theoretically achieves 67% and 89% dose reductions for moderately and sparsely sampled data respectively. Similarly, by reducing the size and number of detectors required for diagnostic CT data, cheaper and simpler CT scanners can be created. At our institution, the average head CT has a CTDIvol of 50 mGy. Sparsely sampled data could have CTDIvol between 6 and 16 mGy. One possible use of this technique would be to use the sinogram model as a first-line screening tool in the field setting without image reconstruction, subsequently prioritizing a patient for potential stroke therapy given no evidence of intracranial hemorrhage. Subsequent full-dose CT could be used to confirm the interpretation from the sinogram method. Another possible use for this technique would be to create “smart-scanners” which allow the CT scanner to adjust the protocol and field of view based on the intended region of the body.
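Under the linear dose assumption stated above, the retained dose fraction is simply the ratio of retained to full projection views. The following is a minimal sketch of that arithmetic; the function name `relative_dose` is illustrative, and the 50 mGy baseline is the institutional average quoted in the text.

```python
def relative_dose(views, full_views=360):
    """Fraction of the full-scan dose, ASSUMING dose is proportional to the number of views."""
    return views / full_views


FULL_CTDI_VOL_MGY = 50.0  # average head CT CTDIvol quoted in the text

for views in (120, 40):
    frac = relative_dose(views)
    reduction_pct = 100.0 * (1.0 - frac)
    # Estimated CTDIvol if only the view count changes (a simplification).
    print(f"{views} views: {reduction_pct:.0f}% reduction, "
          f"about {FULL_CTDI_VOL_MGY * frac:.1f} mGy")
```

The estimate ignores any change in detector count or tube settings, which the source does not quantify.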
Although these results demonstrate the power of the sinogram based approach, several important areas of future investigation remain. Due to their unavailability, the sinograms used in this study were simulated by applying the 2D parallel-beam Radon transform to the reconstructed CT images rather than actual measurement data acquired from CT scanners. Improved simulation data could be acquired by accounting for other advanced projection geometries - cone-beam or fan-beam - and considering Poisson noise when generating projection data. Although SinoNet trained with windowed sinograms achieved comparable or better performance compared with windowed reconstructed images, windowed sinograms were generated from reconstructed images that were postprocessed with predefined window settings; generation of windowed sinograms directly from CT measurement data is not straightforward, but it could be implemented by using energy-resolving, photon-counting detectors from multi-energy CT imaging to acquire measurements in multiple energy bins McCollough et al. (2015). Our work will need to be further validated by using raw data from clinical scanners as well as raw data from actual low-dose image acquisitions to see if performance remains robust despite increased image noise.
This HIPAA-compliant retrospective study was conducted with the approval of our institutional review board and under a waiver of informed consent.
4.1 Data collection and annotation
Body part recognition: a total of 200 contrast-enhanced PET/CT examinations of head, neck, chest, abdomen, and pelvis for 100 female and 100 male patients were retrieved from our institutional Picture Archiving and Communication System (PACS) between May 2012 and July 2012. 56,334 axial slices in the CT scans were annotated as one of sixteen body regions by a physician (Figure 6). 15% of the total slices were randomly selected for use as validation data for hyperparameter tuning and model selection, 15% as test data for performance evaluation, and the rest as training data for model development (Table 3).
Intracranial hemorrhage (ICH) detection: a total of 720 5-mm non-contrast head CT scans were identified and retrieved from our PACS between June 2013 and July 2017. Every 5-mm thick axial slice (3,151 slices without ICH and 2,895 slices with ICH) was annotated by consensus for the presence of ICH by five board-certified neuroradiologists (blinded for review, 9 to 34 years of experience). The examinations included 201 cases without ICH and 519 cases with ICH, which were randomly split into training, validation, and test datasets at the case level to ensure slices from the same case were not split across different datasets (Table 4).
| ||Training||Validation||Test|
|No. Cases||140 (70F, 70M)||30 (15F, 15M)||30 (15F, 15M)|
|L2: Eye lens||878||189||188|
|L4: Salivary gland||1,803||361||349|
|L6: Upper lung||1,632||345||392|
|L10: Upper abdomen||4,943||1,008||1,103|
|L11: Lower abdomen||1,736||342||368|
|L12: Upper pelvis||2,524||617||545|
|L13: Lower pelvis||2,230||563||422|
|L15: Upper leg||2,607||563||532|
|L16: Lower leg||1,818||334||354|
|No. Cases||No. Images||No. Cases||No. Images||No. Cases||No. Images|
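A case-level split of the kind described for the ICH data can be sketched as below. This is only an illustration: the function name `split_cases`, the integer case IDs, and the fixed seed are hypothetical, and the sketch performs a plain random split without any stratification, since the source does not describe one for this task.

```python
import random


def split_cases(case_ids, n_val, n_test, seed=0):
    """Shuffle case IDs and partition them so that all slices of a case stay in one subset."""
    ids = sorted(case_ids)
    random.Random(seed).shuffle(ids)
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test


# 720 hypothetical case IDs, split 478/121/121 as reported for ICH detection.
train, val, test = split_cases(range(720), n_val=121, n_test=121)
```

Slices are then assigned to a subset by looking up the case they belong to, which is what prevents leakage between subsets.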
4.2 Sinogram generation
Simulated sinograms were utilized in this study instead of raw data obtained by commercial CT scanners, as this was a retrospective analysis and raw projection data from patient CT scans could not be retrieved. To generate simulated sinograms, the pixel values of 512x512 CT images stored in DICOM files were first converted into scaled linear attenuation coefficients (LACs). Any calculated negative LAC was set to zero, under the assumption that negative LACs are physically impossible and must therefore represent random noise. Subsequently, three different sinograms were generated based on the scaled LAC images. First, we computed sinograms with 360 projection views over 180 degrees and 729 detectors (sino360x729), using the 2D parallel-beam Radon transform. The sino360x729 sinograms were then used to produce sparser sinograms by uniformly subsampling projection views (in the horizontal direction) and averaging projection data from adjacent detectors (in the vertical direction) by factors of 3 and 9 to obtain sinograms with 120 projection views and 240 detectors (sino120x240) and sinograms with 40 projection views and 80 detectors (sino40x80), respectively (Figure 4). Sparser sinograms (sino40x80, sino120x240) were resized to 360x729 pixels using bilinear interpolation to have a uniform resolution with the corresponding full-view sinograms (sino360x729).
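The HU-to-LAC conversion and the subsampling step can be sketched in NumPy as follows. Several details are assumptions rather than the authors' implementation: the scale factor `MU_WATER` is hypothetical (the source says only that the LACs were "scaled"), the sinogram is assumed to be stored with detectors along rows and views along columns, and leftover detector rows are simply dropped. Averaging 729 detector rows in groups of 3 or 9 yields 243 or 81 bins, whereas the text reports 240 and 80; the extra cropping or binning rule is not described, so the sketch does not try to reproduce it.

```python
import numpy as np

MU_WATER = 0.2  # HYPOTHETICAL scale factor; the source does not give its value


def hu_to_lac(hu, mu_water=MU_WATER):
    """Convert Hounsfield units to linear attenuation coefficients, clipping negatives to zero."""
    lac = mu_water * (1.0 + np.asarray(hu, dtype=np.float64) / 1000.0)
    return np.clip(lac, 0.0, None)


def subsample_sinogram(sino, factor):
    """Keep every `factor`-th view (columns) and average groups of `factor` detectors (rows)."""
    views = sino[:, ::factor]
    n_bins = views.shape[0] // factor
    trimmed = views[: n_bins * factor]  # drop leftover detector rows, if any
    return trimmed.reshape(n_bins, factor, views.shape[1]).mean(axis=1)
```

The Radon transform itself is not shown; the authors report using the MATLAB `radon` function for that step.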
4.3 Image reconstruction
Reconstructed images were generated from the synthetic sinograms for models I1-I6. Original CT images were used as the reconstructed images for recon360x729, as fully sampled sinogram data could be completely reconstructed into images using filtered back projection (FBP). However, more complex algorithms, such as model-based iterative reconstruction, are needed to reconstruct high-quality images from sparser datasets. Rather than employing complex iterative algorithms, we implemented a deep learning approach to reconstruct sparsely sampled sinograms, as this technique has been demonstrated to compare favorably to state-of-the-art iterative algorithms for sparse-view image reconstruction Jin et al. (2017); Xie et al. (2018). We implemented FBPConvNet, a modified U-net Ronneberger et al. (2015) with multiresolution decomposition and residual learning as proposed in prior work Jin et al. (2017). FBPConvNet takes FBP reconstructed images from sparser sinograms (sino120x240 or sino40x80) as inputs and is trained for regression between the input and the original CT image (converted into LACs) with mean square error (MSE) as the loss function (Figure 7). Since the output images of FBPConvNet were LACs, they were converted into HU as the final reconstructed images. Sparser sinograms were resized to 360x729 pixels using bilinear interpolation so that the corresponding FBP images have a uniform resolution of 512x512 pixels, resulting in final reconstructed images of 512x512 pixels. The best FBPConvNet models, selected based on RMSE values on the validation data, were applied to sino120x240 and sino40x80 to generate recon120x240 and recon40x80 respectively. The root mean square error (RMSE) of reconstructed images obtained from FBPConvNet on the validation dataset is much smaller than that of conventional FBP images (Table 5).
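The final LAC-to-HU conversion and the RMSE used for model selection can be sketched as below. The scale factor `MU_WATER` is hypothetical and would have to match whatever scaling was used when converting HU to LAC; the function names are illustrative and not taken from the authors' code.

```python
import numpy as np

MU_WATER = 0.2  # HYPOTHETICAL scale factor; must match the HU-to-LAC conversion


def lac_to_hu(lac, mu_water=MU_WATER):
    """Convert linear attenuation coefficients back to Hounsfield units."""
    return 1000.0 * (np.asarray(lac, dtype=np.float64) / mu_water - 1.0)


def rmse(pred, target):
    """Root mean square error between two arrays of the same shape."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.sqrt(np.mean((pred - target) ** 2)))
```

With these helpers, a reconstructed LAC image would be compared against the original CT as `rmse(lac_to_hu(output), original_hu)`; whether the authors computed RMSE in HU or in LAC is not stated.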
4.4 Windowed images and sinograms
We utilized full-range 12-bit grayscale images and windowed 8-bit grayscale images with different window-levels (WL) and window-widths (WW) suitable for each task: abdominal window (WL=40HU, WW=400HU) for body part recognition and brain window (WL=50HU, WW=100HU) for ICH detection. The windowed sinograms were generated from corresponding windowed CT images. Examples of windowed images and sinograms are shown in Figure 8.
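The windowing step can be sketched as follows. The source gives only the window levels and widths, so the linear mapping of the window onto 0-255 and the function name `apply_window` are assumptions.

```python
import numpy as np


def apply_window(hu, level, width):
    """Clip HU values to [level - width/2, level + width/2] and rescale to 8-bit grayscale."""
    lo = level - width / 2.0
    hi = level + width / 2.0
    clipped = np.clip(np.asarray(hu, dtype=np.float64), lo, hi)
    return np.round((clipped - lo) / (hi - lo) * 255.0).astype(np.uint8)


BRAIN_WINDOW = dict(level=50, width=100)      # used for ICH detection
ABDOMINAL_WINDOW = dict(level=40, width=400)  # used for body part recognition
```

For example, `apply_window(image_hu, **BRAIN_WINDOW)` maps everything at or below 0 HU to black and everything at or above 100 HU to white. As the text notes, the windowed sinograms were obtained by projecting such windowed images, not by windowing the sinograms directly.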
4.5 Convolutional neural network for sinograms: SinoNet
A customized convolutional neural network, SinoNet, was designed for analyzing sinograms using customized Inception modules with multiple convolutional and pooling layers and dense connection for efficient use of model parameters Szegedy et al. (2016); Huang et al. (2017). As shown in Figure 5, the Inception module was modified with various sized rectangular convolutional filters in SinoNet. The non-square filters include height-wise (detector dominant) and width-wise (projection dominant) filters to enable efficient extraction of features from sinusoidal curves. Two Inception modules were densely connected to form a Dense-Inception block, which was followed by a Transition block to reduce the number and dimension of feature maps for computational efficiency, as suggested in the original report Huang et al. (2017). In this study, SinoNet was used only for interpreting sinograms.
4.6 Baseline convolutional neural network: Inception-v3
Inception-v3 Szegedy et al. (2016), a validated CNN for object recognition in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) Russakovsky et al. (2015), was selected as the network architecture to develop classification models trained on reconstructed images. We modified Inception-v3 by replacing the last fully-connected layers with a sequence of a global average pooling (GAP) layer, a fully-connected layer, and a softmax layer with outputs of the same number of categories: 16 multi-class outputs for body part recognition and a binary output for ICH detection. Inception-v3 was also used to classify sinograms when evaluating SinoNet performance at body part recognition and ICH detection when using sinograms as the input data.
4.7 Weight initialization
All models developed using Inception-v3 and SinoNet for the body part recognition task were initialized with He normal initialization He et al. (2015). For the ICH detection task, models were initialized with the weights of the corresponding model pre-trained for body part recognition with full-view scanning geometry. For example, the Inception-v3 model trained with recon360x729 for body part recognition provided the initial weights for Inception-v3 models trained with reconstructed images for ICH detection for all scanning geometries and window levels. Similarly, SinoNet ICH detection models were initialized using the weights from the body part recognition SinoNet model trained with sino360x729.
4.8 Performance evaluation and statistical analysis
Test accuracy was used as the performance metric for comparing body part recognition models, and ROC curves with AUC were used for evaluating performance of models for detection of ICH. All performance metrics were calculated using scikit-learn 0.19.2 in Python 2.7.12. A non-parametric approach (the DeLong test; DeLong et al. (1988)) was used to assess the statistical significance of the difference between AUCs of ICH detection models trained with reconstructed images and sinograms using Stata version 15.1 (StataCorp, College Station, Texas, USA). We employed a non-parametric bootstrap approach with 2,000 iterations to compute 95% CIs of the metrics, including test accuracy and AUC Efron and Tibshirani (1994).
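A bootstrap confidence interval for test accuracy can be sketched as below. This is a minimal illustration and not the authors' code: it uses the percentile method, which is an assumption (the source says only "non-parametric bootstrap"), and it relies on the Python 3 standard library rather than the Python 2.7 and scikit-learn setup reported above.

```python
import random


def bootstrap_ci(correct, n_iter=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy from a list of 0/1 per-sample outcomes."""
    rng = random.Random(seed)
    n = len(correct)
    stats = []
    for _ in range(n_iter):
        sample = rng.choices(correct, k=n)  # resample with replacement
        stats.append(sum(sample) / n)
    stats.sort()
    lo = stats[int((alpha / 2) * n_iter)]
    hi = stats[int((1 - alpha / 2) * n_iter) - 1]
    return lo, hi
```

The same resampling loop applies to AUC by replacing the accuracy computation with an AUC computed on each resampled set of labels and scores.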
4.9 Network training
Classification models for body part recognition and ICH detection were trained for 45 epochs using the Adam optimizer with default settings Kingma and Ba (2014) and a mini-batch size of 80. FBPConvNet models were trained for 100 epochs using the Adam optimizer with default settings and a mini-batch size of 20. The base learning rate of 0.001 was decayed by a factor of 10 every 15 epochs for the classification models and every 33 epochs for FBPConvNet. The best classification and FBPConvNet models were selected based on the validation loss.
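The step decay described above can be written as a small function. The 0-indexed epoch convention and the function name `step_decay_lr` are assumptions; the source gives only the base rate, the decay factor, and the decay intervals.

```python
def step_decay_lr(epoch, base_lr=0.001, drop_every=15, factor=10.0):
    """Learning rate for a 0-indexed epoch, divided by `factor` every `drop_every` epochs."""
    return base_lr / (factor ** (epoch // drop_every))


# Classification models: 45 epochs, decay every 15 epochs.
classification_schedule = [step_decay_lr(e) for e in range(45)]

# FBPConvNet: 100 epochs, decay every 33 epochs.
fbpconvnet_schedule = [step_decay_lr(e, drop_every=33) for e in range(100)]
```

A function of this shape can be passed to a per-epoch scheduler callback in most deep learning frameworks.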
We used the radon and iradon functions in MATLAB 2018a for generating sinograms and obtaining FBP reconstructed images, respectively. We used Keras (version 2.1.1) with a TensorFlow backend (version 1.3.0) as the framework for developing deep learning models, and performed experiments using an NVIDIA Devbox (Santa Clara, CA) equipped with four TITAN X GPUs with 12GB of memory per GPU.
| ||Body part recognition||ICH detection|
|Sinogram||FBP||FBPConvNet||FBP||FBPConvNet|
|sino120x240||1155.4 ± 19.3||28.6 ± 6.2||1251.5 ± 35.6||26.8 ± 8.5|
|sino40x80||1147.2 ± 19.4||66.9 ± 16.6||1238.1 ± 34.2||66.1 ± 21.2|
- Esteva et al.  Andre Esteva, Brett Kuprel, Roberto A Novoa, Justin Ko, Susan M Swetter, Helen M Blau, and Sebastian Thrun. Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639):115, 2017.
- Gulshan et al.  Varun Gulshan, Lily Peng, Marc Coram, Martin C Stumpe, Derek Wu, Arunachalam Narayanaswamy, Subhashini Venugopalan, Kasumi Widner, Tom Madams, Jorge Cuadros, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. Jama, 316(22):2402–2410, 2016.
- Chilamkurthy et al.  Sasank Chilamkurthy, Rohit Ghosh, Swetha Tanamala, Mustafa Biviji, Norbert G Campeau, Vasantha Kumar Venugopal, Vidur Mahajan, Pooja Rao, and Prashant Warier. Deep learning algorithms for detection of critical findings in head ct scans: a retrospective study. The Lancet, 2018.
- Thrall et al.  James H Thrall, Xiang Li, Quanzheng Li, Cinthia Cruz, Synho Do, Keith Dreyer, and James Brink. Artificial intelligence and machine learning in radiology: opportunities, challenges, pitfalls, and criteria for success. Journal of the American College of Radiology, 15(3):504–508, 2018.
- Wang et al.  Shanshan Wang, Zhenghang Su, Leslie Ying, Xi Peng, Shun Zhu, Feng Liang, Dagan Feng, and Dong Liang. Accelerating magnetic resonance imaging via deep learning. In Biomedical Imaging (ISBI), 2016 IEEE 13th International Symposium on, pages 514–517. IEEE, 2016.
- Zhu et al.  Bo Zhu, Jeremiah Z Liu, Stephen F Cauley, Bruce R Rosen, and Matthew S Rosen. Image reconstruction by domain-transform manifold learning. Nature, 555(7697):487, 2018.
- Xie et al.  Shipeng Xie, Xinyu Zheng, Yang Chen, Lizhe Xie, Jin Liu, Yudong Zhang, Jingjie Yan, Hu Zhu, and Yining Hu. Artifact removal using improved googlenet for sparse-view ct reconstruction. Scientific reports, 8, 2018.
- Jin et al.  Kyong Hwan Jin, Michael T McCann, Emmanuel Froustey, and Michael Unser. Deep convolutional neural network for inverse problems in imaging. IEEE Transactions on Image Processing, 26(9):4509–4522, 2017.
- Antholzer et al.  Stephan Antholzer, Markus Haltmeier, and Johannes Schwab. Deep learning for photoacoustic tomography from sparse data. Inverse Problems in Science and Engineering, pages 1–19, 2018.
- Wang et al.  Ge Wang, Jong Chu Ye, Klaus Mueller, and Jeffrey A Fessler. Image reconstruction is a new frontier of machine learning. IEEE transactions on medical imaging, 37(6):1289–1296, 2018.
- Do et al.  Synho Do, William Clem Karl, Sarabjeet Singh, Mannudeep Kalra, Tom Brady, Ellie Shin, and Homer Pien. High fidelity system modeling for high quality image reconstruction in clinical ct. PloS one, 9(11):e111625, 2014.
- Do and Karl  S Do and C Karl. Sinogram sparsified metal artifact reduction technology (ssmart). In The Third International Conference on Image Formation in X-ray Computed Tomography, pages 798–802, 2014.
- Do et al.  Synho Do, Janne J Näppi, and Hiroyuki Yoshida. Iterative reconstruction for ultra-low-dose laxative-free ct colonography. In International MICCAI Workshop on Computational and Clinical Challenges in Abdominal Imaging, pages 99–106. Springer, 2013.
- Do et al.  Synho Do, W Clem Karl, Zhuangli Liang, Mannudeep Kalra, Thomas J Brady, and Homer H Pien. A decomposition-based ct reconstruction formulation for reducing blooming artifacts. Physics in Medicine & Biology, 56(22):7109, 2011.
- Szegedy et al.  Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2818–2826, 2016.
- Pons et al.  Jordi Pons, Thomas Lidy, and Xavier Serra. Experimenting with musically motivated convolutional neural networks. In Content-Based Multimedia Indexing (CBMI), 2016 14th International Workshop on, pages 1–6. IEEE, 2016.
- McCollough et al.  Cynthia H McCollough, Shuai Leng, Lifeng Yu, and Joel G Fletcher. Dual-and multi-energy ct: principles, technical approaches, and clinical applications. Radiology, 276(3):637–653, 2015.
- Ronneberger et al.  Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pages 234–241. Springer, 2015.
- Huang et al.  Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In CVPR, volume 1, page 3, 2017.
- Russakovsky et al.  Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
- He et al.  Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision, pages 1026–1034, 2015.
- DeLong et al.  Elizabeth R DeLong, David M DeLong, and Daniel L Clarke-Pearson. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, pages 837–845, 1988.
- Efron and Tibshirani  Bradley Efron and Robert J Tibshirani. An introduction to the bootstrap. CRC press, 1994.
- Kingma and Ba  Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.