Intracranial hemorrhage (ICH) is a critical finding encountered in clinical circumstances ranging from major trauma to spontaneous intracranial aneurysmal rupture. Early and accurate detection is essential for achieving optimal outcomes. An AI-facilitated first read of head CTs could add value both by detecting subtle bleeds that might otherwise go unrecognized and by providing a triage service that prioritizes positively flagged studies for expert radiologist review.
In recent years, convolutional neural networks (CNNs) have been successfully designed to detect various pathologies in medical imaging [textray, ccs, radbot, cf]. Previously reported deep-learning pipelines for automatic ICH detection have based their prediction on either the entire 3D head CT volume [deep3dconv] or each 2D CT slice [qureai, radnet]. While the former potentially utilizes a larger amount of data, it comes at the cost of relatively weak supervision due to the high dimensionality of the input volume. The latter approach requires a substantial tagging effort due to the tedious annotation of every relevant slice in the scan.
Jnawali et al. [deep3dconv] assembled a dataset of 40k studies, preprocessed it to a fixed input size, and used it to train a 3D convolutional [3dconvaction] classification pipeline, reporting an AUC of 0.86 with a single model. In additional work [qureai], the authors trained on a large dataset of 6k studies tagged slice-wise by radiologists. To localize the findings, the authors had to annotate the slices pixel-wise to create the masks necessary for training a UNet [unet] architecture for segmentation; they report an AUC of 0.9419 for the classification task. In [radnet], the authors used multiple auxiliary segmentation losses to leverage the pixel-wise information and aggregated the 3D volumetric decision using an LSTM [hochreiter1997long].
The present report describes the integration of both classification and segmentation in a single network, utilizing the pixel-wise prediction to improve the 3D volumetric ICH classification result. BloodNet is a CNN architecture that explicitly incorporates the pixel-wise prediction by modeling the dependency between the classification and segmentation tasks.
Every cell represents the number of tagged slices. All slices were manually pixel-wise annotated for ICH on 175 ICH-positive and 102 ICH-negative scans.
2 Materials and Methods
For training and validation, 175 non-contrast head CT studies with ICH-positive radiology reports were reviewed by at least one expert radiologist, who validated the existence of the reported ICH and manually segmented it. An ICH-negative dataset of 102 CTs was also assembled. For validation we use only positive studies, which contain both positive and negative slices. Testing was performed on two datasets totaling 1,426 expert-validated studies: an enriched set (67% ICH positive) and a randomly sampled set (16% positive). Each study was tagged by a single expert radiologist, though multiple experts participated in the overall tagging effort.
The present report describes a new pipeline for CT-based ICH classification intended for enhanced triage. The setup relies on jointly learning classification and segmentation; we demonstrate that the segmentation task provides synergistic support to the ICH classification task. A high-level description of our architecture is given in Figure 5.
To exploit the volumetric nature of ICH, the network input was set to 5 consecutive axial CT slices, allowing for better detection of true ICH. We empirically observed that the learned models better distinguish artifacts from hemorrhages, which may look similar on a single slice but commonly appear different over consecutive slices. An example of the advantage of such context is shown in Figure 2. Additional preprocessing included standard brain windowing. Since we empirically observed that a hemorrhage might be very small, we kept the input slices at the full 512x512 CT resolution.
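The preprocessing above can be sketched as follows. This is a minimal illustration, not the authors' code: the window center/width of 40/80 HU are conventional brain-window values assumed here (the text only says "standard brain-windowing"), and edge-padding at the volume boundaries is likewise an assumption.

```python
import numpy as np

def brain_window(hu, center=40.0, width=80.0):
    """Apply a brain window to Hounsfield-unit values and
    rescale the windowed range to [0, 1]."""
    lo, hi = center - width / 2, center + width / 2
    return (np.clip(hu, lo, hi) - lo) / (hi - lo)

def make_inputs(volume):
    """Group a (num_slices, H, W) HU volume into overlapping stacks
    of 5 consecutive axial slices, one stack per center slice.
    Boundary slices are handled by edge-padding (an assumption)."""
    windowed = brain_window(volume)
    padded = np.pad(windowed, ((2, 2), (0, 0), (0, 0)), mode="edge")
    return np.stack([padded[i:i + 5] for i in range(volume.shape[0])])
```

In the paper's setting H = W = 512, since the slices are kept at full CT resolution.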
Given the input slices, we first base our approach on classification alone, using the architecture in Figure 3. Hence our classification loss is:

$$\mathcal{L}_{cls} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{BCE}(y_i, \hat{y}_i)$$

where $y_i$ is the ground truth label, $\hat{y}_i$ is the prediction of the $i$-th sample, $N$ is the number of samples, and $\mathrm{BCE}$ is the binary cross entropy function:

$$\mathrm{BCE}(y, \hat{y}) = -\left[\,y\log\hat{y} + (1-y)\log(1-\hat{y})\,\right]$$
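As a sanity check on the classification loss, here is a small NumPy sketch (our own illustration, not the paper's implementation; the clipping constant is an assumption added for numerical stability):

```python
import numpy as np

def bce(y, p, eps=1e-7):
    """Binary cross entropy for a single prediction, with the
    probability clipped away from 0 and 1 for numerical stability."""
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def classification_loss(labels, preds):
    """Mean BCE over N samples, matching the classification loss above."""
    return float(np.mean([bce(y, p) for y, p in zip(labels, preds)]))
```

For example, an uninformative prediction of 0.5 on a positive sample yields a loss of log 2 ≈ 0.693.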
Considering the clear advantages of multi-task learning reported in recent research [he2017mask, radnet], we modified the architecture and added a decoder to enable the multi-task learning scenario of classification and segmentation (see Figure 4). We also added an auxiliary segmentation loss:

$$\mathcal{L}_{seg} = \frac{1}{N H W}\sum_{i=1}^{N}\sum_{h=1}^{H}\sum_{w=1}^{W} \mathrm{BCE}\!\left(m_i^{h,w}, \hat{m}_i^{h,w}\right)$$

where $H$ and $W$ are the height and width of the input slice, and $m_i^{h,w}$ is the pixel in spatial position $(h,w)$ of the $i$-th sample.
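The segmentation loss is the same BCE averaged over every pixel of every slice. A minimal NumPy sketch (our own illustration; the clipping constant is an added assumption):

```python
import numpy as np

def segmentation_loss(masks, pred_masks, eps=1e-7):
    """Mean per-pixel BCE over N slices of size H x W.
    `masks` holds ground-truth pixel labels in {0, 1},
    `pred_masks` holds predicted probabilities in (0, 1)."""
    m = np.asarray(masks, dtype=float)
    p = np.clip(np.asarray(pred_masks, dtype=float), eps, 1 - eps)
    return float(np.mean(-(m * np.log(p) + (1 - m) * np.log(1 - p))))
```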
Our final loss is thus a weighted sum of the two terms:

$$\mathcal{L} = \lambda_{cls}\,\mathcal{L}_{cls} + \lambda_{seg}\,\mathcal{L}_{seg}$$
Finally, instead of implicitly using the segmentation information as supervision, we explicitly design the architecture to utilize the segmentation information to support classification. More specifically, we sum the decoder network's segmentation prediction, multiply by the voxel volume, and concatenate the resulting blood-volume approximation as a feature in the classification branch.
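The blood-volume feature can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function and parameter names are our own, and the classification embedding is represented abstractly as a feature vector:

```python
import numpy as np

def blood_volume_feature(seg_probs, voxel_volume_mm3):
    """Approximate the bleed volume from the decoder's per-pixel
    probabilities: sum the predicted mask and scale by voxel volume."""
    return float(np.sum(seg_probs) * voxel_volume_mm3)

def classification_features(embedding, seg_probs, voxel_volume_mm3):
    """Concatenate the volume estimate onto the classification
    branch's feature vector, making classification explicitly
    dependent on the segmentation output."""
    vol = blood_volume_feature(seg_probs, voxel_volume_mm3)
    return np.concatenate([embedding, [vol]])
```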
To train this architecture, we employ three steps. First, we train the segmentation branch alone. Then, we freeze all weights and train only the last fully connected layer of the classification branch. Finally, we train the entire architecture for both classification and segmentation in an end-to-end manner, with the loss weights set accordingly in each step. In all our experiments we use the Adam optimizer with an exponentially decaying learning rate. All architectures were implemented in TensorFlow and trained on 4 Nvidia Tesla K80 GPUs. At inference, given a study, we compute the probability of ICH for every slice and use the maximal probability as the study-level probability of ICH.
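The study-level aggregation at inference is a simple maximum over slice probabilities; a sketch, with a hypothetical `triage_order` helper added to illustrate the triage use case described in the introduction:

```python
def study_probability(slice_probs):
    """Study-level ICH probability: the maximum slice-level
    probability across the scan, as described above."""
    return max(slice_probs)

def triage_order(studies):
    """Order studies for radiologist review, most suspicious first.
    `studies` maps a study id to its list of slice probabilities.
    (Illustrative helper; not part of the paper's pipeline.)"""
    return sorted(studies, key=lambda s: study_probability(studies[s]),
                  reverse=True)
```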
We choose the best architecture using AUC over the validation set; Table 3 compares the models. We then evaluated on two held-out test sets: a positive-enriched set and a randomly sampled set. The advantage of a positive-enriched set is its representation of different types of ICH, including those that are less prevalent. To collect this set we used a textual search over radiology reports. Since such a collection method might introduce a bias toward the specific search criteria, we also collected a randomly sampled set, which we assume well represents the cases radiologists see in their daily routine. We report AUCs of 0.9493 and 0.9566 over the enriched and randomly sampled test sets, respectively; Table 2 provides further information. A manual review of false positives showed a propensity to misclassify calcified hemangiomas, dystrophic parenchymal calcifications, and basal ganglia calcifications.
Every cell represents the number of tagged studies. These studies were tagged only at the study level and were held out from training and validation with respect to patient.
| Model | AUC | 95% CI |
| ResNet50 [he2016deep] | 0.9159 | [0.9081, 0.9236] |
| Single task, classification | 0.9453 | [0.9395, 0.9512] |
| Multi task, classification and segmentation | 0.9411 | [0.9352, 0.9471] |
| Task dependent, segmentation dependent classification | 0.9658 | [0.9611, 0.9704] |
Ablation study results over validation-set slices. Mean AUC and CI are computed using bootstrap.
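A bootstrap confidence interval for AUC can be computed as in the following sketch (our own illustration, assuming a case-level percentile bootstrap; the number of resamples and the rank-based AUC formulation are assumptions, as the paper does not specify them):

```python
import numpy as np

def auc(labels, scores):
    """Rank-based AUC: probability that a random positive
    outscores a random negative (ties count as half)."""
    labels, scores = np.asarray(labels), np.asarray(scores)
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC, resampling cases with
    replacement and skipping degenerate single-class resamples."""
    rng = np.random.default_rng(seed)
    labels, scores = np.asarray(labels), np.asarray(scores)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(labels), len(labels))
        if labels[idx].min() == labels[idx].max():
            continue  # resample lacked both classes
        stats.append(auc(labels[idx], scores[idx]))
    return np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
```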
This work provides further evidence supporting the use of pixel-wise annotated data for classification. However, our results indicate that relying on the multi-task setting alone might not be enough to yield a significant improvement in classification performance. In BloodNet, we explicitly model a segmentation-dependent classification, resulting in a design that fully leverages the dense pixel-wise supervision to boost classification performance. It has the advantage of both classifying and localizing the acute finding: while classification is most important in a triage system, the localization provides reasoning, which is crucial for a radiologist to better understand the prediction.
The authors would like to thank Orna Bregman, Assaf Pinhasi, Jonathan Laserson, David Chettrit, Chen Brestel, Eli Goz, Phil Teare, Tomer Meir, Rachel Wities, Amit Oved, Raouf Muhamedrahimov and Eyal Toledano for helpful comments and discussions during this research.