Deep learning for brain metastasis detection and segmentation in longitudinal MRI data

by Yixing Huang, et al.

Brain metastases occur frequently in patients with metastatic cancer. Early and accurate detection of brain metastases is essential for treatment planning and prognosis in radiation therapy. To improve brain metastasis detection performance with deep learning, a custom detection loss called volume-level sensitivity-specificity (VSS) is proposed, which rates individual metastasis detection sensitivity and specificity at the (sub-)volume level. As sensitivity and precision are always a trade-off at the metastasis level, either a high sensitivity or a high precision can be achieved by adjusting the weights in the VSS loss without a decline in the Dice similarity coefficient of segmented metastases. To reduce the number of metastasis-like structures detected as false positive metastases, a temporal prior volume is proposed as an additional input of the neural network. Our proposed VSS loss improves the sensitivity of brain metastasis detection, increasing the sensitivity from 86.7% to … Alternatively, it improves the precision from 68.8% to … With the additional temporal prior volume, the number of false positive metastases is reduced by about 45% in the high sensitivity model, and the precision reaches 99.6% for the high specificity model. The mean Dice coefficient for all metastases is about 0.81. With the ensemble of the high sensitivity and high specificity models, on average only 1.5 false positive metastases per patient need further checking, while the majority of true positive metastases are confirmed. The ensemble learning is able to distinguish high-confidence true positive metastases from metastasis candidates that require special expert review or further follow-up, which makes it particularly well suited to the requirements of expert support in real clinical practice.






I. Introduction

Patients with metastatic cancer have a high risk of developing brain metastases (BM), with an incidence rate of up to approximately 40% 1. The progression of BM leads to reduced or absent efficacy of common systemic treatments. Therefore, successful treatment of BM is crucial for patient survival and quality of life 2, 3. As whole-brain radiotherapy causes cognitive impairments, stereotactic radiosurgery (SRS) has gained increasing preference for BM treatment 4, 5, 6, 7. SRS delivers highly focused radiation to metastasis regions with little dose to the surrounding normal brain tissue. Hence, it causes far fewer side effects than whole-brain radiotherapy. For SRS treatment planning, the number, size, boundary, and location of BM are essential information, which requires accurate detection and subsequent segmentation of BM. Currently, BM are identified manually by neuroradiologists and radiation oncologists, which is time-consuming and suffers from inter-rater variability. In particular, small metastases are easily overlooked in manual detection, as they appear in only a few image slices and typically have low contrast. In addition, some anatomical structures, such as blood vessels, appear very similar to BM in 2D cross-sectional planes, which makes manual identification challenging 8. Therefore, computer-assisted automated BM identification has important clinical value.

For automated BM detection and segmentation, conventional machine learning methods such as template matching 9, 10, support vector machines 11, and AdaBoost 12 have been applied. However, they have been shown to be inferior to recent deep learning methods 11, 12. With the rapid development of deep learning techniques, research on deep learning for BM detection and segmentation is growing 14, although the majority of researchers focus on the segmentation of primary brain tumors such as gliomas 13. Regarding neural network architectures for BM segmentation, 3D U-Net 15, 16, 17, 18 and DeepMedic 19, 20, 16, 21, 15 are the two most common networks. Other neural networks include GoogLeNet 22, V-Net 11, Faster R-CNN 12, single-shot detectors 23, and custom convolutional neural networks (CNNs) 24, 25. To save memory, the GoogLeNet approach 22 uses seven slices, one central slice plus six neighboring slices, as a 2.5D model. All other neural networks use 3D subvolumes for training. To train segmentation neural networks, loss functions such as binary cross-entropy (BCE), Dice similarity coefficient (DSC), and intersection over union (IOU) are generally applied. To improve BM segmentation performance, new loss functions have been proposed. For example, Bousabarah et al. 17 proposed a soft Dice loss computed at three spatial scales, namely the outputs of three down-sampled layers of the 3D U-Net. As Dice coefficients can be dominated by large tumors, the contribution of small metastasis regions is underweighted. To overcome this problem, Hu et al. 15 proposed a volume-aware Dice loss, where the Dice coefficients are reweighted by metastasis volume. However, all the above methods have limited performance, either in detection sensitivity or in precision. For example, the volume-aware Dice loss together with ensemble learning only achieves a precision of 0.79 15; the soft Dice loss only achieves a sensitivity of 0.82 17; the custom CNN 24 and the GoogLeNet 22 both achieve a sensitivity of 0.83. Although Dikici et al. 25 and Charron et al. 21 have achieved relatively high sensitivities of 0.9 and 0.93, respectively, they both have high average false positive (FP) rates of 9.1 and 7.8 FP metastases per patient, respectively. In particular, the detection sensitivity of these methods for tiny metastases is unsatisfactory. For example, the GoogLeNet-based method 22 only achieves 50% sensitivity for metastases smaller than 7 mm 8; the two-step U-Net method 18, which achieves 100% accuracy, is only evaluated on BM larger than 0.07 cm³ with a median size of 2.22 cm³.

Among the above methods, those in 21, 15, 17, 22 are based on multi-modal MRI data, as complementary information from multi-modal/multi-parametric imaging is beneficial for BM detection and segmentation. However, the routinely available T2-weighted, T2-FLAIR 26, and diffusion tensor imaging (DTI) 27 images typically have low resolution, especially in the slice direction, which limits their benefit for detecting tiny BM. In addition, the registration of multi-modal data increases the requirements for standardized sequences, and complete multi-modal datasets are not always available in clinical routine. Therefore, BM detection and segmentation in single-modal data has more practical value, since tiny metastases, which are the most challenging for expert detection, are typically visualized only on high-resolution post-contrast 3D T1-weighted (T1w) sequences. So far, BM detection in longitudinal MRI data has not been investigated. In this work, the application of deep learning for BM detection and segmentation in single-modal longitudinal MRI data is investigated.

The contribution of this work lies mainly in the following aspects: a) a new loss function called volume-level sensitivity-specificity (VSS) loss, based on the exact definitions of sensitivity and specificity at the (sub)volume level, is proposed to improve the detection of tiny metastases and to adjust the balance between sensitivity and specificity; b) to the best of our knowledge, our work is the first to integrate temporal prior information into automated BM identification; c) an ensemble learning strategy is proposed that achieves high sensitivity for BM detection and distinguishes high-confidence metastases from metastasis candidates that require expert review or additional follow-up, accommodating the requirements of expert support in real clinical practice.

II. Materials and Methods

II.A. Baseline method: DeepMedic

In this work, we use DeepMedic 19 as our baseline method because of its efficacy in various brain tumor segmentation tasks. DeepMedic was originally proposed for general brain tumor segmentation. In brain tumor segmentation, and especially in BM segmentation, class imbalance is a major issue, as normal tissue voxels vastly outnumber tumor voxels. To overcome this problem, DeepMedic samples volume segments online to maintain class balance during training. In addition, multi-scale features are extracted via parallel convolutional pathways.

II.B. Volume-level sensitivity-specificity (VSS) loss

Conventional loss functions 28, 29 such as BCE, DSC, and IOU evaluate the output of a neural network at the voxel level for segmentation purposes. Such a segmentation needs to identify the existence, location, and three-dimensional extent of a metastasis, which is challenging in general. Optimal BM segmentation is particularly difficult to achieve with conventional voxel-level loss functions: the vast differences in volume (i.e., voxel numbers) among metastases limit the detection rate of tiny metastases, and the partial volume effect causes large uncertainties in the segmentation of tiny metastases whose size approaches that of individual voxels. In contrast, for clinical applications, algorithms are typically evaluated at the patient (volume) or lesion (metastasis) level. For example, sensitivity stands for the ability of a test to correctly identify patients with a disease, while specificity defines the ability of a test to correctly identify people without the disease 30. For BM segmentation in 3D volumes (and likewise in subvolumes), the primary clinical requirement is to correctly detect the existence of metastases with high sensitivity and specificity, irrespective of the exact three-dimensional extent of each metastasis. If a metastasis exists in a volume (volume-level positive case), this translates, for a voxel-level classification task, into maximizing the probability that at least one predicted voxel is located in a true metastasis region (sensitivity). When no metastasis exists in a volume (volume-level negative case), the network needs to minimize the probability of any voxel being a false-positive metastasis (specificity).

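Sensitivity (positive case): We denote the segmentation label of the $i$-th volume by $\mathbf{y}_i$. Correspondingly, the prediction of the $i$-th volume is denoted by $\hat{\mathbf{y}}_i$. The label $\mathbf{y}_i$ is binary, while each voxel of the prediction $\hat{\mathbf{y}}_i$ is a probability between 0 and 1. With this definition, $\max(\mathbf{y}_i)$ indicates whether any metastasis exists in the $i$-th volume: if $\max(\mathbf{y}_i) = 1$, there is at least one metastasis; otherwise, no metastasis exists. $\mathbf{y}_i \odot \hat{\mathbf{y}}_i$ is the intersection region, where $\odot$ denotes voxel-wise multiplication. Then $\max(\mathbf{y}_i \odot \hat{\mathbf{y}}_i)$ represents the maximum voxel-level probability that at least one metastasis has been detected within the $i$-th volume (a probability from 0 to 1). Note that $\max(\mathbf{y}_i \odot \hat{\mathbf{y}}_i)$ instead of $\max(\hat{\mathbf{y}}_i)$ is used to ensure that the detected metastasis mask overlaps with the reference mask. In a batch with $N$ samples, $\sum_{i=1}^{N}\max(\mathbf{y}_i \odot \hat{\mathbf{y}}_i)$ stands for the number of correctly detected volumes (in the ideal case where each maximum probability approaches 1). The total number of volumes containing metastases is $\sum_{i=1}^{N}\max(\mathbf{y}_i)$. Accordingly, the sensitivity is defined as

$$\mathrm{Sen} = \frac{\sum_{i=1}^{N}\max(\mathbf{y}_i \odot \hat{\mathbf{y}}_i)}{\sum_{i=1}^{N}\max(\mathbf{y}_i) + \epsilon}, \tag{1}$$

where $\epsilon$ is a small value to avoid division by zero.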

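Specificity (negative case): In the negative case, $1-\max(\mathbf{y}_i)$ equals one for the $i$-th segmentation label, and $\sum_{i=1}^{N}\left(1-\max(\mathbf{y}_i)\right)$ is the total number of negative volumes. Similarly, $1-\max(\hat{\mathbf{y}}_i)$ equals one when the neural network predicts a tumor probability of zero for every voxel in the volume. The total number of volumes correctly segmented as fully negative is $\sum_{i=1}^{N}\left(1-\max(\mathbf{y}_i)\right)\left(1-\max(\hat{\mathbf{y}}_i)\right)$. Accordingly, the specificity is calculated as

$$\mathrm{Spe} = \frac{\sum_{i=1}^{N}\left(1-\max(\mathbf{y}_i)\right)\left(1-\max(\hat{\mathbf{y}}_i)\right)}{\sum_{i=1}^{N}\left(1-\max(\mathbf{y}_i)\right) + \epsilon}. \tag{2}$$
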
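The sensitivity term Sen ensures that at least one metastasis is detected in a volume if metastases exist, while the specificity term Spe ensures that no FP metastasis is present in a fully negative volume. With the two metrics, the volume-level sensitivity-specificity (VSS) loss is defined as

$$\mathcal{L}_{\mathrm{VSS}} = \alpha\,(1-\mathrm{Sen}) + (1-\alpha)\,(1-\mathrm{Spe}), \tag{3}$$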
where α (0 ≤ α ≤ 1) is a parameter that adjusts the weights of Sen and Spe, as there is always a trade-off between sensitivity and specificity. By definition, the VSS loss focuses on the detection of metastases rather than on segmentation, incorporating the maximum predicted voxel-level probabilities in a continuous fashion. Because the sensitivity definition includes an overlap criterion and specificity is evaluated exclusively in volumes without metastases, high predicted tumor probabilities inside true metastases improve the cost function, whereas discrepancies between predictions and ground truth in the ambiguous periphery of metastases do not affect it for volumes harbouring metastases. Therefore, a conventional segmentation loss such as BCE or DSC needs to be used together with the VSS loss so that the neural network is also able to perform segmentation. In this work, we combine the VSS loss with the BCE loss as the default joint loss, denoted by JVSS,

$$\mathcal{L}_{\mathrm{JVSS}} = \mathcal{L}_{\mathrm{VSS}} + \mathcal{L}_{\mathrm{BCE}}. \tag{4}$$

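To make the definitions of Sen, Spe, the VSS loss, and the JVSS loss concrete, here is a minimal NumPy sketch that evaluates them on a toy batch of subvolumes. It only computes the forward value; for training, the same expressions would be written in an automatic-differentiation framework, where the maximum passes its gradient to the arg-max voxel. The function names, the value of `eps`, the voxel-wise mean in the BCE term, and the unweighted sum of VSS and BCE are assumptions for illustration, not details confirmed by the text.

```python
import numpy as np


def vss_loss(y, y_hat, alpha=0.5, eps=1e-6):
    """Volume-level sensitivity-specificity (VSS) loss for a batch.

    y     : binary reference masks, shape (N, ...), one subvolume per row
    y_hat : predicted tumor probabilities in [0, 1], same shape as y
    alpha : weight of the sensitivity term (large alpha favors sensitivity)
    """
    n = y.shape[0]
    y = y.reshape(n, -1).astype(float)
    y_hat = y_hat.reshape(n, -1).astype(float)

    has_met = y.max(axis=1)                 # max(y_i): 1 if the subvolume has a metastasis
    hit = (y * y_hat).max(axis=1)           # max(y_i * y_hat_i): best probability inside the reference
    all_neg_pred = 1.0 - y_hat.max(axis=1)  # 1 - max(y_hat_i): 1 if nothing is predicted

    sen = hit.sum() / (has_met.sum() + eps)
    spe = ((1.0 - has_met) * all_neg_pred).sum() / ((1.0 - has_met).sum() + eps)
    return float(alpha * (1.0 - sen) + (1.0 - alpha) * (1.0 - spe))


def bce_loss(y, y_hat, eps=1e-7):
    """Voxel-wise binary cross-entropy, averaged over all voxels."""
    p = np.clip(y_hat.astype(float), eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def jvss_loss(y, y_hat, alpha=0.5):
    """Joint loss: VSS plus BCE (equal weighting is an assumption here)."""
    return vss_loss(y, y_hat, alpha) + bce_loss(y, y_hat)


# Toy batch: subvolume 0 contains a one-voxel metastasis, subvolume 1 is negative.
y = np.zeros((2, 4, 4, 4))
y[0, 1, 1, 1] = 1.0
y_hat = np.zeros_like(y)
y_hat[0, 1, 1, 1] = 0.9   # metastasis found with probability 0.9
y_hat[1, 2, 2, 2] = 0.2   # weak false-positive response in the negative subvolume

print(round(vss_loss(y, y_hat, alpha=0.5), 4))  # prints 0.15, i.e. 0.5*(1-0.9) + 0.5*(1-0.8)
```

Setting `alpha=1.0` or `alpha=0.0` in the example isolates the sensitivity term (about 0.1) or the specificity term (about 0.2), respectively.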
Note that a sensitivity-specificity loss has been proposed for multiple sclerosis lesion segmentation 31. It is defined as follows,

$$\mathcal{L}_{\mathrm{SSE}} = \alpha\,\frac{\sum_{p}\left(y_p-\hat{y}_p\right)^2 y_p}{\sum_{p} y_p} + (1-\alpha)\,\frac{\sum_{p}\left(y_p-\hat{y}_p\right)^2\left(1-y_p\right)}{\sum_{p}\left(1-y_p\right)},$$

where $p$ is the voxel index; hence, this loss function is defined at the voxel level as opposed to the subvolume level. The left term is the sensitivity error and the right term is the specificity error, and the parameter α adjusts the weights of these two terms. This loss function does not follow the exact definitions of sensitivity and specificity. For distinction, we call it the sensitivity-specificity error (SSE). In this work, our proposed VSS loss is compared with SSE for BM identification.
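For contrast, here is a NumPy sketch of the voxel-level SSE loss, assuming squared voxel-wise errors averaged separately over reference-positive and reference-negative voxels and weighted by α; the small `eps` guard is our addition. On a toy batch with one weak false-positive voxel among 127 negative voxels, the specificity error is only 0.04/127, which illustrates how a voxel-level loss dilutes an isolated FP detection that a volume-level criterion would count as a whole false-positive volume.

```python
import numpy as np


def sse_loss(y, y_hat, alpha=0.5, eps=1e-6):
    """Voxel-level sensitivity-specificity error (SSE).

    Sensitivity error: squared error averaged over reference-positive voxels.
    Specificity error: squared error averaged over reference-negative voxels.
    """
    y = y.astype(float).ravel()
    y_hat = y_hat.astype(float).ravel()
    sq_err = (y - y_hat) ** 2
    sens_err = (sq_err * y).sum() / (y.sum() + eps)
    spec_err = (sq_err * (1.0 - y)).sum() / ((1.0 - y).sum() + eps)
    return float(alpha * sens_err + (1.0 - alpha) * spec_err)


# Toy batch: one metastasis voxel predicted with 0.9 and one false-positive
# voxel with 0.2 among 2 * 4**3 = 128 voxels (127 of them negative).
y = np.zeros((2, 4, 4, 4))
y[0, 1, 1, 1] = 1.0
y_hat = np.zeros_like(y)
y_hat[0, 1, 1, 1] = 0.9
y_hat[1, 2, 2, 2] = 0.2

# The specificity error is 0.04 / 127: the isolated FP voxel is heavily diluted.
print(sse_loss(y, y_hat, alpha=0.5))
```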

II.C. Temporal prior path

(a) Prior image
(b) Main image
(c) Difference image
Figure 1: Exemplary images acquired from different time points: (a) temporal prior image; (b) current main image, where the contour indicated by the arrow is segmented incorrectly as a metastasis by a neural network that used the prior image as an additional channel as opposed to an additional pathway; (c) difference between (a) and (b), where the bright region indicated by the arrow is the main cause for the incorrect segmentation.

In contrast-enhanced MRI images, many structures such as blood vessels are also enhanced by contrast agents. Hence, they appear similar to metastases in terms of intensity, shape, and size in 2D cross-sectional planes, which makes them very challenging to distinguish for human experts as well as for deep neural networks. One key difference is that BM are sphere-like structures, while enhanced vascular structures are tube-like structures in 3D space. Therefore, 3D instead of 2D neural networks are used to capture 3D features. Another key difference is that BM grow much faster than normal structures. In radiation therapy, patients have regular follow-up MRI scans approximately every 4-6 weeks. Therefore, when two images acquired at two time points are compared, a high-contrast structure that emerges or grows is very likely to be a metastasis. There are two potential ways to integrate such temporal prior information into deep learning: using the temporal prior volume as an additional input channel or as an additional input path. As anatomical structures imaged at different time points cannot be perfectly registered to the same position, using temporal prior volumes as an additional channel results in a high FP rate. An example is displayed in Fig. 1, where a normal tissue region indicated by the arrow in Fig. 1(b) is incorrectly segmented as a metastasis. This mis-segmentation is mainly caused by imperfect registration, as a high intensity difference is observed in the corresponding region of the difference image (Fig. 1(c)). To avoid such problems, we propose to feed the temporal prior volume through an additional input path, where features from the two time points are merged at deep layers. The overall architecture is displayed in Fig. 2, where the normal-resolution subvolumes from both the prior and main datasets, as well as two low-resolution subvolumes from the main dataset, are fed into DeepMedic. Note that for volumes without any temporal prior, an empty prior volume with zero values is used.

Figure 2: The modified DeepMedic architecture uses one additional path for temporal prior volumes. The normal resolution (NR) subvolumes from both the prior and main datasets as well as two low resolution (LR) subvolumes from the main dataset are input segments of different network paths. Modified from 19.
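To make the input arrangement concrete, here is a hedged NumPy sketch of how one training sample for the four pathways could be assembled around a common center voxel: normal-resolution segments from the main and prior volumes, and two low-resolution segments from the main volume obtained by average-pooling a larger region. The segment sizes (`nr_size`, `lr_size`) and subsampling factors are hypothetical placeholders, not values from the text; only the zero-filled prior for scans without a temporal prior follows the description above.

```python
import numpy as np


def crop(vol, center, size):
    """Cube of edge `size` around `center`; voxels outside the volume are zero."""
    half = size // 2
    pad = half + 1
    padded = np.pad(vol, pad, mode="constant")
    start = [c + pad - half for c in center]
    return padded[start[0]:start[0] + size,
                  start[1]:start[1] + size,
                  start[2]:start[2] + size]


def low_res_segment(vol, center, size, factor):
    """Average-pool a (size * factor)^3 region down to size^3 voxels."""
    region = crop(vol, center, size * factor)
    return region.reshape(size, factor, size, factor, size, factor).mean(axis=(1, 3, 5))


def pathway_inputs(main, center, prior=None, nr_size=25, lr_size=19, factors=(3, 5)):
    """Hypothetical input assembly for a DeepMedic variant with a prior pathway.

    The prior volume (assumed already registered to `main`) gets its own
    normal-resolution pathway instead of being stacked as an extra channel.
    A scan without a prior uses an all-zero prior volume.
    """
    if prior is None:
        prior = np.zeros_like(main)
    inputs = {
        "main_nr": crop(main, center, nr_size),
        "prior_nr": crop(prior, center, nr_size),
    }
    for f in factors:
        inputs[f"main_lr_x{f}"] = low_res_segment(main, center, lr_size, f)
    return inputs


rng = np.random.default_rng(0)
main = rng.random((64, 64, 64))
sample = pathway_inputs(main, center=(32, 32, 32))  # first scan: no prior available
for name, seg in sample.items():
    print(name, seg.shape)
```

Keeping the prior in its own pathway means a small misregistration only shifts features inside that path, instead of being added voxel by voxel to the main image as a channel would be.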

II.D. Ensemble learning

For segmentation outputs of neural networks trained with conventional losses such as BCE, the probability threshold can be adjusted to balance sensitivity and specificity. All segmentation masks generated in this work use a probability threshold of 0.5, which is the default setting of DeepMedic as well as a convention for segmentation applications. Since, with our proposed JVSS loss, a large α value leads to high sensitivity while a small one leads to high specificity, we can train the same network with the temporal prior path twice to obtain a high-sensitivity model and a high-specificity model, respectively. The final segmentation mask is the union of the two segmentation masks. The metastases predicted by the high-specificity model are high-confidence true positive metastases, while the high-sensitivity model predicts additional metastasis candidates with high sensitivity for detailed analysis by clinical experts, additional imaging studies, or follow-up imaging.
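A minimal sketch of the ensemble rule described above, assuming binary masks from the two models on the same voxel grid and 6-connected components as the definition of an individual metastasis (the connectivity is our assumption; the text does not specify it). Every component of the union mask that overlaps the high-specificity prediction is reported as high-confidence, and the remaining components, which come only from the high-sensitivity model, are flagged as candidates for expert review. The function names are hypothetical.

```python
from collections import deque

import numpy as np


def label_components(mask):
    """6-connected component labeling of a 3D boolean mask (NumPy + BFS)."""
    labels = np.zeros(mask.shape, dtype=int)
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    n = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue
        n += 1
        labels[start] = n
        queue = deque([start])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in offsets:
                nb = (z + dz, y + dy, x + dx)
                inside = all(0 <= nb[i] < mask.shape[i] for i in range(3))
                if inside and mask[nb] and not labels[nb]:
                    labels[nb] = n
                    queue.append(nb)
    return labels, n


def ensemble(mask_high_sens, mask_high_spec):
    """Union of both masks, with each metastasis tagged by confidence."""
    high_spec = mask_high_spec.astype(bool)
    union = mask_high_sens.astype(bool) | high_spec
    labels, n = label_components(union)
    confirmed, candidates = [], []
    for k in range(1, n + 1):
        if np.any((labels == k) & high_spec):
            confirmed.append(k)    # also found by the high-specificity model
        else:
            candidates.append(k)   # only found by the high-sensitivity model
    return union, labels, confirmed, candidates


# Toy example: two blobs, only one of which the high-specificity model finds.
hs = np.zeros((10, 10, 10), dtype=bool)
hs[1:3, 1:3, 1:3] = True
hs[6:8, 6:8, 6:8] = True
hp = np.zeros_like(hs)
hp[1:3, 1:3, 1:2] = True
union, labels, confirmed, candidates = ensemble(hs, hp)
print(len(confirmed), len(candidates))  # prints: 1 1
```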

II.E. Experimental Setup

II.E.1. Dataset

770 contrast-enhanced T1-weighted volumes from 176 patients, acquired with the magnetization-prepared rapid gradient echo (MPRAGE) sequence in a longitudinal study 32, 33, are used for evaluation. Detailed information about data preprocessing and the distribution of BM can be found in 32, 33. Among them, 600 volumes are used for training, 67 volumes for validation to monitor convergence and overfitting, and the remaining 103 volumes for testing. The training volumes are from 135 patients, and 466 of them have temporal prior volumes. The test volumes are from 32 patients (excluding any patient in the training dataset), and 71 of them have temporal prior volumes. In total, the test dataset contains 278 metastases. Among them, 130 (47%) have a volume smaller than 0.1 cm³, which corresponds to a diameter of about 6 mm. All volumes are resampled to an isotropic resolution of 1 mm and affinely registered. Bias correction is applied to obtain homogeneous intensities.
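The stated equivalence between 0.1 cm³ and a diameter of about 6 mm follows from the volume of a sphere, d = (6V/π)^(1/3). The snippet below checks this, together with the corresponding voxel count at the 1 mm isotropic resolution and the 47% share of small metastases; treating metastases as spheres is a simplifying assumption.

```python
import math


def equivalent_diameter_mm(volume_cm3):
    """Diameter of a sphere with the given volume (1 cm^3 = 1000 mm^3)."""
    volume_mm3 = volume_cm3 * 1000.0
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)


voxel_volume_mm3 = 1.0                   # 1 mm isotropic resampling
d = equivalent_diameter_mm(0.1)
voxels = 0.1 * 1000 / voxel_volume_mm3   # number of voxels in 0.1 cm^3
share_small = 130 / 278                  # test metastases below 0.1 cm^3

print(f"{d:.2f} mm, {voxels:.0f} voxels, {share_small:.0%}")  # 5.76 mm, 100 voxels, 47%
```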

II.E.2. Training and test parameters

All DeepMedic models are trained on an NVIDIA Quadro RTX 8000 GPU with Intel Xeon Gold 6158R CPUs. Each training takes about 21 hours. Based on the validation loss, each model is trained for 60 epochs. The initial learning rate is 0.001. The RMSProp optimizer and Nesterov momentum 35 are used, and regularization is applied. Class-balanced sampling, with a 50% probability of extracting tumor segments, is used to obtain training samples. The segments for the main path have a fixed size; as 9 convolutional layers are used without padding, the effective segment size at the network output is correspondingly smaller. The segments are augmented with random intensity scaling, flipping, and rotation. For efficient inference, dedicated input segment sizes are used.
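A hedged sketch of class-balanced segment sampling as described above: each training segment is centered on a tumor voxel with probability 0.5 and otherwise on a non-tumor voxel, optionally restricted to a brain mask. The helper for the output size assumes 3×3×3 kernels with stride 1 and no padding, so each layer trims two voxels along each axis; the kernel size is our assumption, and the function names and the example sizes are hypothetical.

```python
import numpy as np


def sample_segment_centers(label, n, p_tumor=0.5, roi=None, rng=None):
    """Draw n segment centers, tumor-centered with probability p_tumor."""
    rng = np.random.default_rng() if rng is None else rng
    tumor = np.argwhere(label > 0)
    background_mask = label == 0
    if roi is not None:
        background_mask &= roi > 0          # e.g. restrict to a brain mask
    background = np.argwhere(background_mask)
    centers = []
    for _ in range(n):
        use_tumor = len(tumor) > 0 and rng.random() < p_tumor
        pool = tumor if use_tumor else background
        centers.append(pool[rng.integers(len(pool))])
    return np.array(centers)


def valid_output_size(input_size, n_layers=9, kernel=3):
    """Output edge length after n unpadded, stride-1 convolutions."""
    return input_size - n_layers * (kernel - 1)


rng = np.random.default_rng(0)
label = np.zeros((32, 32, 32), dtype=np.uint8)
label[10:14, 10:14, 10:14] = 1
centers = sample_segment_centers(label, n=1000, rng=rng)
tumor_fraction = label[tuple(centers.T)].mean()
print(round(float(tumor_fraction), 2))  # close to 0.5
print(valid_output_size(37))            # hypothetical 37^3 input -> 19^3 output
```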

II.E.3. Evaluation metrics

Since our VSS loss calculates sensitivity and specificity on subvolumes during training, subvolume-level sensitivity, specificity, and precision are calculated from the numbers (#) of subvolumes to show the efficacy of our proposed VSS loss. The metrics are defined as follows,

$$\mathrm{Sensitivity} = \frac{\#\mathrm{TP}}{\#\mathrm{TP} + \#\mathrm{FN}}, \qquad \mathrm{Specificity} = \frac{\#\mathrm{TN}}{\#\mathrm{TN} + \#\mathrm{FP}}, \qquad \mathrm{Precision} = \frac{\#\mathrm{TP}}{\#\mathrm{TP} + \#\mathrm{FP}},$$

where TP, TN, FN, and FP stand for true positive, true negative, false negative, and false positive, respectively. A subvolume is positive if at least one of its voxels is positive (value 1), and negative if all of its voxels are negative (value 0). Evaluation on whole brain volumes (instead of subvolumes) is not performed, since all volumes contain BM and are all predicted as positive by DeepMedic.
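The subvolume-level counting can be sketched as follows, assuming reference and predicted binary masks already split into matching subvolumes: a subvolume is positive if any voxel is 1, and the three metrics follow from the resulting confusion counts. Note that, by this definition, a positive prediction anywhere in a positive subvolume counts as a TP even without voxel overlap. The function name is hypothetical.

```python
import numpy as np


def subvolume_metrics(ref_subvols, pred_subvols):
    """Sensitivity, specificity and precision at the subvolume level.

    ref_subvols, pred_subvols: binary arrays of shape (M, ...), one subvolume per row.
    """
    m = ref_subvols.shape[0]
    ref_pos = ref_subvols.reshape(m, -1).any(axis=1)
    pred_pos = pred_subvols.reshape(m, -1).any(axis=1)
    tp = int(np.sum(ref_pos & pred_pos))
    fn = int(np.sum(ref_pos & ~pred_pos))
    fp = int(np.sum(~ref_pos & pred_pos))
    tn = int(np.sum(~ref_pos & ~pred_pos))
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, specificity, precision


ref = np.zeros((4, 8, 8, 8), dtype=bool)
pred = np.zeros_like(ref)
ref[0, 2, 2, 2] = ref[1, 5, 5, 5] = True   # two positive subvolumes
pred[0, 2, 2, 3] = True                    # positive subvolume predicted positive (TP)
pred[2, 1, 1, 1] = True                    # negative subvolume predicted positive (FP)
print(subvolume_metrics(ref, pred))        # (0.5, 0.5, 0.5)
```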

For clinical applications, the number, location, and size of BM are very important for accurate treatment. Hence, our evaluation focuses on metastasis-level metrics. Since all volumes contain BM, specificity is not used. Instead, sensitivity and precision are calculated at the metastasis level based on the numbers of TP, FP, and FN metastases. The DSC reflects the accuracy of the location and size of segmented BM. In practice, BM segmentation is performed after the detection is confirmed. Therefore, the mean DSC (mDSC) over all TP metastases detected by a network model is calculated. The reference BM masks are segmented manually by experienced radiation oncologists.
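A sketch of the metastasis-level evaluation under stated assumptions: metastases are 6-connected components (our assumption), a reference metastasis counts as detected (TP) if any predicted voxel overlaps it, a predicted component with no overlap is an FP, precision is TP/(TP + FP), and the DSC of each TP metastasis is computed between the reference component and the predicted components touching it. The exact matching rule behind Table 2 is not given in this section, so this illustrates the metrics rather than reproducing them; all names are hypothetical.

```python
from collections import deque

import numpy as np


def label_components(mask):
    """6-connected components of a 3D boolean mask via breadth-first search."""
    labels = np.zeros(mask.shape, dtype=int)
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    n = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue
        n += 1
        labels[seed] = n
        queue = deque([seed])
        while queue:
            p = queue.popleft()
            for s in steps:
                q = tuple(int(a + b) for a, b in zip(p, s))
                inside = all(0 <= q[i] < mask.shape[i] for i in range(3))
                if inside and mask[q] and not labels[q]:
                    labels[q] = n
                    queue.append(q)
    return labels, n


def metastasis_metrics(ref, pred):
    """Lesion-level sensitivity, precision, FP count and mean DSC of TP lesions."""
    ref, pred = ref.astype(bool), pred.astype(bool)
    ref_lab, n_ref = label_components(ref)
    pred_lab, n_pred = label_components(pred)
    tp, dscs = 0, []
    for k in range(1, n_ref + 1):
        r = ref_lab == k
        touching = np.unique(pred_lab[r & pred])  # predicted components hitting lesion k
        if touching.size:
            tp += 1
            p = np.isin(pred_lab, touching)
            dscs.append(2.0 * np.sum(r & p) / (np.sum(r) + np.sum(p)))
    fp = sum(1 for j in range(1, n_pred + 1) if not np.any((pred_lab == j) & ref))
    sensitivity = tp / n_ref if n_ref else float("nan")
    precision = tp / (tp + fp) if tp + fp else float("nan")
    mdsc = float(np.mean(dscs)) if dscs else float("nan")
    return sensitivity, precision, fp, mdsc


ref = np.zeros((12, 12, 12), dtype=bool)
ref[1:4, 1:4, 1:4] = True      # metastasis A (27 voxels)
ref[8:10, 8:10, 8:10] = True   # metastasis B (8 voxels), missed below
pred = np.zeros_like(ref)
pred[1:4, 1:4, 2:4] = True     # overlaps A (18 voxels)
pred[5, 5, 5] = True           # isolated false positive
print(metastasis_metrics(ref, pred))  # (0.5, 0.5, 1, 0.8)
```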

III. Results

III.A. Results of the VSS loss

III.A.1. Subvolume-level evaluation

The subvolume-level evaluation of DeepMedic models trained with different loss functions is displayed in Tab. 1. With the JVSS loss, the sensitivity and specificity can be adjusted via the value of α in Eqn. (4). For reference, 673 subvolumes are positive and 14746 subvolumes are negative. The baseline DeepMedic model trained with the conventional BCE loss achieves a sensitivity of 0.880, a specificity of 0.992, and a precision of 0.826. For DeepMedic with the JVSS loss and a large α, a high sensitivity of 0.952 is achieved. However, the specificity is relatively low, and the precision is only 0.427. When α decreases to 0.5 (without prior), the sensitivity decreases to 0.841, while the specificity and precision increase to 0.9991 and 0.976, respectively. This is consistent with the design of the JVSS loss.


Prior   Loss   Sensitivity   Specificity   Precision
No      BCE    0.880         0.9915        0.826
No      JVSS   0.952         0.9417        0.427
No      JVSS   0.936         0.9807        0.689
No      JVSS   0.917         0.9924        0.846
No      JVSS   0.899         0.9978        0.950
No      JVSS   0.868         0.9981        0.954
No      JVSS   0.841         0.9991        0.976
Yes     JVSS   0.927         0.9873        0.769
Yes     JVSS   0.857         0.9995        0.986

Table 1: Subvolume-level accuracy of DeepMedic with different α values.
Figure 3: The ROC curves of different models.

The receiver operating characteristic (ROC) curves of different models are displayed in Fig. 3. Due to the high specificity of the DeepMedic models, all ROC curves in Fig. 3 rise steeply from low to high sensitivity within a narrow range of FP rates (1 - specificity). As a consequence, the comparison of different ROC curves is not straightforward. The area under the curve (AUC) of each ROC curve is displayed in Fig. 3. When α in the JVSS loss decreases from 1 to 0.5, the AUC first increases from 0.9752 to 0.9822 and then decreases to 0.9536; the two best-performing α values both achieve AUC values close to 0.982.
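The AUC values above come from integrating an ROC curve. A minimal sketch, assuming each subvolume is scored by its maximum predicted voxel probability (which matches the rule that a subvolume is positive if any voxel exceeds the threshold) and the curve is traced by sweeping the threshold from high to low; the scoring rule is our assumption, since the text does not state how the curves were generated.

```python
import numpy as np


def roc_curve_points(labels, scores):
    """FPR/TPR pairs for thresholds swept from high to low scores."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    n_pos, n_neg = labels.sum(), (~labels).sum()
    tpr = np.array([(scores[labels] >= t).sum() / n_pos for t in thresholds])
    fpr = np.array([(scores[~labels] >= t).sum() / n_neg for t in thresholds])
    return fpr, tpr


def trapezoid_auc(x, y):
    """Area under a piecewise-linear curve whose x values are non-decreasing."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))


# Subvolume scores = max predicted probability; labels = contains a metastasis.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
fpr, tpr = roc_curve_points(labels, scores)
print(trapezoid_auc(fpr, tpr))  # 0.75: one of four positive/negative pairs is misordered
```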

III.A.2. Metastasis-level evaluation


Prior   Loss   Sensitivity   Precision   FP    mDSC
No      BCE    0.853         0.691       106   0.789
No      JVSS   0.974         0.274       718   0.808
No      JVSS   0.946         0.516       247   0.798
No      JVSS   0.914         0.736       91    0.801
No      JVSS   0.881         0.918       22    0.792
No      JVSS   0.860         0.930       18    0.788
No      JVSS   0.802         0.987       3     0.755
Yes     JVSS   0.932         0.621       158   0.808
Yes     JVSS   0.842         0.996       1     0.760

Number of total metastases: 278

Table 2: Metastasis detection accuracy of DeepMedic with the JVSS loss using different α values.
Figure 4: Three examples of metastases missed (FN) by the baseline DeepMedic model trained with the BCE loss, which are detected by DeepMedic trained with the JVSS loss. The segmentation masks are indicated in green. For each patient, two zoomed-in ROIs, without and with the segmentation mask, are displayed in the top-left and top-right corners, respectively.

The metastasis-level evaluation of DeepMedic models trained with different loss functions is displayed in Tab. 2. The baseline DeepMedic model achieves a sensitivity of 0.853 and a precision of 0.691. In other words, among the 278 metastases, 237 are detected as TP metastases, while 106 FP metastases are also marked. The FN metastases are mainly tiny metastases, as illustrated by the three exemplary FN metastases in the top-left regions of interest (ROIs) of Fig. 4.

With the JVSS loss, the sensitivity and precision can be adjusted via the value of α in Eqn. (4). With a large α, where high sensitivity is desired, DeepMedic achieves a high sensitivity of 0.974 but a low precision of 0.274: among all 278 metastases, 271 are successfully detected, but 718 FP metastases are also marked. With a slightly smaller α, a high sensitivity of 0.946 is achieved, while the precision increases to 0.516 and the number of FP metastases decreases from 718 to 247. In the top-right ROIs of Fig. 4, the three exemplary metastases are correctly detected by DeepMedic trained with the JVSS loss.

When α is decreased in the JVSS loss, the sensitivity decreases, but the precision improves substantially. At the setting where the sensitivity of DeepMedic with the JVSS loss (0.860) is comparable to that of DeepMedic with BCE only (0.853), the precision increases from 0.691 to 0.930. With a further decreased α, DeepMedic achieves a high precision of 0.987, with only 3 FP metastases remaining. The three FP metastases are displayed in Fig. 5. In Fig. 5(a), the green region is a TP metastasis, and its boundary is segmented well. The red region in Fig. 5 is an FP metastasis located at the junction of two vessels. The vesicle on the right in Fig. 5 has a higher intensity than the surrounding tissue and a sphere-like shape in 3D space, which is why it is falsely detected as a metastasis. The marked region in Fig. 5 has a somewhat higher intensity than the surrounding vessels. Therefore, whether it is a false or true positive metastasis is ambiguous when only a single volume is considered. The follow-up scans show that this region does not grow at all, which confirms that it is an FP metastasis.

Figure 5: The three false positive metastases detected by DeepMedic trained with the JVSS loss (high-precision setting) are marked in red. The green region in (a) is a detected true positive metastasis.
Figure 6: The precision-recall curves of DeepMedic with our proposed VSS loss and with the SSE loss, obtained by adjusting the weight α.

The metastasis detection accuracy of DeepMedic with SSE 31 is also evaluated. For a better comparison between SSE and our proposed VSS, their precision-recall (sensitivity) curves without temporal prior, obtained by varying the weight α, are displayed in Fig. 6. The AUCs for JVSS and SSE are 0.9471 and 0.8985, respectively. At a precision of around 0.93, the sensitivity of SSE is 0.673, while that of JVSS is 0.860; at a sensitivity of around 0.88, the precision of SSE is 0.631, while that of JVSS is 0.918. Fig. 6 demonstrates that our proposed JVSS loss achieves better metastasis detection accuracy than the SSE loss.


Prior   Loss   Sensitivity   Precision   FP    mDSC
No      BCE    0.720         0.667       100   0.754
No      JVSS   0.827         0.197       939   0.814
No      JVSS   0.694         0.836       38    0.763

Table 3: Metastasis detection accuracy of 3D U-Net with different α values.

Our proposed JVSS loss also works with other neural networks. As an example, the metastasis detection accuracy of 3D U-Net with the JVSS loss is displayed in Tab. 3. With the BCE loss, 3D U-Net achieves a sensitivity of 0.720 and a precision of 0.667, which is consistent with the performance reported by other groups 15. With JVSS and a large α, the sensitivity of 3D U-Net increases to 0.827, but the precision drops to 0.197; with a small α, the sensitivity is 0.694, but the precision increases to 0.836. However, 3D U-Net performs worse than DeepMedic in general. Therefore, DeepMedic is chosen as our baseline method in this work. Due to the high computational cost of 3D models, only two α values are shown for 3D U-Net as demonstration examples.

III.B. Results of temporal prior

Fig. 5 is an example where temporal prior information is beneficial for metastasis identification. With an additional path for the prior volume, the red region in Fig. 5, as well as the other two cases in Fig. 5, is correctly rejected instead of being detected as a metastasis. The sensitivity and precision of DeepMedic trained with the JVSS loss in the high-sensitivity and high-specificity settings together with the temporal prior, i.e., the high-sensitivity and high-specificity models, are displayed in Tab. 2. For the high-sensitivity model, the total number of FP metastases is reduced from 247 to 158, i.e., by 89 FP metastases. As a trade-off, the sensitivity decreases slightly from 0.946 to 0.932, with only 4 more FN metastases. Note that some of the test volumes are first scans without temporal priors. If such volumes are excluded, the number of FP metastases decreases from 180 to 100 with the help of the temporal prior, which is about 45% less, while 3 instead of 4 more FN metastases are observed. For the high-specificity model, the sensitivity of 0.842 is slightly lower than that of the baseline (0.853). However, the precision is as high as 0.996, with only one FP metastasis. This FP case is displayed in Fig. 7, where the current main image is shown together with its temporal prior and posterior images. The difference image between the main image and the temporal prior image is displayed in Fig. 7(c), where the area indicated by the arrow shows a large difference. That is why the high-specificity model regards this region as a metastasis. However, the posterior image (Fig. 7(d)) shows no grown metastasis. Therefore, we regard the detection in Fig. 7(b) as an FP. Nevertheless, we cannot rule out the possibility that a real metastasis at the segmented region regressed before the posterior scan.

(a) Prior image
(b) Main image with mask
(c) Difference ((b)-(a))
(d) Posterior image
Figure 7: The single false positive metastasis detected by the high-specificity model with temporal prior: (a) temporal prior image; (b) current main image, where the FP metastasis is marked in red; (c) difference image ((b)-(a)); (d) temporal posterior image.

For segmentation accuracy, the mDSCs of all TP metastases are displayed in Tab. 2. In general, the mDSCs are around 0.8. With a smaller α value, the mDSC becomes slightly smaller. The segmentation boundaries of three exemplary metastases are displayed in Fig. 8. Green boundaries are manual reference segmentation boundaries, while red and blue boundaries are the segmentation boundaries of the high-sensitivity and high-specificity models, respectively. In Figs. 8(a) and (b), all three boundaries agree well, with DSC values larger than 0.9. These two segmentation results are typical for metastases larger than 0.1 cm³. Fig. 8(c) is an example where DeepMedic achieves lower DSC values. The tumorous region in this case is difficult to define in single-modal data, as part of the tumor has become necrotic after treatment. In addition, the active parts, which are enhanced by contrast agents, are separated in many horizontal slices. Therefore, both models segment the active parts as two separate metastases, and the DSC values are low as a consequence. Nevertheless, the high-sensitivity model segments the active parts better than the high-specificity model. Tab. 2 indicates that the high-sensitivity model achieves a better mDSC than the baseline DeepMedic, while the high-specificity model is slightly worse. Since the union of the two segmentation masks is used in our ensemble learning, the good segmentation of the high-sensitivity model is preserved.

(a) 0.941, 0.916
(b) 0.910, 0.910
(c) 0.596, 0.133
Figure 8: Segmentation boundaries of three exemplary metastases. Green boundaries are manual reference segmentation boundaries, while red and blue boundaries are segmentation boundaries of and , respectively. The DSC values of each metastasis volume for and are given on the left and right in the subcaptions, respectively. Mixed colors: yellow = green + red; magenta = red + blue; cyan = green + blue; white = red + green + blue.

III.C. DeepMedic ensemble

Figure 9: The segmentation results of three exemplary patients (Patient 1: (a)-(c); Patient 2: (d)-(f); Patient 3: (g)-(i)) by the baseline DeepMedic and the DeepMedic ensemble ( + ), rendered with 3D Slicer. Left: ground truth (green), middle: prediction of the original DeepMedic, right: our proposed DeepMedic ensemble. The metastases indicated by the green arrows are missed by , but detected by the DeepMedic ensemble. The red metastases are TP metastases detected by the baseline DeepMedic. The pink metastases are true positive metastases detected by . The cyan metastases indicated by cyan arrows are additional true positive metastases detected by . The yellow metastases indicated by yellow arrows are false positive metastases.

The segmentation results of three exemplary patients by the baseline DeepMedic and the DeepMedic ensemble ( + ), rendered with 3D Slicer, are displayed in Fig. 9. The metastases indicated by the green arrows are missed by , but all of them are detected by the DeepMedic ensemble. The red metastases are TP metastases detected by . The pink metastases are detected by , and all of them are TP. The cyan metastases are additional TP metastases detected by . In Fig. 9 and Fig. 9, one additional TP metastasis is detected by , while two additional TP metastases are detected in Fig. 9. The yellow metastases are FP metastases. One FP metastasis is present in each of Fig. 9 and Fig. 9. In Fig. 9 and Fig. 9, three and two FP metastases are present, respectively. Fig. 9 demonstrates that with the ensemble of and , the majority of TP metastases are confirmed. Only a few metastasis candidates, marked exclusively by , remain in each patient and need expert confirmation, additional diagnostic studies, or imaging follow-up. Note that the three examples in Fig. 9 have a relatively large number of metastases. According to Tab. 2, each patient volume has on average about 0.25 additional TP metastases to include and 1.5 additional FP metastases to exclude.
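The triage logic of the ensemble can be sketched at the level of connected components: every component of the union mask that overlaps the high-precision output is treated as confirmed, and every component found only by the high-sensitivity model is flagged for review. This is a minimal sketch under that assumption; the function name `triage_components` is hypothetical, and it uses `scipy.ndimage.label` with its default face connectivity. Per the text, on average about 1.75 candidates per patient (0.25 TP + 1.5 FP) end up in the review group.

```python
import numpy as np
from scipy import ndimage


def triage_components(mask_sens: np.ndarray, mask_spec: np.ndarray):
    """Split the union of two binary masks into confirmed and review components.

    mask_sens: output of the high-sensitivity model (assumed boolean array)
    mask_spec: output of the high-precision model (assumed boolean array)
    """
    union = np.logical_or(mask_sens, mask_spec)
    labels, n = ndimage.label(union)  # default structure: face connectivity
    confirmed, review = [], []
    for k in range(1, n + 1):
        component = labels == k
        if np.any(mask_spec[component]):
            confirmed.append(k)  # also found by the high-precision model
        else:
            review.append(k)     # only the high-sensitivity model marks it
    return labels, confirmed, review


# Toy example: lesion A is found by both models, lesion B only by the sensitive one
vol = (10, 10, 10)
mask_sens = np.zeros(vol, dtype=bool)
mask_spec = np.zeros(vol, dtype=bool)
mask_sens[1:3, 1:3, 1:3] = True
mask_spec[1:3, 1:3, 1:3] = True
mask_sens[6:8, 6:8, 6:8] = True

labels, confirmed, review = triage_components(mask_sens, mask_spec)
print(confirmed, review)
```

In a clinical workflow, only the components in `review` would need to be checked by an expert.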

IV. Discussion

With the selected subvolume size , each subvolume typically contains either a single metastasis or none, so that the sensitivity and specificity derived from the predicted voxel-level probabilities in the JVSS loss are expected to reflect the sensitivity and specificity of detecting individual metastases. Although a few subvolumes in the training data may contain more than one metastasis, they do not make a significant difference due to their low incidence. At inference, the trained model is still able to detect multiple metastases in a single subvolume, as demonstrated in Fig. 9, where two nearby metastases are both detected by .
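The idea of measuring sensitivity and specificity per subvolume can be illustrated with a soft (probability-based) formulation. The sketch below is an assumption for illustration only, not the paper's exact VSS definition: soft sensitivity is taken as sum(p·g)/sum(g) over subvolumes that contain metastasis voxels, soft specificity as sum((1−p)(1−g))/sum(1−g), and a hypothetical weight `w` trades one against the other. The names `subvolumes` and `vss_like_loss` are hypothetical, and NumPy is used purely for readability; training would require a differentiable framework.

```python
import numpy as np


def subvolumes(x: np.ndarray, s: int):
    """Yield non-overlapping s*s*s blocks (assumes each dimension is a multiple of s)."""
    d, h, w = x.shape
    for i in range(0, d, s):
        for j in range(0, h, s):
            for k in range(0, w, s):
                yield x[i:i + s, j:j + s, k:k + s]


def vss_like_loss(prob, gt, s, w=0.5, eps=1e-7):
    """Hypothetical subvolume-level soft sensitivity/specificity loss.

    prob: voxel-wise foreground probabilities in [0, 1]
    gt:   binary reference mask of the same shape
    w:    weight of the sensitivity term (larger w favors sensitivity)
    """
    sens_terms, spec_terms = [], []
    for p, g in zip(subvolumes(prob, s), subvolumes(gt, s)):
        g = g.astype(float)
        if g.sum() > 0:  # sensitivity is only defined where a metastasis exists
            sens_terms.append((p * g).sum() / (g.sum() + eps))
        neg = 1.0 - g
        if neg.sum() > 0:
            spec_terms.append(((1.0 - p) * neg).sum() / (neg.sum() + eps))
    sens = np.mean(sens_terms) if sens_terms else 1.0
    spec = np.mean(spec_terms) if spec_terms else 1.0
    return w * (1.0 - sens) + (1.0 - w) * (1.0 - spec)


gt = np.zeros((8, 8, 8))
gt[1:3, 1:3, 1:3] = 1.0
loss_perfect = vss_like_loss(gt.copy(), gt, s=4)        # close to 0
loss_empty = vss_like_loss(np.zeros_like(gt), gt, s=4)  # close to w = 0.5
print(loss_perfect, loss_empty)
```

Shifting `w` toward the sensitivity term penalizes missed metastases more strongly, which mirrors the high-sensitivity / high-specificity weighting described in the text.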

In Fig. 4, the volumes of all three metastases are smaller than 0.1 cm³. Therefore, it is very challenging for DeepMedic, and for human experts as well, to identify them, especially since many vascular structures look similar. These small metastases consist of very few voxels, and partial volume effects at the periphery of the metastases introduce ambiguity. For such tiny metastases, DeepMedic detects only a few voxels, or even a single voxel, in each metastasis region. For example, in Fig. 4, not all metastasis voxels are covered by the segmentation mask. Nevertheless, detecting their existence already has important clinical value. In general, the segmentation accuracy of DeepMedic does not change, or changes only slightly, for all other, larger metastases. This is demonstrated by the mDSC values in Tab. 2 and the 3D renderings in Fig. 9. Interestingly, the mDSC is even improved using the proposed JVSS loss in as compared to BCE alone, which illustrates that introducing the proposed VSS loss does not compromise the accuracy of the individual segmentation masks of detected metastases.
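Because a tiny metastasis may be hit by only one predicted voxel, lesion-level counting treats detection separately from voxel overlap. The sketch below assumes a common criterion (not stated in this section): a reference metastasis is a TP if at least one predicted voxel overlaps it, and a predicted component with no overlap is an FP. The function name `lesion_detection_counts` is hypothetical.

```python
import numpy as np
from scipy import ndimage


def lesion_detection_counts(pred: np.ndarray, ref: np.ndarray):
    """Count TP, FN and FP at the lesion (connected component) level.

    Assumed criterion: any overlap of at least one voxel counts as a detection.
    pred, ref: boolean masks of the same shape.
    """
    ref_labels, n_ref = ndimage.label(ref)
    pred_labels, n_pred = ndimage.label(pred)
    tp = sum(1 for k in range(1, n_ref + 1) if pred[ref_labels == k].any())
    fn = n_ref - tp
    fp = sum(1 for k in range(1, n_pred + 1) if not ref[pred_labels == k].any())
    return tp, fn, fp


ref = np.zeros((8, 8, 8), dtype=bool)
ref[1:3, 1:3, 1:3] = True  # a small metastasis (8 voxels)
ref[6, 6, 6] = True        # a tiny, single-voxel metastasis

pred = np.zeros_like(ref)
pred[1, 1, 1] = True  # only one voxel of the first lesion is segmented
pred[6, 6, 6] = True  # the tiny lesion is hit
pred[4, 0, 7] = True  # an isolated false positive

print(lesion_detection_counts(pred, ref))
```

Under this criterion both reference lesions count as detected even though the first one is segmented with a low DSC, matching the observation that detection can have clinical value even when the mask is incomplete.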

Ensemble learning has also been used by other research groups for BM identification 15, 17. Hu et al. 15 proposed using the average probability map of U-Net and DeepMedic. However, for each individual segmented metastasis, there is still no indication of whether it is a TP or an FP metastasis. Bousabarah et al. 17 applied majority voting over three different U-Net models to decide the final segmentation mask and achieve high precision. However, their sensitivity is only 0.77. Our ensemble learning combines high sensitivity and high precision in one result. As displayed in Fig. 9 and Tab. 2, the metastases detected by are TP metastases with 99.6% (almost 100%) certainty. For the remaining metastasis candidates detected by , it is very efficient to further evaluate whether they are TP or FP, as only 1.75 metastases per patient remain on average. Compared with state-of-the-art methods, the FP rate of our method is very low. For example, the DeepMedic method proposed by Charron et al. 21 achieves a sensitivity of 93%, but with 7.8 FP metastases per patient. The ensemble solution thereby combines high confidence in metastasis detection with high sensitivity, efficiently guiding expert attention to metastasis candidates that are uncertain or require additional follow-up, where it is needed most. We therefore believe that this approach is especially well suited to the requirements of expert support in real clinical practice.
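The three voxel-wise fusion strategies mentioned here differ mainly in how conservative they are. The following NumPy sketch contrasts averaging of probability maps (as described for Hu et al.; the 0.5 threshold is an assumption), majority voting over binary masks (as described for Bousabarah et al.), and the union used in our ensemble. The function names are hypothetical, and these are simplified voxel-wise versions, not the cited implementations.

```python
import numpy as np


def fuse_average(prob_maps, threshold=0.5):
    """Average probability maps, then threshold (threshold value is an assumption)."""
    return np.mean(np.stack(prob_maps), axis=0) >= threshold


def fuse_majority(masks):
    """Keep a voxel if more than half of the binary masks mark it."""
    votes = np.sum(np.stack(masks).astype(int), axis=0)
    return votes * 2 > len(masks)


def fuse_union(masks):
    """Keep a voxel if any mask marks it (most sensitive, least precise)."""
    return np.any(np.stack(masks), axis=0)


# Three toy binary masks over three voxels
masks = [np.array([1, 1, 0], dtype=bool),
         np.array([1, 0, 0], dtype=bool),
         np.array([1, 0, 1], dtype=bool)]
# Two toy probability maps over the same voxels
probs = [np.array([0.9, 0.6, 0.1]), np.array([0.8, 0.3, 0.2])]

print(fuse_majority(masks), fuse_union(masks), fuse_average(probs))
```

Majority voting keeps only voxels that most models agree on, which raises precision at the cost of sensitivity; the union keeps everything, which is why it is paired with a separate high-precision model that flags which candidates are already confirmed.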

Note that DeepMedic serves as an exemplary network to demonstrate the efficacy of our proposed sensitivity-specificity loss and temporal prior for BM identification in longitudinal MRI data. We do not claim that DeepMedic is the best network for BM identification. With the rapid development of deep learning techniques, new neural architectures or variants of existing ones combined with our proposed VSS loss and temporal prior may achieve comparable or even better performance. Nevertheless, the DeepMedic ensemble already has important value in real clinical practice.

V. Conclusion and Outlook

Tiny BM detection is a challenging task due to the small size, low contrast, and vessel-like appearance of the metastases. Our proposed JVSS loss can adjust the detection sensitivity and precision over a wide range to achieve either high sensitivity or high precision. The temporal prior information further reduces FP metastases. The ensemble learning is able to distinguish high-confidence true positive metastases from metastasis candidates that require special expert attention or further follow-up, which facilitates metastasis detection and segmentation for neuroradiologists in diagnostic and for radiation oncologists in therapeutic clinical applications.


  • 1 E. Tabouret, O. Chinot, P. Metellus, A. Tallet, P. Viens, and A. Goncalves, Recent trends in epidemiology of brain metastases: an overview, Anticancer Res. 32, 4655–4662 (2012).
  • 2 D. Steinmann et al., Effects of radiotherapy for brain metastases on quality of life (QoL), Strahlenther. Onkol. 185, 190–197 (2009).
  • 3 E. Le Rhun et al., EANO–ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up of patients with brain metastasis from solid tumours, Ann. Oncol. (2021).
  • 4 E. L. Chang et al., Neurocognition in patients with brain metastases treated with radiosurgery or radiosurgery plus whole-brain irradiation: a randomised controlled trial, Lancet Oncol. 10, 1037–1044 (2009).
  • 5 M. Kocher, A. Wittig, M. D. Piroth, H. Treuer, H. Seegenschmiedt, M. Ruge, A.-L. Grosu, and M. Guckenberger, Stereotactic radiosurgery for treatment of brain metastases, Strahlenther. Onkol. 190, 521–532 (2014).
  • 6 P. D. Brown et al., Effect of radiosurgery alone vs radiosurgery with whole brain radiation therapy on cognitive function in patients with 1 to 3 brain metastases: a randomized clinical trial, JAMA 316, 401–409 (2016).
  • 7 P. W. Sperduto et al., Beyond an updated graded prognostic assessment (breast GPA): a prognostic index and trends in treatment and survival in breast cancer brain metastases from 1985 to today, Int. J. Radiat. Oncol. Biol. Phys. 107, 334–343 (2020).
  • 8 M. Kocher, M. I. Ruge, N. Galldiks, and P. Lohmann, Applications of radiomics and machine learning for radiotherapy of malignant brain tumors, Strahlenther. Onkol. 196, 856–867 (2020).
  • 9 Ú. Pérez-Ramírez, E. Arana, and D. Moratal, Brain metastases detection on MR by means of three-dimensional tumor-appearance template matching, J. Magn. Reson. Imaging 44, 642–652 (2016).
  • 10 L. Sunwoo et al., Computer-aided detection of brain metastasis on 3D MR imaging: Observer performance study, PLoS One 12, e0178265 (2017).
  • 11 G. Gonella, E. Binaghi, P. Nocera, and C. Mordacchini, Investigating the behaviour of machine learning techniques to segment brain metastases in radiation therapy planning, Appl. Sci. 9, 3335 (2019).
  • 12 M. Zhang, G. S. Young, H. Chen, J. Li, L. Qin, J. R. McFaline-Figueroa, D. A. Reardon, X. Cao, X. Wu, and X. Xu, Deep-learning detection of cancer metastases to the brain on MRI, J. Magn. Reson. Imaging 52, 1227–1236 (2020).
  • 13 S. Bakas et al., Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge, arXiv preprint arXiv:1811.02629 (2018).
  • 14 S. J. Cho, L. Sunwoo, S. H. Baik, Y. J. Bae, B. S. Choi, and J. H. Kim, Brain metastasis detection using machine learning: a systematic review and meta-analysis, Neuro Oncol. 23, 214–225 (2021).
  • 15 S.-Y. Hu, W.-H. Weng, S.-L. Lu, Y.-H. Cheng, F. Xiao, F.-M. Hsu, and J.-T. Lu, Multimodal volume-aware detection and segmentation for brain metastases radiosurgery, in Proc. AIRT, pages 61–69, Springer, 2019.
  • 16 S. Lu, S. Hu, W. Weng, Y. Chen, J. Lu, F. Xiao, and F. Hsu, Automated Detection and Segmentation of Brain Metastases in Stereotactic Radiosurgery Using Three-Dimensional Deep Neural Networks, Int. J. Radiat. Oncol. Biol. Phys. 105, S69–S70 (2019).
  • 17 K. Bousabarah et al., Deep convolutional neural networks for automated segmentation of brain metastases trained on clinical data, Radiat. Oncol. 15, 1–9 (2020).
  • 18 J. Xue et al., Deep learning–based detection and segmentation-assisted management of brain metastases, Neuro Oncol. 22, 505–514 (2020).
  • 19 K. Kamnitsas, C. Ledig, V. F. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon, D. Rueckert, and B. Glocker, Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation, Med. Image Anal. 36, 61–78 (2017).
  • 20 Y. Liu et al., A deep convolutional neural network-based automatic delineation strategy for multiple brain metastases stereotactic radiosurgery, PLoS One 12, e0185844 (2017).
  • 21 O. Charron, A. Lallement, D. Jarnet, V. Noblet, J.-B. Clavier, and P. Meyer, Automatic detection and segmentation of brain metastases on multimodal MR images with a deep convolutional neural network, Comput. Biol. Med. 95, 43–54 (2018).
  • 22 E. Grøvik, D. Yi, M. Iv, E. Tong, D. Rubin, and G. Zaharchuk, Deep learning enables automatic detection and segmentation of brain metastases on multisequence MRI, J. Magn. Reson. Imaging 51, 175–182 (2020).
  • 23 Z. Zhou et al., Computer-aided detection of brain metastases in T1-weighted MRI for stereotactic radiosurgery using deep learning single-shot detectors, Radiology 295, 407–415 (2020).
  • 24 M. Losch, Detection and segmentation of brain metastases with deep convolutional networks, 2015.
  • 25 E. Dikici, J. L. Ryu, M. Demirer, M. Bigelow, R. D. White, W. Slone, B. S. Erdal, and L. M. Prevedello, Automated brain metastases detection framework for T1-weighted contrast-enhanced 3D MRI, IEEE J. Biomed. Health Inform. 24, 2883–2893 (2020).
  • 26 J. V. Hajnal et al., Use of fluid attenuated inversion recovery (FLAIR) pulse sequences in MRI of the brain, J. Comput. Assist Tomogr. 16, 841–841 (1992).
  • 27 D. Le Bihan, J.-F. Mangin, C. Poupon, C. A. Clark, S. Pappata, N. Molko, and H. Chabriat, Diffusion tensor imaging: concepts and applications, J. Magn. Reson. Imaging 13, 534–546 (2001).
  • 28 S. Jadon, A survey of loss functions for semantic segmentation, in Proc. CIBCB, pages 1–7, IEEE, 2020.
  • 29 J. Ma, J. Chen, M. Ng, R. Huang, Y. Li, C. Li, X. Yang, and A. L. Martel, Loss odyssey in medical image segmentation, Med. Image Anal., 102035 (2021).
  • 30 A. Swift, R. Heale, and A. Twycross, What are sensitivity and specificity?, Evid. Based Nurs. 23, 2–4 (2020).
  • 31 T. Brosch, Y. Yoo, L. Y. Tang, D. K. Li, A. Traboulsee, and R. Tam, Deep convolutional encoder networks for multiple sclerosis lesion segmentation, in Proc. MICCAI, pages 3–11, Springer, 2015.
  • 32 D. Oft et al., Volumetric Regression in Brain Metastases After Stereotactic Radiotherapy: Time Course, Predictors, and Significance, Front. Oncol. 10 (2020).
  • 33 F. Putz et al., FSRT vs. SRS in Brain Metastases—Differences in Local Control and Radiation Necrosis—A Volumetric Study, Front. Oncol. 10 (2020).
  • 34 T. Tieleman et al., Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude, COURSERA: Neural Netw. Mach. Learn. 4, 26–31 (2012).
  • 35 I. Sutskever, J. Martens, G. Dahl, and G. Hinton, On the importance of initialization and momentum in deep learning, in Proc. ICML, pages 1139–1147, PMLR, 2013.