Interpretable deep learning regression for breast density estimation on MRI

12/08/2020 · by Bas H. M. van der Velden, et al.

Breast density, which is the ratio between fibroglandular tissue (FGT) and total breast volume, can be assessed qualitatively by radiologists and quantitatively by computer algorithms. These algorithms often rely on segmentation of breast and FGT volume. In this study, we propose a method to directly assess breast density on MRI, and provide interpretations of these assessments. We assessed breast density in 506 patients with breast cancer using a regression convolutional neural network (CNN). The input for the CNN consisted of slices of breast MRI of 128 × 128 voxels, and the output was a continuous density value between 0 (fatty breast) and 1 (dense breast). We used 350 patients to train the CNN, 75 for validation, and 81 for independent testing. We investigated why the CNN came to its predicted density using Deep SHapley Additive exPlanations (SHAP). The density predicted by the CNN on the testing set was significantly correlated with the ground truth densities (N = 81 patients, Spearman's ρ = 0.86, P < 0.001). When inspecting what the CNN based its predictions on, we found that voxels in FGT commonly had positive SHAP-values, voxels in fatty tissue commonly had negative SHAP-values, and voxels in non-breast tissue commonly had SHAP-values near zero. This means that the prediction of density is based on the structures we expect it to be based on, namely FGT and fatty tissue. To conclude, we presented an interpretable deep learning regression method for breast density estimation on MRI with promising results.




1 Introduction

Breast density – the ratio between fibroglandular tissue (FGT) and total breast volume – is an important risk factor for breast cancer [1, 2, 3]. This density can be assessed on, e.g., mammography and magnetic resonance imaging (MRI), and is typically scored by a radiologist as one of four incremental categories [4].

Computer algorithms have provided quantitative assessment of breast density on MRI. These algorithms often rely on identification of the breast volume by removing pectoral muscle and air, followed by segmentation of the FGT [5, 6, 7]. More recently, deep learning has been proposed for FGT segmentation [8, 9].

Deep learning could also be used to directly assess breast density without segmentation steps. In this case, it would, however, be desirable to interpret on what basis the algorithm gave its result. In this study, we propose a method to directly assess density and provide interpretations of these assessments.

2 Materials and methods

We used a regression convolutional neural network (CNN) to assess breast density, and examined why the CNN came to its result using Deep SHapley Additive exPlanations (SHAP) [10]. The following paragraphs describe this in more detail.

2.1 Patients

We consecutively included 506 patients with early-stage unilateral invasive breast cancer. These patients received a preoperative T1-weighted MRI with imaging parameters: repetition time 8.1 ms, echo time 4.0 ms, flip angle 20°, and an isotropic voxel size of 1.35 × 1.35 × 1.35 mm³.

2.2 Data preparation and ground truth creation

For each patient, we removed field inhomogeneities using N4 bias field correction [11]. We normalized the MR image between zero and one based on the 2.5th and 97.5th intensity percentiles and clipped intensities outside that range. We extracted a slab of 20 sagittal slices around the center of each breast – yielding 20,240 breast slices in total – and resized them to 128 × 128 voxels.
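As an illustration, the percentile-based normalization step could be implemented as the following minimal NumPy sketch (the N4 correction itself is available in, e.g., SimpleITK):

```python
import numpy as np

def normalize_slice(img: np.ndarray) -> np.ndarray:
    """Rescale intensities to [0, 1] based on the 2.5th and 97.5th
    intensity percentiles, clipping values outside that range."""
    lo, hi = np.percentile(img, [2.5, 97.5])
    clipped = np.clip(img, lo, hi)
    return (clipped - lo) / (hi - lo)
```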

We based the ground truth on previously generated breast and FGT segmentations [12]. These segmentations were manually checked in previous studies [12, 13, 14, 15, 16]. For each slice, we defined the density as the number of FGT voxels divided by the number of breast voxels (Figure 1).

Figure 1: Example of a sagittal T1-weighted MR image slice (left) with corresponding breast segmentation (middle) and fibroglandular tissue (FGT) segmentation (right). The ground truth was created by dividing the number of voxels in the FGT segmentation by the number of voxels in the breast segmentation for each slice.
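In code, this per-slice ground truth is simply a voxel-count ratio; a minimal sketch, assuming the segmentations are available as binary masks:

```python
import numpy as np

def slice_density(breast_mask: np.ndarray, fgt_mask: np.ndarray) -> float:
    """Per-slice ground truth: number of FGT voxels divided by the
    number of breast voxels (0.0 if the slice contains no breast)."""
    n_breast = np.count_nonzero(breast_mask)
    n_fgt = np.count_nonzero(fgt_mask)
    return n_fgt / n_breast if n_breast else 0.0
```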

2.3 Regression CNN

We used a regression convolutional neural network (CNN) [17] to estimate the density per slice. This CNN consisted of five convolution layers, each with a 3 × 3 kernel size, a 2 × 2 stride, a rectified linear unit activation, 50% dropout, and batch normalization. These five layers were followed by two densely connected layers and an output node with a linear activation. We used the mean absolute percentage error as loss and an Adam optimizer [18] with a learning rate of 0.001.
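This architecture could be sketched in Keras as follows; note that the number of filters and dense units are assumptions for illustration, as the layer widths are not reported here:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_density_cnn(n_filters=32, n_dense=64):
    """Regression CNN: five strided conv blocks, two dense layers,
    and a linear output node. n_filters and n_dense are illustrative
    assumptions, not reported values."""
    model = models.Sequential()
    model.add(tf.keras.Input(shape=(128, 128, 1)))
    for _ in range(5):
        model.add(layers.Conv2D(n_filters, kernel_size=3, strides=2,
                                activation='relu'))
        model.add(layers.Dropout(0.5))
        model.add(layers.BatchNormalization())
    model.add(layers.Flatten())
    model.add(layers.Dense(n_dense, activation='relu'))
    model.add(layers.Dense(n_dense, activation='relu'))
    model.add(layers.Dense(1, activation='linear'))  # continuous density value
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss='mean_absolute_percentage_error')
    return model
```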


We split the slices at patient level: 14,000 slices corresponding to 350 patients were used for training the CNN (100 epochs, mini-batches of 100 slices), 3,000 slices (75 different patients) for validation, and 3,240 slices (81 different patients) for independent testing. For each slice in the testing set, the CNN returned a density-value between 0 (fatty breast) and 1 (dense breast). The correlation of these density-values with the ground truth density was assessed using Spearman's ρ.
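
The patient-level split ensures that no patient contributes slices to more than one set; a minimal sketch (patient identifiers and their ordering are hypothetical):

```python
import numpy as np

def split_by_patient(patient_per_slice, n_train=350, n_val=75, seed=0):
    """Assign each slice to train/val/test such that all slices of a
    patient end up in the same set (no patient-level leakage)."""
    rng = np.random.default_rng(seed)
    patients = rng.permutation(np.unique(patient_per_slice))
    train_p = set(patients[:n_train])
    val_p = set(patients[n_train:n_train + n_val])
    return np.array(['train' if p in train_p
                     else 'val' if p in val_p
                     else 'test'
                     for p in patient_per_slice])
```

The correlation between predicted and ground truth densities can then be computed with `scipy.stats.spearmanr`.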


2.4 Interpretation of CNN results

We used Deep SHAP for interpretation of the CNN results [10]. Deep SHAP is a combination of DeepLIFT and SHapley Additive exPlanations [19, 10]. As the background signal needed for the Deep SHAP analysis, we randomly sampled 100 training slices.

For each slice, Deep SHAP yields a map of SHAP-values. Each pixel in this SHAP-value map represents the contribution of that pixel to the final prediction. Hence, a higher absolute value corresponds to the pixel being more important for the prediction.

3 Results

3.1 Regression CNN

The density-values predicted by the CNN on the testing set were significantly correlated with the ground truth densities (N = 81 patients, Spearman's ρ = 0.86, P < 0.001).

Figure 2: The density-values predicted by the regression convolutional neural network in the testing set (N = 81 patients) show a significant correlation with the ground truth densities (Spearman's ρ = 0.86, P < 0.001).

3.2 Interpretation of CNN results

Inspection of the SHAP-value maps shows that in slices where the density predicted by the CNN matched the ground truth density, positive SHAP-values commonly occur in the glandular tissue, while negative SHAP-values occur in the fatty tissue (Figure 3). Voxels in the air, heart, or pectoral muscle are mostly ignored (Figure 3).

In slices where the predicted density deviated from the ground truth density, the SHAP-value maps can visualize where this deviation originated. For example, in Figure 4B, the SHAP-value map shows where the density was overestimated in a patient with extremely dense breasts.

Figure 3: Four examples where the CNN correctly predicted the density of the slice: all these predictions were within 0.01 difference from the ground truth. Of each image pair, the left image is the slice the density was predicted on, the right image is a SHapley Additive exPlanations (SHAP)-map. The slice is plotted underneath with an opacity of 50% to show anatomical information. Positive SHAP-values (red) occur in the fibroglandular tissue, while negative SHAP-values occur in the fatty tissue. Areas of the slice that do not influence density – such as air and the pectoral muscle – indeed have SHAP-values near zero.
Figure 4: Two examples where the CNN overestimated the density of the slice. In patient A, the predicted density was 0.17 and the ground truth density 0.08; in patient B, the predicted density was 0.55 and the ground truth density 0.45. These errors could be due to differences in anatomy with respect to the training set. Interpretation using the SHAP-maps shows uncertainty around the breast in patient A – who had relatively small breasts – and in the pectoral muscle in patient B – who had extremely dense breasts. The images are formatted identically to Figure 3.

4 Discussion

We presented a combination of deep learning regression and an interpretation method of this regression for density assessment of breast MRI. The regression method was significantly correlated with the ground truth density.

The interpretation method supported the predictions of the CNN regression by identifying which regions of the image were important. Positive SHapley Additive exPlanations (SHAP)-values commonly occurred in the fibroglandular tissue (FGT), while negative SHAP-values occurred in the fatty tissue. This is as expected: more FGT in a breast means a higher density, while the same amount of FGT in a larger breast means a lower density.

Our regression method could be used as a stand-alone solution for density assessment. It does not need intermediate steps such as segmentation to assess breast density. If radiologists want to know why the method came to its result, they can inspect the interpretation using the SHAP-values. The method could in principle also be used to confirm other density assessment methods.

Our method did not always coincide with the ground truth. This mainly occurred in patients who had variations in anatomy that were not common in the training set. Future work could mitigate this by using more data.

5 New or Breakthrough Work

We presented an interpretable deep learning regression method for breast density estimation on MRI with promising results.


This work was funded by the Dutch Cancer Society (KWF), grant number 10755.


  • [1] Wolfe, J. N., “Breast patterns as an index of risk for developing breast cancer,” American Journal of Roentgenology 126(6), 1130–1137 (1976).
  • [2] McCormack, V. A. and dos Santos Silva, I., “Breast density and parenchymal patterns as markers of breast cancer risk: a meta-analysis,” Cancer Epidemiology and Prevention Biomarkers 15(6), 1159–1169 (2006).
  • [3] Boyd, N. F., Guo, H., Martin, L. J., Sun, L., Stone, J., Fishell, E., Jong, R. A., Hislop, G., Chiarelli, A., Minkin, S., et al., “Mammographic density and the risk and detection of breast cancer,” New England Journal of Medicine 356(3), 227–236 (2007).
  • [4] Morris, E., Comstock, C., Lee, C., Lehman, C., Ikeda, D., Newstead, G., et al., “ACR BI-RADS® atlas, breast imaging reporting and data system,” Reston, VA: American College of Radiology , 56–71 (2013).
  • [5] Wei, J., Chan, H.-P., Helvie, M. A., Roubidoux, M. A., Sahiner, B., Hadjiiski, L. M., Zhou, C., Paquerault, S., Chenevert, T., and Goodsitt, M. M., “Correlation between mammographic density and volumetric fibroglandular tissue estimated on breast MR images,” Medical Physics 31(4), 933–942 (2004).
  • [6] Nie, K., Chen, J.-H., Chan, S., Chau, M.-K. I., Yu, H. J., Bahri, S., Tseng, T., Nalcioglu, O., and Su, M.-Y., “Development of a quantitative method for analysis of breast density based on three-dimensional breast MRI,” Medical Physics 35(12), 5253–5262 (2008).
  • [7] Wu, S., Weinstein, S. P., Conant, E. F., and Kontos, D., “Automated fibroglandular tissue segmentation and volumetric density estimation in breast MRI using an atlas-aided fuzzy C-means method,” Medical Physics 40(12), 122302 (2013).
  • [8] Dalmış, M. U., Litjens, G., Holland, K., Setio, A., Mann, R., Karssemeijer, N., and Gubern-Mérida, A., “Using deep learning to segment breast and fibroglandular tissue in MRI volumes,” Medical Physics 44(2), 533–546 (2017).
  • [9] Ivanovska, T., Jentschke, T. G., Daboul, A., Hegenscheid, K., Völzke, H., and Wörgötter, F., “A deep learning framework for efficient analysis of breast volume and fibroglandular tissue using MR data with strong artifacts,” International Journal of Computer Assisted Radiology and Surgery , 1–7 (2019).
  • [10] Lundberg, S. M. and Lee, S.-I., “A unified approach to interpreting model predictions,” in [Advances in Neural Information Processing Systems ], 4765–4774 (2017).
  • [11] Tustison, N. J., Avants, B. B., Cook, P. A., Zheng, Y., Egan, A., Yushkevich, P. A., and Gee, J. C., “N4ITK: improved N3 bias correction,” IEEE transactions on medical imaging 29(6), 1310 (2010).
  • [12] Van der Velden, B. H., Dmitriev, I., Loo, C. E., Pijnappel, R. M., and Gilhuijs, K. G., “Association between parenchymal enhancement of the contralateral breast in dynamic contrast-enhanced MR imaging and outcome of patients with unilateral invasive breast cancer,” Radiology 276(3), 675–685 (2015).
  • [13] Knuttel, F. M., Van der Velden, B. H., Loo, C. E., Elias, S. G., Wesseling, J., van den Bosch, M. A., and Gilhuijs, K. G., “Prediction model for extensive ductal carcinoma in situ around early-stage invasive breast cancer,” Investigative Radiology 51(7), 462–468 (2016).
  • [14] Van der Velden, B. H., Elias, S. G., Bismeijer, T., Loo, C. E., Viergever, M. A., Wessels, L. F., and Gilhuijs, K. G., “Complementary value of contralateral parenchymal enhancement on dce-mri to prognostic models and molecular assays in high-risk er+/her2- breast cancer,” Clinical Cancer Research 23(21), 6505–6515 (2017).
  • [15] Van der Velden, B. H., Sutton, E. J., Carbonaro, L. A., Pijnappel, R. M., Morris, E. A., and Gilhuijs, K. G., “Contralateral parenchymal enhancement on dynamic contrast-enhanced MRI reproduces as a biomarker of survival in ER-positive/HER2-negative breast cancer patients,” European Radiology 28(11), 4705–4716 (2018).
  • [16] Van der Velden, B. H., Bismeijer, T., Canisius, S., Loo, C. E., Lips, E. H., Wesseling, J., Viergever, M. A., Wessels, L. F., and Gilhuijs, K. G., “Are contralateral parenchymal enhancement on dynamic contrast-enhanced MRI and genomic ER-pathway activity in ER-positive/HER2-negative breast cancer related?,” European Journal of Radiology 121, 108705 (2019).
  • [17] De Vos, B. D., Viergever, M. A., De Jong, P. A., and Išgum, I., “Automatic slice identification in 3D medical images with a ConvNet regressor,” in [Deep Learning and Data Labeling for Medical Applications ], 161–169, Springer (2016).
  • [18] Kingma, D. P. and Ba, J., “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980 (2014).
  • [19] Shrikumar, A., Greenside, P., and Kundaje, A., “Learning important features through propagating activation differences,” in [Proceedings of the 34th International Conference on Machine Learning - Volume 70], 3145–3153, JMLR.org (2017).