Automated Detection of Individual Micro-calcifications from Mammograms using a Multi-stage Cascade Approach

by Zhi Lu, et al.

In mammography, the efficacy of computer-aided detection methods depends, in part, on the robust localisation of micro-calcifications (μC). Currently, the most effective methods are based on three steps: 1) detection of individual μC candidates, 2) clustering of individual μC candidates, and 3) classification of μC clusters. The second step is motivated both by the need to reduce the number of false positive detections from the first step and by the evidence that malignancy depends on a relatively large number of μC detections within a certain area. In this paper, we propose a novel approach to μC detection, consisting of the detection and classification of individual μC candidates from their shape and appearance features with a cascade of boosting classifiers. The final step in our approach then clusters the remaining individual μC candidates. The main advantage of this approach lies in its ability to reject a significant number of false positive μC candidates compared to previously proposed methods. Specifically, on the INbreast dataset, we show that our approach has a true positive rate (TPR) for individual μCs of 40% at one false positive per image (FPI) and a TPR of 80% at 10 FPI. These results are significantly more accurate than the current state of the art, which has a TPR of less than 1% at one FPI and a TPR of 10% at 10 FPI. Our results are competitive with the state of the art at the subsequent stage of detecting clusters of μCs.




1 Introduction

Breast cancer is the most diagnosed cancer amongst women worldwide, accounting for 23% of all diagnosed cancers [1]. Breast screening programs aim to detect breast cancer at its early stages, when treatment is generally more effective [2]. These programs are usually based on the analysis of mammograms, where one of the main goals is the detection of micro-calcifications (μCs), given that almost half of all breast cancers are associated with μCs [3]. As shown in Fig. 1, μCs are represented by tiny calcium deposits that are displayed as small white spots on a mammogram. Their automated localisation by computer-aided detection (CADe) methods has the potential to streamline mammogram analysis and reduce the inter-user variance in the assessment of μCs [4].

Figure 1: Appearance of a micro-calcification in a mammogram.

The current state-of-the-art methods for the automated detection of μCs consist of the following standard pipeline: S.1) detection of individual μC candidates, S.2) clustering of individual μC candidates based on their geometric distribution, and S.3) classification of those μC clusters. The fact that step S.1 usually produces a large number of false positive individual μCs, combined with the evidence that calcification malignancy is correlated with clusters of μCs [3], has motivated the inclusion of S.2 and S.3. These last two steps are able to eliminate a large number of isolated false positive μC detections, but they often fail to reject individual false positive μCs within clusters, which can potentially bias the analysis of a mammogram. In this paper, we propose a novel pipeline comprising the following steps: P.1) detection of individual μC candidates, P.2) classification of individual μC candidates, and P.3) clustering of individual μC candidates based on their geometric distribution. Compared to the standard pipeline, steps P.1 and S.1 are the same, step P.2 is new, step P.3 is the same as step S.2, and step S.3 has been removed. We have two goals with our new approach: 1) a significant reduction of the number of false positive individual μC detections, especially within true positive clusters, and 2) a competitive detection rate of μC clusters. A quantitative analysis of our approach is performed using the publicly available INBreast dataset [5], where the main results show that our method achieves a true positive rate (TPR) for individual μCs of 40% at 1 false positive detection per image (FPI) and 80% TPR at 10 FPI, which is significantly more accurate than the current state of the art [6]. We also show competitive results in terms of the detection of clusters of μCs.

2 Related Work

Automatic detection of μCs from mammograms is usually carried out with a combination of image processing and machine learning methods [7]. Image processing methods [8] rely on prior knowledge about the appearance of μCs (such as local gradient and intensity). However, this approach is unlikely to provide a robust characterisation of all variations in the appearance of μCs. Alternatively, machine learning [9, 10] aims to provide a robust characterisation of μCs from the information available in an annotated training set. In general, with a relatively large and diverse training set, machine learning methods can outperform image processing methods. However, they face some challenges: 1) the selection of an appropriate feature set to be extracted and used by the models, 2) the class imbalance problem, which yields a much larger number of negative than positive samples on which to train the model (caused by the significantly smaller area occupied by μCs compared with the area filled with normal breast tissue), and 3) the selection of the model to be used. State-of-the-art machine learning methods address these challenges with a cascade of boosting classifiers that rely on general appearance and shape features [6, 11, 7]. Therefore, we follow this strategy here.

The design of current state-of-the-art methods consisting of steps S.1-S.3 (defined in Sec. 1) is based on the clinical importance that clusters of μCs have in comparison to individual μCs [3]. Another motivation is that step S.1 tends to generate a large number of individual false positive μC detections that need to be subsequently removed. Hence, by clustering individual μCs using proximity and the number of detections within a small area, it is possible to reject a large number of isolated false positive detections [6]. However, we observed that within a cluster, false positive detections were still prevalent. This motivated us to propose a new pipeline that introduces a classification step between standard steps S.1 and S.2, which filters out individual μC candidates using their shape and appearance features with a cascade of boosting classifiers [12]. Given that this new process is effective at removing false positive μC detections, we no longer require step S.3 (cluster classification).

Figure 2: Pipeline of the proposed methodology with two examples, where the red contours on the mammograms denote the automated detections and green contours represent the manual annotation.

3 Methodology

The proposed methodology consists of an initial pre-processing step based on quantum noise equalisation [6], which is followed by three steps: P.1) detection of individual μC candidates using a pixel-based cascade of boosting classifiers [12] and Haar-like features [6]; P.2) classification of the individual μC detections (from step P.1) using a region-based cascade of boosting classifiers [12] with appearance and shape features [13]; and P.3) clustering of the individual μCs detected in step P.2. Each step is explained below, where we assume the availability of a training set $\mathcal{D} = \{(\mathbf{x}_i, \mathcal{Y}_i)\}_{i=1}^{N}$, where $\mathbf{x}_i : \Omega \rightarrow \mathbb{R}$ denotes the $i$-th mammogram ($\Omega$ represents the image lattice) and $\mathcal{Y}_i = \{\mathbf{y}_{i,j}\}_{j}$ represents the set of μC annotations for image $i$, with $\mathbf{y}_{i,j} : \Omega \rightarrow \{0, 1\}$ (i.e., each μC annotation is a binary map, where pixels of $\Omega$ labelled with 1 denote part of the $j$-th μC of image $i$).

Pre-processing: Our pre-processing is based on the quantum noise equalisation proposed by Bria et al. [6], where the source of noise fluctuations in full-field digital mammograms (FFDM) can be described by a Poisson distribution with a standard deviation that can be estimated from the image.

Step P.1: Detection of individual μC candidates: This first step consists of a pixel-based classifier [6], which estimates the likelihood that a pixel represents part of a μC given the information extracted from a fixed-size sub-window around that pixel. This classifier is represented by a cascade of boosting classifiers, where a pixel is accepted as part of a μC only if it is positively classified by all stages of the cascade. In this cascade classifier [14], the detection rate $D$ and false positive rate $F$ of a cascade with $K$ stages are computed with $D = \prod_{k=1}^{K} d_k$ and $F = \prod_{k=1}^{K} f_k$, where $d_k$ and $f_k$ represent the detection and false positive rates of stage $k$. Therefore, if every stage has the same rates $d_k = d$ and $f_k = f$, then $D = d^K$ and $F = f^K$.
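As a rough illustration of how the per-stage rates of a cascade combine, the sketch below multiplies them, following the cascade formulation of [14]. The function name and the numeric values are hypothetical and are not taken from the paper.

```python
from math import prod


def cascade_rates(stage_detection_rates, stage_false_positive_rates):
    """Overall detection and false positive rates of a cascade.

    A sample is accepted only if every stage accepts it, so the overall
    rates are the products of the per-stage rates.
    """
    if len(stage_detection_rates) != len(stage_false_positive_rates):
        raise ValueError("expected one detection rate and one false positive rate per stage")
    return prod(stage_detection_rates), prod(stage_false_positive_rates)


# Hypothetical per-stage values, chosen only for illustration.
D, F = cascade_rates([0.99] * 5, [0.5] * 5)
print(f"overall detection rate: {D:.4f}, overall false positive rate: {F:.5f}")
```

The example shows why cascades are attractive under heavy class imbalance: the false positive rate shrinks geometrically with the number of stages, while the detection rate stays high as long as each stage rarely rejects a true positive.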

The classifier at each cascade stage is trained with a set of positive samples and a set of negative samples, where each positive sample consists of a fixed-size sub-image of a training mammogram centred at a pixel position that is contained in one of the μC annotations (a negative sample is similarly defined, centred at a position that no annotation contains). The main issue in training such a classifier is the fact that the negative samples vastly outnumber the positive ones, and this is solved by under-sampling the negative set, such that the proportion between negative and positive samples is constant over the training of each cascade stage. The classifier utilised in this work is RUSBoost [12], which is designed to deal with such class imbalance through this under-sampling procedure. Finally, the feature set consists of Haar-like features [14], which are efficiently computed using the integral image [14] (note that we use a smaller set of features than the original set from [6], as this smaller set is faster to train and we did not notice a significant difference in the results). The final part of this step consists of finding the connected components of the pixel-based classification to form the μC candidates, where connected components whose width and length both exceed a size threshold are removed because they represent macro-calcifications that are not to be processed further [6]. Step P.1 is defined by:


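$$\{(\mathbf{c}_m, \mathbf{b}_m)\}_{m=1}^{M} = f_{\text{P.1}}(\mathbf{x}; \theta_1), \qquad (1)$$

where $f_{\text{P.1}}$ is the detector of step P.1 applied to the mammogram $\mathbf{x}$, $\mathbf{c}_m$ denotes a binary map of the $m$-th μC candidate, $\theta_1$ is the classifier parameter set, and $\mathbf{b}_m$ represents the top-left and bottom-right corner coordinates of the bounding box of this detection.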
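The last part of step P.1, which turns the pixel-level classification into candidates, can be sketched as follows. This is a minimal illustration under stated assumptions: the 8-connected neighbourhood, the function name `extract_candidates` and the pixel-based limit `max_side_px` are hypothetical choices, not details given in the paper.

```python
from collections import deque


def extract_candidates(mask, max_side_px):
    """Group positively classified pixels into candidates.

    mask: list of rows of 0/1 values (pixel-level classification).
    max_side_px: hypothetical size limit in pixels; components whose width
    and height both exceed it are discarded as macro-calcifications.
    Returns a list of (pixels, bounding_box), with bounding_box given as
    (top, left, bottom, right), all inclusive.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    candidates = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first flood fill over the 8-connected neighbourhood.
            seen[r0][c0] = True
            queue, pixels = deque([(r0, c0)]), []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < rows and 0 <= cc < cols
                                and mask[rr][cc] and not seen[rr][cc]):
                            seen[rr][cc] = True
                            queue.append((rr, cc))
            top = min(p[0] for p in pixels)
            bottom = max(p[0] for p in pixels)
            left = min(p[1] for p in pixels)
            right = max(p[1] for p in pixels)
            height, width = bottom - top + 1, right - left + 1
            if width > max_side_px and height > max_side_px:
                continue  # too large: treated as a macro-calcification
            candidates.append((pixels, (top, left, bottom, right)))
    return candidates


mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
for pixels, box in extract_candidates(mask, max_side_px=2):
    print(len(pixels), box)
```

Each returned pair mirrors the two outputs described for this step: the set of pixels plays the role of the candidate's binary map, and the tuple gives the corners of its bounding box.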

Step P.2: Classification of individual μC detections with shape and appearance features: The contribution of this paper consists of this individual μC classification step, where we extract a large set of shape and appearance features [13] from each μC candidate in (1), and use a second cascade of RUSBoost [12] classifiers to further eliminate false positive μC detections. These features are extracted with:


$$\mathbf{z}_m = \phi(\mathbf{c}_m, \mathbf{b}_m, \mathbf{x}), \qquad (2)$$

where $\mathbf{c}_m$ is the binary map of the $m$-th μC candidate, $\mathbf{b}_m$ is its bounding box and $\mathbf{x}$ is the mammogram. A set of 11 shape features is calculated from $\mathbf{c}_m$ in (2), describing geometric information such as area, perimeter, ratio of perimeter to area, rectangularity and circularity. Another set of 27 appearance features in (2) is calculated from the sub-image of $\mathbf{x}$ limited by the bounding box $\mathbf{b}_m$, consisting of information (energy, correlation, entropy, inertia, etc.) extracted from the spatial grey level dependence (SGLD) matrix [13, 15]. In addition, we compute the Haar-like features of step P.1 and the local binary pattern (LBP) [16] from the sub-image limited by the bounding box $\mathbf{b}_m$. Step P.2 is defined by:


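$$\{(\mathbf{c}_m, \mathbf{b}_m)\}_{m \in \mathcal{S}} = f_{\text{P.2}}(\{\mathbf{z}_m\}_{m=1}^{M}; \theta_2), \qquad (3)$$

which selects a subset of the detections from step P.1, with $\mathcal{S} \subseteq \{1, \ldots, M\}$, where $\mathbf{z}_m$ is the feature vector of the $m$-th candidate and $\theta_2$ is the parameter set of the classifier.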
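To make the shape descriptors of step P.2 more concrete, the sketch below computes five of the listed quantities from the pixels of one candidate. The exact definitions used in the paper are not given here, so the formulas are assumptions: perimeter is counted as the number of exposed pixel edges, rectangularity as area over bounding-box area, and circularity as 4πA/P². The function name is hypothetical.

```python
import math


def shape_features(pixels):
    """Simple shape descriptors for one candidate region.

    pixels: iterable of (row, col) coordinates belonging to the region.
    """
    region = set(pixels)
    if not region:
        raise ValueError("the region must contain at least one pixel")
    area = len(region)
    # Perimeter: number of pixel edges that face a pixel outside the region.
    perimeter = sum(
        (r + dr, c + dc) not in region
        for r, c in region
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
    )
    rows = [r for r, _ in region]
    cols = [c for _, c in region]
    bbox_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
    return {
        "area": area,
        "perimeter": perimeter,
        "perimeter_to_area": perimeter / area,
        "rectangularity": area / bbox_area,
        "circularity": 4.0 * math.pi * area / perimeter ** 2,
    }


# A 2x2 square region, used only as a toy input.
print(shape_features([(0, 0), (0, 1), (1, 0), (1, 1)]))
```

In a full pipeline these values would be concatenated with the appearance features before being passed to the second classifier.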

Step P.3: Clustering of individual μC detections: The clustering of the μC detections from step P.2 is based on the following algorithm [6]: 1) construction of a weighted graph formed by nodes represented by the centroids of the detected μCs, and edges that connect nodes that are closer than a fixed distance threshold; and 2) estimation of clusters from the connected components of this graph, where clusters with fewer than 3 μCs are rejected. Step P.3 is defined by:


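$$\mathcal{G} = f_{\text{P.3}}(\{(\mathbf{c}_m, \mathbf{b}_m)\}_{m \in \mathcal{S}}; \theta_3), \qquad (4)$$

where $\theta_3$ is the parameter set of the classifier, and $\mathcal{G}$ represents the set of clusters, where each element of this set is formed by a graph computed from a subset of the detections $\{(\mathbf{c}_m, \mathbf{b}_m)\}_{m \in \mathcal{S}}$ from step P.2.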
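The two clustering stages described for step P.3 can be sketched with a union-find structure: detections whose centroids lie closer than a distance threshold are linked, and the connected components with at least three members are kept. The function name and the threshold value in the example are hypothetical, and edge weights are ignored because only connectivity matters for the connected components.

```python
import math


def cluster_detections(centroids_mm, max_distance_mm, min_size=3):
    """Cluster detections by graph connectivity.

    centroids_mm: list of (x, y) centroid coordinates in millimetres.
    max_distance_mm: hypothetical threshold; two detections are linked
    when their centroids are strictly closer than this distance.
    Returns a list of clusters, each a list of detection indices.
    """
    n = len(centroids_mm)
    parent = list(range(n))

    def find(i):
        # Find the representative of i, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(centroids_mm[i], centroids_mm[j]) < max_distance_mm:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Clusters with fewer than min_size detections are rejected.
    return [members for members in groups.values() if len(members) >= min_size]


centroids = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
print(cluster_detections(centroids, max_distance_mm=1.5))
```

In this toy input the first three detections form a chain and survive as one cluster, while the isolated pair is rejected for having fewer than three members.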

Figure 3: FROC curves of the individual μC detections for our methodology (red) and the baseline (blue) [6].
Figure 4: FROC curves of the μC cluster detections for our methodology (red) and the baseline (blue) [6].
Figure 5: ROC curves of the case-based detections of clusters of μCs for our methodology (red) and the baseline (blue) [6].

4 Materials and Methods

The experiments use the INBreast dataset [5], which contains 115 cases with 410 images, where 19 cases have no findings, 68 cases have benign findings and 28 cases have malignant findings (note that findings include μCs and masses), and where 6,880 individual μCs have been identified by two radiologists. The experiments are performed using this dataset for the following reasons: it is a publicly available full-field digital dataset (allowing direct comparison with other methods) in which the individual manual μC annotations are both precise and reliable. In order to evaluate the detection of μC clusters, we produce the annotation of μC clusters by applying step P.3 of Sec. 3 to the individual manual μC annotations. We perform a quantitative evaluation of the detection of individual μCs and of clusters of μCs by randomly dividing the 115 INBreast cases into five cross-validation folds with 60% of cases for training, 20% for validation and 20% for testing. We show the mean and standard deviation of performance on both the training and test sets (note that we interpolate the ROC and FROC curves for each fold at fixed FPR values in order to plot the mean and standard deviation error bars).
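The per-fold interpolation mentioned above can be sketched as follows: each fold's curve is resampled on a shared grid of false positive values, and the mean and standard deviation are then taken across folds at each grid point. This is only an illustration; the function names, the clamping at the ends of each curve and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def interpolate(x, xs, ys):
    """Piecewise-linear interpolation of (xs, ys) at x.

    xs must be strictly increasing; values outside the range are clamped
    to the first or last point of the curve.
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for k in range(len(xs) - 1):
        if xs[k] <= x <= xs[k + 1]:
            fraction = (x - xs[k]) / (xs[k + 1] - xs[k])
            return ys[k] + fraction * (ys[k + 1] - ys[k])


def summarise_folds(fold_curves, grid):
    """Return (grid value, mean, standard deviation) across folds."""
    summary = []
    for x in grid:
        values = [interpolate(x, xs, ys) for xs, ys in fold_curves]
        summary.append((x, mean(values), pstdev(values)))
    return summary


# Two hypothetical folds, each given as (false positive values, TPR values).
folds = [([0.0, 2.0], [0.0, 0.4]), ([0.0, 1.0, 2.0], [0.0, 0.3, 0.5])]
for point in summarise_folds(folds, grid=[0.5, 1.0, 2.0]):
    print(point)
```

Resampling on a common grid is what makes the folds comparable, since each fold produces operating points at different false positive values.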

The training of the classifier in step P.1 (Sec. 3) uses fixed-size square sub-images whose side is approximately the maximum size of the μCs of interest [6]. For step P.1, we train five cascade stages, where the ratio between the sizes of the negative and positive sets (step P.1 of Sec. 3) is fixed to the same value for all these stages. The RUSBoost classifier has 2, 3, 5, 12, and 40 weak classifiers [6] in the respective stages, with the per-stage detection rate and false positive rate set as in [6]. For step P.2, the connected components forming the μC candidates are also resized to fixed-size patches using bicubic interpolation, from which the shape and appearance features are calculated. The classifier in step P.2 is a single RUSBoost classifier with 1000 weak learners. Model selection was performed on the validation set extracted from the training set of each cross-validation fold. We compare our method with the baseline approach by Bria et al. [6], which consists of the same pre-processing, followed by steps P.1 and P.3, and a final step that classifies the clusters (this baseline is the standard pipeline described in Sec. 1). The implementation of this cluster classification is based on a cascade of RUSBoost [12] classifiers that use the mean, standard deviation, minimum and maximum values of 35 shape and appearance features from individual μCs and 5 topological cluster features (i.e., a total of 4 × 35 + 5 = 145 features).
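The cluster-level feature vector of the baseline can be sketched as follows: four summary statistics are computed for each per-detection feature and the topological features are appended. The function name is hypothetical, the input values are placeholders, and the choice of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def cluster_feature_vector(per_detection_features, topological_features):
    """Summarise the detections of one cluster into a single vector.

    per_detection_features: one equally long feature list per detection.
    topological_features: cluster-level features appended unchanged.
    """
    vector = []
    # zip(*rows) iterates over the features column by column.
    for column in zip(*per_detection_features):
        vector.extend([mean(column), pstdev(column), min(column), max(column)])
    return vector + list(topological_features)


# Placeholder inputs: 3 detections with 35 features each, 5 topological features.
detections = [[float(d + f) for f in range(35)] for d in range(3)]
vector = cluster_feature_vector(detections, [0.0] * 5)
print(len(vector))  # 4 * 35 + 5 = 145
```

Summarising with fixed statistics gives every cluster a vector of the same length, regardless of how many detections it contains.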

The quantitative evaluation of the detection of individual μCs and of clusters of μCs is based on the free-response receiver operating characteristic (FROC) curve, which measures the true positive rate (TPR) against the number of false positives per image (FPI). An individual μC detection is considered a true positive if its overlap with one of the manual annotations reaches a fixed threshold, and a false positive if its overlap with every manual annotation falls below that threshold. Similarly, a detected cluster of μCs is regarded as a true positive if it overlaps with a manually annotated cluster and the two have at least two individual μCs in common [17, 6]. Finally, the case-based ROC curve evaluates the performance of the method in terms of finding μC clusters independently of whether they are in the correct location. Here, a true positive is defined as a cluster detection in a case that has at least one manually annotated cluster of μCs, and a false positive is a detection in a case that has no manually annotated cluster. Fig. 3 shows the FROC curves of individual μC detections on the training and testing sets for our methodology (red) and the baseline (blue) [6] (note that we crop the FPI axis, but the TPR continues to increase for all curves). In Fig. 4, we show the results of our methodology on the detection of clusters of μCs (red) and the baseline result (blue) [6]. Similarly, Fig. 5 shows the ROC curves of the case-based detections of clusters of μCs on the training and testing sets for our methodology (red) and the baseline (blue) [6]. Finally, we show some visual examples of the final detections of our method in Fig. 6.
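One operating point of the FROC analysis for individual detections can be sketched as below. The overlap measure is an assumption (intersection over union of pixel sets), as are the threshold value in the example and the function names; the paper's exact overlap definition is not reproduced here.

```python
def overlap(a, b):
    """Intersection over union of two pixel sets (assumed overlap measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def froc_point(images, min_overlap):
    """Compute (TPR, FPI) over a list of images.

    images: list of (detections, annotations) pairs, where each detection
    and each annotation is a collection of (row, col) pixels.
    min_overlap: hypothetical overlap threshold for a true positive.
    """
    found, total_annotations, false_positives = 0, 0, 0
    for detections, annotations in images:
        matched = set()
        for detection in detections:
            hits = [i for i, annotation in enumerate(annotations)
                    if overlap(detection, annotation) >= min_overlap]
            if hits:
                matched.update(hits)
            else:
                false_positives += 1
        found += len(matched)
        total_annotations += len(annotations)
    tpr = found / total_annotations if total_annotations else 0.0
    fpi = false_positives / len(images) if images else 0.0
    return tpr, fpi


images = [
    ([{(0, 0), (0, 1)}, {(5, 5)}], [{(0, 0), (0, 1), (1, 0)}]),
    ([], [{(3, 3)}]),
]
print(froc_point(images, min_overlap=0.5))
```

Sweeping the classifier's decision threshold and recomputing this point for each setting traces out the full FROC curve.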

Figure 6: Examples of μC detection produced by our method (green = manual annotation, red = automated detection).

5 Discussion and Conclusions

The results in Fig. 3 show that our approach is significantly more effective at the detection of individual μCs than the baseline [6]. It is also interesting to note from Fig. 4 and Fig. 5 that our approach is competitive with the baseline in terms of cluster detection and case-based performance (note that the results in those figures agree with the published results by Bria et al. [6], even though we use a different dataset). This apparent discrepancy between the large improvement in individual μC detection and the comparable cluster-level performance is explained by the large number of individual false positive μC detections that are preserved within true positive clusters of μC detections by the baseline approach [6]. Our method is able to eliminate a significant number of these false positives and thus provide a more reliable result on which to perform further assessment of the mammogram. Finally, Fig. 6(a) shows that our proposed methodology is robust to normal mammograms, while Fig. 6(b-d) displays visually accurate detection of individual μCs as well as clusters of μCs.

In this paper we propose a new μC detection pipeline that introduces a step that effectively filters out individual false positive μC detections using shape and appearance features in a cascade of boosting classifiers. We empirically show that our method is significantly more effective at detecting individual μCs than the current state-of-the-art approach [6]. This has the potential to improve mammogram analysis in breast screening programs.


  • [1] Ahmedin Jemal, Rebecca Siegel, Elizabeth Ward, Yongping Hao, Jiaquan Xu, Taylor Murray, and Michael J Thun, “Cancer statistics, 2008,” CA: a cancer journal for clinicians, vol. 58, no. 2, pp. 71–96, 2008.
  • [2] Edward A Sickles, “Breast cancer screening outcomes in women ages 40-49: clinical experience with service screening using modern mammography,” Journal of the National Cancer Institute. Monographs, no. 22, pp. 99–104, 1996.
  • [3] Ulrich Bick, “Mammography: How to interpret microcalcifications,” in Diseases of the Abdomen and Pelvis 2014–2017, pp. 313–318. Springer, 2014.
  • [4] Maryellen L Giger, Nico Karssemeijer, and Julia A Schnabel, “Breast image analysis for risk assessment, detection, diagnosis, and treatment of cancer,” Annual review of biomedical engineering, vol. 15, pp. 327–357, 2013.
  • [5] Inês C Moreira, Igor Amaral, Inês Domingues, António Cardoso, Maria João Cardoso, and Jaime S Cardoso, “Inbreast: toward a full-field digital mammographic database,” Academic radiology, vol. 19, no. 2, pp. 236–248, 2012.
  • [6] Alessandro Bria, Nico Karssemeijer, and Francesco Tortorella, “Learning from unbalanced data: a cascade-based approach for detecting clustered microcalcifications,” Medical image analysis, vol. 18, no. 2, pp. 241–252, 2014.
  • [7] Seung Yeon Shin, Soochahn Lee, Il Dong Yun, Ho Yub Jung, Yong Seok Heo, Sun Mi Kim, and Kyoung Mu Lee, “A novel cascade classifier for automatic microcalcification detection,” PloS one, vol. 10, no. 12, pp. e0143725, 2015.
  • [8] Athanasios Papadopoulos, Dimitrios I Fotiadis, and Lena Costaridou, “Improvement of microcalcification cluster detection in mammography utilizing image enhancement techniques,” Computers in biology and medicine, vol. 38, no. 10, pp. 1045–1055, 2008.
  • [9] Issam El-Naqa, Yongyi Yang, Miles N Wernick, Nikolas P Galatsanos, and Robert M Nishikawa, “A support vector machine approach for detection of microcalcifications,” IEEE transactions on medical imaging, vol. 21, no. 12, pp. 1552–1563, 2002.
  • [10] Liyang Wei, Yongyi Yang, Robert M Nishikawa, Miles N Wernick, and Alexandra Edwards, “Relevance vector machine for automatic detection of clustered microcalcifications,” IEEE transactions on medical imaging, vol. 24, no. 10, pp. 1278–1285, 2005.
  • [11] Arnau Oliver, Albert Torrent, Xavier Lladó, Meritxell Tortajada, Lidia Tortajada, Melcior Sentís, Jordi Freixenet, and Reyer Zwiggelaar, “Automatic microcalcification and cluster detection for digital and digitised mammograms,” Knowledge-Based Systems, vol. 28, pp. 68–75, 2012.
  • [12] Chris Seiffert, Taghi M Khoshgoftaar, Jason Van Hulse, and Amri Napolitano, “Rusboost: improving classification performance when training data is skewed,” in Pattern Recognition, 2008. ICPR 2008. 19th International Conference on. IEEE, 2008, pp. 1–4.
  • [13] C Varela, S Timp, and N Karssemeijer, “Use of border information in the classification of mammographic masses,” Physics in Medicine and Biology, vol. 51, no. 2, pp. 425, 2006.
  • [14] Paul Viola and Michael Jones, “Rapid object detection using a boosted cascade of simple features,” in Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on. IEEE, 2001, vol. 1, pp. I–511.
  • [15] Neeraj Dhungel, Gustavo Carneiro, and Andrew P Bradley, “Automated mass detection in mammograms using cascaded deep learning and random forests,” in Digital Image Computing: Techniques and Applications (DICTA), 2015 International Conference on. IEEE, 2015, pp. 1–8.
  • [16] Timo Ojala, Matti Pietikainen, and David Harwood, “Performance evaluation of texture measures with classification based on kullback discrimination of distributions,” in Pattern Recognition, 1994. Vol. 1-Conference A: Computer Vision & Image Processing., Proceedings of the 12th IAPR International Conference on. IEEE, 1994, vol. 1, pp. 582–585.
  • [17] Hao Jing, Yongyi Yang, and Robert M Nishikawa, “Detection of clustered microcalcifications using spatial point process modeling,” Physics in medicine and biology, vol. 56, no. 1, pp. 1, 2010.