Dual Convolutional Neural Networks for Breast Mass Segmentation and Diagnosis in Mammography

08/07/2020 ∙ by Heyi Li, et al.

Deep convolutional neural networks (CNNs) have emerged as a new paradigm for mammogram diagnosis. Contemporary CNN-based computer-aided diagnosis (CAD) systems for breast cancer directly extract latent features from the input mammogram image and ignore the importance of morphological features. In this paper, we introduce a novel deep learning framework for mammogram image processing, which computes mass segmentation and simultaneously predicts diagnosis results. Specifically, our method is constructed as a dual-path architecture that solves the mapping in a dual-problem manner, with additional consideration of important shape and boundary knowledge. One path, called the Locality Preserving Learner (LPL), is devoted to hierarchically extracting and exploiting intrinsic features of the input, whereas the other path, called the Conditional Graph Learner (CGL), focuses on generating geometrical features by modeling pixel-wise image-to-mask correlations. By integrating the two learners, both semantics and structure are well preserved, and the two learning paths complement each other, improving mass segmentation and cancer classification at the same time. We evaluated our method on the two most widely used public mammography datasets, DDSM and INbreast. Experimental results show that DualCoreNet achieves the best mammography segmentation and classification performance simultaneously, outperforming recent state-of-the-art models.




I Introduction

Breast cancer, according to the International Agency for Research on Cancer [5], is the most frequently diagnosed cancer among women. Screening mammography is widely employed and has proven especially valuable for detecting invasive breast tumours while they are still too small to be palpable or to cause symptoms. Manual inspection of a mammogram typically requires identifying a lesion as either benign or malignant, and sometimes delineating it. However, manual inspection is tedious, subjective, and prone to errors [42, 46, 49]. To improve care, mammographic computer-aided diagnosis (CAD) systems are designed as an alternative to a second (double) reader, aiming to achieve inspection results similar to those of human experts (Fig. 1).

(a) Annotation of a benign mass
(b) Annotation of a malignancy
Fig. 1: Manual inspection examples for breast masses from the full-field digital mammography (FFDM) INbreast dataset [41]. (a) A benign mass delineated with green lines, of oval shape with circumscribed boundaries, in a CC view. (b) A malignant mass delineated with red lines in an MLO view, of irregular shape with spiculated boundaries.
Fig. 2: Flow diagram of the proposed DualCoreNet. With multi-scale ROIs extracted as separate inputs to the LPL and CGL paths, DualCoreNet outputs both a segmentation mask and a diagnosis label.

Many conventional machine learning algorithms have been proposed to tackle this problem; they typically comprise a pipeline of image processing operations (such as image segmentation, feature extraction, feature selection, and classification). The performance of a conventional CAD system often relies heavily on cumbersome hand-engineered features, which are subsequently fed into various classifiers. Oliver et al. [42] showed in their review paper that accurate segmentation is the foundation of subsequent cancer diagnosis, since the likelihood of malignancy depends on the shape and margin of lesions [28]. This observation has been empirically supported by a number of works [18, 50, 13, 49], all of which report that shape-related descriptors yield the most accurate breast mass diagnosis compared with other conventional hand-crafted features. In fact, traditional machine learning algorithms are still popular in recent commercial CAD systems. However, there is significant room for improvement, especially for breast mass diagnosis.

In recent years, building on the success of deep neural networks (deep learning) [33] in computer vision tasks [11, 55, 45, 10, 21, 9, 7, 8], a noticeable shift towards deep learning based CAD systems has taken place. Some works extract segmentation-related features with CNNs trained on radiologists' pixel-level annotations in order to further improve automatic diagnosis performance [17, 14]. However, this approach requires a large volume of accurate pixel-wise annotations, which are very difficult to obtain in practice. To enhance the network without binary mask labeling, some authors evaluated the performance obtained with an automatic segmentation algorithm [14]. Yet this automatic setup caused a considerable performance drop, which is likely due to its multi-stage training process. Motivated by these observations, we construct a CNN architecture trained in an end-to-end fashion to jointly solve breast cancer diagnosis (benign vs. malignant) and mass segmentation in mammography.

In this paper, we present a multi-scale dual-path CNN, shown in Fig. 2, that solves the image-to-diagnosis-label mapping in a dual-problem manner, where the dual problems are segmentation and classification. A preliminary version of this work appeared in [35]. This paper extends [35] with a more detailed description of the method and more comprehensive experimental evaluations. Building on the accurate breast mass segmentation algorithm presented in [37] and related breast mass classifiers [7, 3, 31, 17, 36], we introduce a Dual-path Conditional Residual Network (DualCoreNet) for mammography analysis. First, a learner for the mass and its contextual texture, called the Locality Preserving Learner (LPL), is built from stacked convolutional blocks and maps relatively large-scale ROIs to class labels. Second, an integrated graphical and CNN model, called the Conditional Graph Learner (CGL), learns the correlation between relatively small-scale ROIs and masks, and the extracted segmentation features are further used to improve the final mass classification performance. Additionally, we train the model with multi-scale ROIs, since the tissues surrounding breast masses play a pivotal role in accurate cancer diagnosis, whereas this contextual tissue is less relevant for the segmentation task. For a given breast mass, a larger-scale ROI is used as the input of the LPL for richer feature extraction, and a smaller-scale ROI is used by the CGL path. DualCoreNet achieves the best mammography segmentation and classification performance simultaneously, outperforming recent state-of-the-art models. The main contributions of this paper are the following:

  1. To our knowledge, DualCoreNet is the first fully automatic dual-path CNN-based mammogram analysis model that takes advantage of an automatically segmented mask for mass classification;

  2. Our method achieves the best performance for breast mass segmentation at both low and high resolutions;

  3. DualCoreNet achieves results comparable to the state of the art for mass classification on publicly available mammography datasets.

Organization. The rest of the paper is arranged as follows: Section II presents the related preliminary techniques, Section III introduces the proposed DualCoreNet methodology, and Section IV shows the experimental results. We conclude this paper in Section V.

II Preliminary

In this section, we introduce three machine learning techniques related to the proposed DualCoreNet. We first discuss residual learning and inception modules, both of which have improved the generalization of deep CNNs. We then discuss the conditional random field (CRF), a type of graphical model, for medical image segmentation.

Fig. 3: Overall architecture of the proposed DualCoreNet.

II-A Residual Learning

Residual learning [23] has proven to be an efficient way to accelerate neural network training and to mitigate vanishing/exploding gradients, and it has been applied extensively to myriad computer vision tasks. The main idea of residual learning is to add residual connections between the input and output of internal layers, blocks, or even an entire network. In particular, letting the desired input-to-output mapping of a residual module be $\mathcal{H}(\mathbf{x})$, the residual function learned by each module is defined as:

$$\mathcal{F}(\mathbf{x}) = \mathcal{H}(\mathbf{x}) - \mathbf{x}, \qquad (1)$$

so that the module outputs $\mathcal{F}(\mathbf{x}) + \mathbf{x}$. Each residual module is therefore pre-conditioned towards the identity mapping, and the identity shortcut gives gradients a direct path through the network, which alleviates the vanishing-gradient problem. In this way, CNNs can be built with many more layers and still be trained effectively. In short, residual connections let a deep network learn the residual between the input and output of a module instead of learning the mapping directly.
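The residual formulation can be illustrated with a minimal pure-Python sketch, assuming the standard ResNet form in which a module outputs H(x) = F(x) + x, where F is the residual branch. Here F is a hypothetical toy branch of two dense layers with a ReLU in between; the helper names `relu`, `dense`, and `residual_block` are illustrative and not part of DualCoreNet. The point it shows is that a branch with all-zero weights reduces the module exactly to the identity mapping.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Vector) -> Vector:
    """Element-wise ReLU."""
    return [max(0.0, a) for a in v]


def dense(W: Matrix, b: Vector, v: Vector) -> Vector:
    """Fully connected layer: returns W @ v + b, with W stored as a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) + bias for row, bias in zip(W, b)]


def residual_block(x: Vector, W1: Matrix, b1: Vector, W2: Matrix, b2: Vector) -> Vector:
    """Toy residual module: H(x) = F(x) + x, with F = dense -> ReLU -> dense."""
    f = dense(W2, b2, relu(dense(W1, b1, x)))  # residual branch F(x)
    return [fi + xi for fi, xi in zip(f, x)]   # identity shortcut adds x back


# With an all-zero residual branch the module is exactly the identity mapping.
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(residual_block([1.5, -2.0], zeros, [0.0, 0.0], zeros, [0.0, 0.0]))  # [1.5, -2.0]
```

Learning a small correction around the identity is typically easier than learning the full mapping from scratch, which is the intuition behind stacking many such modules.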

(a) Module A
(b) Module B
(c) Module C
(d) Module D
Fig. 4: Four convolution modules applied in DualCoreNet, where the input feature maps of all modules have the same dimension and the expanded features are concatenated in each module. (a) Module A replaces a convolution with two stacked convolutions. (b) Module B replaces a convolution with smaller-kernel convolutions. (c) Module C expands the filter bank of the grids. (d) Module D is a dimension reduction module, which halves the spatial dimension of the module input.

II-B Conditional Random Fields

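Conditional Random Fields (CRFs) [32], a variant of Markov Random Fields (MRFs), enforce label consistency among similar pixels and provide sharp boundaries and fine-grained segmentation. Typically, a CRF formulates label assignment as a probabilistic inference problem in which pixel labels are modeled as MRF random variables conditioned upon the observations [32]. Given an observed deep latent feature $\mathbf{I}$, the Gibbs distribution of a fully connected CRF on a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with nodes $\mathcal{V}$ and edges $\mathcal{E}$ is defined as:

$$P(\mathbf{y} \mid \mathbf{I}) = \frac{1}{Z(\mathbf{I})} \exp\Big(-\sum_{c \in \mathcal{C}_{\mathcal{G}}} \phi_c(\mathbf{y}_c \mid \mathbf{I})\Big), \qquad (2)$$

where $\mathbf{y}$ is the pixel-level label state, $c$ is one clique in the clique set $\mathcal{C}_{\mathcal{G}}$, $\mathbf{y}_c$ is the joint label of the clique, $\phi_c$ is the potential function of clique $c$, $\mathbf{I}$ is the observed feature vector, and $Z(\mathbf{I})$ is the partition function.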
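To make the Gibbs distribution and its partition function concrete, the following pure-Python sketch enumerates every labelling of a tiny fully connected CRF. It assumes, hypothetically, three pixels, binary labels (0 = background, 1 = mass), hand-picked unary energies, and a single Potts pairwise weight shared by all edges. Exhaustive enumeration is only feasible for toy graphs, since the number of labellings grows exponentially with the number of pixels, which is why practical CRF layers rely on approximate inference.

```python
import itertools
import math
from typing import Dict, List, Tuple


def gibbs_distribution(unary: List[List[float]], w_pair: float) -> Dict[Tuple[int, ...], float]:
    """Exact P(y | I) = exp(-E(y)) / Z for a small fully connected CRF.

    unary[i][l] is the unary energy of assigning label l to pixel i.
    Every pixel pair (i, j) is an edge with Potts energy w_pair * [y_i != y_j].
    """
    n, num_labels = len(unary), len(unary[0])
    energies = {}
    for y in itertools.product(range(num_labels), repeat=n):
        e = sum(unary[i][y[i]] for i in range(n))
        e += sum(w_pair for i, j in itertools.combinations(range(n), 2) if y[i] != y[j])
        energies[y] = e
    z = sum(math.exp(-e) for e in energies.values())  # partition function Z
    return {y: math.exp(-e) / z for y, e in energies.items()}


# Pixel 1 weakly prefers background (label 0); its neighbours strongly prefer mass (label 1).
unary = [[2.0, 0.1], [0.4, 0.9], [2.0, 0.1]]
probs = gibbs_distribution(unary, w_pair=1.0)
print(max(probs, key=probs.get))  # (1, 1, 1): the pairwise term smooths the weak pixel
```

With `w_pair=0.0` the most probable labelling becomes `(1, 0, 1)`, so the example also shows how the pairwise term trades unary evidence against label consistency.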

Owing to their flexibility and efficiency, CRFs have recently been applied to various medical image segmentation problems [56, 29], mainly as a post-processing step, which usually leads to an inherent shrinking bias [22]. In contrast, in this paper the CRF is used as a dedicated neural network layer comprising unary potentials and pairwise potentials that connect every pixel to all other pixels, with the aim of modeling long-range connections in arbitrarily large neighbourhoods while preserving fast inference [32, 29], in order to provide a balanced partitioning.

III Methodology

In this section, we first define the notation used throughout the paper and formulate the problem. We then discuss the two paths separately, followed by an introduction to two different feature aggregation methods.

III-A Notations and Problem Formulation

Given $N$ mammograms and the biopsy-confirmed annotations from human experts, the dataset can be denoted as $\mathcal{D} = \{(\mathbf{x}_n, \mathbf{m}_n, y_n)\}_{n=1}^{N}$, where $\mathbf{x}_n \in \mathbb{R}^{H \times W}$ represents the mass-containing ROI with spatial dimension $H \times W$, $\mathbf{m}_n \in \{0, 1\}^{H \times W}$ is the pixel-level annotation corresponding to the ROI, and $y_n \in \{0, 1\}$ is a scalar indicating the diagnosis class label, where "0" and "1" represent the benign and the malignant class, respectively. Note that each cropped ROI contains only one mass in our scenario. The main tasks solved by DualCoreNet can be formulated as follows: (1) given a mass-containing mammogram ROI, DualCoreNet maps the original image to a binary mask so that mass pixels are segmented: $\mathbf{x}_n \mapsto \hat{\mathbf{m}}_n$; (2) with the original mammogram image and the obtained pixel-level labels, DualCoreNet learns a nonlinear mapping to the diagnosis label: $(\mathbf{x}_n, \hat{\mathbf{m}}_n) \mapsto \hat{y}_n$, where $\hat{y}_n \in \{0, 1\}$.

III-B Motivation

In practice, radiologists assess breast masses by their shape and boundary features. The more irregular the shape, the more likely the mass is malignant [15]; hence, classification results generally depend heavily on segmentation results [42]. Analogously, decoupling a complicated learning task into several easier sub-tasks has proven to be an efficient paradigm in machine learning, and many methods use multi-path neural networks to solve image classification or image restoration problems [39, 52, 9, 54, 19, 30, 20, 40, 12, 38]. However, decoupling the breast mass diagnosis problem has seldom been studied, and multi-path architectures have not been exploited for the dual segmentation and classification problem. We aim to close this gap by solving these two problems jointly, thereby further improving mammography analysis.

III-C The proposed DualCoreNet

In this paper, we propose the DualCoreNet architecture, which decouples the differentiation of benign and malignant classes into dual problems: segmentation and classification. In the classification task, each input ROI sample (including surrounding tissue) is classified as cancerous or not, whereas in the segmentation task each pixel is labeled as either 0 (background) or 1 (mass), so that mass pixels can be accurately identified within the tight bounding-box ROI.

DualCoreNet takes a batch of multi-scale mammogram ROIs as input and outputs the mass segmentation masks and the diagnosis labels simultaneously (Fig. 2). Denoting the smaller-scale and larger-scale ROIs of a mass by $\mathbf{x}^{s}$ and $\mathbf{x}^{l}$, mass segmentation computes the mapping from the smaller-scale ROI to a binary mask, i.e. $\mathbf{x}^{s} \mapsto \hat{\mathbf{m}}$, and mass classification solves the mapping $\mathbf{x}^{l} \mapsto \hat{y}$, by which the larger-scale ROIs are mapped to diagnosis labels. Based on this idea, DualCoreNet is constructed from two paths: the Locality Preserving Learner (LPL) and the Conditional Graph Learner (CGL).

(a) The validation loss of DDSM
(b) The validation loss of INbreast
Fig. 5: Validation loss of DualCoreNet with residual learning and of the vanilla DualCoreNet on the DDSM and INbreast datasets.
Methodology Dataset Segmentation Classification End-to-end
Arevalo et al. [3] (2016) private - 0.82
Dhungel et al. [17] (2016) INbreast - 0.91
Kooi et al. [31] (2017) private - 0.80
Dhungel et al. [14] (2017) INbreast 0.76
Al-antari et al. [2] (2020) INbreast 92.36 0.95
DualCoreNet INbreast 93.69 0.93
DDSM 92.17 0.85
TABLE I: High-resolution breast mass analysis: segmentation and classification performance of state-of-the-art algorithms. Segmentation performance is assessed by the Dice coefficient (DI, %) and classification performance (malignant vs. benign) by the AUC score.

III-C1 Locality Preserving Learner

The LPL path is constructed to learn hierarchical and local intrinsic features from large-scale ROIs. Large-scale ROIs include both textural and contextual information, which is pivotal for mass classification [50, 46]. Inspired by well-known CNN backbone architectures such as VGG [47], ResNet [24, 25], and DenseNet [27], we propose an effective architecture tailored to mammography diagnosis, as illustrated in Fig. 3, which employs several separable convolution modules (Fig. 4) along with residual connections.

The LPL path is constructed with 11 convolutional layers. In particular, the first four consecutive layers of the LPL use plain convolutional layers along with the dimension reduction block (Fig. 4(d)), allowing the network to learn features at every chosen spatial scale. The number of feature maps increases consistently over the first four convolutional layers, from 16 (input) to 728 (output of the 4th layer), while the spatial dimension is progressively reduced. For spatial downsampling, a max-pooling layer is used in the 2nd layer, whereas the separable convolution module in Fig. 4(d) is used in the 3rd and 4th layers. Instead of max-pooling followed by convolutions, this dimension reduction module directly concatenates features at different scales generated by one or two convolutional operators. After that, two blocks each of Block-A (5th and 6th layers), Block-B (7th and 8th layers), and Block-C (9th and 10th layers) are constructed. These depthwise separable convolutional layers all produce the same number of feature maps with an identical spatial dimension. By using convolutions in the depthwise separable blocks, cross-channel correlations are learned first, resulting in a much smaller feature space. The LPL can therefore learn richer features with far fewer parameters, which markedly alleviates overfitting for the same amount of training data.
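The parameter saving of depthwise separable convolutions can be checked with a little arithmetic. The sketch below assumes a 3×3 kernel (the kernel size is an assumption here) and the 728 feature maps used in the LPL blocks, and it ignores bias terms. A standard convolution needs k·k·C_in·C_out weights, whereas a depthwise separable convolution needs k·k·C_in weights for the per-channel spatial filters plus C_in·C_out weights for the 1×1 pointwise channel mixing.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k convolution mapping c_in to c_out channels (no bias)."""
    return k * k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise k x k filter per input channel, then a 1 x 1 pointwise convolution (no bias)."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


k, channels = 3, 728  # 3 x 3 kernel is an assumption; 728 maps as in the LPL blocks
std = standard_conv_params(k, channels, channels)
sep = separable_conv_params(k, channels, channels)
print(std, sep, round(std / sep, 2))  # 4769856 536536 8.89
```

Under these assumptions each separable layer uses roughly 8.9 times fewer weights than a standard one, which is consistent with the claim that the LPL can learn richer features with far fewer parameters.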

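Lastly, the deep features generated in the 11th layer are activated by the softmax non-linearity. The loss associated with the LPL path is defined as the categorical cross-entropy:

$$\mathcal{L}_{\mathrm{LPL}}(\theta_{\mathrm{LPL}}) = -\frac{1}{N}\sum_{n=1}^{N}\sum_{k=0}^{1} \mathbb{1}[y_n = k]\,\log p\big(y_n = k \mid \mathbf{x}_n^{l}; \theta_{\mathrm{LPL}}\big), \qquad (3)$$

where $N$ is the number of training ROIs, $\mathbf{x}_n^{l}$ is the $n$-th large-scale ROI, $\mathbb{1}[\cdot]$ is the class indicator, and $\theta_{\mathrm{LPL}}$ is the corresponding parameter set of the LPL.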
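The categorical cross-entropy used for the LPL path can be sketched for two classes (benign = 0, malignant = 1) as a softmax over the class scores followed by the negative log-probability of the true class, averaged over a mini-batch. The logits below are hypothetical placeholders for the 11th-layer outputs, and the function names are illustrative.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(logits: List[float], label: int) -> float:
    """-sum_k 1[y = k] log p_k, i.e. the negative log-probability of the true class."""
    return -math.log(softmax(logits)[label])


def batch_cross_entropy(batch_logits: List[List[float]], labels: List[int]) -> float:
    """Mean cross-entropy over a mini-batch."""
    return sum(cross_entropy(z, y) for z, y in zip(batch_logits, labels)) / len(labels)


print(round(cross_entropy([0.0, 0.0], 1), 4))  # 0.6931, i.e. ln 2 for an undecided prediction
```

Subtracting the maximum logit before exponentiating leaves the softmax unchanged but avoids overflow for large scores.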

Dataset Pre-training Augmentation AUC score
DDSM none none 0.73
DDSM none flips 0.73
DDSM none flips, random crops 0.74
DDSM ImageNet none 0.79
DDSM ImageNet flips 0.79
DDSM ImageNet flips, random crops 0.85
INbreast none none 0.80
INbreast none flips 0.85
INbreast none flips, random crops 0.84
INbreast DDSM none 0.86
INbreast DDSM flips 0.89
INbreast DDSM flips, random crops 0.93
TABLE II: Breast cancer diagnosis performance (Malignant vs Benign) of the LPL path in DualCoreNet.

III-C2 Conditional Graph Learner

The CGL path aims to extract segmentation-related (geometrical) features from the binary mask produced by an image-to-pixel-label mapping. However, adapting CNNs to pixel-level labelling tasks is a significant challenge, since convolutional filters produce coarse outputs and max-pooling layers further reduce the sharpness of segmented boundaries. Although many methods have been proposed for this problem [44, 53, 37], achieving a balanced partitioning at high pixel resolution remains challenging [22].

We therefore propose a novel breast mass segmentation CNN architecture for the CGL path, as shown in Fig. 3, designed both to segment breast masses precisely at high resolution and to control model complexity. To this end, a CRF inference layer is applied in the low-resolution latent space, and a concatenation connects it to the high-resolution feature maps so that rich textural features are interlaced and fully exploited.

(a) DDSM
(b) INbreast
Fig. 6: Example high-resolution breast mass segmentation results on the DDSM and INbreast datasets, comparing radiologists' annotations (red lines) with DualCoreNet segmentation results (green lines).
Methodology INbreast DDSM Spatial Dimension Pre/Post-processed
Beller et al. [4] (2005) - - - / -
Cardoso et al. [6] (2015) - - / -
Dhungel et al. [16] (2015) ✓/ ✓
Dhungel et al. [15] (2015) ✓/ ✓
Zhu et al. [56] (2018) ✓/ ✗
Al-antari et al. [1] (2018) - ✓/ ✓
U-Net [44] (2015) ✗/ ✗
Li et al. [37] (2018) ✗/ ✗
U-Net [44] (2015) ✗/ ✗
Dhungel et al. [14] (2017) - original ✓/ ✓
Wang et al. [51] (2019) 91.69 ✓/ ✓
Singh et al. [48] (2020) - ✓/ ✓
DualCoreNet (2020) ✗/ ✗
TABLE III: Quantitative breast mass segmentation performance (Dice coefficient, %) of DualCoreNet and several state-of-the-art methods on test sets. The "Pre/Post-processed" column indicates whether each algorithm uses pre-processing and post-processing, respectively.

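In particular, the fully connected CRF can be defined as follows:

$$P(\mathbf{y} \mid \mathbf{I}) = \frac{1}{Z(\mathbf{I})} \exp\Big(-\sum_{i} \psi_u(y_i) - \sum_{i < j} \psi_p(y_i, y_j)\Big), \qquad (4)$$

where $Z(\mathbf{I})$ is the partition function, and $\mathbf{I}$ is the deep latent feature of the input $\mathbf{x}$ calculated by the softmax layer in the CGL path.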

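$\psi_u(y_i)$ is the unary potential function, which is initialized from the CGL softmax output $\mathbf{I}$. $\psi_p(y_i, y_j)$ is the pairwise potential function, which is formulated as

$$\psi_p(y_i, y_j) = \mu(y_i, y_j) \sum_{m=1}^{M} w^{(m)} k^{(m)}(\mathbf{f}_i, \mathbf{f}_j), \qquad (5)$$

where $y_i$ and $y_j$ are the predicted labels of the connected nodes at positions $i$ and $j$, respectively, $\mu(y_i, y_j) = [y_i \neq y_j]$ is the label compatibility defined by the Potts model [43], $w^{(m)}$ is the learned weight, and $k^{(m)}$ is the pre-defined weighted Gaussian kernel over the feature vectors $\mathbf{f}_i$ and $\mathbf{f}_j$ at positions $i$ and $j$ [32, 37].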
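Exact inference in a fully connected CRF is intractable for realistic image sizes, so CRF layers are typically approximated with mean-field iterations. The sketch below is a naive O(n²) pure-Python mean-field update with Potts compatibility and a single Gaussian kernel over hypothetical per-pixel feature vectors (here just 1-D positions). The unary potential is assumed to be the negative log of the network's softmax output, and the kernel width, weight, and iteration count are illustrative assumptions; this is not the CGL implementation, which runs the CRF layer inside the network on the low-resolution latent space.

```python
import math
from typing import List, Sequence


def gaussian_kernel(fi: Sequence[float], fj: Sequence[float], theta: float) -> float:
    """k(f_i, f_j) = exp(-||f_i - f_j||^2 / (2 theta^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(fi, fj))
    return math.exp(-d2 / (2.0 * theta ** 2))


def mean_field(unary_probs: List[List[float]], features: List[Sequence[float]],
               w: float = 2.0, theta: float = 1.5, iters: int = 5) -> List[List[float]]:
    """Naive mean-field inference for a fully connected CRF with Potts compatibility.

    unary_probs[i][lab] is the network's softmax probability of label lab at pixel i;
    the unary potential is taken (by assumption) as its negative log.
    """
    n, num_labels = len(unary_probs), len(unary_probs[0])
    unary = [[-math.log(max(p, 1e-12)) for p in row] for row in unary_probs]
    kern = [[gaussian_kernel(features[i], features[j], theta) if i != j else 0.0
             for j in range(n)] for i in range(n)]
    q = [list(row) for row in unary_probs]  # initialise Q with the network output
    for _ in range(iters):
        new_q = []
        for i in range(n):
            # Potts: label lab is penalised by the mass neighbours put on any other label.
            energy = [unary[i][lab] + w * sum(kern[i][j] * (1.0 - q[j][lab]) for j in range(n))
                      for lab in range(num_labels)]
            e_min = min(energy)
            expo = [math.exp(-(e - e_min)) for e in energy]
            total = sum(expo)
            new_q.append([e / total for e in expo])
        q = new_q  # parallel (synchronous) update of all pixels
    return q


# Five pixels on a line; pixel 2 weakly prefers background (label 0), the rest are mass (label 1).
probs = [[0.1, 0.9], [0.1, 0.9], [0.6, 0.4], [0.1, 0.9], [0.1, 0.9]]
positions = [[0.0], [1.0], [2.0], [3.0], [4.0]]
q = mean_field(probs, positions)
print(q[2][1] > 0.5)  # True: the isolated background pixel is smoothed into the mass
```

Because the Potts term only charges a label for disagreeing with nearby pixels, the update reduces to one kernel-weighted sum per label, which is what efficient dense-CRF implementations accelerate.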

Fig. 7: Mass classification ROC curves of DualCoreNet on INbreast and CBIS-DDSM for identifying malignant masses among the union of benign and malignant ROIs.

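To improve segmentation performance on the class-imbalanced mass-containing ROIs, we minimize two Dice losses, $\mathcal{L}_{D_1}$ and $\mathcal{L}_{D_2}$, so that the CGL path outputs a high-resolution binary label mask:

$$\mathcal{L}_{\mathrm{CGL}}(\theta_{\mathrm{CGL}}) = \alpha\,\mathcal{L}_{D_1}(\theta_{\mathrm{CGL}}) + (1 - \alpha)\,\mathcal{L}_{D_2}(\theta_{\mathrm{CGL}}), \qquad \mathcal{L}_{D} = 1 - \frac{2\sum_{i} \hat{m}_i m_i}{\sum_{i} \hat{m}_i + \sum_{i} m_i}, \qquad (6)$$

where each Dice term takes the form of $\mathcal{L}_{D}$, $\hat{\mathbf{m}}$ is the output of the final softmax layer in the CGL path, $\mathbf{m}$ is the ground-truth mask, $\theta_{\mathrm{CGL}}$ are the parameters, and $\alpha$ is the trade-off factor in the CGL path.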
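A soft Dice loss and a trade-off-weighted combination of two Dice terms can be sketched in pure Python as below. Flattened per-pixel foreground probabilities stand in for the softmax output of the CGL path. Which predictions feed the two terms (for example, the CNN output before the CRF layer and the output after it) is an assumption here, as are the function names; the small `eps` is an illustrative smoothing constant to avoid division by zero on empty masks.

```python
from typing import Sequence


def soft_dice_loss(pred: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """1 - 2 * sum(p * g) / (sum(p) + sum(g)) on soft (probabilistic) foreground masks."""
    inter = sum(p * g for p, g in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def combined_dice_loss(pred_a: Sequence[float], pred_b: Sequence[float],
                       target: Sequence[float], alpha: float) -> float:
    """alpha * Dice(pred_a) + (1 - alpha) * Dice(pred_b); pred_a and pred_b are hypothetical inputs."""
    return alpha * soft_dice_loss(pred_a, target) + (1.0 - alpha) * soft_dice_loss(pred_b, target)


mask = [0, 1, 1, 0]
print(round(soft_dice_loss([0.0, 1.0, 1.0, 0.0], mask), 6))  # 0.0 for a perfect prediction
print(round(soft_dice_loss([1.0, 0.0, 0.0, 1.0], mask), 6))  # 1.0 for a disjoint prediction
```

Because the Dice ratio is normalised by the foreground size, a small mass occupying few pixels still contributes a loss on the same scale as a large one, which is why Dice-type losses are a common choice for imbalanced segmentation.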

III-C3 Fusion Module

Since textural features and shape features have been extracted by the LPL and CGL paths, respectively, it is natural to integrate these two sets of features in a feature fusion block to further improve diagnosis performance. To this end, we propose a fusion module, as shown in Fig. 3. In particular, we first use two transformation blocks, each consisting of several convolution layers followed by average pooling and two fully connected layers, to transform the output feature maps of the LPL and CGL paths, respectively; the transformed features from the two paths are then fed into a softmax layer that outputs the final classification result. The overall categorical cross-entropy loss for the classification task is defined as:

$$\mathcal{L}_{\mathrm{Fusion}}(\Theta) = -\frac{1}{N}\sum_{n=1}^{N}\sum_{k=0}^{1} \mathbb{1}[y_n = k]\,\log p\big(y_n = k \mid \mathbf{x}_n^{l}, \mathbf{x}_n^{s}; \Theta\big), \qquad (7)$$

where $\mathbf{x}_n^{l}$ and $\mathbf{x}_n^{s}$ are the large- and small-scale ROIs of the $n$-th mass, $\mathbb{1}[\cdot]$ is the diagnosis class indicator, and $\Theta$ is the entire network parameter vector.
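The fusion step can be sketched as follows under stated assumptions: each path's feature maps are reduced by global average pooling, the two pooled vectors are concatenated (how the two transformed features are merged is not specified here, so concatenation is a hypothetical choice), and a linear layer with softmax produces benign/malignant probabilities. The convolution layers and the two fully connected layers of each transformation block are omitted for brevity, and all weights are hypothetical.

```python
import math
from typing import List

FeatureMaps = List[List[List[float]]]  # channels x height x width


def global_avg_pool(maps: FeatureMaps) -> List[float]:
    """Average each channel over its spatial positions."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in maps]


def fuse_and_classify(lpl_maps: FeatureMaps, cgl_maps: FeatureMaps,
                      weights: List[List[float]], bias: List[float]) -> List[float]:
    """Pool both paths, concatenate, apply a linear layer, and return softmax probabilities."""
    features = global_avg_pool(lpl_maps) + global_avg_pool(cgl_maps)  # list concatenation
    logits = [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, bias)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


lpl = [[[1.0, 3.0], [2.0, 2.0]]]   # one LPL channel, pooled to 2.0
cgl = [[[0.0, 1.0], [1.0, 0.0]]]   # one CGL channel, pooled to 0.5
probs = fuse_and_classify(lpl, cgl, weights=[[0.0, 0.0], [1.0, 2.0]], bias=[0.0, 0.0])
print([round(p, 4) for p in probs])  # [0.0474, 0.9526]
```

Pooling before fusion makes the two paths' outputs independent of their spatial sizes, so feature maps from the differently scaled ROIs can be combined directly.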

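Finally, by integrating the losses of the LPL path (3), the CGL path (6), and the fusion module (7), the entire DualCoreNet loss is defined as:

$$\mathcal{L}(\Theta) = \mathcal{L}_{\mathrm{Fusion}} + \lambda_1 \mathcal{L}_{\mathrm{LPL}} + \lambda_2 \mathcal{L}_{\mathrm{CGL}}, \qquad (8)$$

where $\lambda_1$ and $\lambda_2$ are two trade-off factors that control the importance of the LPL and CGL paths, which are set empirically in our experiments.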

III-D Implementation

We use the Adam optimizer to train DualCoreNet. To alleviate overfitting and improve generalization, we use several training techniques: (1) to initialize the LPL path, it is first trained on the DDSM dataset, and its parameters are then fine-tuned on the INbreast dataset; (2) dropout layers with a 50% drop rate are employed.
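For reference, one Adam update (bias-corrected exponential moving averages of the gradient and the squared gradient) can be written in a few lines of pure Python. The hyperparameters below are commonly used defaults rather than values reported for DualCoreNet, and the scalar objective f(x) = (x - 3)^2 and the helper names are purely illustrative.

```python
import math
from typing import Callable, Tuple


def adam_step(x: float, grad: float, m: float, v: float, t: int, lr: float = 0.05,
              beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8
              ) -> Tuple[float, float, float]:
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias corrections for the zero init
    v_hat = v / (1.0 - beta2 ** t)
    x = x - lr * m_hat / (math.sqrt(v_hat) + eps)
    return x, m, v


def minimise(grad_fn: Callable[[float], float], x0: float, steps: int) -> float:
    """Run Adam from x0 for a fixed number of steps."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        x, m, v = adam_step(x, grad_fn(x), m, v, t)
    return x


def grad_f(x: float) -> float:
    """Gradient of the toy objective f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


print(minimise(grad_f, 0.0, 2000))
```

Thanks to the bias correction, the very first step moves the parameter by about `lr` regardless of the gradient's magnitude, which illustrates Adam's per-parameter step-size normalisation.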

IV Experiments

IV-A Materials

In this paper, we validate the proposed DualCoreNet on two public mammography datasets: CBIS-DDSM [34] and INbreast [41]. CBIS-DDSM is a modernized subset of the Digital Database for Screening Mammography (DDSM) [26], containing 2478 digitized mammograms in DICOM format. In addition, CBIS-DDSM is already partitioned into a training set (1318 masses) and a test set (378 masses), and we adopt this official split. Regarding class balance, the training and test sets each contain comparable numbers of the two lesion classes, and the malignant-to-benign ratio is roughly the same in both sets. The INbreast dataset [41] is a full-field digital mammography (FFDM) dataset acquired at the Breast Centre of Hospital de São João, Porto, Portugal. It contains 410 mammogram images from a total of 115 cases. As in DDSM, there are two mammographic views for each breast, and the images were annotated by human experts at the pixel level. For the evaluation of DualCoreNet, the INbreast dataset is split by patient into a training set (80%) and a test set (20%).
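A patient-level 80%/20% split like the one described for INbreast can be sketched as below: patients, not images, are shuffled and partitioned, so all views of one patient land in the same set and no patient appears in both. The `(image_id, patient_id)` record layout, the fixed seed, and the function name `split_by_patient` are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def split_by_patient(records: List[Tuple[str, str]], train_frac: float = 0.8,
                     seed: int = 0) -> Tuple[List[str], List[str]]:
    """Split (image_id, patient_id) records so that no patient spans both sets."""
    by_patient: Dict[str, List[str]] = {}
    for image_id, patient_id in records:
        by_patient.setdefault(patient_id, []).append(image_id)
    patients = sorted(by_patient)                 # deterministic order before shuffling
    random.Random(seed).shuffle(patients)
    n_train = round(train_frac * len(patients))
    train = [img for p in patients[:n_train] for img in by_patient[p]]
    test = [img for p in patients[n_train:] for img in by_patient[p]]
    return train, test


# Toy example: 10 hypothetical patients with two views (e.g. CC and MLO) each.
records = [(f"p{k}_{view}", f"p{k}") for k in range(10) for view in ("CC", "MLO")]
train, test = split_by_patient(records)
print(len(train), len(test))  # 16 4
```

Splitting at the patient level prevents correlated views of the same breast from leaking between training and test sets, which would otherwise inflate test scores.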

In terms of ROI selection, masses are center-cropped at two scales. The first scale is the tight rectangular bounding box padded with 5 pixels on each border, which is used by the CGL to explore segmentation-related features. The second scale is a contextual rectangular region obtained with proportional padding, so that the mass-centered ROI covers a region proportionally larger than the bounding box. These contextual ROIs are used by the LPL path to extract latent and hierarchical features from masses and their surrounding tissues. All selected ROIs are then resized to an identical dimension by bicubic interpolation, and the ground-truth binary masks are cropped and resized accordingly, but with nearest-neighbor interpolation. To avoid overfitting and improve generalization, the selected ROIs are augmented with horizontal and vertical flips and random crops (with an augmentation probability of 50% for each instance) after the data division.
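The two ROI scales can be computed from a mass bounding box as sketched below. The 5-pixel padding follows the text; the contextual enlargement factor is not recoverable here, so `context_scale` is a hypothetical parameter, and clipping boxes to the image border is an added assumption. Boxes use half-open pixel coordinates, and resizing (bicubic for images, nearest neighbour for masks) is left to an image library.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max); max is exclusive


def clip_box(box: Box, width: int, height: int) -> Box:
    """Keep the box inside a width x height image."""
    x0, y0, x1, y1 = box
    return max(0, x0), max(0, y0), min(width, x1), min(height, y1)


def tight_roi(box: Box, width: int, height: int, pad: int = 5) -> Box:
    """Tight bounding box padded by `pad` pixels on each border (CGL input)."""
    x0, y0, x1, y1 = box
    return clip_box((x0 - pad, y0 - pad, x1 + pad, y1 + pad), width, height)


def context_roi(box: Box, width: int, height: int, context_scale: float) -> Box:
    """Mass-centred box whose sides are context_scale times the box sides (LPL input)."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    half_w = (x1 - x0) * context_scale / 2.0
    half_h = (y1 - y0) * context_scale / 2.0
    return clip_box((round(cx - half_w), round(cy - half_h),
                     round(cx + half_w), round(cy + half_h)), width, height)


mass = (100, 120, 140, 170)  # hypothetical 40 x 50 px bounding box
print(tight_roi(mass, 1000, 800))         # (95, 115, 145, 175)
print(context_roi(mass, 1000, 800, 2.0))  # (80, 95, 160, 195) with a hypothetical 2x factor
```

Keeping the contextual box centred on the mass means both scales describe the same lesion, so the LPL and CGL inputs stay aligned after resizing.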

Fig. 8: CGL segmentation performance on the DDSM and INbreast datasets in DualCoreNet. Dice coefficients (%) are averaged over 10 experiments for different values of the trade-off factor in loss (6) and different residual configurations ("R" denotes residual skips and "No R" means without residual skips).

IV-B Results and Analysis

IV-B1 Comparison with State-of-the-art

We first compare the proposed DualCoreNet with five related state-of-the-art breast mass diagnosis methods [3, 17, 31, 14, 2]. The results are listed in Table I, where the results of the compared methods are taken from their respective papers. In particular, [14] and [2] address both the breast mass segmentation and classification problems. Among these methods, DualCoreNet is the only one evaluated on the DDSM dataset, and it achieves the second-best diagnosis performance on the INbreast dataset, with AUC scores of 0.93 and 0.85 on INbreast and DDSM, respectively. Compared with [2], the AUC difference is only 0.02 (0.95 vs. 0.93 in Table I), which we mainly attribute to [2] randomly dividing the training and test sets after data augmentation, so that augmented copies of the same image can appear in both sets. In our work, by contrast, the original data are first divided into training and test sets, and augmentation is applied afterwards. Furthermore, DualCoreNet obtains the best segmentation performance among all compared algorithms, with DI scores of 93.69% and 92.17% on INbreast and DDSM, respectively.
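Since the diagnosis comparisons are reported as AUC scores, it may help to recall that the ROC AUC equals the probability that a randomly chosen malignant ROI receives a higher score than a randomly chosen benign one, with ties counted as one half. The pure-Python sketch below computes it directly from that pairwise (Mann-Whitney) definition; it is O(P·N) and intended only for illustration, and the scores are made up.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the Mann-Whitney pairwise statistic (1 = malignant, 0 = benign)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In this toy example three of the four malignant/benign pairs are ranked correctly, giving an AUC of 0.75; an AUC difference such as 0.02 therefore corresponds to a small change in the fraction of correctly ranked pairs.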

Additionally, to evaluate the effectiveness of residual learning in DualCoreNet, we compare the validation loss on both datasets with and without residual skips in the best-performing architecture (Fig. 3). A well-behaved validation loss is expected to decrease consistently with increasing training epochs and then stabilize. As shown in Fig. 5, the validation loss of the vanilla DualCoreNet stagnated or even slightly increased on both datasets. In contrast, DualCoreNet with residual learning generalized well: its validation loss decreased and then saturated as the number of training epochs increased. Note that the validation loss with residual learning is much higher than that without residual skips (Fig. 5). We attribute this to the weight decay regularization term, since the residual DualCoreNet has more parameters than the DualCoreNet without residual connections.

In addition, Table II lists the diagnosis performance (malignant vs. benign) of the LPL path of DualCoreNet under different regularization configurations (pre-training and augmentation). Pre-training markedly improves model performance, and data augmentation generally improves generalization further. With both pre-training and data augmentation, the diagnosis AUC reaches 0.85 on DDSM and 0.93 on INbreast.

IV-B2 The importance of Dual-paths

To study the importance of the dual paths, we evaluate the segmentation performance of the CGL path and the classification performance of the LPL path, each trained individually with only its own path loss.

Fig. 9: LPL diagnosis performance (malignant vs. benign) with various backbone networks.

Regarding the segmentation performance of the CGL, Table III compares its Dice coefficients with those of related works. For all listed algorithms, only mass-containing ROIs are provided as input. For low-resolution mass segmentation, [37] achieves the best performance, with 93.66% and 92.23% on INbreast and CBIS-DDSM, respectively. For high-resolution segmentation, DualCoreNet is so far the best-performing algorithm on both datasets, with DI scores of 93.69% on INbreast and 92.17% on DDSM. Fig. 8 shows the segmentation performance when the CGL is trained with various values of the trade-off factor in loss (6). The overall segmentation performance on INbreast is clearly better than that on DDSM, which we mainly attribute to the higher data quality of INbreast. On INbreast, the configurations with and without residual connections generally perform equivalently; however, the best performance is obtained without residual connections when the trade-off factor is 0.65. Note that the worst performance on INbreast, with or without residual skips, occurs at one particular weighting of the CGL segmentation loss between the CNN and the graphical model (see Fig. 8). On DDSM, the residual configuration performs better overall, and the best segmentation DI reaches 92.17%.

The CGL segmentation results and the radiologists' delineations are visualized in Fig. 6 for both datasets. The proposed segmentation method performs very well on higher-resolution mammograms, depicting both fine boundary details and irregular shape contours. No spurious regions appear in the DualCoreNet results, which we attribute mainly to the structural consistency enforced by the graphical inference layer. Although the graphical inference is performed at a small spatial size before conversion to high resolution, the performance is not affected, and inference becomes more efficient, requiring far fewer parameters and less computing time.

For the classification performance of the LPL path, we compare the LPL with various backbone networks, including Inception-v3, Xception, VGG16, VGG19, and ResNet-50 (Fig. 9). All networks perform better on the INbreast dataset. The best performance on INbreast is obtained by the proposed LPL architecture (0.92 AUC), and the best on DDSM by the Xception network. However, the margin between our LPL (0.81 AUC) and Xception (0.83 AUC) on DDSM is small, only 0.02. Overall, the proposed LPL architecture achieves leading performance for breast cancer diagnosis on both the INbreast and DDSM datasets.

IV-B3 Ablation study on the training loss

Finally, we compare DualCoreNet trained with different loss configurations: the fusion loss combined with the LPL path loss, the fusion loss combined with the CGL path loss, and the fusion loss combined with both path losses. (Note: in our experiments, training the model with the fusion loss alone did not converge.) The diagnosis ROC curves for these configurations are shown in Fig. 7. DualCoreNet performs best when both paths are activated, and the second-best configuration is the fusion loss combined with the LPL path loss. This indicates that the features integrated from both paths capture richer information from the data.

V Conclusions

In this paper, we propose an innovative dual-path CNN architecture, called DualCoreNet, for the joint segmentation and classification of breast masses. DualCoreNet first embeds the original mammography data into two heterogeneous data domains (i.e. the original image domain and the binary mask domain), in which deep features are jointly learned. By integrating the Conditional Graph Learner path and the Locality Preserving Learner path, DualCoreNet jointly learns segmentation and classification in a simple but effective way. The integrated intrinsic localized textural features and the semantic information extracted from binary masks yield an interpretable and more discriminative representation, which enlarges the margin between benign and malignant instances in the deep latent space. Extensive experiments show that our method outperforms state-of-the-art methods on both breast mass segmentation and classification tasks in mammography. In addition, DualCoreNet performs better on the higher-quality INbreast dataset.


  • [1] M. A. Al-antari, M. A. Al-masni, M. Choi, S. Han, and T. Kim (2018) A fully integrated computer-aided diagnosis system for digital x-ray mammograms via deep learning detection, segmentation, and classification. International journal of medical informatics 117, pp. 44–54. Cited by: TABLE III.
  • [2] M. A. Al-antari, M. A. Al-masni, and T. S. Kim (2020) Deep Learning Computer-Aided Diagnosis for Breast Lesion in Digital Mammogram. Adv. Exp. Med. Biol. 1213, pp. 59–72. ISBN 9783030331283, ISSN 22148019. Cited by: TABLE I, §IV-B1.
  • [3] J. Arevalo, F. A. González, R. Ramos-Pollán, J. L. Oliveira, and M. A. G. Lopez (2016) Representation learning for mammography mass lesion classification with convolutional neural networks. Computer methods and programs in biomedicine 127, pp. 248–257. Cited by: §I, TABLE I, §IV-B1.
  • [4] M. Beller, R. Stotzka, T. O. Müller, and H. Gemmeke (2005) An example-based system to support the segmentation of stellate lesions. In Bildverarbeitung für die Medizin 2005, pp. 475–479. Cited by: TABLE III.
  • [5] P. Boyle, B. Levin, et al. (2008) World cancer report 2008. IARC Press, International Agency for Research on Cancer. Cited by: §I.
  • [6] J. S. Cardoso, I. Domingues, and H. P. Oliveira (2015) Closed shortest path in the original coordinates with an application to breast cancer. International Journal of Pattern Recognition and Artificial Intelligence 29 (01), pp. 1555002. Cited by: TABLE III.
  • [7] G. Carneiro, J. Nascimento, and A. P. Bradley (2017) Automated analysis of unregistered multi-view mammograms with deep learning. IEEE transactions on medical imaging 36 (11), pp. 2355–2365. Cited by: §I, §I.
  • [8] D. Chen, M. E. Davies, and M. Golbabaee (2020) Compressive MR fingerprinting reconstruction with neural proximal gradient iterations. In International Conference on Medical image computing and computer-assisted intervention (MICCAI), Cited by: §I.
  • [9] D. Chen and M. E. Davies (2020) Deep decomposition learning for inverse imaging problems. In Proceedings of the European Conference on Computer Vision (ECCV), Cited by: §I, §III-B.
  • [10] D. Chen, J. Lv, and Z. Yi (2017) Unsupervised multi-manifold clustering by learning deep representation. In Workshops at the 31st AAAI conference on artificial intelligence (AAAI), pp. 385–391. Cited by: §I.
  • [11] D. Chen, J. Lv, and Z. Yi (2018) Graph regularized restricted Boltzmann machine. IEEE transactions on neural networks and learning systems 29 (6), pp. 2651–2659. Cited by: §I.
  • [12] F. Ciompi, K. Chung, S. J. Van Riel, A. A. A. Setio, P. K. Gerke, C. Jacobs, E. T. Scholten, C. Schaefer-Prokop, M. M. Wille, A. Marchiano, et al. (2017) Towards automatic pulmonary nodule management in lung cancer screening with deep learning. Scientific reports 7, pp. 46479. Cited by: §III-B.
  • [13] S. Dhahbi, W. Barhoumi, and E. Zagrouba (2015) Breast cancer diagnosis in digitized mammograms using curvelet moments. Computers in biology and medicine 64, pp. 79–90. Cited by: §I.
  • [14] N. Dhungel, G. Carneiro, and A. P. Bradley (2017) A deep learning approach for the analysis of masses in mammograms with minimal user intervention. Med. Image Anal. 37, pp. 114–128. ISSN 13618423. Cited by: §I, TABLE I, TABLE III, §IV-B1.
  • [15] N. Dhungel, G. Carneiro, and A. P. Bradley (2015) Deep learning and structured prediction for the segmentation of mass in mammograms. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 605–612. Cited by: §III-B, TABLE III.
  • [16] N. Dhungel, G. Carneiro, and A. P. Bradley (2015) Deep structured learning for mass segmentation from mammograms. In Image Processing (ICIP), 2015 IEEE International Conference on, pp. 2950–2954. Cited by: TABLE III.
  • [17] N. Dhungel, G. Carneiro, and A. P. Bradley (2016) The automated learning of deep features for breast mass classification from mammograms. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 106–114. Cited by: §I, §I, TABLE I, §IV-B1.
  • [18] I. Domingues, E. Sales, J. Cardoso, and W. Pereira (2012) INbreast-database masses characterization. XXIII CBEB. Cited by: §I.
  • [19] J. Fan, X. Cao, P. Yap, and D. Shen (2019) BIRNet: brain image registration using dual-supervised fully convolutional networks. Medical image analysis 54, pp. 193–206. Cited by: §III-B.
  • [20] X. W. Gao, R. Hui, and Z. Tian (2017) Classification of CT brain images based on deep learning networks. Computer methods and programs in biomedicine 138, pp. 49–56. Cited by: §III-B.
  • [21] M. Golbabaee, D. Chen, P. A. Gómez, M. I. Menzel, and M. E. Davies (2019) Geometry of deep learning for magnetic resonance fingerprinting. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7825–7829. Cited by: §I.
  • [22] F. Guo, M. Ng, M. Goubran, S. E. Petersen, S. K. Piechnik, S. Neubauer, and G. Wright (2020) Improving cardiac MRI convolutional neural network segmentation on small training datasets and dataset shift: a continuous kernel cut approach. Medical Image Analysis 61, pp. 101636. Cited by: §II-B, §III-C2.
  • [23] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §II-A.
  • [24] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §III-C1.
  • [25] K. He, X. Zhang, S. Ren, and J. Sun (2016) Identity mappings in deep residual networks. In European conference on computer vision, pp. 630–645. Cited by: §III-C1.
  • [26] M. Heath, K. Bowyer, D. Kopans, R. Moore, and W. P. Kegelmeyer (2000) The digital database for screening mammography. In Proceedings of the 5th international workshop on digital mammography, pp. 212–218. Cited by: §IV-A.
  • [27] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger (2017) Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4700–4708. Cited by: §III-C1.
  • [28] A. Jalalian, S. B. Mashohor, H. R. Mahmud, M. I. B. Saripan, A. R. B. Ramli, and B. Karasfi (2013) Computer-aided detection/diagnosis of breast cancer in mammography and ultrasound: a review. Clinical imaging 37 (3), pp. 420–426. Cited by: §I.
  • [29] K. Kamnitsas, C. Ledig, V. F. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon, D. Rueckert, and B. Glocker (2017) Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Medical image analysis 36, pp. 61–78. Cited by: §II-B.
  • [30] B. C. Kim, J. S. Yoon, J. S. Choi, and H. I. Suk (2019) Multi-scale gradual integration CNN for false positive reduction in pulmonary nodule detection. Neural Networks. arXiv:1807.10581, ISSN 18792782. Cited by: §III-B.
  • [31] T. Kooi, B. van Ginneken, N. Karssemeijer, and A. den Heeten (2017) Discriminating solitary cysts from soft tissue lesions in mammography using a pretrained deep convolutional neural network. Medical physics 44 (3), pp. 1017–1027. Cited by: §I, TABLE I, §IV-B1.
  • [32] P. Krähenbühl and V. Koltun (2011) Efficient inference in fully connected CRFs with Gaussian edge potentials. In Advances in neural information processing systems, pp. 109–117. Cited by: §II-B, §II-B, §III-C2.
  • [33] Y. LeCun, Y. Bengio, and G. Hinton (2015) Deep learning. Nature 521 (7553), pp. 436. Cited by: §I.
  • [34] R. S. Lee, F. Gimenez, A. Hoogi, K. K. Miyake, M. Gorovoy, and D. L. Rubin (2017) A curated mammography data set for use in computer-aided detection and diagnosis research. Scientific data 4, pp. 170177. Cited by: §IV-A.
  • [35] H. Li, D. Chen, W. H. Nailon, M. E. Davies, and D. Laurenson (2019) A deep dual-path network for improved mammogram image processing. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1224–1228. Cited by: §I.
  • [36] H. Li, D. Chen, W. H. Nailon, M. E. Davies, and D. I. Laurenson (2019) Signed Laplacian deep learning with adversarial augmentation for improved mammography diagnosis. In International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 486–494. Cited by: §I.
  • [37] H. Li, D. Chen, W. H. Nailon, M. E. Davies, and D. Laurenson (2018) Improved breast mass segmentation in mammograms with conditional residual U-Net. In Image Analysis for Moving Organ, Breast, and Thoracic Images, pp. 81–89. Cited by: §I, §III-C2, §III-C2, TABLE III, §IV-B2.
  • [38] D. Liang, L. Lin, X. Chen, H. Hu, Q. Zhang, Q. Chen, Y. Iwamoto, X. Han, Y. Chen, R. Tong, and J. Wu (2019) Multi-stream scale-insensitive convolutional and recurrent neural networks for liver tumor detection in dynamic CT images. In 2019 IEEE International Conference on Image Processing (ICIP), pp. 794–798. Cited by: §III-B.
  • [39] D. Liu, Z. He, D. Chen, and J. Lv (2020) An improved dual-channel network to eliminate catastrophic forgetting. IEEE Transactions on Systems, Man, and Cybernetics: Systems. Cited by: §III-B.
  • [40] X. Liu, F. Hou, H. Qin, and A. Hao (2018) Multi-view multi-scale CNNs for lung nodule type classification from CT images. Pattern Recognition 77, pp. 262–275. Cited by: §III-B.
  • [41] I. C. Moreira, I. Amaral, I. Domingues, A. Cardoso, M. J. Cardoso, and J. S. Cardoso (2012) INbreast: toward a full-field digital mammographic database. Academic radiology 19 (2), pp. 236–248. Cited by: Fig. 1, §IV-A.
  • [42] A. Oliver, J. Freixenet, J. Marti, E. Perez, J. Pont, E. R. Denton, and R. Zwiggelaar (2010) A review of automatic mass detection and segmentation in mammographic images. Medical image analysis 14 (2), pp. 87–110. Cited by: §I, §I, §III-B.
  • [43] R. B. Potts (1952) Some generalized order-disorder transformations. In Mathematical proceedings of the Cambridge philosophical society, Vol. 48, pp. 106–109. Cited by: §III-C2.
  • [44] O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pp. 234–241. Cited by: §III-C2, TABLE III.
  • [45] S. Shams, R. Platania, J. Zhang, J. Kim, and S. Park (2018) Deep generative breast cancer screening and diagnosis. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 859–867. Cited by: §I.
  • [46] L. Shen, L. R. Margolies, J. H. Rothstein, E. Fluder, R. McBride, and W. Sieh (2019) Deep learning to improve breast cancer detection on screening mammography. Scientific reports 9 (1), pp. 1–12. Cited by: §I, §III-C1.
  • [47] K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Cited by: §III-C1.
  • [48] V. K. Singh, H. A. Rashwan, S. Romani, F. Akram, N. Pandey, M. M. K. Sarker, A. Saleh, M. Arenas, M. Arquez, D. Puig, et al. (2020) Breast tumor segmentation and shape classification in mammograms using generative adversarial and convolutional neural network. Expert Systems with Applications 139, pp. 112855. Cited by: TABLE III.
  • [49] B. Swiderski, S. Osowski, J. Kurek, M. Kruk, I. Lugowska, P. Rutkowski, and W. Barhoumi (2017) Novel methods of image description and ensemble of classifiers in application to mammogram analysis. Expert Systems with Applications 81, pp. 67–78. Cited by: §I, §I.
  • [50] C. Varela, S. Timp, and N. Karssemeijer (2006) Use of border information in the classification of mammographic masses. Physics in medicine & biology 51 (2), pp. 425. Cited by: §I, §III-C1.
  • [51] R. Wang, Y. Ma, W. Sun, Y. Guo, W. Wang, Y. Qi, and X. Gong (2019) Multi-level nested pyramid network for mass segmentation in mammograms. Neurocomputing 363, pp. 313–320. Cited by: TABLE III.
  • [52] Z. Wang, X. Liu, H. Li, L. Sheng, J. Yan, X. Wang, and J. Shao (2019) CAMP: cross-modal adaptive message passing for text-image retrieval. In Proceedings of the IEEE International Conference on Computer Vision, pp. 5764–5773. Cited by: §III-B.
  • [53] S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. H. Torr (2015) Conditional random fields as recurrent neural networks. In Proceedings of the IEEE international conference on computer vision, pp. 1529–1537. Cited by: §III-C2.
  • [54] Z. Zheng, L. Zheng, M. Garrett, Y. Yang, M. Xu, and Y. Shen (2020) Dual-path convolutional image-text embeddings with instance loss. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 16 (2), pp. 1–23. Cited by: §III-B.
  • [55] W. Zhu, Q. Lou, Y. S. Vang, and X. Xie (2017) Deep multi-instance networks with sparse label assignment for whole mammogram classification. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 603–611. Cited by: §I.
  • [56] W. Zhu, X. Xiang, T. D. Tran, G. D. Hager, and X. Xie (2018) Adversarial deep structured nets for mass segmentation from mammograms. In Biomedical Imaging (ISBI 2018), 2018 IEEE 15th International Symposium on, pp. 847–850. Cited by: §II-B, TABLE III.