Breast Mass Segmentation and Shape Classification in Mammograms Using Deep Neural Networks

09/05/2018 ∙ by Vivek Kumar Singh, et al. ∙ Universitat Rovira i Virgili

Manually extracting breast masses from mammograms is a demanding task that radiologists must carry out frequently. Therefore, image analysis methods are needed for the detection and delineation of breast masses, which portray crucial morphological information that supports reliable diagnosis. In this paper, we propose a conditional Generative Adversarial Network (cGAN) devised to segment a breast mass within a region of interest (ROI) in a mammogram. The generative network learns to recognize the breast mass area and to create the binary mask that outlines it. In turn, the adversarial network learns to distinguish between real (ground truth) and synthetic segmentations, thus forcing the generative network to create binary masks that are as realistic as possible. The cGAN works well even when the number of training samples is limited, and as a result the proposed method outperforms several state-of-the-art approaches. This claim is corroborated by diverse experiments performed on two datasets, the public INbreast and a private in-house dataset. The proposed segmentation model provides a high Dice coefficient and Intersection over Union (IoU) of 94% and 87%, respectively. In addition, a shape descriptor based on a Convolutional Neural Network (CNN) is proposed to classify the generated masks into four mass shapes: irregular, lobular, oval and round. The proposed shape descriptor was trained on the Digital Database for Screening Mammography (DDSM), yielding an overall accuracy of 80%, which outperforms the current state-of-the-art.




1 Introduction

Breast cancer is the most commonly diagnosed cause of cancer death among women worldwide siegel2017cancer . Mammography is a widely recognized tool that has proven effective in reducing the mortality rate, since it allows early detection of breast diseases lauby2015breast .

Breast masses are the most important findings among the diverse types of breast abnormalities, which also include micro-calcification and architectural distortion. All these findings may indicate the presence of carcinomas rangayyan2010computer . Moreover, morphological information on tumor shape (irregular, lobular, oval and round) and margin type (circumscribed, ill defined, spiculated and obscured) also plays a crucial role in the diagnosis of tumor malignancy tang2009computer .

Computer aided diagnosis (CAD) systems are highly recommended to assist radiologists in detecting breast tumors and outlining their borders. However, breast tumor segmentation and classification remain challenging due to the low signal-to-noise ratio and the variability of tumors in shape, size, appearance, texture and location. Recently, many studies based on deep representations of breast images and on feature combination have been proposed to improve performance on breast mass classification jiao2018parasitic .

In addition, it is very difficult for an expert radiologist to discern from mammographic images the molecular subtypes, i.e., Luminal-A, Luminal-B, HER-2 (Human Epidermal growth factor receptor 2) and Basal-like (triple negative), which are key for prescribing the best oncological treatment cho2016molecular , liu2016there , tamaki2011correlation . However, recent studies point out some loose correlations between visual tumor features (e.g., texture and shape) and molecular subtypes. Recently, a Convolutional Neural Network (CNN) was used to classify molecular subtypes using texture patches extracted from mammograms singh2017classification , which yielded an overall accuracy of 67%. However, relying only on texture features is not sufficient to classify the breast cancer molecular subtypes from mammograms tamaki2011correlation . Thus, some studies attempt to use morphological information on tumor shape to classify breast cancer molecular subtypes.

Figure 1: General framework of breast tumor segmentation and shape classification.

Consequently, in this paper, a two-stage method for breast tumor segmentation and shape classification is proposed, as shown in Figure 1. In the first stage, our method segments the breast tumor as a binary mask. In the second stage, the binary mask is classified into a shape type (irregular, lobular, oval or round). Unlike traditional object classifiers kisilev2015semantic , kim2018icadx that use texture, intensity or edge information, our method is forced to learn only morphological features from the binary masks. To be more specific, we present a thorough improvement of our previous work singh2018conditional . The major contributions of this paper are as follows:

  1. We believe this is the first adaptation of cGAN in the area of breast tumor segmentation in mammograms. The adversarial network yields more reliable learning than other state-of-the-art algorithms since training data is scarce (i.e., mammograms with labeled breast tumor boundaries), while it does not increase the computational complexity at prediction time.

  2. The implementation of a multi-class CNN architecture to predict the four breast tumor shapes (i.e., irregular, lobular, oval and round) using the binary mask segmented in the previous stage (cGAN output).

  3. An in-depth evaluation of our system’s performance using two public (1,274 images) and one private (300 images) databases. The obtained results outperform current state-of-the-art in both tumor segmentation and shape classification.

  4. A study of the correlation between the tumor shape and molecular subtypes of breast cancer is also provided.

This paper is organized as follows. Section 2 reviews related work on both tumor segmentation and shape classification. The proposed architectures for tumor segmentation (using cGAN) and shape classification (using CNN) are described in Section 3. In Section 4, extensive experiments are performed on the two stages of the proposed method and the obtained results are compared with state-of-the-art results. The limitations of the proposed models are also explained in Section 4. Finally, Section 5 concludes our work and suggests some future lines of research.

2 Related Work

2.1 Tumor Segmentation Background

Convolutional Neural Networks (CNNs) can automatically learn features from the given images to represent objects at different scales and orientations. By increasing the number of layers (depth of the CNN model), more detailed features can be obtained, which play a crucial part in solving different computer vision problems, such as object detection, classification and segmentation. Thus, numerous methods have been proposed to solve the image segmentation problem based on deep learning approaches schmidhuber2015deep . One of the well-known architectures for semantic segmentation is the Fully Convolutional Network (FCN) long2015fully , which is based on encoding (convolutional) and decoding (deconvolutional) layers. This approach gets rid of the fully connected layers of CNNs to convert image classification networks into image filtering networks. An improvement of this scheme was proposed by the U-Net architecture ronneberger2015u , where skip connections between encoding and decoding layers are added to retain significant information from the input features. Later on, a new variation of FCN named SegNet was proposed badrinarayanan2017segnet , which consists of a hierarchy of decoders, one for each encoder. The decoder network uses the max-pooling indices received from the corresponding encoder to perform non-linear upsampling of its input feature maps.

Since semantic segmentation has achieved great progress with deep learning, such models have recently become popular in medical imaging litjens2017survey . For instance, to segment skin lesions on dermoscopic images, the SLSDeep model sarker2018slsdeep was proposed to upscale the feature maps from the encoding layers at multiple scales to preserve small details (e.g., lesion borders). In fu2018joint , a multi-scale deep model with multi-level loss was proposed for segmenting the optic disc and cup in fundus images. Also, singh2018retinal proposed a GAN to segment the optic disc from fundus images. Many segmentation approaches can be trained from scratch tajbakhsh2016convolutional , but they can also reuse the weights obtained for the starting CNN layers of other architectures such as ResNet he2016deep and VGG simonyan2014very trained on ImageNet data deng2009imagenet .

Regarding breast tumor segmentation, many works have been proposed. A tumor classification and segmentation method was proposed in rouhi2015benign using an automated region growing algorithm whose threshold was obtained by a trained Artificial Neural Network (ANN) and Cellular Neural Network (CeNN). In turn, to reduce the computational complexity and increase the robustness, a quantized and non-linear CeNN for breast tumor segmentation was proposed in liu2018efficient . After segmenting the breast tumor region, a Multilayer Perceptron Classifier was used to classify the tumor as benign or malignant.

Furthermore, Dhungel et al. dhungel2015deep segmented breast tumors using Structured Support Vector Machines (SSVM) and Conditional Random Fields (CRF). Both graphical models minimize a loss function built on pixel probabilities provided by a CNN and a Deep Belief Network, a Gaussian Mixture Model (GMM) and a shape prior. The SSVM is based on graph cuts and the CRF relies on tree re-weighted belief propagation with truncated fitting training dhungel2015tree . Cardoso et al. cardoso2015 ; cardoso2017mass tackled the same problem by employing a closed contour fitting in the mammogram and minimizing a cost function depending on the radial derivative of the tumor contour. A measure of regularity of the gray pixel values inside and outside the tumor was also included in cardoso2017mass .

In turn, Zhu et al. zhu2018adversarial proposed an FCN concatenated to a CRF layer to impose the compactness of the segmentation output taking into account pixel position. This approach was trained end-to-end, since the CRF and FCN can exchange data in the forward-backward propagation. An adversarial term was introduced to guard against the samples with the worst perturbation in the loss function, which reduced overfitting and provided robust learning with few training samples. In addition, Al-antari et al. al2018fully proposed a CAD system consisting of three deep learning stages for detecting, segmenting and classifying the tumors in mammographic images. To locate tumors in a full mammogram, the YOLO network proposed in redmon2016you was used. A Full resolution Convolutional Network (FrCN) was then used for segmenting the located tumor region. Finally, a CNN was used for classifying the segmented ROI as either benign or malignant.

We believe that yang2017automatic is the first work that exploits GAN goodfellow2014 for medical image segmentation. In particular, they performed three-dimensional (3D) liver segmentations using abdominal computerized tomography (CT) scans. In singh2018conditional , we adapted a cGAN image-to-image translation algorithm isola2017image to address tumor segmentation in two-dimensional (2D) mammograms. Our system provided state-of-the-art performance on both public and private databases.

2.2 Shape Classification Background

In the literature, many approaches based on deep learning architectures have been designed recently for 2D and 3D shape classification kurnianggoro2018survey . For example, topological data analysis (TDA) using deep learning was proposed in hofer2017deep to extract relevant 2D/3D topological and geometrical information. In turn, a CNN model was formulated that used spectral graph wavelets in conjunction with the Bag of Features paradigm to target the shape classification problem masoumi2017 . In addition, the authors in fang20153d proposed a CNN based shape descriptor for retrieving 3D shapes. A deep neural network named PointNet was proposed in qi2017pointnet , which directly consumes point clouds for object classification and for localized and global semantic segmentation. Moreover, a deep learning framework for efficient 3D shape classification luciano2018deep used geodesic moments by inheriting various properties from the geodesic distance, such as the intrinsic geometric structure of 3D shapes and the invariance to isometric deformations.

To date, numerous shape classification methods have been applied to medical image analysis litjens2017survey . Fourier shape descriptors with a CNN were used in xie2018fusing to characterize lung nodule heterogeneity in CT scans. A CNN architecture coupled with a neighboring ensemble predictor invariant to the neighborhood was proposed in sirinukunwattana2016locality for nucleus detection and classification in histological images.

An automated method for textual description of anatomical breast tumor lesions was proposed by Kisilev et al. kisilev2015semantic , which performs joint semantic estimation from image measurements to classify the tumor shape. In addition, Kisilev et al. kisilev2016medical also presented a multi-task fast region-based CNN ren2015faster to classify three tumor shapes: irregular, oval and round. Furthermore, the work in kim2018icadx utilized a GAN to diagnose and classify tumors in mammograms into four shapes: irregular, lobular, oval and round. Previously, Singh et al. singh2018conditional proposed a multi-class CNN to categorize the tumor shapes from the public DDSM dataset into four classes, as in kim2018icadx .

3 Proposed Methodology

The proposed CAD system shown in Fig. 1 is divided into two stages: breast tumor segmentation and shape classification. In the first stage, mammograms are pre-processed for noise removal (Gaussian filter with σ = 0.5) and then contrast is enhanced using histogram equalization (pixel values are rescaled to the range [0, 1]). Afterwards, the cGAN input is prepared by rescaling the image crops to 256×256 pixels containing different framings of the breast tumor region (ROI): full mammogram, loose and tight frames (see Fig. 6). The prepared data is then fed to the cGAN to obtain a binary mask of the breast tumor, which is post-processed using morphological operations (3×3 closing, 2×2 erosion, and 3×3 dilation) to remove small speckles. In the second stage, the output binary mask is downsampled to 64×64 pixels and then fed to a multi-class CNN shape descriptor that categorizes it into four classes: irregular, lobular, oval and round.
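The post-processing step can be illustrated with a minimal pure-Python sketch of the closing, erosion and dilation sequence on a small binary mask. The function names (`dilate`, `erode`, `postprocess`), the square structuring elements, the anchor position for the even-sized 2×2 element and the treatment of out-of-image pixels as background are all assumptions made for illustration; the paper does not state which library or border handling was used.

```python
def _get(mask, r, c):
    """Pixel value at (r, c); positions outside the image count as background (assumption)."""
    if 0 <= r < len(mask) and 0 <= c < len(mask[0]):
        return mask[r][c]
    return 0


def _offsets(k):
    """Offsets covered by a k x k square element (for k = 2 the anchor is its bottom-right pixel)."""
    lo = -(k // 2)
    return range(lo, lo + k)


def dilate(mask, k):
    """A pixel becomes 1 if any pixel under the k x k element is 1."""
    offs = _offsets(k)
    return [[int(any(_get(mask, r + dr, c + dc) for dr in offs for dc in offs))
             for c in range(len(mask[0]))] for r in range(len(mask))]


def erode(mask, k):
    """A pixel stays 1 only if every pixel under the k x k element is 1."""
    offs = _offsets(k)
    return [[int(all(_get(mask, r + dr, c + dc) for dr in offs for dc in offs))
             for c in range(len(mask[0]))] for r in range(len(mask))]


def postprocess(mask):
    """3x3 closing (dilate, then erode), 2x2 erosion, then 3x3 dilation."""
    closed = erode(dilate(mask, 3), 3)
    return dilate(erode(closed, 2), 3)
```

With this sequence an isolated one-pixel speckle does not survive the 2×2 erosion, while a compact tumor region is kept (and slightly enlarged by the final dilation).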

3.1 Tumor Segmentation Model (cGAN)

Figure 2: Proposed cGAN architecture: generator G (top), and discriminator D (bottom).

Our previous work singh2018conditional demonstrated the feasibility of applying the cGAN image-to-image translation approach isola2017image to breast tumor segmentation, since it can be adapted to our problem in the following senses:

  1. The Generator network of the cGAN is an FCN composed of encoding and decoding layers, which learn the intrinsic features (gray-level, texture, gradients, edges, shape, etc.) of healthy and unhealthy (tumor) breast tissue, and generate a binary mask according to these features.

  2. The Discriminative network of the cGAN assesses if a given binary mask is likely to be a realistic segmentation or not. Therefore, including the adversarial score in the computation of the generator loss strengthens its capability to provide a correct segmentation.

The combination of the G and D networks allows robust learning with few training samples. Since the ROI image is a conditioning input for both G and D, the segmentation result is better fitted to the tumor appearance. In contrast, a regular (unconditional) GAN goodfellow2014 infers the segmentation just from random noise, which requires more training iterations than the cGAN to obtain an acceptable segmentation result.

Fig. 2 represents the suggested architectures for G and D. The former consists of several encoding and decoding layers (see Fig. 2, top). Encoding layers are composed of a set of convolutional filters followed by batch normalization and the leaky ReLU activation function. Similarly, decoding layers are composed of a set of deconvolutional filters followed by batch normalization, dropout and ReLU.

Convolutional and deconvolutional filters are defined with a kernel of 4×4 and stride of 2×2, which respectively downsample and upsample the activation maps by a factor of 2. Batch normalization is not applied after the first and the last convolutional filters. After the last convolutional filter, the ReLU activation function is applied instead of leaky ReLU. Dropout is applied only at the first three decoding layers. There is no skip connection in the last decoding layer, after which an activation function is applied to generate a binary mask of the breast tumor.

The architecture of D shown in Fig. 2 (bottom) consists of five encoding layers with convolutional filters with a kernel of 4×4, stride 2×2 at the first three layers and stride 1×1 at the fourth and fifth layers. Batch normalization is applied after three of the layers, and a leaky ReLU is applied after each layer except for the last one. The sigmoid activation function is used after the last convolutional filter. The network input is the concatenation of the ROI and the binary mask to be evaluated (ground truth or predicted). The output is an array of 30×30 values, each one ranging from 0 (completely fake) to 1 (perfectly plausible or real). Each output value is the likelihood that a crop of the binary mask is a proper segmentation of the corresponding crop of the input image, with a 70×70 receptive field for each value.
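The 30×30 output size and the 70×70 receptive field follow from the kernel and stride settings listed above, as the short calculation below illustrates. It is only a sketch: the padding of 1 pixel per side and the helper names (`conv_out`, `output_size`, `receptive_field`, `D_LAYERS`) are assumptions, since the text does not state the padding, and the 256×256 input size is taken from the pre-processing description.

```python
# (kernel, stride) of the five discriminator layers described in the text
D_LAYERS = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]


def conv_out(size, kernel, stride, padding):
    """Spatial output size of one convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def output_size(size, layers, padding=1):
    """Propagate an input size through all layers (padding of 1 is an assumption)."""
    for kernel, stride in layers:
        size = conv_out(size, kernel, stride, padding)
    return size


def receptive_field(layers):
    """Input pixels seen by one output value, accumulated from the last layer back to the first."""
    rf = 1
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf


print(output_size(256, D_LAYERS))   # 30
print(receptive_field(D_LAYERS))    # 70
```

Under these assumptions the spatial size shrinks as 256, 128, 64, 32, 31, 30, which matches the 30×30 array of patch scores.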

Figure 3: Proposed cGAN framework based on dice and BCE losses.

Let $x$ be a tumor ROI, $y$ the ground truth mask, $z$ a random variable, $\lambda$ an empirical weighting factor, and $G(x,z)$ and $D(x,G(x,z))$ the outputs of G and D, respectively. Then, the loss function of G is defined as:

$$\ell_{Gen}(G,D) = \mathbb{E}_{x,y,z}\big[-\log D(x,G(x,z))\big] + \lambda\, \mathbb{E}_{x,y,z}\big[\ell_{Dice}(y,G(x,z))\big] \qquad (1)$$

where $z$ is introduced as dropout in the first three decoding layers at both training and testing phases, which provides stochasticity to generalize the learning processes and avoid overfitting.

The optimization process of G will try to minimize both expected values, i.e., the $D(x,G(x,z))$ values should approach 1 (correct tumor segmentations), and the dice loss should approach 0 (generated masks are equal to ground truth). Both terms of the generator loss enforce the proper optimization of G: the dice loss term fosters a rough prediction of the mask shape (central tumor area) while the adversarial term fosters an accurate prediction of the mask outline (tumor borders). Neglecting one of the two terms may lead to either very poor segmentation results or slow learning speed.

In addition, $\ell_{Dice}(y,G(x,z))$ is the dice loss of the predicted mask with respect to the ground truth, which is defined as:

$$\ell_{Dice}(y,G(x,z)) = 1 - \frac{2\,|y \circ G(x,z)|}{|y| + |G(x,z)|} \qquad (2)$$

where $\circ$ is the pixel-wise multiplication of the two images and $|\cdot|$ is the total sum of pixel values of a given image. If inputs are binary images, then each pixel can be considered as a boolean value (white is 1, black is 0). The fraction in (2) is equivalent to the dice coefficient, i.e., $2TP/(2TP+FP+FN)$, but it must be subtracted from 1 because the loss function will be minimized. Let $y$ be the ground truth of the ROI and $G(x,z)$ the segmented region. Then the true positive degree (TP) is defined as $|y \circ G(x,z)|$, which is the area of the segmented region common to both $y$ and $G(x,z)$. The false positive degree (FP) is defined as $|(1-y) \circ G(x,z)|$, which is the segmented area not belonging to $y$. Similarly, the false negative degree (FN) is defined as $|y \circ (1-G(x,z))|$, which is the true area missed by the proposed segmentation method.
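As a small worked example of the dice loss, the sketch below evaluates it on two tiny binary masks and checks that it agrees with one minus the dice coefficient computed from TP, FP and FN. The function names and the small `eps` term (added here only to avoid division by zero on empty masks) are assumptions, not part of the original formulation.

```python
def dice_loss(y_true, y_pred, eps=1e-7):
    """1 - 2*sum(y * y_hat) / (sum(y) + sum(y_hat)) for two equally sized 2D masks."""
    inter = sum(t * p for row_t, row_p in zip(y_true, y_pred)
                for t, p in zip(row_t, row_p))
    total = sum(map(sum, y_true)) + sum(map(sum, y_pred))
    return 1.0 - 2.0 * inter / (total + eps)   # eps is an illustrative safeguard


def confusion_counts(y_true, y_pred):
    """TP, FP and FN degrees expressed as sums of pixel-wise products."""
    pairs = [(t, p) for row_t, row_p in zip(y_true, y_pred)
             for t, p in zip(row_t, row_p)]
    tp = sum(t * p for t, p in pairs)
    fp = sum((1 - t) * p for t, p in pairs)
    fn = sum(t * (1 - p) for t, p in pairs)
    return tp, fp, fn


truth = [[1, 1, 0],
         [0, 1, 0]]
pred = [[1, 0, 0],
        [0, 1, 1]]
tp, fp, fn = confusion_counts(truth, pred)          # 2, 1, 1
print(dice_loss(truth, pred))                       # about 0.333
print(1 - 2 * tp / (2 * tp + fp + fn))              # same value from TP/FP/FN
```

Because the products also work for values between 0 and 1, the same function can be applied to soft (probabilistic) generator outputs during training.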

In our previous work singh2018conditional , the generator network loss was formulated by combining the logistic Binary Cross Entropy (BCE) loss and the L1-norm. In this work, we replace the L1-norm loss with the dice loss, as shown in Fig. 3. The L1-norm loss minimizes the sum of absolute differences between the ground truth label and the estimated binary mask obtained from the generator network, which takes all pixels into account. In turn, the dice loss is highly dependent on TP predictions, which are the most influential term in foreground segmentation. Fig. 4 shows that the dice loss achieves lower (better) values than the L1-norm loss.

Figure 4: Dice and L1-norm loss comparison over iterations.

Moreover, the loss function of D is defined in (3):

$$\ell_{Dis}(G,D) = \mathbb{E}_{x,y}\big[-\log D(x,y)\big] + \mathbb{E}_{x,y,z}\big[-\log\big(1 - D(x,G(x,z))\big)\big] \qquad (3)$$

where, as in (1), $x$ is the tumor ROI, $y$ the ground truth mask and $z$ the random variable. The optimizer will fit D to maximize its output values for ground truth masks (by minimizing $-\log D(x,y)$) and to minimize its output values for generated masks (by minimizing $-\log(1 - D(x,G(x,z)))$). These two terms compute the BCE loss using both masks, assuming that the expected class for ground truth and generated masks is 1 and 0, respectively.

The optimization of G and D is done concurrently, i.e., one optimization step for both networks at each iteration, where G learns how to compute valid tumor segmentations and D learns how to differentiate between synthetic and real segmentations.
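The interplay of the two adversarial objectives can be illustrated numerically. The sketch below treats the discriminator outputs as plain lists of patch scores in (0, 1) and computes the BCE-based discriminator loss and the generator loss with a dice term. The function names, the averaging over patches, the clamping constant and the value passed for the weighting factor `lam` are assumptions made for illustration only.

```python
import math


def bce(prob, target, eps=1e-7):
    """Binary cross entropy of one score against an expected class (0 or 1)."""
    prob = min(max(prob, eps), 1.0 - eps)      # clamp to keep log() finite
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def discriminator_loss(d_real, d_fake):
    """Ground truth masks are expected to score 1, generated masks 0."""
    return mean(bce(p, 1) for p in d_real) + mean(bce(p, 0) for p in d_fake)


def generator_loss(d_fake, dice, lam):
    """Adversarial term (generated masks should score 1) plus weighted dice loss."""
    return mean(bce(p, 1) for p in d_fake) + lam * dice


# An undecided discriminator (all scores 0.5) gives 2*ln(2) as discriminator loss.
print(discriminator_loss([0.5, 0.5], [0.5, 0.5]))
# A confident, correct discriminator gives a lower discriminator loss
# and a higher adversarial penalty for the generator.
print(discriminator_loss([0.9], [0.1]), generator_loss([0.1], dice=0.2, lam=1.0))
```

In a concurrent training loop, one step would minimize `discriminator_loss` with respect to D and one step would minimize `generator_loss` with respect to G at each iteration.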

In this work, we experimented with different hyper-parameters to improve the segmentation accuracy of our previous contribution in singh2018conditional . Besides introducing the dice loss, we have reduced the number of filters of each network from 64 to 32. We also explored different learning rates and loss optimizers (SGD, AdaGrad, Adadelta, RMSProp and Adam), finding Adam with β1 = 0.5, β2 = 0.999 and initial learning rate = 0.0002 with batch size 8 to be the best combination. In (1), the value of the dice loss weighting factor was also selected empirically. Finally, the best results were achieved by training both G and D from scratch for 150 epochs.
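To make the role of the reported optimizer settings concrete, here is a minimal scalar sketch of one Adam update using the stated values (learning rate 0.0002, β1 = 0.5, β2 = 0.999). It is not the training code of the paper; the function name `adam_step` and the `eps` default are assumptions. In PyTorch, the same settings would typically be passed as `torch.optim.Adam(params, lr=0.0002, betas=(0.5, 0.999))`.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.0002, beta1=0.5, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad            # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad * grad     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                  # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# On the first step the update magnitude is close to the learning rate,
# whatever the gradient scale.
p, m, v = adam_step(param=1.0, grad=3.0, m=0.0, v=0.0, t=1)
print(p)   # about 0.9998
```

A lower β1 than the common default of 0.9 shortens the memory of the gradient average, which is a frequent choice when training adversarial networks.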

3.2 Shape Classification Model (CNN)

We propose a multi-class CNN architecture for breast tumor shape classification (i.e., irregular, lobular, oval and round) using the binary masks obtained from the cGAN. In the literature, most methods attempt to categorize the shape directly from breast tumor intensity, texture, boundary, etc. (kisilev2015semantic ; kisilev2016medical ; ren2015faster ; kim2018icadx ), which increases computational complexity. We simplify the problem by extracting morphological features from binary masks only.

Figure 5: CNN architecture for tumor shape classification.

As shown in Fig. 5, our model consists of three convolutional layers with kernel sizes 9×9, 5×5 and 4×4, respectively, and two fully connected (FC) layers. The first two convolutional layers are followed by 4×4 max-pooling with stride 4×4. The output of the last convolutional layer is flattened and then fed into the first FC layer with 128 neurons. These four layers use ReLU as activation function. A dropout of 0.5 is used to reduce overfitting in the first FC layer. Finally, the last FC layer with 4 neurons applies the softmax function to generate the final membership degree of the input binary mask to each class. A weighted categorical cross-entropy loss is used to mitigate the class imbalance of the dataset. The class weight is one minus the ratio of samples per class to the total number of samples.
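The class weighting rule can be written out in a few lines. The sketch below computes the weights as one minus the class frequency and applies them in a weighted categorical cross-entropy for a single sample. The class counts used in the example are hypothetical placeholders, not the DDSM figures, and the function names are assumptions.

```python
import math


def class_weights(counts):
    """Weight of each class: 1 - (samples in class / total samples)."""
    total = sum(counts)
    return [1.0 - n / total for n in counts]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def weighted_cross_entropy(logits, true_idx, weights):
    """Cross-entropy of one sample, scaled by the weight of its true class."""
    probs = softmax(logits)
    return -weights[true_idx] * math.log(probs[true_idx])


# Hypothetical counts for (irregular, lobular, oval, round); illustration only.
weights = class_weights([500, 300, 150, 50])
print(weights)                                        # rarer classes get larger weights
print(weighted_cross_entropy([0, 0, 0, 0], 3, weights))
```

With this rule, an error on a rare class contributes more to the loss than the same error on a frequent class.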

RMSProp is employed for optimizing the model with learning rate = 0.001, momentum = 0.9 and batch size = 16. The network is trained from scratch and the weights of its five layers are randomly initialized. The architecture, number of layers, filters per layer and number of neurons in the FC layers were selected experimentally during training.

4 Experiments and Discussion

We have evaluated the performance of proposed models on two public mammography datasets and one private dataset:

INbreast dataset

It is a publicly available database of mammography cases, which include masses, calcifications, asymmetries and distortions. For testing our segmentation model, we used 106 breast tumor images along with their respective ground truth binary masks.

DDSM dataset

It is a publicly available digital database for screening mammography. In this work, cases of breast tumors with their corresponding ground truths are used for shape classification, where each tumor is labeled as irregular, lobular, oval or round. We have used one part of the images for training and the rest for testing the tumor shape classification model.

Hospital Sant Joan de Reus dataset

It is our private dataset that contains 300 malignant tumors (123 Luminal-A, 107 Luminal-B, 33 Her-2 and 37 Basal-like) with their respective ground truth binary masks obtained by radiologists. The proposed cGAN segmentation model is trained on one part of these images and tested on the remainder. The duty of confidentiality and the security measures were fully complied with, in accordance with the current legislation on the Protection of Personal Data (article 7.1 of the Organic Law 15/1999, 13th of December).

The proposed method was implemented using Python with PyTorch, running on a 64-bit Ubuntu operating system using a 3.4 GHz Intel Core-i7 with 16 GB of RAM and an Nvidia GTX 1070 GPU with 8 GB of video RAM.

4.1 Tumor Segmentation Experiments

The proposed breast tumor segmentation method is compared with state-of-the-art methods and evaluated both quantitatively and qualitatively. For the quantitative analysis, segmentation accuracy is computed using the Dice coefficient (F1 score) and the Jaccard index (IoU). In turn, for the qualitative analysis, segmentation results are compared visually with their respective ground truth binary masks.
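For reference, both evaluation metrics can be expressed through the pixel counts TP, FP and FN of a single predicted mask, and for a single mask they are tied by the identity Dice = 2·IoU / (1 + IoU). The sketch below shows this relationship; the function names are assumptions, and the example counts are arbitrary.

```python
def dice_from_counts(tp, fp, fn):
    """Dice coefficient (F1 score) of one mask: 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)


def iou_from_counts(tp, fp, fn):
    """Jaccard index (IoU) of one mask: TP / (TP + FP + FN)."""
    return tp / (tp + fp + fn)


def iou_to_dice(iou):
    """Per-mask conversion between the two metrics."""
    return 2 * iou / (1 + iou)


tp, fp, fn = 2, 1, 1
print(dice_from_counts(tp, fp, fn))               # about 0.667
print(iou_from_counts(tp, fp, fn))                # 0.5
print(iou_to_dice(iou_from_counts(tp, fp, fn)))   # same as the Dice value above
```

The identity holds for each individual mask; it does not have to hold between dataset-level averages of the two metrics.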

These experiments have been carried out using three different framings of the tumor ROI: full mammogram, loose and tight frames (see Fig. 6). The ideal CAD system should be able to automatically segment the breast tumor from a full mammogram. However, this is a very difficult task due to the high similarity between the gray level pixel distributions of healthy and tumorous tissue. Therefore, removing most of the non-ROI portions of the image helps the model learn the visual features that differentiate breast tumor from non-tumor areas. The loose frame provides a balanced proportion between the number of pixels of the two classes. The tight frame is intended to evaluate the behavior of the segmentation model when the majority of the ROI contains tumor pixels. Experimentally, for detecting the tight frame, we used the deep model Single Shot Detector (SSD), recently proposed in liu2016ssd . In turn, the loose frame is selected by doubling the size of the tight frame in each coordinate (see Fig. 8).
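One possible reading of the loose frame construction is sketched below: the tight bounding box is doubled in width and height around its center and then clipped to the image. The function name `loose_frame`, the (x0, y0, x1, y1) box convention, the centering and the clipping are assumptions made for illustration; the text only states that the size is doubled in each coordinate.

```python
def loose_frame(box, image_w, image_h, scale=2.0):
    """Expand a tight box (x0, y0, x1, y1) by `scale` about its center, clipped to the image."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w = (x1 - x0) * scale / 2
    half_h = (y1 - y0) * scale / 2
    return (max(0, int(round(cx - half_w))),
            max(0, int(round(cy - half_h))),
            min(image_w, int(round(cx + half_w))),
            min(image_h, int(round(cy + half_h))))


print(loose_frame((100, 100, 200, 180), 1000, 1000))   # (50, 60, 250, 220)
print(loose_frame((10, 10, 110, 110), 150, 150))       # (0, 0, 150, 150), clipped
```

The resulting crop would then be rescaled to 256×256 pixels before being fed to the cGAN, as described in the methodology.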

Figure 6: Three cropping strategies: (a) full mammogram, (b) loose frame, (c) tight frame.

The three cropping strategies are evaluated on our cGAN and ten baseline segmentation models, referred to as FCN, FCN-ResNet101, UNet, UNet-VGG16, SegNet, SegNet-VGG16, CRFCNN, SLSDeep, cGAN-ResNet101 and cGAN-ResNet101 (Dice Loss). FCN, UNet, SegNet, CRFCNN and the proposed cGAN are trained from scratch. FCN-ResNet101, UNet-VGG16, SegNet-VGG16 and cGAN-ResNet101 (with and without Dice loss) are modifications of the original models, where the filters of the starting encoding layers are replaced by the starting convolutional layers of the well-known VGG (16 layers) and ResNet (101 layers) models, which were pre-trained on the ImageNet database. Thus, we loaded the pre-trained weights and fine-tuned the network. When using cGAN-ResNet101 isola2017image , we also replaced the L1-norm loss with the Dice loss in the generator loss function to see how the baseline model behaves under such a change. We called this model cGAN-ResNet101 (Dice Loss) to compare its segmentation results with our proposal.

Dataset | Method | Dice Full (%) | Dice Loose (%) | Dice Tight (%) | IoU Full (%) | IoU Loose (%) | IoU Tight (%)
Private | SLSDeep | | | | | | 79.93
Private | cGAN-ResNet101 (Dice Loss) | | | | | |
Private | Proposed cGAN | 66.38 | 89.99 | 88.12 | 49.68 | 81.81 |
INbreast | cGAN-ResNet101 (Dice Loss) | | | | | |
INbreast | Proposed cGAN | 68.69 | 94.07 | 92.11 | 52.31 | 87.03 | 84.55
INbreast | Dhungel et al. dhungel2015deep | | | | | |
INbreast | Cardoso et al. cardoso2017mass | | | | | |
INbreast | Zhu et al. zhu2018adversarial | | | | | |
INbreast | Al-antari et al. al2018fully | | | | | |
Table 1: Dice and IoU metrics obtained with the proposed model and ten alternatives evaluated on the testing sets of our private and INbreast datasets, for the three cropping strategies. Best results are marked in bold. Dashes (-) indicate that results are not reported in referred papers.

The results depicted in Table 1 are divided in two sections, one for our private dataset and another for the INbreast dataset. Note that all models are trained on the private dataset, and then tested using our private dataset as well as the INbreast dataset without fine tuning.

According to the results, our method outperforms the compared state-of-the-art methods in all cases except for the IoU computed on tight crops of our private dataset, where the SLSDeep approach yielded the best IoU (79.93%) and our method yielded the second best result.

All models yielded their worst segmentation results for full mammograms compared to the other frame inputs, which is logical taking into account the difficulties stated earlier in this section. Most of the models obtained their best results for the tight frame crops, except for CRFCNN and our proposal, which yielded their best results for loose frame crops. However, the good results for tight crops may be due to the imbalance of tumor/non-tumor pixels, since the tumor class occupies most of the image area. The learning can be biased towards this class, so that rough solutions (almost everything is tumor) score very highly. Loose frame crops, on the contrary, have a more balanced proportion of pixels for both classes, which makes them ideal to train and evaluate the model in a realistic situation: it is more convenient for radiologists to quickly draw an approximate frame around the breast tumor than a tight frame.

Comparing the general results for both datasets, most methods performed better on INbreast than on the private dataset with loose and tight framing. This effect can be explained by the fact that INbreast provides more detailed ground truths, which leads to better testing results, even though all network training was conducted on our private dataset.

In general, our proposal has performed well in terms of both Dice and IoU metrics. For the private dataset, in the Dice/Loose frame column, our model (89.99%) ranks above the second best model, SegNet-VGG16. In the IoU/Loose frame column, our model (81.81%) ranks above the second best model, cGAN-ResNet101. For the INbreast dataset, our Loose frame results for Dice and IoU are again the best (94.07% and 87.03%), with cGAN-ResNet101 as the second best model for both metrics. The fact that the second best results are obtained by the cGAN-ResNet101 model indicates that the adversarial network really helps in training the generative network. In turn, the results obtained by the cGAN-ResNet101 (Dice Loss) mixture model lie between those of cGAN-ResNet101 and our proposal, since substituting the Dice loss term improves the accuracy of tumor segmentations.

For the INbreast dataset, we have included the results reported in four related papers dhungel2015deep , cardoso2017mass , zhu2018adversarial and al2018fully . For these methods, we could not compute the metrics for all columns, since their source code has not been released. Our method outperformed the first three papers under similar framework conditions. However, al2018fully yielded better results for Dice and IoU than our model in the Tight frame columns, while our results in the Loose frame columns surpass their results. For a fair comparison, however, it should be checked how the referenced methods would perform on loose frame crops.

Figure 7: Boxplot of Dice (top) and IoU (bottom) scores of five models compared to the proposed cGAN on loose frames of the test subset of the INbreast dataset (106 samples).

The box-plot in Figure 7 shows Dice and IoU values obtained for the 106 testing samples from the INbreast dataset with loose frames using FCN-ResNet101, UNet-VGG16, SegNet-VGG16, SLSDeep, cGAN-ResNet101 and the proposed cGAN. The two models based on cGAN provide small ranges of Dice and IoU values. For instance, the proposed cGAN is in the range 0.89 to 0.93 for the Dice coefficient and 0.80 to 0.91 for IoU values, while other deep segmentation methods, SLSDeep, UNet-VGG16 and FCN-ResNet101, show a wider range of values. Moreover, there are many outliers in the results of the cGAN using pre-trained ResNet101 layers, whereas our cGAN trained from scratch produces few outliers.

Figure 8: Segmentation results of two testing samples extracted from the INbreast dataset with the three cropping strategies.

The high Dice and IoU metrics obtained by our model empirically support our hypothesis that it achieves accurate tumor segmentation. In Fig. 8, we show some examples of our model’s segmentations using two tumors from the INbreast dataset and applying all three cropping strategies. For each experiment, we show the original ROI image and the comparison of the predicted and ground truth masks, color coded to mark the true positives (TP: yellow), false negatives (FN: red), false positives (FP: green) and true negatives (TN: black). For the full mammogram, the ROI image (1) is an example of good segmentation, since the yellow and black pixels depict a high degree of agreement between the predicted and real masks. On the contrary, the ROI image (2) is an example of poor segmentation, since red pixels mark a large portion of the breast tumor area that has been misclassified as healthy area (FN). At the same time, a tiny region of green pixels shows the misclassification of healthy tissue as breast tumor area (FP). Nevertheless, even in this second segmentation, there is a very high rate of black pixels (TN), which indicates that the model easily recognizes non-tumor areas.
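The color coding described above can be reproduced with a few lines of code. The sketch below maps each pixel of a predicted and a ground-truth binary mask to the RGB color of its category, following the legend in the text (TP yellow, FN red, FP green, TN black). The helper names `label_pixel` and `color_code`, the nested-list mask format and the toy masks are assumptions made for illustration.

```python
# RGB colors following the legend used in the text.
COLORS = {
    "TP": (255, 255, 0),  # yellow: tumor predicted as tumor
    "FN": (255, 0, 0),    # red: tumor predicted as healthy
    "FP": (0, 255, 0),    # green: healthy predicted as tumor
    "TN": (0, 0, 0),      # black: healthy predicted as healthy
}


def label_pixel(predicted, actual):
    """Return the confusion category of one pixel of two binary masks."""
    if predicted and actual:
        return "TP"
    if actual:
        return "FN"
    if predicted:
        return "FP"
    return "TN"


def color_code(pred, truth):
    """Build an RGB overlay (nested lists of tuples) from two binary masks."""
    return [
        [COLORS[label_pixel(p, t)] for p, t in zip(pred_row, truth_row)]
        for pred_row, truth_row in zip(pred, truth)
    ]


# Toy 4x4 masks (hypothetical data).
truth = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

overlay = color_code(pred, truth)
```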

In the loose frame segmentations (middle row), especially in example (2), the results contain very few FN and FP pixels. For example (1), a modest number of green pixels indicates that our model expands the tumor segmentation beyond its ground truth. In the tight frame crops (bottom row), besides the green areas, our model has also missed some tumor areas, i.e., the red pixels (FN). The mistaken areas (red and green) lie mostly around the tumor borders, since these areas contain a mixture of healthy and unhealthy cells. At the same time, the inner part of the tumor as well as the image regions outside the tumors are properly classified, which indicates the stability of our model.

Figure 9: Segmentation results of seven models with the INbreast dataset and two cropping strategies: loose frame (the first four rows) and tight frame (the last four rows). (Col 1) original images, (Col 2) FCN-ResNet101, (Col 3) UNet-VGG16, (Col 4) SegNet-VGG16, (Col 5) CRFCNN, (Col 6) SLSDeep, (Col 7) cGAN-ResNet101, and (Col 8) proposed cGAN.

Fig. 9 shows a comparison between our model and six other segmentation models on loose and tight frame crops of four tumors from the INbreast dataset. For the loose frame cases (four top rows), our method clearly outperforms the rest for all tumors except the second one, where the majority of models provided a similar degree of accuracy. For these four tumors, UNet-VGG16 and CRFCNN provided the worst results. Moreover, cGAN-ResNet101 also performed poorly on the fourth example.

For the tight frame cases (four bottom rows), our method also provides the lowest degrees of FN and FP compared to the rest of the models. Our cGAN and the cGAN-ResNet101 model yield irregular borders compared to FCN-ResNet101 and SLSDeep, since GAN models strive for higher accuracy on edges. However, in the third tight frame sample (seventh row), both cGAN-ResNet101 and our proposal generated an irregular border that slightly differs from the smooth ground truth border, which results in lower segmentation accuracy around the edges. Although the rest of the models generate smoother borders, the resulting segmentations may differ from the ground truth significantly.

From the experimental results, it can be concluded that the proposed breast tumor segmentation method is the most effective to date compared to the currently available state-of-the-art methods. However, our method needs a loose crop around the tumor to obtain a proper segmentation, which can be done by the SSD model. Our segmentation model contains about parameters for tuning the generator part in the cGAN network. In addition, our method is fast in both training i.e., around seconds per epoch ( loose frames) and predicting, around images per second. That is to times faster than the segmentation method proposed in al2018fully and 10 to 15 times faster than the FCN model.

4.2 Shape Classification Experiments

For validating the tumor shape classification performance, we computed the confusion matrix and the overall classification accuracy on the test set of the DDSM dataset. This set contains 292 images divided into 126, 117, 31 and 18 for irregular, lobular, oval and round classes, respectively.

For a quantitative comparison, we compared our model with three state-of-the-art tumor shape classification methods singh2018conditional ; kisilev2015semantic ; kim2018icadx . The three methods were evaluated on the DDSM dataset.

Prediction / Ground Truth Irregular Lobular Oval Round Total
Table 2: Confusion matrix of the tumor shape classification of testing samples of the DDSM dataset.

However, the DDSM dataset does not provide ground truth binary masks for breast tumor segmentation. Thus, we applied active contours Akram2015 , which were also used in our previous work singh2018conditional , to generate the ground truths of the breast tumor regions that were cropped by radiologists. Previously, kisilev2015semantic also used active contours lankton2008localizing to generate the ground truths in a similar fashion. In addition, for reliable performance results, we used a stratified 5-fold cross-validation with epochs per fold.
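To illustrate what a stratified 5-fold split involves, the sketch below distributes sample indices over five folds so that every fold keeps roughly the same class proportions. It is a plain-Python illustration under stated assumptions, not the procedure used in the experiments: the function `stratified_folds`, the seed and the class counts of the toy label list are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Split sample indices into n_folds folds with similar class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    offset = 0  # continue the round-robin across classes to balance fold sizes
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[(offset + position) % n_folds].append(index)
        offset += len(indices)
    return folds


# Hypothetical, imbalanced label list (class counts are illustrative only).
labels = ["irregular"] * 50 + ["lobular"] * 40 + ["oval"] * 15 + ["round"] * 10
folds = stratified_folds(labels, n_folds=5)

for k, validation in enumerate(folds):
    # All remaining folds form the training split of this round.
    training = [i for j, fold in enumerate(folds) if j != k for i in fold]
    # A model would be trained on `training` and evaluated on `validation` here.
    print(f"fold {k}: {len(training)} training / {len(validation)} validation samples")
```

Stratification matters for imbalanced data such as the shape classes: with a purely random split, a small class could be almost absent from some validation folds.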

In Table 2, the proposed method yielded around of classification accuracy for irregular and lobular classes. This result is logical, since both lobular and irregular shapes have similar irregular boundaries. In turn, our model yielded classification accuracies of and for oval and round shape classes, respectively.

Figure 10: Mean ROC curve of 5 folds, for TPR and FPR from shape classification result of 292 test images from DDSM dataset.

We have computed the overall accuracy of each method by averaging the correct predictions (i.e., true positives) of the four classes, weighted with respect to the number of samples per class. As shown in Table 3, our classifier yields an overall accuracy of , outperforming the second best results kim2018icadx ; singh2018conditional by . In turn, the Multi-task CNN kim2018icadx based on a pre-trained VGG-16 yielded the worst overall accuracy (), probably because the input mammograms are gray-scale images, while the VGG-16 network was trained on color images. In addition, Fig. 10 shows the ROC curve, illustrating that our model attained an AUC of about 0.8.
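The weighted averaging described above can be made concrete with a small example. The sketch below computes per-class accuracies from a confusion matrix and then the sample-weighted overall accuracy, which coincides with the fraction of correctly classified samples. The confusion matrix is a toy example with invented counts and is not Table 2; the convention that rows hold the ground truth and columns the predictions is also an assumption.

```python
CLASSES = ["irregular", "lobular", "oval", "round"]

# Toy confusion matrix (hypothetical counts, NOT the values of Table 2).
# Assumed convention: rows are ground truth classes, columns are predictions.
confusion = [
    [8, 2, 0, 0],  # irregular
    [1, 6, 1, 0],  # lobular
    [0, 0, 3, 1],  # oval
    [0, 0, 1, 1],  # round
]


def per_class_accuracy(matrix):
    """Fraction of correctly classified samples for every ground truth class."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def weighted_overall_accuracy(matrix):
    """Average of per-class accuracies, weighted by samples per class."""
    total = sum(sum(row) for row in matrix)
    accuracies = per_class_accuracy(matrix)
    return sum(sum(row) / total * acc for row, acc in zip(matrix, accuracies))


total_samples = sum(sum(row) for row in confusion)
overall = weighted_overall_accuracy(confusion)
# Equivalent shortcut: correctly classified samples over all samples.
diagonal_share = sum(confusion[i][i] for i in range(len(CLASSES))) / total_samples
unweighted_mean = sum(per_class_accuracy(confusion)) / len(CLASSES)

print(f"weighted overall accuracy: {overall:.3f}")
print(f"unweighted class mean:     {unweighted_mean:.3f}")
```

Because large classes dominate the weighted average, it can be noticeably higher than the unweighted mean when the small classes are classified less accurately, as in this toy matrix.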

Furthermore, the proposed shape descriptor contains 767,684 parameters; it can be trained in less than a second per epoch and predicts in about 6 milliseconds per image.

Methods Accuracy (%)
Kisilev et al. (SSVM) kisilev2015semantic  
Kim et al. (Multi-task CNN) kim2018icadx  
Kim et al. (ICADx) kim2018icadx  
Singh et al. singh2018conditional  
Proposed  80
Table 3: Shape classification overall accuracy with the DDSM dataset resulting from  kisilev2015semantic ; kim2018icadx ; singh2018conditional and our model. Best result is marked in bold.

4.3 Shape Features Correlation to Breast Cancer Molecular Subtypes

Shape Classes / Molecular Subtypes Irregular Lobular Oval Round Total
Table 4: Distribution of breast cancer molecular subtypes samples from the hospital dataset with respect to its predicted mask shape.

Tumor shape could play an important role in predicting breast cancer molecular subtypes tamaki2011correlation . Thus, we have computed the correlation between the molecular subtype classes of our in-house private dataset and the four shape classes. As shown in Table 4, most Luminal-A and -B samples (i.e., 96/123 and 82/107 for Luminal-A and -B, respectively) are assigned to the irregular and lobular shape classes. In turn, oval and round tumors are indicative of the Her-2 and Basal-like samples (i.e., 23/33 and 22/37 for Her-2 and Basal-like, respectively). Moreover, a moderate number of Basal-like images are assigned to the lobular class. Consequently, from a visual inspection, if the tumor shape is irregular or lobular, then the radiologist can suspect that it belongs to the Luminal group. In turn, if the tumor shape is round or oval, then it is more probable that the tumor is Her-2 or Basal-like tamaki2011correlation . Therefore, this study shows the importance of tumor shape, which can be considered a key feature for distinguishing between different malignancies of breast cancer.
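The fractions quoted above are easier to compare as percentages. The short sketch below converts the four counts given in the text into shares; only these counts come from the text, while the dictionary layout and variable names are illustrative.

```python
# (samples assigned to the dominant shape group, total samples of the subtype)
# Luminal subtypes: irregular or lobular; Her-2 and Basal-like: oval or round.
dominant_shape_counts = {
    "Luminal-A": (96, 123),
    "Luminal-B": (82, 107),
    "Her-2": (23, 33),
    "Basal-like": (22, 37),
}

shares = {
    subtype: assigned / total
    for subtype, (assigned, total) in dominant_shape_counts.items()
}

for subtype, (assigned, total) in dominant_shape_counts.items():
    print(f"{subtype}: {assigned}/{total} = {100 * shares[subtype]:.1f}%")
```

The shares are roughly 78% and 77% for Luminal-A and Luminal-B, about 70% for Her-2 and about 59% for Basal-like, so the association is clearest for the Luminal group and weakest for Basal-like samples.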

4.4 Limitations

For the segmentation stage, our model has two limitations. The first one is that prior information about the tumor location must be provided in order to center the tumor inside a loose frame crop and obtain the best accuracy. To alleviate this requirement, we propose to use the deep SSD model to localize the tumor region, which yields a fully automatic process instead of requiring a radiologist to detect the tumor region manually. The second limitation is that our cGAN model is designed to segment tumors that are fully contained in the ROI; otherwise, the model fails to segment them. As shown in Fig. 11, we found three samples that are mis-segmented because they contain two tumors: the one in the center, which is properly segmented, and another that is only partially visible at the lower-left border of the image, which is wrongly ignored as a non-tumor region (FN). Nevertheless, when the bigger tumor is located in the center of the crop, it is correctly segmented.

Figure 11: Three mis-segmented samples from the INbreast dataset containing tumors that are only partially visible. The red region at the lower-left border marks the missed tumor area (FN).

To classify the tumor shape, we depend only on the DDSM dataset to train our model, since it is the only public dataset that provides shape classification information. Thus, more databases containing more samples are required to improve the classification accuracy of the four shape classes.

To study the molecular subtypes of breast cancer, the Her-2 and Basal-like classes have fewer samples than the other two classes, Luminal-A and Luminal-B. Indeed, we used a weighted loss function to train our shape classification model in order to balance the four classes. However, we anticipate that increasing the number of samples of the Her-2 and Basal-like classes will improve the prediction of molecular subtypes from tumor shape information.
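The text does not specify how the loss was weighted. One common choice, shown here purely as an assumed illustration, is to weight the cross-entropy of each sample by the inverse frequency of its class. The class counts used below are the DDSM test-set sizes quoted in Section 4.2 and serve only as example frequencies; the function names and the probability values are hypothetical.

```python
import math


def inverse_frequency_weights(class_counts):
    """Weight w_c = N / (K * n_c), so that rare classes receive larger weights."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        name: total / (n_classes * count)
        for name, count in class_counts.items()
    }


def weighted_cross_entropy(probabilities, true_class, weights):
    """Loss of one sample: -w_y * log(p_y), with p_y the probability of the true class."""
    return -weights[true_class] * math.log(probabilities[true_class])


# Example frequencies only (test-set class sizes quoted in Section 4.2).
class_counts = {"irregular": 126, "lobular": 117, "oval": 31, "round": 18}
weights = inverse_frequency_weights(class_counts)

# The same predicted confidence costs more when the true class is rare.
uncertain = {"irregular": 0.5, "lobular": 0.2, "oval": 0.2, "round": 0.1}
loss_frequent = weighted_cross_entropy(uncertain, "irregular", weights)
uncertain_rare = {"irregular": 0.2, "lobular": 0.2, "oval": 0.1, "round": 0.5}
loss_rare = weighted_cross_entropy(uncertain_rare, "round", weights)

print({name: round(weight, 3) for name, weight in weights.items()})
```

With this normalization the weights average to one over the dataset (the sum of n_c * w_c equals N), so the overall scale of the loss stays comparable to the unweighted case.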

5 Conclusion

In this paper, we propose a two-stage breast tumor segmentation and classification method, which first segments the breast tumor ROI using a cGAN and then classifies its binary mask using a CNN-based shape descriptor.

The segmentation results reveal the importance of the adversarial network in the optimization of the generative network. cGAN-ResNet101 shows an improvement of about to in both Dice and IoU metrics in comparison to the other non-GAN methods. In turn, the proposed method yields an increment of about over the results of cGAN-ResNet101 by training our model from scratch, and replacing the -norm with the dice loss using loose frame crop on the given datasets. The breast tumor segmentation from full-mammograms yields low segmentation accuracy for all models including the proposed cGAN. For the tight frame crop, the proposed cGAN yields similar or better segmentation accuracy compared to the other methods.

The classification results show that our second stage properly infers the tumor shape from the binary mask of the breast tumor, which is obtained from the first stage (cGAN segmentation). Hence, we have empirically shown that our CNN focuses its learning on the morphological structure of the breast tumor, while the rest of the approaches (kisilev2015semantic , kim2018icadx , kisilev2016medical , ren2015faster ) rely on the original pixel variations of the input mammogram to make the same inference. Moreover, al2018fully used a hybrid strategy that includes the pixel variability within the mask of the breast tumor region to retain the intensity and texture information. However, the superior performance obtained by our method supports our initial idea that the second-stage CNN can reliably recognize the tumor shape based only on morphological information.

Furthermore, this paper provided a study of correlation between the tumor shape and the molecular subtypes of the breast cancer. Most samples of the Luminal-A and -B group are assigned to irregular shapes. In turn, the majority of Her-2 and Basal-like samples are assigned to regular shapes (e.g., oval and round shapes). That gives an indication that the tumor shape can be considered for inferring the molecular subtype of the tumor.

Future work aims at refining our multi-stage framework to detect other breast tumor features (e.g., margin type, micro-calcifications), which will be integrated into a more comprehensive diagnosis to compute the degree of malignancy of the breast tumors.

Conflict of interest

The authors declare that there is no conflict of interest.


  • (1) R. L. Siegel, K. D. Miller, A. Jemal, Cancer statistics, 2017, CA: a cancer journal for clinicians 67 (1) (2017) 7–30.
  • (2) B. Lauby-Secretan, C. Scoccianti, D. Loomis, L. Benbrahim-Tallaa, V. Bouvard, F. Bianchini, K. Straif, Breast-cancer screening-viewpoint of the IARC working group, New England Journal of Medicine 372 (24) (2015) 2353–2358.
  • (3) R. M. Rangayyan, S. Banik, J. L. Desautels, Computer-aided detection of architectural distortion in prior mammograms of interval cancer, Journal of Digital Imaging 23 (5) (2010) 611–631.
  • (4) J. Tang, R. M. Rangayyan, J. Xu, I. El Naqa, Y. Yang, Computer-aided detection and diagnosis of breast cancer with mammography: recent advances, IEEE Transactions on Information Technology in Biomedicine 13 (2) (2009) 236–251.
  • (5) Z. Jiao, X. Gao, Y. Wang, J. Li, A parasitic metric learning net for breast mass classification based on mammography, Pattern Recognition 75 (2018) 292–301.
  • (6) N. Cho, Molecular subtypes and imaging phenotypes of breast cancer, Ultrasonography 35 (4) (2016) 281.
  • (7) S. Liu, X.-D. Wu, W.-J. Xu, Q. Lin, X.-J. Liu, Y. Li, Is there a correlation between the presence of a spiculated mass on mammogram and luminal a subtype breast cancer?, Korean journal of radiology 17 (6) (2016) 846–852.
  • (8) K. Tamaki, T. Ishida, M. Miyashita, M. Amari, N. Ohuchi, N. Tamaki, H. Sasano, Correlation between mammographic findings and corresponding histopathology: potential predictors for biological characteristics of breast diseases, Cancer science 102 (12) (2011) 2179–2185.
  • (9) V. K. Singh, S. Romani, J. Torrents-Barrena, F. Akram, N. Pandey, M. M. K. Sarker, A. Saleh, M. Arenas, M. Arquez, D. Puig, Classification of breast cancer molecular subtypes from their micro-texture in mammograms using a vggnet-based convolutional neural network, in: Recent Advances in Artificial Intelligence Research and Development: Proceedings of the 20th International Conference of the Catalan Association for Artificial Intelligence, Deltebre, Terres de L’Ebre, Spain, October 25-27, 2017, Vol. 300, IOS Press, 2017, p. 76.
  • (10) P. Kisilev, E. Walach, S. Y. Hashoul, E. Barkan, B. Ophir, S. Alpert, Semantic description of medical image findings: structured learning approach, in: Proceedings of the British Machine Vision Conference (BMVC), 2015, pp. 171.1–171.11.
  • (11) S. T. Kim, H. Lee, H. G. Kim, Y. M. Ro, ICADx: Interpretable computer aided diagnosis of breast masses, in: Proceedings of the SPIE - Medical Imaging 2018: Computer-Aided Diagnosis, Vol. 10575, 2018.
  • (12) V. K. Singh, S. Romani, H. A. Rashwan, F. Akram, N. Pandey, M. M. K. Sarker, S. Abdulwahab, J. Torrents-Barrena, A. Saleh, M. Arquez, M. Arenas, D. Puig, Conditional generative adversarial and convolutional networks for x-ray breast mass segmentation and shape classification, in: Medical Image Computing and Computer Assisted Intervention - MICCAI 2018 - 21st International Conference, Granada, Spain, September 16-20, 2018, Proceedings, Part II, 2018, pp. 833–840. doi:10.1007/978-3-030-00934-2_92.
  • (13) J. Schmidhuber, Deep learning in neural networks: An overview, Neural networks 61 (2015) 85–117.
  • (14) J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), 2015, pp. 3431–3440.
  • (15) O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical image segmentation, in: Proceedings of the International Conference on Medical image computing and computer-assisted intervention (MICCAI), 2015, pp. 234–241.
  • (16) V. Badrinarayanan, A. Kendall, R. Cipolla, Segnet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12) (2017) 2481–2495.
  • (17) G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. van der Laak, B. Van Ginneken, C. I. Sánchez, A survey on deep learning in medical image analysis, Medical image analysis 42 (2017) 60–88.
  • (18) M. M. K. Sarker, H. A. Rashwan, F. Akram, S. F. Banu, A. Saleh, V. K. Singh, F. U. H. Chowdhury, S. Abdulwahab, S. Romani, P. Radeva, D. Puig, SLSDeep: Skin lesion segmentation based on dilated residual and pyramid pooling networks, in: Medical Image Computing and Computer Assisted Intervention - MICCAI 2018 - 21st International Conference, Granada, Spain, September 16-20, 2018, Proceedings, Part II, 2018, pp. 21–29. doi:10.1007/978-3-030-00934-2_3.
  • (19) H. Fu, J. Cheng, Y. Xu, D. W. K. Wong, J. Liu, X. Cao, Joint optic disc and cup segmentation based on multi-label deep network and polar transformation, arXiv preprint arXiv:1801.00926.
  • (20) V. K. Singh, H. Rashwan, F. Akram, N. Pandey, M. Sarker, M. Kamal, A. Saleh, S. Abdulwahab, N. Maaroof, S. Romani, et al., Retinal optic disc segmentation using conditional generative adversarial network, arXiv preprint arXiv:1806.03905.
  • (21) N. Tajbakhsh, J. Y. Shin, S. R. Gurudu, R. T. Hurst, C. B. Kendall, M. B. Gotway, J. Liang, Convolutional neural networks for medical image analysis: Full training or fine tuning?, IEEE transactions on medical imaging 35 (5) (2016) 1299–1312.
  • (22) K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
  • (23) K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556.
  • (24) J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, Imagenet: A large-scale hierarchical image database, in: Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, Ieee, 2009, pp. 248–255.
  • (25) R. Rouhi, M. Jafari, S. Kasaei, P. Keshavarzian, Benign and malignant breast tumors classification based on region growing and cnn segmentation, Expert Systems with Applications 42 (3) (2015) 990–1002.
  • (26) Z. Liu, C. Zhuo, X. Xu, Efficient segmentation method using quantised and non-linear cenn for breast tumour classification, Electronics Letters.
  • (27) N. Dhungel, G. Carneiro, A. P. Bradley, Deep learning and structured prediction for the segmentation of mass in mammograms, in: Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015, pp. 605–612.
  • (28) N. Dhungel, G. Carneiro, A. Bradley, Tree re-weighted belief propagation using deep learning potentials for mass segmentation from mammograms, in: 12th IEEE International Symposium on Biomedical Imaging, ISBI 2015, Brooklyn, NY, USA, April 16-19, 2015, 2015, pp. 760–763. doi:10.1109/ISBI.2015.7163983.
  • (29) J. S. Cardoso, I. Domingues, H. P. Oliveira, Closed shortest path in the original coordinates with an application to breast cancer, International Journal of Pattern Recognition and Artificial Intelligence 29 (1).
  • (30) J. S. Cardoso, N. Marques, N. Dhungel, G. Carneiro, A. Bradley, Mass segmentation in mammograms: A cross-sensor comparison of deep and tailored features, in: Proceedings of the IEEE International Conference on Image Processing (ICIP), 2017, pp. 1737–1741.
  • (31) W. Zhu, X. Xiang, T. D. Tran, G. D. Hager, X. Xie, Adversarial deep structured nets for mass segmentation from mammograms, in: Proceedings of the IEEE 15th International Symposium on Biomedical Imaging (ISBI), 2018, pp. 847–850.
  • (32) M. A. Al-antari, M. A. Al-masni, M.-T. Choi, S.-M. Han, T.-S. Kim, A fully integrated computer-aided diagnosis system for digital x-ray mammograms via deep learning detection, segmentation, and classification, International Journal of Medical Informatics 117 (2018) 44–54.
  • (33) J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-time object detection, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779–788.
  • (34) D. Yang, D. Xu, S. K. Zhou, B. Georgescu, M. Chen, S. Grbic, D. Metaxas, D. Comaniciu, Automatic liver segmentation using an adversarial image-to-image network, in: Proceedings of the International Conference on Medical image computing and computer-assisted intervention (MICCAI), 2017, pp. 507–515.
  • (35) I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in: Advances in neural information processing systems, 2014, pp. 2672–2680.
  • (36) P. Isola, J.-Y. Zhu, T. Zhou, A. A. Efros, Image-to-image translation with conditional adversarial networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 5967–5976.
  • (37) L. Kurnianggoro, K.-H. Jo, et al., A survey of 2d shape representation: Methods, evaluations, and future research directions, Neurocomputing 300 (2018) 1–16.
  • (38) C. Hofer, R. Kwitt, M. Niethammer, A. Uhl, Deep learning with topological signatures, in: Advances in Neural Information Processing Systems, 2017, pp. 1634–1644.
  • (39) M. Masoumi, A. B. Hamza, Spectral shape classification: A deep learning approach, Journal of Visual Communication and Image Representation 43 (2017) 198–211.
  • (40) Y. Fang, J. Xie, G. Dai, M. Wang, F. Zhu, T. Xu, E. Wong, 3d deep shape descriptor, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 2319–2328.
  • (41) C. R. Qi, H. Su, K. Mo, L. J. Guibas, Pointnet: Deep learning on point sets for 3d classification and segmentation, Proc. Computer Vision and Pattern Recognition (CVPR), IEEE 1 (2) (2017) 4.
  • (42) L. Luciano, A. B. Hamza, Deep learning with geodesic moments for 3d shape classification, Pattern Recognition Letters 105 (2018) 182–190.
  • (43) Y. Xie, J. Zhang, Y. Xia, M. Fulham, Y. Zhang, Fusing texture, shape and deep model-learned information at decision level for automated classification of lung nodules on chest ct, Information Fusion 42 (2018) 102–110.
  • (44) K. Sirinukunwattana, S. E. A. Raza, Y.-W. Tsang, D. R. Snead, I. A. Cree, N. M. Rajpoot, Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images, IEEE transactions on medical imaging 35 (5) (2016) 1196–1206.
  • (45) P. Kisilev, E. Sason, E. Barkan, S. Hashoul, Medical image description using multi-task-loss cnn, in: Deep Learning and Data Labeling for Medical Applications, Springer, 2016, pp. 121–129.
  • (46) S. Ren, K. He, R. Girshick, J. Sun, Faster r-cnn: Towards real-time object detection with region proposal networks, in: Advances in neural information processing systems, 2015, pp. 91–99.
  • (47) W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, A. C. Berg, Ssd: Single shot multibox detector, in: European conference on computer vision, Springer, 2016, pp. 21–37.
  • (48) F. Akram, J. Kim, C. Lee, K. N. Choi, Segmentation of regions of interest using active contours with SPF function, Comp. Math. Methods in Medicine 2015 (2015) 710326:1–710326:14. doi:10.1155/2015/710326.
  • (49) S. Lankton, A. Tannenbaum, Localizing region-based active contours, IEEE transactions on image processing 17 (11) (2008) 2029–2039.