Recent studies show that breast cancer remains one of the most threatening cancers for women, accounting for 29% of cancer incidence and 15% of cancer mortality in women. Early diagnosis of breast cancer is vital for patient survival. Mammography is one of the most effective and efficient breast cancer screening tools. However, reading mammograms is tedious for radiologists, and the interpretations are subject to substantial inter- and intra-observer variation, which may lead to missed cancers as well as overdiagnosis [2, 3]. Therefore, a computer-aided detection/diagnosis (CAD) system that can work as a second reader is important and necessary.
Various types of abnormalities may appear in mammograms, such as asymmetrical breast tissues, adenopathy, density, microcalcifications, and masses. Among them, breast masses are believed to contribute significantly to breast cancers. Currently, the majority of breast mass studies concentrate on image-level lesion detection and patch-level mass classification or segmentation [5, 6, 7, 8, 9, 10]. However, image-level lesion detection can only give the bounding box of the mass without the boundary information, which has been identified as an important indicator of malignancy. In addition, extracting patches around the mass before segmentation is tedious and difficult work for radiologists. Therefore, mass segmentation of whole mammograms is of high application value for breast cancer detection and diagnosis. Specifically, our focus in this study is automatic breast mass segmentation in whole mammograms, i.e., segmentation in the full fields of view (FOVs) of input mammograms rather than in extracted regions of interest (ROIs).
In this paper, we propose a new model, the attention-guided dense-upsampling network (AUNet), for the segmentation of mammographic masses. Different from the classical symmetric encoder-decoder architecture of UNet, AUNet employs an asymmetrical structure (different encoder and decoder blocks) through the implementation of residual connections. Furthermore, we design a novel upsampling module, the attention-guided dense-upsampling block (AU block), to compensate for the information loss caused by bilinear upsampling, effectively fuse the high- and low-level features, and at the same time highlight the information-rich channels. The performance of the proposed network was evaluated on two public mammographic datasets, CBIS-DDSM and INbreast. With AUNet, we achieved an average Dice score of 81.8% for CBIS-DDSM and 79.1% for INbreast. Both improved the segmentation results of UNet by more than 8%. Our major contributions are: 1) A more effective asymmetric encoder-decoder network architecture is introduced; 2) We propose a new block, the AU block, that can effectively extract important information from both high- and low-level features; 3) The AU block can serve as a universal decoder module that is compatible with any encoder-decoder segmentation network; 4) By implementing both the AU block and the asymmetrical structure, our proposed network, AUNet, is able to accurately segment masses in whole mammograms without the need for ROI extraction; 5) AUNet achieved superior breast mass segmentation performance compared to fully convolutional networks (FCNs) commonly used in medical imaging. Our code will be made publicly available soon.
2 Related works
In this section, we review the related works on deep learning models for image segmentation and existing methods for mammographic mass segmentation.
2.1 Segmentation networks
Since the introduction of FCNs in 2015, most segmentation models follow a similar encoder-decoder network backbone design. The encoder pathway first extracts high-dimensional, highly abstract feature maps from the inputs, usually at severely decreased resolutions, and the decoder pathway is then responsible for recovering the image resolution and generating the segmentation results. However, due to the information loss caused by pooling or strided convolution during encoding, the reconstructed segmentation results are usually not satisfactory. To solve this issue, conditional random fields have been included as a post-processing method, which has shown a significant improvement [23, 24]. Another direction is the application of dilated convolution. Dilated convolution can increase the receptive field while keeping the image resolution unchanged. Nevertheless, limited by the currently available computing power, dilated convolution at high image resolutions is hard to achieve, if not impossible. UNet proposed another solution to the problem. The main idea of UNet is to fuse high-level feature maps that are rich in semantic information with low-level feature maps that are rich in location information. By fusing feature maps from different layers, UNet is capable of generating accurate segmentation maps for small datasets. However, the feature fusion of UNet is done through simple concatenation, which is not effective enough, and improvement is necessary for different applications [26, 27].
2.2 Upsampling approaches
Different methods have been adopted in the literature to upsample low-resolution feature maps. Bilinear interpolation is a simple and efficient method that has been commonly used [23, 28]. The output of bilinear interpolation is fixed and not learnable, which may cause information loss. Deconvolution was first proposed along with FCNs and adopted in later works. Deconvolution can be realized in two ways. One is through the reverse operation of convolution. The other is through unpooling, where the low-resolution feature maps are first upsampled to high-resolution feature maps using the stored max pooling indices, and the sparse feature maps are then densified by convolutions. Both methods result in a learnable upsampling procedure but require zero padding at the first step. The last method is dense upsampling convolution (DUC), derived from the sub-pixel convolution method originally developed for the image super-resolution task. DUC is also learnable. In addition, different from deconvolution, no zero padding is required for DUC.
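To make the DUC idea concrete, the following is a minimal NumPy sketch of the channel-to-space rearrangement used by sub-pixel convolution. It is an illustration under stated assumptions, not the implementation used in this work: the learnable convolution that produces the `scale**2` times larger channel count is omitted, and the function name `pixel_shuffle` and the channel ordering (the one commonly used by sub-pixel convolution layers) are assumptions.

```python
import numpy as np


def pixel_shuffle(features: np.ndarray, scale: int) -> np.ndarray:
    """Rearrange a (C * scale**2, H, W) array into (C, H * scale, W * scale).

    Every output pixel is copied from one input channel, so no zeros are
    inserted, in contrast to zero-padded deconvolution.
    """
    channels_in, height, width = features.shape
    if channels_in % (scale * scale) != 0:
        raise ValueError("channel count must be divisible by scale**2")
    channels_out = channels_in // (scale * scale)
    # Split the channel axis into (C, scale, scale).
    x = features.reshape(channels_out, scale, scale, height, width)
    # Interleave the two scale axes with the spatial axes: (C, H, scale, W, scale).
    x = x.transpose(0, 3, 1, 4, 2)
    return x.reshape(channels_out, height * scale, width * scale)


# Toy input: 4 channels of size 2 x 2 become 1 channel of size 4 x 4.
low_res = np.arange(4 * 2 * 2, dtype=float).reshape(4, 2, 2)
high_res = pixel_shuffle(low_res, scale=2)
```

In this toy case each 2 x 2 output neighbourhood is filled from the four input channels at the same spatial location, which is why the upsampling is learnable once a convolution precedes it.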
2.3 Attention mechanism
Attention mechanisms in neural networks have attracted a lot of interest recently. They are proposed in accordance with human visual attention: human beings tend to focus on a certain part of a given image after quickly glancing through it. Attention can be viewed as a tool to force the network to focus on the most informative part of the inputs or features. It has been widely applied in natural language processing and image captioning [33, 34]. Studies have also found that CNNs can implicitly learn to localize the most important regions of the input images, which can be treated as a kind of attention. To improve image classification accuracy, both spatial and channel-wise attention modules have been proposed in the literature [36, 37]. Attention has also been explicitly used for image segmentation [38, 39]. Different from these works, which utilize attention to focus on regions of the inputs, our proposed AU block implements attention to select important channels for breast mass segmentation.
2.4 Segmentation of mammographic mass
Automatic mammographic mass segmentation methods can be divided into unsupervised and supervised methods. Unsupervised methods include region-based [40, 10], contour-based [41, 42], and clustering models [43, 44]. These models encounter various problems when applied to mammographic mass segmentation. Region-based models rely on region homogeneity and usually need prior information, such as the locations of seed points and shape information. Contour-based models are based on edge detection, but it is challenging to extract the boundary between masses and normal breast tissues. Hierarchical clustering models are computationally expensive, while partitional clustering models need to know the number of regions in advance. Supervised methods have a training and a testing procedure. Pattern matching is widely used for segmentation and detection [49, 50]. Nonetheless, mammographic masses come in a wide variety of shapes, which hinders the use of pattern matching approaches. Deep learning models belong to the supervised methods. Deep structured models have been successfully applied to segment masses from ROIs rather than whole mammograms [5, 6, 51]. Moreover, using manually extracted ROIs can improve the segmentation performance compared to bounding boxes automatically generated by detection models, which indicates that the segmentation results depend on the patch extraction process and that it is difficult to achieve fully automatic mammographic mass segmentation with this approach. Very few attempts at mass segmentation of whole mammograms can be found, probably because of the difficulties discussed above. These studies mainly combined well-known segmentation models with special network modules developed for natural image analysis. For example, atrous spatial pyramid pooling and attention gates have been introduced to FCDenseNet and Dense-U-Net to enhance the segmentation capacity [18, 19]. Considering the gap between the medical and natural image domains, these models may not be perfectly suitable for the breast mass segmentation task. Moreover, the experiments were not comprehensive, and the models were not publicly available. Aiming to address these challenges, our AUNet is designed specifically for fully automatic mammographic mass segmentation. Two public datasets have been tested, and the models will be made available once the paper is accepted.
In this section, we first describe the datasets used in the study. Then, the proposed network architecture, including the asymmetrical encoder-decoder backbone and the AU block, is presented. After that, loss function selection is discussed. Finally, quantitative evaluation metrics are listed.
We evaluated our proposed network on two publicly available datasets, CBIS-DDSM [52, 53] and INbreast. For CBIS-DDSM, a total of 858 images were used in the current study, with 690 images for training and 168 for validation. The INbreast dataset contains 107 images with accurate mass segmentation masks. A 5-fold cross-validation experiment was conducted for INbreast.
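As an illustration of the 5-fold protocol on 107 images, the sketch below splits sample indices into folds of near-equal size. It is a sketch under assumptions: the helper names are hypothetical, and the split is taken in index order, whereas any shuffling or stratification used in the actual experiments is not specified here.

```python
def k_fold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for k in range(n_folds):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if k < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_val_splits(n_samples, n_folds):
    """Yield (train_indices, val_indices), using each fold once for validation."""
    folds = k_fold_indices(n_samples, n_folds)
    for k, val in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, val


# 107 images and 5 folds, as stated for INbreast.
fold_sizes = [len(fold) for fold in k_fold_indices(107, 5)]
splits = list(train_val_splits(107, 5))
```

With 107 images, two folds hold 22 images and three hold 21, so every image is used for validation exactly once.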
All the images, along with the masks, were first processed to remove the irrelevant background regions (rows and columns with negligible maximum intensities) and then resized, followed by intensity normalization. Before being fed into the networks, the gray images were converted to RGB images by copying the pixel values to the other two channels. The importance of this step will be discussed later. No further data processing or augmentation was applied.
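The preprocessing steps described above can be sketched with NumPy as follows. This is a simplified illustration, not the pipeline used in the study: the threshold for "negligible" intensity, the min-max form of the normalization, and all function names are assumptions, and the resizing step is omitted because it depends on an image library.

```python
import numpy as np


def crop_background(image, mask, threshold=0.0):
    """Drop rows and columns whose maximum intensity is at or below threshold."""
    rows = image.max(axis=1) > threshold
    cols = image.max(axis=0) > threshold
    # The same rows and columns are removed from the mask to keep alignment.
    return image[rows][:, cols], mask[rows][:, cols]


def normalize_intensity(image):
    """Min-max scale to [0, 1] (one possible normalization, assumed here)."""
    lo, hi = float(image.min()), float(image.max())
    if hi == lo:
        return np.zeros_like(image, dtype=np.float64)
    return (image.astype(np.float64) - lo) / (hi - lo)


def gray_to_rgb(image):
    """Copy the gray channel into three identical channels, shape (3, H, W)."""
    return np.stack([image, image, image], axis=0)


# Toy example: a 3 x 4 image with an empty top row and empty side columns.
image = np.array([[0, 0, 0, 0], [0, 2, 4, 0], [0, 6, 10, 0]], dtype=float)
mask = (image > 5).astype(np.uint8)
cropped, cropped_mask = crop_background(image, mask)
rgb = gray_to_rgb(normalize_intensity(cropped))
```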
Fig. 1a shows representative images from the two datasets. It can be observed that mammographic masses come in a wide variety of shapes and sizes, which increases the difficulty of training the segmentation network. Fig. 1b and h give the area ratio distributions of the two datasets. Both indicate that most masses occupy only very small regions of the whole mammograms. For CBIS-DDSM, more than 81.8% of the masses occupy less than 1% of the area of the whole mammograms. For INbreast, more than 81% of the masses occupy less than 4% of the area of the whole mammograms. Therefore, it is much more difficult to train a network capable of accurately segmenting masses in whole mammograms than in mass-centered mammographic patches. Other important available information, including subtlety, mass shape and margin, BIRADS category, and pathology, is also plotted in Fig. 1 to comprehensively describe the datasets.
3.2 Asymmetrical network backbone
Our proposed network employs an encoder-decoder architecture backbone (Fig. 2a). The encoder pathway contains five encoder blocks, with the first four followed by max pooling. Thus, the downsampling ratio is 16 in total. The decoder pathway is composed of four alternating upsampling and decoder blocks. The upsampling block will be discussed in the next section. The classic UNet employs symmetrical encoder and decoder pathways, where the basic unit (Fig. 2b) is implemented for both the encoder and decoder blocks. Although this simple design contributes to the efficiency of the network, its effectiveness needs to be explored. Inspired by the recent widespread usage of ResNet, we investigated the feasibility of two other configurations, the deep unit (Fig. 2c) and the res unit (Fig. 2d).
For the three different units, we have the respective outputs as follows:
where y is the respective output of the different units and x is the corresponding input.
refers to the ReLU function. W and b refer to the weights and biases of the different convolution layers, and * is the convolution operation.
Moreover, we also evaluated the different combinations of the three units as the encoder/decoder blocks. In the results section, we show that an asymmetrical network backbone, with the res unit as the encoder block and the basic unit as the decoder block, achieves the best segmentation performance.
3.3 Attention-guided dense-upsampling block
Our major novelty regarding the network design lies in the upsampling block, where we introduce our proposed AU block (Fig. 3b). The original UNet used deconvolution to upsample the feature maps. However, our preliminary experiments found that deconvolution was not as effective as bilinear upsampling for our application (supplementary file Table S1), and thus bilinear upsampling was utilized throughout the study.
The bilinear upsampling block (BU block) of UNet is shown in Fig. 3a, where the high-level features are simply upsampled and concatenated with the low-level features after passing through a convolution layer. The goal of the proposed AU block (Fig. 3b) is to extract all important information from both high- and low-level features. The high-level, low-resolution features are first upsampled using two different methods: one is dense upsampling convolution, and the other is bilinear upsampling followed by a convolution layer. A convolution layer is always followed by batch normalization and ReLU activation unless otherwise specified. Then, the densely upsampled features are combined with the low-level features by summation. A convolution layer is applied to this sum before it is concatenated with the bilinearly upsampled features, to smooth the concatenation process. In this way, we expect the concatenated features to contain all the information from both the high- and low-level features.
The next step is to select the important information from the concatenated features. Motivated by squeeze-and-excitation networks, we adopt channel-wise attention. First, global average pooling is applied to obtain a channel-wise descriptor:
where F is the () channel of . H and W refer to the height and width of .
The descriptor passes through two fully connected layers (FC layers), one with ReLU and one without, and a Sigmoid function to get the channel-wise weights S:
where refers to the Sigmoid function. , , , and are the weights and bias of the FC layers, respectively. r is a reduction ratio. The output of the AU block is:
After that, the output of the AU block goes through a basic unit (Fig. 2b), which is composed of two convolution layers, and the result is then treated as the high-level feature input to the next AU block.
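The channel-wise attention step can be sketched in NumPy as below, following the description above: global average pooling, an FC layer with ReLU, an FC layer without ReLU, a Sigmoid, and a reduction ratio r between the two FC layers. This is a hedged illustration rather than the actual AU block code: the function and variable names are hypothetical, the weights are random placeholders, and rescaling each channel by its weight (as in squeeze-and-excitation networks) is assumed for the block output.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(features, w1, b1, w2, b2):
    """Apply channel-wise attention to features of shape (C, H, W).

    w1: (C // r, C), b1: (C // r,), w2: (C, C // r), b2: (C,).
    Returns the reweighted features and the channel weights.
    """
    # Squeeze: global average pooling gives one descriptor value per channel.
    descriptor = features.mean(axis=(1, 2))
    # Excitation: FC + ReLU, then FC without ReLU, then Sigmoid.
    hidden = np.maximum(w1 @ descriptor + b1, 0.0)
    weights = sigmoid(w2 @ hidden + b2)
    # Assumed output: each channel is scaled by its weight.
    return features * weights[:, None, None], weights


rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2  # toy sizes; r is the reduction ratio
features = rng.normal(size=(C, H, W))
w1, b1 = rng.normal(size=(C // r, C)), np.zeros(C // r)
w2, b2 = rng.normal(size=(C, C // r)), np.zeros(C)
out, weights = channel_attention(features, w1, b1, w2, b2)
```

A larger r shrinks the hidden FC layer and hence the number of parameters, which is why the choice of r trades capacity against computational cost.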
3.4 Loss function
The commonly used cross-entropy loss function for two-class segmentation task is defined as:
For 2D inputs, N is the total number of pixels in the image. The ground-truth label of each pixel is 0 for the background and 1 for the foreground, and the prediction is the corresponding probability that the pixel belongs to the foreground class.
From the definition, positive and negative pixels contribute equally to the cross-entropy loss. However, Fig. 1 shows a severe class imbalance in both datasets: masses occupy only small regions of the whole mammograms. Minimizing the cross-entropy loss may therefore bias the model towards correctly predicting the negative class. To address this issue, we introduced another loss function, the Dice loss. The Dice loss in our situation is defined as:
where a small constant is included to keep numerical stability. It has been reported that applying only the Dice loss makes the optimization process unstable. Therefore, we use a combined loss function for our model, which is defined as:
where a weight constant controls the trade-off between the cross-entropy loss and the Dice loss.
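The behaviour of the two losses under class imbalance can be illustrated with a small pure-Python sketch. The exact forms are assumptions, since only the verbal definitions are available here: the cross-entropy is averaged over pixels, the Dice loss uses a smoothing constant `eps` in both numerator and denominator, and the combined loss is taken as cross-entropy plus `alpha` times the Dice loss. All names are hypothetical.

```python
import math


def cross_entropy_loss(targets, probs, tiny=1e-12):
    """Mean binary cross-entropy over pixels (averaging is an assumption)."""
    total = 0.0
    for y, p in zip(targets, probs):
        p = min(max(p, tiny), 1.0 - tiny)  # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(targets)


def dice_loss(targets, probs, eps=1.0):
    """Soft Dice loss with a smoothing constant eps (assumed form)."""
    intersection = sum(y * p for y, p in zip(targets, probs))
    denominator = sum(targets) + sum(probs)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def combined_loss(targets, probs, alpha=1.0, eps=1.0):
    """Cross-entropy plus alpha times Dice loss (assumed weighting)."""
    return cross_entropy_loss(targets, probs) + alpha * dice_loss(targets, probs, eps)


# Imbalanced toy image: 2 foreground pixels out of 100, predicted as background.
targets = [0] * 98 + [1] * 2
all_background = [0.01] * 100
ce = cross_entropy_loss(targets, all_background)
dice = dice_loss(targets, all_background)
total = combined_loss(targets, all_background)
```

In this toy case the cross-entropy stays small even though both mass pixels are missed, while the Dice loss stays large, which is the bias towards the negative class described above.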
3.5 Evaluation metrics
To quantitatively evaluate the proposed model, we used the Dice similarity coefficient (DSC), sensitivity (SEN), relative area difference, and Hausdorff distance (HAU) to characterize the performance of the methods on the test datasets. We use the overall average metrics to select the best model during the network architecture optimization. To comprehensively compare our final model to the existing networks, in addition to the overall average metrics, we also evaluate the results with respect to the image properties for the CBIS-DDSM dataset (Fig. 1c-g). DSC, SEN, relative area difference, and HAU are defined as:
where pred refers to the network predictions and GT refers to the ground-truth segmentations. The relative area difference compares the predicted mass area with the ground-truth mass area. TP, FP, and FN refer to true positives, false positives, and false negatives. The distance between two points is the L2 distance.
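The four metrics can be sketched in pure Python for small binary masks as follows. Several details are assumptions made for illustration: the relative area difference is taken as the absolute area difference divided by the ground-truth area, the Hausdorff distance is computed over all foreground pixels rather than only boundary points, and both masks are assumed to be non-empty. Function names are hypothetical.

```python
import math


def confusion_counts(pred, gt):
    """Count true positives, false positives and false negatives per pixel."""
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                tp += 1
            elif p:
                fp += 1
            elif g:
                fn += 1
    return tp, fp, fn


def dsc(pred, gt):
    tp, fp, fn = confusion_counts(pred, gt)
    return 2 * tp / (2 * tp + fp + fn)


def sensitivity(pred, gt):
    tp, _, fn = confusion_counts(pred, gt)
    return tp / (tp + fn)


def relative_area_difference(pred, gt):
    area_pred = sum(map(sum, pred))
    area_gt = sum(map(sum, gt))
    return abs(area_pred - area_gt) / area_gt  # assumed form


def hausdorff(pred, gt):
    """Symmetric Hausdorff distance between foreground pixel sets (L2 distance)."""
    def points(mask):
        return [(i, j) for i, row in enumerate(mask) for j, v in enumerate(row) if v]

    def directed(a, b):
        return max(min(math.dist(p, q) for q in b) for p in a)

    a, b = points(pred), points(gt)
    return max(directed(a, b), directed(b, a))


# Toy masks: the prediction covers the 2 x 2 mass plus one extra column.
gt = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
pred = [[0, 0, 0, 0], [0, 1, 1, 1], [0, 1, 1, 1], [0, 0, 0, 0]]
scores = (dsc(pred, gt), sensitivity(pred, gt),
          relative_area_difference(pred, gt), hausdorff(pred, gt))
```

For these toy masks the over-segmentation leaves the sensitivity at its maximum while lowering the DSC and raising the relative area difference and Hausdorff distance, which is why the metrics are reported together.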
Differences between the different models were evaluated by Wilcoxon signed-rank test with a significance threshold of .
3.6 Experimental Set-up
Our proposed network as well as the comparison models were implemented with PyTorch. Network training and testing were run on an NVIDIA GeForce GTX 1080Ti GPU (11G) with a batch size of 4. We used ADAM with the AMSGRAD optimization method. A step decay policy was applied to the learning rate over four consecutive stages of [40, 30, 30, 20] epochs. The INbreast dataset contains 107 images, which may limit the proper training of a deep neural network. Therefore, we tried to fine-tune the models pretrained on the CBIS-DDSM dataset. The hyper-parameters in (8) and (9) were set empirically. We tested different values (0.5, 1.0, and 2) and found that 1.0 achieved the best segmentation performance (supplementary file Table S2). The determination of the reduction ratio r will be discussed in the results section.
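The step decay policy can be written as a piecewise-constant function of the epoch. In the sketch below, only the stage lengths [40, 30, 30, 20] come from the text; the learning-rate values are hypothetical placeholders chosen purely for illustration, and the function name is an assumption.

```python
def step_decay_lr(epoch, stage_epochs, stage_rates):
    """Return the learning rate for a zero-based epoch index."""
    if len(stage_epochs) != len(stage_rates):
        raise ValueError("one learning rate is needed per stage")
    boundary = 0
    for n_epochs, rate in zip(stage_epochs, stage_rates):
        boundary += n_epochs
        if epoch < boundary:
            return rate
    raise ValueError("epoch is beyond the end of the schedule")


STAGE_EPOCHS = [40, 30, 30, 20]               # stage lengths stated in the text
PLACEHOLDER_RATES = [1e-3, 1e-4, 1e-5, 1e-6]  # hypothetical values, not the paper's

total_epochs = sum(STAGE_EPOCHS)
schedule = [step_decay_lr(e, STAGE_EPOCHS, PLACEHOLDER_RATES) for e in range(total_epochs)]
```

With these stage lengths, training spans 120 epochs in total and the rate changes at epochs 40, 70 and 100.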
To validate the effectiveness of our proposed AUNet, we conducted ablation experiments. Specifically, to select the best network backbone, we substituted the encoder and decoder blocks in Fig. 2a with the deep unit (Fig. 2c; Deep-UNet) or the res unit (Fig. 2d; Res-UNet) while keeping the BU block (Fig. 3a) unchanged. In addition, different combinations of the encoder and decoder units were tested to check the feasibility of symmetric and asymmetric structures. Finally, we compare the segmentation results of the proposed AUNet with three established FCNs: UNet, FusionNet, and FCDenseNet. The original UNet utilizes deconvolution for upsampling. However, experimental results demonstrated that bilinear upsampling is more effective for our application (supplementary file Table S1), so we adopted bilinear upsampling for all the networks. FusionNet introduces residual connections to UNet and increases the network depth by adding more convolution layers in each unit (5 convolutions per unit). FCDenseNet103 extends the recently published DenseNet architecture to fully convolutional networks for the image segmentation task. All the networks were trained from scratch on the CBIS-DDSM dataset, and fine-tuning was investigated on the INbreast dataset. We show that although FusionNet and FCDenseNet103 are much deeper than AUNet, AUNet can still generate better segmentation results, which highlights the effectiveness of the proposed AU block. Three independent experiments were done for each network.
4 Experimental results
In this section, we present the results on the two public datasets, CBIS-DDSM and INbreast, and compare the results of the proposed AUNet to other FCNs.
4.1 Results on CBIS-DDSM dataset
In this section, we firstly discuss the choice of the different encoder/decoder blocks. Then the determination of the reduction ratio r is demonstrated. Finally, we compare the results of the optimized AUNet to the three FCNs.
4.1.1 Optimization of the network backbone
Results of networks employing different encoder and decoder blocks are presented in Table 1. The model names indicate the units applied, with the first word referring to the encoder block and the second referring to the decoder block. For example, the model Basic-Deep-UNet means we utilized the basic unit (Fig. 2b) for the encoder pathway and the deep unit (Fig. 2c) for the decoder pathway. From Table 1, two general conclusions can be drawn: a) Deeper networks generally achieve better performance, with higher DSC, higher SEN, lower relative area difference, and lower HAU (compare UNet to Deep-Deep-UNet); b) Models with asymmetric structures, especially those employing the basic unit in only one pathway, perform better than models with symmetric structures (compare Res-Basic-UNet to Res-Res-UNet and Res-Deep-UNet).
|Models|DSC (%)|SEN (%)|Relative area difference (%)|HAU|
Taking all four evaluation metrics into consideration, we selected the model ‘Res-Basic-UNet’ as our network backbone, since it achieves the highest average DSC among all the models and, at the same time, SEN, relative area difference, and HAU comparable to the respective best results (Table 1).
4.1.2 Performance enhancement by the AU block
The introduction of the AU block (Fig. 3b) to our network backbone brings a clear performance improvement in all four evaluation metrics (Table 2). The reduction ratio r is very important for the capacity and computational cost of the proposed AUNet. Therefore, we conducted experiments to finalize its selection. A wide range of r was tested, from 2 to 32, and Table 2 indicates which value gives the best model performance. Besides, it can also be observed that, regardless of the choice of r, the proposed AU block always enhances the segmentation performance compared to the selected network backbone (Res-Basic-UNet), which demonstrates the general effectiveness of the proposed block. For all the following experiments, the best-performing value of r is applied unless otherwise specified.
|Reduction ratio|DSC (%)|SEN (%)|Relative area difference (%)|HAU|
4.1.3 Comparison to established FCNs
Our proposed AUNet achieves the best segmentation results when compared to established FCNs (Table 3). Among the three established models, FusionNet gives the highest DSC, the lowest relative area difference, and the lowest HAU, whereas FCDenseNet103 presents the highest SEN. This indicates that FCDenseNet103 increases its capability of finding the mass locations by generating more false positives. Since FCDenseNet103 is much deeper than the other networks, this suggests that very deep networks perform worse on the mammographic datasets, probably because of overfitting. On the other hand, our proposed AUNet achieves the best results, with the highest DSC, the highest SEN, the lowest relative area difference, and the lowest HAU, which demonstrates the superiority and robustness of our proposed network. Our model shows an average DSC increase of at least 2% (statistically significant by Wilcoxon signed-rank test), a SEN increase of 0.7%, a relative area difference decrease of 4.4%, and an HAU decrease of 0.05 compared to the respective best-performing FCNs.
|Models|DSC (%)|SEN (%)|Relative area difference (%)|HAU|
Considering the inherent differences among images of different categories (subtlety, BIRADS, mass shape, mass margin, and pathology), the segmentation performance of the different networks is also presented with regard to these properties. Combining the different categories (21 in total: 5 subtlety groups, 4 BIRADS categories, 5 shape groups, 5 margin categories, and 2 pathology groups) with the different evaluation metrics (DSC, SEN, relative area difference, and HAU), there are 84 cases (detailed results in supplementary file Table S3-S6). Overall, our AUNet still achieves the best results, ranking first in 56 cases (16 for DSC, 11 for SEN, 14 for relative area difference, and 15 for HAU). FusionNet and FCDenseNet103 obtain the best results in 15 and 10 cases, respectively. UNet performs the worst in this respect, with only 3 cases.
To directly compare the performances of the different networks, the empirical cumulative distributions of DSC were plotted (Fig. 4). The closer the distribution line to the lower right position in the figure, the more images are segmented with high DSC values by the corresponding network. Thus, we could conclude that for the CBIS-DDSM dataset, AUNet achieves the best segmentation performance, followed by FusionNet, FCDenseNet, and UNet.
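The empirical cumulative distribution used for this comparison can be computed as in the short sketch below. The per-image DSC values are hypothetical toy numbers, not results from the paper, and the function names are assumptions.

```python
def empirical_cdf(values):
    """Return sorted (value, cumulative fraction) pairs for plotting an ECDF."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


def fraction_at_or_below(values, threshold):
    """Fraction of images whose score is at or below the threshold."""
    return sum(v <= threshold for v in values) / len(values)


# Hypothetical per-image DSC values for two models (illustration only).
model_a = [0.9, 0.5, 0.7, 0.8]
model_b = [0.6, 0.4, 0.7, 0.5]
ecdf_a = empirical_cdf(model_a)
low_a = fraction_at_or_below(model_a, 0.7)
low_b = fraction_at_or_below(model_b, 0.7)
```

A curve that stays lower at a given DSC value means a smaller fraction of images falls at or below that DSC, which is why curves towards the lower right indicate better segmentation.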
4.2 Results on INbreast dataset
The INbreast dataset is smaller than the CBIS-DDSM dataset. As such, we tried to re-use the CBIS-DDSM trained models and fine-tuned those models using the INbreast dataset. Moreover, 5-fold cross-validation experiments were conducted to generate meaningful and convincing results.
The segmentation results of the proposed AUNet and the three established models with and without pretraining on CBIS-DDSM are listed in Table 4. It can be observed that, with or without the pretraining step, AUNet always generates the best segmentation results, and pretraining improves the segmentation performance of all the methods significantly. With pretraining on CBIS-DDSM, the results of the three established models present a different pattern from that on the CBIS-DDSM dataset. Among the three established models, FCDenseNet103 generates the highest DSC and SEN values, UNet shows the lowest relative area difference, and FusionNet gives the lowest HAU. It is interesting that FusionNet shows much worse performance on INbreast than on CBIS-DDSM. On the other hand, compared to the three models, our proposed AUNet still gives the best segmentation results, with the highest DSC, the highest SEN, the lowest relative area difference, and the lowest HAU. AUNet shows an average DSC increase of at least 3% (statistically significant by Wilcoxon signed-rank test), a SEN increase of 2.9%, a relative area difference decrease of 6.5%, and an HAU decrease of 0.29 (statistically significant by Wilcoxon signed-rank test). Similarly, the empirical cumulative distribution plot indicates that for INbreast, AUNet still achieves the best segmentation performance, followed by FCDenseNet, FusionNet, and UNet (Fig. 5).
|Models|DSC (%)|SEN (%)|Relative area difference (%)|HAU|
|UNet (w/o)|
|UNet (w/)|
w/o: without pretraining on CBIS-DDSM.
w/: with pretraining on CBIS-DDSM.
4.3 Qualitative results
Fig. 6 presents several segmentation results generated by the different networks for qualitative comparisons. We can see, overall, our proposed AUNet performs better than the other three FCNs for our whole mammographic mass segmentation task. In addition, it could be observed that AUNet displays an impressive ability to suppress the false positive results of UNet without increasing the number of false negatives, whereas both FusionNet and FCDenseNet103 are not effective in this aspect or even make the situation worse (Fig. 6; the first, second, and last rows). This observation is consistent with the quantitative results discussed before. Lastly, our AUNet could give accurate segmented masses for difficult samples when the other three networks could barely find the targeted regions at all, such as the third example in Fig. 6.
4.4 Results on extracted image patches
In order to compare the performance of the proposed network directly with the literature on breast mass segmentation, we also conducted experiments on extracted mass-centered image patches for the INbreast dataset. For each mammogram, we first found the smallest rectangle that could accommodate the mass. Then, the mass-centered image patch was extracted by enlarging the rectangle by 20% in area, with an equal elongation ratio of √1.2 (about 1.095) in width and height. Similar to the whole mammogram situation, 5-fold cross-validation experiments with three replicates were done. The results in Table 5 confirm that our proposed AUNet also achieves the best segmentation results on mass-centered image patches compared to both the three FCNs and the results reported in the literature.
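The patch extraction rule can be sketched as follows: find the tightest box around the mass, then scale width and height by the same ratio so that the area grows by 20%. The function names, the rounding to integer pixel coordinates, and the clipping at the image border are assumptions made for this illustration.

```python
import math


def mass_bounding_box(mask):
    """Smallest box containing the mass: (top, bottom, left, right), end-exclusive."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    cols = [j for j in range(len(mask[0])) if any(row[j] for row in mask)]
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def enlarge_box(box, image_shape, area_gain=0.2):
    """Enlarge a box about its center so that its area grows by area_gain."""
    top, bottom, left, right = box
    ratio = math.sqrt(1.0 + area_gain)  # equal ratio for width and height
    center_r, center_c = (top + bottom) / 2, (left + right) / 2
    half_h = (bottom - top) * ratio / 2
    half_w = (right - left) * ratio / 2
    # Round to pixels and clip to the image (assumed border handling).
    new_top = max(0, int(round(center_r - half_h)))
    new_bottom = min(image_shape[0], int(round(center_r + half_h)))
    new_left = max(0, int(round(center_c - half_w)))
    new_right = min(image_shape[1], int(round(center_c + half_w)))
    return new_top, new_bottom, new_left, new_right


# Toy mask: a 100 x 100 square mass inside a 300 x 300 image.
mask = [[1 if 100 <= i < 200 and 100 <= j < 200 else 0 for j in range(300)]
        for i in range(300)]
box = mass_bounding_box(mask)
patch_box = enlarge_box(box, (300, 300))
```

In the toy case the 100 x 100 box becomes 110 x 110 after rounding, an area gain close to the intended 20%.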
|Models|DSC (%)|SEN (%)|Relative area difference (%)|HAU|
|Cardoso et al., 2015|
|Dhungel et al., 2015b|
|Dhungel et al., 2017|
Patches were extracted based on detection results.
|Models|UNet|FusionNet|FCDenseNet103|AUNet|
|Convolutional and FC layers|23|50|103|44|
|FPS (with resized inputs)|59|36|27|32|
4.5 Model complexity
Table 6 lists the total number of convolutional and FC layers, the optimizable parameters, and the inference time in terms of frames per second (FPS) with resized inputs. UNet is the simplest and fastest model, while the other three models (FusionNet, FCDenseNet103, and AUNet) have similar inference speeds, with AUNet achieving the best segmentation performance.
Segmentation of mammographic masses is a challenging task, as mammograms have a low signal-to-noise ratio and breast masses vary in shape and size. An easy alternative is to segment masses from extracted ROIs. However, manual extraction of ROIs is a tedious task. Automatic detection algorithms are still subject to high false-positive rates, and specially designed post-processing methods are required to achieve the expected performance. Therefore, automatic breast mass segmentation in whole mammograms is of great clinical value. There are several reports aimed at developing deep learning models for whole mammographic mass segmentation, such as the ASPP-FC-DenseNet and the Attention Dense-U-Net [18, 19]. ASPP-FC-DenseNet achieved a Dice similarity coefficient of 76.97% on a private dataset, and Attention Dense-U-Net achieved a sensitivity of 77.89% on a selected DDSM dataset. Both are lower than the results achieved in this study, which supports the superiority of our proposed AUNet. Fig. 7 presents a few segmentation results of AUNet. We admit that AUNet performs slightly better for inputs with large and regular masses than for inputs with irregular or small masses. However, Fig. 6 indicates that AUNet still performs better than the three FCNs for inputs with small and irregular masses. Overall, Figs. 6 and 7 suggest that, for inputs with different mass shapes and sizes, our AUNet gives accurate segmentation results.
Mammograms are taken at high resolutions. Images from the CBIS-DDSM dataset have a width ranging from 1786 to 5431 pixels and a height ranging from 3920 to 6931 pixels. Images from INbreast come in one of two fixed sizes. To facilitate the training and testing of deep neural networks, image preprocessing steps are required, such as image patch extraction or resizing. Although patch extraction can preserve all the original image information, and researchers have developed elegant approaches to extract informative image patches, we adopted resizing in this study. On the one hand, it has been suggested in the computer vision field that global contextual information is important for accurate image segmentation. Patch extraction restricts the field of view of the network, which may influence the segmentation performance. Therefore, the correlations between the patches need to be carefully considered, which we will investigate in future work. On the other hand, after resizing, most masses still occupy hundreds to thousands of pixels. We believe these downsampled masses are large enough to preserve the overall mass information. Moreover, different input settings have been tested: gray or RGB inputs, two different resolutions, and fixed aspect ratios obtained by zero padding the images before resizing (Table 7). Although the different inputs influence the final segmentation results, our proposed AUNet always achieves the best performance (more results in supplementary file Table S7-S9). Thus, it can be anticipated that our method should also be able to achieve the best segmentation performance if full-resolution inputs are utilized. On detailed inspection, the results show that RGB inputs improve the segmentation performance. Even though it was not investigated in the current study, RGB inputs can also facilitate direct transfer learning from networks trained on natural images. Resizing to the higher resolution had negative effects on the segmentation performance, which was also observed for the three established FCNs (supplementary file Table S8). This weakened performance might have two causes. One is that, due to the GPU memory limitation, a batch size of 2 was applied for the higher-resolution inputs instead of 4 for the lower-resolution inputs. The other is that it is difficult to accurately define the mass boundaries in mammograms, and at higher resolutions the images are more sensitive to manual label errors. Zero padding brings large regions of background into the inputs and hinders the segmentation process. In this study, our experiments were done with RGB inputs resized to the lower resolution to maximize the segmentation performance.
Table 7: results for the different input settings (columns: Inputs | DSC (%) | SEN (%) | (%) | HAU; the only surviving row label is "pad & resize").
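The input settings compared above (gray versus RGB, direct resizing versus zero padding to a fixed aspect ratio before resizing) can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration: the function names, the bottom/right placement of the padding, the nearest-neighbour resizing, the channel replication used to form RGB inputs, and the 64-pixel target size are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np

def pad_to_aspect(image, target_ratio=1.0):
    """Zero-pad a 2-D image (bottom/right) until width / height is about target_ratio."""
    h, w = image.shape
    if w / h < target_ratio:      # too narrow: add background columns
        new_h, new_w = h, int(round(h * target_ratio))
    else:                         # too wide or already matching: add background rows
        new_h, new_w = int(round(w / target_ratio)), w
    padded = np.zeros((new_h, new_w), dtype=image.dtype)
    padded[:h, :w] = image
    return padded

def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize (a simple stand-in for the unspecified interpolation)."""
    h, w = image.shape
    rows = (np.arange(out_h) * h) // out_h
    cols = (np.arange(out_w) * w) // out_w
    return image[np.ix_(rows, cols)]

def gray_to_rgb(image):
    """Replicate a gray image into three identical channels (channels first)."""
    return np.repeat(image[None, :, :], 3, axis=0)

# Synthetic stand-in for a tall mammogram with no true background pixels.
mammogram = np.full((392, 178), 1000, dtype=np.uint16)
size = 64  # hypothetical target size, not the resolution used in the study

plain = gray_to_rgb(resize_nearest(mammogram, size, size))
padded = gray_to_rgb(resize_nearest(pad_to_aspect(mammogram), size, size))

# Fraction of the network input occupied by padding-induced background.
background_fraction = float((padded[0] == 0).mean())
```

In this toy case more than half of the padded input is empty background, which is consistent with the observation that zero padding brings large background regions into the inputs.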
Our AUNet, as well as the three comparison networks, performed considerably worse on the INbreast dataset than on the CBIS-DDSM dataset when trained from scratch (Tables 3 and 4). A major cause could be the large difference in sample size. Much better results were obtained when the networks were pretrained on the CBIS-DDSM dataset, but the performance still did not match that on CBIS-DDSM. Apart from the sample size, another observable difference between the two datasets is the image contrast (Fig. 8): CBIS-DDSM images have a higher contrast in the breast regions than INbreast images. Although intensity normalization was applied before the images were fed into the networks, the differences in the original image contrast might still affect the results. In addition, as shown in Fig. 1, the image distributions of the two datasets differ, which might also have a minor influence on the results.
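A small sketch may help to show why a global intensity normalization does not by itself remove contrast differences inside the breast region. The normalization scheme is not specified in this section, so min-max scaling is assumed here; the pixel values, the tissue mask, and the helper names are hypothetical.

```python
import numpy as np

def min_max_normalize(image):
    """Linearly rescale intensities to [0, 1] (assumed normalization scheme)."""
    image = image.astype(np.float64)
    lo, hi = image.min(), image.max()
    if hi == lo:
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)

def region_contrast(image, mask):
    """Standard deviation of the intensities inside a boolean region mask."""
    return float(image[mask].std())

# Two toy images with the same darkest and brightest pixel but a different
# spread of intensities in the "breast tissue" pixels (the two middle ones).
high_contrast_image = np.array([[0, 1000, 3000, 4095]])
low_contrast_image = np.array([[0, 1900, 2100, 4095]])
tissue = np.array([[False, True, True, False]])

high = region_contrast(min_max_normalize(high_contrast_image), tissue)
low = region_contrast(min_max_normalize(low_contrast_image), tissue)
```

Both images span [0, 1] after normalization, yet the tissue contrast still differs by a factor of about ten, so a network trained on one kind of image sees a different intensity structure in the other.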
UNet is a very powerful network for biomedical image segmentation and is the template for many follow-up studies [16, 17]. Our proposed AUNet adopts a similar encoder-decoder architecture. To enhance the performance, we first investigated the network backbone design. Compared with the naive UNet, which uses the basic unit (Fig. 2b) in both the encoder and decoder pathways, we found that our asymmetrical network backbone, Res-Basic-UNet, was more suitable for our application. This is reasonable, as the res unit (Fig. 2d) in the encoder pathway promotes information and gradient propagation, while the basic unit (Fig. 2b) in the decoder pathway better preserves the important semantic information of the high-level features. Our results show that Res-Basic-UNet improves the DSC by 3.9% over UNet.
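The gradient-propagation argument can be written compactly. The formulation below assumes that the basic unit computes a plain mapping and that the res unit adds an identity shortcut in the style of deep residual learning; the symbol $F$ is a placeholder for the convolutional layers of the unit, whose exact composition (Fig. 2b and 2d) is not restated here.

```latex
\begin{align}
\text{basic unit:}\quad & y = F(x), &
  \frac{\partial y}{\partial x} &= \frac{\partial F}{\partial x}, \\
\text{res unit:}\quad & y = x + F(x), &
  \frac{\partial y}{\partial x} &= I + \frac{\partial F}{\partial x}.
\end{align}
```

Under this assumption, the identity term $I$ gives the gradient a direct path through every encoder unit, even where $\partial F / \partial x$ is small, which matches the stated role of the res unit in the encoder pathway.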
Next, we argue that the simple bilinear upsampling and the feature fusion through concatenation adopted by UNet are not effective enough. Significant information loss might occur, which could greatly worsen the segmentation results. We therefore proposed a new upsampling block, the AU block, to address these problems. The AU block uses the high-level features in two ways. In the first, the high-level features are densely upsampled and fused with the low-level features by summation. In the second, the high-level features are bilinearly upsampled and concatenated with the convolution-smoothed summation (Fig. 3b). Moreover, to select the information-rich channels, a channel-wise attention component is applied after the concatenation. With the AU block, our AUNet increases the DSC by another 4.3% over Res-Basic-UNet. In addition, AUNet outperforms the three widely used FCN segmentation networks and recent methods by a large margin on both the CBIS-DDSM and INbreast datasets.
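The data flow of the AU block described above can be sketched in NumPy as follows. This is only an illustration under several loudly stated assumptions, not the authors' implementation: dense upsampling is realized as a 1x1 convolution followed by a depth-to-space rearrangement, the smoothing convolution is reduced to a 1x1 convolution, nearest-neighbour replication stands in for bilinear upsampling, the channel-wise attention is a squeeze-and-excitation-style gate, and all weights are random. Every function and parameter name is hypothetical.

```python
import numpy as np

def conv1x1(x, weight):
    """1x1 convolution: mix the channels of x (C_in, H, W) with weight (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def dense_upsample(x, r=2):
    """Depth-to-space: rearrange (C*r*r, H, W) into (C, H*r, W*r)."""
    c_in, h, w = x.shape
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = c_in // (r * r)
    x = x.reshape(c, r, r, h, w).transpose(0, 3, 1, 4, 2)  # (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

def nearest_upsample(x, r=2):
    """Nearest-neighbour upsampling (stand-in for bilinear upsampling)."""
    return x.repeat(r, axis=1).repeat(r, axis=2)

def channel_attention(x, w_down, w_up):
    """Squeeze-and-excitation-style gate: one weight in (0, 1) per channel."""
    squeeze = x.mean(axis=(1, 2))                    # global average pooling, (C,)
    hidden = np.maximum(w_down @ squeeze, 0.0)       # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w_up @ hidden)))    # sigmoid
    return x * gate[:, None, None]

def au_block_sketch(high, low, w_expand, w_smooth, w_down, w_up, r=2):
    dense = dense_upsample(conv1x1(high, w_expand), r)  # path 1: dense upsampling
    fused = conv1x1(dense + low, w_smooth)              # summation, then smoothing
    coarse = nearest_upsample(high, r)                  # path 2: plain upsampling
    merged = np.concatenate([coarse, fused], axis=0)    # concatenation
    return channel_attention(merged, w_down, w_up)      # channel re-weighting

rng = np.random.default_rng(0)
c_high, c_low, h, w, r = 8, 4, 5, 6, 2
high = rng.standard_normal((c_high, h, w))
low = rng.standard_normal((c_low, h * r, w * r))
c_cat = c_high + c_low
out = au_block_sketch(
    high, low,
    w_expand=rng.standard_normal((c_low * r * r, c_high)),
    w_smooth=rng.standard_normal((c_low, c_low)),
    w_down=rng.standard_normal((c_cat // 2, c_cat)),
    w_up=rng.standard_normal((c_cat, c_cat // 2)),
    r=r,
)
print(out.shape)  # (12, 10, 12)
```

The depth-to-space step keeps every value produced by the expanding convolution, whereas interpolation-based upsampling only spreads existing values over a larger grid. This is one way to read the claim that dense upsampling compensates for the information loss of bilinear upsampling.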
False positives and false negatives are important issues for CAD systems. False positives are commonly found to hinder the application of automatic detection algorithms to medical imaging [62, 65]. They can cause patients great psychological stress and depression and lead to unnecessary biopsies. False negatives, on the other hand, are detrimental in clinical applications because they can cause an early diagnosis to be missed. It is therefore important to reduce both. The low signal-to-noise ratio of a mammogram makes it difficult to clearly differentiate the masses from the normal breast tissues (Fig. 1a and Fig. 6). All three FCNs show serious false positive segmentation results, which greatly affected the evaluation metrics (Fig. 6). In contrast, AUNet effectively reduces the false positives without increasing the false negatives through the information selection by channel-wise attention. Moreover, thanks to the full utilization of the feature map information, AUNet also performs better at reducing the false negatives (Fig. 6, third example).
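How false positives and false negatives enter the evaluation metrics can be made concrete with a short sketch. It assumes the standard pixel-wise definitions of the Dice similarity coefficient (DSC) and sensitivity (SEN); the masks and function names are hypothetical toy examples, not data from the study.

```python
import numpy as np

def confusion_counts(pred, truth):
    """Pixel-wise true positives, false positives, and false negatives."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    return tp, fp, fn

def dice(pred, truth):
    """DSC = 2 TP / (2 TP + FP + FN); defined as 1 when both masks are empty."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom

def sensitivity(pred, truth):
    """SEN = TP / (TP + FN); defined as 1 when the ground truth is empty."""
    tp, _, fn = confusion_counts(pred, truth)
    return 1.0 if tp + fn == 0 else tp / (tp + fn)

truth = np.zeros((6, 6), dtype=bool)
truth[2:4, 2:4] = True                 # a 4-pixel "mass"

pred_fp = truth.copy()
pred_fp[0, 0:4] = True                 # whole mass found, plus 4 false positive pixels

pred_fn = np.zeros((6, 6), dtype=bool)
pred_fn[2, 2:4] = True                 # half of the mass found, no false positives

dice_fp, sen_fp = dice(pred_fp, truth), sensitivity(pred_fp, truth)
dice_fn, sen_fn = dice(pred_fn, truth), sensitivity(pred_fn, truth)
```

Both toy predictions reach the same DSC of about 0.67, but the first has a sensitivity of 1.0 and the second only 0.5. A single overlap score can therefore hide which kind of error dominates, which is why the two error types are worth inspecting separately.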
Breast masses are significant contributors to breast cancers. Mass segmentation is an important step for the subsequent disease diagnosis and treatment planning. After mass segmentation, image features can be extracted from the segmented regions and their surroundings, and different analyses can be conducted. These image features could be used to differentiate breast cancer subtypes. They were found to be associated with tumor-infiltrating lymphocytes in breast cancer, which are a promising predictive biomarker for the effectiveness of immunotherapy. Some of them were identified as valuable prognostic markers for adjuvant and neoadjuvant chemotherapies [68, 69]. As a necessary next step for our current work, we will study the corresponding image feature extraction methods as well as imaging-based disease diagnosis and treatment plan selection in the future.
In this work, we propose a new network, AUNet, for mass segmentation in whole mammograms. Specifically, we used an asymmetrical encoder-decoder architecture and introduced a new upsampling block, the AU block, to boost the segmentation performance. In comprehensive experiments, AUNet achieved better segmentation performance than existing FCN models on both the CBIS-DDSM and INbreast datasets, which supports its effectiveness and robustness. In addition, AUNet could greatly reduce both false negative and false positive results. We will make our code available, and we hope our work can attract and inspire more follow-up studies in the field.
This work was supported by funding from the National Natural Science Foundation of China (61601450, 61871371, and 81830056), Science and Technology Planning Project of Guangdong Province (2017B020227012), the Department of Science and Technology of Shandong Province (2015ZDXX0801A01), and the Fundamental Research Funds of Shandong University (2015QY001 and 2017CXGC1502).
-  R. L. Siegel, K. D. Miller, and A. Jemal. Cancer statistics, 2017. CA Cancer J. Clin., 67:7–30, 2017.
-  R. L. Birdwell, D. M. Ikeda, K. F. O’Shaughnessy, and E. A. Sickles. Mammographic characteristics of 115 missed cancers later detected with screening mammography and the potential utility of computer-aided detection. Radiology, 219(1):192–202, 2001.
-  M. Løberg, M. L. Lousdal, M. Bretthauer, and M. Kalager. Benefits and harms of mammography screening. Breast Cancer Res., 17(63):1–12, 2015.
-  M. L. Giger, N. Karssemeijer, and J. A. Schnabel. Breast image analysis for risk assessment, detection, diagnosis, and treatment of cancer. Annu. Rev. Biomed. Eng., 15:327–357, 2013.
-  N. Dhungel, G. Carneiro, and A. P. Bradley. Deep learning and structured prediction for the segmentation of mass in mammograms. In MICCAI, pages 2950–2954, 2015.
-  N. Dhungel, G. Carneiro, and A. P. Bradley. A deep learning approach for the analysis of masses in mammograms with minimal user intervention. Med. Image Anal., 37:114–128, 2017.
-  S. Han, H. Kang, J. Jeong, M. Park, W. Kim, W. Bang, and Y. Seong. A deep learning framework for supporting the classification of breast lesions in ultrasound images. Phys. Med. Biol., 62(19):7714–7728, 2017.
-  M. Jiang, S. Zhang, Y. Zheng, and D. N. Metaxas. Mammographic mass segmentation with online learned shape and appearance priors. In MICCAI, pages 35–43, 2016.
-  S. T. Kim, J. Lee, H. Lee, and Y. M. Ro. Visually interpretable deep network for diagnosis of breast masses on mammograms. Phys. Med. Biol., 63(23):235025, 2018.
-  J. Wei, H. P. Chan, B. Sahiner, L. M. Hadjiiski, M. A. Helvie, M. A. Roubidoux, C. Zhou, and J. Ge. Computer-aided detection of breast masses on mammograms: Dual system approach with two-view analysis. Med. Phys., 36(11):4157–4168, 2006.
-  D. Guliato, R. M. Rangayyan, J. D. Carvalho, and S. A. Santiago. Polygonal modeling of contours of breast tumors with the preservation of spicules. IEEE Trans. Biomed. Eng., 55(1):14–20, 2008.
-  H. Greenspan, B. v. Ginneken, and R. M. Summers. Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique. IEEE Trans. Med. Imaging, 35(5):1153–1159, 2016.
-  A. Hamidinekoo, E. Denton, A. Rampun, K. Honnor, and R. Zwiggelaar. Deep learning in mammography and breast histology, an overview and future trends. Med. Image Anal., 47:45–67, 2018.
-  G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. W. M. van der Laak, B. van Ginneken, and C. I. Sánchez. A survey on deep learning in medical image analysis. Med. Image Anal., 42:60–88, 2017.
-  O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation. In MICCAI, pages 234–241, 2015.
-  A. Balagopal, S. Kazemifar, D. Nguyen, M. Lin, R. Hannan, A. Owrangi, and S. Jiang. Fully automated organ segmentation in male pelvic ct images. Phys. Med. Biol., 63(24):245015, 2018.
-  X. Li, Y. Hong, D. Kong, and X. Zhang. Automatic segmentation of levator hiatus from ultrasound images using u-net with dense connections. Phys. Med. Biol., 64(7):075015, 2019.
-  J. Hai, K. Qiao, J. Chen, H. Tan, J. Xu, L. Zeng, D. Shi, and B. Yan. Fully convolutional densenet with multiscale context for automated breast tumor segmentation. J. Healthc. Eng., 2019:1–11, 2019.
-  S. Li, M. Dong, G. Du, and X. Mu. Attention dense-u-net for automatic breast mass segmentation in digital mammogram. IEEE Access, 7:59037–59047, 2019.
-  C. D. Lehman, R. D. Wellman, D. S. M. Buist, K. Kerlikowske, N. A. Tosteson, D. L. Miglioretti, and for the Breast Cancer Surveillance Consortium. Diagnostic accuracy of digital screening mammography with and without computer-aided detection. JAMA Intern. Med., 175(11):1828–1837, 2015.
-  T. Kooi, G. Litjens, B. v. Ginneken, A. Gubern-Mérida, C. I. Sánchez, R. Mann, A. d. Heeten, and N. Karssemeijer. Large scale deep learning for computer aided detection of mammographic lesions. Med. Image Anal., 35:303–312, 2017.
-  J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In IEEE CVPR, pages 3431–3440, 2015.
-  L. C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell., 40(4):834–848, 2018.
-  K. Kamnitsas, C. Ledig, V. F. J. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon, D. Rueckert, and B. Glocker. Efficient multi-scale 3d cnn with fully connected crf for accurate brain lesion segmentation. Med. Image Anal., 36:61–78, 2017.
-  F. Yu, V. Koltun, and T. Funkhouser. Dilated residual networks. In IEEE CVPR, pages 472–480, 2017.
-  G. Lin, A. Milan, C. Shen, and I. Reid. Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. In IEEE CVPR, pages 1925–1934, 2017.
-  Z. Zhang, X. Zhang, C. Peng, D. Cheng, and J. Sun. Exfuse: Enhancing feature fusion for semantic segmentation. In ECCV, pages 1–16, 2018.
-  H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia. Pyramid scene parsing network. In IEEE CVPR, pages 2881–2890, 2017.
-  P. Wang, P. Chen, Y. Yuan, D. Liu, Z. Huang, X. Hou, and G. Cottrell. Understanding convolution for semantic segmentation. In WACV, pages 1451–1460, 2018.
-  V. Badrinarayanan, A. Kendall, and R. Cipolla. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 39(12):2481–2495, 2017.
-  W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In IEEE CVPR, pages 1874–1883, 2016.
-  V. Mnih, N. Heess, A. Graves, and K. Kavukcuoglu. Recurrent models of visual attention. In NIPS, pages 1–9, 2014.
-  L. Chen, H. Zhang, J. Xiao, L. Nie, J. Shao, W. Liu, and T. S. Chua. Sca-cnn: Spatial and channel-wise attention in convolutional networks for image captioning. In IEEE CVPR, pages 5659–5667, 2017.
-  A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need. In NIPS, pages 1–11, 2017.
-  B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Learning deep features for discriminative localization. In IEEE CVPR, pages 2921–2929, 2016.
-  J. Hu, L. Shen, and G. Sun. Squeeze-and-excitation networks. In IEEE CVPR, pages 7132–7141, 2018.
-  A. G. Roy, N. Navab, and C. Wachinger. Concurrent spatial and channel ‘squeeze & excitation’ in fully convolutional networks. In MICCAI, pages 421–429, 2018.
-  Z. Mirikharaji and G. Hamarneh. Star shape prior in fully convolutional networks for skin lesion segmentation. In MICCAI, pages 737–745, 2018.
-  D. Nie, Y. Gao, L. Wang, and D. Shen. Asdnet: Attention based semi-supervised deep networks for medical image segmentation. In MICCAI, pages 370–378, 2018.
-  T. O. Gulsrud, K. Engan, and T. Hanstveit. Watershed segmentation of detected masses in digital mammograms. In Proc. EMBS, pages 3304–3307, 2005.
-  P. Rahmati, A. Adler, and G. Hamarneh. Mammography segmentation with maximum likelihood active contours. Med. Image Anal., 16:1167–1186, 2012.
-  J. Shi, B. Sahiner, H. P. Chan, J. Ge, L. Hadjiiski, M. A. Helvie, A. Nees, Y. T. Wu, J. Wei, C. Zhou, Y. Zhang, and J. Cui. Characterization of mammographic masses based on level set segmentation with new image features and patient information. Med. Phys., 35(1):280–290, 2008.
-  A. R. Abdel-Dayem and M. R. El-Sakka. Fuzzy entropy based detection of suspicious masses in digital mammogram images. In Proc. IEEE EMBS, pages 4017–4022, 2005.
-  J. E. Ball, T. W. Butler, and L. M. Bruce. Towards automated segmentation and classification of masses in digital mammograms. In Proc. IEEE EMBS, pages 1814–1817, 2004.
-  A. Oliver, J. Freixenet, J. Martí, E. Pérez, J. Pont, and E. R. E. Denton. A review of automatic mass detection and segmentation in mammographic images. Med. Image Anal., 14:87–110, 2010.
-  M. A. Kupinski and M. L. Giger. Automated seeded lesion segmentation on digital mammograms. IEEE Trans. Med. Imaging, 17(4):510–517, 1998.
-  B. Sahiner, N. Petrick, H. P. Chan, L. M. Hadjiiski, C. Paramagul, M. A. Helvie, and M. N. Gurcan. Computer-aided characterization of mammographic masses: Accuracy of mass segmentation and its effects on characterization. IEEE Trans. Med. Imaging, 20(12):1275–1284, 2001.
-  L. Li, R. A. Clark, and J. A. Thomas. Computer-aided diagnosis of masses with full-field digital mammography. Acad. Radiol., 9:4–12, 2002.
-  J. Freixenet, A. Oliver, R. Marti, and X. Lladó. Eigendetection of masses considering false positive reduction and breast density information. Med. Phys., 35(5):1840–1853, 2008.
-  E. Song, S. Xu, X. Xu, J. Zeng, Y. Lan, S. Zhang, and C. C. Hung. Hybrid segmentation of mass in mammograms using template matching and dynamic programming. Acad. Radiol., 17(11):1414–1424, 2010.
-  N. Dhungel, G. Carneiro, and A. P. Bradley. Deep structured learning for mass segmentation from mammograms. In IEEE ICIP, pages 2950–2954, 2015.
-  M. Heath, K. Bowyer, D. Kopans, R. Moore, and W. P. Kegelmeyer Jr. The digital database for screening mammography. In Proc. 5th Inter. Work. on Digital Mammography, pages 212–218, 2000.
-  R. S. Lee, F. Gimenez, A. Hoogi, K. K. Miyake, M. Gorovoy, and D. L. Rubin. Data descriptor: A curated mammography data set for use in computer-aided detection and diagnosis research. Sci. Data, 4:170177, 2017.
-  I. C. Moreira, I. Amaral, I. Domingues, A. Cardoso, M. J. Cardoso, and J. S. Cardoso. Inbreast: Toward a full-field digital mammographic database. Acad. Radiol., 19(2):236–248, 2012.
-  K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In IEEE CVPR, pages 770–778, 2016.
-  W. Zhu, Y. Huang, H. Tang, Z. Qian, N. Du, and W. Fan. Anatomynet: Deep 3d squeeze-and-excitation u-nets for fast and fully automated whole-volume anatomical segmentation. arXiv: 1808.05238, pages 1–14, 2018.
-  A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer. Automatic differentiation in pytorch. In NIPS, pages 1–4, 2017.
-  S. J. Reddi, S. Kale, and S. Kumar. On the convergence of adam and beyond. In ICLR, pages 1–23, 2017.
-  T. M. Quan, D. G. C. Hildebrand, and W. K. Jeong. Fusionnet: A deep fully residual convolutional neural network for image segmentation in connectomics. arXiv: 1612.05360, 2016.
-  S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, and Y. Bengio. The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation. In IEEE CVPRW, pages 1175–1183, 2017.
-  J. S. Cardoso, I. Domingues, and H. P. Oliveira. Closed shortest path in the original coordinates with an application to breast cancer. Int. J. Pattern Recognit. Artif. Intell., 29(1):1555002, 2015.
-  N. Dhungel, G. Carneiro, and A. P. Bradley. Automated mass detection in mammograms using cascaded deep learning and random forests. In IEEE DICTA, pages 1–8, 2015.
-  W. Qin, J. Wu, Y. Yuan, W. Zhao, B. Ibragimov, J. Gu, and L. Xing. Superpixel-based and boundary-sensitive convolutional neural network for automated liver segmentation. Phys. Med. Biol., 63(9):095017, 2018.
-  X. Wang, R. Girshick, A. Gupta, and K. He. Non-local neural networks. In IEEE CVPR, pages 7794–7803, 2018.
-  R. K. Samala, H. P. Chan, L. M. Hadjiiski, K. Cha, and M. A. Helvie. Deep-learning convolution neural network for computer-aided detection of microcalcifications in digital breast tomosynthesis. In Proc. SPIE, volume 9785, pages 97850Y–1–7, 2016.
-  J. Wu, Y. Cui, X. Sun, G. Cao, B. Li, D. M. Ikeda, A. W. Kurian, and R. Li. Unsupervised clustering of quantitative image phenotypes reveals breast cancer subtypes with distinct prognoses and molecular pathways. Clin. Cancer Res., 23(13):3334–3342, 2017.
-  J. Wu, X. Li, X. Teng, D. L. Rubin, S. Napel, B. L. Daniel, and R. Li. Magnetic resonance imaging and molecular features associated with tumor-infiltrating lymphocytes in breast cancer. Breast Cancer Res., 20(1):101, 2018.
-  J. Wu, B. Li, G. Cao, D. L. Rubin, S. Napel, D. M. Ikeda, A. W. Kurian, and R. Li. Heterogeneous enhancement patterns of tumor-adjacent parenchyma at mr imaging are associated with dysregulated signaling pathways and poor survival in breast cancer. Radiology, 285(2):401–413, 2017.
-  J. Wu, G. Cao, X. Sun, J. Lee, D. L. Rubin, S. Napel, A. W. Kurian, B. L. Daniel, and R. Li. Intratumoral spatial heterogeneity at perfusion mr imaging predicts recurrence-free survival in locally advanced breast cancer treated with neoadjuvant chemotherapy. Radiology, 288(1):26–35, 2018.