1 Introduction
According to the American Cancer Society, breast cancer is the most frequently diagnosed solid cancer and the second leading cause of cancer death among U.S. women. Mammographic screening has been proven effective for early detection and diagnosis, which significantly decreases breast cancer mortality. Mass segmentation provides morphological features, which play a crucial role in diagnosis.
Traditional studies on mass segmentation rely heavily on handcrafted features. Model-based methods build classifiers and learn features from masses [1, 2]. There are few works using deep networks for mammograms [3]. Dhungel et al. employed multiple deep belief networks (DBNs), a Gaussian mixture model (GMM) classifier and a prior as potential functions, and a structured support vector machine (SVM) to perform segmentation [4]. They further used a conditional random field (CRF) with tree-reweighted belief propagation to boost the segmentation performance [5]. A recent work used the output of a convolutional neural network (CNN) as a complementary potential function, yielding state-of-the-art performance [6]. However, the two-stage training used in these methods produces potential functions that easily overfit the training data.
In this work, we propose an end-to-end trained adversarial deep structured network to perform mass segmentation (Fig. 1). The proposed network is designed to learn robustly from a small dataset of poor-contrast mammographic images. Specifically, an end-to-end trained fully convolutional network (FCN) with a CRF is applied. Adversarial training is introduced into the network to learn robustly from scarce mammographic images. Unlike DI2IN-AN, which uses a generative framework [7], we directly optimize the pixel-wise labeling loss. To further exploit the statistical properties of mass regions, a spatial prior is integrated into the FCN. We validate the adversarial deep structured network on two public mammographic mass segmentation datasets. The proposed network consistently outperforms other algorithms for mass segmentation.
Our main contributions in this work are: (1) We propose a unified end-to-end training framework integrating FCN+CRF and adversarial training. (2) We employ an end-to-end network for mass segmentation, whereas previous works require many hand-designed features or multi-stage training. (3) Our model achieves the best results on the two most commonly used mammographic mass segmentation datasets.
2 FCN-CRF Network
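A fully convolutional network (FCN) is a commonly used model for image segmentation, which consists of convolution, transpose convolution, and pooling layers [8]. For training, the FCN optimizes the maximum likelihood loss function
$$\mathcal{L}_{\mathrm{FCN}}(\theta) = -\frac{1}{N}\sum_{n=1}^{N}\sum_{i=1}^{N_i}\log P_{\mathrm{fcn}}(y_{n,i}\mid I_n;\theta) \tag{1}$$
where $y_{n,i}$ is the label of the $i$th pixel in the $n$th image $I_n$, $N$ is the number of training mammograms, $N_i$ is the number of pixels in the image, and $\theta$ is the parameter of the FCN. Here the size of images is fixed to $40\times 40$ and $N_i$ is 1,600.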
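To make the pixel-wise likelihood objective concrete, the following is a minimal sketch in plain Python. It assumes binary labels, a per-pixel predicted mass probability, and averaging over images only; the function name `fcn_nll` and the toy values are illustrative and not taken from the original implementation.

```python
import math

def fcn_nll(prob_mass, labels):
    """Negative log-likelihood, summed over pixels and averaged over images.

    prob_mass[n][i]: predicted probability that pixel i of image n is mass.
    labels[n][i]:    ground-truth label, 1 for mass and 0 for background.
    """
    total = 0.0
    for probs, ys in zip(prob_mass, labels):
        for p, y in zip(probs, ys):
            p_true = p if y == 1 else 1.0 - p  # probability of the true label
            total -= math.log(p_true)
    return total / len(prob_mass)

# Two toy "images" of four pixels each (hypothetical values).
probs = [[0.9, 0.8, 0.2, 0.1], [0.5, 0.5, 0.5, 0.5]]
labels = [[1, 1, 0, 0], [1, 0, 1, 0]]
loss = fcn_nll(probs, labels)
```

The confident, correct first image contributes far less to the loss than the second image, whose predictions are all at chance.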
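A CRF is a classical model for structured learning, well suited for image segmentation. It models pixel labels as random variables in a Markov random field conditioned on an observed input image. To keep the notation consistent, we use $\mathbf{y} = (y_1, y_2, \dots, y_i, \dots, y_{1600})^T$ to denote the random variables of pixel labels in an image, where $y_i \in \{0, 1\}$. Zero denotes a pixel belonging to the background, and one denotes a pixel belonging to the mass region. The Gibbs energy of the fully connected pairwise CRF is [9]
$$E(\mathbf{y}) = \sum_{i}\psi_u(y_i) + \sum_{i<j}\psi_p(y_i, y_j) \tag{2}$$
where the unary potential function $\psi_u(y_i)$ is the loss of the FCN in our case, and the pairwise potential function $\psi_p(y_i, y_j)$ defines the cost of labeling the pair $(y_i, y_j)$,
$$\psi_p(y_i, y_j) = \mu(y_i, y_j)\sum_{m} w^{(m)} k_G^{(m)}(\mathbf{f}_i, \mathbf{f}_j) \tag{3}$$
where the label compatibility function $\mu$ is given by the Potts model in our case, $w^{(m)}$ is the learned weight, pixel values $I_i$ and positions $p_i$ can be used as the feature vector $\mathbf{f}_i$, and $k_G^{(m)}$ is the Gaussian kernel applied to feature vectors [9],
$$\mathbf{k}_G(\mathbf{f}_i, \mathbf{f}_j) = \left[\exp\left(-|I_i - I_j|^2/2\right),\ \exp\left(-|p_i - p_j|^2/2\right)\right]^T \tag{4}$$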
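The pairwise term can be illustrated with a short sketch. It assumes unit-bandwidth Gaussian kernels on pixel intensity and on pixel position, and a Potts compatibility that charges a cost only when the two labels differ. The helper names and the feature layout `(intensity, (row, col))` are hypothetical choices made for this example.

```python
import math

def gaussian_kernels(feat_i, feat_j):
    """Return (appearance kernel, position kernel) for two pixels."""
    (int_i, pos_i), (int_j, pos_j) = feat_i, feat_j
    d_int = (int_i - int_j) ** 2
    d_pos = (pos_i[0] - pos_j[0]) ** 2 + (pos_i[1] - pos_j[1]) ** 2
    return math.exp(-d_int / 2.0), math.exp(-d_pos / 2.0)

def potts(label_i, label_j):
    """Potts model: cost 1 for different labels, 0 for equal labels."""
    return 1.0 if label_i != label_j else 0.0

def pairwise_potential(label_i, label_j, feat_i, feat_j, weights):
    """Weighted sum of kernels, gated by the Potts compatibility."""
    kernels = gaussian_kernels(feat_i, feat_j)
    return potts(label_i, label_j) * sum(w * k for w, k in zip(weights, kernels))

weights = (1.0, 1.0)  # hypothetical kernel weights; learned in the real model
near = pairwise_potential(0, 1, (0.5, (0, 0)), (0.5, (0, 1)), weights)
far = pairwise_potential(0, 1, (0.5, (0, 0)), (3.5, (0, 5)), weights)
same = pairwise_potential(1, 1, (0.5, (0, 0)), (0.5, (0, 1)), weights)
```

Assigning different labels to two similar, adjacent pixels (`near`) costs much more than doing so for dissimilar, distant pixels (`far`), while equal labels (`same`) cost nothing. This is what encourages smooth, edge-aligned segmentations.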
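An efficient inference algorithm can be obtained by the mean field approximation [9]. The update rule is
$$\begin{aligned}
\tilde{Q}_i^{(m)}(l) &\leftarrow \sum_{j \neq i} k_G^{(m)}(\mathbf{f}_i, \mathbf{f}_j)\, Q_j(l) \quad \text{for all } m,\\
\check{Q}_i(l) &\leftarrow \sum_{m} w^{(m)} \tilde{Q}_i^{(m)}(l),\\
\hat{Q}_i(l) &\leftarrow \sum_{l' \in \mathcal{L}} \mu(l, l')\, \check{Q}_i(l'),\\
\breve{Q}_i(l) &\leftarrow -\psi_u(y_i = l) - \hat{Q}_i(l),\\
Q_i(l) &\leftarrow \frac{1}{Z_i}\exp\left(\breve{Q}_i(l)\right)
\end{aligned} \tag{5}$$
where the first equation is the message passing from the label of pixel $j$ to the label of pixel $i$, the second equation is reweighting with the learned weights $w^{(m)}$, the third equation is the compatibility transformation, the fourth equation is adding unary potentials, and the last step is normalization with the normalizing constant $Z_i$. Here $\mathcal{L} = \{0, 1\}$ denotes background or mass. The initialization of inference employs the unary potential function as $Q_i(y_i) = \frac{1}{Z_i}\exp(-\psi_u(y_i))$. The mean field approximation can be interpreted as a recurrent neural network (RNN) [10].
3 Adversarial FCN-CRF Nets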
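Since the networks in this section reuse the mean field inference of Section 2 as a recurrent layer, a minimal sketch of update rule (5) may help fix ideas. It assumes two labels, a Potts compatibility, unit-bandwidth Gaussian kernels, and synchronous updates of all pixels. All names, weights, and toy values are illustrative rather than the authors' implementation.

```python
import math

def kernels(feat_i, feat_j):
    """Unit-bandwidth Gaussian kernels on intensity and on position."""
    (int_i, pos_i), (int_j, pos_j) = feat_i, feat_j
    d_pos = (pos_i[0] - pos_j[0]) ** 2 + (pos_i[1] - pos_j[1]) ** 2
    return (math.exp(-(int_i - int_j) ** 2 / 2.0), math.exp(-d_pos / 2.0))

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mean_field(unary, feats, weights, steps):
    """unary[i][l] is the cost of label l at pixel i; returns Q[i][l]."""
    n, labels = len(unary), (0, 1)
    kern = [[kernels(feats[i], feats[j]) for j in range(n)] for i in range(n)]
    # Initialization: Q_i(l) proportional to exp(-unary cost).
    q = [softmax([-unary[i][l] for l in labels]) for i in range(n)]
    for _ in range(steps):
        new_q = []
        for i in range(n):
            # Message passing and reweighting over all other pixels j.
            msg = [sum(w * k * q[j][l]
                       for j in range(n) if j != i
                       for w, k in zip(weights, kern[i][j]))
                   for l in labels]
            # Potts compatibility: label l is penalized by mass on other labels.
            penalty = [sum(msg[m] for m in labels if m != l) for l in labels]
            # Add unary costs, then normalize.
            new_q.append(softmax([-unary[i][l] - penalty[l] for l in labels]))
        q = new_q  # synchronous update, as in an unrolled recurrent layer
    return q

# Five pixels in a row with equal intensity; the middle unary weakly disagrees.
feats = [(0.5, (0, c)) for c in range(5)]
unary = [[2.0, 0.0], [2.0, 0.0], [0.0, 0.5], [2.0, 0.0], [2.0, 0.0]]
q_init = mean_field(unary, feats, weights=(1.0, 1.0), steps=0)
q_final = mean_field(unary, feats, weights=(1.0, 1.0), steps=5)
```

In this toy row, the middle pixel initially leans toward background, but after a few iterations its neighbors pull it toward the mass label.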
Shape and appearance priors play important roles in mammographic mass segmentation [11, 6]. The distribution of labels varies greatly with position in mammographic mass segmentation. From observation, most masses are located in the center of the region of interest (ROI), and the boundary areas of the ROI are more likely to be background (Fig. 2(a)).
The conventional FCN provides independent pixel-wise predictions. It accounts for the global class distribution difference only through the bias in the last layer. Here we additionally take a prior for position into consideration,
$$P(y_i \mid I; \theta) \propto P(y_i)\, P_{\mathrm{fcn}}(y_i \mid I; \theta) \tag{6}$$
where $P(y_i)$ is the empirical estimation of mass varied with the pixel position $i$, and $P_{\mathrm{fcn}}(y_i \mid I; \theta)$ is the predicted mass probability of the conventional FCN. In the implementation, we added an image-sized bias in the softmax layer as the empirical estimation of mass for the FCN to train the network. The $-\log P(y_i \mid I; \theta)$ is used as the unary potential function $\psi_u(y_i)$ in the CRF as RNN. For multi-scale FCNs as potential functions, the potential function is defined as $\psi_u(y_i) = \sum_{u'} w^{(u')} \psi_{u'}(y_i)$, where $w^{(u')}$ is the learned weight for the unary potential function and $\psi_{u'}(y_i)$ is the potential function provided by the FCN of each scale.
Adversarial training provides strong regularization for deep networks. The idea of adversarial training is that if the model is robust enough, it should be invariant to small perturbations of training examples that yield the largest increase in the loss (adversarial examples [12]). The perturbation $R$ can be obtained as $\min_{R, \|R\| \le \epsilon} \log P(\mathbf{y} \mid I + R; \theta)$. In general, the calculation of the exact $R$ is intractable, especially for complicated models such as deep networks. A linear approximation and an $L_2$ norm box constraint can be used for the calculation of the perturbation as $R_{adv} = -\epsilon\, \mathbf{g} / \|\mathbf{g}\|_2$, where $\mathbf{g} = \nabla_I \log P(\mathbf{y} \mid I; \theta)$. For the adversarial FCN, the network predicts the label of each pixel independently as $P(\mathbf{y} \mid I; \theta) = \prod_i P(y_i \mid I; \theta)$. For the adversarial CRF as RNN, the prediction of the network relies on mean field approximation inference as $P(\mathbf{y} \mid I; \theta) = \prod_i Q(y_i \mid I; \theta)$.
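The linearized perturbation can be sketched on a toy model. The example below assumes a per-pixel logistic model in place of the FCN, so that the gradient of the log-likelihood with respect to the input has a closed form, and it assumes a gradient step normalized in the L2 norm. The model, its parameters `w` and `b`, and the value of `eps` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(image, labels, w, b):
    """log P(y | I) for a toy model with P(y_i = 1 | I) = sigmoid(w * I_i + b)."""
    total = 0.0
    for x, y in zip(image, labels):
        p = sigmoid(w * x + b)
        total += math.log(p if y == 1 else 1.0 - p)
    return total

def adversarial_perturbation(image, labels, w, b, eps):
    """Step of L2 length eps against the gradient of the log-likelihood."""
    # Analytic gradient w.r.t. each pixel for the toy logistic model.
    g = [(y - sigmoid(w * x + b)) * w for x, y in zip(image, labels)]
    norm = math.sqrt(sum(v * v for v in g))
    if norm == 0.0:
        return [0.0] * len(image)
    return [-eps * v / norm for v in g]

image = [0.9, 0.7, 0.2, 0.1]
labels = [1, 1, 0, 0]
w, b, eps = 4.0, -2.0, 0.1
r_adv = adversarial_perturbation(image, labels, w, b, eps)
perturbed = [x + r for x, r in zip(image, r_adv)]
clean_ll = log_likelihood(image, labels, w, b)
adv_ll = log_likelihood(perturbed, labels, w, b)
```

The perturbed input has a lower log-likelihood than the clean input, which is why training on it acts as a worst-case regularizer. In a deep network the gradient would come from backpropagation rather than a closed form.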
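Adversarial training forces the model to fit examples perturbed in the worst-case direction. The adversarial loss is
$$\mathcal{L}_{adv}(\theta) = -\frac{1}{N}\sum_{n=1}^{N} \log P(\mathbf{y}_n \mid I_n + R_{adv,n}; \theta) \tag{7}$$
where $R_{adv,n}$ is the adversarial perturbation computed for the $n$th image. In training, the total loss is defined as the sum of the adversarial loss and the empirical loss based on training samples as
$$\mathcal{L}(\theta) = \mathcal{L}_{adv}(\theta) - \frac{1}{N}\sum_{n=1}^{N} \log P(\mathbf{y}_n \mid I_n; \theta) + \frac{\lambda}{2}\|\theta\|^2 \tag{8}$$
where $\lambda$ is the regularization factor for $\theta$, and $P(\mathbf{y}_n \mid I_n; \theta)$ is either the mass probability prediction in the FCN or the posterior approximated by mean field inference in the CRF as RNN for the $n$th image $I_n$.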
4 Experiments
We validate the proposed model on the two most commonly used public mammographic mass segmentation datasets: INbreast [13] and DDSM-BCRP [14]. We use the same ROI extraction and resizing procedure as [4, 6, 5]. Due to the low contrast of mammograms, an image enhancement technique (the first 9 steps in [15]) is applied to the extracted ROI images, followed by pixel-position-dependent normalization. The preprocessing makes training converge quickly. We further augment each training set by flipping horizontally, flipping vertically, and flipping both horizontally and vertically, which makes the training set 4 times larger than the original training set.
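The flip-based augmentation can be sketched as follows. Images and masks are represented as nested lists here purely for illustration; the function names are hypothetical, and the mask is always flipped together with its image so that labels stay aligned.

```python
def flip_horizontal(image):
    """Mirror each row (left-right flip)."""
    return [row[::-1] for row in image]

def flip_vertical(image):
    """Reverse the row order (up-down flip)."""
    return [list(row) for row in image[::-1]]

def augment(image, mask):
    """Return the original (image, mask) pair plus its three flipped versions."""
    both_image = flip_vertical(flip_horizontal(image))
    both_mask = flip_vertical(flip_horizontal(mask))
    return [
        (image, mask),
        (flip_horizontal(image), flip_horizontal(mask)),
        (flip_vertical(image), flip_vertical(mask)),
        (both_image, both_mask),
    ]

roi = [[1, 2], [3, 4]]
roi_mask = [[0, 1], [0, 0]]
augmented = augment(roi, roi_mask)
```

Each training ROI yields four samples, which matches the fourfold increase in training set size described above.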
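For consistent comparison, the Dice index is used to evaluate segmentation performance and is defined as $\frac{2 \times TP}{2 \times TP + FP + FN}$, where $TP$, $FP$, and $FN$ are the numbers of true positive, false positive, and false negative pixels. For a fair comparison, we re-implement a two-stage model [6] and obtain a similar Dice index on the INbreast dataset.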
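As a small worked example of the metric, the sketch below computes the Dice index from binary masks using the standard definition 2TP / (2TP + FP + FN). The function name and the convention of returning 1.0 when both masks are empty are assumptions made for this illustration.

```python
def dice_index(pred, truth):
    """Dice index between two binary masks given as nested lists of 0/1."""
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2.0 * tp / denom

pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
score = dice_index(pred, truth)  # TP = 2, FP = 1, FN = 1, so 4 / 6
```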

FCN is the network integrating a position prior into the FCN (denoted as FCN 1 in Table 1).
Adversarial FCN is FCN with adversarial training.
Joint FCN-CRF is the FCN followed by CRF as RNN with an end-to-end training scheme.
Adversarial FCN-CRF is the Joint FCN-CRF with end-to-end adversarial training.
Multi-FCN, Adversarial multi-FCN, Joint multi-FCN-CRF, and Adversarial multi-FCN-CRF employ 4 FCNs with multi-scale kernels, which can be trained in an end-to-end way using the last prediction.
The prediction of Multi-FCN and Adversarial multi-FCN is the average prediction of the 4 FCNs. The configurations of the FCNs are in Table 1. Each convolutional layer is followed by max pooling. The last layers of the four FCNs are all two transpose convolution kernels with a softmax activation function. We use the hyperbolic tangent activation function in the middle layers. The parameters of the FCNs are set such that the number of each layer's parameters is almost the same as that of the CNN used in [6]. We use Adam with learning rate 0.003. The same regularization factor $\lambda$ is used for the two datasets. The perturbation magnitude $\epsilon$ used in adversarial training is set separately for the INbreast and DDSM-BCRP datasets: because the boundaries of masses in the DDSM-BCRP dataset are smoother than those in the INbreast dataset, we use a larger perturbation for DDSM-BCRP. For the CRF as RNN, we empirically use 5 time steps in training and 10 time steps in the test phase.


Net.  First layer  Second layer  Third layer 
FCN 1  conv.  
FCN 2  conv.  
FCN 3  conv.  
FCN 4  conv.  



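Table 2: Dice index (%) on the INbreast and DDSM-BCRP datasets.
Methodology                INbreast  DDSM-BCRP
Cardoso et al. [2]         88        N/A
Beller et al. [1]          N/A       70
Dhungel et al. [4]         88        87
Dhungel et al. [5]         89        89
Dhungel et al. [6]         90        90
FCN                        89.48     90.21
Adversarial FCN            89.71     90.78
Joint FCN-CRF              89.78     90.97
Adversarial FCN-CRF        90.07     91.03
Multi-FCN                  90.47     91.17
Adversarial multi-FCN      90.71     91.20
Joint multi-FCN-CRF        90.76     91.26
Adversarial multi-FCN-CRF  90.97     91.30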

The INbreast dataset is a recently released mammographic mass analysis dataset, which provides more accurate contours of lesion regions, and its mammograms are of high quality. For mass segmentation, the dataset contains 116 mass regions. We use the first 58 masses for training and the rest for testing, which is the same protocol as [4, 6, 5]. The DDSM-BCRP dataset contains 39 cases (156 images) for training and 40 cases (160 images) for testing [14]. After ROI extraction, there are 84 ROIs for training and 87 ROIs for testing. We compare our schemes with other recently published mammographic mass segmentation methods in Table 2.
Table 2 shows that CNN features provide superior performance on mass segmentation, outperforming handcrafted-feature-based methods [2, 1]. Our enhanced FCN achieves a 0.25% Dice index improvement over the traditional FCN on the INbreast dataset. Adversarial training yields a 0.4% improvement on average. Incorporating spatially structured learning further produces a 0.3% improvement. Using the multi-scale model contributes the most to the segmentation results, which shows that multi-scale features are effective for pixel-wise classification in mass segmentation. Combining all the components achieves the best performance, with 0.97% and 1.3% improvements on the INbreast and DDSM-BCRP datasets, respectively. A possible reason for the improvement is that the adversarial scheme reduces overfitting.
We calculate the p-value of McNemar's chi-square test to compare our model with [6] on the INbreast dataset. The resulting p-value shows that our model is significantly better than the model of [6].
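For readers unfamiliar with McNemar's test, the sketch below computes the chi-square statistic and p-value from the two discordant counts of a paired comparison. It assumes the form without continuity correction and one degree of freedom. How the paired outcomes were formed in the experiment (for example, per pixel or per ROI) is not stated here, so the counts in the example are purely hypothetical.

```python
import math

def mcnemar_test(b, c):
    """McNemar's chi-square test without continuity correction.

    b: paired cases that only model A classified correctly.
    c: paired cases that only model B classified correctly.
    Returns (chi-square statistic, p-value) with one degree of freedom.
    """
    if b + c == 0:
        return 0.0, 1.0
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

chi2, p_value = mcnemar_test(30, 10)  # hypothetical discordant counts
```

Only the discordant pairs enter the statistic. Cases that both models get right, or both get wrong, carry no information about which model is better.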
To better understand the adversarial training, we visualize segmentation results in Fig. 3. We observe that the segmentations in the second and fourth rows have more accurate boundaries than those in the first and third rows. This demonstrates that adversarial training improves both FCN and FCN-CRF.
We further employ the prediction accuracy based on trimaps to specifically evaluate segmentation accuracy at boundaries [16]. We calculate the accuracies within trimaps surrounding the actual mass boundaries (ground truth) in Fig. 4. Trimaps on the DDSM-BCRP dataset are visualized in Fig. 2(b). From the figure, the accuracies of Adversarial FCN-CRF are 2-3% higher than those of Joint FCN-CRF on average, and the accuracies of Adversarial FCN are better than those of FCN. These results demonstrate that adversarial training improves the FCN and Joint FCN-CRF for both whole-image and boundary-region segmentation.
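A trimap-based accuracy can be sketched as follows. The sketch assumes that the trimap is the band of pixels within a given Chebyshev distance of the ground-truth boundary, and that boundary pixels are those with a 4-connected neighbor of the other label. Both choices, and the function names, are assumptions made for illustration; the exact band construction used in the experiments may differ.

```python
def boundary_pixels(mask):
    """Pixels that have a 4-connected neighbor with a different label."""
    h, w = len(mask), len(mask[0])
    points = []
    for r in range(h):
        for c in range(w):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and mask[rr][cc] != mask[r][c]:
                    points.append((r, c))
                    break
    return points

def trimap_accuracy(pred, truth, width):
    """Pixel accuracy restricted to a band of the given width around the boundary."""
    border = boundary_pixels(truth)
    correct = total = 0
    for r in range(len(truth)):
        for c in range(len(truth[0])):
            in_band = any(max(abs(r - br), abs(c - bc)) <= width
                          for br, bc in border)
            if in_band:
                total += 1
                correct += int(pred[r][c] == truth[r][c])
    return correct / total if total else 1.0

truth = [[0, 0, 0, 1, 1, 1]]
pred = [[0, 0, 1, 1, 1, 1]]
band_acc = trimap_accuracy(pred, truth, width=1)  # band covers columns 1..4
```

In this toy row the whole-image accuracy is 5/6, while the accuracy inside the band is 3/4, showing how the trimap focuses the evaluation on errors near the boundary.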
5 Conclusion
In this work, we propose an end-to-end adversarial FCN-CRF network for mammographic mass segmentation. To integrate the prior distribution of masses and fully exploit the power of the FCN, a position prior is added to the network. Furthermore, adversarial training is used to handle the small size of the training data by reducing overfitting and increasing robustness. Experimental results demonstrate the superior performance of the adversarial FCN-CRF on two commonly used public datasets.
References
 [1] M. Beller et al., “An example-based system to support the segmentation of stellate lesions,” Springer, 2005.
 [2] J. S. Cardoso et al., “Closed shortest path in the original coordinates with an application to breast cancer,” IJPRAI, 2015.
 [3] W. Zhu et al., “Deep multi-instance networks with sparse label assignment for whole mammogram classification,” MICCAI, 2017.
 [4] N. Dhungel et al., “Deep structured learning for mass segmentation from mammograms,” in ICIP. IEEE, 2015.
 [5] N. Dhungel et al., “Tree reweighted belief propagation using deep learning potentials for mass segmentation from mammograms,” in ISBI. IEEE, 2015.
 [6] N. Dhungel et al., “Deep learning and structured prediction for the segmentation of mass in mammograms,” in MICCAI, 2015.
 [7] D. Yang et al., “Automatic liver segmentation using an adversarial image-to-image network,” in MICCAI. Springer, 2017.
 [8] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in CVPR, 2015.
 [9] P. Krähenbühl and V. Koltun, “Efficient inference in fully connected CRFs with Gaussian edge potentials,” in NIPS, 2011.
 [10] S. Zheng et al., “Conditional random fields as recurrent neural networks,” in ICCV, 2015.
 [11] M. Jiang et al., “Mammographic mass segmentation with online learned shape and appearance priors,” in MICCAI, 2016.
 [12] C. Szegedy et al., “Intriguing properties of neural networks,” ICLR, 2014.
 [13] I. C. Moreira et al., “INbreast: toward a full-field digital mammographic database,” Academic Radiology, 2012.
 [14] M. Heath et al., “Current status of the digital database for screening mammography,” in Digital Mammography, 1998.
 [15] J. Ball et al., “Digital mammographic computer aided diagnosis using adaptive level set segmentation,” in EMBI, 2007.
 [16] P. Kohli et al., “Robust higher order potentials for enforcing label consistency,” IJCV, 2009.