Adversarial Deep Structural Networks for Mammographic Mass Segmentation https://arxiv.org/abs/1612.05970
Mass segmentation is an important task in mammogram analysis, providing effective morphological features and regions of interest (ROI) for mass detection and classification. Inspired by the success of deep convolutional features for natural image analysis and of conditional random fields (CRF) for structural learning, we propose an end-to-end network for mammographic mass segmentation. The network employs a fully convolutional network (FCN) to model the potential function, followed by a CRF to perform structural learning. Because the mass distribution varies greatly with pixel position, the FCN is combined with a position prior for this task. Due to the small size of mammogram datasets, we use adversarial training to control overfitting. Four models with different convolutional kernels are further fused to improve the segmentation results. Experimental results on two public datasets, INbreast and DDSM-BCRP, show that our end-to-end network combined with adversarial training achieves state-of-the-art results.
According to the American Cancer Society, breast cancer is the most frequently diagnosed solid cancer and the second leading cause of cancer death among U.S. women [1]. Mammogram screening has been demonstrated to be an effective way to detect and diagnose the disease early, significantly decreasing breast cancer mortality [18]. Mass segmentation provides morphological features, which play a crucial role in diagnosis.
Traditional studies on mass segmentation rely heavily on elaborate hand-designed features. Model-based methods build classifiers and learn features from the mass [3, 4]. Few works have used deep networks to process mammograms [12, 9, 23]. Dhungel et al. employed multiple deep belief networks (DBNs), a GMM classifier, and a prior as potential functions, with a structural SVM to perform segmentation [6]. They also used a CRF with tree-reweighted belief propagation to boost the segmentation results [7]. A recent work used the output of a CNN as a complementary potential function, yielding state-of-the-art performance [5]. However, the two-stage training used in these methods produces potential functions that easily overfit the training data.

Inspired by the power of deep networks [22, 24], we propose an end-to-end trained adversarial deep structural network for mass segmentation (Fig. 1(a)). The proposed network is designed to learn robustly from a small dataset of poor-contrast mammographic images. Specifically, an end-to-end trained fully convolutional network (FCN) with a CRF is applied. Adversarial training is introduced to learn robustly from scarce mammographic images. To further exploit the statistical properties of mass regions, a spatial prior with position-dependent categorical distributions is added to the FCN. We validate our adversarial deep structural network on two public mammographic mass segmentation datasets, where it consistently outperforms other mass segmentation algorithms.

Our main contributions are: (1) To our knowledge, this is the first application of adversarial training to medical imaging; integrating CNN+CRF and adversarial training into a unified end-to-end framework has not been attempted before, and both components are essential for achieving state-of-the-art performance. (2) We employ an end-to-end trained network for mass segmentation, whereas previous works needed many hand-designed features or multi-stage training, such as calculating potential functions independently. (3) Our model achieves state-of-the-art results on two commonly used mammographic mass segmentation datasets.
A fully convolutional network (FCN) is a successful model for image segmentation that preserves the spatial structure of predictions [16]. The FCN consists of convolution, deconvolution [20], or max-pooling in each layer. For training, the FCN optimizes the maximum likelihood loss function
$$L(\theta) = -\sum_{n=1}^{N} \sum_{i=1}^{N_p} \log p(y_{n,i} \mid x_n; \theta),$$
where $y_{n,i}$ is the label of the $i$-th pixel in the $n$-th image $x_n$, $N$ is the number of training mammograms, $N_p$ is the number of pixels in the image, and $\theta$ is the parameter of the FCN. Here the size of images is fixed to $40 \times 40$, so $N_p$ is 1,600.

A CRF is a commonly used method for structural learning, well suited for image segmentation. It models pixel labels as random variables in a Markov random field conditioned on an observed input image. To keep the notation consistent, we use $y = (y_1, \dots, y_{N_p})$ to denote the random variables of pixel labels in an image, where $y_i \in \{0, 1\}$: zero denotes a pixel belonging to the background, and one denotes a pixel belonging to the mass region. The Gibbs energy of the fully connected pairwise CRF [15] is $E(y) = \sum_i \phi_u(y_i) + \sum_{i<j} \phi_p(y_i, y_j)$, where the unary potential function $\phi_u$ is the loss of the FCN in our case, and the pairwise potential function $\phi_p$ defines the cost of the labeling pair $(y_i, y_j)$:
$$\phi_p(y_i, y_j) = \mu(y_i, y_j) \sum_{m} w_m k^{(m)}(f_i, f_j), \qquad (1)$$
where the label compatibility function $\mu(y_i, y_j)$ is given by the Potts model in our case, i.e., $\mu(y_i, y_j) = [y_i \neq y_j]$, $k^{(m)}$ is a Gaussian kernel applied to the feature vectors $f_i, f_j$ [15], and $w_m$ is a learned weight. Pixel values and positions can be used as the feature vector $f_i$.

An efficient inference algorithm is obtained by mean field approximation [15]. The update rule is
$$\begin{aligned}
\tilde{Q}_i^{(m)}(l) &= \sum_{j \neq i} k^{(m)}(f_i, f_j)\, Q_j(l),\\
\check{Q}_i(l) &= \sum_m w_m \tilde{Q}_i^{(m)}(l),\\
\hat{Q}_i(l) &= \sum_{l' \in \{0,1\}} \mu(l, l')\, \check{Q}_i(l'),\\
\breve{Q}_i(l) &= -\phi_u(y_i = l) - \hat{Q}_i(l),\\
Q_i(l) &= \frac{1}{Z_i} \exp\!\big(\breve{Q}_i(l)\big),
\end{aligned} \qquad (2)$$
where the first line is the message passing from the label of pixel $j$ to the label of pixel $i$, the second line is the reweighting with the learned weights $w_m$, the third line is the compatibility transform, the fourth line adds the unary potentials, and the last line is the normalization. Here $l \in \{0, 1\}$ denotes background or mass. The inference is initialized from the unary potential function as $Q_i(y_i) \propto \exp(-\phi_u(y_i))$. The above mean field approximation can be interpreted as a recurrent neural network (RNN), as shown in Fig. 1(b) [21].

Shape and appearance priors play an important role in mammographic mass segmentation [11, 5]. The distribution of labels varies greatly with position in this task: from observation, most of the mass is located at the center of the ROI, and the boundary of the ROI is more likely to be background (Fig. 2(a)).
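As a concrete illustration, one step of the mean field update in Eq. (2) can be sketched in NumPy. The toy sizes, the single precomputed kernel matrix, and the function name are our own assumptions, not the authors' implementation:

```python
import numpy as np

def mean_field_step(Q, K, w, unary, compat):
    """One mean field update for a fully connected pairwise CRF.

    Q      : (P, L) current marginals, one row per pixel
    K      : (M, P, P) precomputed Gaussian kernel matrices k^(m)(f_i, f_j)
    w      : (M,) learned kernel weights
    unary  : (P, L) unary potentials phi_u(y_i = l)
    compat : (L, L) label compatibility mu(l, l'), Potts model here
    """
    # message passing: sum over j != i (subtract the self-message on the diagonal)
    msgs = np.stack([Km @ Q - np.diag(Km)[:, None] * Q for Km in K])
    # reweighting with the learned kernel weights
    mixed = np.tensordot(w, msgs, axes=1)
    # compatibility transform
    mixed = mixed @ compat.T
    # add unary potentials, then normalize with a softmax over labels
    logits = -unary - mixed
    logits -= logits.max(axis=1, keepdims=True)
    Q_new = np.exp(logits)
    return Q_new / Q_new.sum(axis=1, keepdims=True)

P, L = 16, 2  # a 4x4 toy image, labels: background / mass
rng = np.random.default_rng(1)
feats = rng.random((P, 2))
K = np.exp(-np.square(feats[:, None] - feats[None, :]).sum(-1))[None]  # one kernel
unary = rng.random((P, L))
compat = 1.0 - np.eye(L)  # Potts: penalize differing labels only
Q0 = np.exp(-unary) / np.exp(-unary).sum(axis=1, keepdims=True)  # init from unaries
Q1 = mean_field_step(Q0, K, np.array([1.0]), unary, compat)
```

Unrolling a fixed number of such steps is what turns the inference into the RNN of Fig. 1(b).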
The conventional FCN provides predictions for pixels independently. It only considers the global class distribution, reflected in the number of filters (channels) in the last layer. Here we take the categorical prior at different positions into consideration and add it into the FCN as $h_i(x) \propto p_i(y_i) \cdot f_i(x)$, where $p_i(y_i)$ is the categorical prior distribution varying with the pixel position $i$, and $f_i(x)$ is the output of the conventional FCN. In the implementation, we assign the bias of the last layer to the average image when training the network. The $h_i(x)$ is used as the unary potential function in the CRF as RNN. When multiple FCNs serve as potential functions, the combined unary potential is defined as $\phi_u(y) = \sum_k w'_k \phi_u^{(k)}(y)$, where $w'_k$ is the learned weight for the $k$-th unary potential function $\phi_u^{(k)}$ provided by one FCN.
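A minimal sketch of this position-prior reweighting, assuming the prior is stored as a per-pixel categorical distribution (the shapes and names here are illustrative, not the authors' code):

```python
import numpy as np

def apply_position_prior(fcn_probs, prior):
    """Renormalized posterior proportional to the per-pixel categorical prior
    times the FCN output: p(y_i | x) ~ p_i(y_i) * f_i(x)."""
    post = prior * fcn_probs
    return post / post.sum(axis=-1, keepdims=True)

# Toy prior: mass likelier near the ROI center, background near the boundary
H = W = 40
yy, xx = np.mgrid[0:H, 0:W]
center_dist = np.hypot(yy - H / 2, xx - W / 2)
p_mass = np.clip(1.0 - center_dist / center_dist.max(), 0.05, 0.95)
prior = np.stack([1.0 - p_mass, p_mass], axis=-1)   # (H, W, 2)
fcn_probs = np.full((H, W, 2), 0.5)                 # an uninformative FCN output
post = apply_position_prior(fcn_probs, prior)
# With a flat FCN output, the posterior simply follows the position prior
```

In practice the prior would be estimated from training masks (e.g. the average mask image), which is what the last-layer bias trick above amounts to.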
Adversarial training provides strong regularization for deep networks [8]. The idea is that a sufficiently robust model should be invariant to the small perturbations of training examples that yield the largest increase in the loss (adversarial examples [19]). The perturbation for an adversarial example is $r_{adv} = \arg\max_{r, \|r\| \le \epsilon} \ell(y, x + r; \theta)$. In general, calculating the exact $r_{adv}$ is intractable because the maximization is not solvable exactly w.r.t. $r$, especially for complicated models such as deep networks. A linear approximation with an $\ell_\infty$ norm box constraint can be used to calculate the perturbation [8] as $r_{adv} = \epsilon \cdot \mathrm{sign}(\nabla_x \ell(y, x; \theta))$, where $\epsilon$ is the perturbation magnitude. For the adversarial FCN, the network predicts the label of each pixel independently; for the adversarial CRF as RNN, the prediction of the network relies on the mean field approximation inference.
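The linear-approximation perturbation can be sketched as follows; the toy quadratic loss stands in for the network loss and is our own example:

```python
import numpy as np

def fgsm_perturbation(grad, eps):
    """Linear-approximation adversarial perturbation under an l_inf box:
    r_adv = eps * sign(gradient of the loss w.r.t. the input)."""
    return eps * np.sign(grad)

# Toy quadratic "loss" L(x) = 0.5 * ||x||^2, whose input gradient is simply x
x = np.array([0.3, -0.2, 0.1, -0.4])
grad = x
r = fgsm_perturbation(grad, eps=0.1)
x_adv = x + r  # each coordinate moves in the loss-increasing direction
```

For a real network, `grad` would come from backpropagating the segmentation loss to the input image.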
Adversarial training forces the model to fit examples with the worst perturbation as well. The adversarial loss is defined as
$$\ell_{adv}(x_n, y_n; \theta) = \ell(y_n \mid x_n + r_{adv}; \theta). \qquad (3)$$
In training, the total loss is defined as the sum of the adversarial loss and the empirical loss over the training samples,
$$\mathcal{L}(\theta) = \sum_{n=1}^{N} \big[ \ell(y_n \mid x_n; \theta) + \lambda\, \ell_{adv}(x_n, y_n; \theta) \big] + \zeta \|w\|^2, \qquad (4)$$
where $\lambda$ balances the adversarial loss, $\zeta$ is the regularization factor used to avoid overfitting, and $p(y_n \mid x_n; \theta)$ is either the prediction of the enhanced FCN or the posterior approximated by mean field inference in the CRF as RNN for the $n$-th image $x_n$. The regularization term $\zeta \|w\|^2$ is applied only to the parameters $w$ of the CRF.
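Assuming a placeholder per-example loss and precomputed input gradients, this total objective can be sketched as (all names and hyper-parameter values below are our own, not the paper's):

```python
import numpy as np

def squared_loss(x, y):
    """Stand-in per-example loss (a placeholder for the FCN/CRF loss)."""
    return 0.5 * np.sum((x - y) ** 2)

def total_loss(xs, ys, grads, eps, lam, zeta, crf_w):
    """Empirical loss + lam * adversarial loss + zeta * ||w||^2 on CRF weights."""
    emp = sum(squared_loss(x, y) for x, y in zip(xs, ys))
    adv = sum(squared_loss(x + eps * np.sign(g), y)
              for x, y, g in zip(xs, ys, grads))
    return emp + lam * adv + zeta * np.sum(crf_w ** 2)

xs = [np.array([0.2, -0.1])]
ys = [np.zeros(2)]
grads = [xs[0] - ys[0]]  # gradient of the squared loss w.r.t. the input
loss = total_loss(xs, ys, grads, eps=0.1, lam=1.0, zeta=1e-4, crf_w=np.ones(3))
```

The adversarial term always dominates the corresponding empirical term, since the perturbation is chosen to increase the loss.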
We validate the proposed model on the two most frequently used public mammographic mass segmentation datasets: the INbreast dataset [17] and the DDSM-BCRP dataset [10]. We use the same ROI extraction and resizing protocol as [6, 5, 7]. Due to the low contrast of mammographic images, an image enhancement technique is applied to the extracted ROI images, following the first 9 enhancement steps of [2], and is followed by pixel-position-dependent normalization. This preprocessing makes training converge quickly. We further augment each training set by flipping horizontally, flipping vertically, and flipping both horizontally and vertically, making the training set 4 times the original size.
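The 4x flip augmentation can be written directly with NumPy slicing (function name ours):

```python
import numpy as np

def augment_flips(image, mask):
    """4x augmentation: original, horizontal flip, vertical flip, and both,
    applied identically to the ROI image and its mask."""
    return [
        (image, mask),
        (image[:, ::-1], mask[:, ::-1]),       # horizontal flip
        (image[::-1, :], mask[::-1, :]),       # vertical flip
        (image[::-1, ::-1], mask[::-1, ::-1])  # both flips
    ]

roi = np.arange(12).reshape(3, 4)
m = (roi % 2).astype(np.uint8)
pairs = augment_flips(roi, m)  # 4 (image, mask) pairs
```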
For consistent comparison, the Dice index metric, defined as $2|P \cap G| / (|P| + |G|)$ for a predicted mask $P$ and ground truth $G$, is used to evaluate the segmentation results. For a fair comparison, we also validate the Deep Structure Learning + CNN model [5] on our processed data and obtain a similar result on the INbreast dataset. To investigate the impact of each component of our model, we conduct extensive experiments under different configurations. FCN denotes the network integrating the position prior into a fully convolutional network (structure denoted as FCN 1 in Table 1); we use this enhanced FCN rather than the conventional FCN in all experiments. Adversarial FCN is the FCN with adversarial training. Jointly Trained FCN-CRF is the FCN followed by the CRF as RNN with an end-to-end training scheme. Jointly Trained Adversarial FCN-CRF is the Jointly Trained FCN-CRF with end-to-end adversarial training. Multi-FCN, Adversarial Multi-FCN, Jointly Trained Multi-FCN-CRF, and Jointly Trained Adversarial Multi-FCN-CRF are the corresponding networks with 4 FCNs. The configurations of FCN 1 and the other three sub-networks used in the Multi-FCN variants are given in Table 1. The last layer of each of the four networks consists of two deconvolutional filters with a softmax activation function; we use the hyperbolic tangent activation function in the middle layers. The parameters of the FCNs are set such that the number of parameters in each layer is almost the same as that of the CNN used in [5]. For optimization, we use the Adam algorithm [13] with learning rate 0.003. The weight hyper-parameter of the CRF as RNN is the same for the two datasets. The $\epsilon$ used in adversarial training differs between the INbreast and DDSM-BCRP datasets, because the boundaries of masses in the DDSM-BCRP dataset are smoother than those in the INbreast dataset. For the mean field approximation in the CRF as RNN, we use 5 iterations/time steps during training and 10 during testing.
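The Dice index used throughout the evaluation can be computed as:

```python
import numpy as np

def dice_index(pred, gt):
    """Dice index 2|P ∩ G| / (|P| + |G|) for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 1.0

a = np.array([[1, 1], [0, 0]])
b = np.array([[1, 0], [0, 0]])
# overlap 1 pixel, mask sizes 2 and 1 -> Dice = 2*1 / (2+1) = 2/3
```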


Table 1: Configurations of the four FCNs.

Net.   | First layer      | Second layer     | Third layer
FCN 1  | conv., max pool. | conv., max pool. | conv.
FCN 2  | conv., max pool. | conv., max pool. | conv.
FCN 3  | conv., max pool. | conv., max pool. | conv.
FCN 4  | conv., max pool. | conv., max pool. | conv.

Table 2: Dice index (%) of the different methodologies on the INbreast and DDSM-BCRP test sets.

Methodology                                | INbreast | DDSM-BCRP
Method in [4]                              | 88       | N/A
Method in [3]                              | N/A      | 70
Dhungel et al. [6]                         | 88       | 87
Dhungel et al. [7]                         | 89       | 89
Deep Structure Learning + CNN [5]          | 90       | 90
FCN                                        | 89.48    | 90.21
Adversarial FCN                            | 89.71    | 90.78
Jointly Trained FCN-CRF                    | 89.78    | 90.97
Jointly Trained Adversarial FCN-CRF        | 90.07    | 91.03
Multi-FCN                                  | 90.47    | 91.17
Adversarial Multi-FCN                      | 90.71    | 91.20
Jointly Trained Multi-FCN-CRF              | 90.76    | 91.26
Jointly Trained Adversarial Multi-FCN-CRF  | 90.97    | 91.30
The INbreast dataset is a recently released mammographic mass analysis dataset that provides accurate contours of the lesion regions, and its mammograms are of high quality. For mass segmentation, the dataset contains 116 mass regions. We use the first 58 masses for training and the rest for testing, following the same protocol as [6, 5, 7]. The DDSM-BCRP dataset contains 39 cases (156 images) for training and 40 cases (160 images) for testing [10]. After ROI extraction, there are 84 ROIs for training and 87 ROIs for testing. We compare our schemes with other recently published mammographic mass segmentation methods [4, 6, 7, 5] in Table 2.
Table 2 shows that CNN features, so successful on natural images, also provide superior performance on medical image analysis, outperforming hand-crafted-feature-based methods [4, 3]. Our enhanced FCN achieves a 0.25% Dice index improvement over the traditional FCN on the INbreast dataset. Adversarial training yields a 0.4% improvement on average, and incorporating the spatially structural constraint further produces a 0.3% improvement. Model averaging over multiple potential functions contributes the most to the segmentation results, which is consistent with work showing that the best model requires five different unary potential functions [5]. Combining all the components achieves the best performance, with relative improvements of 9.7% and 13% on the INbreast and DDSM-BCRP datasets respectively. In our experiments, the FCN overfits heavily on the training set and can even achieve a Dice index above 98.60%; this may explain why two-stage training cannot boost performance much. Adversarial training works effectively as a regularizer to reduce this overfitting. We believe the overfitting is mainly caused by the small training set size, and we strongly support the creation of a large mammographic analysis dataset to accelerate mammogram analysis research.
We compute the p-value of McNemar's chi-square test to compare our model with the method of [5] on the INbreast dataset. The total number of test pixels is 92,800. The numbers of pixels classified correctly and incorrectly by both models are 76,130 and 8,805 respectively. The number of pixels classified correctly only by our model is 4,595, and the number classified correctly only by model [5] is 3,270. The resulting p-value shows that our model is significantly better than model [5].
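The standard (uncorrected) McNemar statistic can be reproduced from the reported discordant counts; whether the authors used this exact variant or a continuity-corrected one is not stated, so this is an assumption:

```python
import math

# Discordant pixel counts reported above: pixels only our model classified
# correctly (b) and pixels only model [5] classified correctly (c)
b, c = 4595, 3270
chi2 = (b - c) ** 2 / (b + c)  # McNemar chi-square statistic, 1 dof
# Survival function of a chi-square with 1 dof: p = erfc(sqrt(chi2 / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))
```

Either variant gives a statistic far beyond the 0.05 critical value (3.84), consistent with the significance claim.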
To further understand the adversarial training, we visualize segmentation results in Fig. 3. We observe that the segmentations in the first row have vague borders and many outliers within the predicted borders. The segmentations in the second row have fewer vague borders and fewer outliers than the predictions in the first row. The results in the last two rows have sharper and more accurate borders than those in the first two rows. This demonstrates that the CRF-based methods achieve better segmentations on the test sets: structural learning with the CRF effectively eliminates outliers within the borders, producing better segmentations and more accurately predicted borders.
We further employ a trimap-based metric to specifically evaluate segmentation accuracy near the boundaries [14]. We calculate accuracies within trimaps surrounding the actual (ground-truth) mass boundaries in Fig. 4; trimaps on the DDSM-BCRP dataset are visualized in Fig. 2(b). From the figure, the accuracies of FCN-CRF with adversarial training are 2-3% higher than those of FCN-CRF on average, and the accuracies of FCN with adversarial training are better than those of the plain FCN. These results demonstrate that the adversarial training regularization improves both the FCN and the FCN-CRF, both over the whole image (Dice index metric) and around the boundaries.
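A trimap-style boundary accuracy can be sketched as follows; this brute-force version, its band definition (Chebyshev distance to the boundary), and all names are our own illustration rather than the metric's reference implementation:

```python
import numpy as np

def trimap_accuracy(pred, gt, width):
    """Pixel accuracy restricted to the band of pixels within `width`
    (Chebyshev distance) of the ground-truth boundary. Brute force, so it
    assumes small ROIs and a mask containing both classes."""
    gt = gt.astype(bool)
    pad = np.pad(gt, 1, mode='edge')
    # boundary pixels: label differs from at least one 4-neighbor
    nb = np.stack([pad[2:, 1:-1], pad[:-2, 1:-1], pad[1:-1, 2:], pad[1:-1, :-2]])
    ys, xs = np.nonzero((nb != gt).any(axis=0))
    yy, xx = np.mgrid[0:gt.shape[0], 0:gt.shape[1]]
    dist = np.min(np.maximum(np.abs(yy[..., None] - ys),
                             np.abs(xx[..., None] - xs)), axis=-1)
    band = dist <= width
    return float((pred.astype(bool) == gt)[band].mean())

gt = np.zeros((6, 6), dtype=int)
gt[2:4, 2:4] = 1  # a small central "mass"
acc_perfect = trimap_accuracy(gt, gt, width=1)           # 1.0
acc_empty = trimap_accuracy(np.zeros_like(gt), gt, width=3)
```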
In this paper, we propose an end-to-end trained adversarial FCN-CRF network for mammographic mass segmentation. To integrate the prior distribution of masses and fully exploit the power of the FCN, a position prior is added to the network. Furthermore, adversarial training is used to handle the small training set by reducing overfitting and increasing robustness. Experimental results demonstrate the state-of-the-art performance of our model on the two most frequently used public mammogram datasets.
Dhungel, N., Carneiro, G., Bradley, A.P.: Deep learning and structured prediction for the segmentation of mass in mammograms. In: MICCAI. Springer (2015)