Breast cancer is one of the most frequently diagnosed fatal diseases among women worldwide [6]. Mammography is widely used, and computer-aided diagnosis systems (CADs) are often employed as a second reader. Leveraging the recent success of deep neural networks in representation learning, deep-learning-based CADs [7, 20, 16, 17, 11, 12] outperform traditional methods, which rely heavily on handcrafted features. However, two major challenges in mammographic CADs remain: (1) limited access to well-annotated data and (2) the similarity between benign and cancerous masses. To alleviate the impact of inadequate data, [7, 20, 11, 12] applied classical geometric transformations for data augmentation (e.g., flips, rotations and random crops), and more recently, [16, 17] generated synthetic images on the manifold of real mammograms using adversarial learning, which is powerful at learning unknown underlying distributions. Unfortunately, the following questions remain unanswered: What kind of data augmentation is most helpful for CADs in mammography? How can we alleviate the impact of the similarity between data, i.e., how can we maximize the margin between manifolds that differ only slightly?
In this paper, we propose a new deep learning framework, DiagNet, that improves mammography diagnosis as follows. First, we propose an adversarial data augmentation strategy in which both positive and negative samples of specific classes are generated in an unsupervised manner, in order to sharpen the boundaries between classes. Next, we build a signed graph Laplacian over the augmented data to quantitatively capture the geometric structure of the data. Finally, we train a deep neural network by jointly optimizing the graph regularization and the classification loss, so that the intra-class difference is minimized and, more importantly, the inter-class manifold margin is maximized in the deep representation space. Extensive experiments show that the proposed DiagNet outperforms state-of-the-art methods for breast mass diagnosis in mammography.
2.1 Adversarial Learning
Adversarial learning is a technique that attempts to fool models through malicious input [10], and it has achieved impressive results in representation learning. The key to its success is forcing the output of a generator to be indistinguishable from real input, an idea that originates from Generative Adversarial Networks (GANs) [9]. Adversarial training is particularly powerful for image generation and for learning unknown, complicated distributions from training data. In this paper, we propose to use adversarial learning to generate off-distribution instances along with on-distribution instances in order to enlarge the medical image dataset.
2.2 Manifold Learning
In real applications, data typically reside on a low-dimensional manifold embedded in a high-dimensional ambient space [15]. Manifold learning has been extensively explored because of its effectiveness at preserving topological locality, which relies on the assumption that neighbors tend to have the same labels. In this paper, we aim to incorporate graph embedding into a deep neural network as a regularizer in the latent space. In addition, preserving the local manifold structure of the data within the hidden representations of a deep neural network offers the possibility of improving classifier performance.
3 Proposed Method
In this section, we formally introduce the details of DiagNet, which is composed of three steps as shown in Fig. 1: (1) adversarial augmentation, (2) a signed graph Laplacian built upon the augmented data and (3) joint optimization of the classification loss and the signed graph regularizer. We first define the notation used throughout the paper. Let be the mammograms with corresponding labels, where is an image sample and is the class label. Let denote the -th class data.
3.1 Adversarial Augmentation
As mentioned in Section 1, inadequate data and the similarity between benign and cancerous masses are the two main causes of high false-positive rates in mammographic CADs. Recently, [1, 17, 16] employed GANs to create new instances. Although these methods generate on-distribution samples that cannot be separated by discriminators, they ignore the importance of distinguishable but similar instances, which tend to improve discriminative ability. To overcome this shortcoming, as shown in Fig. 1, we propose to use adversarial learning to generate more instances of both positive neighbors (i.e., instances on the manifold, e.g., and ) and negative neighbors (i.e., instances off the manifold, e.g., and ). Two manifolds are defined here: for benign images and for malignant images.
In particular, inspired by [19], we generate neighboring instances one by one for each data class , , where in this paper. Specifically, both positive and negative neighbors are generated from noise-corrupted seed points (a number of randomly selected samples in ), and both lie close to the original data points. More precisely, positive neighbors are generated samples that a discriminator cannot separate from , while negative neighbors are those that it can separate. Finally, the expanded dataset for class is of the form , and the whole dataset is .
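The seeding step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: samples are plain feature vectors, the Gaussian noise level `sigma` and the number of seeds are hypothetical (the text does not give them), and tagging each sample as `"orig"`, `"pos"` or `"neg"` is only one possible layout for the expanded class set, whose exact form is not recoverable here.

```python
import random


def noisy_seeds(class_samples, n_seeds, sigma, rng):
    """Randomly pick n_seeds samples of one class and corrupt them with
    Gaussian noise; the results serve as starting points for generation.

    sigma is a hypothetical noise level, not a value from the paper.
    """
    seeds = rng.sample(class_samples, n_seeds)
    return [[v + rng.gauss(0.0, sigma) for v in s] for s in seeds]


def expand_class(class_samples, positives, negatives):
    """Hypothetical layout of the expanded set for one class: the original
    samples plus generated positive and negative neighbors, each tagged so
    that later steps (e.g. graph construction) can tell them apart."""
    return (
        [(x, "orig") for x in class_samples]
        + [(x, "pos") for x in positives]
        + [(x, "neg") for x in negatives]
    )
```

Passing an explicit `random.Random` instance keeps the seed selection reproducible across runs.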
Let be a desired new sample and let be the probability that is classified as class by a discriminator trained on . Similarly, corresponds to a discriminator trained on . Note that and are initialized as empty. In this paper, we train two SVM classifiers as the discriminators, and the corresponding output probability is obtained by applying the logistic sigmoid to the signed distance output by each SVM. Accordingly, a set of neighboring instances of is generated iteratively. In each iteration , the discriminator is learned and the weights are updated. After iterations of training, we select one desired positive neighbor :
Similarly, we select one desired negative neighbor , with an additional distance constraint that forces new points to lie close to :
where is a distance measure, is a weighting factor, and the radius parameters , and are positive constants.
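The discriminator and neighbor-selection loop can be illustrated with the sketch below. Assumptions to flag: the linear SVM is trained with plain stochastic subgradient descent on the hinge loss (the text only says SVMs are used); the class probability is the logistic sigmoid of the signed distance to the hyperplane, as described above; and the neighbor search is a simple random search inside a ball of radius `radius` around a seed, a stand-in both for the exact objectives (1) and (2), which are not reproduced here, and for the RACOS optimizer used in the paper. All function names are hypothetical.

```python
import math
import random


def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=200, seed=0):
    """Linear SVM (labels in {-1, +1}) trained by stochastic subgradient
    descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            if margin < 1:  # hinge loss active: move the hyperplane for this sample
                w = [wj - lr * (lam * wj - y[i] * xj) for wj, xj in zip(w, X[i])]
                b += lr * y[i]
            else:  # only the L2 regularizer contributes
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def svm_probability(model, x):
    """Logistic sigmoid of the signed distance from x to the SVM hyperplane."""
    w, b = model
    norm = math.sqrt(sum(wj * wj for wj in w)) or 1.0
    d = (sum(wj * xj for wj, xj in zip(w, x)) + b) / norm
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)  # numerically stable branch for negative distances
    return e / (1.0 + e)


def select_neighbor(seed_point, prob_fn, radius, positive, n_candidates=500, rng=None):
    """Random search (stand-in for RACOS) within `radius` of seed_point.

    positive=True keeps the candidate the discriminator scores as most
    class-like; positive=False keeps the least class-like one.
    """
    rng = rng if rng is not None else random.Random(0)
    best, best_score = None, -math.inf
    for _ in range(n_candidates):
        direction = [rng.gauss(0.0, 1.0) for _ in seed_point]
        norm = math.sqrt(sum(v * v for v in direction)) or 1.0
        step = radius * rng.random()  # distance from the seed, at most radius
        cand = [s + step * v / norm for s, v in zip(seed_point, direction)]
        p = prob_fn(cand)
        score = p if positive else -p
        if score > best_score:
            best, best_score = cand, score
    return best
```

Because both calls to `select_neighbor` default to the same random stream, the positive and negative neighbors are picked from the same candidate set, so the positive one is never scored lower than the negative one.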
3.2 Signed Graph Laplacian Regularizer
Graph embedding trained with distributional context can boost performance in various pattern recognition tasks. In this paper, we incorporate the signed graph Laplacian regularizer to learn a discriminative data representation with a deep neural network, where discriminative means that the intra-class data manifold structure is preserved in the latent space and the margins between (slightly different) manifolds are maximized.
Using the supervision provided by the adversarial augmentation in Section 3.1, we build a signed graph upon the expanded data . Given for class and the data of all other classes , for , the elements of the signed graph are built as follows:
where () denotes the corresponding ()-nearest neighborhood of , which approximates the locality of the manifold.
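A minimal sketch of a neighborhood-based signed graph follows. Because the exact rule in (3) is not reproduced here, the weighting is an assumption in the spirit of the text: each node is linked with weight +1 to its `k_pos` nearest same-class nodes and with weight -1 to its `k_neg` nearest nodes from other classes, and the graph is made symmetric. Cosine distance (1 - cosine similarity) is used for the neighbor search; because arccos is monotone, it ranks neighbors exactly as the angular cosine distance mentioned in Section 4.2 does.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity (same neighbor ranking as angular distance)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0  # treat zero vectors as uncorrelated
    return 1.0 - dot / (na * nb)


def build_signed_graph(X, labels, k_pos, k_neg):
    """Symmetric signed adjacency matrix: +1 to the k_pos nearest same-class
    nodes, -1 to the k_neg nearest other-class nodes, 0 elsewhere."""
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        same = sorted(
            (cosine_distance(X[i], X[j]), j)
            for j in range(n)
            if j != i and labels[j] == labels[i]
        )
        other = sorted(
            (cosine_distance(X[i], X[j]), j)
            for j in range(n)
            if labels[j] != labels[i]
        )
        # Same-class and other-class pairs are disjoint, so signs never clash.
        for _, j in same[:k_pos]:
            W[i][j] = W[j][i] = 1.0
        for _, j in other[:k_neg]:
            W[i][j] = W[j][i] = -1.0
    return W
```

With the neighborhood sizes reported in Section 4.2, this would be called as `build_signed_graph(X, labels, 1, 4)`, assuming those two values correspond to `k_pos` and `k_neg`.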
Then, we measure structure preservation in the deep representation space (the layer directly preceding the softmax layer, as shown in Fig. 1), where . The signed graph Laplacian regularizer is defined as follows:
where is a distance metric for the dissimilarity between and .
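To make the regularizer concrete, the sketch below evaluates it as a weighted sum of pairwise feature distances, which matches the behavior described next (positive edges pull representations together, negative edges push them apart). The plain double sum without normalization is an assumption, since (4) itself is not reproduced here. For the squared Euclidean distance, the same quantity equals twice the Laplacian quadratic form tr(F^T L F) with L = D - W and D_ii = sum_j W_ij, which the second function computes as a check; signed-graph work often uses the absolute degree sum_j |W_ij| instead, which gives a different matrix.

```python
def signed_graph_regularizer(F, W, dist):
    """Sum over graph edges of W[i][j] * dist(F[i], F[j]).

    F: list of deep feature vectors, W: symmetric signed adjacency matrix,
    dist: any dissimilarity function (the paper uses angular cosine distance).
    """
    n = len(F)
    return sum(
        W[i][j] * dist(F[i], F[j])
        for i in range(n)
        for j in range(n)
        if W[i][j] != 0.0
    )


def laplacian_quadratic_form(F, W):
    """tr(F^T L F) with L = D - W and D_ii = sum_j W[i][j] (signed degrees)."""
    n = len(F)
    deg = [sum(row) for row in W]
    total = 0.0
    for i in range(n):
        for j in range(n):
            l_ij = (deg[i] if i == j else 0.0) - W[i][j]
            total += l_ij * sum(a * b for a, b in zip(F[i], F[j]))
    return total


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

A bounded distance such as the angular cosine distance keeps the negative-edge terms from decreasing without limit as features are pushed apart, which an unbounded distance would allow.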
Note that instead of computing the manifold embedding by solving an eigenvalue decomposition, we learn the embedding with a deep neural network. Specifically, inspired by depth-wise separable convolutions [5], which are widely employed to learn mappings with a series of factorized filters, we build stacks of depth-wise separable convolutions with an architecture topologically similar to that in [5] to learn such deep representations (Fig. 1).
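The appeal of depth-wise separable convolutions is the factorization of a standard convolution into a per-channel spatial filter followed by a 1×1 channel-mixing convolution. The sketch below is a plain-Python illustration of that factorization (stride 1, no padding, no bias), together with a weight count. The 3×3 kernel size is an assumption, since the kernel size is not recoverable from the text; 3×3 is what Xception [5] uses.

```python
def conv_weights(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_weights(k, c_in, c_out):
    """Depth-wise k x k filters (one per input channel) + 1x1 point-wise mixing."""
    return k * k * c_in + c_in * c_out


def depthwise_separable_conv(x, dw, pw):
    """x: [C][H][W] input, dw: [C][k][k] per-channel filters,
    pw: [C_out][C] point-wise weights. 'Valid' padding, stride 1."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    k = len(dw[0])
    Ho, Wo = H - k + 1, W - k + 1
    # Depth-wise step: each channel is filtered on its own, no channel mixing.
    depth = [
        [
            [sum(x[c][i + a][j + b] * dw[c][a][b] for a in range(k) for b in range(k))
             for j in range(Wo)]
            for i in range(Ho)
        ]
        for c in range(C)
    ]
    # Point-wise step: a 1x1 convolution mixes channels at every position.
    return [
        [[sum(pw[o][c] * depth[c][i][j] for c in range(C)) for j in range(Wo)]
         for i in range(Ho)]
        for o in range(len(pw))
    ]
```

For a 3×3 layer mapping 728 to 728 channels (the width of the later blocks described in Section 4.2), `conv_weights(3, 728, 728)` gives 4,769,856 weights while `separable_conv_weights(3, 728, 728)` gives 536,536, roughly 8.9 times fewer.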
Therefore, by minimizing (4), we expect that if two connected nodes and are from the same class (i.e., is positive), then and are also close to each other and, conversely, that they are pushed apart when the nodes belong to different classes. Building on this learned discriminative representation, we train a simple softmax classifier to predict the class label, i.e.,
where when , and is the parameter set of the neural network.
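A minimal sketch of the joint objective, assuming it takes the common form of softmax cross-entropy plus λ times the signed graph regularizer (λ being the regularization parameter tuned in Section 4). Averaging the cross-entropy over samples while summing the regularizer over edges is an assumption of this sketch; in practice both terms would be computed inside a deep learning framework so that they can be differentiated with respect to the network parameters.

```python
import math


def softmax(z):
    """Numerically stable softmax of one logit vector."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def cross_entropy(logits, labels):
    """Mean over samples of -sum_c 1{y_i = c} log p_ic, i.e. -log p_{i, y_i}."""
    return -sum(math.log(softmax(z)[y]) for z, y in zip(logits, labels)) / len(labels)


def joint_objective(logits, labels, features, W, lam, dist):
    """Cross-entropy + lam * signed graph regularizer on the deep features."""
    n = len(features)
    reg = sum(
        W[i][j] * dist(features[i], features[j])
        for i in range(n)
        for j in range(n)
        if W[i][j] != 0.0
    )
    return cross_entropy(logits, labels) + lam * reg
```

The indicator in the cross-entropy simply selects the log-probability of each sample's true class.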
4.1 Datasets and ROI Selection
DiagNet is evaluated on the most frequently used full-field digital mammographic dataset, INbreast [13]. The 107 mass-containing mammograms are divided into a training set and a test set containing 80% and 20% of the images, respectively. For ROI selection, rectangular boxes containing the masses are extracted with proportional padding ( times) around the original ROI bounding boxes. The selected ROIs are augmented with flips and then adversarially augmented by a further 40% (20% positive neighbors and 20% negative neighbors).
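The data preparation steps can be sketched as below. The padding factor is a hypothetical argument because its value is missing from the text, boxes are assumed to be (x0, y0, x1, y1) pixel coordinates, and rounding 80% of 107 images to 86 training images is a choice of this sketch, since the exact split sizes are not stated. Splitting at the image level, before ROIs are cropped and augmented, keeps augmented copies of a test mass out of the training set.

```python
import random


def split_images(image_ids, train_frac=0.8, seed=0):
    """Shuffle image IDs and split them into training and test lists."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(train_frac * len(ids))
    return ids[:n_train], ids[n_train:]


def padded_roi(box, pad_factor, img_h, img_w):
    """Grow a mass bounding box (x0, y0, x1, y1) by pad_factor around its
    center and clip it to the image. pad_factor is hypothetical."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w, half_h = (x1 - x0) * pad_factor / 2, (y1 - y0) * pad_factor / 2
    return (
        max(0, round(cx - half_w)),
        max(0, round(cy - half_h)),
        min(img_w, round(cx + half_w)),
        min(img_h, round(cy + half_h)),
    )


def adversarial_counts(n_rois, pos_frac=0.2, neg_frac=0.2):
    """Number of positive / negative neighbors for a 20% + 20% expansion."""
    return round(n_rois * pos_frac), round(n_rois * neg_frac)
```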
4.2 Implementation Details
We first solve the proposed adversarial augmentation problems (1) and (2) using the derivative-free optimization algorithm RACOS [18]. The distance measure in (1) and (2) is set to the angular cosine distance because of its superior discriminative power . Let , then we set the radius parameters , and for . Further, and is .
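For reference, one common definition of the angular cosine distance normalizes the angle between two vectors to [0, 1], as in the sketch below; whether the paper uses exactly this normalization is an assumption. Unlike 1 - cosine similarity, it satisfies the triangle inequality, so it is a proper metric on directions.

```python
import math


def angular_cosine_distance(a, b):
    """Angle between a and b divided by pi: 0 for the same direction,
    0.5 for orthogonal vectors, 1 for opposite directions."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("angular distance is undefined for zero vectors")
    cos_sim = max(-1.0, min(1.0, dot / (na * nb)))  # clamp rounding error
    return math.acos(cos_sim) / math.pi
```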
Second, the signed graph is built upon the augmented data . For each graph node, and in (3) are set to 1 and 4, respectively, chosen by grid search. In addition, the metric in (4) is also the angular cosine distance, and is 1.
Finally, the deep neural network is built with stacks of kernel-sized separable convolutional layers. The first three blocks have an increasing number of feature maps (128, 256, 728) and decreasing square spatial sizes (, , ), and the subsequent seven blocks keep the same feature maps, of size . After global average pooling and three fully connected layers of 1024 neurons, a softmax layer is appended for label prediction. Dropout layers with dropout rate and weight decay with norm rate are used to prevent over-fitting. Residual connections are added to alleviate the exploding and vanishing gradient problems. The regularization parameter in (6) is chosen as .
4.3 Results and Analysis
Adversarial Augmentation: To examine the quality of the images generated by the proposed adversarial augmentation strategy, we carry out experiments on the INbreast dataset. Fig. 2 shows augmented examples. For both mass types, the generated positive and negative neighbors resemble the original data, but the negative neighbors deviate more from it.
Comparison with the state of the art: We evaluate DiagNet's performance with accuracy and AUC (area under the ROC curve). Table 1 compares DiagNet with state-of-the-art algorithms, in which  is re-implemented and the results of the remaining ones are taken from the original papers. DiagNet achieves state-of-the-art performance, with a mean accuracy of 93.4% and an AUC of 0.95. Compared with the second-best algorithm , DiagNet's AUC is substantially higher, with experiments conducted on the whole dataset without any pre-processing, post-processing or transfer learning. In addition, we empirically observed that our model is robust to noise and geometric transformations; these results are omitted due to space limitations.
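Table 1. Comparison with state-of-the-art methods.

| Methodology | | Accuracy | AUC |
|---|---|---|---|
| (2012) Domingues et al. | ✕ | 89% | N/A |
| (2016) Dhungel et al. | ✓ | 91% | 0.76 |
| (2017) Zhu et al. | ✓ | 90% | 0.89 |
| (2018) Shams et al. | ✓ | 93% | 0.92 |
| (2019) Li et al. | ✓ | 88% | 0.92 |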
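The AUC values in Table 1 summarize an ROC curve in a single number. The sketch below computes AUC through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half), together with thresholded accuracy. The 0.5 threshold and the convention that label 1 marks the positive (e.g., malignant) class are assumptions of this sketch, not details from the text.

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney statistic; labels are 1 (positive) or 0."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    hits = sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels))
    return hits / len(labels)
```

The pairwise loop is quadratic in the number of samples, which is fine for a test set of this size; sorting-based rank computations scale better for large sets.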
Importance of the Signed Graph Laplacian Regularizer: Determining the optimal values of hyper-parameters is a major challenge in deep learning. To explore DiagNet's performance with different signed graph configurations, the values of and are first grid searched with a fixed regularization parameter , as shown in Fig. 3(a). The best performance occurs when and , which improves accuracy by at least 8% and the AUC by at least 12% compared with the baseline (no graph regularization, ). This confirms the effectiveness of the signed graph regularization. In addition, the results show that DiagNet achieves good performance only when both and are considered in the signed graph construction. Fig. 3 also shows the performance for various values of , where the best result occurs at .
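The grid search over the two neighborhood sizes can be expressed generically as below. `evaluate` is a hypothetical callable standing in for training DiagNet with a given configuration (and a fixed regularization parameter) and returning a score such as accuracy or AUC; the toy objective in the `__main__` block only demonstrates the mechanics.

```python
import itertools


def grid_search(evaluate, grid):
    """Try every combination in `grid` (name -> list of values) and return
    the best parameter dict and its score; higher scores are better."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    # Toy stand-in for "train and score DiagNet"; it peaks at (1, 4) by construction.
    toy = lambda k_pos, k_neg: -((k_pos - 1) ** 2 + (k_neg - 4) ** 2)
    print(grid_search(toy, {"k_pos": range(6), "k_neg": range(6)}))
```

Exhaustive search is affordable here because only two small integer parameters are tuned; the cost grows multiplicatively with each added hyper-parameter.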
In this paper, we proposed DiagNet for improved mammogram image analysis. By integrating the signed graph regularizer and adversarial sampling augmentation, DiagNet learns discriminative features in a simple but effective way. Extensive experiments show that our method outperforms state-of-the-art methods for breast mass diagnosis in mammography.
[1] Antoniou, A., Storkey, A., Edwards, H.: Data augmentation generative adversarial networks. arXiv preprint arXiv:1711.04340 (2017)
[2] Chen, D., Lv, J., Davies, M.E.: Learning discriminative representation with signed Laplacian restricted Boltzmann machine. arXiv preprint arXiv:1808.09389 (2018)
[3] Chen, D., Lv, J., Yi, Z.: Unsupervised multi-manifold clustering by learning deep representation. In: Workshops at the 31st AAAI Conference on Artificial Intelligence (AAAI). pp. 385–391 (2017)
[4] Chen, D., Lv, J., Yi, Z.: Graph regularized restricted Boltzmann machine. IEEE Transactions on Neural Networks and Learning Systems 29(6), 2651–2659 (2018)
[5] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 1251–1258 (2017)
[6] DeSantis, C., Ma, J., Bryan, L., Jemal, A.: Breast cancer statistics, 2013. CA: A Cancer Journal for Clinicians 64(1), 52–62 (2014)
[7] Dhungel, N., Carneiro, G., Bradley, A.P.: The automated learning of deep features for breast mass classification from mammograms. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 106–114. Springer (2016)
[8] Domingues, I., Sales, E., Cardoso, J., Pereira, W.: INbreast-database masses characterization. XXIII CBEB (2012)
[9] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems. pp. 2672–2680 (2014)
[10] Kurakin, A., Goodfellow, I.J., Bengio, S.: Adversarial machine learning at scale (2017)
[11] Li, H., Chen, D., Nailon, W.H., Davies, M.E., Laurenson, D.: A deep dual-path network for improved mammogram image processing. International Conference on Acoustics, Speech and Signal Processing (2019)
[12] Li, H., Chen, D., Nailon, W.H., Davies, M.E., Laurenson, D.: Improved breast mass segmentation in mammograms with conditional residual U-Net. In: Image Analysis for Moving Organ, Breast, and Thoracic Images, pp. 81–89. Springer (2018)
[13] Moreira, I.C., Amaral, I., Domingues, I., Cardoso, A., Cardoso, M.J., Cardoso, J.S.: INbreast: toward a full-field digital mammographic database. Academic Radiology 19(2), 236–248 (2012)
[14] Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10). pp. 807–814 (2010)
[15] Seung, H.S., Lee, D.D.: The manifold ways of perception. Science 290(5500), 2268–2269 (2000)
[16] Shams, S., Platania, R., Zhang, J., Kim, J., Park, S.J.: Deep generative breast cancer screening and diagnosis. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 859–867. Springer (2018)
[17] Wu, E., Wu, K., Cox, D., Lotter, W.: Conditional infilling GANs for data augmentation in mammogram classification. In: Image Analysis for Moving Organ, Breast, and Thoracic Images, pp. 98–106. Springer (2018)
[18] Yu, Y., Qian, H., Hu, Y.Q.: Derivative-free optimization via classification. In: Thirtieth AAAI Conference on Artificial Intelligence (2016)
[19] Yu, Y., Qu, W.Y., Li, N., Guo, Z.: Open-category classification by adversarial sample generation. International Joint Conference on Artificial Intelligence (2017)
[20] Zhu, W., Lou, Q., Vang, Y.S., Xie, X.: Deep multi-instance networks with sparse label assignment for whole mammogram classification. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 603–611. Springer (2017)