COIN: Contrastive Identifier Network for Breast Mass Diagnosis in Mammography

12/29/2020 ∙ by Heyi Li, et al.

Computer-aided breast cancer diagnosis in mammography is a challenging problem, stemming from mammographic data scarcity and data entanglement. In particular, data scarcity is attributed to privacy concerns and expensive annotation, while data entanglement is due to the high similarity between benign and malignant masses, whose manifolds reside in a lower-dimensional space with a very small margin. To address these two challenges, we propose a deep learning framework, named Contrastive Identifier Network (COIN), which integrates adversarial augmentation and manifold-based contrastive learning. First, we employ adversarial learning to create both on- and off-distribution mass-contained ROIs. After that, we propose a novel contrastive loss built upon a signed graph. Finally, the neural network is optimized in a contrastive learning manner, with the purpose of improving the deep model's discriminativity on the extended dataset. In particular, by employing COIN, data samples from the same category are pulled close, whereas those with different labels are pushed further apart in the deep latent space. Moreover, COIN outperforms state-of-the-art related algorithms for breast cancer diagnosis by a considerable margin, achieving 93.4% accuracy and a 95.0% AUC score. The code will be released on ***.

I Introduction

Breast cancer is widely acknowledged as the most frequently diagnosed cancer and the second most fatal disease for women worldwide [4]. Although no effective prevention method has been discovered, mammography screening supports early breast mass diagnosis (BMD), which, together with early treatment, has increased survival rates in practice [39]. Screening mammography is particularly useful when tumours are invasive (measuring cm) and too small to be palpable or cause symptoms [13]. However, manual interpretation is limited by wide variations in pathology and the potential fatigue of human experts [39]. Double reading is therefore employed in many Western countries [3, 5] and has been shown to increase both the sensitivity and the specificity of interpretations. In recent years, computer-assisted interventions have been designed and deployed to support researchers and doctors as an alternative to a human second reader, with the aim of optimal healthcare [43, 36].

Fig. 1: Illustration of the BMD challenges with an INbreast dataset example: (Q1) data scarcity and (Q2) data entanglement. Red stands for the 2D t-SNE [34] embedding of malignant masses and blue for that of benign lesions. The four images are corresponding mass examples.

I-A Classical Methods for BMD

Classifying breast masses as benign or malignant is one of the most important and challenging tasks for commercial computer-aided diagnosis systems (CADs). This is not only because cancerous cases make up a small proportion of all screenings, but also because benign and malignant masses are highly similar. This characteristic is illustrated in Fig. 1, where benign and malignant masses are visually very similar and their t-SNE embeddings [34] are intertwined. Although CADs have not developed as rapidly as medical imaging techniques, the situation has improved as machine learning approaches have advanced [24]. For classification or diagnosis, the most important task is finding or learning distinctive features of cancerous masses and their surrounding tissues, so that inherent regularities or patterns can be well described [39]. Traditionally, meaningful features were hand-engineered by domain experts [45], instilling task-specific knowledge [26]. However, the major drawback of this process is that machine learning engineers must develop the essential algorithms with help from medical domain experts. Additionally, manually designed features may introduce strong bias into the training of the algorithm, resulting in limited performance [23], e.g. a high false positive rate and low specificity [35].

I-B Deep Learning Methods for BMD

In recent years, owing to the success of deep neural networks (deep learning) [28] in various computer perception tasks [11], mammographic CADs have shifted noticeably from rule-based, problem-specific solutions to increasingly generic, problem-agnostic algorithms [6, 42, 52, 2, 27, 33, 8, 7]. Specifically, [2] and [27] claimed that features extracted by a CNN achieve better breast mass discrimination than various hand-crafted features. However, classifying a mammographic mass through a low-dimensional bottleneck is very difficult for CNN models, yielding imprecise predictions. This is not only because screening images, like those of other medical imaging modalities, have a low signal-to-noise ratio [39], but also because breast masses in mammography suffer from two other major problems:

  • Q1 - Data scarcity [32, 14], which is difficult to solve because of patient privacy and the tremendous annotation workload for human experts;

  • Q2 - Data entanglement, which is much more challenging than in natural image recognition problems because of the small margin between the benign and malignant data manifolds (Fig. 1).

Recent efforts on these two major problems are discussed in detail in Sec. II.

I-C Our contribution

Based on the above observations, in this paper we propose a new deep convolutional neural network, called Contrastive Identifier Network (COIN), in which contrastive learning and manifold learning are integrated for breast mass classification (benign vs. malignant). In particular, we propose to employ adversarial learning for data augmentation, so that new on- and off-manifold samples with more distinctive features are created in an unsupervised fashion. We also propose a novel triplet contrastive loss that exploits the merits of the signed similarity graph; in this way, the locality of the manifold is approximated while the deep network is trained. By incorporating these two methods into the deep neural network, we solve the manifold embedding problem through learning, instead of computing the expensive eigenvalue decomposition required by standard graph spectral learning [9]. Integrating these two methods improves feature discriminativity in the deep latent space (Fig. 3): data samples from the same class are pulled close, while those with different labels are pushed away. Consequently, the intra-class difference is minimized and, more importantly, the inter-class manifold margin is maximized in the deep representation space. A preliminary version of this work appeared in [31]. This paper extends [31] with further discussion and experiments that demonstrate the effectiveness of our approach for solving data scarcity (Q1) and data entanglement (Q2).

II Related Work

In this section, we introduce existing solutions to data scarcity (Q1) and data entanglement (Q2), together with their limitations.

II-A Approaches to Q1

To alleviate the data scarcity problem, works in [14, 52, 30, 32] have applied classical affine or elastic transformations for data augmentation in mammography (e.g. flips, rotations, and random crops). These methods are straightforward and effective for increasing the total amount of training data. However, the distributions of the generated samples are unclear, and samples from unknown distributions are likely to worsen generalization [46]. Accordingly, adversarial learning [17] has been employed to generate synthetic images on the manifold of real mammograms, benefiting from its powerful ability to learn the underlying distribution implicitly without modeling the original data prior. So far, we are aware of only one application to mammography that automatically solves the breast mass classification problem [47], in which both benign and malignant mass-contained ROIs are created by a conditional generative adversarial network (GAN). However, its performance is less encouraging: the experiments showed only a limited AUC improvement over conventional augmentation methods [47]. This is potentially because GAN-based augmentations disregard the importance of off-distribution samples that lie close to the real data manifold [51]. We believe these off-distribution samples may also play a very important role in increasing discriminativity while training the model.
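As a concrete illustration of the conventional augmentation discussed above, the following minimal sketch (our own, not code from [14, 52, 30, 32]) produces the eight flip and rotation variants of a square ROI stored as a nested Python list. It assumes square inputs and ignores random crops and elastic deformations.

```python
from typing import List

Image = List[List[float]]


def rotate90(img: Image) -> Image:
    """Rotate a 2D image 90 degrees clockwise."""
    # Reverse the row order, then transpose: a clockwise rotation.
    return [list(row) for row in zip(*img[::-1])]


def hflip(img: Image) -> Image:
    """Mirror a 2D image left-right."""
    return [row[::-1] for row in img]


def dihedral_augment(img: Image) -> List[Image]:
    """Return the 8 flip/rotation variants of a square ROI, original included."""
    variants = []
    current = img
    for _ in range(4):
        variants.append(current)         # rotation by 0, 90, 180, 270 degrees
        variants.append(hflip(current))  # the same rotation, mirrored
        current = rotate90(current)
    return variants
```

Every variant is a fixed re-arrangement of the original pixels: the training set grows eightfold, but the procedure makes no attempt to model where the new samples lie relative to the real data distribution.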

II-B Approaches to Q2

To mitigate the challenge of data entanglement, many efforts have been made with CNNs to increase the discriminativity of latent features in breast cancer diagnosis. For example, some researchers have proposed extracting segmentation-related features with CNNs, using either radiologists' pixel-level annotations [14] or semantic masks generated by automatic segmentation algorithms [30]. This type of algorithm was originally inspired by the importance of hand-crafted shape and boundary features [39]. Although these algorithms have improved diagnosis performance, they are typically complicated to construct because of their multi-task structures, multi-phase training, or large numbers of parameters, which makes them especially challenging for medical experts. More recently, contrastive learning has shown great promise as a powerful discriminative approach in various computer vision models [49, 21, 20, 19, 25]. Nevertheless, to the best of our knowledge, it has never been employed in any mammography-related problem. In essence, the family of contrastive objective functions aims to enlarge the distances between feature vector pairs in the deep latent space in a self-supervised manner [19]. Although feature vectors can be separated from each other by this technique, the inherent structural and geometric features of the data are ignored, so features in the latent space cannot be enhanced across classes. Manifold learning, on the other hand, can mitigate this dilemma by preserving the topological locality of the data [11]. It is widely employed as a non-linear dimensionality reduction method, since in real applications data typically reside on a low-dimensional manifold embedded in a high-dimensional ambient space [41]. However, few approaches use manifold learning to solve classification problems. In fact, there are no studies on manifold analysis for mammography, nor on using manifold learning to alleviate the high data similarity problem. It is therefore meaningful to conduct preliminary studies on using manifold learning for mammography screening diagnosis.

III Methodology

In this section, after discussing the notation and problem formulation used in this paper, we formally introduce the details of COIN, which consists of three steps, as shown in Fig. 3: 1) adversarial augmentation for mammography, 2) a signed graph Laplacian built upon the augmented data, and 3) the proposed contrastive loss and the overall objective function. We also present the details of the deep network architecture and its implementation.

III-A Notations and Problem Formulation

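Given a dataset $\mathcal{X} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, $\mathbf{x}_i \in \mathbb{R}^{H \times W}$ is a real-valued grayscale ROI and $y_i$ is the corresponding mass diagnosis label. Note that each ROI contains only one mass, cropped from a mammogram and resized to a fixed size with $H = W$. With the defined dataset $\mathcal{X}$, let $\mathcal{X}_c = \{(\mathbf{x}^c_j, y^c_j)\}_{j=1}^{N_c}$ be the sub-dataset of samples from the $c$-th category, where $c$ indexes the mass categories (benign and malignant), and $\mathbf{x}^c_j$ and $y^c_j$ are an arbitrary data sample and its label in this sub-dataset.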

Fig. 2: Mass ROIs augmented by the conditional GAN [22] (first row), and positive and negative neighbors generated by our proposed adversarial augmentation method (second and third rows, respectively).

The main targets solved by COIN can be formulated as follows. (1) Given mass-contained mammogram ROIs, adversarial augmentation (Sec. III-B) is first applied to each mass category in turn, creating both on-distribution and off-distribution samples of each class: positive neighbors, which the discriminator cannot distinguish from the real masses of that class, and negative neighbors, which it can. (2) For each mass category, the local signed graph is then constructed on the expanded dataset. (3) Based on the results of the two preceding steps, the contrastive loss is optimized over the local signed graph, learning a nonlinear embedding into the deep latent space in which the margin between the manifolds of the two categories is maximized. Finally, the latent features are mapped to a diagnosis label with a softmax function.

Fig. 3: An illustration of the proposed COIN framework for BMD, which consists of three steps: adversarial augmentation, signed graph construction, and joint optimization. In the figure, samples lie on either the benign or the malignant manifold. In the first step (adversarial data augmentation), positive neighbors are created with Eq. (2) for the benign and malignant manifolds separately, and negative neighbors are generated with Eq. (3) for each manifold individually. After that, a signed graph is built upon both the original and the augmented samples as in Eq. (4). Finally, the joint loss in Eq. (7) is optimized in the deep latent space, so that the margin between the benign and malignant manifolds is maximized.

III-B Adversarial Augmentation for Data Scarcity (Q1)

Augmentation method           Accuracy   AUC
Baseline (no augmentation)    83%        0.85
Conventional augmentation     87%        0.88
CGAN augmentation [22]        88%        0.89
Proposed augmentation         89%        0.92
TABLE I: BMD performance of the constructed deep CNN without augmentation and with conventional, CGAN [22], and the proposed augmentations on the benchmark INbreast dataset.

III-B1 Motivation

As previously mentioned, data scarcity and the high resemblance between benign and cancerous masses are the two major reasons [14] why mammographic CADs are limited, typically exhibiting high false positive rates and low sensitivity. Recent studies [1, 47, 42] have employed GANs to create new mammogram instances. In particular, Wu et al. [47] proposed an infilling method, in which generated masses are synthesized into normal mammogram tissue. Because they use a class-conditioned GAN, the new samples produced by their generator are forced to follow the same distribution as the original data. However, they ignored the importance of surrounding tissues, where blood vessel textures play a vital role in diagnosing cancerous lesions. This may explain the limited improvement of their approach over affine augmentation.

It is therefore natural to directly employ a conditional GAN [22] to create mass-contained ROIs of either the benign or the malignant class, in order to enlarge the training data while preserving the surrounding contextual features. Specifically, the generator $G$ in [22] maps an observed image $\mathbf{x}$ from a given class and random noise $\mathbf{z}$ to an output estimate $\hat{\mathbf{x}}$, i.e. $\hat{\mathbf{x}} = G(\mathbf{x}, \mathbf{z})$. The discriminator involves two mapping components: one is the distinguishing mapping $D$, where $D(\hat{\mathbf{x}})$ is the predicted probability of $\hat{\mathbf{x}}$ being a real image; the other is a distance-based conditional guidance, by which the deep latent features of a created sample are mapped to those of the real data sample, i.e. $\phi(\hat{\mathbf{x}}) \approx \phi(\mathbf{x})$, where $\phi(\cdot)$ is the non-linear function learned by the CNN. As described in [22], the generator is an auto-encoder with skip connections, and the discriminator applies a dual-path CNN architecture with VGG-19 [44] as the backbone network [29].

Samples generated by the conditional GAN [22] are shown in the first row of Fig. 2, and the empirical classification comparison is given in Table I. As Fig. 2 shows, compared with the original mass samples, the conditional GAN [22] has limited ability to capture low-frequency features and instead focuses on high-frequency information. The shapes of the augmented masses are in fact very similar to the real ones. In addition, spiculated lines and blood vessels are vividly shown in the mass surroundings, and mass boundaries can be seen with high contrast. Yet the generated lesions are visually very noisy, especially in the regions within masses, where textural features are barely depicted. Moreover, in the last subfigure of the first row of Fig. 2, no surrounding background tissue has been generated. To examine how well these augmentations increase model discriminativity, we empirically compare breast mass diagnosis performance (classification accuracy and AUC score) in Table I. Both conventional and conditional GAN augmentation improve performance over the baseline model by a similarly small margin, even though the conditional GAN is much more complex than affine transformation.

This limitation of GAN-based methods may stem from neglecting samples that the discriminator can distinguish from real data but that lie very close to the original data distribution. These off-manifold samples are highly similar to the original data and may benefit classifier discriminativity in diverse ways when trained along with on-distribution samples.

III-B2 Proposed algorithm

To overcome this shortcoming of previous work and of our experiments with the cGAN [22], we aim to enlarge the mammography dataset while creating more distinctive samples. Inspired by Yu et al.'s recent research on the open-category classification problem [51], we propose to use adversarial learning to augment mammographic masses with an optimization-free algorithm. In this way, we augment the original dataset with both positive neighbors, i.e. new instances that lie on the original data manifold (see Fig. 3), and negative neighbors, i.e. augmented samples that lie off the original data manifold (see Fig. 3).

Specifically, augmented data samples are generated for each class separately. For every mass type, the positive and negative neighbors are created with the same model but with different objective functions: positive neighbors are generated samples that the discriminator cannot separate from the original samples of that class, whereas negative neighbors are those it can separate. Finally, the expanded dataset for each class is the union of its original samples, positive neighbors, and negative neighbors, and the whole expanded dataset is the union of the expanded datasets of all classes.

For the generator, random noise is used to corrupt selected seed points, which are a number of randomly selected samples of the class. This step is simply noise addition, so no optimization of any objective function is involved. Applying the generator creates new instances, including both the positive and the negative neighbors of samples from the class. All of the new sample nodes are close to the original data points, regardless of whether they are positive or negative neighbors.
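The generator step, noise added to randomly chosen seed samples with no optimization involved, can be sketched as below. The Gaussian noise model, the value of `sigma`, the flattened-vector representation of an ROI, and the function name `generate_candidates` are all assumptions made for illustration; the text does not specify them.

```python
import random
from typing import List, Sequence

Vector = List[float]


def generate_candidates(class_samples: Sequence[Vector], n_seeds: int,
                        n_per_seed: int, sigma: float = 0.05,
                        seed: int = 0) -> List[Vector]:
    """Corrupt randomly selected seed samples with additive Gaussian noise.

    Nothing is optimized here; the candidates are scored later by the
    discriminator. The Gaussian choice and sigma are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Randomly pick the seed points from the samples of one class.
    seeds = rng.sample(list(class_samples), k=min(n_seeds, len(class_samples)))
    candidates = []
    for s in seeds:
        for _ in range(n_per_seed):
            candidates.append([v + rng.gauss(0.0, sigma) for v in s])
    return candidates
```

With a small `sigma`, every candidate stays close to its seed, which matches the property stated above that all new samples lie near the original data points.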

After the new instances are produced by the generator, they are fed into the discriminator network, which is trained to distinguish the augmented samples from the original data instances. We adopt an SVM classifier as the discriminator for each type of neighbor of a class, which labels each generated sample as “real” or “fake”. The discriminator output, ranging within [0, 1], indicates how “real” the generated mass is, where 1 represents real and 0 denotes generated. The probability score of the SVM is computed by applying the logistic sigmoid to the output signed distance:

$$p(\mathbf{x}) = \frac{1}{1 + \exp\left(-d(\mathbf{x})\right)} \qquad (1)$$

where $d(\mathbf{x})$ is the signed distance of $\mathbf{x}$ to the decision boundary.
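For a linear SVM, the signed distance used in (1) can be written as (w·x + b)/||w||. The sketch below assumes that form, with hypothetical parameters `w` and `b`, and treats the positive side of the hyperplane as “real”; whether COIN uses a linear kernel or the normalized distance is not stated.

```python
import math
from typing import Sequence


def signed_distance(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Signed Euclidean distance of x to the hyperplane w.x + b = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w


def realness_score(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Logistic sigmoid of the signed distance: near 1 = "real", near 0 = "generated"."""
    d = signed_distance(x, w, b)
    # Two branches keep the sigmoid numerically stable for large |d|.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)
```

A sample lying exactly on the decision boundary scores 0.5, and the score saturates towards 1 or 0 as the sample moves deeper into either side.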

With the built generator and discriminator, we create the new masses one by one, training two SVM classifiers separately for the positive and negative neighbors. For the creation of positive neighbors, consider a desired new sample for a given class and the output probability score of the discriminator trained for positive neighbors. Here the goal is to generate new samples that are as similar as possible to the original instances, so the discriminator is trained on the union of the original class samples and the already existing positive neighbors, a set initialized as empty. For each training batch, generated samples and original data images (in balanced numbers) are used as the input of the discriminator, and its weights are updated. After training, we select only the single best generated sample in each batch, according to the following objective:

(2)

where a distance measure between samples defines the distance regularization, and a trade-off coefficient weights it. This regularization term forces the generated points to be at least a minimum distance apart from each other, allowing the generator to generalize better.

For the creation of negative neighbors, consider the output of the discriminator trained for negative neighbors, which predicts the probability that a sample is a “real” data sample from the class; the existing negative neighbor set is initialized as empty. In this scenario, we select generated samples that are not only off the original data manifold but also located close to the original data, so that the new samples provide discriminative information. Specifically, in each training batch, we select the desired negative neighbor from the generated samples according to the objective:

(3)

where the distance regularization forces generated points to be at least a minimum distance apart from each other. The additional distance restriction forces new points to be scattered close to the original samples of the class, so that the minimum distance from each new point to the original images is at most a given radius. The distance measure in (2) and (3) is set to the angular cosine distance because of its superior discriminative information [38]. Let , then we set the radius parameters , and for . Further and is .
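Since the exact forms of objectives (2) and (3) are not recoverable here, the following sketch shows one plausible reading of the selection step under stated assumptions: (2) rewards a high “real” score, (3) rewards a low one, both subtract a weighted hinge penalty when a candidate falls within `r_min` of an already selected neighbor, and (3) only admits candidates within `r_max` of some original class sample. The angular distance (angle divided by pi), `score_fn` (standing in for the SVM discriminator), `lam`, `r_min`, and `r_max` are hypothetical choices, not the paper's definitions.

```python
import math
from typing import Callable, Optional, Sequence

Vector = Sequence[float]


def angular_distance(a: Vector, b: Vector) -> float:
    """Angle between two non-zero vectors, scaled to [0, 1] (assumed distance form)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    cos = max(-1.0, min(1.0, dot / norms))  # clamp against rounding error
    return math.acos(cos) / math.pi


def spacing_penalty(x: Vector, existing: Sequence[Vector], r_min: float) -> float:
    """Hinge penalty for every already selected neighbor closer than r_min."""
    return sum(max(0.0, r_min - angular_distance(x, e)) for e in existing)


def select_positive(candidates: Sequence[Vector], existing: Sequence[Vector],
                    score_fn: Callable[[Vector], float],
                    lam: float, r_min: float) -> Vector:
    """Assumed reading of (2): look as real as possible while staying spread out."""
    return max(candidates,
               key=lambda x: score_fn(x) - lam * spacing_penalty(x, existing, r_min))


def select_negative(candidates: Sequence[Vector], existing: Sequence[Vector],
                    originals: Sequence[Vector],
                    score_fn: Callable[[Vector], float],
                    lam: float, r_min: float, r_max: float) -> Optional[Vector]:
    """Assumed reading of (3): look fake, stay spread out, remain within r_max of the class."""
    admissible = [x for x in candidates
                  if min(angular_distance(x, o) for o in originals) <= r_max]
    if not admissible:
        return None
    return max(admissible,
               key=lambda x: (1.0 - score_fn(x)) - lam * spacing_penalty(x, existing, r_min))
```

In this reading, the selection would run once per training batch, and the returned sample would be appended to the corresponding neighbor set before the next batch.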

To optimize (2) and (3), we employ the derivative-free optimization method proposed in [50], which considers the general problem of optimizing an objective function over a solution space. Instead of calculating gradients with respect to each parameter, this technique samples a number of candidate solutions and learns from their feedback to search for better solutions. The advantage of this method is that it can optimize problems with poor mathematical properties, such as non-convexity, non-differentiability, and many local optima [50].
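The optimizer of [50] is not reproduced here. As a generic stand-in that shares its key property (only function values are used, never gradients), the sketch below is a simple sampling-based local search that shrinks its sampling radius over time; the name `random_local_search` and all default settings are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def random_local_search(f: Callable[[Sequence[float]], float],
                        x0: Sequence[float], n_iters: int = 200,
                        n_samples: int = 20, radius: float = 1.0,
                        shrink: float = 0.95,
                        seed: int = 0) -> Tuple[List[float], float]:
    """Minimize f using function evaluations only (no gradients)."""
    rng = random.Random(seed)
    best_x = list(x0)
    best_f = f(best_x)
    for _ in range(n_iters):
        for _ in range(n_samples):
            # Sample a candidate uniformly in a box around the current best solution.
            cand = [v + rng.uniform(-radius, radius) for v in best_x]
            f_cand = f(cand)
            if f_cand < best_f:  # feedback: keep the candidate only if it improves f
                best_x, best_f = cand, f_cand
        radius *= shrink  # narrow the search region as the solution is refined
    return best_x, best_f
```

Because it only compares function values, the loop applies unchanged to non-smooth objectives such as |x - 3| + |y + 1|.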

III-C Contrastive Learning to Enhance Discriminativity (Q2)

Investigators have achieved promising diagnosis performance on mammography using deep neural networks. Yet one major limiting factor for further progress is that deep models disregard the structural features of the data. We therefore propose to integrate the inherent geometric structure of the data into CNNs, exploiting the merits of contrastive learning. In this way, samples originating from the same distribution are pulled close, whereas samples belonging to different categories are pushed apart in the embedding space, and the model's discriminativity is expected to improve.

III-C1 Motivation

Contrastive learning was initially proposed to solve the manifold embedding problem in a self-supervised manner [18] and has since been extensively applied in representation learning [48, 21]. This is attributed to its promising ability to improve a model's discriminativity by measuring similarities between correlated sample pairs, instead of directly computing sample-wise loss functions (e.g. softmax, hinge, or mean squared error loss). Specifically, for a given anchor sample, only one positive or negative pair is used in the calculation [19]. Positive pairs can be selected by data augmentation or co-occurrence [25], while negative pairs are typically sampled uniformly from other classes. Triplet loss [40] works in a similar manner but in a supervised way, selecting labeled triplets rather than unlabeled neighboring sample pairs for the loss calculation; each triplet includes one positive pair (with a sample from the same class as the anchor) and one negative pair (with a sample from another class) [16]. Although contrastive learning is effective at separating dense samples in the deep latent space, the typical triplet loss is not suitable for classifying breast masses in mammograms. In fact, randomly selecting negative and positive pairs can lead to worse generalization than the baseline, because the manifolds of different mammogram classes are separated by a very small margin. In contrast, by approximating manifold learning with a designed local signed graph, contrastive learning is able to preserve knowledge of manifold locality, thus maximizing the manifold margin through the penalty imposed by the selected neighboring positive and negative samples.
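For reference, the standard triplet loss [40] for a single (anchor, positive, negative) triplet can be sketched as follows; the squared Euclidean distance and the margin value `alpha = 0.2` are illustrative choices.

```python
from typing import Sequence


def sq_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], alpha: float = 0.2) -> float:
    """max(0, d(a, p) - d(a, n) + alpha) for one labeled triplet."""
    return max(0.0, sq_euclidean(anchor, positive)
               - sq_euclidean(anchor, negative) + alpha)
```

The loss vanishes once the negative is at least `alpha` farther from the anchor than the positive. The argument above is that picking `positive` and `negative` at random ignores manifold locality, which COIN instead encodes through the signed graph.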

III-C2 Signed Similarity Graph

Graph embedding is trained with distributional context knowledge, which can boost performance in various pattern recognition tasks. Here, we aim to incorporate the signed graph Laplacian regularizer [10] to learn a discriminative data representation with a deep neural network, where discriminative means that the intra-class data manifold structure is preserved in the latent space and the margins between the (slightly different) manifolds are maximized.

Using the supervision of the adversarial augmentation in Sec. III-B, we build a signed graph upon the expanded data. Given the expanded data of one class and the data of all other classes, the elements of the signed graph are built as follows:

(4)

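where the positive (negative) neighborhood of a sample is the set of its nearest neighbors from the same class (from the other classes), whose sizes are hyper-parameters; these neighborhoods approximate the locality of the manifold.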
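A minimal sketch of such a signed neighborhood graph is given below. It assumes Euclidean distance on feature vectors, positive edges to the `k_pos` nearest same-class samples, negative edges to the `k_neg` nearest samples from other classes, and no symmetrization of W; these choices and the function name are assumptions, and the exact rule (4) is not reproduced here.

```python
import math
from typing import List, Sequence


def build_signed_graph(features: Sequence[Sequence[float]], labels: Sequence[int],
                       k_pos: int, k_neg: int) -> List[List[int]]:
    """W[i][j] = +1 if j is among the k_pos nearest same-class samples of i,
    -1 if j is among the k_neg nearest other-class samples of i, 0 otherwise."""
    n = len(features)
    W = [[0] * n for _ in range(n)]
    for i in range(n):
        # Distances from sample i to all other samples, nearest first.
        dists = sorted((math.dist(features[i], features[j]), j)
                       for j in range(n) if j != i)
        same = [j for _, j in dists if labels[j] == labels[i]][:k_pos]
        other = [j for _, j in dists if labels[j] != labels[i]][:k_neg]
        for j in same:
            W[i][j] = 1   # intra-class edge: pull together
        for j in other:
            W[i][j] = -1  # inter-class edge: push apart
    return W
```

Setting `k_neg = 0` drops all inter-class edges, which corresponds to the intra-class-only configuration examined in the experiments.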

III-C3 Triplet contrastive loss

Then, we compute the structure preservation in the deep representation space, i.e. on the latent features immediately before the softmax layer (Fig. 4). The signed graph Laplacian regularizer is defined as follows:

(5)

where a distance metric measures the dissimilarity between two latent features. The regularizer encourages similar examples to be close, and dissimilar examples to be at least a margin apart from each other.
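Equation (5) did not survive extraction, but the description matches the classic margin-based contrastive loss. The sketch below assumes that form: squared distance on positive edges and the squared hinge max(0, m - d)^2 on negative edges, averaged over the nonzero edges of W. The averaging, the Euclidean distance, and the function name are assumptions.

```python
import math
from typing import List, Sequence


def signed_graph_loss(z: Sequence[Sequence[float]], W: List[List[int]],
                      margin: float) -> float:
    """Contrastive regularizer over a signed graph (assumed form of Eq. (5))."""
    total, n_edges = 0.0, 0
    for i, row in enumerate(W):
        for j, w in enumerate(row):
            if w == 0:
                continue  # unconnected pair: no contribution
            d = math.dist(z[i], z[j])
            if w > 0:
                total += d ** 2  # pull same-class neighbors together
            else:
                total += max(0.0, margin - d) ** 2  # push other-class neighbors past the margin
            n_edges += 1
    return total / n_edges if n_edges else 0.0
```

Once every same-class neighbor coincides and every other-class neighbor is at least `margin` away, the regularizer is zero, which is the embedding the text aims for.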

Note that instead of computing the manifold embedding by solving an eigenvalue decomposition, we learn the embedding with a deep neural network. Specifically, inspired by depth-wise separable convolutions [12], which are extensively employed to learn mappings with a series of factorized filters, we build stacks of depth-wise separable convolutions with a topology similar to that in [12] to learn such deep representations (Fig. 4).

Therefore, by minimizing (5), we expect that if two connected nodes are from the same class (i.e. their edge weight is positive), their latent features are also close to each other, and vice versa. Benefiting from this learned discriminativity, we train a simple softmax classifier to predict the class label, i.e.,

(6)

where the indicator equals 1 when the sample's label matches the class and 0 otherwise, and the loss is minimized with respect to the parameter set of the neural network.

III-C4 Total Loss

Finally, by combining the signed Laplacian regularizer (5) and the classification loss (6), the total objective of COIN is defined as:

(7)

where the regularization trade-off parameter controls the smoothness of the hidden representations.
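Assuming (7) has the common form “classification loss plus a weighted regularizer”, with softmax cross-entropy standing in for (6), the total loss can be sketched as follows. The graph term is passed in as a precomputed number, and `lam` plays the role of the trade-off parameter; both names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Softmax with the maximum subtracted for numerical stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(batch_logits: Sequence[Sequence[float]],
                  labels: Sequence[int]) -> float:
    """Mean negative log-probability of each sample's true class.

    Only the true-class term survives, playing the role of the indicator in (6).
    """
    losses = [-math.log(softmax(row)[y]) for row, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


def total_loss(batch_logits: Sequence[Sequence[float]], labels: Sequence[int],
               graph_term: float, lam: float) -> float:
    """Assumed form of (7): classification loss plus lam times the graph regularizer."""
    return cross_entropy(batch_logits, labels) + lam * graph_term
```

Setting `lam = 0` recovers plain cross-entropy training, i.e. the “no graph regularization” baseline used for comparison in the experiments.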

Fig. 4: The deep neural network architecture constructed in COIN to extract deep latent features. “DC block” denotes a down-sampling convolutional block, “RC block” a residual convolutional block, and “SConv” a separable convolution.

III-D Network Architecture and Implementation

The proposed CNN model is constructed with the architecture shown in Fig. 4. The first four convolutional layers are down-sampling convolutional blocks (DC blocks), each involving two separable convolutions. Specifically, the separable convolution operators decompose a convolution into consecutive depth-wise and point-wise operations. A pooling layer then halves the spatial size of the feature maps, and the output of the down-sampling block is obtained by applying the ReLU nonlinearity. The four DC blocks progressively transform the original input into feature maps of decreasing spatial size. Next, seven separable convolutional layers are stacked, which keeps the total number of parameters low, before three fully connected layers that all have the same number of neurons. The resulting latent features of the enlarged dataset are then regularized with the proposed contrastive loss of Sec. III-C. Finally, the learned features are classified into two classes, “Benign” and “Malignant”.
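To see why stacking separable convolutions keeps the parameter count low, compare the weight counts of a standard and a depth-wise separable convolution. The layer sizes in the check below (3 x 3 kernel, 64 to 128 channels) are illustrative, not COIN's actual configuration, and biases are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k convolution mapping c_in to c_out channels (no biases)."""
    return k * k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Depth-wise k x k conv (one filter per input channel) plus a 1 x 1 point-wise conv."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise
```

For these sizes the standard layer needs 73,728 weights and the separable one 8,768, roughly 8.4 times fewer.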

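Fig. 5: BMD performance (accuracy and AUC score) of COIN on INbreast versus various hyper-parameters: the numbers of positive and negative neighbors and the regularization parameter. (a) Performance with different numbers of positive and negative neighbors when the regularization parameter equals 1; (b) performance with various values of the regularization parameter, with the numbers of positive and negative neighbors fixed.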

Fig. 6: t-SNE plots of the INbreast test set. (a), (b) and (c) show the embeddings of latent features learned by COIN under different learning configurations.

IV Experiments

In this section, we conduct extensive experiments to validate the proposed algorithm. We first examine the quality of the masses generated by both adversarial augmentation modules. We then expand the original dataset with the augmented data and build the signed graph. We evaluate the proposed algorithm on a small full-field digital mammography (FFDM) dataset, INbreast [37].

IV-A Adversarial Augmentation Performance

To visually examine the quality of images generated by the proposed adversarial augmentation strategy, Fig. 2 shows augmented examples of the benign and malignant categories (blue stands for benign and red for malignant masses). The difference between positive (second row) and negative neighbors (third row) within each category is subtle: visually, they are very difficult to tell apart within each mass type, not only in the masses themselves but also in the contextual or background tissues. This indicates that the generated negative neighbors are hard to recognize, so they are likely to play an important role in increasing the model's discriminative ability. Compared with the cGAN-generated samples (first row), the positive and negative samples generated by our method for both benign and malignant categories are less noisy, with a more balanced focus on low- and high-frequency signals. In the left-column subfigures, both negative and positive neighbors of benign masses have oval or round shapes with relatively smooth boundaries, very similar to those of the original INbreast data (Fig. 1). Additionally, the textural and contextual features of generated and real samples are visually highly alike. In the right column of Fig. 2, our resulting malignant masses (both positive and negative neighbors) mostly have irregular shapes and fuzzy boundaries with spiculated vessels. These characteristics are consistent with those of the malignant masses in the original INbreast dataset (Fig. 1).

To further evaluate the effectiveness of the proposed adversarial augmentation, we design a series of experiments to test the discriminativity of the generated mass samples. As shown in Table I, we evaluate the classification performance of different augmentation algorithms with the proposed CNN architecture (Fig. 4): the original INbreast data (baseline), conventional augmentation (flips and rotations), CGAN augmentation [22], and the proposed adversarial augmentation (positive neighbors only). Note that here the CNN model is optimized with the cross-entropy loss. Table I shows that all augmentation algorithms improve classification performance over the baseline model. Conventional augmentation and CGAN [22] achieve similar discriminative performance, whereas the proposed augmentation outperforms the other listed methods in both accuracy and AUC score, achieving 89% accuracy and a 0.92 AUC score.

IV-B Signed Graph Laplacian performance

Choosing hyper-parameter values is a major challenge in deep learning. To explore COIN's performance under different Signed graph configurations, the number of positive neighbors and the number of negative neighbors are first grid-searched with the regularization parameter held fixed, as shown in Fig. 5. The best-performing configuration improves the accuracy by at least 8% and the AUC score by at least 12% compared with training without graph regularization. This confirms the effectiveness of the Signed graph regularization and validates the importance of negative neighbors for improving discriminativity and maximizing the manifold margin. In addition, the results show that COIN performs well only when both positive and negative neighbors are included in the Signed graph construction. Finally, with the best-performing Signed graph configuration fixed, we tune the regularization parameter to obtain the best AUC score and accuracy. These results indicate that both the deep latent features extracted by the network and the inherent structural features of the data are important for distinguishing malignant breast masses from benign ones.
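To make the roles of positive and negative neighbors concrete, the sketch below builds a signed k-nearest-neighbor graph and evaluates a standard signed-Laplacian penalty (absolute-degree matrix minus signed adjacency). This is an illustrative assumption, not the paper's exact construction or loss: the names `k_pos`, `k_neg` and the function names are hypothetical stand-ins for the grid-searched neighbor counts.

```python
import numpy as np


def build_signed_graph(features, labels, k_pos, k_neg):
    """Hypothetical signed kNN graph over an (n, d) feature matrix.

    Each sample is linked with weight +1 to its k_pos nearest same-class
    samples (positive neighbors) and with weight -1 to its k_neg nearest
    different-class samples (negative neighbors), by Euclidean distance.
    """
    labels = np.asarray(labels)
    n = len(labels)
    # Pairwise Euclidean distances, shape (n, n).
    dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    A = np.zeros((n, n))
    for i in range(n):
        same = np.where((labels == labels[i]) & (np.arange(n) != i))[0]
        diff = np.where(labels != labels[i])[0]
        A[i, same[np.argsort(dist[i, same])[:k_pos]]] = 1.0
        A[i, diff[np.argsort(dist[i, diff])[:k_neg]]] = -1.0
    # Symmetrize: keep an edge if either endpoint selected it. Both directions
    # of an edge share the same sign, so np.sign keeps weights at +/-1.
    return np.sign(A + A.T)


def signed_laplacian(A):
    """Signed Laplacian: absolute-degree matrix minus signed adjacency."""
    return np.diag(np.abs(A).sum(axis=1)) - A


def signed_graph_penalty(Z, A):
    """tr(Z^T L Z) = 1/2 * sum_ij |a_ij| * ||z_i - sign(a_ij) * z_j||^2.

    Minimizing it pulls positive neighbors together and drives negative
    neighbors toward opposite sides of the origin in the latent space.
    """
    return float(np.trace(Z.T @ signed_laplacian(A) @ Z))
```

Under this assumption, a training objective would add the penalty to the cross-entropy loss with a weight playing the role of the regularization parameter, and a grid search would loop over (`k_pos`, `k_neg`) pairs. Setting `k_neg = 0` reduces the graph to intra-class edges only, which corresponds to the configuration without negative neighbors.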

Method                          Accuracy   AUC
Domingues et al. (2012) [15]    89%        N/A
Dhungel et al. (2016) [14]      91%        0.76
Zhu et al. (2017) [52]          90%        0.89
Shams et al. (2018) [42]        93%        0.92
Li et al. (2019) [30]           88%        0.92
COIN (ours)                     93.4%      0.95
TABLE II: Breast mass diagnosis performance of the proposed COIN and related state-of-the-art methods on the INbreast test set.

To visually assess how well the data manifold is learned, we plot the learned feature embeddings of the test set with t-SNE (Fig. 6). As an ablation study, Fig. 6 compares three learning configurations of COIN: without any Signed graph regularization (neither intra-class regularization from positive neighbors nor inter-class regularization from negative neighbors); with intra-class regularization only, i.e. without negative neighbors; and with both intra- and inter-class regularization. Without regularization, performance is worst and the samples of the two categories are highly intersected. With intra-class regularization only, discriminativity improves, but 15% of the samples are still misclassified. With both positive and negative regularization, COIN achieves the best embedding of the test data: 82 of the 88 masses (approximately 93% of the test samples) are correctly identified. We also show the original mass images for several randomly selected misclassified samples in Fig. 6. The malignant masses misclassified by COIN closely resemble the benign masses surrounding them in the embedding, and vice versa. This indicates that COIN categorizes breast masses correctly in most cases, failing only on extremely hard examples.
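The counts quoted for the full COIN configuration correspond to the following rates on the 88-mass test set:

```latex
\text{accuracy} = \frac{82}{88} \approx 0.932 \;(93.2\%), \qquad
\text{misclassified} = 88 - 82 = 6, \quad \frac{6}{88} \approx 0.068 \;(6.8\%)
```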

IV-C Comparison to the state-of-the-art

Finally, to further explore the effectiveness of COIN, we compare it with state-of-the-art methods in Tab. II, where the results of other works are taken from their original papers. COIN outperforms the state of the art, with a mean accuracy of 93.4% and an AUC score of 0.95. Compared with the second-best algorithm [42], COIN's AUC score is 0.03 higher (0.95 vs. 0.92), with experiments on the whole dataset and without any pre-processing, post-processing or transfer learning.
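The text does not state how the AUC score is computed. As a hedged illustration, the sketch below uses the Mann-Whitney formulation: AUC is the probability that a randomly chosen malignant sample receives a higher score than a randomly chosen benign one, with ties counted as one half. Label 1 denoting malignant, the 0.5 threshold and the function names are assumptions.

```python
def auc_score(labels, scores):
    """AUC via pairwise comparison (Mann-Whitney U / (n_pos * n_neg)).

    labels: 1 for malignant (positive), 0 for benign (negative).
    scores: predicted probability of malignancy for each sample.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    correct = sum(int((s >= threshold) == (y == 1)) for y, s in zip(labels, scores))
    return correct / len(labels)


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    p = [0.1, 0.4, 0.35, 0.8]
    print(auc_score(y, p), accuracy(y, p))
```

The pairwise loop is quadratic in the number of samples, which is negligible for a test set of 88 masses. Unlike accuracy, AUC does not depend on the decision threshold, which is why the two metrics can rank methods differently in Tab. II.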

V Conclusions

In this paper, we have proposed COIN, a novel deep framework that addresses two crucial challenges of the BMD problem: data scarcity and data entanglement. COIN integrates adversarial augmentation and contrastive learning. In particular, the proposed adversarial augmentation not only enlarges the dataset but also enhances the discriminativity of the diagnosis model. The proposed contrastive learning further improves the model's discriminative ability by exploiting the manifold geometry of the data, which is valuable for mammographic lesions that closely resemble one another. Experiments show that COIN surpasses state-of-the-art algorithms for the BMD problem.

References

  • [1] A. Antoniou, A. Storkey, and H. Edwards (2017) Data augmentation generative adversarial networks. arXiv preprint arXiv:1711.04340. Cited by: §III-B1.
  • [2] J. Arevalo, F. A. González, R. Ramos-Pollán, J. L. Oliveira, and M. A. G. Lopez (2016) Representation learning for mammography mass lesion classification with convolutional neural networks. Computer methods and programs in biomedicine 127, pp. 248–257. Cited by: §I-B.
  • [3] R. Blanks, M. Wallis, and S. Moss (1998) A comparison of cancer detection rates achieved by breast cancer screening programmes by number of readers, for one and two view mammography: results from the uk national health service breast screening programme. Journal of Medical screening 5 (4), pp. 195–201. Cited by: §I.
  • [4] P. Boyle, B. Levin, et al. (2008) World cancer report 2008. IARC Press, International Agency for Research on Cancer. Cited by: §I.
  • [5] J. Brown, S. Bryan, and R. Warren (1996) Mammography screening: an incremental cost effectiveness analysis of double versus single reading of mammograms. BMJ 312 (7034), pp. 809–812. Cited by: §I.
  • [6] G. Carneiro, J. Nascimento, and A. P. Bradley (2017) Automated analysis of unregistered multi-view mammograms with deep learning. IEEE transactions on medical imaging 36 (11), pp. 2355–2365. Cited by: §I-B.
  • [7] D. Chen, M. E. Davies, and M. Golbabaee (2020) Compressive mr fingerprinting reconstruction with neural proximal gradient iterations. In International Conference on Medical image computing and computer-assisted intervention (MICCAI), Cited by: §I-B.
  • [8] D. Chen and M. E. Davies (2020) Deep decomposition learning for inverse imaging problems. In Proceedings of the European Conference on Computer Vision (ECCV), Cited by: §I-B.
  • [9] D. Chen, J. C. Lv, and Z. Yi (2014) A local non-negative pursuit method for intrinsic manifold structure preservation.. In AAAI, pp. 1745–1751. Cited by: §I-C.
  • [10] D. Chen, J. Lv, and M. E. Davies (2018) Learning discriminative representation with signed Laplacian restricted Boltzmann machine. arXiv preprint arXiv:1808.09389. Cited by: §III-C2.
  • [11] D. Chen, J. Lv, and Z. Yi (2017) Unsupervised multi-manifold clustering by learning deep representation. In Workshops at the 31st AAAI conference on artificial intelligence (AAAI), pp. 385–391. Cited by: §I-B, §II-B.
  • [12] F. Chollet (2017) Xception: deep learning with depthwise separable convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1251–1258. Cited by: §III-C3.
  • [13] C. DeSantis, J. Ma, L. Bryan, and A. Jemal (2014) Breast cancer statistics, 2013. CA: a cancer journal for clinicians 64 (1), pp. 52–62. Cited by: §I.
  • [14] N. Dhungel, G. Carneiro, and A. P. Bradley (2016) The automated learning of deep features for breast mass classification from mammograms. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 106–114. Cited by: 1st item, §II-A, §II-B, §III-B1, TABLE II.
  • [15] I. Domingues, E. Sales, J. Cardoso, and W. Pereira (2012) INbreast-database masses characterization. XXIII CBEB. Cited by: TABLE II.
  • [16] W. Ge (2018) Deep metric learning with hierarchical triplet loss. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 269–285. Cited by: §III-C1.
  • [17] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680. Cited by: §II-A.
  • [18] R. Hadsell, S. Chopra, and Y. LeCun (2006) Dimensionality reduction by learning an invariant mapping. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), Vol. 2, pp. 1735–1742. Cited by: §III-C1.
  • [19] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick (2020) Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738. Cited by: §II-B, §III-C1.
  • [20] O. J. Hénaff, A. Srinivas, J. De Fauw, A. Razavi, C. Doersch, S. Eslami, and A. v. d. Oord (2019) Data-efficient image recognition with contrastive predictive coding. arXiv preprint arXiv:1905.09272. Cited by: §II-B.
  • [21] R. D. Hjelm, A. Fedorov, S. Lavoie-Marchildon, K. Grewal, P. Bachman, A. Trischler, and Y. Bengio (2018) Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670. Cited by: §II-B, §III-C1.
  • [22] P. Isola, J. Zhu, T. Zhou, and A. A. Efros (2017) Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1125–1134. Cited by: Fig. 2, §III-B1, §III-B1, §III-B2, TABLE I, §IV-A.
  • [23] A. Jalalian, S. B. Mashohor, H. R. Mahmud, M. I. B. Saripan, A. R. B. Ramli, and B. Karasfi (2013) Computer-aided detection/diagnosis of breast cancer in mammography and ultrasound: a review. Clinical imaging 37 (3), pp. 420–426. Cited by: §I-A.
  • [24] Z. Jiao, X. Gao, Y. Wang, and J. Li (2018) A parasitic metric learning net for breast mass classification based on mammography. Pattern Recognition 75, pp. 292–301. Cited by: §I-A.
  • [25] P. Khosla, P. Teterwak, C. Wang, A. Sarna, Y. Tian, P. Isola, A. Maschinot, C. Liu, and D. Krishnan (2020) Supervised contrastive learning. arXiv preprint arXiv:2004.11362. Cited by: §II-B, §III-C1.
  • [26] T. Kooi, G. Litjens, B. Van Ginneken, A. Gubern-Mérida, C. I. Sánchez, R. Mann, A. den Heeten, and N. Karssemeijer (2017) Large scale deep learning for computer aided detection of mammographic lesions. Medical image analysis 35, pp. 303–312. Cited by: §I-A.
  • [27] T. Kooi, B. van Ginneken, N. Karssemeijer, and A. den Heeten (2017) Discriminating solitary cysts from soft tissue lesions in mammography using a pretrained deep convolutional neural network. Medical physics 44 (3), pp. 1017–1027. Cited by: §I-B.
  • [28] Y. LeCun, Y. Bengio, and G. Hinton (2015) Deep learning. nature 521 (7553), pp. 436. Cited by: §I-B.
  • [29] C. Li and M. Wand (2016) Precomputed real-time texture synthesis with markovian generative adversarial networks. In European conference on computer vision, pp. 702–716. Cited by: §III-B1.
  • [30] H. Li, D. Chen, W. H. Nailon, M. E. Davies, and D. Laurenson (2019) A deep dual-path network for improved mammogram image processing. International Conference on Acoustics, Speech and Signal Processing. Cited by: §II-A, §II-B, TABLE II.
  • [31] H. Li, D. Chen, W. H. Nailon, M. E. Davies, and D. I. Laurenson (2019) Signed laplacian deep learning with adversarial augmentation for improved mammography diagnosis. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 486–494. Cited by: §I-C.
  • [32] H. Li, D. Chen, W. H. Nailon, M. E. Davies, and D. Laurenson (2018) Improved breast mass segmentation in mammograms with conditional residual U-Net. In Image Analysis for Moving Organ, Breast, and Thoracic Images, pp. 81–89. Cited by: 1st item, §II-A.
  • [33] W. Lotter, G. Sorensen, and D. Cox (2017) A multi-scale cnn and curriculum learning strategy for mammogram classification. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 169–177. Cited by: §I-B.
  • [34] L. v. d. Maaten and G. Hinton (2008) Visualizing data using t-sne. Journal of machine learning research 9 (Nov), pp. 2579–2605. Cited by: Fig. 1, §I-A.
  • [35] A. Malich, D. R. Fischer, and J. Böttcher (2006) CAD for mammography: the technique, results, current role and further developments. European radiology 16 (7), pp. 1449. Cited by: §I-A.
  • [36] S. M. McKinney, M. Sieniek, V. Godbole, J. Godwin, N. Antropova, H. Ashrafian, T. Back, M. Chesus, G. C. Corrado, A. Darzi, et al. (2020) International evaluation of an ai system for breast cancer screening. Nature 577 (7788), pp. 89–94. Cited by: §I.
  • [37] I. C. Moreira, I. Amaral, I. Domingues, A. Cardoso, M. J. Cardoso, and J. S. Cardoso (2012) INbreast: toward a full-field digital mammographic database. Academic radiology 19 (2), pp. 236–248. Cited by: §IV.
  • [38] V. Nair and G. E. Hinton (2010) Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10), pp. 807–814. Cited by: §III-B2.
  • [39] A. Oliver, J. Freixenet, J. Marti, E. Perez, J. Pont, E. R. Denton, and R. Zwiggelaar (2010) A review of automatic mass detection and segmentation in mammographic images. Medical image analysis 14 (2), pp. 87–110. Cited by: §I-A, §I-B, §I, §II-B.
  • [40] F. Schroff, D. Kalenichenko, and J. Philbin (2015) Facenet: a unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 815–823. Cited by: §III-C1.
  • [41] H. S. Seung and D. D. Lee (2000) The manifold ways of perception. science 290 (5500), pp. 2268–2269. Cited by: §II-B.
  • [42] S. Shams, R. Platania, J. Zhang, J. Kim, and S. Park (2018) Deep generative breast cancer screening and diagnosis. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 859–867. Cited by: §I-B, §III-B1, §IV-C, TABLE II.
  • [43] D. Shen, G. Wu, and H. Suk (2017) Deep learning in medical image analysis. Annual review of biomedical engineering 19, pp. 221–248. Cited by: §I.
  • [44] K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Cited by: §III-B1.
  • [45] C. Varela, S. Timp, and N. Karssemeijer (2006) Use of border information in the classification of mammographic masses. Physics in medicine & biology 51 (2), pp. 425. Cited by: §I-A.
  • [46] S. C. Wong, A. Gatt, V. Stamatescu, and M. D. McDonnell (2016) Understanding data augmentation for classification: when to warp?. In 2016 international conference on digital image computing: techniques and applications (DICTA), pp. 1–6. Cited by: §II-A.
  • [47] E. Wu, K. Wu, D. Cox, and W. Lotter (2018) Conditional infilling GANs for data augmentation in mammogram classification. In Image Analysis for Moving Organ, Breast, and Thoracic Images, pp. 98–106. Cited by: §II-A, §III-B1.
  • [48] Z. Wu, Y. Xiong, S. X. Yu, and D. Lin (2018) Unsupervised feature learning via non-parametric instance discrimination. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3733–3742. Cited by: §III-C1.
  • [49] Q. Xie, M. Luong, E. Hovy, and Q. V. Le (2020) Self-training with noisy student improves imagenet classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10687–10698. Cited by: §II-B.
  • [50] Y. Yu, H. Qian, and Y. Hu (2016) Derivative-free optimization via classification. In Thirtieth AAAI Conference on Artificial Intelligence, Cited by: §III-B2.
  • [51] Y. Yu, W. Qu, N. Li, and Z. Guo (2017) Open-category classification by adversarial sample generation. International Joint Conference on Artificial Intelligence. Cited by: §II-A, §III-B2.
  • [52] W. Zhu, Q. Lou, Y. S. Vang, and X. Xie (2017) Deep multi-instance networks with sparse label assignment for whole mammogram classification. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 603–611. Cited by: §I-B, §II-A, TABLE II.