Comparison of different CNNs for breast tumor classification from ultrasound images

12/28/2020 · Jorge F. Lazo, et al.

Breast cancer is one of the deadliest cancers worldwide. Timely detection could reduce mortality rates. In the clinical routine, classifying benign and malignant tumors from ultrasound (US) imaging is a crucial but challenging task. An automated method that can deal with the variability of the data is therefore needed. In this paper, we compared different Convolutional Neural Networks (CNNs) and transfer learning methods for the task of automated breast tumor classification. The architectures investigated in this study were VGG-16 and Inception V3. Two different training strategies were investigated: the first was using pretrained models as feature extractors and the second was fine-tuning the pretrained models. A total of 947 images were used: 587 corresponded to US images of benign tumors and 360 to malignant tumors. 678 images were used for the training and validation process, while 269 images were used for testing the models. Accuracy and Area Under the Receiver Operating Characteristic Curve (AUC) were used as performance metrics. The best performance was obtained by fine-tuning VGG-16, with an accuracy of 0.919 and an AUC of 0.934. The obtained results open the opportunity for further investigation with a view to improving cancer detection.


I Introduction

Breast cancer is the most common cancer in women [cancer_2019]. Cancer screening is performed via Breast Ultrasound (BUS) imaging and mammography. BUS is recommended in a large variety of cases, such as for women under the age of 30 and/or in case of pregnancy. In clinical practice, cancer diagnosis is performed by clinicians through visual BUS-image analysis. However, it is well known that BUS image acquisition and analysis are highly dependent on the clinician's level of expertise [widely_variability].

Computer-aided diagnosis (CAD) systems for BUS image analysis have recently been shown to be able to tackle the variability associated with both breast anatomy and BUS images, becoming a suitable tool to improve diagnosis accuracy [XIAN2018340].

In the last few years, deep-learning (DL) approaches, and more specifically convolutional neural networks (CNNs), have become the standard in research for BUS-image analysis [LIU2019261]. However, there are still open challenges that need to be addressed, among them the need for large, annotated BUS datasets for CNN training [LIU2019261]. A possible way to attenuate this issue is to exploit transfer learning and fine-tuning. These kinds of approaches have already been applied in other studies and imaging modalities, with promising results as reported in [trasnfer_deep, fiorentino2019learning, calimeri2016optic]. However, even though transfer learning seems to be the way to proceed, there are different techniques to implement it and, to our knowledge, there is no study comparing these strategies on this kind of data.

In this paper we explore the use of existing pretrained models and two different training strategies, with the aim of determining which strategy and model is more suitable for the task of breast tumor classification. The paper is organized as follows: Sec. II reviews the state of the art in automated tumor classification methods, Sec. III delves into the methodology, explaining the architectures and transfer-learning techniques used, Sec. IV describes the dataset, the training settings and the performance metrics, and Sec. V presents and discusses the results. Finally, Sec. VI concludes the paper and proposes future improvements.

II Deep Learning in BUS image analysis

In the last few years, DL methods and specifically CNNs have become the state of the art for image analysis tasks. The application of DL in the analysis of medical US images involves different specific tasks, such as classification, segmentation, detection, registration, as well as the development of new methodologies for image-guided interventions.

In the specific case of tumor classification, different extensions and variations of DL approaches have been developed. In [yap2017automated], the authors propose the use of different CNNs for locating regions of interest (ROIs) corresponding to lesions. In [bakkouri2017breast], CNNs are used as feature extractors and the obtained features are classified with a Support Vector Machine (SVM). In [Bashir_BUS], an architecture based on AlexNet is proposed and its performance is compared with that of several pretrained models, using a small custom-built dataset. In [al2019deep], a method based on Generative Adversarial Networks (GANs) for data augmentation is proposed; the authors then compare the performance of this network in the task of generating synthetic data against pretrained models, in this case VGG16, Inception, ResNet and NasNet. A further step is taken in [MOON2020105361], where the authors propose the use of ensembles to develop a better and more comprehensive generalized model; their model is based on VGG16-like architectures as well as different versions of ResNet and DenseNet. In [han2017deep], a modification to the GoogLeNet architecture is proposed; this network is an early version of Inception V3 composed of a main branch and two auxiliary classifiers, and the authors specifically suggest removing the auxiliary classifiers from the main branch. In [xiao2018comparison], different pretrained models (ResNet50, InceptionV3 and Xception) are compared, but only using fine-tuning.

III Proposed Methods

In this work, we investigated two different transfer-learning techniques: (i) fine-tuning and (ii) using pretrained CNNs as feature extractors. Each of these methods was tested using two different CNN architectures:

III-A CNN architectures

  • VGG-16 is a 16-layer CNN model with a sequential architecture consisting of 13 convolutional layers and 5 max-pooling layers [vgg_16_paper]. The architecture starts with a convolutional layer with 64 kernels; this number is doubled after each pooling operation until it reaches 512. The pooling layers are placed after selected convolutional layers in order to reduce the dimension of the activation maps, and hence of the subsequent convolutional layers, which in general reduces the number of parameters the CNN needs to learn. The kernel size of all the convolutional layers in this model is 3x3. The model ends with three fully-connected (FC) layers (4096, 4096 and 1000 neurons in the original ImageNet configuration), which perform the classification.

  • Inception V3 [InceptionV3] uses an architectural block called the inception module, which consists of convolutional kernels of different sizes (1x1, 3x3 and 5x5) connected in parallel (Fig. 1). The use of different kernel sizes allows the identification of image features at different scales. Furthermore, Inception V3 uses not just one classifier but two; the second is an auxiliary classifier that acts as a regularizer. One of the main advantages of this model is that it is composed of only about 23 million parameters even though it has 42 layers, so the computational cost of training it is lower than that of retraining VGG-16. However, given its more complex topology, it is also harder to retrain. (A minimal Keras instantiation of both backbones is sketched after this list.)
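A minimal sketch of how the two pretrained backbones can be instantiated in Keras (the framework used in Sec. IV); the input sizes match those reported for each network, while everything else is an illustrative assumption:

# Minimal sketch: instantiating the two pretrained backbones in Keras.
# Assumes TensorFlow/Keras is available; exact versions may differ from
# the ones used by the authors.
from tensorflow.keras.applications import VGG16, InceptionV3

# VGG-16 expects 224x224 RGB inputs; include_top=False drops the original
# 1000-class ImageNet classifier so a new head can be attached.
vgg_base = VGG16(weights="imagenet", include_top=False,
                 input_shape=(224, 224, 3))

# Inception V3 expects 299x299 RGB inputs.
inception_base = InceptionV3(weights="imagenet", include_top=False,
                             input_shape=(299, 299, 3))

print(vgg_base.output_shape, inception_base.output_shape)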

Fig. 1: Diagram of the inception module. The module is composed of several convolutional kernels of size 1x1, 3x3 and 5x5 connected in parallel. The 1x1 kernels placed before the 3x3 and 5x5 kernels are used to reduce the dimensionality of the feature maps. The outputs of the branches are finally concatenated before being passed to the next stage.
Fig. 2: Two sample images from the BUSI dataset: an image with a benign tumor (left) and one with a malignant tumor (right).

III-B Transfer learning

As introduced in Sec. I, training a CNN model from scratch requires a large amount of computational resources as well as a fair amount of labeled data. It also often requires considerable time, even when using several graphics processing units (GPUs).

Transfer learning makes the training process more efficient by using a model that has already been trained on a different dataset. There are different ways to perform transfer learning; in this work, fine-tuning and feature-extraction techniques were explored. The first refers to re-adjusting the weights of a pretrained model to the distribution of the new training data, i.e. tuning the weights of the CNN by training it on the new dataset for a few epochs and with a low learning rate; usually, only the weights in the last layers are retrained. In this work, both VGG-16 and Inception V3 were pretrained on ImageNet (http://www.image-net.org/), a dataset which has more than 14 million images belonging to 1000 classes. The second method, feature extraction, refers to using the whole network as a feature extractor and then feeding the high-level features obtained from the network to another classifier.

The fine-tuning strategy was performed by replacing the last fully-connected layer with one composed of 2 neurons (one for the benign and one for the malignant class), and then fine-tuning a different number of layers of the network: from only the very last layer up to the last 3 layers of each network.
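A hedged Keras sketch of this strategy is given below: the original 1000-class layer is replaced by a 2-neuron softmax head and only the last few layers are left trainable. The use of VGG-16 and the number of unfrozen layers here are illustrative assumptions, not necessarily the authors' exact configuration:

# Sketch of the fine-tuning strategy (illustrative, not the authors' exact code).
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

# Load VGG-16 with its original fully-connected layers and ImageNet weights.
full_vgg = VGG16(weights="imagenet", include_top=True)

# Replace the final 1000-class layer with a 2-neuron softmax head
# (benign vs. malignant).
features = full_vgg.layers[-2].output
outputs = layers.Dense(2, activation="softmax", name="tumor_head")(features)
model = models.Model(inputs=full_vgg.input, outputs=outputs)

# Freeze the whole network, then unfreeze only the layers to be tuned
# (the paper explores from the last layer up to the last 3 layers).
for layer in model.layers:
    layer.trainable = False
for layer in model.layers[-3:]:
    layer.trainable = True

model.summary()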

In the case of the feature-extraction method, an additional classifier was added and trained. It was composed of a global average pooling layer, a fully-connected layer with a Rectified Linear Unit (ReLU) activation function and, finally, a fully-connected layer with a softmax activation function and 2 neurons.
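A minimal Keras sketch of this feature-extraction classifier is shown below, assuming a frozen Inception V3 backbone and the FC size reported later in Sec. IV-B (1024 neurons); the authors' exact configuration may differ:

# Sketch of the feature-extraction strategy: the pretrained backbone is
# frozen and only the small classifier on top is trained.
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras import layers, models

base = InceptionV3(weights="imagenet", include_top=False,
                   input_shape=(299, 299, 3))
base.trainable = False   # the backbone is used only as a feature extractor

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(1024, activation="relu"),     # FC size assumed from Sec. IV-B
    layers.Dense(2, activation="softmax"),     # benign vs. malignant
])
model.summary()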

IV Experimental Setup

IV-A Dataset

The data used for this work came from two publicly available datasets, collected by the groups of Rodriguez et al. (https://data.mendeley.com/datasets/wmy84gzngw/1) and Fahmy et al. (https://scholar.cu.edu.eg/?q=afahmy/pages/dataset). The first dataset consists of 250 breast tumor images (100 benign and 150 malignant) with an average size of 100x75 pixels. The second one consists of 963 images with an average size of 500x500 pixels; in this dataset, 487 images correspond to benign tumors, 210 to malignant tumors and 266 contain no tumor at all. For this project, only the images with benign and malignant tumors were used.

Hence, the dataset used in this work consisted of 587 and 360 images with benign and malignant tumors, respectively. The whole dataset was shuffled randomly and then split, in a stratified fashion, into two subsets: 678 images for training and validation, and 269 for testing. A sample of benign and malignant BUS images is shown in Fig. 2.
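A brief scikit-learn sketch of such a stratified split, using the image counts reported above (the random seed and exact shuffling procedure are assumptions):

import numpy as np
from sklearn.model_selection import train_test_split

# Labels for the 947 BUS images used in this work:
# 587 benign (label 0) and 360 malignant (label 1).
labels = np.array([0] * 587 + [1] * 360)
indices = np.arange(len(labels))

# Stratified split: 678 images for training/validation, 269 for testing.
trainval_idx, test_idx, y_trainval, y_test = train_test_split(
    indices, labels,
    test_size=269,
    stratify=labels,     # preserve the benign/malignant ratio in both subsets
    shuffle=True,
    random_state=42,     # arbitrary seed, only for reproducibility of the sketch
)
print(len(trainval_idx), len(test_idx))   # 678 269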

During the training of the networks, mini-batch gradient descent with the Adam optimization algorithm was used. When creating the training batches, the images were resized "on the fly" to match the input size of each network: 224x224 pixels for VGG-16 and 299x299 pixels for Inception V3.
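As an illustration, the sketch below builds a tf.data pipeline that performs this on-the-fly resizing when the mini-batches are created; the placeholder tensors and the VGG-16 target size are assumptions, as the authors' actual input pipeline is not described in detail:

import tensorflow as tf

# Target input size: 224x224 for VGG-16, 299x299 for Inception V3.
TARGET_SIZE = (224, 224)

# Placeholder tensors standing in for BUS images of arbitrary original size.
images = tf.zeros((100, 120, 100, 3))
labels = tf.zeros((100,), dtype=tf.int32)

# Resize "on the fly" while the training batches are created.
dataset = (tf.data.Dataset.from_tensor_slices((images, labels))
           .map(lambda x, y: (tf.image.resize(x, TARGET_SIZE), y))
           .shuffle(buffer_size=100)
           .batch(50))   # mini-batch size used in the paper

for batch_images, batch_labels in dataset.take(1):
    print(batch_images.shape)   # (50, 224, 224, 3)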

IV-B Training settings

Training was performed with an initial learning rate of 0.001. The size of the output of the added fully-connected (FC) layer was 1024 for Inception V3 and 512 for VGG-16. The mini-batch size was 50 images.
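A hedged Keras sketch of these settings is given below; the optimizer, learning rate and batch size follow the values reported above, while the stand-in model, placeholder data and number of epochs are illustrative assumptions:

import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import Adam

# Stand-in model; in practice this would be one of the transfer-learning
# models described in Sec. III-B.
model = models.Sequential([
    layers.GlobalAveragePooling2D(input_shape=(224, 224, 3)),
    layers.Dense(2, activation="softmax"),
])

# Settings reported above: Adam optimizer, initial learning rate 0.001,
# mini-batches of 50 images. The number of epochs is an assumption.
model.compile(optimizer=Adam(learning_rate=0.001),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

x = np.zeros((100, 224, 224, 3), dtype="float32")   # placeholder images
y = np.zeros((100,), dtype="int32")                  # placeholder labels
model.fit(x, y, batch_size=50, epochs=2)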

The experiments were carried out on an Nvidia GTX 1660 GPU using Keras with the TensorFlow backend.

IV-C Performance Metrics

To evaluate the proposed CNNs, the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) was computed. The accuracy (ACC) was computed as follows:

ACC = (TP + TN) / (TP + TN + FP + FN)    (1)

where TP and TN are the number of malignant and benign BUS images correctly classified, respectively, and FN and FP are the number of malignant and benign BUS images misclassified, respectively.
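A short scikit-learn sketch of how these two metrics can be computed; the label and probability arrays below are illustrative placeholders, with 1 denoting the malignant class:

import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# y_true: ground-truth labels (0 = benign, 1 = malignant).
# y_prob: predicted probability of the malignant class.
y_true = np.array([0, 0, 1, 1, 0, 1])
y_prob = np.array([0.1, 0.4, 0.8, 0.9, 0.3, 0.6])

acc = accuracy_score(y_true, (y_prob >= 0.5).astype(int))   # Eq. (1)
auc = roc_auc_score(y_true, y_prob)
print(f"ACC = {acc:.3f}, AUC = {auc:.3f}")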

Fig. 3: Receiver Operating Characteristic (ROC) curves for the two models tested using fine-tuning: (a) VGG-16, (b) Inception V3.
Model            Trainable parameters   ACC     AUC
Feature extraction
  VGG-16         512,512                0.862   0.791
  Inception V3   1,311,744              0.713   0.623
Fine-tuning
  VGG-16         1,054,722              0.919   0.934
  Inception V3   2,388,539              0.756   0.783
TABLE I: Comparison of the results obtained with VGG-16 and Inception V3 for each training method. The highest scores in terms of ACC and AUC were obtained by fine-tuning VGG-16.

V Results and Discussion

The ROC curves obtained for VGG-16 and Inception V3 using fine-tuning are shown in Fig. 3, and a summary of the results obtained with each network and each training method is presented in Table I. Fine-tuning worked better than feature extraction, and VGG-16 was the better-performing model with both strategies. However, the difference between the two transfer-learning methods is smaller with Inception V3: for ACC the difference is only 0.046, while for AUC it is 0.16. Using fine-tuning on VGG-16, values of ACC = 0.919 and AUC = 0.934 were obtained.

Inception V3 only reaches values of 0.756 and 0.783, respectively, on the test dataset. However, during training it reaches accuracy values over 0.93 and an AUC of 0.89, which implies that the model is over-fitting. Regularization techniques, such as Dropout, and normalization methods may help to reduce the over-fitting issue and improve its performance in the testing stage [Phaisangittisagul]. A deeper exploration of hyperparameter choices, such as the batch size, the learning rate, the number of layers to be retrained (in the case of fine-tuning), the size and number of fully-connected layers (in the case of feature extraction), and the use of normalization methods, still needs to be carried out. As can be seen from the sample of results shown in Fig. 4, the variability of the image content affects the predictions made by the network. This could be tackled by using data augmentation methods that capture the variability of this kind of data.
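As one possible direction, a hedged Keras sketch of simple on-the-fly augmentation is given below; the specific transforms and their ranges are assumptions, not settings evaluated in this paper:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Sketch of simple on-the-fly data augmentation for BUS images.
# The transform ranges below are illustrative assumptions.
augmenter = ImageDataGenerator(
    rotation_range=10,        # small rotations
    width_shift_range=0.05,   # small translations
    height_shift_range=0.05,
    zoom_range=0.1,           # mild scale changes
    horizontal_flip=True,     # left/right flips are plausible for BUS images
)
# augmenter.flow(x_train, y_train, batch_size=50) would then replace the
# plain training batches during model.fit(...).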

Fig. 4: Sample of results obtained using fine-tuning and the VGG-16 architecture. (a)-(c) images with benign tumors misclassified as malignant, (d)-(f) images with benign tumors correctly classified, (g)-(i) images with malignant tumors misclassified as benign, (j)-(l) images with malignant tumors correctly classified.

VI Conclusion

In this paper, we explored the application of CNNs to the task of tumor classification using BUS images. We trained two pretrained models in two different ways: using the existing models as feature extractors and fine-tuning them. We observed that, in this case, fine-tuning achieved better results and that, in general, the VGG-16 architecture performed better on the test dataset. The best accuracy obtained was 0.919, with an AUC of 0.934. In the case of Inception V3 the values obtained were considerably lower, 0.756 and 0.783 respectively, although its performance during training was better, reaching an accuracy of 0.93. The results obtained using fine-tuning are in the same range as those of the state of the art, and we can state that fine-tuning is the strategy to pursue and develop further. Further regularization and normalization methods will be needed in order to avoid the over-fitting that seems to occur. Future work includes the use of data augmentation, the testing of different and novel architectures that have been claimed to be more robust to perspective transformations, spatial orientation and scale changes, such as Capsule Networks, and the exploration of image segmentation with other architectures.

Acknowledgements

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 813782.

References