Classification of Medical Images and Illustrations in the Biomedical Literature Using Synergic Deep Learning

06/28/2017, by Jianpeng Zhang, et al.

The classification of medical images and illustrations in the literature aims to label a medical image according to the modality by which it was produced, or to label an illustration according to its production attributes. This is an essential and challenging task in automated literature review, retrieval and mining. The significant intra-class variation and inter-class similarity caused by the diverse imaging modalities and various illustration types make the problem difficult. In this paper, we propose a synergic deep learning (SDL) model to address this issue. Specifically, a dual deep convolutional neural network with a synergic signal system is designed so that the two networks mutually learn image representations. The synergic signal is used to verify whether an input image pair belongs to the same category and to give corrective feedback if a synergic error exists. Our SDL model can be trained end to end. In the test phase, the class label of an input is predicted by averaging the likelihood probabilities obtained from the two convolutional neural network components. Experimental results on the ImageCLEF2016 Subfigure Classification Challenge suggest that our proposed SDL model achieves state-of-the-art performance on this medical image classification problem, with an accuracy higher than that of the first-place solution on the Challenge leaderboard so far.




I Introduction

The indispensable role of digital medical imaging in modern healthcare has led to the fast growth of digital images in all types of electronic biomedical publications. This fast growth poses great challenges for image retrieval, review and the recruitment of data for clinical care and research. Hence, a large number of image classification studies have aimed to improve data mining in this area [5, 2, 7]. The ImageCLEF2016 Subfigure Classification Challenge [3], recognizing the increasing complexity of images in the biomedical literature, contains figures with sub-figures that were produced by multiple imaging modalities, as well as illustrations drawn from analyses of medical data.

Image classification has been thoroughly studied during the past decades, with a huge number of solutions published in the literature [12, 11, 24, 25, 30]. These solutions usually consist of handcrafted feature extraction followed by classifier learning. Despite their success, it is difficult to design handcrafted features that are optimal for a specific classification task. Recently, with the introduction of deep learning methods, medical image analysis has experienced rapid development. In particular, because deep learning models overcome the need for manual feature design and have superior classification capabilities, medical image detection, classification [22, 27] and segmentation [1, 14] have enjoyed a performance boost. For example, Xu et al. [28] adopted a deep convolutional neural network to minimize manual annotation and produce good feature representations for colon cancer classification using histopathology images. Shen et al. [21] developed a multi-crop pooling strategy and applied it to a convolutional neural network to capture salient object information for lung nodule classification in CT images.

Although deep learning-based approaches outperform the state of the art in a number of medical image analysis tasks, substantial challenges remain. For example, issues with medical image datasets, including small dataset sizes and anatomical variations, still restrict the effectiveness of classifying medical images and illustrations in the literature. The first issue usually stems from the effort required to acquire image data and then annotate it [26]. Pre-trained deep convolutional neural network (DCNN) models have been used to address this issue, because the strong transfer learning ability of a DCNN trained on large-scale datasets such as ImageNet can be applied to generic small-data visual recognition problems [17, 15]. Koitka et al. [8] extracted the activation values of the last FC1000 layer in a pre-trained ResNet-152 model and used them to train a custom network layer with the pseudo-inverse method [18]. Kumar et al. [9] proposed to integrate two different pre-trained CNN architectures and ensemble the results from multiple models into one high-quality classifier.

Fig. 1: An example shows the intra-class variation and inter-class similarity in modality-based medical image classification: (a) a brain CT image, (b) a pleural CT image, (c) a brain MR image, and (d) a pleural MR image.
Fig. 2: Architecture of the SDL model, which consists of a dual deep convolutional neural network and an additional synergic signal system. The input is a pair of images which are sampled from the training set.

The second issue is intra-class variation and inter-class similarity [29], which poses even greater challenges for classifying medical images according to the modalities by which they were produced. A typical example that highlights the difficulty is shown in Figure 1: the brain CT and pleural CT images in the top row look dissimilar, because they show different anatomical structures, although they belong to the same category; whereas the brain CT and brain MR images in the left column look very similar but belong to different categories. Although deep neural networks have enough capacity to forcibly memorize all training samples [31], the ambiguity produced by intra-class variation and inter-class similarity may confuse a neural network: the network makes the right decision with low confidence, and the result may even flip completely if small fluctuations are added to the input.

In this paper, synergic deep learning (SDL) is presented to enhance the discriminative ability of deep neural networks, especially on confusing samples. The basic learning strategy of SDL is to use a synergic signal to bridge several neural networks so that they can guide and benefit from each other. We specifically design an SDL model that consists of a dual deep convolutional neural network (dual-DCNN) and a synergic signal system to solve this medical image classification problem. It is advisable to initialize our dual-DCNN with a pre-trained DCNN whose parameters were derived from the ImageNet dataset [4], and to further fine-tune it on our dataset. To break the independence between the two DCNNs, an additional synergic signal, serving as an information bridge between them, is used to verify whether the input pair belongs to the same category. A wrong decision made by one DCNN is highlighted in the form of a synergic error with the help of a correct decision made by the other DCNN. In this way, the stronger network can guide the learning of the weaker one, and both DCNNs gain a stronger ability to distinguish confusing images with significant intra-class variance and inter-class similarity. Moreover, our SDL model is easily trained end to end under the classification and synergic supervisions. In the test phase, the prediction probabilities of each test sample given by the two neural networks are added together as an ensemble decision probability. We evaluated the model on the ImageCLEF2016 Subfigure Classification Challenge dataset, and the experimental results show that our proposed SDL model achieves state-of-the-art performance on this medical classification problem.

II The Synergic Deep Learning Model

Our proposed SDL model consists of three main components, i.e. a data pair input layer, a dual-DCNN and a synergic signal system, as shown in Figure 2. Different from the one-by-one input mode of conventional deep models, our SDL model accepts a pair of inputs randomly selected from the training set. The dual-DCNN, comprising DCNN-A and DCNN-B, is the main learning module, with one input sequence per network. Both DCNN-A and DCNN-B are pre-trained residual neural networks and are fine-tuned under the supervision of the true labels of their input sequences. In addition, a synergic signal system is used to verify whether the input pair belongs to the same category and gives corrective feedback if a synergic error exists. For instance, in Figure 2, the first pair of images has high structural similarity but actually belongs to different classes; the second pair comes from the same class but is visually different. It is easy for a weak DCNN to make a false decision under such contrast. The error generated by the synergic signal system further modifies the dual-DCNN so that it has a stronger ability to distinguish these confusing samples.
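As a concrete illustration of the data pair input layer, the sketch below shows how training pairs and their same-category labels might be drawn from a labelled training set. The function name and the positive-pair fraction are our own illustrative choices, not details specified in the paper.

```python
import random

def sample_pairs(labels, num_pairs, pos_fraction=0.5, seed=0):
    """Randomly draw index pairs from a training set and attach the
    synergic label: 1 if both images share a class, 0 otherwise.
    `pos_fraction` keeps positive and negative pairs roughly balanced."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    pairs = []
    for _ in range(num_pairs):
        if rng.random() < pos_fraction:
            # positive pair: two distinct samples from the same class
            cls = rng.choice([c for c, idxs in by_class.items() if len(idxs) >= 2])
            i, j = rng.sample(by_class[cls], 2)
        else:
            # negative pair: one sample from each of two different classes
            c1, c2 = rng.sample(list(by_class), 2)
            i, j = rng.choice(by_class[c1]), rng.choice(by_class[c2])
        pairs.append((i, j, int(labels[i] == labels[j])))
    return pairs
```

Each tuple `(i, j, s)` then feeds image `i` to DCNN-A, image `j` to DCNN-B, and `s` to the synergic supervision.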

We discuss the details of our proposed SDL model in the following sections.

II-A The Dual Deep Convolutional Neural Network

The dual-DCNN is an important module in our SDL model that contains two complete learning units, namely DCNN-A and DCNN-B. In principle, a DCNN with an arbitrary structure can be embedded in our SDL model. Here, due to the strong representation capability of the residual network [6], we employ a pre-trained residual neural network (ResNet-50, as shown in Figure 3) as the initialization of both DCNN-A and DCNN-B. It is composed of 50 learnable layers, and its parameters were obtained by training on the ImageNet dataset for an image classification task. It is worth noting that the two parameter sets of DCNN-A and DCNN-B, denoted by θ_A and θ_B, are not shared. To adapt the ResNet-50 model to our image dataset, we replace all fully connected (FC) layers with an FC layer of 1024 neurons (FC1024-A/B), an FC layer of K neurons (for K-class classification) and a softmax layer, and then fine-tune the parameters of ResNet-50 using our own training data. The weights of the new FC layers are initialized from the uniform distribution U(−0.05, 0.05). The cross-entropy loss function of each DCNN is defined as

L(θ) = −(1/M) Σ_{i=1}^{M} log p(y_i | x_i; θ),

where M is the number of training samples, x_i is the i-th training image and y_i is its true label. The mini-batch stochastic gradient descent (mini-batch SGD) algorithm is used to optimize this loss.
Fig. 3: Architecture of ResNet-50 model.

Both DCNN-A and DCNN-B accept input from a pair of images, and the training process in each learning unit is supervised with the true labels of the corresponding input sequence. Although each DCNN alone has the ability to predict the class label of an input image, we additionally feed the activations of the last fully connected layers of both DCNNs into a synergic signal system to break the learning independence of the dual DCNNs.
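The cross-entropy loss of each DCNN can be sketched directly on already-softmaxed class probabilities. This minimal snippet (our simplification, stripped of the network itself) computes the mean negative log-likelihood over M samples:

```python
import math

def cross_entropy(probs, labels):
    """Mean cross-entropy over M training samples:
    L = -(1/M) * sum_i log p(y_i | x_i).
    `probs[i]` is the softmax output for sample i; `labels[i]` its true class."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)
```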

II-B Synergic Signal System

In our SDL model, a synergic signal system is designed to supervise the learning from the input pairs and bridge the gap between the two DCNNs. The architecture of this system is shown in Figure 4. Image representations are fed into the synergic signal system in pairs. We randomly select image pairs (x_A, x_B) from the training data and define the synergic label of a pair as

y_S(x_A, x_B) = 1 if y_A = y_B, and 0 otherwise,

where z_A and z_B are the outputs of FC1024-A and FC1024-B, and y_A and y_B are the true labels of x_A and x_B, respectively. Here, y_S = 1 indicates a positive pair and y_S = 0 a negative pair. Image pairs are selected from each mini-batch, and to avoid a data imbalance problem the numbers of positive and negative pairs in a batch are kept roughly balanced. z_A and z_B are concatenated into an embedding layer followed by an FC layer with 2 neurons. It is convenient to monitor the synergic signal by adding another softmax layer and using the following cross-entropy loss

L_S(θ_S) = −[y_S log p_S + (1 − y_S) log(1 − p_S)],

where p_S is the predicted probability that the pair is positive and θ_S is the ensemble of parameters of the synergic signal system. The detailed learning process of our proposed model is summarized in Table 1.

Fig. 4: Diagram of the synergic signal system.
Input: z_A and z_B (outputs of FC1024-A and FC1024-B), initialized parameters θ_A, θ_B and θ_S of DCNN-A, DCNN-B and the synergic signal system, learning rate η and the hyper-parameter λ.
Step 1: Concatenate the two input features z_A and z_B into a combined one, denoted as z_S. The labels of the three supervisions are y_A, y_B and y_S.
Step 2: Update the parameters θ_A, θ_B and θ_S by the back-propagation algorithm.
Compute losses: L_A(θ_A), L_B(θ_B) and L_S(θ_S).
Compute gradients: Δθ_A = ∂(L_A + λ L_S)/∂θ_A, Δθ_B = ∂(L_B + λ L_S)/∂θ_B and Δθ_S = ∂L_S/∂θ_S, where λ is a weighting factor of the synergic signal.
Update parameters: θ_A ← θ_A − η Δθ_A, θ_B ← θ_B − η Δθ_B and θ_S ← θ_S − η Δθ_S.
TABLE I: Learning process of the synergic deep learning model.
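The objectives combined in Table 1 can be sketched numerically. The snippet below (our illustration, operating on output probabilities rather than network parameters) shows the three loss values that each part of the model minimizes: DCNN-A minimizes L_A plus the weighted synergic loss, DCNN-B does the same with L_B, and the synergic system minimizes the synergic loss alone.

```python
import math

def xent(p, y):
    """Cross-entropy of one predicted class distribution p against true class y."""
    return -math.log(p[y])

def synergic_objectives(p_a, y_a, p_b, y_b, p_s, lam):
    """Objective values per Table 1: DCNN-A minimises L_A + lam*L_S,
    DCNN-B minimises L_B + lam*L_S, the synergic system minimises L_S.
    `p_s` is the predicted probability that the pair is positive."""
    y_s = int(y_a == y_b)  # synergic label: 1 for a positive pair
    l_a, l_b = xent(p_a, y_a), xent(p_b, y_b)
    l_s = -(y_s * math.log(p_s) + (1 - y_s) * math.log(1 - p_s))
    return l_a + lam * l_s, l_b + lam * l_s, l_s
```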

II-C Test Phase

In the test phase, for a test image x, DCNN-A and DCNN-B give class probability vectors p_A(x) and p_B(x), which are the softmax activations of their last FC layers. The synergic signal is not used for the final classification. The predicted label of the input x is

y(x) = argmax_k ( p_A^(k)(x) + p_B^(k)(x) ).
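The ensemble decision described above can be sketched on two hypothetical probability vectors:

```python
def predict(p_a, p_b):
    """Sum the class probabilities from DCNN-A and DCNN-B and return the
    argmax; the synergic branch is ignored at test time."""
    summed = [a + b for a, b in zip(p_a, p_b)]
    return max(range(len(summed)), key=summed.__getitem__)
```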
III Experiments

III-A A Toy Example

In this section, a toy example on the MNIST dataset is presented. Based on the classical LeNet-5 model, we designed a simple convolutional neural network called LeCNN, shown in Figure 5, as the architecture of the DCNN-A and DCNN-B modules. Thus, the SDL model is composed of two LeCNNs and a synergic signal system. To simply evaluate the effectiveness of our SDL model, we set λ = 1. Figure 6 shows the loss and accuracy curves obtained on the training and testing sets during training. It suggests that using the synergic signal system does lead to a performance improvement, as our SDL model has lower loss and higher accuracy than the LeCNN model.

Fig. 5: A simple convolutional neural network (LeCNN) architecture used in MNIST experiment.
Fig. 6: Comparison of LeCNN and proposed SDL on the MNIST dataset.

III-B Dataset

We evaluated our SDL model on the ImageCLEF2016 Subfigure Classification Challenge dataset [3], which consists of 6776 training images and 4166 testing images collected from PubMed Central (PMC) [16]. These images are divided into 30 categories, including 12 categories of medical diagnostic images, such as CT, MR and PET images, and 18 categories of illustrations, such as figures, tables and flow charts. The abbreviations and details of each image category are listed in Table 2. The aim of our experiment is to classify medical diagnostic images according to the modality by which they were produced and to classify illustrations according to their production attributes.

No. Abb. Det. Training
1 D3DR 3D reconstructions 201
2 DMEL Electron microscopy 208
3 DMFL Fluorescence microscopy 906
4 DMLI Light microscopy 696
5 DMTR Transmission microscopy 300
6 DRAN Angiography 17
7 DRCO Combined modalities in one image 33
8 DRCT Computerized Tomography 61
9 DRMR Magnetic Resonance 139
10 DRPE PET 14
11 DRUS Ultrasound 26
12 DRXR X-ray, 2D Radiography 51
13 DSEC Electrocardiography 10
14 DSEE Electroencephalography 8
15 DSEM Electromyography 5
16 DVDM Dermatology, skin 29
17 DVEN Endoscopy 16
18 DVOR Other organs 55
19 GCHE Chemical structure 61
20 GFIG Statistical figures, graphs, charts 2954
21 GFLO Flowcharts 20
22 GGEL Chromatography, Gel 344
23 GGEN Gene sequence 179
24 GHDR Hand-drawn sketches 136
25 GMAT Mathematics, formulae 15
26 GNCP Non-clinical photos 88
27 GPLI Program listing 1
28 GSCR Screenshots 33
29 GSYS System overviews 91
30 GTAB Tables and forms 79
TABLE II: Category abbreviations and details in the ImageCLEF2016 classification hierarchy.

III-C Parameter Settings

To alleviate the overfitting issue in deep learning, we utilized several data augmentation strategies, including rotation, translation and random scaling, to enlarge our dataset 10 times. We designed a variable learning rate schedule in which the learning rate decays with the iteration index t. We set the maximum epoch number to 80 and adopted mini-batch stochastic gradient descent with a batch size of 64 as the optimizer. To stop the training process when the model begins to overfit, 20% of the training data were randomly selected to form a validation set, which was used to monitor the performance of our model. We evaluated our proposed model with different values of the hyper-parameter λ. The performance shown in Figure 7 reveals that our model achieves the lowest loss and highest accuracy when λ = 40. Hence, we empirically set λ to 40 in our experiments.

Fig. 7: Comparison of validation loss and accuracy under different λ values.
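The 10-fold augmentation described in this section can be sketched as drawing one random parameter setting per augmented copy. The ranges below are illustrative assumptions on our part, not values reported in the experiments:

```python
import random

def augmentation_params(n_copies=10, seed=0):
    """Draw one random (rotation, translation, scale) setting per augmented
    copy, enlarging the dataset n_copies times. The parameter ranges here
    are illustrative assumptions, not taken from the paper."""
    rng = random.Random(seed)
    return [dict(angle=rng.uniform(-15, 15),               # rotation, degrees
                 shift=(rng.randint(-10, 10),              # translation, pixels
                        rng.randint(-10, 10)),
                 scale=rng.uniform(0.9, 1.1))              # random scaling
            for _ in range(n_copies)]
```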

III-D Results and Analysis

We evaluated our proposed SDL model against the standard ResNet-50 model with the same experiment settings, including the same training set, validation set, initial parameters and learning rate scheme. Figure 8 shows the loss and accuracy curves of both models on the validation dataset. The smaller loss and higher accuracy achieved by the SDL model indicate that our model outperforms ResNet-50 on the validation set.

Fig. 8: Validation loss and accuracy curves by using pre-trained ResNet-50 and our proposed SDL model.

The classification accuracies of the ResNet-50 model, each component of our model (i.e. DCNN-A or DCNN-B) and our full SDL model on the validation set are displayed in Figure 9. It shows that, after incorporating the synergic signal system into the dual-DCNN architecture, each component of our model, which is itself a ResNet-50, achieves more than a 2% accuracy improvement compared to the standard ResNet-50. Moreover, jointly using the two components in an ensemble learning manner further improves the classification accuracy.

Fig. 9: Classification accuracy of ResNet-50, each DCNN component of our model and our SDL model on the validation set.

The F-scores [19] of our SDL model and the ResNet-50 model were calculated for each category of the test dataset and are depicted in Figure 10, in which a red arrow indicates an increased score when applying our model to that category, whereas a blue arrow indicates a decreased score. The figure shows that our model achieves higher classification performance than ResNet-50 on most categories.

Fig. 10: Comparison of F-scores for each test class using ResNet-50 and our SDL model.

Next, we evaluated our SDL model on the ImageCLEF2016 test set. Figure 11 gives the confusion matrix of the classification results obtained with our SDL model. The x-axis is the predicted label, and the y-axis is the true label. A higher intensity value represents higher classification accuracy. Since this is a highly imbalanced classification problem, the size of each rectangle in the figure is proportional to the number of training images in the corresponding class. The confusion matrix shows that our SDL model achieves relatively accurate classification on every major category and most minor categories.

Fig. 11: Confusion matrix of the classification result obtained by applying our SDL model to the ImageCLEF2016 test dataset.

Table 3 gives the classification accuracy of our proposed SDL model, the ResNet-50 model, Kumar's ensemble method and the six best-performing solutions listed on the ImageCLEF2016 Subfigure Classification Challenge leaderboard. Clearly, handcrafted feature engineering underperforms deep learning-based methods, and deeper networks, such as ResNet-152, perform better than ResNet-50. Our SDL model achieves the best classification accuracy in this challenge so far. Note that our SDL model can accommodate any DCNN structure, which means it can benefit from much deeper models, such as ResNet-152.

Method Classification Accuracy(%)
SDL model 86.58
Koitka [8] (ResNet-152) 85.38
ResNet-50 84.54
Koitka [8] (11 handcrafted features) 84.46
Valavanis [23] (BoW model) 84.01
Kumar[9](Ensemble of multi-DCNNs) 82.48
Kumar [10] (A pre-trained DCNN) 77.55
Li [13] 72.46
Semedo [20] 65.31
TABLE III: Classification accuracy of our SDL model and eight competing methods on the ImageCLEF2016 test dataset.

IV Conclusions

In this paper, we propose a synergic deep learning (SDL) model that contains a dual collaborative deep convolutional neural network and an additional synergic signal system to classify medical images and illustrations in the biomedical literature. To strengthen the collaborative learning of the dual nets, the synergic signal is used to verify whether an input pair belongs to the same category. This gives our SDL model a stronger representation ability to distinguish easily confused inter-class samples and highly diverse intra-class samples. Experimental results on the ImageCLEF2016 Subfigure Classification Challenge dataset show that our proposed SDL model achieves state-of-the-art performance on this medical image classification problem, with an accuracy higher than that of the first place on the Challenge leaderboard at the time of submission.

V Acknowledgements

We appreciate the efforts devoted by the organizers of the ImageCLEF2016 Subfigure Classification Challenge to collect and share the data for comparing algorithms of classifying medical images and illustrations in the biomedical literature.


  • [1] H. Chen, X. Qi, L. Yu, Q. Dou, J. Qin, and P.-A. Heng. Dcan: Deep contour-aware networks for object instance segmentation from histology images. Medical Image Analysis, 36:135–146, Feb 2017.
  • [2] M. de Bruijne. Machine learning approaches in medical image analysis: From detection to diagnosis. Medical Image Analysis, 33:94–97, Oct 2016.
  • [3] A. G. S. de Herrera, R. Schaer, S. Bromuri, and H. Mueller. Overview of the imageclef 2016 medical task. In CLEF2016 Working Notes. CEUR Workshop Proceedings, Sep 2016.
  • [4] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and F.-F. Li. Imagenet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, Jun 2009.
  • [5] P. Ghosh, S. Antani, L. R. Long, and G. R. Thoma. Review of medical image retrieval systems and future directions. In 24th IEEE International Symposium on Computer-Based Medical Systems, Jun 2011.
  • [6] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, June 2016.
  • [7] J. Kalpathy-Cramer, A. G. S. de Herrera, D. Demner-Fushman, S. Antani, S. Bedrick, and H. Mueller. Evaluating performance of biomedical image retrieval systems: an overview of the medical image retrieval task at imageclef 2004-2013. Computerized Medical Imaging And Graphics, 39:55–61, Jan 2015.
  • [8] S. Koitka and C. M. Friedrich. Traditional feature engineering and deep learning approaches at medical classification task of imageclef 2016 fhdo biomedical computer science group (bcsg). In CLEF2016 Working Notes. CEUR Workshop Proceedings, Sep 2016.
  • [9] A. Kumar, J. Kim, D. Lyndon, M. Fulham, and D. Feng. An ensemble of fine-tuned convolutional neural networks for medical image classification. IEEE Journal of Biomedical and Health Informatics, 21(1):31–40, Jan 2017.
  • [10] A. Kumar, D. Lyndon, J. Kim, and D. Feng. Subfigure and multi-label classification using a fine-tuned convolutional neural network. In CLEF2016 Working Notes. CEUR Workshop Proceedings, Sep 2016.
  • [11] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2169–2178, Jun 2006.
  • [12] F.-F. Li and P. Perona. A bayesian hierarchical model for learning natural scene categories. In IEEE Conference on Computer Vision and Pattern Recognition, pages 524–531, Jun 2005.
  • [13] P. Li, S. Sorensen, A. Kolagunda, X. Jiang, X. Wang, C. Kambhamettu, and H. Shatkay. Udel cis at imageclef medical task 2016. In CLEF2016 Working Notes. CEUR Workshop Proceedings, Sep 2016.
  • [14] R. Li, T. Zeng, H. Peng, and S. Ji. Deep learning segmentation of optical microscopy images improves 3d neuron reconstruction. IEEE Transactions on Medical Imaging, PP, Mar 2017.
  • [15] P. Mettes, D. C. Koelma, and C. G. M. Snoek. The imagenet shuffle: Reorganized pre-training for video event detection. In ACM International Conference on Multimedia Retrieval, pages 175–182, Jun 2016.
  • [16] H. Mueller, J. Kalpathy-Cramer, D. Demner-Fushman, and S. Antani. Creating a classification of image types in the medical literature for visual categorization. In Conference on Medical Imaging - Advanced PACS-Based Imaging Informatics and Therapeutic Applications, Feb 2012.
  • [17] M. Oquab, L. Bottou, I. Laptev, and J. Sivic. Learning and transferring mid-level image representations using convolutional neural networks. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1717–1724, Jun 2014.
  • [18] L. Personnaz, I. Guyon, and G. Dreyfus. Collective computational properties of neural networks: New learning mechanisms. Physical Review A, 34:4217–4228, Nov 1986.
  • [19] D. M. W. Powers. Evaluation: From precision, recall and f-factor to roc, informedness, markedness and correlation. International Journal of Machine Learning Technology, 2(1):37–63, Dec 2011.
  • [20] D. Semedo and J. Magalhaes. Novasearch at imageclefmed2016 subfigure classification task. In CLEF2016 Working Notes. CEUR Workshop Proceedings, Sep 2016.
  • [21] W. Shen, M. Zhou, F. Yang, D. Yu, D. Dong, C. Yang, Y. Zang, and J. Tian. Multi-crop convolutional neural networks for lung nodule malignancy suspiciousness classification. Pattern Recognition, 61:663–673, Jan 2017.
  • [22] K. Sirinukunwattana and S. Raza. Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images. IEEE Transactions on Medical Imaging, 35(5):1196–1206, May 2016.
  • [23] L. Valavanis, S. Stathopoulos, and T. Kalamboukis. Ipl at clef2016 medical task. In CLEF2016 Working Notes. CEUR Workshop Proceedings, Sep 2016.
  • [24] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong. Locality-constrained linear coding for image classification. In IEEE Conference on Computer Vision and Pattern Recognition, pages 3360–3367, Jun 2010.
  • [25] Z. Wang, Y. Hu, and L.-T. Chia. Learning image-to-class distance metric for image classification. ACM Transactions on Intelligent Systems and Technology, 4(2), Mar 2013.
  • [26] J. Weese and C. Lorenz. Four challenges in medical image analysis from an industrial perspective. Medical Image Analysis, 33:44–49, Oct 2016.
  • [27] F. Xie, H. Fan, and Y. Li. Melanoma classification on dermoscopy images using a neural network ensemble model. IEEE Transactions on Medical Imaging, 36(3):849–858, Mar 2017.
  • [28] Y. Xu, T. Mo, Q. Feng, P. Zhong, M. Lai, and E. I.-C. Chang. Deep learning of feature representation with multiple instance learning for medical image analysis. In IEEE International Conference on Acoustics, Speech and Signal Processing, pages 1626–1630, Jul 2014.
  • [29] S. Yang, W. Cai, H. Huang, Y. Zhou, D. D. Feng, Y. Wang, M. J. Fulham, and M. Chen. Large margin local estimate with applications to medical image classification. IEEE Transactions on Medical Imaging, 34(6):1362–1377, Jun 2015.
  • [30] Y. Yang, L. Yang, G. Wu, and S. Li. A bag-of-objects retrieval model for web image search. In ACM International Conference on Multimedia, pages 49–58, Oct 2012.
  • [31] C. Zhang, S. Bengio, M. Hardt, and O. Vinyals. Understanding deep learning requires rethinking generalization. In International Conference on Learning Representations, Apr 2017.