I Introduction
Medical image analysis has been a fundamental, applied, and active research area over the last few decades. The classification of medical images, such as those of colon cancer, is one of its most popular core research problems. Categorization of tumors at the cellular level can help medical professionals better understand tumor characteristics, which in turn helps them explore various options for cancer treatment. Classifying cell nuclei in routine colon cancer (RCC) images is a challenging task due to cellular heterogeneity.
The American Cancer Society publishes colon cancer (also known as colorectal cancer, CRC) statistics every three years. Its report Colorectal Cancer Facts & Figures 2017-2019 estimates, for the USA, the number of new colon cancer cases and the number of resulting deaths among men and women. Colon cancer is the third most dangerous cancer affecting both men and women. Accurate analysis of medical images is therefore needed for reliable colon cancer recognition.
Nucleus image classification has been applied to various histology-related medical applications. The following are some recent attempts at applying image analysis and computer vision techniques in the medical domain. In 2014, Veta et al. published a comprehensive review of breast cancer histopathology image analysis. Many researchers have worked on histological image analysis; examples include [4, 5]. Several researchers have employed traditional machine learning methods with handcrafted features extracted from histology images [6, 7]. However, manually engineered features may not always capture the underlying structure of histology images. Convolutional neural networks (CNNs), on the other hand, automatically learn high-level, more semantic features from the training data.
Recently, deep learning based approaches have achieved very promising performance in computer vision and image analysis. In 2012, Krizhevsky et al. proposed a deep CNN model (called AlexNet) consisting of 8 learnable layers for image classification. Simonyan et al. extended this design to VGG-16, which has 16 trainable layers. Later, GoogLeNet, built from inception modules, became a popular deep network. In 2016, He et al. proposed a much deeper residual network (ResNet) for image recognition. CNN based models have also shown very encouraging performance on other tasks such as object detection, segmentation, depth estimation, and action recognition. Girshick et al. proposed the R-CNN model (Regions with CNN features) for object detection. The ‘You Only Look Once’ (YOLO) model was proposed by Redmon et al. for unified, real-time object detection. Repala et al. built a dual-CNN-based unsupervised model for monocular depth estimation. Recently, Singh et al. proposed a classifier based on Long Short-Term Memory (LSTM) networks and CNNs to classify human actions.
Deep learning has also been used extensively for medical image and video analysis because of its ability to handle complex data. In 2016, IEEE Transactions on Medical Imaging published a special issue on deep learning in medical imaging, which focused on the achievements of CNNs and other deep learning based approaches. Litjens et al. surveyed deep learning in medical imaging, covering nearly 300 recent contributions on image classification, object detection, and segmentation. Esteva et al. proposed a deep CNN based classifier for skin cancer detection, trained on a large dataset of clinical images covering many different diseases. In 2017, Rajpurkar et al. proposed CheXNet, a deep CNN model trained on the ChestX-ray14 dataset, which is one of the largest publicly available chest X-ray datasets, with images labeled with several different diseases.
Xu et al. proposed an unsupervised deep learning model based on a stacked sparse autoencoder to classify cell nuclei, where the learned high-level features are classified with a softmax classifier. Korbar et al. introduced a deep neural network model to classify different types of colorectal polyps in whole-slide images. Very recently, Bychkov et al. proposed a classifier combining convolutional and recurrent neural network architectures for colorectal cancer classification.
Sirinukunwattana et al. proposed a convolutional neural network named softmaxCNN_IN27 to classify cell nuclei in histology images. Their softmaxCNN_IN27 architecture has 5 trainable layers. We have experimentally observed that the softmaxCNN_IN27 model is not deep enough for the complexity of the histology image dataset. To overcome this problem, we propose a deep CNN model named RCCNet, with seven trainable layers, which outperforms softmaxCNN_IN27 on the histological routine colon cancer nuclei classification task.
The main objective of this paper is to develop an efficient and simple CNN architecture suitable for the classification of histological colon cancer images. Simplicity is measured in terms of the number of layers and the number of trainable parameters, compared against widely used CNN models such as AlexNet, CIFAR-VGG, GoogLeNet, and WRN. In this work, we find that careful selection of the number of trainable layers and trainable parameters can lead to an efficient CNN model. The proposed model, called RCCNet, is used for the RCC classification task. We experimentally compare the proposed method with other popular models such as softmaxCNN_IN27, softmaxCNN, AlexNet, CIFAR-VGG, GoogLeNet, and WRN. RCCNet shows promising performance in terms of both efficiency and accuracy.
The rest of the paper is organized as follows. Section II describes the proposed RCCNet architecture in detail. Section III presents the experimental setup, including the dataset and the compared methods. Results and analysis are reported in Section IV. Finally, Section V concludes the paper.
II Proposed RCCNet Architecture
Categorization of histology images is a hard problem due to high inter-class similarity and intra-class variability. The primary objective of our work is to design a Convolutional Neural Network (CNN) based architecture that classifies colon cancer images. This section describes the proposed RCCNet, which has seven trainable layers.
The proposed RCCNet architecture is illustrated in Fig. 1. It takes histology images of a fixed spatial dimension as input and consists of three blocks with seven trainable layers. In the first block, two convolutional layers are placed directly after the input layer, and the second of them is followed by a pooling layer that halves the spatial dimension. In the second block, two more convolutional layers are followed by another pooling layer. The third block consists of three fully connected layers, whose input is the flattened feature map produced by the second pooling layer.

The first convolutional layer produces its feature map by convolving a set of learnable filters over the input; zero padding is applied in each direction so that the output retains the spatial dimension of the input. The second convolutional layer applies its filters without padding. Both layers use the same stride. The first pooling layer then sub-samples the feature map with a fixed receptive field and stride, without padding, halving its spatial dimension. In the second block, the third convolutional layer retains the spatial dimension of its input by applying zero padding of 1, while the fourth convolutional layer, like the second one, applies no padding. The second sub-sampling layer uses the same kernel size and stride as the first one. In this layer, the right and bottom border feature values of the input are discarded to avoid a dimension mismatch between the input and the kernel size (i.e., the output size is rounded down).

The feature map generated by the second pooling layer is flattened into a single feature vector before the third block. The first fully connected layer maps this vector to a fixed-length feature vector, the second fully connected layer has input and output vectors of the same dimension, and the last fully connected layer takes the output of the second one and produces four values corresponding to the scores of the four classes. All trainable parameters of the architecture come from these seven trainable layers (four convolutional and three fully connected layers).
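The shape bookkeeping above can be checked with a short calculation. The following is a minimal Python sketch, assuming a hypothetical configuration: a 32×32×3 input, 3×3 convolutions with 32 and 64 filters, 2×2 pooling with stride 2, and fully connected layers with 512, 512, and 4 units. These values are illustrative assumptions, not numbers taken from the text. The sketch applies the standard output-size formula floor((W − K + 2P)/S) + 1, where the floor is what discards leftover right and bottom border values in pooling, and it counts weights plus biases per layer.

```python
from dataclasses import dataclass


def out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a conv/pool layer; floor division drops leftover border values."""
    return (size - kernel + 2 * padding) // stride + 1


@dataclass
class Conv:
    filters: int
    kernel: int
    stride: int = 1
    padding: int = 0


@dataclass
class Pool:
    kernel: int = 2
    stride: int = 2


def trace(input_hw: int, channels: int, convs_and_pools, fc_units):
    """Return per-layer output shapes and the number of trainable parameters
    (weights + biases only; batch-normalization parameters are not counted)."""
    size, ch, params, shapes = input_hw, channels, 0, []
    for layer in convs_and_pools:
        if isinstance(layer, Conv):
            params += (layer.kernel * layer.kernel * ch + 1) * layer.filters
            size = out_size(size, layer.kernel, layer.stride, layer.padding)
            ch = layer.filters
        else:  # pooling layers have no trainable parameters
            size = out_size(size, layer.kernel, layer.stride, 0)
        shapes.append((size, size, ch))
    features = size * size * ch  # length of the flattened feature vector
    for units in fc_units:
        params += (features + 1) * units
        features = units
        shapes.append((features,))
    return shapes, params


# Hypothetical RCCNet-like configuration (assumed values, for illustration only).
layers = [Conv(32, 3, padding=1), Conv(32, 3), Pool(),
          Conv(64, 3, padding=1), Conv(64, 3), Pool()]
shapes, n_params = trace(32, 3, layers, [512, 512, 4])
for s in shapes:
    print(s)
print("trainable parameters:", n_params)
```

With these assumed values the feature maps go 32×32×32 → 30×30×32 → 15×15×32 → 15×15×64 → 13×13×64 → 6×6×64; the second pooling layer drops the last row and column of the 13×13 map because (13 − 2) // 2 + 1 = 6. The printed count (1,510,436) covers only convolution and fully connected weights and biases; any batch-normalization parameters would add to it.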
On top of the last fully connected layer of the proposed RCCNet model, a ‘softmax classifier’ for multi-class classification generates the probability of each class. These probabilities are used to compute the loss during the training phase and to find the predicted class during the testing phase.
II-A Training Phase
The categorical cross-entropy loss is computed during the training phase, and the parameters (weights) of the network are updated using the gradient of this loss with respect to the parameters. The cross-entropy loss (also known as the log loss) measures the performance of a classifier whose output is a probability value between 0 and 1. Let $\mathbf{x}$ be a three-dimensional input image to the network with class label $c \in C$, where $C$ is the set of class labels. In the current classification task, $|C| = 4$. The output of the network is a vector $\mathbf{y}$ given by

$$\mathbf{y} = f(\mathbf{x}) = [y_1, y_2, \ldots, y_{|C|}],$$

where $f$ denotes the forward-pass computation function and $y_j$ represents the score for class $j$. The cross-entropy loss for $\mathbf{x}$, assuming that the target class (as given in the training set) is $c$, is

$$L(\mathbf{x}) = -\log\left(\frac{e^{y_c}}{\sum_{j=1}^{|C|} e^{y_j}}\right).$$
The total loss over a mini-batch of training examples is considered in the training process.
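The loss above can be sketched in a few lines of Python. This is a minimal sketch, assuming the log-sum-exp trick for numerical stability (an implementation choice not stated in the text) and taking the mini-batch loss as the sum of per-example losses, following the "total loss" wording; many frameworks average instead. The class order in the example is a hypothetical assumption.

```python
import math
from typing import Sequence


def log_softmax(scores: Sequence[float]) -> list[float]:
    """Numerically stable log-probabilities: y_j - log(sum_k exp(y_k))."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def cross_entropy(scores: Sequence[float], target: int) -> float:
    """Cross-entropy loss -log p_target for one example with class scores y."""
    return -log_softmax(scores)[target]


def batch_loss(batch_scores, targets) -> float:
    """Total loss over a mini-batch (sum of per-example losses)."""
    return sum(cross_entropy(y, c) for y, c in zip(batch_scores, targets))


# Four class scores in a hypothetical order
# (Epithelial, Inflammatory, Fibroblast, Miscellaneous).
print(cross_entropy([0.0, 0.0, 0.0, 0.0], target=2))  # uniform scores give log(4)
print(batch_loss([[2.0, 0.5, -1.0, 0.0], [0.1, 3.0, 0.2, -0.5]], [0, 1]))
```

Subtracting the maximum score before exponentiating leaves the probabilities unchanged but keeps `math.exp` from overflowing for large scores.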
II-B Testing Phase
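At test time, the class label with the highest score is taken as the predicted label for a given input image $\mathbf{x}$. The predicted class label $\hat{c}$ is computed as

$$\hat{c} = \arg\max_{1 \le j \le |C|} p_j,$$

where $|C|$ is the number of classes and $p_j$ is the probability that $\mathbf{x}$ belongs to class $j$, computed from the class scores $y_1, \ldots, y_{|C|}$ with the softmax function:

$$p_j = \frac{e^{y_j}}{\sum_{k=1}^{|C|} e^{y_k}}.$$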
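The prediction rule can be illustrated with a minimal Python sketch. The label order in `CLASSES` is a hypothetical assumption, and the max-shift inside `softmax` is a standard stability trick rather than something the text specifies.

```python
import math
from typing import Sequence

CLASSES = ["Epithelial", "Inflammatory", "Fibroblast", "Miscellaneous"]  # assumed label order


def softmax(scores: Sequence[float]) -> list[float]:
    """p_j = exp(y_j) / sum_k exp(y_k), shifted by max(y) for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores: Sequence[float]) -> tuple[int, list[float]]:
    """Return the arg-max class index and the class probabilities."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


label, probs = predict([0.3, 2.1, -0.4, 0.9])
print(CLASSES[label], [round(p, 3) for p in probs])
```

Because the softmax is monotonic, the arg-max of the probabilities is the same as the arg-max of the raw scores; the probabilities are computed only because they are also needed for the loss and are easier to interpret.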
III Experimental Setup
This section presents the experimental setup, including the dataset description, a brief overview of the compared models, the training details, and the evaluation criteria.
III-A Dataset Description
To evaluate the performance of the proposed RCCNet, we use the publicly available ‘CRCHistoPhenotypes’ dataset (https://warwick.ac.uk/fac/sci/dcs/research/tia/data/crchistolabelednucleihe), which consists of histological routine colon cancer nuclei patches. The patches belong to four classes, namely ‘Epithelial’, ‘Inflammatory’, ‘Fibroblast’, and ‘Miscellaneous’. All patches have the same spatial dimension. Sample cell nuclei patches from the ‘CRCHistoPhenotypes’ dataset are shown in Fig. 2.
III-B Compared CNN Models
To assess the performance of the proposed RCCNet, we implement five state-of-the-art CNN models and compare against them. A brief overview of these architectures is given in the rest of this subsection.
III-B1 softmaxCNN_IN27
Sirinukunwattana et al. proposed the softmaxCNN_IN27 architecture for this classification task. The model has 5 learnable layers: 2 convolutional and 3 fully connected layers. Each convolutional layer is followed by a max-pooling layer that halves the spatial dimension. The first convolutional layer applies a set of filters to the input, and the subsequent max-pooling layer halves the spatial dimension of the resulting feature map. The second convolutional layer and its max-pooling layer follow the same pattern. These are followed by the three fully connected layers.
We also made minimal changes to the architecture of Sirinukunwattana et al. to adapt it to our input dimension; the resulting model is called softmaxCNN. The input images are first up-sampled and then zero-padded in each direction. The first convolutional layer produces a feature map by convolving a set of filters over the padded image, and it is followed by a second convolutional layer that applies another set of filters. The rest of the architecture is the same as the original softmaxCNN_IN27.
III-B2 AlexNet
AlexNet is one of the most popular CNN architectures and was originally proposed for natural image classification. We initially tried the original AlexNet architecture by up-sampling the input images to a higher resolution. However, we observed no improvement even after training this model for 200 epochs. We therefore made minimal modifications to AlexNet to suit low-resolution images. The images are up-sampled and then zero-padded in each direction, and a first convolutional layer produces a feature map from the padded image. A second convolutional layer then produces a further feature map. The rest of the architecture is the same as the original AlexNet, except that the last fully connected layer has 4 neurons instead of 1000. This architecture has 8 trainable layers.
III-B3 CIFAR-VGG
The VGG-16 model was originally introduced by Simonyan et al. for the ImageNet challenge. Liu et al. proposed a modified VGG-16 architecture (CIFAR-VGG) for training on small-scale images such as CIFAR-10. We use the CIFAR-VGG architecture to train on histology images, changing the number of neurons in the last FC layer to 4.
III-B4 GoogLeNet
GoogLeNet, the winner of ILSVRC 2014, consists of 22 learnable layers and was originally proposed for the classification of large-scale natural images. We made minimal changes to the GoogLeNet architecture so that it works with low-resolution images. A first convolutional layer produces a feature map from the input, and a second convolutional layer computes a further feature map. These are followed by an inception block. The remaining part of the model is similar to the original GoogLeNet, except that the last fully connected layer is modified to have 4 neurons instead of 1000.
Table I: Performance comparison of the CNN models.

| Model | Trainable parameters | Training time (min) | Training accuracy (%) | Testing accuracy (%) | Overfitting | Training weighted avg. F1 score | Testing weighted avg. F1 score |
|---|---|---|---|---|---|---|---|
III-B5 WRN
He et al. introduced residual networks for natural image classification. Zagoruyko et al. proposed wide residual networks (WRN) for training on the low-resolution images of the CIFAR-10 dataset. In this paper, we adapt the WRN architecture for comparison. The number of nodes in the last fully connected layer is changed to 4, corresponding to the number of classes in the histology dataset used.
III-C Training Details
The learning rate is initialized to a fixed value and decreased by a constant factor whenever there is no improvement in the validation loss during training. The rectified linear unit (ReLU) is employed as the activation function in all the implemented models. To reduce overfitting, dropout is applied after each fully connected layer with a fixed rate, and batch normalization [28] is used after every trainable layer (except the last one), after the activation function is applied. All models are trained for the same number of epochs using the Adam optimizer with fixed β1, β2, and ε. A fixed fraction of the dataset is used for training, and the remaining images are used to test the performance.
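The learning-rate schedule described above behaves like a "reduce on plateau" rule. The Python sketch below is illustrative only: the initial rate, the factor, the patience (number of epochs without improvement before decaying), and the minimum rate are all hypothetical values, since the actual settings are not given here. Deep learning frameworks ship equivalents, such as Keras `ReduceLROnPlateau` and PyTorch `torch.optim.lr_scheduler.ReduceLROnPlateau`, with additional options this sketch omits.

```python
class ReduceOnPlateau:
    """Multiply the learning rate by `factor` when the validation loss has not
    improved for `patience` consecutive epochs (all default values are assumptions)."""

    def __init__(self, lr: float, factor: float = 0.1, patience: int = 5, min_lr: float = 1e-6):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> float:
        """Update the schedule with one epoch's validation loss; return the new rate."""
        if val_loss < self.best:  # improvement: remember it and reset the counter
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:  # plateau: decay, but not below min_lr
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr


sched = ReduceOnPlateau(lr=1e-3, factor=0.1, patience=2)
for loss in [0.9, 0.8, 0.81, 0.82, 0.79, 0.80, 0.80]:
    print(round(sched.step(loss), 6))
```

In this example run the rate stays at 1e-3 while the loss improves, drops to 1e-4 after two epochs without improvement, and drops again to 1e-5 after the second plateau.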
III-D Evaluation Criteria
To assess the performance of the CNN models, we consider two measures: accuracy and weighted average F1 score. We also use training time as an evaluation metric to judge the efficiency of the CNN models.
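For reference, the two measures can be computed as in the minimal Python sketch below. It assumes the usual definition of the weighted average F1 score: per-class F1 averaged with weights proportional to each class's support (its number of true instances), which is also what scikit-learn's `f1_score(..., average="weighted")` computes. The labels in the example are made up for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred) -> float:
    """Per-class F1 averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        n_pred = sum(1 for p in y_pred if p == cls)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n_true / total
    return score


# Made-up labels for three of the classes.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2]
print(accuracy(y_true, y_pred), weighted_f1(y_true, y_pred))
```

Classes that appear only in the predictions have zero support, so they get zero weight and can be skipped. Weighting by support keeps a small class (such as ‘Miscellaneous’) from influencing the average as much as a large one.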
IV Results and Analysis
We conducted extensive experiments to compare the performance of the proposed RCCNet model with other state-of-the-art CNN models, namely softmaxCNN, AlexNet, CIFAR-VGG, GoogLeNet, and WRN. Table I compares the CNN models in terms of the number of trainable parameters, training time, training accuracy, testing accuracy, amount of overfitting, training F1 score, and testing F1 score. The main observations from Table I are as follows:
- The proposed RCCNet model outperforms the other CNN models in terms of both test accuracy and test weighted F1 score, which we attribute to the model being designed specifically for low-resolution histological routine colon cancer patches.
- The softmaxCNN model, derived from the architecture originally proposed for histological routine colon cancer images, is not complex enough for this task, whereas our model has sufficient capacity to achieve reasonable performance.
- The proposed RCCNet model generalizes better than the other CNN models and shows the lowest amount of overfitting, as shown in Table I. The highest amount of overfitting is observed for the wide residual network (WRN). This suggests that the amount of overfitting is closely related to the network structure, such as the depth of the network, the number of learnable parameters, and the type of network (i.e., plain, inception, or residual).
Fig. 3 compares the test accuracies of the implemented CNN architectures. As Fig. 3 shows, AlexNet, GoogLeNet, and WRN converge faster than the other CNN architectures, whereas softmaxCNN_IN27 converges slowly. The proposed RCCNet architecture converges smoothly.
V Conclusion
In this paper, we have proposed an efficient convolutional neural network based model to classify colon cancer images. The proposed RCCNet model is highly compact and optimized for low-resolution histological patches, using only seven plain trainable layers. The classification experiments are performed on histological routine colon cancer patches, and the performance of RCCNet is compared with other popular models such as AlexNet, CIFAR-VGG, GoogLeNet, and WRN. The experimental results show that RCCNet generalizes better and outperforms the other models in terms of test accuracy and weighted average F1 score, attaining the highest values of both among the compared models. RCCNet is also highly efficient in terms of training time compared to deeper and more complex networks.
This research is supported in part by Science and Engineering Research Board (SERB), Govt. of India, Grant No. ECR/2017/000082.
-  K. Sirinukunwattana, S. E. A. Raza, Y.-W. Tsang, D. R. Snead, I. A. Cree, and N. M. Rajpoot, “Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images,” IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1196–1206, 2016.
-  R. L. Siegel, K. D. Miller, S. A. Fedewa, D. J. Ahnen, R. G. Meester, A. Barzi, and A. Jemal, “Colorectal cancer statistics, 2017,” CA: a cancer journal for clinicians, vol. 67, no. 3, pp. 177–193, 2017.
-  M. Veta, J. P. Pluim, P. J. Van Diest, and M. A. Viergever, “Breast cancer histopathology image analysis: A review,” IEEE Transactions on Biomedical Engineering, vol. 61, no. 5, pp. 1400–1411, 2014.
-  M. Arif and N. Rajpoot, “Classification of potential nuclei in prostate histology images using shape manifold learning,” in Machine Vision, 2007. ICMV 2007. International Conference on. IEEE, 2007, pp. 113–118.
-  H. Sharma, N. Zerbe, D. Heim, S. Wienert, H.-M. Behrens, O. Hellwich, and P. Hufnagl, “A multi-resolution approach for combining visual information using nuclei segmentation and classification in histopathological images.” in VISAPP (3), 2015, pp. 37–46.
-  T. R. Jones, A. E. Carpenter, M. R. Lamprecht, J. Moffat, S. J. Silver, J. K. Grenier, A. B. Castoreno, U. S. Eggert, D. E. Root, P. Golland et al., “Scoring diverse cellular morphologies in image-based screens with iterative feedback and machine learning,” Proceedings of the National Academy of Sciences, vol. 106, no. 6, pp. 1826–1831, 2009.
-  H. Chang, A. Borowsky, P. Spellman, and B. Parvin, “Classification of tumor histology via morphometric context,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 2203–2210.
-  Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, p. 436, 2015.
-  A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
-  K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
-  C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
-  K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
-  R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587.
-  J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.
-  V. K. Repala and S. R. Dubey, “Dual CNN models for unsupervised monocular depth estimation,” arXiv preprint arXiv:1804.06324, 2018.
-  K. K. Singh and S. Mukherjee, “Recognizing human activities in videos using improved dense trajectories over LSTM,” in Computer Vision, Pattern Recognition, Image Processing, and Graphics: 6th National Conference, NCVPRIPG 2017, Mandi, India, December 16-19, 2017, Revised Selected Papers 6. Springer, 2018, pp. 78–88.
-  H. Greenspan, B. van Ginneken, and R. M. Summers, “Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique,” IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1153–1159, 2016.
-  G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. van der Laak, B. van Ginneken, and C. I. Sánchez, “A survey on deep learning in medical image analysis,” Medical Image Analysis, vol. 42, pp. 60–88, 2017.
-  A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, and S. Thrun, “Dermatologist-level classification of skin cancer with deep neural networks,” Nature, vol. 542, no. 7639, p. 115, 2017.
-  P. Rajpurkar, J. Irvin, K. Zhu, B. Yang, H. Mehta, T. Duan, D. Ding, A. Bagul, C. Langlotz, K. Shpanskaya et al., “CheXNet: Radiologist-level pneumonia detection on chest x-rays with deep learning,” arXiv preprint arXiv:1711.05225, 2017.
-  J. Xu, L. Xiang, Q. Liu, H. Gilmore, J. Wu, J. Tang, and A. Madabhushi, “Stacked sparse autoencoder (SSAE) for nuclei detection on breast cancer histopathology images,” IEEE Transactions on Medical Imaging, vol. 35, no. 1, pp. 119–130, 2016.
-  B. Korbar, A. M. Olofson, A. P. Miraflor, C. M. Nicka, M. A. Suriawinata, L. Torresani, A. A. Suriawinata, and S. Hassanpour, “Looking under the hood: Deep neural network visualization to interpret whole-slide image analysis outcomes for colorectal polyps,” in Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on. IEEE, 2017, pp. 821–827.
-  D. Bychkov, N. Linder, R. Turkki, S. Nordling, P. E. Kovanen, C. Verrill, M. Walliander, M. Lundin, C. Haglund, and J. Lundin, “Deep learning based tissue analysis predicts outcome in colorectal cancer,” Scientific Reports, vol. 8, no. 1, p. 3395, 2018.
-  S. Liu and W. Deng, “Very deep convolutional neural network based image classification using small training sample size,” in Pattern Recognition (ACPR), 2015 3rd IAPR Asian Conference on. IEEE, 2015, pp. 730–734.
-  S. Zagoruyko and N. Komodakis, “Wide residual networks,” arXiv preprint arXiv:1605.07146, 2016.
-  A. Krizhevsky and G. Hinton, “Learning multiple layers of features from tiny images,” 2009.
-  N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
-  S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015.
-  D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.