Every year, several million people worldwide die of cancer because appropriate detection schemes are inaccessible and the resulting treatments are ineffective. Over the last decades, scientists have applied different methods to detect cancer tissues at an early stage, motivated by the fact that early diagnosis can facilitate the clinical management of patients. Researchers have therefore examined several approaches to early detection, including cancer screening, solid, liquid and optical biopsy, prognostic determination, and monitoring. However, up to now, no known diagnostic procedure avoids harming the physical health of patients during cancer detection, since all such methods are invasive. Consequently, early diagnosis requires not only the ability to identify cancer tissue as small as a single cell, but also non-invasiveness as a prerequisite.
With the advent of new digital technologies in medicine, Artificial Intelligence (AI) methods have been applied to complex datasets in cancer research in order to discover patterns and relationships within them. Machine Learning (ML) is the branch of AI that addresses the problem of learning from data samples and relates it to the general concept of inference. In turn, Deep Learning (DL) is a family of ML methods based on learning data representations. DL algorithms, in particular convolutional networks, have rapidly become a methodology of choice for analyzing medical images. A fundamental concept in DL is to let computers learn the features that optimally represent the data for the problem at hand. This goal is approached by building models (networks) composed of many layers that transform input data (in our case, medical images) into outputs (e.g. a classification such as disease present/absent) while learning increasingly higher-level features. In the last decade of application of DL to medical images, Convolutional Object Detection (COD) has become a successful approach to cancer analysis. In this paper, we investigate the application of a COD-based method to several differentiated samples of cells cultured on a glass slide, with the purpose of discriminating osteosarcoma cells from MSCs (osteoblasts). The results are promising, with an accuracy close to one on the available dataset. These results, which concern the classification of cells of different malignant degree, ranging from normal to cancer cells, can bring important advantages to the study of cell seeding and cell growth. Indeed, they allow efficient analysis of single cells simply by employing an optical microscope, without conventional biochemical methods that are time-consuming and may require a large number of cells. The next step will be to extend the algorithm to large populations of cells and tissues with the purpose of improving digital histopathology.
The paper is organized as follows. Related works are described in Section 2. Section 3 describes materials and methods, focusing on the procedure followed for the cell culture (3.1), on the construction, annotation, and augmentation of the dataset (3.2) and, finally, on the chosen network architecture (3.3). Section 4 reports the results of the training and the accuracy of the applied method. Finally, Section 5 concludes the paper with a discussion of future work. This paper extends our conference contribution.
2 Related works
The automatic classification of biological samples has received considerable attention in recent years. Most conventional approaches rely on a feature extraction step followed by feature classification to detect structures of interest in biological images. Traditional methods are based on handcrafted features, mainly descriptors of shape and appearance, including color and texture features. In this approach, general-purpose and ad hoc features are computed on the region of interest or on the segmented structure of interest, and gathered into a single vector holding all the information for solving the visual task. By contrast, in DL approaches the features relevant to the visual task are not defined a priori but are learned during the training process. This approach has recently shown expert-level accuracy in medical image classification, enabling new methods in diagnostic pathology. Digital pathology exploits the quantification and classification of digitized tissue samples by supervised deep learning. This digital approach to histopathology has shown excellent results even for tasks previously considered too challenging for conventional image analysis methods [4, 29, 14, 18, 19, 7]. In histopathology, several DL results have recently appeared. In , the authors present two successful applications of DL in reducing the workload of pathologists, namely prostate cancer identification in biopsy specimens and breast cancer metastasis detection in sentinel lymph nodes. Their work shows the potential of DL for increasing the objectivity of diagnoses: all glass slides containing prostate cancer and micro- and macro-metastases of breast cancer were automatically detected, while slides featuring only normal tissue could be excluded without any additional immunohistochemical markers or human intervention.
Similarly, in , a Convolutional Neural Network (CNN) is trained to provide a simple, efficient and effective method for achieving state-of-the-art classification and segmentation in the MICCAI 2014 Brain Tumor Digital Pathology Challenge. Their work uses transfer learning, starting from a network pre-trained on an extensive general image database. Again, in , the authors address the classification of breast cancer histology images using transfer learning, starting from the general Inception Resnet v2 for direct labeling of the full images. In , the authors propose two different CNN architectures for breast cancer: a single-task CNN that predicts malignancy, and a multi-task CNN that predicts both malignancy and image magnification level simultaneously. The results of their methods are compared using the BreaKHis dataset as a benchmark. All the works discussed above deal with general histological images and classify whole images in order to decide whether malignant cells are present. Concerning the specific case of osteosarcoma, which is the focus of the present paper, in , a CNN is defined, trained and evaluated on hematoxylin and eosin stained images. The goal of their network is to assign tumor classes (viable tumor, necrosis) versus non-tumor directly to input slide images.
Moreover, many tasks in digital pathology that are directly or indirectly connected to tumor cell differentiation require the classification of small clusters of cells, down to a single cell if possible. For this purpose, and differently from the works mentioned above, this paper investigates the classification of single cultured cells with a known grade of differentiation using a supervised DL approach. Specifically, a COD-based DL method is applied to several differentiated samples of cells cultured on a glass slide, with the primary purpose of discriminating osteosarcoma cells from MSCs (osteoblasts).
Among the ML techniques applied to the analysis of cancer cells, COD has recently gained considerable interest. Several methods have been proposed to address the object recognition task, and many software frameworks have been implemented to design, train and use deep learning networks (such as Caffe, Apache MXNet and many others). Among them, Google TensorFlow is currently one of the most used frameworks, and its Object Detection API has emerged as a powerful tool for image recognition. Since the case study proposed in this paper requires the most accurate architecture available, we selected the Faster Region Convolutional Neural Network (Faster R-CNN) [22, 23], whose region proposal network shares features with the detection network, improving both region proposal quality and object detection accuracy. Faster R-CNN uses two networks: a Region Proposal Network (RPN) to generate region proposals and a detector network to discover object instances. The RPN produces region proposals more quickly than the Selective Search algorithm used in previous solutions. Sharing information between the two networks also improves accuracy, and this solution is currently the one with the best results in the latest object detection competitions. The Faster R-CNN approach can be applied using several network architectures as underlying deep feature encoders. In , a guide for selecting the right architecture depending on speed, memory and accuracy is provided.
Concerning general purpose CODs, evaluating a DL approach to digital histopathology poses the problem of collecting a dataset rich enough to train the network adequately. Indeed, as is well known, DL methods require many examples to learn the best representation of an object model. Some of the works mentioned above resorted to transfer learning, starting from a network pre-trained on large datasets such as ImageNet. However, proper data augmentation strategies have also been used with good results to overcome over-fitting. Conventional data augmentation methods address both the spatial and the appearance domains of the images, by applying to the original images geometrical transformations (mainly orthogonal transformations such as rotations and mirroring) and/or intensity transformations (e.g. contrast stretching). For instance, in , the authors use spatial data augmentation (arbitrary rotation, mirroring and scaling) during the training of all models, while noticing that the most prominent source of variability in histopathology images is the staining color appearance. In , the authors propose a so-called multi-scale fusion data augmentation method: their original database is augmented by a factor of 14 through rotation, scaling and mirroring applied randomly over all samples. They employed rotations by multiples of the right angle, rescaling, and horizontal and vertical mirroring, addressing the classification problem of breast cancer pathological images.
3 Materials and Methods
3.1 Cell Culture
Normal, cancerous and mixed cells were cultured on glass slides. Details can be found in ; here we briefly describe the essential differences among the cell populations under investigation. Undifferentiated MSCs were isolated from human bone marrow according to a previously reported method and used to perform three culture strategies. MSCs were plated on glass slides inside Petri dishes at a density of 20,000 cells with 10% fetal bovine serum (FBS). The samples were cultured for 72 h, then fixed in 1% neutral buffered formalin for 10 min at 4 °C. Osteosarcoma cells, consisting of human cells named MG-63, were seeded on six glass slides at 10,000 cells. Finally, mixed cancer and healthy cells were plated on six glass slides inside Petri dishes at 10,000 cells with 10% FBS.
At each endpoint, all the samples were fixed in 1% (w/v) neutral buffered formalin for 10 min at 4 °C. Morphologies are visible in Figure 1, as imaged by an inverted microscope (Nikon Eclipse Ti-E).
3.2 Data set collection, annotation and augmentation
A total of images has been collected using two different microscopes, working in two different color spaces: one acquires conventional RGB images while the other acquires monochrome images with green background density. Experienced users have manually annotated all the images. Namely, it was requested to identify in each of the images a number of rectangular regions corresponding to particular cells and cell clusters. Five categories have been used to label the regions:
Single cancer cell
Single MSC cell
To ease the annotation task, a graphical interface for performing annotation was provided to the experts. The interface is based on the LabelImg software and allows multiple labeled regions to be inserted in each image of the dataset. A total of 279 objects were labeled in the images.
The dataset was then augmented by applying both spatial and intensity transformations. Unlike other approaches that perform augmentation online during the training stage by applying random transformations, in this paper augmentation was performed offline before training. Since the dataset contains a relatively small number of images and objects compared with large general image datasets, this raises no memory or efficiency concern in the present case. For spatial transformations, we applied the dihedral group consisting of the symmetries of the square (rotations by multiples of 90° and reflections). Each image and the associated labeled regions were transformed accordingly, yielding a boost in the number of samples in the dataset. As regards the color space, a power law transform was used to augment the dataset and make the results more robust with respect to illumination changes:
$$s = c\,r^{\gamma}$$
where $r$ represents the original input pixel value, $s$ is the output pixel value obtained after the power law transformation, and $c$ and $\gamma$ are the parameters of the transform. In our experiments, both parameters were kept fixed. In the case of RGB images, the power law transform was applied to each color channel. In general, such a procedure allowed for a boost in dataset size.
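The offline augmentation described above can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function names are hypothetical, boxes are assumed to be stored as (xmin, ymin, xmax, ymax) in pixel-edge coordinates, and the default values of `c` and `gamma` are placeholders rather than the values used in the experiments.

```python
import numpy as np

def rot90_boxes(boxes, width):
    """Map (xmin, ymin, xmax, ymax) boxes through a 90-degree
    counter-clockwise rotation of an image with the given width."""
    xmin, ymin, xmax, ymax = boxes.T
    return np.stack([ymin, width - xmax, ymax, width - xmin], axis=1)

def fliplr_boxes(boxes, width):
    """Map boxes through a left-right mirror of an image with the given width."""
    xmin, ymin, xmax, ymax = boxes.T
    return np.stack([width - xmax, ymin, width - xmin, ymax], axis=1)

def dihedral_augment(image, boxes):
    """Return the 8 (image, boxes) pairs given by the symmetries of the square."""
    samples = []
    img, bxs = image, np.asarray(boxes, dtype=float)
    for _ in range(4):
        width = img.shape[1]
        samples.append((img, bxs))                                   # rotation only
        samples.append((np.fliplr(img), fliplr_boxes(bxs, width)))   # rotation + mirror
        bxs = rot90_boxes(bxs, width)                                # advance by 90 degrees
        img = np.rot90(img)
    return samples

def power_law(image, c=1.0, gamma=0.8):
    """Apply s = c * r**gamma to an 8-bit image, channel by channel.

    c and gamma are placeholder values, not those of the paper.
    """
    r = image.astype(np.float64) / 255.0          # normalise to [0, 1]
    s = c * np.power(r, gamma)
    return np.clip(s * 255.0, 0, 255).round().astype(np.uint8)
```

Because `np.rot90` and `np.fliplr` act on the first two axes only, the same functions apply unchanged to monochrome (H, W) and RGB (H, W, 3) arrays.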
Finally, images and labels were automatically converted into the respective TensorFlow formats. Images were encoded into TensorFlow records, and labels were written to a Comma Separated Values (CSV) listing. Each row in the CSV listing contains the filename, the image size, the label, and the top-left and bottom-right corners of the object determined by the domain expert.
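As an illustration of the CSV listing described above, the following sketch writes one annotation row with Python's standard `csv` module. The column names, the file name, the label string and the numeric values are all hypothetical; the text only states which pieces of information each row carries.

```python
import csv
import io

# Hypothetical column names: filename, image size, label, box corners.
FIELDS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]

def write_annotations(rows, handle):
    """Write one CSV row per labeled region to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)

# Invented example row, for illustration only.
example = [{
    "filename": "slide_001.png", "width": 1280, "height": 960,
    "class": "single_cancer_cell",
    "xmin": 412, "ymin": 230, "xmax": 498, "ymax": 311,
}]
buffer = io.StringIO()
write_annotations(example, buffer)
```

Writing to a file instead of an in-memory buffer only requires passing a handle opened with `open(path, "w", newline="")`.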
3.3 CNN for cell detection and classification
Among the possible approaches to COD, Faster R-CNN is adopted in this paper. Faster R-CNN uses two sub-networks: a deep fully convolutional network that proposes regions (the Region Proposal Network, RPN) and another module that classifies the proposed regions (the classification network). The two sub-networks share the first layers, which act as a feature extraction module. Several architectures can be used to build the feature extraction module. Specifically, the Inception Resnet v2 model was selected in this paper and instantiated for this particular application using TensorFlow. Transfer learning was used to cope with the limited dataset of images, which is not sufficient for training from scratch. Namely, an inference graph for Inception Resnet v2 pre-trained on the COCO dataset was imported. On the basis of the extracted features, the RPN produces candidate regions that might contain objects of interest. Namely, by sliding a small window over the feature map, the RPN produces probabilities of object presence for region boxes of fixed aspect ratio and scale; a bounding box regressor also provides the optimal size and position of the candidate rectangular areas in an intrinsic coordinate system. Candidates with a high probability of object presence are then passed to the classification network, which is in charge of assessing the presence of an object category inside the region. As a training strategy, at first only the final fully connected layers of the two sub-networks were trained, leaving all the other layers frozen. In a subsequent fine-tuning phase, the layers in the feature extraction module were also optimized using the training routines made available in TensorFlow.
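The region boxes of fixed aspect ratio and scale mentioned above are commonly called anchors. A minimal sketch of how such boxes can be laid out over a feature map is given below; it is an illustration rather than the TensorFlow implementation, and the stride, base size, scales and aspect ratios in the usage line are placeholder values, not the settings of the experiments.

```python
import numpy as np

def make_anchors(feature_h, feature_w, stride, base_size, scales, aspect_ratios):
    """Return (feature_h * feature_w * A, 4) anchors as (xmin, ymin, xmax, ymax),
    with A = len(scales) * len(aspect_ratios) boxes per sliding-window position."""
    shapes = []
    for s in scales:
        for ar in aspect_ratios:                 # ar = width / height
            w = base_size * s * np.sqrt(ar)      # area stays (base_size * s) ** 2
            h = base_size * s / np.sqrt(ar)
            shapes.append((w, h))
    shapes = np.array(shapes)                    # (A, 2)

    # Centre of each feature-map cell, expressed in input-image pixels.
    cx = (np.arange(feature_w) + 0.5) * stride
    cy = (np.arange(feature_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)
    centers = np.stack([cx.ravel(), cy.ravel()], axis=1)   # (H * W, 2)

    half = shapes[None, :, :] / 2.0              # (1, A, 2)
    mins = centers[:, None, :] - half            # (H * W, A, 2)
    maxs = centers[:, None, :] + half
    return np.concatenate([mins, maxs], axis=2).reshape(-1, 4)

# Placeholder configuration: 4 scales x 3 aspect ratios = 12 anchors per position.
anchors = make_anchors(2, 3, stride=16, base_size=256,
                       scales=[0.25, 0.5, 1.0, 2.0],
                       aspect_ratios=[0.5, 1.0, 2.0])
```

For each anchor, the RPN then predicts an objectness probability and a box refinement, which is the role of the bounding box regressor described in the text.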
Given the limited dataset available, and with the primary goal of demonstrating the applicability of DL to the problem of cell classification, we opted for $k$-fold cross validation in order to obtain more statistically significant results. The original set of images was partitioned into $k$ non-overlapping subsets of equal size. The data augmentation strategy described in Section 3.2 was then applied to each subset separately, producing for each one an extended set as well as an associated list of labeled regions.
Multiple training and validation sessions were then carried out. In particular, for each of the $k$ subsets, a network was optimized using as training set the union of the extended sets of all the other subsets, while the held-out subset was used for validation. Notice that we opted for this partitioning approach in order to keep the training set fully separated from the validation set. The proportion of the split between training and validation is approximately $(k-1):1$, since the number of regions of interest contained in each subset does not vary significantly.
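The key point of this partitioning is that augmentation is applied after the split, so that no transformed copy of a validation image can end up in the training set. A minimal sketch follows; the function names, the random shuffling and the choice of validating on the augmented held-out fold are assumptions made for illustration, and `augment` stands for any function returning the list of augmented versions of one image.

```python
import random

def make_folds(items, k, seed=0):
    """Shuffle the items and split them into k non-overlapping folds."""
    items = list(items)
    random.Random(seed).shuffle(items)
    return [items[i::k] for i in range(k)]

def cross_validation_splits(folds, augment):
    """Yield (training set, validation set) pairs, one per fold.

    Each fold is augmented on its own, so augmented copies of a given
    original image always stay on the same side of the split.
    """
    augmented = [[a for item in fold for a in augment(item)] for fold in folds]
    for i in range(len(folds)):
        train = [s for j, fold in enumerate(augmented) if j != i for s in fold]
        yield train, augmented[i]      # assumption: validate on the augmented fold
```

With $k$ folds of similar size, each session trains on roughly $(k-1)/k$ of the augmented data and validates on the remaining $1/k$.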
As an additional experiment, the same training procedure was repeated without taking as input the original monochrome and RGB images, but first converting all the images to grayscale.
Each training phase lasted five days for all the training sets, using 300 region proposals and learning parameters that were reduced after the first cycle. In the RPN, four scales and three aspect ratios were used.
All the inference graphs produced have been exported and tested for inference on the validation set.
Figure 2 reports examples of localization and recognition using the first graph on an RGB image.
Figure 3 shows an example of the second graph localization and recognition on another gray-scale image.
The average accuracy obtained using RGB and the original monochrome images was . When using the images converted to grayscale very similar results have been found with an accuracy of . On the basis of these results, the use of color seems not to provide significant information for classification.
All training procedures were executed on a PC with a 4-core, 8-thread Intel(R) Core(TM) i7-4770 CPU @ 3.40 GHz with 16 GB of DDR3 RAM, an Nvidia Titan X (Pascal) GPU, and Ubuntu 16.04 as operating system. Localization and recognition on new images require less than one second on a personal computer with a modern Intel i7 CPU.
Classification of single cancer cells or small clusters of them is a crucial question for early diagnosis. In this paper, a Deep Learning approach to recognizing single cancer cells or small clusters has been presented. The adopted method is based on the Faster R-CNN technique and was applied to several samples of cells cultured on glass slides, with the purpose of discriminating osteosarcoma cells from osteo-differentiated MSCs (osteoblasts). The ability of the algorithm to identify and classify approximately 100% of the investigated cells will potentially allow us to extend the method to large populations of cells or tissues. These results, which concern the classification of cells of different malignant degree, ranging from normal to cancer cells, can have significant consequences for the study of cell seeding and cell growth. Another essential advantage is that they allow efficient analysis of single cells by merely employing an optical microscope, without conventional biochemical methods that are time-consuming and may require a large number of cells. The next step will be to extend the algorithm to large populations of cells and tissues with the purpose of improving digital histopathology.
This research was performed in the framework of the BIO-ICT lab, a joint initiative by the Institute of Biophysics (IBF) and the Institute of Information Science and Technologies (ISTI), both of the National Research Council of Italy (CNR).
We would like to thank NVIDIA Corporation for its support: this work would have been very time-consuming without a Titan X board powered by Pascal that was granted by NVIDIA to the Signals & Images Lab of ISTI-CNR in 2017.
In turn, we wish to thank Serena Danti, from the Department of Engineering, University of Pisa, and Luisa Trombi and Delfo D’Alessandro, from the Department of Medicine, University of Pisa, for useful support with biological samples.
- (2016) TensorFlow: a system for large-scale machine learning. In OSDI, Vol. 16, pp. 265–283.
- Deep learning for magnification independent breast cancer histopathology image classification. 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 2440–2445.
- (2018) Deep learning based tissue analysis predicts outcome in colorectal cancer. Scientific Reports 8.
- (2013) Mitosis detection in breast cancer histology images with deep neural networks. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 411–418.
- (2013-10) (Website). Accessed 2018-10-30.
- (2018) Deep learning approach to human osteosarcoma cell detection and classification. In International Conference on Multimedia and Network Information System, pp. 353–361.
- (2016) Single-cell phenotype classification using deep convolutional neural networks. Journal of Biomolecular Screening 21 (9), pp. 998–1003.
- (2018) Classification of breast cancer histology images through transfer learning using a pre-trained Inception Resnet v2. In International Conference Image Analysis and Recognition, pp. 763–770.
- (2019) https://mxnet.apache.org/. Last retrieved August 6, 2020.
- Speed/accuracy trade-offs for modern convolutional object detectors. 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 3296–3297.
- (2011) Human cancer classification: a systems biology-based model integrating morphology, cancer stem cells, proteomics, and genomics. Journal of Cancer 2.
- (2014) Caffe: convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia, pp. 675–678.
- (2017) Domain-adversarial neural networks to address the appearance variability of histopathology images. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 83–91.
- (2017-07) Classifying osteosarcoma patients using machine learning approaches. In 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 82–85.
- (2014) Microsoft COCO: common objects in context. In European Conference on Computer Vision, pp. 740–755.
- (2016) Deep learning as a tool for increased accuracy and efficiency of histopathological diagnosis. Scientific Reports 6, pp. 26286.
- (2016) World cancer report 2014. Geneva, Switzerland: World Health Organization, International Agency for Research on Cancer, WHO Press, 2015. Oxford University Press.
- (2017-10) Convolutional neural network for histopathological analysis of osteosarcoma. 25.
- (2017) Histopathological diagnosis for viable and non-viable tumor prediction for osteosarcoma using convolutional neural network. In Bioinformatics Research and Applications, Z. Cai, O. Daescu, and M. Li (Eds.), Cham, pp. 12–23.
- (2017) Histopathological diagnosis for viable and non-viable tumor prediction for osteosarcoma using convolutional neural network. In International Symposium on Bioinformatics Research and Applications, pp. 12–23.
- (2018) Histopathological breast cancer image classification by deep neural network techniques guided by local clustering. BioMed Research International 2018.
- (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems 28, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (Eds.), pp. 91–99.
- (2017) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (6), pp. 1137–1149.
- (2015) Cancer classification in the genomic era: five contemporary problems. Human Genomics 9.
- (2008) Human autologous plasma-derived clot as a biological scaffold for mesenchymal stem cells in treatment of orthopedic healing. Journal of Orthopaedic Research 26 (2), pp. 176–183.
- (2015) LabelImg. Git code. https://github.com/tzutalin/labelImg. Last accessed 11 May 2018.
- (2013) Selective search for object recognition. International Journal of Computer Vision.
- (2017) Deep learning model based breast cancer histopathological image classification. In 2017 IEEE 2nd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA), pp. 348–353.
- (2015) Beyond classification: structured regression for robust cell detection using convolutional neural network. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 358–365.
- (2015) Deep convolutional activation features for large scale brain tumor histopathology image classification and segmentation. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 947–951.