Chromosome classification is an important stage in the karyotyping procedure. Karyotyping [14] is useful for detecting chromosomal abnormalities, including numerical and structural abnormalities, which may result in genetic diseases such as Down syndrome [16]. In clinical practice, cytologists capture chromosomes that appear in the metaphase stage of cell division and use the Giemsa staining technique to obtain banding patterns with distinctive dark and light bands. Cytologists then take advantage of these chromosome characteristics to arrange the chromosomes in a standard format. Normally, a human cell has 46 chromosomes, consisting of 22 pairs of autosomes and 1 pair of sex chromosomes (XY or XX). However, chromosome classification is a highly skilled cytogenetic technique, and well-trained operators need many years of experience. Moreover, considerable manual effort is still required to identify the various types of chromosomes.
To reduce the burden on cytologists in chromosome classification, several computer-aided classification methods have been proposed. Traditional classification methods mostly rely on handcrafted features. Ming et al. [13] take banding patterns as features, extracted by averaging the gray profile, gradient profile and shape profile, and classify them with a multilayer classifier. Markou et al. [12] use a support vector machine to discriminate chromosomes, taking band profiles extracted along the chromosome axis as the classifier input. Recently, researchers have applied deep learning methods to the chromosome classification problem and obtained decent performance. Sharma et al. [20] first apply pre-processing to chromosomes segmented through crowdsourcing and then classify them using a CNN. Gupta et al. [21] also use a pre-processing algorithm for bent chromosomes and then employ a Siamese network to extract discriminative embeddings for the final multi-layer perceptron classifier. Qin et al. [17] extract global-scale and local-scale features using a varifocal mechanism and concatenate them to predict type and polarity simultaneously.
However, these works all take manually segmented single chromosomes as classifier inputs, whereas clinical analysis is based on the entire metaphase image. In this work, we propose a detection-based framework, named DeepACC, which both detects and finely classifies all 24 classes of chromosomes automatically from the entire metaphase image. As shown in Fig. 1, fine classification of chromosomes is a challenging task because of large intra-class distinctions and small inter-class differences. On the one hand, manual operations and changes in experimental conditions may introduce large intra-class distinctions between different batches of metaphase images, namely batch effects. On the other hand, the early seven-group classification criterion [8, 19] reveals that different classes of chromosomes within the same group exhibit small inter-class differences. In addition, touching and overlapping chromosomes in the metaphase images further complicate classification. In the following, we enhance the classifier of the model by incorporating clinical prior knowledge of chromosomes into deep learning models.
Our model is built on Faster R-CNN [18]; the head is separated into two isolated streams, one for classification and one for regression, and the classification branch is then enhanced. First, we introduce the Additive Angular Margin Loss [3] to enforce higher intra-class compactness and inter-class discrepancy simultaneously. Second, as chromosomes usually appear in pairs, we adopt this fact as prior knowledge and design a siamese structure to alleviate the batch effects of metaphase images. Finally, a Group Inner-Adjacency Loss is proposed that takes the early grouping criterion of chromosomes [8, 19] as additional prior knowledge to constrain the network and further reduce inter-class similarities within groups.
The goal of this work is to find all 24 classes of chromosomes in a metaphase image, which can be treated as a multi-class object detection problem. As shown in Fig. 2, we choose the classical object detection framework Faster R-CNN [18], with ResNet-101 [7] as the backbone network, as our base architecture. Additionally, to identify small chromosomes more accurately, we attach a Feature Pyramid Network (FPN) [9] to the backbone network, which combines high-level information with low-level information.
The Region Proposal Network (RPN) [18] is then used to generate candidate proposals. The RPN scans each predefined anchor box, identifies whether the reference box is foreground or background, and refines its coordinates. Unlike the original Faster R-CNN, RoIAlign [6] is used to crop the features of each candidate proposal. We then divide the detection head into a classification branch and a regression branch whose parameters are not shared. Specifically, the regression branch consists of two fully connected layers and is constrained by a Smooth L1 loss, similar to the original Fast R-CNN [5]. Inspired by a visual object tracking model [1], the classification branch adopts a siamese architecture consisting of two streams that share two fully connected layers. The first stream, named the margin branch, is optimized by the Additive Angular Margin Loss (Section 2.2). The second stream, named the inference branch, takes either the most confident proposal of each class from the first stream or the margin branch weights as local class centers to alleviate batch effects (Section 2.3). Finally, a Group Inner-Adjacency Loss is proposed to further enhance inter-class discrepancy within groups (Section 2.4).
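As a point of reference for the regression branch, the Smooth L1 loss can be sketched in a few lines of plain Python. This is a minimal illustration rather than the toolbox implementation; the function name `smooth_l1` and the default `beta=1.0` are assumptions made for the example.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean Smooth L1 loss over paired box-regression values.

    beta (assumed default 1.0) is the absolute error at which the loss
    switches from the quadratic to the linear regime.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        if diff < beta:
            # Quadratic near zero: small errors give small gradients.
            total += 0.5 * diff * diff / beta
        else:
            # Linear for large errors: less sensitive to outliers than L2.
            total += diff - 0.5 * beta
    return total / len(pred)
```

The quadratic region keeps gradients small for well-localized boxes, while the linear region limits the influence of badly localized proposals.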
2.2 Margin Branch Enhanced by Additive Angular Margin Loss
It is important to identify the feature differences (e.g. length, banding pattern and centromere index) between different classes of chromosomes. However, the differences between classes are sometimes tiny, while chromosomes of the same class may differ considerably. We therefore need higher similarity among intra-class proposals and greater diversity among inter-class proposals. Inspired by recent research on loss functions for face recognition, we replace the original Cross Entropy (CE) Loss with the Additive Angular Margin (AAM) Loss.
Like chromosome classification, large-scale face recognition models also need enhanced discriminative power. Some works [11, 23, 3] achieve this by incorporating margins into an established loss function. In the same way, we reformulate the last fully connected layer of the margin branch without a bias term, so that it can be expressed as $W_j^T x_i = \|W_j\| \|x_i\| \cos\theta_j$, where $W_j$ is the $j$-th column of the weight $W$ and $x_i$ is the feature of the $i$-th proposal with label $y_i$. Therefore, $W_j$ can be regarded as a representation of the $j$-th class, named the global class center of the $j$-th class, and $\theta_j$ represents the angle between the global class center $W_j$ and the feature $x_i$. In practice, $W_j$ and $x_i$ are normalized to unit $\ell_2$ norm and rescaled by a scale factor $s$ for easier optimization. An additional margin penalty $m$ is added to each $\theta_{y_i}$ to enforce higher intra-class compactness and inter-class discrepancy during training. The formulation of the Additive Angular Margin Loss is shown in Eq. 1:

$$L_{AAM} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{s \cos(\theta_{y_i} + m)}}{e^{s \cos(\theta_{y_i} + m)} + \sum_{j \neq y_i} e^{s \cos\theta_j}} \qquad (1)$$

where $N$ is the number of proposals.
Compared with the cross entropy loss, the additive angular margin loss directly optimizes the chromosome feature embeddings, narrowing the angle between deep features of the same class and broadening the angle between different classes.
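To make the effect of the margin concrete, the following is a minimal pure-Python sketch of an additive angular margin loss for a single proposal. It follows the general ArcFace recipe and is not the authors' implementation; the helper names and the default values `s=30.0` and `m=0.5` are assumptions chosen only for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def aam_loss(feature, class_centers, label, s=30.0, m=0.5):
    """Additive angular margin loss for one proposal.

    feature:       embedding of the proposal
    class_centers: one weight vector per class (the global class centers)
    label:         index of the ground-truth class
    s, m:          scale factor and margin penalty (hypothetical defaults)
    """
    x = l2_normalize(feature)
    logits = []
    for j, w in enumerate(class_centers):
        cos_theta = sum(a * b for a, b in zip(l2_normalize(w), x))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding
        theta = math.acos(cos_theta)
        if j == label:
            # The margin is added only to the angle of the true class.
            # This sketch does not special-case theta + m exceeding pi.
            theta += m
        logits.append(s * math.cos(theta))
    # Numerically stable cross entropy over the scaled cosine logits.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]
```

With `m=0` the function reduces to a normalized softmax cross entropy; a positive margin raises the loss for the same embedding, which pushes features closer to their own class center during training.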
2.3 Siamese Inference Branch to Alleviate Batch Effect
Besides intrinsic variations between different classes of chromosomes, cytologists may introduce high intra-class distinctions between different batches of metaphase images because of differing operations or conditions, namely batch effects. Notably, batch effects are inevitable and especially severe in developing countries, where most cytologists lack good training. However, batch effects are hard to learn because of the uncertainty of human operation and circumstances. In this work, we try to alleviate batch effects by making full use of the local information in each metaphase image.
As shown in Fig. 3(a)(b), replacing the global class center with the most confident local feature, which is better suited to the samples of a specific batch, can yield more accurate results and reduce errors; we name this local feature the local class center. As well-known prior knowledge, autosomes normally appear in pairs, so we may use the most confident chromosome of each class to predict the other one. In practice, for every class including the X and Y chromosomes, we take the feature of the most confident proposal as the local class center of that class and use it to predict the remaining proposals.
Inspired by a visual object tracking model [1], we construct a parallel inference branch which, together with the margin branch, forms a siamese architecture. The two branches share the same input, and the margin branch is used to obtain the local class centers, which are then adopted as the weights of the inference branch. As illustrated in Fig. 3(c), to obtain the weights of the inference branch, we first find the most confident proposal of each class from the margin branch. Note that some classes of chromosomes may be missing in some cases, and local class centers may introduce instability at the beginning of training. Therefore, we set a threshold to control where each local class center comes from, and normalize the resulting centers. Finally, we obtain the prediction scores using the adjusted weights together with the original features without normalization, and constrain them with a cross entropy (CE) loss.
The inference branch keeps a bias term, which is initialized to zero. The scores obtained in the inference branch are regarded as the final predicted results.
Remark: By keeping the bias term and constraining the scores with the CE loss, the inference branch not only takes advantage of the discriminative power brought by the AAM loss and of optimized decision boundaries that reduce batch effects, but also avoids the information loss caused by feature normalization and the reduced flexibility caused by the lack of a bias term (as pointed out in [22]).
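The selection of local class centers described in this section can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: the function names, the data layout (plain lists) and the threshold value `0.5` are hypothetical, and the fallback to the margin-branch weight follows the description that a center comes either from the most confident proposal or from the margin branch weights.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def local_class_centers(features, margin_probs, global_centers, threshold=0.5):
    """Pick one normalized center per class for the inference branch.

    features:       per-proposal feature vectors
    margin_probs:   per-proposal class probabilities from the margin branch
    global_centers: per-class weight vectors of the margin branch
    threshold:      hypothetical confidence needed to trust a local center
    """
    centers = []
    for j, weight in enumerate(global_centers):
        chosen = weight  # fall back to the global class center
        if features:
            best = max(range(len(features)), key=lambda i: margin_probs[i][j])
            if margin_probs[best][j] >= threshold:
                chosen = features[best]  # most confident proposal of class j
        centers.append(l2_normalize(chosen))
    return centers


def inference_scores(feature, centers, bias):
    """Score an unnormalized feature against the centers, keeping the bias."""
    return [
        sum(c * x for c, x in zip(center, feature)) + b
        for center, b in zip(centers, bias)
    ]
```

A class that is missing from the image, or whose best proposal is not confident enough, keeps its global center, which is one way to limit the early-training instability mentioned above.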
2.4 Group Inner-Adjacency Loss Using Prior Knowledge
Besides the classical 24-class criterion, cytologists generally agree that chromosomes can also be classified into seven groups according to size and centromeric index: group A (chromosomes 1-3), B (chromosomes 4-5), C (chromosomes 6-12, X), D (chromosomes 13-15), E (chromosomes 16-18), F (chromosomes 19-20) and G (chromosomes 21-22, Y), namely the Denver System [8, 19]. Although it is an empirical criterion, it indicates that inter-class similarities within groups are more severe and need special care. Therefore, inspired by the Non-Adjacency Loss [4], we take the clinical seven-group criterion as prior knowledge and propose an additional group inner-adjacency loss to further optimize the class discriminability of the model within each group.
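The seven-group criterion can be written down as a small lookup table from which the classes sharing a group with a given class are derived. The string labels and the helper name `inner_adjacent_classes` are illustrative choices, not identifiers from the paper.

```python
# Denver grouping of the 24 chromosome classes (autosomes 1-22 plus X and Y).
DENVER_GROUPS = {
    "A": ["1", "2", "3"],
    "B": ["4", "5"],
    "C": [str(n) for n in range(6, 13)] + ["X"],
    "D": ["13", "14", "15"],
    "E": ["16", "17", "18"],
    "F": ["19", "20"],
    "G": ["21", "22", "Y"],
}


def inner_adjacent_classes(label):
    """Return the other classes that share a Denver group with `label`."""
    for members in DENVER_GROUPS.values():
        if label in members:
            return [c for c in members if c != label]
    raise KeyError(f"unknown chromosome class: {label}")
```

For example, class 4 has the single group-mate 5, whereas X shares group C with chromosomes 6 to 12.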
We define two different classes as adjacent if they belong to the same group, and associate each class with a set of forbidden group inner-adjacency classes. For each positive proposal, the probability of belonging to a forbidden group inner-adjacency class of its label should be null. To this end, we take the probability vector of the proposal from the margin branch and the probability vector of the most confident proposal of each adjacent class, and enforce their product to be low until the two vectors are orthogonal. In practice, the group inner-adjacency loss is designed to minimize the sum of the element-wise products of the probability vectors between each positive proposal and its forbidden group inner-adjacency classes.
The loss is normalized by the number of group inner-adjacency classes of each proposal's class and by the number of positive proposals.
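One possible reading of the group inner-adjacency loss is sketched below. The exact placement of the two normalizers and the handling of adjacent classes that have no proposal in the image are assumptions of this sketch, as are the function name and the dictionary-based inputs.

```python
def gia_loss(probs, labels, top_probs, adjacency):
    """Group inner-adjacency loss over the positive proposals of one image.

    probs:     margin-branch probability vector of each positive proposal
    labels:    class index of each positive proposal
    top_probs: class index -> probability vector of that class's most
               confident proposal (classes absent from the image are omitted)
    adjacency: class index -> class indices in the same group
    """
    if not probs:
        return 0.0
    total = 0.0
    for p, y in zip(probs, labels):
        # Assumption: only adjacent classes present in the image contribute.
        neighbours = [j for j in adjacency.get(y, []) if j in top_probs]
        if not neighbours:
            continue
        overlap = sum(
            sum(a * b for a, b in zip(p, top_probs[j])) for j in neighbours
        )
        # Normalize by the number of inner-adjacency classes of class y.
        total += overlap / len(neighbours)
    # Normalize by the number of positive proposals.
    return total / len(probs)
```

The loss is zero when a proposal's probability vector is orthogonal to those of its group-mates, and grows as probability mass is shared between classes of the same group.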
3.1 Dataset and Implementation Details
We collect 3390 Giemsa-stained metaphase images from a clinical cytogenetics laboratory, in which each chromosome is labelled with a bounding box and its class by experienced cytologists. The dataset is divided into training, validation and testing sets. All images are normalized by mean and standard deviation. During training, only random horizontal flipping is used for data augmentation. DeepACC is jointly optimized end-to-end by the following loss:
The loss includes the original Faster R-CNN loss except for its classification head. The remaining hyperparameters of the loss, together with the margin penalty and scale factor used in the Additive Angular Margin Loss, are set to fixed values in all experiments. All the remaining settings are the same as in Faster R-CNN.
The classical mean Average Precision (mAP) is adopted to evaluate the performance of the model. However, since in practice only the top-score class of each example is taken into account for classification, we also introduce a variant of mAP as an evaluation metric, which uses only the top-score bounding box of each proposal to compute mAP. Furthermore, since cytologists clinically pay more attention to proposals with high scores, we ignore proposals below a score threshold and further reduce redundancy by class-agnostic NMS after the top-score bounding box selection, and then introduce Accuracy (Acc) and Average Error Ratio (AER) to evaluate the performance. Acc is defined as the proportion of true positives among all predictions, and AER is defined as the sum of false positives and false negatives divided by the number of ground truths. The definitions of true positive, false positive and false negative are the same as in DeepACE [24].
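The post-processing and the two clinical metrics can be illustrated with a short sketch. The detection layout `(box, score, label)`, the function names and any threshold values passed in are assumptions for the example; true positives, false positives and false negatives are taken as given counts, since their matching rule follows DeepACE.

```python
def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets, score_thr, iou_thr):
    """Drop low-score detections, then apply class-agnostic greedy NMS.

    dets: list of (box, score, label); labels are ignored when suppressing.
    """
    kept = []
    confident = [d for d in dets if d[1] >= score_thr]
    for det in sorted(confident, key=lambda d: d[1], reverse=True):
        if all(box_iou(det[0], k[0]) <= iou_thr for k in kept):
            kept.append(det)
    return kept


def accuracy(tp, fp):
    """Acc: proportion of true positives among all predictions."""
    return tp / (tp + fp)


def average_error_ratio(fp, fn, num_gt):
    """AER: false positives plus false negatives over ground-truth count."""
    return (fp + fn) / num_gt
```

Because suppression ignores class labels, two overlapping boxes that claim different chromosome classes for the same object are reduced to the more confident one.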
The model is implemented with the MMDetection [2] toolbox, which is based on the PyTorch [15] framework. The network is trained with a step learning-rate schedule in which the initial learning rate is decayed at two epochs. Stochastic Gradient Descent (SGD) with momentum 0.9 is adopted to optimize our network on an Nvidia Titan Xp GPU.
3.2 Ablation Study
Ablation studies are performed on the validation set to test the effect of each individual module proposed in our model, and the results are summarized in Table 1. We also find that simply separating the classification and regression branches does not improve performance (Table 1(a)(b)).
As shown in Table 1(e), the combination of the Additive Angular Margin Loss and the Siamese Inference Branch yields significant improvements over the baseline on all metrics, namely mAP, the top-score mAP, AER and Acc. It is worth noting that adding the Additive Angular Margin Loss alone (Table 1(c)) can deteriorate performance to some extent; this may be because the lack of a bias term and the feature normalization in the AAM loss can destroy the discrimination ability for specific classes (as pointed out in [22]). However, the more compact and well-separated features obtained with the AAM loss can still help the Siamese Inference Branch select more representative local class centers to reduce batch effects (Table 1(d)(e)). In addition, adding the Group Inner-Adjacency Loss (Table 1(f)) further reduces the misclassification error within groups and improves performance.
3.3 Main Results
Final results are reported in Table 2, where we compare our proposed model with a two-stage object detection baseline, Faster R-CNN, and a one-stage object detection baseline, RetinaNet. All models are trained on the combination of the training and validation sets, and final results are reported on the testing set. DeepACC greatly outperforms both baselines in terms of mAP, the top-score mAP, AER and Acc.
This work proposes a detection-based deep learning model to detect and finely classify chromosomes from the entire metaphase image. After introducing the Additive Angular Margin Loss to enhance discriminative power, a siamese inference branch is proposed to transform the decision boundary of each class by making full use of the prior knowledge that chromosomes usually appear in pairs. In addition, the clinical grouping criterion is taken as prior knowledge to further reduce classification errors within groups. DeepACC significantly outperforms both the state-of-the-art two-stage and one-stage baselines.
This work was supported by the National Natural Science Foundation of China (grant 31900979) to Li Xiao.
6 Author Contributions
Tianqi Yu, Manqing Wang, Fuhai Yu, Chan Tian, and Jie Qiao collected and labeled the data. Chunlong Luo, Li Xiao, Yufan Luo, and Yinhao Li designed the model and analyzed the data. Chunlong Luo implemented the model. Li Xiao conceived and supervised this work and wrote the manuscript with assistance from Jie Qiao and Chan Tian. Further information or questions should be directed to the Lead Contact, Li Xiao (email@example.com).
[1] Fully-convolutional siamese networks for object tracking. European Conference on Computer Vision, pp. 850–865.
[2] (2019) MMDetection: open mmlab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155.
[3] Arcface: additive angular margin loss for deep face recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4690–4699.
[4] (2019) Removing segmentation inconsistencies with semi-supervised non-adjacency constraint. Medical Image Analysis 58, pp. 101551.
[5] (2015) Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448.
[6] (2017) Mask r-cnn. In Computer Vision (ICCV), 2017 IEEE International Conference on, pp. 2980–2988.
[7] (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
[8] (1960) A proposed standard system of nomenclature of human mitotic chromosomes. The Lancet 275 (7133), pp. 1063–1065.
[9] (2017) Feature pyramid networks for object detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 936–944.
[10] (2017) Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988.
[11] (2017) Sphereface: deep hypersphere embedding for face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 212–220.
[12] (2012) Automatic chromosome classification using support vector machines. Google Scholar, pp. 1–24.
[13] (2010) Automatic pattern extraction and classification for chromosome images. Journal of Infrared, Millimeter, and Terahertz Waves 31 (7), pp. 866–877.
[14] (2015) Karyotyping techniques of chromosomes: a survey. Int J Comput Trends Technol 22 (1).
[15] (2019) PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pp. 8024–8035.
[16] (2009) Molecular genetic analysis of down syndrome. Human Genetics 126 (1), pp. 195–214.
[17] (2019) Varifocal-net: a chromosome classification approach using deep convolutional networks. IEEE Transactions on Medical Imaging 38 (11), pp. 2569–2581.
[18] (2015) Faster r-cnn: towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pp. 91–99.
[19] (1963) The london conference on the normal human karyotype. Cytogenetic and Genome Research 2 (4-5), pp. 264–268.
[20] (2017) Crowdsourcing for chromosome segmentation and deep classification. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 786–793.
[21] (2017) Siamese networks for chromosome classification. In 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), pp. 72–81.
[22] (2017) Normface: l2 hypersphere embedding for face verification. In Proceedings of the 25th ACM International Conference on Multimedia, pp. 1041–1049.
[23] (2018) Cosface: large margin cosine loss for deep face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5265–5274.
[24] DeepACE: automated chromosome enumeration in metaphase cell images using deep convolutional neural networks. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 595–603.