1 Introduction
Neuronal connectivity is one of the key topics on the frontier of brain science [12]. A detailed connectivity map is fundamental to understanding human intelligence and emotion, and is beneficial for designing frameworks of artificial intelligence algorithms. Reconstructing the full morphology of every single neuron provides the ultimate resolution for mapping connectivity at the whole-brain level [18, 17]. Specifically, neurons are genetically labelled and imaged, followed by digital reconstruction to represent their spatial location and topology as a tree-like structure. A reconstructed neuron in the whole mouse brain, visualized with the Vaa3D platform [9], is displayed in Fig. 3.
Neuron reconstruction, or tracing, is a challenging task because axonal and dendritic arborization can be both dense and complex. Reconstructing the axonal arbor of a single typical neuron requires tracing dozens of branches in a 3D image with thousands of voxels in each dimension. In addition, imaging signals can be weak in certain brain areas, and neighboring neurons often have branches that are close to each other, as shown in Fig. 6, making it difficult to discriminate the connections. As a result, although dozens of methods [6, 13, 15, 14, 19] have been proposed for automatic neuron tracing, their results are still far from satisfactory. Manual tracing by well-trained human annotators is still an indispensable step, where the integrity of the tracing results is controlled either by consensus among multiple annotators [18, 17] or by several rounds of examination (unpublished intermediate annotations).
The disadvantage of manual tracing is that it cannot be made systematic and is highly labor-intensive and time-consuming. Due to the complexity of neuron morphology, even well-trained annotators may need Virtual Reality (VR) equipment for correct reconstruction [16]. Therefore, it is important to introduce systematic algorithms to improve quality and efficiency. To this end, we collaborated with neuron annotators and found that the identification of key points, such as branching points or termination points, is an essential step in tracing, because missing or erroneously adding even a single such point may lead to serious topological errors. The identification of these points is also one of the most time-consuming steps in manual tracing. Therefore, neuron annotators can benefit considerably from a computer-aided system that provides guidance for determining these key points.
In this work, we propose a framework that formulates the quality control problem of neuron reconstruction as a binary classification task for points of interest (where the wrong tracing starts). Benefiting from the recent development of deep learning [10, 2, 5, 3, 20], several commonly used networks for 2D image recognition are converted into 3D versions, and their performance on this problem is investigated. Cross validation experiments on a large dataset demonstrate that the proposed approach can not only evaluate the quality of neuron reconstruction, but also provide guidance for the annotators to locate the problem without too many false alerts.
2 Materials and Methods
2.1 Data
The manual reconstruction (or tracing) of neuron morphology consists of several rounds of examination. In each round, an annotator manually traces all the neuronal points, and then another annotator validates the reconstructed results and marks the mistakes, which are corrected in the next round. This process usually needs to be repeated three times, unless no further mistakes are found earlier. The neurons passing the final round are used as the gold standard for correct reconstructions, while the neurons with marked mistakes are used as the incorrect cases.
In this study, there are 254 neurons with correct reconstructions, and each of them has 1 or 2 wrong reconstructions from different rounds (421 wrong reconstructions in total). To clarify the notation, an upper-case letter is used to represent a neuron reconstruction and a lower-case letter is used to indicate a single neuronal point. A reconstructed neuron is stored in an SWC file, which is a standardized neuromorphometric format [7] commonly used for sharing neuron reconstructions as well as for neuronal morphology analysis. Each line in an SWC file represents a neuronal point with its seven properties: the point's identifier number, neuronal type (soma, axon or dendrite), x coordinate, y coordinate, z coordinate, radius, and the identifier number of its parent (a point is known as the child of its parent). In addition, the corresponding 3D optical microscope images, cropped based on the SWC files, are used to provide the intensity information of the neurons and the surrounding background.
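The seven-column layout described above can be read with a few lines of Python. This is only an illustrative sketch, not the authors' tooling: the names `SwcPoint`, `parse_swc` and `children_of` are hypothetical, and it assumes the common SWC conventions that comment lines start with `#` and that a parent identifier of -1 marks a point without a parent.

```python
from typing import Dict, List, NamedTuple


class SwcPoint(NamedTuple):
    """One neuronal point with the seven SWC properties."""
    point_id: int
    point_type: int
    x: float
    y: float
    z: float
    radius: float
    parent_id: int  # assumed convention: -1 means "no parent"


def parse_swc(text: str) -> Dict[int, SwcPoint]:
    """Parse SWC text into a dict keyed by point identifier."""
    points = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank and comment lines
            continue
        fields = line.split()
        if len(fields) < 7:
            raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
        pid, ptype = int(fields[0]), int(fields[1])
        x, y, z, radius = (float(v) for v in fields[2:6])
        points[pid] = SwcPoint(pid, ptype, x, y, z, radius, int(fields[6]))
    return points


def children_of(points: Dict[int, SwcPoint]) -> Dict[int, List[int]]:
    """Map each point identifier to the identifiers of its children."""
    children = {pid: [] for pid in points}
    for p in points.values():
        if p.parent_id in children:
            children[p.parent_id].append(p.point_id)
    return children


# Hypothetical three-point neuron, used only to exercise the parser.
example = """# id type x y z radius parent
1 1 10.0 10.0 5.0 2.0 -1
2 3 12.0 10.0 5.0 1.0 1
3 3 14.0 11.0 5.0 1.0 2
"""
neuron = parse_swc(example)
child_map = children_of(neuron)
```

Keeping the parent links as a child map makes the later parent/child checks on the tree straightforward.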
2.2 Problem Formulation
To determine whether the reconstruction of a neuron is correct or not, an intuitive way is to first generate a 3D binary map based on the SWC file, in which the intensities of the voxels labeled as neuronal points are set to 1 and the intensities of the remaining voxels are set to 0; then add this binary map as another channel to the corresponding optical microscope image; and finally feed the concatenated image into a deep neural network for classification. However, the 3D optical microscope image can be extremely large, i.e., thousands of voxels in each dimension. With such a large image as input, the classifier requires several terabytes of memory and months to train. In addition, due to the sparsity of a single neuron reconstruction, as shown in Fig. 3, this large image may feed too much irrelevant information to the network and lead to inferior performance. Moreover, such a classifier can only provide an evaluation of the whole neuron. It would still be difficult for the annotators to locate the problem among the tens of thousands of points in each neuron.
Therefore, we choose to perform the classification on each single point instead of the whole image. A commonly used approach for point-wise recognition is to crop a small region around a point from the whole image and feed it to a deep neural network to determine the category of this point. However, an incorrect reconstruction does not mean that all of its neuronal points are wrongly traced. A comparison between the correct and the wrong reconstruction is necessary to determine the category of each point.
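As a rough illustration of the point-wise input described above, the following NumPy sketch crops a cubic neighbourhood around one point and stacks the image intensities with a binary map of the traced points as two channels. The function name, the `[x, y, z]` array indexing, the zero padding at the image border and the patch size are all assumptions made for this example; the paper does not specify these details here.

```python
import numpy as np


def crop_two_channel_patch(image, points_xyz, center_xyz, half_size):
    """Return a (2, D, D, D) array around center_xyz, with D = 2 * half_size + 1.

    Channel 0 holds image intensities, channel 1 the binary map of traced
    points. Regions outside the image are zero-padded (an assumption).
    """
    size = 2 * half_size + 1
    center = np.round(np.asarray(center_xyz, dtype=float)).astype(int)
    origin = center - half_size  # patch corner in image coordinates
    patch = np.zeros((2, size, size, size), dtype=np.float32)

    # Channel 0: copy the part of the image that overlaps the patch.
    lo = np.maximum(origin, 0)
    hi = np.minimum(origin + size, image.shape)
    if np.all(hi > lo):
        src = tuple(slice(a, b) for a, b in zip(lo, hi))
        dst = tuple(slice(a, b) for a, b in zip(lo - origin, hi - origin))
        patch[0][dst] = image[src]

    # Channel 1: set voxels containing a traced point to 1.
    local = np.round(np.asarray(points_xyz, dtype=float)).astype(int)
    local = local.reshape(-1, 3) - origin
    inside = np.all((local >= 0) & (local < size), axis=1)
    lx, ly, lz = local[inside].T
    patch[1, lx, ly, lz] = 1.0
    return patch


# Hypothetical 10x10x10 image and three traced points.
image = np.arange(1000, dtype=np.float32).reshape(10, 10, 10)
points = [(5, 5, 5), (6, 5, 5), (0, 0, 0)]
patch = crop_two_channel_patch(image, points, (5, 5, 5), half_size=2)
corner = crop_two_channel_patch(image, points, (0, 0, 0), half_size=2)
```

Only the two traced points that fall inside the first patch appear in its binary channel; the `corner` patch shows the zero padding when the neighbourhood extends beyond the image.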
Considering the possible disturbance of manual tracing, slight deviations of coordinates should be allowed. We define a point $a$ from reconstruction $A$ as having a match in reconstruction $B$ if 1) $A$ and $B$ are reconstructions of the same neuron; and 2) there is a point $b$ in $B$ whose Euclidean distance to $a$ is less than a threshold $T$ (set to 4 in this study), i.e.,
$$\sqrt{(x_a - x_b)^2 + (y_a - y_b)^2 + (z_a - z_b)^2} < T, \tag{1}$$
where $x_a$, $y_a$, and $z_a$ represent the 3D coordinates of point $a$.
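The matching rule can be written as a short brute-force check. This is a minimal sketch: the function name `has_match` is hypothetical, points are plain coordinate tuples, and only the threshold value of 4 comes from the text.

```python
import math

DIST_THRESHOLD = 4.0  # threshold value stated in the text


def has_match(point, reconstruction, threshold=DIST_THRESHOLD):
    """True if some point of `reconstruction` lies strictly within `threshold`."""
    return any(math.dist(point, other) < threshold for other in reconstruction)


# Hypothetical coordinates of a second reconstruction of the same neuron.
other_reconstruction = [(10.0, 12.0, 13.0), (30.0, 30.0, 30.0)]

near = has_match((10.0, 10.0, 10.0), other_reconstruction)    # distance about 3.6
far = has_match((20.0, 20.0, 20.0), other_reconstruction)     # both farther than 4
boundary = has_match((0.0, 0.0, 0.0), [(4.0, 0.0, 0.0)])      # exactly 4, not less
```

The scan is linear in the size of the reconstruction for each query; for neurons with tens of thousands of points, a spatial index would be a natural replacement.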
Although a point $a$ should be labelled as wrongly traced if it does not have any match in the correct reconstruction of the same neuron, this does not necessarily mean that $a$ belongs to the background. As shown in Fig. 6, two neurons can be so close that the annotator accidentally jumps from one neuron to the other, which should still be considered wrong tracing. However, for most points in that branch, such a mistake is theoretically impossible to detect from a small neighbourhood alone, because the branch is actually a correct reconstruction of the other neuron. Therefore, determining whether the reconstruction of a point is correct may require tracing back hundreds of points, which means cropping a large image and may lead to inferior performance due to the interference of redundant regions.
To resolve this issue, we propose to detect only the points that initiate the wrong tracing, instead of all the points that do not belong to the neuron and all the points that have been missed during the reconstruction. A point $a$ from the wrong reconstruction $A$ is defined as a point where the wrong tracing begins under two conditions: 1) $a$ must have a match $b$ in the correct reconstruction $B$; and 2) either there is a child of $a$ that does not have any match in $B$, i.e., a point has been wrongly traced, or there is a child of $b$ that does not have any match in $A$, i.e., a point has been missed. Therefore, the quality control problem of neuron reconstruction is converted to a binary classification between $a$ (where the wrong tracing begins, denoted as POI, i.e., point of interest, for the rest of the paper) and $b$ (the match point of $a$), based on their neighbourhoods in the reconstructions $A$ and $B$, respectively. It is worth mentioning that although such a strategy cannot provide the locations of all the wrongly traced points, offering only the locations of POIs is sufficient, or even better, for the annotators. In order to identify and fix the problem, the annotators need to locate where the wrong tracing begins in any case, and they can easily remove all the following points from there if the reconstruction jumps to another neuron or leaks into background tissues.
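The two conditions can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: each reconstruction is a dict mapping a point identifier to `(coordinates, parent identifier)`, the helper names are hypothetical, and the nearest point within the threshold is taken as "the" match, which the text does not specify.

```python
import math


def find_match(xyz, recon, threshold=4.0):
    """Identifier of the nearest point of `recon` within `threshold`, else None."""
    best_id, best_dist = None, threshold
    for pid, (other_xyz, _parent) in recon.items():
        dist = math.dist(xyz, other_xyz)
        if dist < best_dist:
            best_id, best_dist = pid, dist
    return best_id


def children(recon, pid):
    """Identifiers of all points whose parent is `pid`."""
    return [c for c, (_xyz, parent) in recon.items() if parent == pid]


def find_pois(wrong, correct, threshold=4.0):
    """Identifiers of points in `wrong` where the wrong tracing begins."""
    pois = []
    for pid, (xyz, _parent) in wrong.items():
        match_id = find_match(xyz, correct, threshold)
        if match_id is None:  # condition 1: the point itself must have a match
            continue
        # Condition 2a: a child of the point was wrongly traced.
        extra = any(find_match(wrong[c][0], correct, threshold) is None
                    for c in children(wrong, pid))
        # Condition 2b: a child of the match point was missed.
        missed = any(find_match(correct[c][0], wrong, threshold) is None
                     for c in children(correct, match_id))
        if extra or missed:
            pois.append(pid)
    return pois


# Hypothetical toy neuron: a straight branch along the x axis.
correct = {1: ((0, 0, 0), -1), 2: ((10, 0, 0), 1),
           3: ((20, 0, 0), 2), 4: ((30, 0, 0), 3)}
# The wrong tracing leaves the branch after point 3 and misses point 4.
wrong = {1: ((0, 0, 0), -1), 2: ((10, 0, 0), 1), 3: ((20, 0, 0), 2),
         4: ((20, 10, 0), 3), 5: ((20, 20, 0), 4)}
pois = find_pois(wrong, correct)
```

In this toy case only point 3 is reported: points 4 and 5 are wrongly traced but have no match themselves, so they are neither POIs nor control samples.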
For the rest of the paper, POIs are labelled as 1 to represent the experimental group, while their match points are labelled as 0, indicating that these samples belong to the control group. Although, theoretically, every point in the correct reconstructions could be used as a sample of the control group, using all the correct points would lead to a highly unbalanced training set (6,423 vs. 20,548,189) and require much more training time. Instead, we use only the points whose match points are POIs as the constant control group, and randomly add a few other correct points for training in each iteration.
2.3 Network Architectures
With the recent development of deep learning, many networks have been proposed for 2D or 3D image recognition [10, 2, 3, 11, 1]. However, the concept of a 3D image in the medical imaging community differs from that in the natural image field, where 3D images are mostly point clouds, 2D images with an additional channel for depth information, or videos with time as the third dimension. Each sample used in this study is the concatenation of an optical microscope image and a binary image. It has the format of a 3D image with three spatial coordinates and two channels, and is saved as a 4D matrix. Therefore, most 3D deep learning approaches are hardly applicable to this problem. Instead, we select six network architectures that have delivered state-of-the-art performance on many 2D image recognition problems and convert them into 3D versions: VGG11 and VGG16 [10], ResNet101 and ResNet152 [2], and DenseNet121 and DenseNet201 [3]. The exact configurations of these networks are shown in Table 1. Note that because the cropped image for each point is small, the strides of the first convolutional layers in ResNet101, ResNet152, DenseNet121 and DenseNet201 are set to one to retain sufficient detail for the remaining layers. In addition, the last convolutional layer with a stride of two has been removed from ResNet101 and ResNet152 to ensure that the input to the last pooling layer is larger than one voxel per dimension.
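The effect of these stride changes can be checked with the standard output-size formula for convolution and pooling layers. The numbers below are purely illustrative: the patch size of 32 voxels per dimension is a hypothetical value, and the kernel, stride and padding triples follow the usual 2D ResNet stem and stage layout, assumed here to carry over unchanged to the 3D versions.

```python
def out_size(n, kernel, stride, padding):
    """Output length of a convolution or pooling layer along one axis."""
    return (n + 2 * padding - kernel) // stride + 1


def trace_sizes(n, layers):
    """Spatial size after each (kernel, stride, padding) layer, input first."""
    sizes = [n]
    for kernel, stride, padding in layers:
        n = out_size(n, kernel, stride, padding)
        sizes.append(n)
    return sizes


PATCH = 32  # hypothetical patch size, not taken from the paper

# Assumed standard ResNet downsampling path: 7x7 stem convolution with
# stride 2, 3x3 max pooling with stride 2, then three stride-2 stages.
standard = [(7, 2, 3), (3, 2, 1), (3, 2, 1), (3, 2, 1), (3, 2, 1)]

# Modified path as described in the text: stem stride set to 1 and the last
# stride-2 convolution removed (modelled here by dropping that step).
modified = [(7, 1, 3), (3, 2, 1), (3, 2, 1), (3, 2, 1)]

standard_sizes = trace_sizes(PATCH, standard)
modified_sizes = trace_sizes(PATCH, modified)
```

Under these assumptions the unmodified path shrinks a 32-voxel patch to a single voxel before the final pooling layer, whereas the modified path keeps 4 voxels per dimension.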
[Table 1: Configurations of the 3D versions of VGG11, VGG16, ResNet101, ResNet152, DenseNet121 and DenseNet201.]
The table entries denote 3D convolution, max pooling and average pooling layers with their kernel sizes, as well as fully connected layers. The stride for convolution is 1 unless otherwise stated, and the stride for both max pooling and average pooling is 2.
3 Experiments
3.1 Experimental Setup
The networks are implemented with PyTorch [8] on NVIDIA Tesla P40 GPUs. The parameters of all the networks are randomly initialized without any pretraining. Weighted cross entropy is applied as the loss function, and the weight for each group is inversely proportional to its number of samples. The Adam [4] optimizer is used for optimization without any weight decay. The learning rate is decreased to one-tenth of its value after every 10 epochs. The batch size is set to 15 and the maximum number of epochs is set to 50. Five commonly used metrics are presented to evaluate the performance of the proposed framework: area under the curve (AUC), accuracy, sensitivity, specificity and precision. For all of these metrics, a higher score implies better performance.
First, a five-fold cross validation experiment is performed to evaluate the proposed framework. Following the strategy stated in Section 2.2, the points of interest from the wrong reconstructions and their match points in the correct reconstructions are used for cross validation, which amounts to 6,423 pairs of images extracted from 675 reconstructions of 254 neurons. To avoid potential bias, the images are randomly split into five folds at the neuron level, i.e., the images from different reconstructions of the same neuron belong to the same fold and are used either all for training or all for testing. Therefore, the number of points in each training set may not be identical. In each iteration of the training process, five other points are randomly selected from the correct reconstructions of the neurons belonging to the training set. These points are also used for optimization to prevent potential overfitting, because the errors that annotators make could concentrate in certain regions, and training with only the POIs and their match points may lead to a network with inferior performance in other regions. Hence, each training batch contains five POIs, their five match points and five randomly selected points. Because the control group then contains twice as many points as the experimental group, the weight is set to 1 for the control group and 2 for the experimental group.
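The neuron-level split and the class weights can be sketched in plain Python. The function names and the toy sample list are hypothetical; the only facts taken from the text are the five folds, the rule that all samples of one neuron share a fold, and the batch composition of five POIs, five match points and five random correct points.

```python
import random
from collections import Counter


def assign_folds(neuron_ids, n_folds=5, seed=0):
    """Map each distinct neuron id to a fold index in [0, n_folds)."""
    unique = sorted(set(neuron_ids))
    random.Random(seed).shuffle(unique)
    return {nid: i % n_folds for i, nid in enumerate(unique)}


def inverse_frequency_weights(labels):
    """Class weights inversely proportional to class counts (largest class = 1)."""
    counts = Counter(labels)
    largest = max(counts.values())
    return {label: largest / count for label, count in counts.items()}


# Hypothetical samples: (neuron id, label), four samples per neuron.
samples = [(f"neuron_{i % 10}", i % 2) for i in range(40)]
fold_of = assign_folds(nid for nid, _ in samples)

test_fold = 0
train = [s for s in samples if fold_of[s[0]] != test_fold]
test = [s for s in samples if fold_of[s[0]] == test_fold]

# One batch as described in the text: label 1 for POIs, label 0 for the
# match points and for the randomly selected correct points.
batch_labels = [1] * 5 + [0] * 5 + [0] * 5
weights = inverse_frequency_weights(batch_labels)
```

Splitting on neuron identifiers rather than on individual images is what keeps different reconstructions of the same neuron from appearing in both the training and the test set.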
To further evaluate the generalization ability of the framework, all the other points from the correct reconstructions in the test sets (besides the match points of POIs) are also evaluated with the network that performs best in the cross validation experiment. Note that other points of the wrong reconstructions cannot be used here, because under the current strategy most wrongly traced points should be labelled neither as 1 (POI, where the wrong tracing begins) nor as 0 (correctly traced). Since all the points used in this experiment belong to the control group, the accuracy is equivalent to the specificity.
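A small sketch of the threshold-based metrics makes the last remark concrete. The helper name and the toy predictions are hypothetical, and AUC is left out because it requires continuous scores rather than hard 0/1 predictions.

```python
def binary_metrics(labels, predictions):
    """Accuracy, sensitivity, specificity and precision for 0/1 labels."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)

    def ratio(num, den):
        # A metric is undefined (NaN) when its denominator is zero.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


# Hypothetical predictions for ten control-group points with one false alert.
control_only = binary_metrics([0] * 10, [0] * 9 + [1])

# Hypothetical mixed set: three POIs followed by three control points.
mixed = binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
```

With only control-group points there are no true positives or false negatives, so accuracy and specificity reduce to the same ratio, while sensitivity is undefined.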
3.2 Main Result
The results with different network architectures are shown in Table 2. Note that although some other random points are used for training, the evaluation here covers only the classification of POIs and their match points. The proposed framework with VGG11 as the classifier achieves the best performance, with an average AUC of 94.9% and an average accuracy of 86.6%. The sensitivity (74.7%) and the high specificity (98.6%) indicate that the 3D VGG11 network can detect most POIs with only a few false alerts. This demonstrates the capability of the proposed framework to provide guidance for the annotators in locating where the wrong tracing begins.
Network          AUC (%)     Accuracy (%)  Sensitivity (%)  Specificity (%)  Precision (%)
ResNet101 [2]    84.8 ± 5.1  79.3 ± 3.5    71.3 ± 9.4       87.3 ± 3.9       87.9 ± 3.8
ResNet152 [2]    86.7 ± 5.3  81.8 ± 3.6    71.1 ± 8.7       92.4 ± 1.9       90.5 ± 1.3
DenseNet121 [3]  83.6 ± 6.7  78.8 ± 4.2    70.5 ± 9.2       87.1 ± 8.3       86.8 ± 6.5
DenseNet201 [3]  85.6 ± 4.9  80.5 ± 2.7    72.4 ± 7.1       88.6 ± 2.1       89.9 ± 3.2
VGG11 [10]       94.9 ± 1.4  86.6 ± 2.2    74.7 ± 4.7       98.6 ± 0.4       98.1 ± 0.4
VGG16 [10]       93.6 ± 1.3  85.9 ± 1.6    73.8 ± 4.4       97.0 ± 0.5       98.0 ± 10.6
Table 2: The results of the five-fold cross validation experiments with different network architectures. In each cell, the first number is the average measurement and the second number is the standard deviation.
3.3 Ablation Study
The results with and without additional points from the correct reconstructions used for training are presented in Table 3. The network used in this experiment is the one with the best performance in the cross validation experiment, i.e., VGG11. Although VGG11 trained with additional points has inferior performance on the classification of the POIs and their match points (accuracy of 86.6% vs. 91.2%), its specificity on the other points in the correct reconstructions is consistent with that on the match points of POIs (98.2% vs. 98.6%), suggesting that the network has a decent generalization capability. Such stable and consistent performance demonstrates that the proposed framework can be utilized in practice to provide guidance towards better neuron reconstruction.
Additional points  AUC (%)     Accuracy (%)  Sensitivity (%)  Specificity (%)  Specificity2 (%)
With               94.9 ± 1.4  86.6 ± 2.2    74.7 ± 4.7       98.6 ± 0.4       98.2 ± 0.2
Without            96.0 ± 1.3  91.2 ± 1.7    90.1 ± 4.5       92.2 ± 1.5       82.5 ± 1.9
4 Conclusion
In this study, we proposed a fully automatic framework for the quality control of neuron reconstruction. By formulating the problem as a binary classification task for each neuronal point and leveraging state-of-the-art deep learning technology, the proposed approach achieved a sensitivity of 74.7% on the test set, indicating its capability of detecting wrong tracing, while the high specificity of 98.6% for the match points from the correct reconstructions suggests a low false alert rate. Furthermore, the network trained with a few additional points showed consistent performance on all the other points in the correct reconstructions, which demonstrates that the proposed approach can be used in practice to guide annotators towards more accurate and efficient neuron reconstruction.
References
 [1] Griffiths, D., Boehm, J.: A Review on deep learning techniques for 3D sensed data classification. Remote Sensing 11(12), 1499 (2019)
 [2] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770–778 (2016)
 [3] Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4700–4708 (2017)
 [4] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
 [5] Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., Van Der Laak, J.A., Van Ginneken, B., Sánchez, C.I.: A survey on deep learning in medical image analysis. Medical Image Analysis 42, 60–88 (2017)
 [6] Liu, S., Zhang, D., Liu, S., Feng, D., Peng, H., Cai, W.: Rivulet: 3d neuron morphology tracing with iterative backtracking. Neuroinformatics 14(4), 387–401 (2016)
 [7] O’Halloran, D.M.: Module for SWC neuron morphology file validation and correction enabled for high throughput batch processing. PloS one 15(1), e0228091 (2020)
 [8] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch (2017)
 [9] Peng, H., Ruan, Z., Long, F., Simpson, J.H., Myers, E.W.: V3D enables real-time 3D visualization and quantitative analysis of large-scale biological image data sets. Nature Biotechnology 28(4), 348–353 (2010)
 [10] Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
 [11] Song, S., Xiao, J.: Deep sliding shapes for amodal 3D object detection in RGB-D images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 808–816 (2016)
 [12] Sporns, O., Tononi, G., Kötter, R.: The human connectome: A structural description of the human brain. PLoS Computational Biology 1(4) (2005)
 [13] Wan, Z., He, Y., Hao, M., Yang, J., Zhong, N.: An automatic neuron tracing method based on mean shift and minimum spanning tree. In: International Conference on Brain Informatics. pp. 34–41. Springer (2016)
 [14] Wang, C., Chen, W., Liu, M., Zhou, Z.: Automatic 3d neuron tracing based on terminations detection. In: 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). pp. 1027–1030. IEEE (2018)
 [15] Wang, C.W., Lee, Y.C., Pradana, H., Zhou, Z., Peng, H.: Ensemble neuron tracer for 3d neuron reconstruction. Neuroinformatics 15(2), 185–198 (2017)
 [16] Wang, Y., Li, Q., Liu, L., Zhou, Z., Ruan, Z., Kong, L., Li, Y., Wang, Y., Zhong, N., Chai, R., et al.: TeraVR empowers precise reconstruction of complete 3D neuronal morphology in the whole brain
 [17] Wang, Y., Xie, P., Gong, H., Zhou, Z., Kuang, X., Wang, Y., Li, A.a., Li, Y., Liu, L., Veldman, M.B., et al.: Complete single neuron reconstruction reveals morphological diversity in molecularly defined claustral and cortical neuron types. bioRxiv p. 675280 (2019)
 [18] Winnubst, J., Bas, E., Ferreira, T.A., Wu, Z., Economo, M.N., Edson, P., Arthur, B.J., Bruns, C., Rokicki, K., Schauder, D., et al.: Reconstruction of 1,000 projection neurons reveals new cell types and organization of long-range connectivity in the mouse brain. Cell 179(1), 268–281 (2019)
 [19] Yang, J., Hao, M., Liu, X., Wan, Z., Zhong, N., Peng, H.: FMST: An automatic neuron tracing method based on fast marching and minimum spanning tree. Neuroinformatics 17(2), 185–196 (2019)
 [20] Zhang, Q., Yang, L.T., Chen, Z., Li, P.: A survey on deep learning for big data. Information Fusion 42, 146–157 (2018)