Detection and skeletonization of single neurons and tracer injections using topological methods

03/20/2020
by Dingkang Wang, et al.

Neuroscientific data analysis has traditionally relied on linear algebra and stochastic process theory. However, the tree-like shapes of neurons cannot be described easily as points in a vector space (the subtraction of two neuronal shapes is not a meaningful operation), and methods from computational topology are better suited to their analysis. Here we introduce methods from Discrete Morse (DM) Theory to extract the tree-skeletons of individual neurons from volumetric brain image data, and to summarize collections of neurons labelled by tracer injections. Since individual neurons are topologically trees, it is sensible to summarize the collection of neurons using a consensus tree-shape that provides a richer information summary than the traditional regional 'connectivity matrix' approach. The conceptually elegant DM approach lacks hand-tuned parameters and captures global properties of the data as opposed to previous approaches which are inherently local. For individual skeletonization of sparsely labelled neurons we obtain substantial performance gains over state-of-the-art non-topological methods (over 10% improvement in precision and faster proofreading). The consensus-tree summary of tracer injections incorporates the regional connectivity matrix information, but in addition captures the collective collateral branching patterns of the set of neurons connected to the injection site, and provides a bridge between single-neuron morphology and tracer-injection data.


Summary

Computer Science and Engineering Department, The Ohio State University, Columbus, OH, USA 43210. Cold Spring Harbor Laboratory, NY, USA 11724. Indian Institute of Technology, Chennai, Tamil Nadu, India 600036.

Neuroscientific data analysis has traditionally involved methods for statistical signal and image processing, drawing on linear algebra and stochastic process theory. However, digitized neuroanatomical datasets containing labelled neurons, either individually or in groups labelled by tracer injections, do not fit into this classical framework. The tree-like shapes of neurons cannot be adequately described as points in a vector space (e.g. the subtraction of two neuronal shapes is not a meaningful operation). There is therefore a need for new approaches, which has become more urgent given the growth in whole-brain datasets with sparsely labelled neurons or tracer injections.
Methods from computational topology and geometry are naturally suited to the analysis of neuronal shapes. In this paper we introduce methods from Discrete Morse Theory to extract tree-skeletons of individual neurons from volumetric brain image data, and to summarize collections of neurons labelled by anterograde tracer injections. Since individual neurons are topologically trees, it is sensible to summarize the collection of neurons using a consensus tree-shape. This consensus tree provides a richer information summary than the regional or voxel-based "connectivity matrix" approach that has previously been used in the literature.
The algorithmic procedure for single neuron skeletonization includes an initial neurite-detection step to extract a density field from the raw volumetric image data, followed by Discrete Morse theoretic skeleton extraction from the density field using the 1-(un)stable manifold of the density field. We apply the method to high-resolution 3D volumes of sparsely-labelled neurons to quantitatively extract the single neuron topology, and achieve better performance than state-of-the-art algorithms using non-topological methods (over 10% improvement in precision, and significant speedups at the proofreading stage).
In image volumes of brains with neuronal tracer injections, a summarized consensus tree of the collective neuronal projection patterns is extracted to characterize brain-wide neuronal connectivity. The procedure for building the tree skeleton assigns the weight associated with detected tracer projections locally to the nearest point on the skeleton. This ensures that the resulting weighted skeleton also provides a summary of the tracer projection data: integrating the weighted skeleton over a brain region helps recover the regional "connectivity matrix". Thus the consensus-tree summary of tracer injections incorporates the traditional regional connectivity matrix information, but in addition captures the collective collateral branching patterns of the set of neurons connected to the injection site. We propose that this summary can provide a future means of characterizing tracer injection data, and also provide a bridge to a growing body of single-neuron morphology data.
We find that the DM method is able to trace a tree branch through regions of low intensity that pose challenges to baseline methods. This is a particular strength of the DM approach as it utilizes the global topological structure present in the data, whereas relevant literature methods are inherently spatially local. Additionally the DM approach is theoretically well principled and conceptually clean without multiple ad-hoc hand-engineered steps. There is a significant computational overhead to the topological approaches but we are able to mitigate the speed issues using parallelized implementations of the core algorithms.

Background

Topological Data Analysis and Discrete Morse Theory: Within the last decade or so, topological data analysis (TDA) has grown in influence. Several new TDA methodologies have been proposed for analyzing complex high-dimensional datasets. These approaches use a variety of topological concepts to characterize essential structure behind the data and algorithmic implementations have been developed [EH10, Carl09, CM17, LS13, Tie18]. Some of the ideas (in particular persistent homology [Edelsbrunner2002]) have been applied to multiple application domains [Buchet2018, dario, platt, Lamar, LBD17], including neuroscience [li2017, CG19, KDS2018].
The area of TDA that relates to the present work is the persistence-guided Discrete Morse theory based framework for reconstructing hidden graphs from observed data. Discrete Morse theory has been utilized to capture hidden structure in 2D or 3D volumetric data [DRS15, GDN07, RWS11]. Extraction of hidden graphs was formulated in [2011MNRAS], and the framework was simplified and theoretical guarantees provided in [DWW18]. Morse theory exploits the global (instead of local) topology of a scalar field. Thus the hidden graph skeleton of the scalar field can be extracted reliably through regions of weak signals. Classical Morse theory applies to continuous functions on manifolds. Discrete Morse theory [Forman_DM_1998] provides a discretized computational framework suitable for algorithmic implementation on digitized data. Persistent homology is used to separate signal from noise and remove potentially noise-related structure from the graph. Such a framework has been applied previously to reconstructing hidden road networks from noisy GPS trajectories and satellite images [WWL15, DWW19]. Here we adapt and extend this approach to develop a computational methodology suitable for computational neuroanatomy and to address neuroscientific problems.
In this manuscript we introduce a data analysis framework entitled DM-Skeleton that uses TDA and the Discrete Morse approach to skeletonize individual neurons as well as groups of neurons labelled by tracer injections. For single neuron reconstruction, we show that DM-Skeleton significantly outperforms the best available automated approaches [APP2_Hang2013, Hang_GTree2018] (substantial improvements in F1 scores) and reduces proofreading times. For tracer injection skeletonization, DM-Skeleton provides a conceptually new route to the analysis of mesoscale projection data, and shows robust performance.
Single Neuron Reconstruction: Many methods have been previously proposed for single-neuron reconstruction from high resolution image stacks [Al-Kofahi:2002, Bas2011, Basu2014, Boykov2001, Choromanska2012, Chothani2011, activelearning2014, Lee2008, lee2012, Myatt2012, oh2014mesoscale, sironi2015, SCHMITT20041283, sui2014, Srinivasan2007, turetken2013, Turetken2011, VASILKOSKI2009197, Yang2013, ZHANG2007149, Zhao2011, Zhou1999, Zhou2015, Zhou2016, APP2_Hang2013, smarttracing, Quan_NeuroGPSTree_2016, Hang_GTree2018] (see also surveys and books [Acciai2016, DONOHUE201194, neuronsurvey, arenkiel2014neural] for more detailed discussions). Most of the existing methods sequentially expand a tree from a collection of seed points that are often selected based on density information. A popular line of sequential tracing algorithms uses shortest-path-based approaches, such as APP2 [APP2_Hang2013] and SmartTracing [smarttracing]. More recently, methods using a principal curve to sequentially include more nodes in the reconstruction (e.g., NeuroGPS-Tree [Quan_NeuroGPSTree_2016] and GTree [Hang_GTree2018]) have shown high performance on single neuron reconstruction in mouse brain datasets. Our proposed DM-Skeleton approach shows robust performance gains over APP2 and GTree, and shortens proofreading time. Importantly, it also opens up a conceptually new direction in single neuron tracing by using global topological information in a manner that we feel will generalize beyond this specific work.
Tracer Injection Skeletonization: Brain-wide tracer injection datasets are another area where we apply the DM skeletonization approach. In tracer injection data sets, thousands of neurons are collectively labelled and do not have the topological simplicity of individual trees. Conventionally, brain-wide connectivity information is summarized in the form of regional connectivity matrices [ConnectivityMatrix_chapter3_2016]. Such a representation ignores the fundamental tree-like morphology of neurons and contains no information relating to collateral branching patterns. In this manuscript we introduce a new approach to the analysis of tracer injection data, by skeletonizing the tracer injections using a summary tree. To the best of our knowledge, this is a new approach to conceptualizing tracer-injection data, and could provide a biologically better-grounded approach to the study of mesoscale connectivity mapping using tracer injections. One important advantage of the tree-skeletonization approach is that it connects more directly with single-neuron reconstructions.

It is possible to utilize methods such as APP2 or GTree to perform tracer-injection skeletonization (and we use these as baseline comparison techniques). However, these methods do not automatically interpolate through regions of weak label, a problem that is exacerbated for tracer-injection data. In contrast, the DM based approach utilizes global information and is more robust.


Figure 1: Workflow and sample outputs for the DM-Skeleton pipeline. a, Workflow for fMOST datasets (from sparse neuronal label to individual skeleton extraction) and sample outputs for each step. b, Workflow for tracer injection datasets (from dense neuronal label to injection summarization) and sample outputs for each step.

The DM pipeline for summarizing multiple-neuron tracer injection datasets has multiple stages. First, the raw image stack is preprocessed to detect or highlight neuronal processes [Samik_DM++]. Then a variant of the Discrete-Morse module [WWL15, DWW18] is employed to produce a graph skeleton containing all potential neuronal trajectories. This graph skeleton may contain false positives, so the next stage of our algorithm performs a persistent homology based simplification step which removes superfluous branches in low-density regions and branches that misalign with estimated "flow vectors". This simplified graph is skeletonized into a minimum spanning tree, which is further simplified using a persistence threshold. During all simplification steps, the weights assigned to points on the tree (summarizing the neighboring projection density) are re-assigned so as to preserve the total weight. The resulting weighted tree-summary provides a new way of characterizing tracer injection data, simultaneously capturing regional connectivity information as well as collateral branching patterns.

Results

Summary method overview The high-level workflow of our new Discrete Morse-based (DM-Skeleton) pipeline is shown in Fig. 1. The detailed flows for high-resolution single neuron data (e.g., fluorescent micro-optical sectioning tomography (fMOST)) and for whole-brain 3D densely labelled tracer injection data (e.g., serial two-photon tomography (STP)) are slightly different, as the types of input and goals are different. Nevertheless, both workflows have three main components, namely preprocessing, skeletonization, and simplification. (For details, see the Methods section.)

Figure 2: Illustration and basic concepts for Discrete Morse theory. a, An example of a 2D image (left) converted to a density function, with the graph (terrain) of this function (right). The weak y-shape connection is captured by the mountain ridges (shown in black) on this terrain. b, An example of a 2D terrain (the graph of a 2D function): red points are local maxima, green points are saddles, and blue points are local minima. The white paths are examples of integral paths following the local gradients and ending in minima. The black curves are a collection of 1-stable manifolds (integral paths between maxima and saddles). c, An example of persistence pairs for 1D functions. For a function f, topological features (connected components in this simple case) are 'born' at local minima and 'die' at local maxima. Each persistence pair indicates the birth and death of some feature, born at value b and killed at value d. This gives rise to a point (b, d) in the so-called persistence diagram w.r.t. f (lower pictures). The persistence of this feature is the difference d − b in function values, which can be considered a measure of importance as it gives how "long" the feature persists. In the figures, persistence pairings are marked by red dotted curves. The function shown on the right can be viewed as a noisy perturbation of the function f shown on the left. The function f has 3 prominent features (persistence pairs), while the perturbed version also has additional "smaller" features with lower persistence (importance). As shown in the persistence diagrams (lower row): in addition to the three red persistence points, the diagram of the perturbed function also has blue points close to the diagonal, indicating features whose persistence (i.e., death time − birth time) is small. These small features ("noise") can be detected using their persistence values and removed in our algorithm.
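To make the persistence pairing in panel c concrete, the following is a minimal, self-contained sketch (not taken from the paper's code) that computes birth/death pairs of a 1D sampled function under the sublevel-set filtration using the standard union-find sweep; all names and the example values are illustrative.

import numpy as np

def persistence_pairs_1d(values):
    # Sweep the values from low to high and track connected components with a
    # union-find structure; a component is born at a local minimum and dies
    # when it merges into an older component at a local maximum.
    n = len(values)
    parent = [-1] * n          # -1 marks samples not yet added to the sweep
    birth, pairs = {}, []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in np.argsort(values):
        alive = [j for j in (i - 1, i + 1) if 0 <= j < n and parent[j] != -1]
        roots = {find(j) for j in alive}
        if not roots:                          # new component: a local minimum
            parent[i] = i
            birth[i] = values[i]
        else:
            oldest = min(roots, key=lambda r: birth[r])
            parent[i] = oldest
            for r in roots - {oldest}:         # younger components die here
                pairs.append((birth[r], values[i]))
                parent[r] = oldest
    return pairs                               # the global minimum never dies

f = np.array([2.0, 0.5, 1.8, 0.2, 2.5, 0.9, 2.2])
print(persistence_pairs_1d(f))   # pairs with small (death - birth) are "noise"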

DM-Skeleton takes 3D image stacks as input. In Step 1, the input image is converted to a density field ρ defined on a 3D grid K. For fMOST data, the raw image intensity was treated as the density value subjected to Morse skeletonization. For the STP data, a process-detection step [Samik_DM++] was first applied to segment labelled axon fragments in the high-resolution 2D images. A 3D volume was created to summarize the density of axon fragments within each voxel at lower resolution. This summary volume was the density field subjected to Morse analysis. Given the starting density field ρ, the goal of the DM-Skeleton algorithm is to capture center-lines passing through (relatively) high density regions. Viewing this density map as a terrain (see Fig. 2 for a 2D example), this step corresponds to extracting the 'mountain ridges' of this terrain. This extraction was achieved using 1-stable manifolds from discrete Morse theory.
In Step 2, a variant of the persistence-guided discrete Morse-based framework of [WWL15, DWW18] was applied to the 3D density field ρ. We call the output G the Morse 'graph skeleton'. Note that this approach takes into account the global topology of the density field: the 1-stable manifold connects reliably through low-density regions (e.g., gaps and weak signals around the Y-junction in Fig. 2a). Finally, in Step 3, we extracted either a forest (e.g., each tree representing a single neuron in fMOST data) or a tree summary from the Morse graph skeleton G. The forest output may have false positives, thus we developed a simplification module to further remove such branches (see Fig. 3; more details in Methods).

Figure 3: Summarization and simplification algorithms. a, An example of the spanning forest algorithm. Only one of the trees is shown for simplicity. Assume G is the graph output from Discrete Morse, and T is its weighted shortest path spanning tree. The edges are weighted by the average intensity of both endpoints; larger weights mean smaller distances. Dotted lines represent those edges that are removed from G. b, Score assignment for tree nodes. Each voxel is associated with its nearest neighbor in the set of tree nodes. The score of a tree node v is the sum of the density of the voxels associated with v. An upper bound is set on the distance to avoid distant associations. c, Score smoothing: to reduce noise, we smooth the score of each tree node v by averaging those of its neighbors within k hops (i.e., connected to v via at most k edges). We restrict only to neighbors that are ancestors or descendants of v. An example is shown in the figure, where blue nodes are the k-hop neighbors of node v. d, Simplification process: we utilize two strategies. Strategy 1 (LeafBurner) starts from the leaves and iteratively removes tree nodes with scores at most τ. Strategy 2 (RootGrower) grows the tree from the root by keeping tree nodes with scores at least τ. In the tree on the left, green (red) nodes correspond to nodes with scores higher (lower) than the threshold τ. Deploying different strategies will result in different outputs. e, To achieve a weight-preserving summarization, each tree node has a weight (represented by the size of the dotted green circle) equal to the sum of the weights of its associated voxels. We also assign a thickness value to each tree node (represented by the radius of the orange circle) for better visualization. This thickness value is proportional to the square root of the number of associated voxels.

fMOST sparse labeling dataset. Three semi-manually reconstructed neurons from the fMOST dataset were taken as ground truth. A neighborhood around each soma was taken as the test dataset. More details about the test dataset and the ground truth are provided in Methods. The Morse graph skeleton obtained from the persistence-guided discrete Morse-based framework (Step 2) was fed to the simplification module (Step 3). In Step 3, the positions of somata were taken as given and used as roots to extract a forest such that one tree is rooted at each soma. Note that there exist many automatic methods for reliably detecting soma positions [SomaDetection_Yan_2013, SomaDetection_Cheng_2016]. Automatic methods can be applied to reduce human labor when a region has many somata. The tree rooted at the soma of the target neuron was then simplified. In this step, each tree node was assigned a score based on density (see Supplementary Methods). Next, the RootGrower strategy was applied (see the Methods section) to grow a simplified neuron tree. This strategy (instead of LeafBurner) is more suitable for the fMOST dataset because there is observable autofluorescence scattered in the extracted regions. A bottom-up process (LeafBurner) would stall in such regions when the autofluorescence has high intensity values similar to the somata.
We evaluated the performance of the DM-Skeleton pipeline on three fMOST single-neuron regions. We compared our output with the state-of-the-art GTree [Hang_GTree2018] and APP2 [APP2_Hang2013] methods. For GTree and APP2, we tested different parameters and used the outputs having the fewest gross errors for comparison. We compared the precision, recall and F1-score of the outputs of these three methods against the ground-truth reconstructions (Fig. 4c). In all three regions, both our DM-Skeleton and the GTree method performed better than the APP2 method. Furthermore, on average, our DM-Skeleton method outperformed GTree by around 14% in precision and 3% in recall. This suggests that DM-Skeleton has significantly fewer false positives than the output of GTree.

Figure 4: Results of fMOST single-neuron skeletonization. For the purpose of visualization, all thickness values were suppressed. a, 3D volumes of ground-truth (cyan), DM-Skeleton (red), GTree (yellow) and APP2 (green) outputs. All the methods are further compared in the zoomed-in region (the region in the white square shown on the ground truth) in (b). b, Ground truth (cyan, on the left), and comparisons of zoomed-in regions of the DM-Skeleton and GTree methods. The APP2 output (green) clearly misses many branches and is considered to have the worst performance. False positives and false negatives are highlighted in orange circles. The GTree method has many false positives and the APP2 method misses many branches. Our DM-Skeleton method has a visible advantage over GTree and APP2 in these examples. c, Evaluation on the fMOST dataset; the best scores achieved are highlighted in red. DM-Skeleton used persistence threshold = 256 and simplification threshold = 0.2 for all the cases. The APP2 method does not have good performance in terms of F1-score. On average, when compared with the GTree method, DM-Skeleton has around a 14% advantage in precision and slightly better recall; it achieves the best F1-score in all the regions.

To further illustrate the potential use of our output, we proofread the outputs from GTree and DM-Skeleton using the 3D Virtual Finger tool [Peng_virtual_finger_2014] on the fMOST regions. Proofreading was performed on both the GTree outputs and the DM-Skeleton outputs by two independent annotators without domain-specific knowledge and with no prior knowledge of the ground truth. The annotators also constructed the skeleton from scratch on the region of neuron 1 using the same proofreading tool, with an average time spent of roughly one hour. Supplementary Table 1 shows that the automatic methods reduce the human annotation time by at least two thirds. Precision, recall, and F1-score were calculated to compare the proofread results with the ground truth (Supplementary Table 1). All of these metrics demonstrate that the proofread results, even by naive annotators, were consistent and of high quality. Importantly, the time spent proofreading the DM-Skeleton outputs was much shorter than the time spent proofreading the GTree outputs. The reason is that DM-Skeleton outputs only have minor early-termination issues, which can be corrected by extending the corresponding traces. By contrast, GTree outputs usually have a considerable number of false positives. They also occasionally miss important main branches, which causes difficulties and prolongs the proofreading process.
To utilize our framework, one labeled dataset was used to tune the parameters (for persistence simplification and tree simplification). These parameters were then applied to other datasets with similar contrast and SNR. In the results reported above, Neuron 1 from the fMOST dataset was used to tune the parameters. We chose multiple persistence and tree simplification thresholds. For each such pair of choices, we generated the final tree and computed its F1-score against the ground-truth tree (Supplementary Table 2). We then selected the thresholds that gave the best F1-score and applied the same thresholds to all other fMOST neurons. From Supplementary Table 2, we note that results were reasonably stable to perturbations of the thresholds.
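The tuning step amounts to a small grid search. The sketch below illustrates how one labelled neuron can be used to select the two thresholds that are then reused on similar data; the callables run_pipeline and score_f1 and the grids are hypothetical placeholders, not functions from the released repository.

import itertools

def tune_thresholds(volume, ground_truth, run_pipeline, score_f1,
                    persistence_grid=(64, 128, 256, 512),
                    simplification_grid=(0.1, 0.2, 0.3)):
    # run_pipeline(volume, persistence, simplification) -> reconstructed tree
    # score_f1(tree, ground_truth) -> F1 score against the annotated neuron
    # Both callables are supplied by the caller; the grids are illustrative.
    best = (None, None, -1.0)
    for p, s in itertools.product(persistence_grid, simplification_grid):
        f1 = score_f1(run_pipeline(volume, p, s), ground_truth)
        if f1 > best[2]:
            best = (p, s, f1)
    return best   # (persistence threshold, simplification threshold, best F1)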
STP tracer injection dataset. When applying the DM-Skeleton pipeline to the tracer injection dataset, in Step 1 the original image data (see the section Data Collection) was passed through a hybrid deep CNN with topological priors [Samik_DM++] to detect axon fragments. The output of this automated detection process was then manually proofread and corrected. The proofread images were summarized at lower resolution to construct a manageable 3D volume of the entire brain. The remaining process is similar to the single neuron skeletonization algorithm above, with some notable differences. On one hand, the pre-processed STP data was clean and did not have as much noise as the autofluorescence and bright background in the fMOST data, making the LeafBurner simplification strategy preferable to RootGrower for the STP dataset (see Fig. 3, and also the Methods section). On the other hand, the tracer injection data has more discontinuities and gaps. Therefore, a Gaussian filter (kernel radius 2) was applied to the image stack (after pre-processing) before further processing. The persistence threshold was adjusted to account for bit-depth differences. The simplification threshold for the tracer data was taken to be the same as for the single neuron dataset even though we changed the simplification strategy. This parameter is robust across input image types because an average score of all tree nodes is used as the reference (Supplementary Methods).
The Morse graph skeleton obtained in Step 2 may have some redundant straight segments connecting signals to the boundary of the domain (Supplementary Fig. 2). This boundary effect is caused by the zero-value background and the degenerate gradients at those pixels in the cleaned STP data. Such redundant segments can be easily removed based on the distance to the nearest non-zero voxel.

We can in principle further streamline the output using flow vectors estimated with weighted principal component analysis [Vector_simp_Delchambre_2014]. Paths not well aligned with estimated flow vectors can be removed as long as doing so does not increase the number of connected components that make up the output. In the results shown below, we did not apply this flow vector simplification, but results with and without this step are compared in Supplementary Fig. 4. The output with flow vector simplification is visually cleaner and still has good coverage rates.
In Step 3, the root position was manually selected within the injection site. Then, a shortest path spanning tree was extracted and simplified. In contrast with the single-neuron skeletonization, the tracer injection skeleton has a weight assigned to each tree node after simplification. The weight of a given tree node v represents the total weight of the voxels whose nearest neighbor in the tree node set is v (see Fig. 3, and also the Methods section). This is a weight-preserving process: the weights of the tree nodes of the output preserve the total tracer density of the points assigned to each node.
In addition, for visualization purposes we assigned a thickness value to each tree node. The thickness value uses the number of voxels instead of their weights, to avoid excessively large tree nodes around the injection site (see Fig. 5a and the Methods section). Keeping only the top branches (ranked by length) can provide a more concise summary skeleton (see Fig. 5a).

Figure 5: Skeletonization result on STP data with tracer injection. a, On the left, we show the results in a coronal plane. The result with thickness assigned (top) has tree nodes with greater thickness in regions with higher process density. In the output before thickness assignment (middle), the thickness of every tree node is set to a constant. The top 20 branches are also shown (bottom). Selecting top branches based on total length can provide a concise summarization. The DM-Skeleton result after thickness assignment is presented from other angles on the right. The images and summaries were rescaled along the rostral-caudal direction for isotropic visualization. b, The APP2 outputs (shown in Supplementary Fig. 3) do not have comparable quality to the other two, so the visual comparison here is between GTree and our method. The raw data (bottom) has several challenging regions marked by orange rectangles. The DM-Skeleton method successfully captures these regions, while the GTree method misses them. In addition, the GTree method has many redundant edges in the high density region, which makes it difficult to observe the main skeleton. c, We computed the weight distribution over regions for all summarization methods and the proofread brain with an injection in the primary motor cortex (MOp). The same brain regions in either hemisphere were considered separately. The 5 projection regions with top weights are analyzed for the summarization methods. The injection site is in the ipsilateral hemisphere. The DM-Skeleton method recovers the correct top 5 regions, while the GTree method gives relatively low ranks to the SC and APN regions. In other regions, e.g., the MOp and MOs in the contralateral hemisphere, the GTree method does not capture the signals very well and the APP2 method completely misses those regions. Our DM-Skeleton has the most accurate summary in all cases. The abbreviated region names are Caudoputamen (CP), Secondary motor area (MOs), Primary somatosensory area (SSp), Superior colliculus (SC) and Anterior pretectal nucleus (APN). The suffixes (i) and (c) correspond to the ipsilateral and contralateral hemispheres. DM-Skeleton is abbreviated as DM in the table for better presentation.

Visually, the output of the APP2 method is over-simplified and misses many obvious signals (Supplementary Fig. 3). The GTree and DM-Skeleton methods produce more plausible results (Fig. 5b). In Fig. 5b, we further compared the GTree and DM-Skeleton outputs in several regions. From the raw data, we can see that these regions, corresponding to the motor cortex and striatum in the contralateral hemisphere, contain visible signals. GTree, however, does not have enough branches covering these regions. This may be because GTree is strongly dependent on brightness and contrast, and is not robust enough to handle regions of images with relatively low brightness. This necessitates fine-tuning of the pixel-value histogram prior to running GTree. In contrast, the Discrete Morse-based method is able to overcome these issues because only the relative order of pixel values matters, so effectively there is a built-in histogram localization. Thus the method is automatically adaptive to different types of input images, as well as to different local pixel intensity distributions, with minimal image pre-processing requirements. We also note that even the top few branches from our DM-Skeleton output (the image at the bottom left corner of Fig. 5a) still capture key branches and present good coverage of different brain regions (including those within the orange boxes shown in Fig. 5b, which are missed in the outputs of the GTree and APP2 methods).
To verify the biological interpretability of the skeletonization results, we first mapped the 3D volume of the simplified tree to the Allen mouse brain atlas [AllenMouseBrainAtlas] for regional projection strength analysis (Supplementary Methods). The intensity of projected pixels reflected the weights preserved by tree nodes. The projected segments were categorized into the corresponding brain regions based on the atlas map. We applied the same procedure to the proofread volume. A good summarization should cover most of the signals in the input data. Thus, the regions covered by the skeleton-based summary and the conventional connectivity matrix based summary should coincide. Moreover, the distributions of weights over different regions should be similar. In Fig. 5c, we computed the percentages of (projected) weights within certain regions over the total weights for the proofread data and all three summarizations. Note that the output from GTree is merely a skeletonization without thickness.
We selected the top 5 regions with the highest weights based on the conventional connectivity matrix summary and found the ranks of these regions in the summarization outputs. The DM-Skeleton output has the identical top 5 regions as those calculated from the proofread data, while GTree and APP2 have difficulty capturing the signals in SC and APN. The DM-Skeleton output captured the projection into the contralateral areas, while the GTree output did not cover many of these regions. The APP2 output completely missed the contralateral hemisphere. There are other regions where our weighted tree qualitatively matches the conventional summary, while GTree and APP2 have weight percentages close to zero. The coverage for each hemisphere as well as for the entire brain was calculated individually. Our DM-Skeleton had the best coverage rates in all cases. In the contralateral hemisphere, it outperformed the GTree method by more than 40%. There is a small discrepancy in the weight distribution over regions between the conventional process density summary and the DM-Skeleton based summary. The reason is that DM-Skeleton did not capture some poorly labelled regions, and voxels in such regions may be assigned to a tree node in a neighboring region.
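For reference, a minimal sketch of the regional aggregation described above, assuming the weighted tree nodes have already been registered to atlas space; node_xyz, node_weights, and atlas_labels are illustrative inputs, not variables from the released code.

import numpy as np
from collections import defaultdict

def regional_weight_fractions(node_xyz, node_weights, atlas_labels):
    # node_xyz: (N, 3) integer voxel coordinates of tree nodes in atlas space
    # node_weights: (N,) weights preserved on the tree nodes
    # atlas_labels: 3D integer array of region ids (0 = outside the brain)
    totals = defaultdict(float)
    for (x, y, z), w in zip(node_xyz, node_weights):
        region = int(atlas_labels[x, y, z])
        if region != 0:
            totals[region] += float(w)
    grand_total = sum(totals.values())
    return {r: t / grand_total                 # fraction of total weight per region
            for r, t in sorted(totals.items(), key=lambda kv: -kv[1])}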
In addition, we compared our DM-Skeleton results with single neuron data from the Mouselight project [Mouselight]. Eleven neurons which have soma positions in the injection site of the STP dataset were selected and quantitatively summarized. Visually, the DM-Skeleton output showed good correspondence with this collection of neurons, which were expected to cover similar regions as the STP volume (Supplementary Fig. 5).

Discussion

In this paper we introduced a new conceptual framework of data analysis utilizing Discrete Morse Theory for extracting underlying tree structures and applied it to two neuroscience problems of current importance: the automatic extraction of single neuron skeletons, and biologically meaningful analysis of mesoscale connectivity mapping data. Apart from being conceptually and mathematically well grounded, this approach significantly outperforms state-of-the-art approaches relevant for these problems.
Current approaches to analyzing brain-wide tracer-injection data all start from regional connectivity matrices, which lose the biologically important tree-like structure of neurons composing those projections. Apart from respecting the collateral branching patterns of the underlying set of neurons and providing a natural bridge to single neuron data, the DM-Skeleton method utilizes the global structure of the signal and is robust to noise and missing data, and naturally adaptive to signal intensity variations. We demonstrated the approach using a tracer-injected whole-brain data set. The resulting tree-skeleton faithfully captured the regional "connectivity matrix" information, and significantly outperformed baseline methods. In addition it captured the collateral branching patterns, consistent with corresponding single neuron skeletons with somata in the injection region. We expect that this approach will be useful in the future for mesoscale brain connectivity mapping using tracer injections and form a bridge to the single neuron data.
We also demonstrated significant performance improvements for single neuron reconstruction over existing best-in-class methods. Automated single neuron skeletonization is a well-studied problem with a large associated literature and hundreds of algorithms. Our methodology breaks new ground by taking a different conceptual approach that leads to a simple and theoretically transparent algorithmic framework simultaneously with performance improvements. As in the case of tracer-injection skeletonization, the method takes into account the global structure of the data, is robust to noise and missing data, and is data-adaptive, as construction of the Morse skeleton depends on the order relations between neighboring pixel intensities rather than their absolute values.
While our method achieves strong performance, it is computationally expensive compared with other methods. The computational bottleneck comes from the persistence-guided discrete Morse-based framework. Computing persistence pairings takes O(n³) running time in the worst case (although it is usually significantly faster in practice), where n is the number of cells that make up the input cell complex. To address this bottleneck, we utilized DIPHA [DIPHA_Bauer_2014] to compute persistence pairings. This algorithm is distributed and allows for a significant speedup compared to non-distributed persistence algorithms. Further code optimizations over our current implementation are possible, and run-time can be reduced further in future work. Despite the higher computational complexity, the conceptual elegance and theoretical transparency, the performance improvements including a significant reduction in human proofreading time, and the incorporation of prior biological structure are strong arguments in favor of the approaches proposed here.

References

Data and code availability

The fMOST and STP data were collected as a part of Brain Initiative Cell Census Network and shared online. The raw images of the STP dataset are available from: ftp://download.brainimagelibrary.org:8811/biccn/huang/connectivity/anterograde/180830_JH_WG_Fezf2LSLflp_CFA_female_processed/. The raw images of the fMOST dataset are available from: ftp://download.brainimagelibrary.org:8811/biccn/zeng/luo/fMOST/732664811/. Single neuron reconstruction ground truth are available from ftp://download.brainimagelibrary.org:8811/biccn/zeng/luo/fMOST/cells/732664811/. Processed projection summary of STP data, subregions of fMOST data, the code and documentation are available on Github at https://github.com/wangdingkang/DiscreteMorse.

Acknowledgements

The authors thank Katherine Matho and Kathleen Kelly for helpful discussions on the test datasets.

This work is supported in part by the National Science Foundation under grants CCF-1740761, RI-1815697, and DMS-1547357, and by the National Institutes of Health under grants R01-EB022899, MH114821 and MH114824. We would also like to thank the Crick-Clay Professorship, the Mathers Charitable Foundation, and the H. N. Mahabala Chair, IIT Madras, for their support.

Author contributions

D. W., L. M., S. W., Y. W. and P. M designed the pipeline. D.W. and L.M. implemented the pipeline. B. H., S. B., J. J., K. R., X. L. and M. L. provided the test datasets. D. W. and L. M. collected results on all test datasets, and together with Y. W. and P. M. analyzed the results on fMOST dataset. D. W., B. H., X. L., Y. W. and P. M. analyzed the results on STP dataset. D. W., L. M., B. H., S. B., J. J., X. L., Y. W. and P. M. wrote the manuscript.

Competing interests

The authors declare no competing interests.

Methods

0.1 Data Collection.

The Serial Two-Photon (STP) dataset presented in this paper was collected as a part of the Brain Initiative Cell Census Network [Matho_BICCN]. Specific Cre-dependent transgenic mouse lines were crossed with IslFlp reporter lines. Flp-dependent AAV tracers were utilized to reveal cell type-specific axon connections [Neural_circuits_JoshHuang2013]. Each brain was prepared and imaged using STP tomography [Serial_two_photon_Ragan2012] at high in-plane resolution, and sectioned coronally every 50 μm. Two channels of 16-bit data were collected, where Channel 1 collected the autofluorescence and Channel 2 collected the fluorescent tracer information. Only Channel 2 data were used in the subsequent analysis. One STP dataset was involved in the development and demonstration of the methods in this paper (available from: ftp://download.brainimagelibrary.org:8811/biccn/huang/connectivity/anterograde/180830_JH_WG_Fezf2LSLflp_CFA_female_processed/).

The dual-color fluorescent micro-optical sectioning tomography (fMOST) data presented in this paper, both raw images and single neuron reconstruction data ("ground truth"), were collected as a part of the Brain Initiative Cell Census Network and downloaded from the Brain Image Library (available from http://www.brainimagelibrary.org/). Specific Cre-dependent transgenic mouse lines were crossed with reporter transgenic mouse lines. Sparse labeling of the entire neuron was achieved by injecting a small amount of viral tracer. The brain was fixed and embedded in resin before being mounted for fMOST imaging at high in-plane resolution with serial sectioning. Two channels of data were collected, with 16-bit data in each channel [Dual_color_imaging_Gong2016]. Only the green channel contained the neuron tracing information and was therefore used in the current study. One fMOST dataset was involved in the development and demonstration of the methods in this paper (available from: ftp://download.brainimagelibrary.org:8811/biccn/zeng/luo/fMOST/732664811/). Single neuron reconstruction was performed using a combination of automatic tracing and manual annotation [TeraVR_Wang2019]. Information about the nodes and edges of each individual neuron was stored in separate files in SWC format (available from: ftp://download.brainimagelibrary.org:8811/biccn/zeng/luo/fMOST/cells/732664811/).

0.2 Data pre-processing.

For fMOST data, neighborhoods of 3 reconstructed neurons were taken as the test datasets. Each neighborhood volume was saved as one VTK file (simple legacy format). The associated single neuron reconstruction data, saved in SWC format, were taken as ground truth for the subsequent analysis. The VTK and SWC files are available on https://github.com/wangdingkang/DiscreteMorse.

The STP dataset was first processed with a topologically motivated convolutional neural network [Samik_DM++] for the detection of the tracers. The network, termed DM++, takes in whole STP sections and divides them into tiles. These tiles are passed through a topological algorithm based on Discrete Morse theory [DWW17] and a CNN counterpart for determining the topological and neuronal priors, respectively. The topological priors capture faint connectivity and are used to boost the performance of the CNN in a supervised Siamese setting, with the dual priors incorporated into the DM++ framework. The final output likelihood map is converted into a binary mask for the neuronal processes using an optimal, empirically determined threshold. This captures most of the processes in the tiles, which are then stitched back together to form a mask for an entire reference section of the brain.

The preliminary outputs of process detection were manually verified for the entire brain by an experienced neuroanatomist using Fiji [FIJI_Schneider2012]. Briefly, the preliminary outputs consisting of detected signal were masked with the original brain section image and error corrected using the pixel painting tool. The filled processes were identified as those having a brighter intensity compared to the background.

The proofread brain from the previous step was annotated in the format of binary images. The images were further downsampled to the desired resolution using the sum method, which preserved all the signal information. Volumes of neuronal axons were summarized by counting the number of full-resolution pixels containing fluorescent signal within each downsampled voxel of the coronal sections, while keeping the inter-section spacing the same. The resulting images are 8-bit, and the value of each pixel indicates the signal strength of its corresponding voxel. The image stack was saved in VTK simple legacy format and is available at https://github.com/wangdingkang/DiscreteMorse.

0.3 DM-Skeleton

This section provides intuition and descriptions for the DM-Skeleton pipeline; recall the workflow in Figure 1.

0.4 Step 1: Pre-processing

Each image volume in VTK format was loaded as an image stack and converted into a density field ρ defined on the cubical complex (3D grid) K, where each vertex corresponds to a voxel in the input image and has a density value. If the input is fMOST data, then the density value at each voxel is simply the pixel value in the input raw image stack. For whole-brain tracer injection STP data, a significant portion of the raw images is background (see the example in Figure 1). Hence we first apply the learning-based process-detection module [Samik_DM++] to remove the background and segment the foreground. We further apply a Gaussian filter to smooth the values across the domain.
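As a rough sketch of this step for the tracer-injection case, assuming the detected axon-fragment counts have already been exported as a 3D NumPy array (file names are placeholders, and the sigma is only an approximation of the kernel radius mentioned in the text):

import numpy as np
from scipy.ndimage import gaussian_filter

density = np.load("detected_fragment_counts.npy").astype(np.float32)  # placeholder path
# Smooth across small gaps; sigma = 2 is an assumption standing in for the
# "kernel radius 2" Gaussian filter described in the text.
density = gaussian_filter(density, sigma=2.0)
np.save("density_field.npy", density)

For fMOST data the raw intensities are used directly as the density field and this smoothing is skipped.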

0.5 Step 2: Skeletonization.

The Discrete Morse graph reconstruction algorithm [WWL15, DWW18] takes a density field of any dimension as input, and outputs a graph skeleton capturing center-lines passing through relatively high density regions. In our case, the input density field ρ is defined at the vertices of a cubical complex K of the domain. In all subsequent operations, only the 2-skeleton of this cubical complex is needed; that is, we assume K consists of vertices, edges, and squares. Cubical (3-dimensional) cells are not needed in the algorithm.

To explain the main idea, consider first the smooth case where we have a smooth density function f over the domain. Consider the terrain of the density function values plotted over the domain (Fig. 2a shows the terrain of a function defined on a 2D domain). The underlying graph skeleton of f can be captured by the mountain ridges of this terrain (Fig. 2a). Locally, the density along these mountain ridges is higher than the density off of them. These ridges form the so-called 1-stable manifold of the function f in Morse theory, and are defined by the integral lines "connecting" local maxima to saddle points (Fig. 2b). (An integral line is a curve in the domain such that at any point on it, its tangent vector coincides with the gradient of the density field. Integral lines are thus intuitively flow lines, following the steepest descending direction of the density field.)

Inside the algorithm, roughly speaking, ridges are associated with certain persistence values, as captured by the so-called persistent homology [Edelsbrunner2002], which can be interpreted as an importance score. This makes it easy to filter out ridges of low importance, which are typically associated with noise, from the final output by providing the algorithm with a persistence threshold. An example of simplification for a 1D function is shown in Fig. 2c.

For the input to our algorithm, we have a density field ρ defined at the vertices of a cubical complex K of the domain (a 3D region). Following [DWW18], discrete Morse theory [Forman_DM_1998] is used to capture the mountain ridges mentioned above, combined with the persistence algorithm to measure importance. See Supplementary Methods for a description of the algorithm. To improve the efficiency of the algorithm, we modified the algorithm of [DWW18] so that it works directly with cubical complexes and also uses DIPHA [DIPHA_Bauer_2014] to compute persistence pairs in a distributed manner.

These mountain ridges cover the neural branches because, locally, points along the neural branches tend to have relatively higher density (signal strength) than points off the branches. The global nature of the 1-stable manifolds makes the output skeleton robust to small gaps in signal, and effective at capturing junctions; see e.g. Fig. 2a, where the 1-stable manifold connects through the low-density region around the Y-junction.

In ideal circumstances, we would find a persistence threshold that would remove all of the noise and keep only the ridges that make up the true neuron tree. However, because of the noisy nature of biological data, and because the Discrete Morse graph reconstruction algorithm will not necessarily output a tree, we cannot take the algorithm's output as a final output. Instead, we first run the algorithm with a low persistence threshold such that we do not remove any ridges that would be part of an ideal output. Then we simplify the Morse graph skeleton in the next step.

0.6 Step 3: Simplification.

The output of the above persistence-guided Morse-based framework is a geometric graph G, also referred to as the Morse graph skeleton. Next, G is converted into a spanning forest F. Prior to this, both the boundary simplification and the estimated-flow-vector simplification can be applied to G to limit the number of unnecessary branches in the spanning forest. (The estimation of the flow vector at each graph node is described under Initial vector-based Morse graph simplification in the Supplementary Methods.) A further simplification strategy removes false positives and provides an option to control the level of detail presented in the final summarization.

0.7 Extraction of a forest.

The Morse graph skeleton G extracted in Step 2 already serves as a good initial skeleton. Each arc in the graph G is realized by a polygonal path consisting of edges from the input grid K. Then, a forest F (a collection of rooted trees) is extracted from G (see Fig. 3a); F is in fact a spanning forest of G (that is, F contains all vertices of G, and the edges of F are a subset of the edges of G). Here we assume we are given the set of tree roots R. In our experiments, the positions of the roots correspond to soma locations for fMOST data and injection sites for tracer injection data. The injection site and soma positions can be provided by human annotators or automatically detected [SomaDetection_Yan_2013, SomaDetection_Cheng_2016].

With G and the roots R, a weighted shortest path spanning forest algorithm is applied to produce the spanning forest F. Intuitively, the weight of an edge depends on the density values of its endpoints, and edges in regions of higher density should have larger weights (i.e., smaller distances). The details of the weighted shortest path spanning forest algorithm are given in the Supplementary Methods. We remark that it may also be reasonable to use the minimum spanning forest of G. However, the shortest path spanning forest mimics the natural process of tracers spreading from the injection site, and has better empirical performance than the minimum spanning forest.

0.8 Tree simplification.

Now consider any tree T from the initial spanning forest F constructed above.

The simplification based on persistence in Step 2 can remove false-positive branches to some extent; however, false positives with small density surrounded by background can still remain in each tree T. We further develop the following simplification strategy to remove these false positives.

First, scores are assigned to each tree node in the tree T; the details of calculating node scores are explained in the Supplementary Methods. Next, one of the following two strategies is chosen to simplify T based on the scores. The first strategy, called LeafBurner, starts from the leaves of T and iteratively removes leaves with scores at most τ to obtain the final simplified tree. An originally internal tree node becomes a leaf once all of its children have been removed by LeafBurner in previous iterations. The second strategy, called RootGrower, instead expands a simplified tree from the given root and gradually includes tree nodes from T with scores at least τ. At any moment, only children of nodes already included in the simplified tree in earlier iterations are considered. Each strategy has its own merits and disadvantages. For RootGrower, gaps (weak connections) might cause breaks, while LeafBurner can potentially get stuck at branches of remote noise with high scores and fail to remove them (see Fig. 3d).
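A minimal sketch of the two strategies on a rooted tree given as a children map with precomputed node scores; this illustrates the logic only and is not the released implementation. The toy example at the end also shows the failure modes mentioned above.

def leaf_burner(children, scores, tau, root):
    # Iteratively remove current leaves whose score is at most tau.
    kept = set(children)
    changed = True
    while changed:
        changed = False
        for v in list(kept):
            if v == root:
                continue
            is_leaf = not any(c in kept for c in children[v])
            if is_leaf and scores[v] <= tau:
                kept.discard(v)
                changed = True
    return kept

def root_grower(children, scores, tau, root):
    # Grow from the root, keeping only children whose score is at least tau.
    kept, frontier = {root}, [root]
    while frontier:
        v = frontier.pop()
        for c in children[v]:
            if scores[c] >= tau:      # children of skipped nodes are never reached
                kept.add(c)
                frontier.append(c)
    return kept

children = {0: [1, 2], 1: [3], 2: [], 3: []}          # toy tree rooted at 0
scores   = {0: 5.0, 1: 0.1, 2: 3.0, 3: 4.0}
print(sorted(leaf_burner(children, scores, tau=1.0, root=0)))  # high-score leaf 3 shields node 1
print(sorted(root_grower(children, scores, tau=1.0, root=0)))  # low-score node 1 disconnects node 3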

0.9 Weight-preserving summarization

In this process, our goal is to summarize the voxels in the input proofread brain while conserving the weight information. The weights of voxels in a certain region should be assigned to a tree node of the DM-Skeleton output in the same region. We use a similar idea as when assigning scores in the simplification step, but now only use the voxel weights. The summarization weight of a tree node v is the total weight of the voxels whose nearest neighbor in the tree node set is v. Additionally, there is an upper bound on the distance allowed; voxels farther than this bound from every tree node contribute no weight. The upper bound is intentionally large to avoid missing weights. For the STP dataset a fixed upper bound (in μm) was used.
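A minimal sketch of this assignment using scipy's k-d tree with a distance upper bound; coordinates are assumed to be in the same units as the bound, and the array names are illustrative.

import numpy as np
from scipy.spatial import cKDTree

def assign_weights(node_xyz, voxel_xyz, voxel_weights, max_dist):
    # node_xyz: (M, 3) tree-node coordinates; voxel_xyz: (N, 3) voxel coordinates
    kdtree = cKDTree(node_xyz)
    dist, idx = kdtree.query(voxel_xyz, distance_upper_bound=max_dist)
    node_weight = np.zeros(len(node_xyz))
    assigned = idx < len(node_xyz)       # query returns index == M when nothing is in range
    np.add.at(node_weight, idx[assigned], np.asarray(voxel_weights)[assigned])
    return node_weight                   # sums to the total weight of all assigned voxels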

0.10 Additional features.

For visualization purposes, a "thickness value" can be assigned to each tree node in accordance with the SWC format (Fig. 3e). When calculating the density score for each tree node in the simplified tree (Supplementary Methods), voxel information is leveraged in the surrounding area of that tree node. Here a similar idea is applied: we associate each voxel in the segmented foreground (if the input was not preprocessed in Step 1, as with the fMOST data, we use a low threshold to remove background) to its nearest tree node within a bounded distance. The thickness value of a node is proportional to the square root of the number of associated voxels. More specifically, r(v) = c · sqrt(n(v)), where c is a constant, r(v) is the radius (thickness) assigned to node v, and n(v) is the number of associated voxels. Any tree node with no associated voxels is set to have the minimum non-zero radius. Empirically, a fixed default value of c is used. In the experiment on the STP dataset, we set the distance bound (in μm) to give the best visualization (Fig. 5a).

An option is provided for users to control the desired level of detail. In particular, given the simplified tree rooted at r, an "importance" value is assigned to each branch, and the top k (a user-provided threshold) branches are then greedily selected. Here, a branch is a unique path from the root node r to a degree-1 (leaf) node. It turns out that branch length works well as the importance value. An example on STP data is shown in Fig. 5a.
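A minimal sketch of the branch-selection option, on a tree given as parent pointers and node coordinates; all names are illustrative and this is not the released implementation.

import numpy as np

def top_k_branches(parent, pos, leaves, k):
    # parent: node -> parent node (root maps to None); pos: node -> (x, y, z)
    def path_to_root(v):
        path = [v]
        while parent[path[-1]] is not None:
            path.append(parent[path[-1]])
        return path

    def branch_length(path):
        return sum(np.linalg.norm(np.asarray(pos[a]) - np.asarray(pos[b]))
                   for a, b in zip(path, path[1:]))

    ranked = sorted(leaves, key=lambda v: branch_length(path_to_root(v)), reverse=True)
    kept = set()
    for leaf in ranked[:k]:              # union of the k longest root-to-leaf paths
        kept.update(path_to_root(leaf))
    return kept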

Supplementary Methods

0.11 Implementation in the discrete setting

The following provides a more detailed description of how the persistence-guided Morse-based graph framework is implemented. For even more specifics, please refer to [Forman_DM_1998] for an introduction to Discrete Morse Theory and to [DWW18] for more details on the algorithm we implement.

1: Persistence computation: compute the persistence pairings P(−ρ) induced by the lower-star filtration of K with respect to −ρ.
2: Obtain the simplified discrete gradient vector field: initialize the trivial vector field; for each persistence pair with persistence at most δ, perform the cancellation if possible.
3: Collect output: compute the 1-unstable manifold of each critical edge with persistence greater than δ.
4: return the union of the 1-unstable manifolds
Algorithm 1: G = DiMorSC(K, ρ, δ)

The Discrete Morse algorithm is traditionally given a triangulation K of the domain and a density function ρ given at the vertices of K. Instead of a triangulation, we take K to be the 2-skeleton (vertices, edges, and squares) of the cubical complex of the domain. This does not change any part of the algorithm and reduces computation time. Additionally, the user provides the algorithm with a persistence threshold δ.

0.12 Step 1

The first step of the algorithm is to compute the persistence pairings P(−ρ) induced by the lower-star filtration of K with respect to −ρ. In our implementation, we use DIPHA [DIPHA_Bauer_2014] to compute persistence because it is a distributed persistent homology algorithm that we found minimizes computation time.
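We cannot reproduce the DIPHA call here, but as a conceptual stand-in, the GUDHI library can compute persistence pairs of a cubical complex built from a 3D density array. Note that GUDHI filters by top-dimensional cells rather than the lower-star filtration of −ρ used in the pipeline, so this is only an illustration of the kind of output Step 1 produces; the file path and threshold are placeholders.

import numpy as np
import gudhi

density = np.load("density_field.npy")                         # placeholder path
cubical = gudhi.CubicalComplex(top_dimensional_cells=-density)  # negate so ridges of rho matter
pairs = cubical.persistence()                                  # list of (dimension, (birth, death))
delta = 256
significant = [(dim, (b, d)) for dim, (b, d) in pairs if d - b > delta]
print(len(significant), "features with persistence above delta")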

0.13 Step 2

The second step of the algorithm is to compute the discrete gradient vector field. As shown in [DWW18], all that is needed is to calculate the spanning forest made up of all negative edges (edges that are paired with a vertex in P(−ρ)) with persistence less than or equal to δ. Positive edges (edges paired with a square) and edges with persistence greater than δ are not part of the spanning forest. No explicit discrete gradient vector field needs to be computed or maintained. This step takes linear time once the persistence pairings are computed in Step 1.

0.14 Step 3

The third step of the algorithm is to compute the 1-unstable manifold of each critical edge. As shown in [DWW18], for each such edge, the 1-unstable manifold is equivalent to the union of the edge with the paths from both of its vertices to the sinks of their corresponding trees in the spanning forest computed in Step 2. The union of all 1-unstable manifolds is output by the algorithm. Note that the 1-unstable manifold of −ρ is equivalent to the 1-stable manifold of ρ.
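A simplified sketch of Steps 2 and 3 on an abstract graph representation (vertex densities plus edge lists with precomputed persistence values). As a stand-in for the pairing structure, each tree's sink is taken to be its highest-density vertex (the local maximum of ρ, i.e. the minimum of −ρ); the real implementation works on the cubical complex and the pairing information directly.

from collections import defaultdict, deque

def morse_skeleton(rho, negative_edges, critical_edges, delta):
    # rho: dict vertex -> density; edges are (u, v, persistence) triples.
    # Step 2: the forest spanned by negative edges with persistence <= delta.
    adj = defaultdict(list)
    for u, v, pers in negative_edges:
        if pers <= delta:
            adj[u].append(v)
            adj[v].append(u)

    # Root each tree at its sink (here: its highest-density vertex) and record
    # parent pointers toward that sink.
    parent = {}
    for start in list(adj):
        if start in parent:
            continue
        comp, queue, seen = [start], deque([start]), {start}
        while queue:
            for y in adj[queue.popleft()]:
                if y not in seen:
                    seen.add(y); comp.append(y); queue.append(y)
        sink = max(comp, key=lambda x: rho[x])
        parent[sink] = None
        queue = deque([sink])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in parent:
                    parent[y] = x
                    queue.append(y)

    def to_sink(v):                       # unique path from v up to its tree's sink
        path = [v]
        while parent.get(path[-1]) is not None:
            path.append(parent[path[-1]])
        return path

    # Step 3: the 1-unstable manifold of each remaining critical edge is the edge
    # plus the two vertex-to-sink paths; each is returned as one polyline.
    return [list(reversed(to_sink(u))) + to_sink(v)
            for u, v, pers in critical_edges if pers > delta]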

0.15 Initial vector-based Morse graph simplification.

Given the Morse skeleton graph computed in Step 2 of the DM-Skeleton pipeline, at the beginning of Step 3 of our pipeline we can perform a flow-vector based graph simplification to remove branches that are misaligned with the underlying flow vectors. In particular, we first estimate flow-vectors to get a sense of the direction in which true neuron branches flow. To do this, we use a weighted principal component analysis [Vector_simp_Delchambre_2014]. In standard principal component analysis, the principal component is the direction that explains the most variance of a collection of data points; in weighted principal component analysis, each data point contributes to the variance in proportion to its weight. For each vertex in the Morse graph output, we compute this flow-vector over a cubic neighborhood, assigning each point in the neighborhood a weight equal to its intensity. We then apply Gaussian diffusion to all computed flow-vectors for smoothing. Note that flow-vector estimation is carried out only for nodes in the Morse skeleton graph.
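The following is a hedged sketch of the weighted-PCA flow-vector estimate at a single vertex; the neighbourhood half-width `r` is an illustrative parameter, and the subsequent Gaussian smoothing of the flow-vectors is omitted.

```python
# Sketch of the flow-vector estimate: weighted PCA over a cubic neighbourhood,
# with each voxel weighted by its intensity.
import numpy as np

def flow_vector(image, center, r=2):
    """Principal direction of the intensity-weighted neighbourhood of `center`."""
    z, y, x = center
    patch = image[max(z - r, 0):z + r + 1,
                  max(y - r, 0):y + r + 1,
                  max(x - r, 0):x + r + 1].astype(float)
    coords = np.indices(patch.shape).reshape(3, -1).T      # voxel offsets in the patch
    w = patch.reshape(-1)
    if w.sum() == 0:
        return np.zeros(3)                                  # no signal in the neighbourhood
    mean = (w[:, None] * coords).sum(0) / w.sum()           # weighted centroid
    d = coords - mean
    cov = (w[:, None, None] * d[:, :, None] * d[:, None, :]).sum(0) / w.sum()
    eigvals, eigvecs = np.linalg.eigh(cov)                  # weighted covariance
    return eigvecs[:, -1]                                   # direction of largest weighted variance
```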

Next, each maximal path in the Morse skeleton graph whose endpoints have degree other than two is computed. Then, for each vertex $v$ in each path $\pi$, we estimate a path-vector at $v$ w.r.t. $\pi$ by taking the difference between the two vertices that lie 4 hops from $v$ along the path, one in each direction. We compute the cosine between this path-vector and the estimated flow-vector at $v$ and take this as the vector score at $v$. The closer the value is to 1, the more the flow-vector and path-vector are aligned, meaning the Morse skeleton is better aligned with the estimated flows. Conversely, the closer the value is to 0, the less the vectors are aligned, indicating that the Morse skeleton may deviate significantly from the estimated flows. Note that a vertex of degree greater than two receives a score for each path it is part of.
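A small sketch of the vector score along one path is given below; taking the absolute value of the cosine (since a PCA direction has arbitrary sign) and clamping near the path ends are assumptions made for the example.

```python
# Sketch of the vector score: the path-vector at v is the difference between
# the vertices 4 hops before and after v on the path, and the score is the
# absolute cosine with the estimated flow-vector at v.
import numpy as np

def vector_scores(path_coords, flow_vectors, hops=4):
    """path_coords: (m, 3) vertices of one path; flow_vectors: (m, 3)."""
    scores = np.zeros(len(path_coords))
    for i in range(len(path_coords)):
        a = path_coords[max(i - hops, 0)]
        b = path_coords[min(i + hops, len(path_coords) - 1)]
        path_vec = b - a
        flow = flow_vectors[i]
        denom = np.linalg.norm(path_vec) * np.linalg.norm(flow)
        scores[i] = abs(path_vec @ flow) / denom if denom > 0 else 0.0
    return scores                     # 1 = aligned with the flow, 0 = orthogonal
```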

In addition to these vector scores, a capped intensity value is calculated for each vertex in the Morse skeleton graph. This is simply the minimum of the corresponding voxel value of the vertex and a user-provided value. For our experiments, the value provided is 1.

Finally, a score is computed for each path from the vector scores and capped intensity values. Specifically, for each vertex $v$ in a path $\pi$, let $I(v)$ denote the capped intensity value of $v$ and $\mathrm{vec}_\pi(v)$ the vector score of $v$ w.r.t. $\pi$. The score of $\pi$ is the sum over its vertices of these two quantities, combined with a user-provided weight parameter (set to zero in our experiments), divided by the length of $\pi$.

Once scores are computed for each path, the simplification process begins. Only paths below a user-provided threshold are removed. In increasing score order, paths are removed if doing so does not change the number of connected components in the Morse graph skeleton. If a path is removed, it is possible that paths with lower scores that were not removed will now not alter connectivity if removed, and thus are again tested for removal. The simplified Morse skeleton graph is then fed into a weighted shortest spanning forest algorithm.
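The sketch below illustrates the removal loop, using networkx purely for the connectivity test; the data layout (a list of scored vertex paths) is an assumption made for the example.

```python
# Sketch of the greedy simplification: candidate paths (score below the user
# threshold) are removed in increasing score order whenever removal does not
# disconnect the Morse skeleton; removals are retried until nothing changes.
import networkx as nx

def simplify_by_paths(G, scored_paths, threshold):
    """G: undirected Morse skeleton; scored_paths: list of (score, [v0, ..., vk])."""
    candidates = sorted(p for p in scored_paths if p[0] < threshold)
    changed = True
    while changed:
        changed = False
        for score, path in list(candidates):
            edges = list(zip(path, path[1:]))
            if not all(G.has_edge(u, v) for u, v in edges):
                continue                                    # path already partly removed
            H = G.copy()
            H.remove_edges_from(edges)
            H.remove_nodes_from([n for n in path[1:-1] if H.degree(n) == 0])
            if nx.number_connected_components(H) == nx.number_connected_components(G):
                G = H                                       # commit the removal
                candidates.remove((score, path))
                changed = True
    return G
```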

0.16 Weighted shortest spanning forest algorithm

Given a graph $G$ in which each node is a point in $\mathbb{R}^3$, let $R \subseteq V(G)$ denote the set of roots for the spanning forests. We then compute a weighted shortest spanning forest for $G$ as follows. For any edge $(u, v)$, its weight $w(u, v)$ is defined in terms of the Euclidean distance between $u$ and $v$ and of the density map $\rho$. Using these weights (as distances), the algorithm computes the shortest-path distance from each root in $R$ to every node of $G$. Each root $r \in R$ then spans the tree consisting of all nodes whose shortest-path distance to $r$ is smaller than their distance to any other root in $R$ (ties are broken arbitrarily).
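A minimal multi-source Dijkstra sketch of this assignment is shown below; the particular density-dependent edge weight in the code is only a placeholder, since the exact weight function is not reproduced here.

```python
# Sketch of the weighted shortest spanning forest: a multi-source Dijkstra
# that labels every node with its closest root. The edge weight (Euclidean
# length scaled down by local density) is a placeholder.
import heapq
import numpy as np

def shortest_spanning_forest(coords, adjacency, density, roots):
    """coords: (n, 3); adjacency[u]: iterable of neighbours of u; density: (n,)."""
    def weight(u, v):
        d = np.linalg.norm(coords[u] - coords[v])
        return d / (1.0 + density[u] + density[v])          # placeholder weight

    dist = {r: 0.0 for r in roots}
    label = {r: r for r in roots}                            # nearest root per node
    heap = [(0.0, r, r) for r in roots]
    heapq.heapify(heap)
    while heap:
        d, u, root = heapq.heappop(heap)
        if d > dist.get(u, np.inf):
            continue
        for v in adjacency[u]:
            nd = d + weight(u, v)
            if nd < dist.get(v, np.inf):
                dist[v], label[v] = nd, root                 # ties broken arbitrarily
                heapq.heappush(heap, (nd, v, root))
    return label                                             # root assignment per node
```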

0.17 Score initialization and smoothing

We now describe how to compute a score for all nodes of a given tree (from the spanning forest), so as to carry out the tree-simplification procedure described in the Methods section.

Consider a specific tree $T$ from the weighted shortest-path forest, with root $r$. We first calculate two temporary scores carrying different types of information for each tree node $v$ in $T$: a density score $s_{\mathrm{den}}(v)$ based on density information, and a vector score $s_{\mathrm{vec}}(v)$ based on directional information. The weighted score of each tree node is the weighted sum of these two temporary scores, i.e., $s_w(v) = \lambda\, s_{\mathrm{den}}(v) + (1-\lambda)\, s_{\mathrm{vec}}(v)$ with $\lambda \in [0, 1]$. Empirically, a large $\lambda$ is used because density is the main indicator of false positives. The details of computing $s_{\mathrm{den}}$ and $s_{\mathrm{vec}}$ are given in the following sections. After the weighted score is computed for all tree nodes, we further smooth it to obtain the final score $s$.

0.18 (1). Calculate density scores for tree/graph nodes.

The density score is calculated from voxel density. For each voxel, we first find its nearest neighbor among the tree nodes and associate the voxel with that node. To avoid associating a voxel with a tree node far away, we impose an upper bound $\sigma$: a voxel is not associated with any tree node if the distance to its nearest neighbor exceeds $\sigma$. Empirically, on both the tracer-injection and fMOST single-neuron datasets, the default upper bound is set to the distance between adjacent brain slices of the respective dataset. If a voxel has multiple nearest neighbors, the tie is broken arbitrarily. The density score of a tree node $v$ is simply the sum of the density values of all its associated voxels, i.e.,

$s_{\mathrm{den}}(v) = \sum_{x \in \mathcal{V} \,:\, \mathrm{nn}(x) = v} \rho(x),$

where $\mathcal{V}$ denotes the set of all voxels, $\rho$ is the density map, $\mathrm{nn}(x)$ is the tree node associated with voxel $x$, and $v$ ranges over the node set of the simplified tree (Fig. 3b).
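This mirrors the voxel-to-node association used for the thickness values; a short sketch (with illustrative names) sums voxel densities instead of counting voxels.

```python
# Sketch of the density-score computation: each voxel is associated with its
# nearest tree node within the upper bound `sigma`, and a node's score is the
# sum of the densities of its associated voxels.
import numpy as np
from scipy.spatial import cKDTree

def density_scores(node_coords, voxel_coords, voxel_density, sigma):
    kd = cKDTree(node_coords)
    dist, idx = kd.query(voxel_coords, distance_upper_bound=sigma)
    valid = idx < len(node_coords)                   # voxels within the bound
    return np.bincount(idx[valid], weights=voxel_density[valid],
                       minlength=len(node_coords))   # s_den per tree node
```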

0.19 (2). Calculate vector scores for tree nodes.

Vector scores have already been computed as described earlier in “Initial vector-based Morse graph simplification”.

0.20 (3). Score smoothing to obtain the final score $s$.

Recall that we obtain a combined weighted score $s_w$ from the density and vector scores. This weighted score is further smoothed before tree simplification, to remove local noise. In particular, the score of a tree node $v$ is smoothed by averaging the scores of its neighbors within $k$ hops (i.e., nodes connected to $v$ via at most $k$ tree edges). We further restrict these neighbors to ancestors or descendants of $v$ with respect to the root $r$: intuitively, we only want to consider neighbors of $v$ along the “same” neural branch. An example is shown in Fig. 3c. The final score of tree node $v$ is

$s(v) = \frac{1}{|N_k(v)|} \sum_{u \in N_k(v)} s_w(u),$

where $N_k(v)$ denotes the set of $v$'s ancestors and descendants (including $v$ itself) within $k$ hops. In practice, the same maximum hop $k$ is used for both the tracer-injection and fMOST datasets.
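A sketch of the $k$-hop smoothing on a tree stored as parent pointers is given below; the representation and names are illustrative.

```python
# Sketch of k-hop smoothing: the final score of v averages the weighted
# scores of v's ancestors and descendants within k hops (v included).
import numpy as np

def smooth_scores(weighted_score, parent, k):
    n = len(weighted_score)
    children = [[] for _ in range(n)]
    for v, p in enumerate(parent):
        if p >= 0:
            children[p].append(v)

    def neighborhood(v):
        nodes, up, frontier = {v}, v, [v]
        for _ in range(k):                           # ancestors within k hops
            if parent[up] < 0:
                break
            up = parent[up]
            nodes.add(up)
        for _ in range(k):                           # descendants within k hops
            frontier = [c for u in frontier for c in children[u]]
            nodes.update(frontier)
        return nodes

    return np.array([np.mean([weighted_score[u] for u in neighborhood(v)])
                     for v in range(n)])
```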

0.21 Evaluation metrics

0.22 Precision, Recall and F1-score for evaluating fMOST results.

For the fMOST data, single-neuron reconstructions were provided as ground truth. The precision and recall metrics calculated for evaluating the fMOST results depend on True Positives (TP), False Positives (FP) and False Negatives (FN). All outputs were discretized before computing these metrics: we subdivided any segment longer than 2 pixels so that all segment lengths are roughly 1 pixel. Then each node $n$ from the discretized skeletonization is labelled as either TP or FP: $n$ is a TP if its nearest neighbor in the human annotation lies within the distance bound, and an FP otherwise. Similarly, a node in the annotation is an FN if there is no predicted node within the bound. Precision and recall are then computed in the usual way as $\mathrm{precision} = \mathrm{TP}/(\mathrm{TP}+\mathrm{FP})$ and $\mathrm{recall} = \mathrm{TP}/(\mathrm{TP}+\mathrm{FN})$, and the F1-score is their harmonic mean, $\mathrm{F1} = 2\cdot\mathrm{precision}\cdot\mathrm{recall}/(\mathrm{precision}+\mathrm{recall})$. The distance bound was predetermined based on reasoning similar to that in [Quan_NeuroGPSTree_2016]: it is comparable to the thickness of dendrites near the soma, and the curve of the F1-score against this parameter (Supplementary Fig. 1) also supports the choice, since the F1-score stabilizes once the bound reaches this value.
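For concreteness, the sketch below computes these metrics from the discretized node sets by nearest-neighbour matching; `tau` stands for the distance bound discussed above.

```python
# Sketch of the evaluation: nodes are matched by nearest neighbour within the
# distance bound tau, and precision / recall / F1 follow.
import numpy as np
from scipy.spatial import cKDTree

def skeleton_f1(pred_nodes, gt_nodes, tau):
    """pred_nodes, gt_nodes: (n, 3) and (m, 3) discretised node coordinates."""
    d_pred_to_gt, _ = cKDTree(gt_nodes).query(pred_nodes)
    d_gt_to_pred, _ = cKDTree(pred_nodes).query(gt_nodes)
    tp = np.sum(d_pred_to_gt <= tau)                 # predicted nodes with a close match
    fp = np.sum(d_pred_to_gt > tau)                  # predicted nodes with no match
    fn = np.sum(d_gt_to_pred > tau)                  # annotation nodes that were missed
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, 2 * precision * recall / (precision + recall)
```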

0.23 Region analysis

The brain-wide inter-regional connectivity information was obtained through region analysis of the whole-brain tracing data. The 3D data volume was registered to the Allen Mouse Brain Atlas at 10 $\mu$m resolution [AllenMouseBrainAtlas]. Based on the atlas, the 3D density fields (the proofread STP dataset and the projected summarizations) were segmented into individual brain regions. A region was considered “covered” when the total weight of the voxels in that region was greater than zero. The covered brain regions were ranked by the total weight of the voxels they contain. The two hemispheres were separated, and the region analysis described above was performed on each side in addition to the whole-brain analysis. To quantitatively evaluate performance, we calculate a coverage rate for each method on each of the two hemispheres and on the entire brain: for a given method, the coverage rate is the sum of the weights of the regions covered by the method's output divided by the sum of the weights of all regions.
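A sketch of the coverage-rate computation is given below; it assumes the region weights are taken from the proofread reference density, which is our reading of the definition above, and the array names are illustrative.

```python
# Sketch of the region analysis: voxel weights are accumulated per atlas
# region, regions with positive weight in the method's output count as
# covered, and the coverage rate is the covered weight over the total weight.
import numpy as np

def coverage_rate(reference_density, method_density, atlas):
    """atlas: integer label volume registered to the data (0 = outside the brain)."""
    labels = np.unique(atlas[atlas > 0])
    ref_w = np.array([reference_density[atlas == l].sum() for l in labels])
    out_w = np.array([method_density[atlas == l].sum() for l in labels])
    covered = out_w > 0                               # regions reached by the method
    return ref_w[covered].sum() / ref_w.sum()
```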

Neuron  | Method   | Precision | Recall | F1    | Proofreading time | Total process length (cm) | Normalized proofreading (sec/cm)
Neuron1 | GTree    | 0.846     | 0.906  | 0.875 | -                 | 0.500                      | -
        | GTree-a1 | 0.953     | 0.940  | 0.946 | 23m40s            |                            | 2839.06
        | GTree-a2 | 0.954     | 0.929  | 0.941 | 21m40s            |                            | 2599.14
        | DM       | 0.900     | 0.940  | 0.920 | -                 |                            | -
        | DM-a1    | 0.957     | 0.923  | 0.940 | 15m43s            |                            | 1885.37
        | DM-a2    | 0.948     | 0.941  | 0.944 | 15m56s            |                            | 1911.37
Neuron2 | GTree    | 0.749     | 0.750  | 0.750 | -                 | 0.508                      | -
        | GTree-a1 | 0.818     | 0.748  | 0.781 | 20m00s            |                            | 2399.20
        | GTree-a2 | 0.807     | 0.756  | 0.780 | 21m40s            |                            | 2599.14
        | DM       | 0.865     | 0.749  | 0.803 | -                 |                            | -
        | DM-a1    | 0.866     | 0.773  | 0.817 | 16m00s            |                            | 1919.36
        | DM-a2    | 0.944     | 0.771  | 0.849 | 16m21s            |                            | 1961.35
Neuron3 | GTree    | 0.621     | 0.729  | 0.670 | -                 | 0.640                      | -
        | GTree-a1 | 0.781     | 0.744  | 0.762 | 23m21s            |                            | 2801.07
        | GTree-a2 | 0.726     | 0.746  | 0.736 | 22m45s            |                            | 2729.09
        | DM       | 0.890     | 0.789  | 0.836 | -                 |                            | -
        | DM-a1    | 0.939     | 0.771  | 0.847 | 18m20s            |                            | 2199.27
        | DM-a2    | 0.925     | 0.789  | 0.852 | 17m46s            |                            | 2131.29
Table 1: Proofreading was performed on the GTree and DM-Skeleton (abbreviated as DM in this table) outputs for all three fMOST neuron regions, and the evaluation metrics were recomputed on the proofread results against the original ground truth. The suffixes “a1” and “a2” denote the two proofreading annotators. The F1-scores improve after proofreading, and the proofreading time for the DM-Skeleton method is significantly shorter.

Simplification \ Persistence |  128  |  256  |  512  |  768  | 1024
0                            | 0.352 | 0.368 | 0.468 | 0.503 | 0.445
0.05                         | 0.899 | 0.911 | 0.912 | 0.828 | 0.639
0.10                         | 0.909 | 0.918 | 0.910 | 0.825 | 0.627
0.15                         | 0.910 | 0.918 | 0.907 | 0.818 | 0.590
0.20                         | 0.912 | 0.920 | 0.899 | 0.791 | 0.584
0.25                         | 0.902 | 0.910 | 0.889 | 0.788 | 0.583
0.30                         | 0.897 | 0.902 | 0.872 | 0.776 | 0.582
Table 2: F1-scores of DM-Skeleton outputs for different simplification and persistence thresholds, computed on the region of neuron 1. The best F1-score is obtained with persistence threshold 256 and simplification threshold 0.2; the same thresholds (256, 0.2) are applied to the other two regions.
Figure 1: F1-score vs. the distance bound. The F1-scores for both DM-Skeleton and GTree outputs start to stabilize once the upper bound exceeds 4 $\mu$m; a bound in this range is therefore a reasonable choice.
Figure 2: On the STP dataset, the degenerate gradient on background pixels produces redundant edges running to the volume boundary (left). These edges can be filtered out with a small density threshold (right), since the background pixel value is zero after pre-processing.
Figure 3: Summarization results of the DM-Skeleton method (red), GTree (yellow) and APP2 (green).
Figure 4: Summarization results of the DM-Skeleton method (abbreviated as DM in the table) and of the version after vector simplification, with coverage percentages shown in the table. Vector simplification yields a clearer structure without much loss of coverage: all selected regions are still covered by the DM-Skeleton output after vector simplification, although the ranking of regions is slightly distorted.
Figure 5: MouseLight superposition. a, The 11 selected MouseLight neurons are superposed on the proofread volume and compared from two viewing angles. The volume was mapped to atlas space, and brightness and contrast were raised slightly for easier comparison. b, The neurons are also superposed on the DM-Skeleton output. To keep the contour of the DM-Skeleton summarization clearly visible, only a representative, lower-complexity subset of the 11 neurons is shown.