## 1 Introduction

In Connectomics, a core task is extracting neuron segmentations from 3D volumes of electron microscopy (EM) images [mitya-large-scale]. The problem of consecutive missing sections arises from errors in the imaging process. EM volumes are usually built by capturing parallel 2D cross-sectional images and stacking them into a 3D volume. However, some slices are often rendered unusable by blurring or noise, or are lost entirely during imaging. Moreover, these losses can span multiple consecutive slices, creating large sections of the volume where there is no data connecting neurons [missing-fly] [missing-human] [missing-mouse]. Examples of these errors can be seen in Figure 1.

Researchers have attempted to make UNets robust to missing sections through data augmentation techniques [superhuman]. Here, sections of training data are deliberately replaced with zeros or some form of noise, while the target boundary map remains the same. This requires the UNet to predict the affinity map for the missing slices by interpolating from the surrounding context. While data augmentation has been shown to improve the UNet’s ability to predict over missing sections, there are no explicit studies regarding the robustness or scalability of this method.

In this paper, we provide a novel method for merging disjoint neurons across large gaps in EM images. To accomplish this, we train a deep network to classify which neurons ought to be merged on either side of the gap. Our approach is based purely on the existing segmentations and does not use the underlying EM images or intermediate affinity maps. Our hypothesis is that the segmentations alone capture the morphological features of neurons necessary to accurately identify which neurons ought to be merged. We do not represent neurons as dense labels within some larger 3D volume. Instead, we convert these volumetric representations into point cloud representations. Work in the area of deep geometric learning has postulated that non-Euclidean representations of complex shapes can better capture the underlying geometric structure contained within data [deepgeo]. By using point clouds here, we hope to efficiently represent the morphological features of neurons, using this as a basis to determine which neurons ought to be merged across the gap. To our knowledge, this is the first direct application of point cloud representations in segmentation for Connectomics. Thus, in this paper, we provide a method for merging neurons and show the viability of point cloud representations in the development of automated methods for improving segmentations.

## 2 Method

We formulate the problem of merging neurons across a gap as a binary classification task. Given two neurons within the volume, predict 1 to merge (i.e., assign both neurons the same label) or predict 0 to split (i.e., keep different labels). For a single example, the direct output of the model is a two-dimensional vector whose values are between 0 and 1. Each component can be interpreted as the probability of the label being 0 or 1. We then obtain a final prediction by thresholding the output: if the probability of a merge is greater than the threshold, we predict 1, and we predict 0 otherwise. This thresholding is an important feature, as it gives a user direct control over the trade-off between correct and false merges. We illustrate the full method in Figure 2.

### 2.1 Data Preparation

For training data, we start with a segmentation and simulate missing sections from the volume by simply zeroing out entire slices. For test data, we once again simulate missing sections from unseen ground truth in order to measure generalization. In either case, once there are missing sections, the first step is to prepare an example for classification as shown in Figure 3. We begin by selecting some neuron that exists along a z-slice that borders the missing sections. We refer to this as the top neuron. We then must select a group of neurons on the other side of the gap as candidates to either merge or remain split. We refer to these as the bottom neurons. The process of selecting the candidate group is essential to the success of the algorithm. The larger the group of candidates, the more opportunities our model has to make a mistake in prediction. Conversely, the smaller the group of candidates, the higher the likelihood that we will not even consider the correct neuron. Therefore, it is necessary to construct a heuristic for selecting the group of candidates which is as restrictive as possible while retaining a high likelihood that the top neuron’s correct partners are still within the group. To this end, we examined a couple of possible heuristics and evaluated their performance based on the size of the candidate group they produce relative to how many positive examples are left out of the group. The best performing heuristic is based on the average Euclidean distance of each bottom neuron from the top neuron. The neurons with the smallest average distance are selected as the candidate group for our algorithm, where the size of the group is a hyperparameter that may be selected. In our case, we found to be optimal.

This method of creating the candidate group can also serve as a non-learning-based baseline method. Here, one would simply merge the top neuron with the bottom neuron with the minimum average distance. We compare this baseline method to ours in the results section and in Figure 4.
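The candidate heuristic and the distance baseline can be sketched as follows. This is a minimal illustration rather than our implementation: it assumes each neuron is given as an array of voxel coordinates, and it assumes that "average distance" means the mean over all pairs of top and bottom points, which is only one possible reading. The names `average_distance`, `select_candidates`, `baseline_merge`, and `group_size` are hypothetical.

```python
import numpy as np

def average_distance(top_pts, bottom_pts):
    """Mean Euclidean distance over all (top point, bottom point) pairs.

    Assumption: this all-pairs mean is one reading of "average distance".
    """
    diff = top_pts[:, None, :] - bottom_pts[None, :, :]  # shape (T, B, 3)
    return float(np.linalg.norm(diff, axis=-1).mean())

def select_candidates(top_pts, bottom_neurons, group_size):
    """Return ids of the `group_size` bottom neurons closest to the top neuron.

    `bottom_neurons` maps a neuron id to its (N, 3) coordinate array.
    """
    scored = sorted(
        (average_distance(top_pts, pts), nid)
        for nid, pts in bottom_neurons.items()
    )
    return [nid for _, nid in scored[:group_size]]

def baseline_merge(top_pts, bottom_neurons):
    """Non-learning baseline: pick the single closest bottom neuron."""
    return select_candidates(top_pts, bottom_neurons, 1)[0]
```

The all-pairs distance matrix grows with the product of the two point counts, so for large neurons one would likely subsample points first.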

Once the candidate group is selected, we have many examples. Each example consists of the top neuron and one bottom neuron from the group. It is important to note here that we preserve the relative position of the top and bottom neurons within the entire volume. Each neuron may span many slices, up to the entire volume. But it is clear the most relevant information for merging neurons across a gap is the neuron’s shape near that gap. So a choice must be made as to how much context to include. To control this, we introduce the hyperparameter of context slices (). This refers to the number of slices parallel to the gap we include to represent each neuron (top or bottom). We truncate each example according to the number of context slices. This means the resulting volume will have many slices where .
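As a rough sketch of the gap simulation and the truncation to context slices, assuming a NumPy label volume indexed as (z, y, x) with 0 as background: the function names and the parameters `gap_start`, `gap_size`, and `context_slices` are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_gap(labels, gap_start, gap_size):
    """Return a copy of the label volume with `gap_size` z-slices zeroed out."""
    gapped = labels.copy()
    gapped[gap_start:gap_start + gap_size] = 0
    return gapped

def crop_context(labels, gap_start, gap_size, context_slices):
    """Keep only the slices bordering the gap on each side.

    Returns (top, bottom): up to `context_slices` slices directly before
    the gap and up to `context_slices` slices directly after it.
    """
    gap_end = gap_start + gap_size
    top = labels[max(gap_start - context_slices, 0):gap_start]
    bottom = labels[gap_end:gap_end + context_slices]
    return top, bottom
```

Cropping with plain slices keeps the y and x coordinates of both sides in the same frame, so the relative position of the top and bottom neurons is preserved.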

The next step is to transform the volumetric representation of each example into a point cloud representation. This is done by removing the interiors of each neuron and then simply translating each voxel where a neuron exists to an (x, y, z) coordinate based on its relative position in the example. For each example, this will generate a different number of points based on the size of the neurons. To standardize the number of points, we uniformly sample points from each example. We sample with replacement in the case that the number of voxels is less than . Thus the resulting example is an array of shape with a label . Lastly, the coordinates of each example are centered and normalized so coordinates are in , but the relative size between examples is maintained. Unless otherwise noted, all our experiments are performed with the , , and .
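The conversion from a volume to a point cloud might look like the sketch below. It is written under several assumptions that are not taken from our implementation: a voxel counts as interior when all six face neighbours are foreground, sampling is uniform over surface voxels, and normalization divides by one fixed `scale` shared by all examples so that relative sizes are kept. All function and parameter names are hypothetical.

```python
import numpy as np

def surface_voxels(mask):
    """Coordinates (z, y, x) of foreground voxels with a background 6-neighbour."""
    padded = np.pad(mask.astype(bool), 1, constant_values=False)
    interior = padded.copy()
    for axis in range(3):
        # A voxel stays "interior" only if both neighbours on this axis are set.
        interior &= np.roll(padded, 1, axis=axis) & np.roll(padded, -1, axis=axis)
    surface = padded & ~interior
    return np.argwhere(surface[1:-1, 1:-1, 1:-1])

def sample_points(coords, num_points, rng):
    """Uniformly sample `num_points` rows, with replacement only if too few exist."""
    replace = len(coords) < num_points
    idx = rng.choice(len(coords), size=num_points, replace=replace)
    return coords[idx].astype(np.float32)

def normalize(points, scale):
    """Center on the centroid and divide by a scale shared across examples."""
    return (points - points.mean(axis=0)) / scale
```

A typical call chain would be `normalize(sample_points(surface_voxels(mask), n, np.random.default_rng(0)), scale)`, where `n` and `scale` are chosen by the user.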

### 2.2 Metrics

We report two metrics, that of merge success rate and merge error rate. We define merge error rate as the number of merge errors we create (i.e., False Positives) out of the total number of neurons we attempt to merge (i.e., the total number of top neurons). For each top neuron, it is possible to create arbitrarily many merge errors, so the merge error rate may exceed 1. We define merge success rate as the number of correct connections we make (i.e., True Positives) out of the total number of correct connections there are in the dataset (i.e., True Positives + False Negatives). This metric is equivalent to recall. It is worth noting that some neurons have greater than one correct connection across the gap, so the denominator is not equivalent to the number of top neurons.
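The two rates can be computed from thresholded predictions as in the following sketch. It assumes one merge probability per (top, bottom) pair and that every correct connection appears among the candidate pairs; connections excluded by the candidate heuristic would have to be added to the denominator separately. The function name and argument layout are hypothetical.

```python
def merge_metrics(probabilities, labels, top_ids, threshold):
    """Return (merge success rate, merge error rate) at a given threshold.

    probabilities: merge probability for each (top, bottom) pair
    labels:        1 if the pair should be merged, 0 otherwise
    top_ids:       id of the top neuron of each pair
    """
    tp = fp = fn = 0
    for p, y in zip(probabilities, labels):
        predicted = p > threshold
        if predicted and y == 1:
            tp += 1
        elif predicted and y == 0:
            fp += 1
        elif y == 1:
            fn += 1
    # Success rate is recall over all correct connections.
    success = tp / (tp + fn) if (tp + fn) else 0.0
    # Error rate is false merges per top neuron, so it may exceed 1.
    error = fp / len(set(top_ids))
    return success, error
```

Sweeping `threshold` over a range of values and plotting the resulting pairs gives a curve of success rate against error rate.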

We also report Variation of Information (VI) [voi]. VI is a standard metric in Connectomics that is used to evaluate the overall quality of a segmentation in relation to its ground truth. In our experiments, we measure VI after the missing sections are dropped and then again after we attempt to stitch neurons back together. The final number we report is the percent reduction in VI, that is, the difference between the VI after dropping and the VI after stitching, expressed as a percentage of the VI after dropping. The VI generated by dropping slices depends on where the gap occurs within the entire volume. It is largest when the gap occurs closest to the middle of the volume and decreases towards the edges. To account for this, we measure VI on a given test volume by dropping slices and applying our method at each possible index on the z-axis of the volume. We then average over the results of each iteration.
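For reference, a minimal sketch of VI and of the percent reduction is given below, using the standard identity VI = 2H(S, G) - H(S) - H(G) for a segmentation S and ground truth G. It operates on flattened label sequences, uses base-2 logarithms (the percent reduction does not depend on the base), and makes no special case for background voxels. The function names are hypothetical.

```python
import math
from collections import Counter

def entropy(counts, total):
    """Shannon entropy in bits of a label histogram."""
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def variation_of_information(seg, gt):
    """VI between two equal-length flat label sequences."""
    n = len(seg)
    h_seg = entropy(Counter(seg), n)
    h_gt = entropy(Counter(gt), n)
    h_joint = entropy(Counter(zip(seg, gt)), n)
    return 2 * h_joint - h_seg - h_gt

def percent_vi_reduction(vi_dropped, vi_merged):
    """Percent reduction from the VI after dropping to the VI after merging."""
    return 100.0 * (vi_dropped - vi_merged) / vi_dropped
```

For example, splitting one of two equally sized ground-truth neurons into two equal halves gives a VI of 0.5 bits, and restoring the correct merge brings it back to 0, a 100% reduction.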

### 2.3 Model

We experimented with a variety of point cloud classification models. But ultimately, we found that CurveNet [curvenet], a recently developed model which is a top performer on classification over the ModelNet40 dataset [modelnet40], performed best across all our metrics. The model was trained using a learning rate of using a cross entropy loss.

## 3 Results

Our initial experiments were run using publicly available training data from the CREMI challenge [cremi]. While this is technically ground truth data, in practice, there are many flaws and imperfections which make it a reasonable proxy of real-world segmentations. There are three volumes (A, B, C), each of which is of size (given in z, y, x) of with an anisotropic resolution of . This is a relatively large resolution along the z-axis, making this a particularly challenging data set for merging neurons across gaps from missing z-slices. From these volumes, we take 16 slices each for the test and validation datasets.

Our main results concern how the method performs as we increase the number of missing sections (NS) from 1 to 8. We report these results in Figure 4. We refer to the first plot as the merge curve. This shows the merge error rate on the x-axis and the merge success rate on the y-axis. Each point is the performance at a given threshold, starting at 0.1 and going to 0.9. The starred points are the optimal thresholds for that model in terms of the best reduction in VI. As expected, we perform best for one missing slice, and performance decreases as more slices are removed. But there is still a meaningful amount of success even in the most difficult case. At 8 slices, we are able to merge a little above of neurons while creating merge errors in less than of cases. One interesting point to note is that for most runs, the optimal VI is achieved at a threshold of or . These correspond to error rates of which are less than . This suggests that, in terms of the VI metric, we should strongly prefer to avoid merge errors over increasing the possible number of merge successes. Additionally, our method shows substantial improvement over our proposed baseline method. This shows that this problem cannot be easily solved by a simple heuristic and that learning, especially for larger gaps, is required.

### 3.1 Parallel is Easy

A natural question is: how does the underlying arrangement of neurons relative to the EM images affect the performance of our method? By design, the CREMI volumes provide a perfect opportunity to investigate this question. In Volume A, most of the neurons run parallel to the z-axis. In Volume B and Volume C, the neurons run through the volume at many different angles, at times almost completely perpendicular to the z-axis. The effect of this difference is made clear in Figure 5. Here we see the two border slices adjacent to either side of the gap. The images show merge errors and merge successes for an example of 5 missing sections. Neurons colored green were merged with their partner correctly, red indicates there was some merge error, and blue indicates that no attempt to merge was made at all due to thresholding. The images make clear that we can successfully merge many more neurons in Volume A relative to Volume B. Additionally, in both volumes, the neurons that run parallel to the z-axis are successfully merged at a much higher rate than those that cross the z-axis. A possible reason for this stark difference is that neurons running parallel to the z-axis show much less relative displacement between slices and much less variation in their cross-sectional shape from slice to slice.

### 3.2 Context Slices

As mentioned before, an important hyperparameter for the method is the number of context slices. This refers to the number of slices parallel to the missing sections used to represent the top and bottom neurons. To understand the influence of this choice of representation on the method’s performance, we perform an ablation study on this parameter. The results are given in Figure 6. The clear trend is that it is optimal to have more than one context slice, but the choice of 2, 3, or 4 slices does not have a large impact. One interpretation of this result is that a single context slice does not allow the network to understand the direction in which the neuron is “moving” along the z-axis. The addition of just one more slice allows the network to compute some form of a derivative that indicates whether the top neuron is moving towards or away from the bottom neuron.

### 3.3 Data Efficiency

It is clear that point cloud representations are much more efficient in terms of data than volumetric representations of a segmented neuron. This is because the volumetric representation of a neuron is data inefficient in two ways: the interior of the neuron must be represented, and the neuron must be padded into a rectangular volume. It is easy to see that, with no information loss, one could represent a single neuron with an array of x, y, z coordinates for each voxel on the exterior surface of the neuron. In almost all cases, this will use significantly less data. We can also look at robustness to downsampling. That is, of the points on the exterior, how many are necessary to capture the relevant morphological features to merge neurons successfully?

We study this explicitly by varying the number of points with which we sample the volumetric representation of each example. The results are shown in Figure 6. Optimal performance occurs at 2048 points. But performance in terms of VI degrades very slowly down to 128 points, after which there is a steep drop-off. With 3 spatial dimensions and 128 points, the representation uses 384 total floating point numbers. It is important to emphasize how small this is in comparison to volumetric representations. To represent an example as a volume with the same amount of data, one would have to use a volume of size roughly . This is completely infeasible no matter how one attempts to formulate the problem.
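The arithmetic behind this comparison can be checked with a few lines. The sketch assumes, purely for illustration, a cubic volume that stores one value per voxel; the helper names are hypothetical.

```python
def floats_per_example(num_points, dims=3):
    """Number of values stored by a point cloud of `num_points` points."""
    return num_points * dims

def equivalent_cube_side(num_values):
    """Side length of a cube whose voxel count equals `num_values`."""
    return num_values ** (1.0 / 3.0)

budget = floats_per_example(128)     # 128 points x 3 coordinates
side = equivalent_cube_side(budget)  # a little over 7 voxels per side
```

Under this assumption, a dense volume with the same budget as 128 points would be only about seven voxels along each axis.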

## 4 Conclusion

We have presented a novel method for merging neuron fragments across consecutive missing sections in EM volumes. Our method shows a high degree of success in correctly identifying neuron pairs across missing data while suggesting few false merges. We showed that our method is viable for solving this problem across gaps of up to 8 successive slices, more than any other method has attempted to address.

Other work has attempted to automate the correction process, or proofreading, of imperfect segmentations. These approaches are often based around 3D convolutional networks [auto-proof-1] [auto-proof-2], and other work has attempted to learn over graph structures [graph-proof]. But to our knowledge, no work has attempted to learn over segmentations represented as point clouds. The success of our method shows that point cloud representations from segmentation alone can efficiently capture the underlying structure of neuron morphology. This suggests that point clouds are a viable representation of segmentations for other automated proofreading tasks. Future work may wish to extend this method to identifying merged and split neurons throughout entire datasets. We hope that this work not only provides researchers with another tool to improve neuron segmentations, but also is a first step in using geometric representations of Connectomics data.
