Many nematode (roundworm) species are parasitic on crops such as potatoes, sugar beets, and soybeans, causing billions of dollars in losses in agriculture worldwide. Robust nematode detection in microscopic images is a prerequisite for quantifying nematode infestation based on (soil) samples and for phenotyping, i.e., measuring quantitative features that characterize the nematodes. Moreover, the nematode C. elegans is an important model organism in biology, having, for example, been used for high-throughput screening of antimicrobial drugs. A lesioning study of C. elegans motorneurons served to infer the function of individual neurons. Large-scale chemical and RNAi screens using nematodes are also widespread.
We introduce a CNN-based approach for detecting worm-shaped objects in microscopic images (Section 2.1). The worms are long and thin, i.e., they extend over a large spatial range but cover only a small number of pixels. We therefore propose to represent worms by curved lines along the body instead of the bounding boxes used by most object detection approaches. Given well-estimated worm skeletons and endpoints, overlapping worms can be untangled (Section 2.2) and segmentation masks can be reconstructed (Section 2.3).
1.3 Relationship to prior work
Previous work focuses largely on untangling worms on clean backgrounds and on phenotyping. In [12, 13, 10], a graph is constructed from skeleton segments for each worm cluster. Individuals are untangled by searching for the best fits of a learned worm model while minimizing overlap. Worm Tracker 2.0 was developed to record the behavior of worms and can extract 702 features relevant to behavioral phenotypes.
However, the approaches mentioned above can only be applied to pure samples in which worms are relatively easy to segment. In contrast, microscopic images acquired from soil samples (Section 3) contain eggs, nematode cyst fragments, and other organic debris. Recent advances in deep learning have greatly improved detection performance for objects in complex contexts. Standard detection approaches make predictions in the form of bounding boxes, which are a good representation of approximately convex objects, but are not very informative for elongated worms. Instead, we use curved lines along the worm body as a more suitable representation.
Worms in the image are likely to overlap, especially when the number of worms per area is large. In contrast to the model search approaches used by [12, 13, 10], we find that individuals can be untangled well with simple geometric criteria when the endpoints (head and tail) are known. Therefore, the network is trained to output both the skeleton and the endpoints. To handle training instability due to highly unbalanced positive and negative pixels, we adopt a focal loss with reduced penalty around the positive pixels, inspired by CornerNet.
After skeletons of individuals are obtained, we reconstruct segmentation masks by estimating the body width at each skeleton pixel. The entire pipeline outputs object segmentations, requiring only skeleton annotations for training.
2 Our Approach
2.1 Skeleton and endpoint prediction
We employ the standard U-Net architecture as the CNN component of our framework. Two branches are added on top of the last feature map to predict the worm skeleton and the body endpoints, respectively. Each branch consists of one feature convolutional layer (64 channels, ReLU activation) followed by an output layer with 2 classes (softmax activation).
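The two prediction branches can be sketched as a plain NumPy forward pass. This is an illustrative sketch, not the training code; the kernel sizes (3×3 for the feature layer, 1×1 for the output layer) are assumptions, since the paper only specifies channel counts and activations.

```python
import numpy as np

def conv2d(x, w, b):
    """'Same' convolution: x (H, W, Cin), w (k, k, Cin, Cout), b (Cout,)."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    H, W, _ = x.shape
    out = np.empty((H, W, w.shape[3]))
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + k, j:j + k, :]            # (k, k, Cin)
            out[i, j] = np.tensordot(patch, w, axes=3) + b
    return out

def branch(feat, w1, b1, w2, b2):
    """One prediction branch: feature conv + ReLU (64 ch), then a 2-class
    output layer with softmax over the channel axis."""
    h = np.maximum(conv2d(feat, w1, b1), 0.0)          # ReLU
    logits = conv2d(h, w2, b2)
    e = np.exp(logits - logits.max(-1, keepdims=True)) # numerically stable softmax
    return e / e.sum(-1, keepdims=True)                # (H, W, 2), rows sum to 1
```

The skeleton and endpoint branches share this structure and differ only in their weights; the total loss is summed over both (Section 2.1).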
A common problem of dense per-pixel prediction with highly unbalanced positive and negative labels is that the model degenerates to predicting the majority class for all pixels. To make the training gradually focus on the mispredicted minority, we apply a variant of the focal loss:

$$L = -\frac{1}{N} \sum_{i=1}^{H} \sum_{j=1}^{W} \begin{cases} (1 - p_{ij})^{\gamma} \log(p_{ij}) & \text{if } y_{ij} = 1 \\ (1 - w_{ij})\, p_{ij}^{\gamma} \log(1 - p_{ij}) & \text{otherwise,} \end{cases}$$

where $H$ and $W$ are the image height and width, $N$ is the number of objects in the image, and $p_{ij}$ is the predicted probability of pixel $(i,j)$ being a positive label. The focusing parameter $\gamma$ is set to 2.

We compute the weight map $w$ by applying a 1D unnormalized Gaussian function, $w_{ij} = \exp(-d_{ij}^2 / (2\sigma^2))$, to the distance transform $d$ of the ground truth. The distance-based weighting reduces the penalty of negative pixels around positive pixels, giving a certain degree of tolerance to offsets in both annotation and training. The standard deviation $\sigma$ of the Gaussian determines the distance within which the penalty is reduced. During training, this "ground truth slack" is applied separately to the skeletons and to the endpoints. The hyperparameter $\sigma$ is set to 4 for all experiments. The total training loss is the sum of the losses of both branches.
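The loss and the Gaussian weight map can be written compactly in NumPy. This is a minimal sketch of the per-image computation; function names are illustrative and the distance transform `dist` is assumed to be precomputed (e.g., with an exact Euclidean distance transform of the ground truth).

```python
import numpy as np

def slack_weights(dist, sigma=4.0):
    """Unnormalized Gaussian of the ground truth distance transform:
    w = 1 on positive pixels (dist = 0) and decays with distance."""
    return np.exp(-dist ** 2 / (2 * sigma ** 2))

def worm_focal_loss(p, y, w, n_obj, gamma=2.0, eps=1e-7):
    """Focal loss with reduced penalty around positive pixels.
    p: predicted positive-class probabilities (H, W)
    y: binary ground truth (H, W); w: Gaussian weight map (H, W)
    n_obj: number of objects in the image (normalization factor N)."""
    p = np.clip(p, eps, 1 - eps)                      # avoid log(0)
    pos = y * (1 - p) ** gamma * np.log(p)            # hard positives weighted up
    neg = (1 - y) * (1 - w) * p ** gamma * np.log(1 - p)  # negatives near GT damped
    return -(pos + neg).sum() / max(n_obj, 1)
```

Negatives close to the ground truth receive a weight near $1 - w \approx 0$, which implements the "ground truth slack" described above.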
2.2 Worm untangling
Skeletons of overlapping worms can be untangled with the help of the endpoints (head and tail) predicted by the CNN. Based only on the geometry of the skeletons, points on a skeleton can be classified as geometric endpoints, intersections, and line points.
The skeletons of individuals are computed with Algorithm 1, which consists of two steps: 1) cutting the skeleton (lines 2-7) to separate worms fused at the endpoints, and 2) resolving intersections by cutting worms and connecting the cut segments from the same worm (lines 10-17); see Figure 1a. The "matches_one_of" operation localizes the closest target inside a search area (a circle with radius 5).
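The geometric point classification underlying the untangling step can be sketched by counting 8-neighbours on a one-pixel-wide skeleton. This is an illustrative helper, not Algorithm 1 itself: one neighbour indicates a geometric endpoint, two a line point, and three or more an intersection.

```python
import numpy as np

def classify_skeleton_points(skel):
    """Classify pixels of a 1-pixel-wide boolean skeleton by 8-neighbour count:
    1 neighbour -> 'endpoint', 2 -> 'line', >= 3 -> 'intersection'."""
    H, W = skel.shape
    pad = np.pad(skel.astype(int), 1)
    # neighbour count = sum over the 3x3 window minus the centre pixel
    nbr = sum(pad[1 + di:H + 1 + di, 1 + dj:W + 1 + dj]
              for di in (-1, 0, 1) for dj in (-1, 0, 1)) - pad[1:H + 1, 1:W + 1]
    cls = np.full(skel.shape, '', dtype=object)
    cls[skel & (nbr == 1)] = 'endpoint'
    cls[skel & (nbr == 2)] = 'line'
    cls[skel & (nbr >= 3)] = 'intersection'
    return cls
```

Note that with 8-connectivity, pixels directly adjacent to a junction can also exceed two neighbours; a practical implementation would merge such clusters into a single intersection before matching them against the predicted head/tail points.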
2.3 Reconstructing masks from the skeletons
To reconstruct segmentation masks from the skeletons (Figure 1c), we need to estimate the width of the worm body at each skeleton point. First, edges are detected in the original image with the Canny edge detector. For each skeleton pixel, we use the shortest distance to the nearest edge as the radius and fill a circle centered at the skeleton pixel.
To make the reconstruction more stable, the estimated radius is smoothed along the skeleton (over two pixels before and after). In addition, we limit the radius to be at most the path length from the skeleton pixel to the skeleton end, so that the segmentation forms a tip at each endpoint.
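The reconstruction steps above can be sketched as follows. This is a simplified NumPy version under stated assumptions: the skeleton is given as an ordered point list for a single worm, and the Canny edge map is already available as a list of edge pixels (the brute-force distance computation stands in for a distance transform).

```python
import numpy as np

def reconstruct_mask(skeleton_pts, edge_pts, shape, smooth=2):
    """Fill circles along an ordered skeleton; radius = distance to nearest edge.
    skeleton_pts: ordered (row, col) points along one worm skeleton.
    edge_pts: (row, col) edge pixels (e.g., from a Canny detector).
    shape: (H, W) of the output mask."""
    pts = np.asarray(skeleton_pts, float)
    edges = np.asarray(edge_pts, float)
    # radius at each skeleton point: shortest distance to any edge pixel
    radii = np.sqrt(((pts[:, None, :] - edges[None, :, :]) ** 2).sum(-1)).min(1)
    # smooth radii along the skeleton (moving average over +-smooth points)
    kernel = np.ones(2 * smooth + 1) / (2 * smooth + 1)
    radii = np.convolve(np.pad(radii, smooth, mode='edge'), kernel, mode='valid')
    # cap the radius by the path length to the nearer skeleton end -> tip shape
    step = np.r_[0, np.sqrt(((pts[1:] - pts[:-1]) ** 2).sum(-1))]
    along = np.cumsum(step)
    radii = np.minimum(radii, np.minimum(along, along[-1] - along))
    # rasterize: union of circles centred on the skeleton
    mask = np.zeros(shape, bool)
    rr, cc = np.mgrid[:shape[0], :shape[1]]
    for (r, c), rad in zip(pts, radii):
        mask |= (rr - r) ** 2 + (cc - c) ** 2 <= rad ** 2
    return mask
```

For realistic image sizes, the per-pixel nearest-edge distance would be computed with a Euclidean distance transform of the edge map rather than the pairwise distances used here.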
3 Image data
We evaluated our method on two data sets (Figure 2): a motivating potato cyst nematode (Globodera spp.) data set recorded by PSA, and a public C. elegans reference data set (Broad Bioimage Benchmark Collection: BBBC010).
The PSA data set contains 3376 nematodes in 1973 microscopic images. Images without nematodes account for 43.6% of the data set, 26.6% contain only one object, and 16.8% contain two. Since cyst nematodes like Globodera spp. produce their offspring in a cyst, which needs to be physically crushed to release the worm-shaped nematode stages, the samples contain large numbers of distracting objects, such as cyst wall fragments and cyst-attached organic material. In contrast, the images of the BBBC010 data set contain only nematodes, such that segmentation is easy. However, overlap occurs more frequently because the worm density is higher than for PSA.
4 Results and discussion
Figure 2 shows exemplary qualitative results demonstrating that our approach works robustly on samples with and without distractors, as well as for dense and sparse object collections.
4.1 Evaluation metric
We quantified object detection performance for individual worms by computing F-scores. This measure is similar to the IoU commonly used in object detection, and it allows us to compare results on BBBC010 with previously reported results. We measured precision and recall for different F-score thresholds above which a worm detection was considered correct.
We evaluate both skeletons and masks in this work. When evaluating (single-pixel) skeletons, small deviations should be tolerated. We therefore formulate the overlap computation between prediction and ground truth as a maximum bipartite matching problem, with each predicted skeleton pixel connected to all ground truth pixels within a distance of 3 pixels.
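The tolerance-aware skeleton overlap can be sketched with Kuhn's augmenting-path algorithm for maximum bipartite matching. This is an illustrative implementation for a single predicted/ground-truth skeleton pair; function and variable names are our own.

```python
import numpy as np

def skeleton_fscore(pred_pts, gt_pts, tol=3.0):
    """F-score between predicted and ground-truth skeleton pixels.
    Each predicted pixel may be matched to one ground-truth pixel
    within distance `tol`; the overlap is the maximum matching size."""
    pred = np.asarray(pred_pts, float)
    gt = np.asarray(gt_pts, float)
    if len(pred) == 0 or len(gt) == 0:
        return 0.0
    d = np.sqrt(((pred[:, None] - gt[None, :]) ** 2).sum(-1))
    adj = [np.nonzero(d[i] <= tol)[0].tolist() for i in range(len(pred))]
    match = [-1] * len(gt)                      # gt index -> pred index

    def augment(u, seen):                       # Kuhn's augmenting-path step
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                if match[v] == -1 or augment(match[v], seen):
                    match[v] = u
                    return True
        return False

    overlap = sum(augment(u, [False] * len(gt)) for u in range(len(pred)))
    prec = overlap / len(pred)
    rec = overlap / len(gt)
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```

A worm detection is then counted as correct when this F-score exceeds the chosen threshold (e.g., 0.5 or 0.8 in Section 4).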
4.2 Experiments on the PSA data set
The PSA data set contains line and mask annotations as ground truth. It was collected on 10 different dates, and we chose the 367 images from the last three dates as test and the others as training data. To avoid training instability, we ignored images without worms during training.
Our approach achieved 90.34% precision and 86.28% recall for mask detection at an F-score threshold of 0.5. After increasing the F-score threshold to 0.8, the precision was still 75.85%, with 73.02% of the worms detected (Table 1).
It is worth noting that the performance drop from skeleton to mask was more pronounced on the PSA data set (for example at an F-score threshold of 0.8), while no significant decline could be observed on BBBC010. This resulted from the smaller worm widths in the PSA data set, for which the error of the width estimation (Section 2.3) was relatively larger.
Analyzing the effect of balanced labels (loss $L$, Section 2.1), we trained a standard segmentation U-Net with unbalanced labels using a binary cross-entropy loss: the model turned out to predict all pixels as background, the majority class (data not shown).
Finally, studying the effect of the ground truth slack (Section 2.1), we trained the model using Gaussians with different standard deviations. Table 1 shows that the variants with ground truth slack performed clearly better than those without.
Table 1: Precision / recall (%) at F-score thresholds from 0.5 to 0.9, for models trained without and with ground truth slack, on PSA and BBBC010 (skeleton and mask evaluation).

|  |  | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |
|---|---|---|---|---|---|---|
| PSA skeleton | no slack | 89.67 / 84.35 | 76.93 / 82.24 | 56.56 / 70.71 | 42.51 / 53.76 | 28.65 / 36.24 |
|  | slack_2_3 | 90.38 / 88.00 | 89.55 / 87.41 | 85.75 / 84.82 | 81.00 / 80.24 | 71.14 / 70.47 |
|  | slack_3_5 | 90.94 / 87.29 | 89.86 / 86.82 | 87.44 / 85.18 | 82.73 / 80.59 | 75.24 / 73.29 |
| PSA mask | no slack | 85.49 / 82.91 | 64.19 / 76.16 | 47.91 / 59.88 | 31.91 / 39.88 | 4.00 / 5.00 |
|  | slack_2_3 | 89.90 / 86.98 | 86.82 / 85.00 | 82.19 / 80.47 | 72.45 / 70.93 | 9.74 / 9.53 |
|  | slack_3_5 | 90.34 / 86.28 | 87.92 / 84.65 | 83.70 / 80.58 | 75.85 / 73.02 | 9.66 / 9.30 |
| BBBC010 skeleton | no slack | 97.68 / 94.51 | 85.89 / 93.98 | 62.68 / 84.30 | 47.35 / 65.47 | 37.32 / 51.60 |
|  | slack_2_3 | 96.89 / 94.66 | 94.43 / 94.51 | 89.08 / 92.99 | 82.72 / 87.20 | 78.02 / 82.24 |
|  | slack_3_5 | 97.82 / 95.12 | 95.07 / 94.59 | 89.91 / 93.22 | 83.02 / 87.20 | 78.45 / 82.39 |
| BBBC010 mask | no slack | 87.86 / 96.44 | 72.32 / 89.83 | 59.20 / 76.67 | 46.51 / 60.24 | 19.44 / 25.18 |
|  | slack_2_3 | 95.94 / 97.08 | 93.92 / 95.52 | 90.91 / 92.46 | 84.20 / 85.63 | 35.03 / 35.63 |
|  | slack_3_5 | 96.01 / 96.80 | 94.12 / 95.45 | 90.34 / 91.75 | 82.91 / 84.21 | 27.73 / 28.17 |
4.3 Experiments on the BBBC010 data set
For BBBC010, object masks are provided as ground truth. We hence used the morphological skeleton of the ground truth masks to train our model with line annotations.
In addition, we employed data augmentation to increase the size of the training set: we performed gamma correction to generate images with different contrast. Afterwards, the original and generated images were rotated with a step size of 30 degrees. Overall, the training set was expanded to 36 times its original size.
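The contrast part of this augmentation can be sketched as follows. The gamma values here are hypothetical placeholders (the exact values are not stated above); the 30-degree rotations would typically be done with an image library routine such as `scipy.ndimage.rotate`.

```python
import numpy as np

def gamma_correct(img, gamma):
    """Adjust contrast of an image with intensities in [0, 1] by gamma correction."""
    return np.clip(img, 0.0, 1.0) ** gamma

def augment_contrast(img, gammas=(0.7, 1.0, 1.4)):
    """Generate contrast variants of one image.
    NOTE: the gamma values are illustrative, not the ones used in the paper."""
    return [gamma_correct(img, g) for g in gammas]
```

Three contrast variants combined with twelve 30-degree rotations would yield the 36-fold expansion mentioned above.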
For training, we split the BBBC010 data set into two parts (A01-B24, C01-E04) and performed two-fold cross-validation. Our approach achieved 84.20% precision and 85.63% recall for masks at an F-score threshold of 0.8 (Table 1). As for PSA, we found that the variants with ground truth slack performed better than those without (Table 1).
5 Conclusion
We have proposed a CNN-based framework for detecting worm-shaped objects. With the focal loss and the ground truth slack strategy, the CNN model predicts worm skeletons and endpoints robustly. Individuals are untangled from worm clusters, and finally a segmentation mask is reconstructed from each skeleton. The overall pipeline requires only line (skeleton) annotations for training and outputs segmentation masks. Employing a CNN enables our framework to also cope with images with complex backgrounds, as they occur in the PSA data set. Future work will focus on improving segmentation accuracy, as well as on extracting descriptive features for nematode phenotyping.
References

- [1] (2015) Limitations, research needs and future prospects in the biological control of phytonematodes. In Biocontrol Agents of Phytonematodes, pp. 446-454.
- [2] (1986) A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 8(6), pp. 679-698.
- [3] (2017) Recordings of Caenorhabditis elegans locomotor behaviour following targeted ablation of single motorneurons. Scientific Data 4.
- [4] (2016) Speed/accuracy trade-offs for modern convolutional object detectors. In Proc. IEEE CVPR, pp. 3296-3297.
- [5] (2003) Genome-wide RNAi screening in Caenorhabditis elegans. Methods 30(4), pp. 313-321.
- [6] (2018) CornerNet: detecting objects as paired keypoints. In Proc. ECCV, pp. 734-750.
- [7] (2017) Focal loss for dense object detection. In Proc. IEEE ICCV, pp. 2999-3007.
- [8] (2003) A linear time algorithm for computing exact Euclidean distance transforms of binary images in arbitrary dimensions. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(2), pp. 265-270.
- [9] (2009) High-throughput screen for novel antimicrobials using a whole animal infection model. ACS Chemical Biology 4(7), pp. 527-533.
- [10] (2010) Morphology-guided graph search for untangling objects: C. elegans analysis. In Proc. MICCAI, pp. 634-641.
- [11] (2015) U-Net: convolutional networks for biomedical image segmentation. In Proc. MICCAI, pp. 234-241.
- [12] (2012) An image analysis toolbox for high-throughput C. elegans assays. Nature Methods 9(7), pp. 714-716.
- [13] (2010) Resolving clustered worms via probabilistic shape models. In Proc. IEEE ISBI, pp. 552-555.
- [14] (2013) A database of Caenorhabditis elegans behavioral phenotypes. Nature Methods 10(9), pp. 877-879.