Contour Proposal Networks for Biomedical Instance Segmentation

April 7, 2021 · Eric Upschulte et al., Forschungszentrum Jülich

We present a conceptually simple framework for object instance segmentation called Contour Proposal Network (CPN), which detects possibly overlapping objects in an image while simultaneously fitting closed object contours using an interpretable, fixed-sized representation based on Fourier Descriptors. The CPN can incorporate state-of-the-art object detection architectures as backbone networks into a single-stage instance segmentation model that can be trained end-to-end. We construct CPN models with different backbone networks and apply them to instance segmentation of cells in datasets from different modalities. In our experiments, we show CPNs that outperform U-Nets and Mask R-CNNs in instance segmentation accuracy, and present variants with execution times suitable for real-time applications. The trained models generalize well across different domains of cell types. Since the main assumption of the framework is that object contours are closed, it is applicable to a wide range of detection problems, including many outside the biomedical domain. An implementation of the model architecture in PyTorch is freely available.


1 Introduction

Figure 1: The Contour Proposal Network (CPN) setup for instance segmentation. An initial backbone network computes a high-resolution and a low-resolution feature map. Based on the low-resolution feature map, a classification head determines for each pixel whether an object is present, while the contour regression heads generate object contours, defined in the frequency domain, at each pixel. All contour representations that are classified as representing an object are extracted and converted to pixel space using Eq. 1. The high-resolution feature map is used to regress a refinement tensor that is applied during a Local Refinement step (Alg. 1) to maximize pixel accuracy. Finally, non-maximum suppression (NMS) is applied to remove redundant detections.

1.1 Motivation

Instance segmentation is the task of labeling each pixel in an image with an index that identifies distinct objects of predefined object classes. This is different from semantic segmentation, which assigns the object class itself to each pixel and does not distinguish objects of the same type if their shapes touch or overlap. A common instance segmentation problem in biomedical imaging is the detection of cells in microscopic images, in particular for quantitative analysis. While the pixel accuracy of recent cell segmentation methods has become sufficient for many imaging setups, detection accuracy often remains a bottleneck, especially with respect to the handling of touching and overlapping objects. In many biomedical applications, accurate object detection and realistic recovery of object shape are both desirable. However, many instance segmentation methods assign one unique object index per pixel, referring to the foreground object only. This results in an incomplete capture of partially superimposed objects and consequently in a misrepresentation of their actual shape (as in e.g. Fig. 4g, top), which in turn might impair shape-sensitive downstream tasks like morphological cell analysis. To avoid such problems, instance segmentation methods with appropriate modeling of object boundaries are required.

Furthermore, segmentation models should generalize well to variations in the data distribution. This is important for small variations, which inevitably occur in practical lab settings due to variations between different samples, fluctuations of histological protocols and digital scanning processes (Stacke et al., 2019; Yagi, 2011). Generalizability is also important at the scale of data domains, in order to allow transfer of trained models with manageable annotation efforts.

1.2 Related work

Pixel classifiers

Instance segmentation can be achieved with a dense pixel classifier such as the U-Net (Ronneberger et al., 2015), which can be cast from a semantic segmentation solution into an instance segmentation approach using a grouping strategy such as connected component labeling (CCL). This groups multiple pixels of the same class into non-overlapping instances. To also distinguish touching instances, one may introduce narrow background gaps between objects with careful per-pixel loss weightings (Ronneberger et al., 2015). Improved versions define border pixels as an additional class (Chen et al., 2016; Guerrero-Pena et al., 2018; Zabawa et al., 2020). Such models have been shown to segment the borders of isolated objects very precisely. However, in crowded images, a few falsely classified pixels can already merge close-by instances and critically impair the detection result (Caicedo et al., 2019).
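To make the CCL grouping step concrete, here is a minimal sketch, assuming a three-class pixel prediction (background, cell, boundary) and using scipy.ndimage.label; the class indices are illustrative and not taken from any of the cited implementations.

```python
import numpy as np
from scipy import ndimage

# Hypothetical class indices of a three-class pixel classifier.
BACKGROUND, CELL, BOUNDARY = 0, 1, 2

def semantic_to_instances(class_map: np.ndarray) -> np.ndarray:
    """Group 'cell' pixels into instances via connected component labeling (CCL).

    Boundary pixels act as separators between touching cells and are therefore
    excluded from the foreground before labeling.
    """
    foreground = class_map == CELL
    instance_map, _ = ndimage.label(foreground)
    return instance_map  # 0 = background, 1..K = object indices

# Toy prediction: two cells separated by a one-pixel boundary line.
toy = np.array([[0, 1, 1, 2, 1, 1, 0],
                [0, 1, 1, 2, 1, 1, 0]])
print(semantic_to_instances(toy))  # yields two distinct instance labels
```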

Pixel classifiers coupled with shape models

To reach better robustness on crowded images, some authors proposed to couple active contour models with CNN-based segmentation models. Thierbach et al. (2018) first employed a dense pixel classifier to predict probability maps of object centroids. These maps were then thresholded to initialize a subsequent active contour segmentation.

Zhang et al. (2018) suggested a scheme where a CNN is trained to explicitly predict the energy function for fitting an active contour model to a given object. Here, the contour computation is attached as a black box to the learning loop, so that the conversion from pixel to shape space and back is invisible to the network training and thus not part of the actual learning. Gur et al. (2019) proposed to use a neural renderer as a differentiable domain transition from polygon to pixels, allowing a full learning path. This way they train a U-Net-like CNN that produces 2D displacement fields for polygonal contour evolution, with a loss that addresses the segmentation as well as ballooning and curvature-minimizing forces in the pixel domain. While this allows end-to-end training, the actual boundary representation remains hidden and is not accessible to downstream tasks.

Dense vs sparse detectors

The above-mentioned solutions have in common that they learn object masks in the pixel domain under a hidden or decoupled shape model, producing a dense classification by assigning a label to each pixel of an image. An alternative is to perform object detection by directly estimating the parameters of a contour model in its embedding space and attaching a pixel location to the shape descriptor. This way the bounds of an entire object are concentrated at a single pixel, leading to a sparse detection scheme and forcing the model to develop an explicit internal understanding of instances. For closed contours, pixel masks can then be obtained by rasterization. By giving direct emphasis (and possibly supervision) to the shape model, such an approach can provide a more interpretable and efficient problem representation.

Bounding box regression

The de-facto standard for modeling boundaries in object detection networks are bounding boxes (Ren et al., 2017; Liu et al., 2016; Lin et al., 2017; Redmon and Farhadi, 2018; Bochkovskiy et al., 2020; Yang et al., 2020). Here, the models predict at least four outermost object locations. This approach captures little information about the object instance beyond location, scale and aspect ratio (Jetley et al., 2017). The most established approach from this category is the Mask R-CNN (He et al., 2017). It first detects bounding boxes by regression, and then gathers image features inside the bounding box to produce pixel masks.

Regression of shape representations

More detailed shape representations have been proposed in recent years as well (Jetley et al., 2017; Schmidt et al., 2018; Miksys et al., 2019; Xie et al., 2020). Closest to our work is the approach of Jetley et al. (2017), who combined the popular YOLO architecture (Redmon et al., 2016) with an additional regression of a decodable shape representation for each object proposal. They showed that integrating higher-order shape reasoning into the network architecture improves generalization. In particular, it allowed plausible masks to be predicted for previously unseen object classes. They evaluated three different shape representations, namely fixed-sized binary shape masks, a radial representation, and a learned shape encoding. The binary shape masks showed quantitatively worse results than the other two representations. The radial representation defines a series of offsets between an anchor pixel and points on its contour, and turned out to be inferior for common object classes in natural images. It has also been applied to cell nuclei detection in the StarDist architecture (Schmidt et al., 2018), which showed good detection accuracy but stays behind the pixel precision achieved by U-Nets (Ronneberger et al., 2015). StarDist was extended as PolarMask (Xie et al., 2020) to be applicable to multiclass problems, such as the COCO dataset (Lin et al., 2014); it was also coupled with a different loss and evaluated with multiple backbone architectures. In general, the applicability of the radial model is limited to the "star domain", which excludes many non-convex shapes (Dietler et al., 2020). Predicting radial representations also involves a predefined number of rays, leading to possibly suboptimal sampling and limited precision of the contour (Schmidt et al., 2018). As a third representation, Jetley et al. (2017) trained an auto-encoder on the Caltech-101 silhouettes dataset to learn a shape embedding for the detection network. Miksys et al. (2019) extended this approach with an additional distance transform acting as a proxy between decoder and shape image, which allows superimposing "discs" at every pixel location and hence mitigates the impact of falsely predicted pixels. They also considered the inclusion of the decoder in the training process and showed quantitative improvements. However, this model still lacks optimal pixel precision.

1.3 The Contour Proposal Network

Based on existing strengths and weaknesses in the field, we here introduce the Contour Proposal Network (CPN). Similar in spirit to the approach of Jetley et al. (2017), it models instance segmentation as a sparse detection problem by performing regression of object shape representations at single pixel locations. The model architecture is depicted in Fig. 1: A backbone network derives feature maps from an input image. For each pixel of the feature map, regression heads generate a contour representation, while a classification head determines whether an object is present at a given location. Based on the classifications, a proposal sampling stage then extracts a sparse list of contour representations. By converting these to the pixel domain using the fully differentiable Fourier sine and cosine transformation, we implicitly enforce the contour representations to be defined in the frequency domain, inspired by Elliptical Fourier Descriptors (Kuhl and Giardina, 1982). The resulting contour coordinates are optimized by a local refinement procedure to further maximize pixel precision using a residual field produced by an additional regression head. This is similar in spirit to the displacement fields used by Gur et al. (2019), but integrates more naturally as the CPN already operates with near-final contour proposals in the pixel domain at this stage. The complete framework is trained end-to-end across all these stages. As a final inference step, non-maximum suppression removes redundant detections from the object proposals.

We train CPNs that outperform U-Nets and Mask R-CNNs in instance segmentation accuracy in our experiments, and demonstrate that the inference speed of selected CPNs is suitable for real-time applications, especially when using automatic mixed precision (amp). The trained models generalize well to other datasets covering different families of biological cells.

2 Methods

The Contour Proposal Network (CPN) uses five basic building blocks (Fig. 1). Initially, a high-resolution and a low-resolution dense feature map are generated by a backbone CNN, which can be freely chosen. From the low-resolution latent feature map, a classifier head detects objects, while parallel regression heads jointly generate explicit contour representations. The classification scores produced by the classifier estimate whether an object exists at the given locations. Contours are modelled as a series of 2D coordinates by applying the Fourier sine and cosine transformation of a chosen order to the latent outputs of the contour regression heads, resulting in a fully differentiable, fixed-sized format for boundary regression (Sec. 2.2). The contour proposals form a dense map on the pixel grid, with one tensor of shape descriptors and one tensor of corresponding object classification scores. The output resolution is independent of the input resolution and effectively defines the maximum number of objects that can be detected. All representations that are deemed to describe a present object are extracted as a list of contour proposals and mapped to pixel space using Eq. 1. The proposals are then processed by a trainable refinement block that maximizes the fit of contours to image content using the high-resolution features. The last building block filters redundant detections using non-maximum suppression.
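To illustrate how these building blocks connect, the following is a schematic PyTorch sketch of the forward pass; the backbone interface, head layouts, shared channel count and the 0.5 score cutoff are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CPNSketch(nn.Module):
    """Schematic CPN forward pass: backbone -> heads -> sparse proposal sampling.

    Illustrative assumptions: the backbone returns a (high-res, low-res) feature
    map pair, both with `channels` channels, and every head is a 1x1 convolution.
    """

    def __init__(self, backbone: nn.Module, channels: int, order: int = 8):
        super().__init__()
        self.backbone = backbone
        self.score_head = nn.Conv2d(channels, 1, 1)          # object present?
        self.shape_head = nn.Conv2d(channels, 4 * order, 1)  # Fourier shape coefficients
        self.offset_head = nn.Conv2d(channels, 2, 1)         # contour location on the grid
        self.refine_head = nn.Conv2d(channels, 2, 1)         # 2D residual field

    def forward(self, images: torch.Tensor):
        high_res, low_res = self.backbone(images)
        scores = torch.sigmoid(self.score_head(low_res))      # (B, 1, h, w)
        shapes = self.shape_head(low_res)                      # (B, 4*order, h, w)
        offsets = self.offset_head(low_res)                    # (B, 2, h, w)
        residual_field = self.refine_head(high_res)            # (B, 2, H, W)
        keep = scores.flatten(2).squeeze(1) > 0.5               # sparse sampling mask
        proposals = [(shapes[b].flatten(1).T[keep[b]],          # (K_b, 4*order)
                      offsets[b].flatten(1).T[keep[b]])         # (K_b, 2)
                     for b in range(images.shape[0])]
        return proposals, residual_field, scores
```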

2.1 Detection

A classification head produces a detection score for each contour representation and states whether it represents a present object or not. In our work we focus on the binary case and do not distinguish different object categories.

2.2 Contour Representation

Following Kuhl and Giardina (1982), we define a contour of order $N$ as a series of 2D coordinates $(x(t), y(t))$ with $t \in [0, 1)$, using the Fourier sine and cosine transformation

$$x(t) = a_0 + \sum_{n=1}^{N} \big( a_n \cos(2\pi n t) + b_n \sin(2\pi n t) \big), \qquad y(t) = c_0 + \sum_{n=1}^{N} \big( c_n \cos(2\pi n t) + d_n \sin(2\pi n t) \big). \tag{1}$$

For legibility we omit the subscripts of $x$ and $y$ in the following. The evolution of the x-coordinate along the contour is parameterized by the two coefficient series $a_n$ and $b_n$, with $a_0$ determining the spatial offset of the contour on the pixel grid. Accordingly, $y(t)$ is parameterized by the coefficients $c_n$ and $d_n$, with offset $c_0$. The parameter vector $(a_0, c_0, a_1, b_1, c_1, d_1, \ldots, a_N, b_N, c_N, d_N)$ hence determines a 2D object contour. The location parameter $t$ with interval length $1$ determines at which fraction of the contour a coordinate is sampled. The order hyperparameter $N$ determines the smoothness of the contour, with larger $N$ adding higher-frequency coefficients and thus allowing closer approximations of object contours (Fig. 2). This formulation always produces closed contours. It is differentiable, and both the contour representation and the sampled contour coordinates are fixed in size, given an order $N$ and a fixed sample size. Thus, we can directly regress the parameters of this representation to predict closed object contours with convolutional neural networks.
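As a concrete rendering of Eq. 1, the short sketch below samples contour coordinates from a coefficient vector; the coefficient ordering is the one assumed above and purely illustrative.

```python
import torch

def sample_contour(coeffs: torch.Tensor, samples: int = 64) -> torch.Tensor:
    """Sample 2D coordinates of a closed contour from Fourier coefficients (Eq. 1).

    coeffs: (4*N + 2,) tensor, assumed layout (a0, c0, a1, b1, c1, d1, ..., aN, bN, cN, dN).
    Returns a (samples, 2) tensor of (x, y) coordinates.
    """
    a0, c0 = coeffs[0], coeffs[1]
    abcd = coeffs[2:].reshape(-1, 4)                   # rows: (a_n, b_n, c_n, d_n)
    order = abcd.shape[0]
    t = torch.linspace(0.0, 1.0, samples + 1, dtype=coeffs.dtype)[:-1]  # t in [0, 1)
    n = torch.arange(1, order + 1, dtype=coeffs.dtype)
    ang = 2.0 * torch.pi * n[None, :] * t[:, None]     # (samples, N)
    cos, sin = torch.cos(ang), torch.sin(ang)
    x = a0 + cos @ abcd[:, 0] + sin @ abcd[:, 1]
    y = c0 + cos @ abcd[:, 2] + sin @ abcd[:, 3]
    return torch.stack((x, y), dim=-1)
```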

The CPN employs two separate regression heads for predicting the contour shape and its localization in the image. By isolating the regression of shape and location, we intend to preserve translational invariance of the contour representation and equivariance of the offset regression.

(a)–(c): increasing order $N$ with the corresponding descriptor vector size
Figure 2: Contour representation with different settings of the order hyperparameter $N$. The order defines the size of the descriptor vector, which is $4N + 2$ for the parameterization in Eq. 1. The higher the order, the more detail is preserved. The 2D contour coordinates are sampled from the descriptor space with Eq. 1. Even small settings of $N$ yield good approximations of odd and non-convex shapes, in this case human neuronal cells, including a curved apical dendrite.

2.3 Local Refinement

(a) Refinement tensor
(b) Refinement: before (blue), after (green)
Figure 3: Local refinement example. Panel (a) illustrates the learned refinement tensor as a vector field, superimposed with the input image. Panel (b) shows a contour proposal before and after refinement. The refinement tensor learned to shift contour coordinates to maximize pixel precision. (Best viewed in color)

To maximize the pixel precision of estimated contours, we propose a local refinement of predicted contour coordinates in the pixel domain, as defined in Alg. 1. Using an additional regression head we generate a two-channel feature map that represents a 2D residual field on the pixel grid. We correct each rounded contour proposal coordinate using the residual read from this field at that location, limited to a maximum correction margin, thereby minimizing the distance between estimated and actual contour coordinates. (Note that the use of rounded coordinates prevents the contour proposal heads from being influenced by the refinement head during training; the rounding also provides a consistent starting point for the refinement.) This correction can be applied multiple times by re-reading the residual field at the updated pixel coordinates. Fig. 3 shows an example. A contour coordinate has reached its final position once the residual field yields an offset of zero for all spatial dimensions at the given location. The combination of correction margin and number of iterations limits the influence that the refinement may have on the final result. The local refinement reduces localization errors of the contour regression and can compensate for the exclusion of higher contour frequencies when small values of the order hyperparameter $N$ are chosen. Such a refinement becomes tractable because the CPN directly outputs boundary coordinates; in fact, the procedure can be efficiently implemented using fancy indexing. The prediction of the refinement tensor is trained implicitly by minimizing the distance between refined contour coordinates and ground truth coordinates in pixel space.

1: procedure Refine(x, y, R, m, T)
2:     for T iterations do
3:         (x, y) ← (x, y) + clip(R[⌊x⌉, ⌊y⌉], −m, m)
4:     return (x, y)
Algorithm 1 Local Refinement. Iteratively refine a contour coordinate (x, y) using the refinement tensor R and the maximum correction margin m over T iterations. Rounding to the nearest pixel index is denoted by ⌊·⌉; clip limits each component of the residual to [−m, m].
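A minimal PyTorch sketch of Alg. 1 that uses fancy indexing as mentioned in the text; the tensor layout and the clamping of indices to the image bounds are assumptions.

```python
import torch

def refine(coords: torch.Tensor, residual_field: torch.Tensor,
           margin: float = 2.0, iterations: int = 4) -> torch.Tensor:
    """Iteratively shift contour coordinates along a learned 2D residual field.

    coords:         (K, 2) float tensor of (x, y) contour coordinates.
    residual_field: (2, H, W) tensor; channel 0 = x-residual, 1 = y-residual (assumed layout).
    """
    _, height, width = residual_field.shape
    for _ in range(iterations):
        # Round to pixel indices and keep them inside the image (assumption).
        xi = coords[:, 0].round().long().clamp(0, width - 1)
        yi = coords[:, 1].round().long().clamp(0, height - 1)
        # Fancy indexing: read the residual for every coordinate at once.
        res = residual_field[:, yi, xi].T              # (K, 2)
        coords = coords + res.clamp(-margin, margin)   # bounded correction step
    return coords
```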

2.4 Non-Maximum Suppression

Similar to other object detection methods (He et al., 2017; Redmon et al., 2016; Lin et al., 2017), the CPN generates dense proposals; thus, multiple pixels of the produced output grid may represent the same object. To remove redundant detections during inference, we apply bounding-box non-maximum suppression (NMS). NMS keeps proposals with a high detection score, but suppresses proposals with lower scores whose bounding-box IoU (Intersection over Union) with a higher-scoring proposal exceeds a given threshold. As the CPN outputs lists of contour coordinates, the bounding box of each contour proposal can be computed very efficiently as the per-axis minima and maxima of its coordinates.
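A brief sketch of this step, assuming a list of per-proposal coordinate tensors and using torchvision's box NMS; the IoU threshold shown is only an example value.

```python
import torch
from torchvision.ops import nms

def suppress_redundant(contours, scores: torch.Tensor, iou_threshold: float = 0.5):
    """Keep the highest-scoring contour per object via bounding-box NMS.

    contours: list of (K_i, 2) tensors of (x, y) contour coordinates.
    scores:   (num_proposals,) detection scores from the classification head.
    """
    # Bounding box of each contour: per-axis minima and maxima of its coordinates.
    boxes = torch.stack([
        torch.cat([c.min(dim=0).values, c.max(dim=0).values]) for c in contours
    ])  # (num_proposals, 4) in (x1, y1, x2, y2) format
    keep = nms(boxes, scores, iou_threshold)
    return [contours[i] for i in keep], scores[keep]
```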

2.5 Loss functions

We define objectives for two components: Detection score and contour prediction. For legibility we present objectives per pixel.

Detection score

The detection head performs binary classification for each pixel, producing a score that states whether an object instance is present or not at the pixel location. The loss for this task is the standard Binary Cross Entropy (BCE).

Contour Coordinate Loss

At each pixel where a contour should be attached, we apply a loss that minimizes the distance between ground truth contour coordinates and estimated coordinates. For a single coordinate it is given by

(2)

The contour proposal prediction is trained using

(3)

with ground truth contour coordinates and their estimates, evaluated at random positions $t$ along the contour. Coordinates are obtained via Eq. 1 from the target and estimated parameter vectors, respectively. Local refinement is trained accordingly with

(4)

substituting the estimated coordinates with the refined coordinates obtained via Alg. 1.

Representation Loss

Additionally, we can directly supervise the shape parameters in the frequency domain using

(5)

where the spatial offsets are excluded from the supervised parameters. While the objective is already well defined without this representation loss, it provides additional regularization of the shape space and makes it possible to emphasize specific detail levels by applying individual weighting factors per order. An intuitive setting decreases these factors with growing order to put more relative emphasis on the coarse contour outlines represented by the low-frequency coefficients.

CPN Loss

Combining the components above, the overall per pixel loss is given by

(6)

where an indicator weight takes the value 1 for pixels that represent an object and 0 otherwise.
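For illustration, a loss combination along these lines could be sketched as follows; the use of an L1 distance for the coordinate terms and the equal weighting of the components are assumptions, since the concrete formulas are not reproduced here.

```python
import torch
import torch.nn.functional as F

def cpn_loss_sketch(pred_scores, target_scores, pred_coords, refined_coords, target_coords):
    """Illustrative per-pixel combination of the CPN loss components.

    pred_scores:   (P,) detection logits; target_scores: (P,) 0/1 float labels.
    pred_coords, refined_coords, target_coords: (P, S, 2) sampled contour coordinates.
    """
    # Detection score: standard binary cross entropy.
    score_loss = F.binary_cross_entropy_with_logits(pred_scores, target_scores)
    # Contour terms are only active where a ground truth object is present.
    obj = target_scores > 0.5
    if obj.any():
        coord_loss = F.l1_loss(pred_coords[obj], target_coords[obj])      # distance metric assumed
        refine_loss = F.l1_loss(refined_coords[obj], target_coords[obj])  # refinement trained alike
    else:
        coord_loss = refine_loss = pred_coords.sum() * 0.0
    return score_loss + coord_loss + refine_loss
```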

3 Experiments and results

We evaluate the instance segmentation performance of the CPN on three datasets (NCB, BBBC039, SYNTH) and compare the results with U-Net and Mask R-CNN as baseline models. Also, the cross-dataset generalization performance is examined by training models on BBBC039 and testing them on a fourth dataset, BBBC041. To better understand the effects of employing the CPN feature space, this experiment includes a U-Net that is first trained as part of a CPN. Finally, we compare inference speeds of different models.

Figure 4: Example patches from different datasets with reference annotations (5th row) and detections computed by the proposed CPN-U22 (4th row), U-Net (3rd row) and Mask R-CNN (2nd row) models.

3.1 Datasets

(a) NCB
(b) BBBC039
(c) SYNTH
(d) BBBC041
Figure 5: Examples from the used datasets (Sec. 3.1).

NCB - Neuronal Cell Bodies

This dataset consists of 82 grayscale image patches from microscopic scans of cell-body-stained brain tissue sections, with annotations of approximately 29,000 cell bodies. Fig. 5(a) shows examples. It includes significant variations in cell shape, intensity, and object overlap, as well as challenging configurations like occlusions, noise, varying contrast and histological artifacts. Brain samples come from the body donor program of the Anatomical Institute of Düsseldorf in accordance with legal and ethical requirements (ethics approval #). Tissue sections were stained using a modified Merker stain (Merker, 1983). Each tissue section has an approximate thickness of 20 µm and was captured with a resolution of 1 µm using a high-throughput light-microscopic scanner (TissueScope HS, Huron Digital Pathology Inc.). Note that cell bodies in this dataset are always continuously and fully annotated, even under occlusions, in order to allow a model to learn mostly realistic morphologies. Image patches were manually labeled for cell body instances by a group of experts in our institute. This was performed using custom web-based annotation software, which allowed entering overlapping pixel labels and inspecting the 3D context provided by depth focusing. To minimize highly subjective annotations of ambiguous cases, the software includes collaborative feedback features that support consensus among multiple experts during annotation. The complete dataset will be made publicly available on the EBRAINS platform (https://ebrains.eu).

BBBC039 - Nuclei of U2OS cells in a chemical screen

This is a dataset from the Broad Institute Bioimage Benchmark Collection (Ljosa et al., 2012). It consists of 200 grayscale images from a high-throughput chemical screen on U2OS cells, depicting approximately 23,000 annotated nuclei. Fig. 5(b) shows examples.

BBBC041 - P. vivax (malaria) infected human blood smears

Also from the Broad Bioimage Benchmark Collection (Ljosa et al., 2012), this dataset consists of 1364 images depicting malaria-infected human blood smear cells, annotated with bounding boxes. Fig. 5(d) shows examples.

SYNTH - Synthetic shapes

This dataset consists of 4129 grayscale images that show a large variety of synthetic shapes in different sizes. It contains approximately 1,305,000 annotated objects. Fig. 5(c) shows examples. Shapes include simple structures, such as circles, ellipses and triangles, as well as more complex non-convex structures. Objects and background vary in intensity and texture, with objects showing mostly darker intensities than the background. Similar to the NCB dataset mentioned above, objects can overlap and are fully annotated even if occluded. Thus, a single pixel can belong to more than one instance.

3.2 Baseline Methods

We evaluate performance against the same baseline methods used in Schmidt et al. (2018).

  • U-Net (Ronneberger et al., 2015) is an encoder-decoder network with lateral skip-connections and a de-facto standard for biomedical image segmentation (cf. Sec. 1.2). In addition to its original definition we use batch normalization after each convolutional layer. Following Caicedo et al. (2019), the network classifies each pixel into one of three classes: cell, background and boundary.

  • Mask R-CNN (He et al., 2017) is a widely used instance segmentation method that proposes bounding boxes for each object, filters proposals by non-maximum suppression and finally produces masks based on the proposed bounding box regions (cf. Sec. 1.2). The implementation used in our experiments is based on torchvision, a Python package that includes popular model architectures and is part of PyTorch.

3.3 CPN Training

For comparability, we instantiate CPN models with the same backbone architectures as the baseline models, and train them with the same number of epochs, data size, batch size and data augmentation. In particular, we use four CPN variants:

  • CPN-R50-FPN uses a Feature Pyramid Network (FPN) (Lin et al., 2017) with a 50-layer residual architecture (He et al., 2016, ResNet-50) as its backbone and applies 4 iterations of contour refinement (Sec. 2.3)

  • a second CPN-R50-FPN variant is identical but has contour refinement disabled

  • CPN-U22 uses a 22-layer U-Net as its backbone, which is set up like the baseline described in Sec. 3.2 but omits its final output layer. It uses 4 iterations of contour refinement

  • a second CPN-U22 variant is identical but has contour refinement disabled

For assessing inference speed, we will use additional backbone architectures (Sec. 3.6).

We supervise both the contour representation and the sampled contour coordinates. As the contour representation is well defined, we calculate ground truth representations on the fly and use them to guide the network during training. Using Eq. 1 with uniform sampling of the location parameter $t$, we retrieve contour coordinates from both ground truth and prediction for supervision. The sample size hyperparameter influences precision and performance by fixing the number of coordinates used during training; we use a fixed sample size in all experiments.
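Ground truth representations can, for example, be obtained by fitting the Fourier basis of Eq. 1 to an annotated contour in a least-squares sense; the sketch below shows one possible way to do this and is not taken from the authors' implementation.

```python
import torch

def fit_descriptor(contour: torch.Tensor, order: int = 8) -> torch.Tensor:
    """Least-squares fit of Eq. 1 coefficients to an annotated closed contour.

    contour: (K, 2) tensor of (x, y) points, assumed ordered along the boundary.
    Returns a (4*order + 2,) coefficient vector in the layout assumed above.
    """
    num_points = contour.shape[0]
    t = torch.linspace(0.0, 1.0, num_points + 1, dtype=contour.dtype)[:-1]
    n = torch.arange(1, order + 1, dtype=contour.dtype)
    ang = 2.0 * torch.pi * n[None, :] * t[:, None]
    # Design matrix: [1, cos(2*pi*n*t), sin(2*pi*n*t)] per contour point.
    basis = torch.cat([torch.ones(num_points, 1, dtype=contour.dtype),
                       torch.cos(ang), torch.sin(ang)], dim=1)
    # Solve basis @ coeffs ≈ contour jointly for the x and y columns.
    sol = torch.linalg.lstsq(basis, contour).solution        # (2*order + 1, 2)
    a0, c0 = sol[0, 0], sol[0, 1]
    a_n, b_n = sol[1:order + 1, 0], sol[order + 1:, 0]
    c_n, d_n = sol[1:order + 1, 1], sol[order + 1:, 1]
    return torch.cat([a0[None], c0[None],
                      torch.stack([a_n, b_n, c_n, d_n], dim=1).flatten()])
```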

While it is also possible to use the differentiable, non-parametric formula from Kuhl and Giardina (1982) to derive the contour representation from another latent space, we did not observe any benefits and thus omit this possibility.

Model Backbone F1 F1 F1 F1 F1 F1
Neuronal Cell Bodies
CPN-U22 U-Net
CPN-U22 U-Net
CPN-R50-FPN ResNet-50-FPN
CPN-R50-FPN ResNet-50-FPN
U-Net U-Net
Mask R-CNN ResNet-50-FPN
BBBC039
CPN-U22 U-Net
CPN-U22 U-Net
CPN-R50-FPN ResNet-50-FPN
CPN-R50-FPN ResNet-50-FPN
U-Net U-Net
Mask R-CNN ResNet-50-FPN
Synthetic Shapes
CPN-U22 U-Net
CPN-U22 U-Net
CPN-R50-FPN ResNet-50-FPN
CPN-R50-FPN ResNet-50-FPN
U-Net U-Net
Mask R-CNN ResNet-50-FPN
Table 1: Instance segmentation results for selected datasets and methods. The F1 score is reported for a range of intersection over union (IoU) thresholds, together with the average F1 over these thresholds.
Model Backbone F1 F1 F1 F1 F1 F1
CPN-U22 U-Net
U-Net Pretrained U-Net
U-Net U-Net
Mask R-CNN ResNet-50-FPN
Table 2: Cross-dataset evaluation of object detection performance. We report F1 scores for models trained on the BBBC039 dataset and tested on the BBBC041 dataset. Results are based on bounding boxes, using the same metrics as Tab. 1. The pretrained U-Net was trained as part of CPN-U22.

3.4 Detection and segmentation performance

To evaluate the detection performance and the shape quality of the produced contours, we use the F1 score, i.e. the harmonic mean of precision and recall, $F_1 = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}$, for different Intersection over Union (IoU) thresholds. The IoU threshold defines the minimal IoU that is required for two shapes to be counted as a match. Each ground truth shape can be a match for at most one predicted shape. A True Positive (TP) is a predicted shape that matches a ground truth shape, a False Positive (FP) is a predicted shape that does not match any ground truth shape, and a False Negative (FN) is a ground truth shape that does not match any predicted shape. F1 scores at a small threshold quantify the coarse detection performance of a model, yielding good scores if the model correctly infers object presence along with a roughly matching contour. F1 scores at a larger threshold quantify the fine detection performance, allowing little deviance from the target shape. We additionally report the average F1 over a range of thresholds to measure the average performance.
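A small sketch of this matching-based F1 computation under the stated rules; greedy one-to-one matching by descending IoU is an assumption about how ties are resolved.

```python
import numpy as np

def f1_at_threshold(ious: np.ndarray, threshold: float) -> float:
    """F1 score from a pairwise IoU matrix (predictions x ground truth shapes).

    Each ground truth shape may match at most one prediction; matches require
    IoU >= threshold. Greedy matching by descending IoU (assumption).
    """
    num_pred, num_gt = ious.shape
    pred_used, gt_used, tp = set(), set(), 0
    # Consider candidate pairs from best to worst IoU.
    for p, g in sorted(np.ndindex(num_pred, num_gt), key=lambda pg: -ious[pg]):
        if ious[p, g] >= threshold and p not in pred_used and g not in gt_used:
            pred_used.add(p); gt_used.add(g); tp += 1
    fp = num_pred - tp   # unmatched predictions
    fn = num_gt - tp     # unmatched ground truth shapes
    return 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
```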

Tab. 1 shows quantitative results of the CPN, U-Net and Mask R-CNN on three different datasets. The CPN with local refinement yields the highest scores on all datasets. Local refinement further increases the average F1 scores, especially for high thresholds, thus increasing the quality of the contours as expected. On the BBBC039 and SYNTH datasets, CPN-U22 outperforms the baseline models for all thresholds.

3.5 Cross-dataset generalization

We assessed how well the baseline and CPN models generalize to variations in the input data distribution as follows: Models are trained for instance segmentation on the BBBC039 dataset. Without any retraining or adaptation, models are then applied to BBBC041. Generalization capabilities are then evaluated with the F1 score on the basis of bounding boxes derived from the respective instance segmentation results. This provides a quantitative characterization of detection and segmentation performance under transfer to different data domains. To comply with the basic characteristics of BBBC039, we converted the images to inverted grayscale images and applied a fixed contrast adjustment and downscaling.
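A minimal sketch of this style of preprocessing (grayscale conversion, inversion, fixed contrast adjustment, downscaling); the contrast factor and scale below are placeholder values, not the settings used in the paper.

```python
import numpy as np
from PIL import Image, ImageEnhance

def to_bbbc039_style(path: str, contrast: float = 2.0, scale: float = 0.5) -> np.ndarray:
    """Convert an RGB BBBC041 image to an inverted, contrast-adjusted,
    downscaled grayscale image (placeholder contrast and scale values)."""
    img = Image.open(path).convert("L")                  # grayscale conversion
    img = ImageEnhance.Contrast(img).enhance(contrast)   # fixed contrast adjustment
    w, h = img.size
    img = img.resize((int(w * scale), int(h * scale)))   # downscaling
    return 255 - np.asarray(img)                         # invert intensities
```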

(a) Annotation

(b) CPN-U22

(c) U-Net

(d) Mask R-CNN
Figure 6: Cross-dataset generalization examples from three different models. The models were trained on BBBC039 and applied to images from BBBC041 without retraining or adaptation. Two samples are depicted above.

Results are shown in Tab. 2. CPN models consistently show higher scores than the baseline methods. For small IoU thresholds, the scores of the CPN and Mask R-CNN models are particularly distinguished from U-Net. Fig. 6 shows detected instances from different methods on two typical examples, illustrating the problems in cross-dataset generalization. CPN-U22 tends to detect conservatively, preferring some false negatives over false positives. U-Net shows more false positives, sometimes seemingly detecting noise. For Mask R-CNN, contours are less precise than for the others, and fewer true instances are detected overall.

Reusing trained CPN backbones for different tasks

We also examined the generalization performance of a U-Net whose encoder and decoder were first trained as the backbone of CPN-U22 and then reused with a new final prediction layer in a U-Net to output segmentation results. For retraining, the encoder part is frozen to ensure that the CPN feature space is kept intact. This case is reported in the second row of Tab. 2. All scores are significantly higher compared to the U-Net without such pre-training. For high IoU thresholds, this variant even provides the overall highest scores. However, as the generalization performance increased, the F1 score on the BBBC039 test set dropped.

3.6 Inference speed

Model FPS FPS (amp)
CPN-R50-FPN
CPN-R50-FPN
CPN-X50-FPN
U-Net
CPN-U22 (stride 2)
Mask R-CNN-R50-FPN -
CPN-X101-FPN
CPN-U22
Table 3: Inference speeds of different models. We report the number of frames per second (FPS) on the BBBC039 test set. We measure times with single-precision (float32) and with automatic mixed precision (amp). The initial run and possible post-processing steps are excluded. All models were implemented and executed as PyTorch models on an NVIDIA A100. We denote ResNet by 'R', ResNeXt by 'X' and U-Net by 'U' for brevity.

We computed the number of frames per second (FPS) on the BBBC039 test set for different models. To improve the precision of the measurement, we iterated over the test set multiple times. Pre- and post-processing steps were excluded from the timings, as well as initial warm-up runs. This experiment was performed in single-precision (float32) and with automatic mixed precision (amp) via PyTorch's autocast feature. The latter automatically selects CUDA operations to run in half-precision (float16) to improve performance while aiming to maintain accuracy.
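A minimal sketch of such a timing loop with PyTorch's autocast; the batch handling and the warm-up/repeat counts are placeholder choices.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model: torch.nn.Module, images: list, use_amp: bool = False,
                warmup: int = 3, repeats: int = 10) -> float:
    """Frames per second over a list of image tensors, excluding warm-up runs."""
    model.eval().cuda()
    for img in images[:warmup]:                      # warm-up, not timed
        with torch.autocast("cuda", enabled=use_amp):
            model(img.cuda()[None])
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):                         # reiterate for a stable estimate
        for img in images:
            with torch.autocast("cuda", enabled=use_amp):
                model(img.cuda()[None])
    torch.cuda.synchronize()
    return repeats * len(images) / (time.perf_counter() - start)
```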

Results are presented in Tab. 3. In terms of inference speed, CPN-R50-FPN outperforms both Mask R-CNN-R50-FPN and U-Net when applied with normal single-precision (float32); its frame rate qualifies it for many online video processing applications. When applied with automatic mixed precision (amp), CPN-U22 with a stride of 2 in the classification and regression heads achieved the highest frame rate of the tested CPN models and the second highest overall among the tested models. The U-Net, which shares the same backbone, showed the best inference speed using amp.

The influence of local refinement on inference speed was evaluated for the R50-FPN based CPN, for which four refinement iterations reduced the frame rate by 0.33 FPS when used with single-precision (float32).

4 Discussion and conclusion

We proposed the Contour Proposal Network (CPN), a framework for segmenting object instances by proposing contours which are encoded as interpretable, fixed-sized representations based on Fourier Descriptors. CPN models can be constructed with different backbone CNN architectures to produce image features. We assessed the performance of four different CPN variants, employing both U-Net and ResNet-FPN backbones, against a standard U-Net and Mask R-CNN as baselines. All U-Net based CPNs outperformed the U-Net counterpart in terms of F1 instance segmentation performance on all three tested datasets, both with and without local refinement. Given that CPN and U-Net share the same backbone architecture U22, the results indicate that the CPN provides a more effective problem description. This is also supported by the comparison of the CPN and Mask R-CNN in our experiments. For both tested backbone architectures, the CPNs show consistently higher F1 than Mask R-CNN-R50-FPN.

The CPN models employ a highly entangled representation of object shapes and sparse detections: The networks are effectively forced to concentrate the description of a complete object into a single pixel by anchoring the boundary representation to a specific coordinate. This requires them to form an intrinsic spatial relationship between whole objects and their parts, which encourages compact and robust representations with good generalization properties. This principle shares some commonalities with Capsule Networks (Sabour et al., 2017), which also aim to condense instances of objects or object parts into vector representations coupled with a detection score. The effects can be observed when looking at examples as depicted in Fig. 4: If boundaries are invisible or poorly defined, CPN models exploit the learned knowledge of boundary shapes to find highly plausible separations (e.g. Fig. 4 e, g, h). Very small and touching objects, which are often overlooked by pixel-based methods, are well detected (e.g. Fig. 4 a, b). Separation of clusters of kissing objects is typically modelled quite accurately, reproducing gaps between touching shapes very consistently (e.g. Fig. 4 c, d). Thin structures, such as dendrites, can be modeled accurately (e.g. Fig. 4f). Discontinuities that may occur with pixel-based methods can be avoided, as proposed contours are continuous and closed by design.

While leading to accurate object representations, experiments on cross-dataset generalization showed that the learned shape priors are not overly restrictive and transfer well to different data distributions. Moreover, the CPN models are able to produce plausible contours for previously unseen objects as long as their basic morphology is consistent with the training examples. In particular, the F1 margin between CPN-U22 and U-Net suggests that the better performing CPN formed a more universal intrinsic understanding of what an instance is. In this context, we also observed that the CPN and its objective have a positive influence on the backbone CNNs to produce a feature space with good generalization properties - a pixel-classifying U-Net showed significantly better performance on our cross-dataset evaluation when its encoder and decoder were trained as a CPN backbone.

By modeling the contour representation in the frequency domain, CPNs can bypass several sampling problems occurring in previous works, like selecting the optimal sampling rate in pixel space. Instead, by setting the order of the Fourier series, the user can specify different levels of contour complexity in a natural way. Furthermore, the representation allows contours to be generated at arbitrary output resolutions without compromising detection accuracy.

The local refinement step, which is an integrated and fully trainable part of the CPN, helps contour proposals achieve high pixel precision despite the regularization imposed by the shape model. We measured a notable increase in performance for high IoU thresholds when applying refinement, indicating that high contour frequencies can be modeled efficiently using a residual field. While the refinement can improve contour details, it only had a minor influence on inference speed in our experiment. We see the refinement as an important complementary module of the CPN framework.

In terms of inference speed, CPN-R50-FPN outperforms all other tested methods when applied with normal single-precision (float32). Its frame rate even makes it suitable for online processing tasks, especially as it produces ready-to-use object instance descriptions that require no additional post-processing steps like connected component labeling. The experiments also showed that local refinement adds little time overhead: in the case of the R50-FPN based CPN, four refinement iterations increased pixel precision while costing less than half a frame per second. With automatic mixed precision, CPN-U22 with strided heads showed the fastest inference speed among all CPNs.

Since the only assumption of the proposed approach is that object contours are closed, it is applicable to a wide range of detection problems, including problems outside the biomedical domain that have not been investigated in the present work.

An implementation of the model architecture in PyTorch is available at https://github.com/FZJ-INM1-BDA/celldetection.

Acknowledgments

This project received funding from the European Union's Horizon 2020 Research and Innovation Programme, grant agreement 945539 (HBP SGA3), and Priority Program 2041 (SPP 2041) "Computational Connectomics" of the German Research Foundation (DFG). Computing time was granted through JARA-HPC on the supercomputer JURECA at Juelich Supercomputing Centre (JSC) as part of the project CJINM14.

References

  • A. Bochkovskiy, C. Wang, and H. M. Liao (2020) YOLOv4: optimal speed and accuracy of object detection. External Links: 2004.10934 Cited by: §1.2.
  • J. C. Caicedo, J. Roth, A. Goodman, T. Becker, K. W. Karhohs, M. Broisin, C. Molnar, C. McQuin, S. Singh, F. J. Theis, and et al. (2019) Evaluation of deep learning strategies for nucleus segmentation in fluorescence images. Cytometry Part A 95 (9), pp. 952–965. External Links: ISSN 1552-4922, 1552-4930, Document Cited by: §1.2, item 1.
  • H. Chen, X. Qi, L. Yu, and P. Heng (2016) DCAN: deep contour-aware networks for accurate gland segmentation. External Links: 1604.02677 Cited by: §1.2.
  • N. Dietler, M. Minder, V. Gligorovski, A. M. Economou, D. A. H. L. Joly, A. Sadeghi, C. H. M. Chan, M. Koziński, M. Weigert, A. Bitbol, and S. J. Rahi (2020) A convolutional neural network segments yeast microscopy images with high accuracy. Nature Communications 11 (1), pp. 5723 (en). External Links: ISSN 2041-1723, Link, Document Cited by: §1.2.
  • F. A. Guerrero-Pena, P. D. Marrero Fernandez, T. Ing Ren, M. Yui, E. Rothenberg, and A. Cunha (2018) Multiclass weighted loss for instance segmentation of cluttered cells. 2018 25th IEEE International Conference on Image Processing (ICIP). External Links: ISBN 9781479970612, Link, Document Cited by: §1.2.
  • S. Gur, T. Shaharabany, and L. Wolf (2019) End to end trainable active contours via differentiable rendering. arXiv:1912.00367 [cs]. Note: arXiv: 1912.00367 External Links: Link Cited by: §1.2, §1.3.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778. External Links: Document, ISSN 1063-6919 Cited by: item 1.
  • K. He, G. Gkioxari, P. Dollar, and R. Girshick (2017) Mask r-cnn. In 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2980–2988. External Links: ISBN 978-1-5386-1032-9, Link, Document Cited by: §1.2, §2.4, item 2.
  • S. Jetley, M. Sapienza, S. Golodetz, and P. H. S. Torr (2017) Straight to shapes: real-time detection of encoded shapes. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4207–4216. External Links: ISBN 978-1-5386-0457-1, Link, Document Cited by: §1.2, §1.2, §1.3.
  • F. P. Kuhl and C. R. Giardina (1982) Elliptic fourier features of a closed contour. Computer Graphics and Image Processing 18, pp. 236–258. Cited by: §1.3, §2.2, §3.3.
  • T. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and S. Belongie (2017) Feature Pyramid Networks for Object Detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, pp. 936–944. External Links: ISBN 978-1-5386-0457-1, Link, Document Cited by: item 1.
  • T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar (2017) Focal loss for dense object detection. In 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2999–3007. External Links: ISBN 978-1-5386-1032-9, Link, Document Cited by: §1.2, §2.4.
  • T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick (2014) Microsoft coco: common objects in context. In Computer Vision – ECCV 2014, D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars (Eds.), Cham, pp. 740–755. External Links: ISBN 978-3-319-10602-1 Cited by: §1.2.
  • W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu, and A. C. Berg (2016) SSD: single shot multibox detector. In Computer Vision – ECCV 2016, B. Leibe, J. Matas, N. Sebe, and M. Welling (Eds.), Lecture Notes in Computer Science, Vol. 9905, pp. 21–37. External Links: ISBN 978-3-319-46447-3, Link, Document Cited by: §1.2.
  • V. Ljosa, K. L. Sokolnicki, and A. E. Carpenter (2012) Annotated high-throughput microscopy image sets for validation. Nature Methods 9 (7), pp. 637–637 (en). External Links: ISSN 1548-7091, 1548-7105, Link, Document Cited by: §3.1, §3.1.
  • B. Merker (1983) Silver staining of cell bodies by means of physical development. Journal of Neuroscience Methods 9 (3), pp. 235–241 (en). External Links: ISSN 0165-0270, Document Cited by: §3.1.
  • L. Miksys, S. Jetley, M. Sapienza, S. Golodetz, and P. H. S. Torr (2019) Straight to shapes++: real-time instance segmentation made more accurate. External Links: 1905.11358 Cited by: §1.2.
  • J. Redmon, S. Divvala, R. Girshick, and A. Farhadi (2016) You only look once: unified, real-time object detection. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788. External Links: ISBN 978-1-4673-8851-1, Link, Document Cited by: §1.2, §2.4.
  • J. Redmon and A. Farhadi (2018) YOLOv3: an incremental improvement. External Links: 1804.02767 Cited by: §1.2.
  • S. Ren, K. He, R. Girshick, and J. Sun (2017) Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (6), pp. 1137–1149. External Links: ISSN 0162-8828, 2160-9292, Document Cited by: §1.2.
  • O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi (Eds.), Lecture Notes in Computer Science, Vol. 9351, pp. 234–241. External Links: ISBN 978-3-319-24573-7, Link, Document Cited by: §1.2, §1.2, item 1.
  • S. Sabour, N. Frosst, and G. E. Hinton (2017) Dynamic Routing Between Capsules. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, USA, pp. 3859–3869. External Links: ISBN 978-1-5108-6096-4 Cited by: §4.
  • U. Schmidt, M. Weigert, C. Broaddus, and G. Myers (2018) Cell detection with star-convex polygons. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, A. F. Frangi, J. A. Schnabel, C. Davatzikos, C. Alberola-López, and G. Fichtinger (Eds.), Lecture Notes in Computer Science, Vol. 11071, pp. 265–273. External Links: ISBN 978-3-030-00933-5, Link, Document Cited by: §1.2, §3.2.
  • K. Stacke, G. Eilertsen, J. Unger, and C. Lundström (2019) A closer look at domain shift for deep learning in histopathology. CoRR abs/1909.11575. External Links: Link, 1909.11575 Cited by: §1.1.
  • K. Thierbach, P. Bazin, W. d. Back, F. Gavriilidis, E. Kirilina, C. Jäger, M. Morawski, S. Geyer, N. Weiskopf, and N. Scherf (2018) Combining deep learning and active contours opens the way to robust, automated analysis of brain cytoarchitectonics. In Machine Learning in Medical Imaging, Y. Shi, H. Suk, and M. Liu (Eds.), Lecture Notes in Computer Science, Vol. 11046, pp. 179–187. External Links: ISBN 978-3-030-00918-2, Link, Document Cited by: §1.2.
  • E. Xie, P. Sun, X. Song, W. Wang, X. Liu, D. Liang, C. Shen, and P. Luo (2020) PolarMask: single shot instance segmentation with polar representation. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12190–12199. External Links: ISBN 978-1-72817-168-5, Link, Document Cited by: §1.2.
  • Y. Yagi (2011) Color standardization and optimization in Whole Slide Imaging. Diagnostic Pathology 6 (S1), pp. S15 (en). External Links: ISSN 1746-1596, Link, Document Cited by: §1.1.
  • L. Yang, R. P. Ghosh, J. M. Franklin, S. Chen, C. You, R. R. Narayan, M. L. Melcher, and J. T. Liphardt (2020) NuSeT: A deep learning tool for reliably separating and analyzing crowded cells. PLOS Computational Biology 16 (9), pp. e1008193 (en). External Links: ISSN 1553-7358, Document Cited by: §1.2.
  • L. Zabawa, A. Kicherer, L. Klingbeil, R. Töpfer, H. Kuhlmann, and R. Roscher (2020) Counting of grapevine berries in images via semantic segmentation using convolutional neural networks. ISPRS Journal of Photogrammetry and Remote Sensing 164, pp. 73–83. External Links: ISSN 0924-2716, Link, Document Cited by: §1.2.
  • L. Zhang, M. Bai, R. Liao, R. Urtasun, D. Marcos, D. Tuia, and B. Kellenberger (2018) Learning deep structured active contours end-to-end. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8877–8885. External Links: ISBN 978-1-5386-6420-9, Link, Document Cited by: §1.2.
Table 2: Cross-dataset evaluation of object detection performance. We report F1 scores for models trained on BBBC039 dataset and tested on BBBC041 dataset. Results are based on bounding boxes using same metrics as Tab. 1. The pretrained U-Net was trained as part of CPN-U22.

4 Discussion and conclusion

We proposed the Contour Proposal Network (CPN), a framework for segmenting object instances by proposing contours which are encoded as interpretable, fixed-sized representations based on Fourier Descriptors. CPN models can be constructed with different backbone CNN architectures to produce image features. We assessed the performance of four different CPN variants, employing both U-Net and ResNet-FPN backbones, against a standard U-Net and Mask R-CNN as baseline. All U-Net based CPNs outperformed the U-Net counterpart in terms of F1 instance segmentation performance on all three tested datasets, both with and without local refinement. Given that CPN and U-Net share the same backbone architecture U22, the results indicate that the CPN provides a more effective problem description. This is also supported by the comparison of the CPN and Mask R-CNN in our experiments. For both tested backbone architectures, the CPNs show consistently higher F1 than Mask R-CNN-R50-FPN.

The CPN models employ a highly entangled representation of object shapes and sparse detections: The networks are effectively forced to concentrate the description of a complete object into a single pixel by anchoring the boundary representation to a specific coordinate. This requires them to form an intrinsic spatial relationship between whole objects and their parts, which encourages compact and robust representations with good generalization properties. This principle shares some commonalities with Capsule Networks (Sabour et al., 2017), which also aim to condense instances of objects or object parts into vector representations coupled with a detection score. The effects can be observed when looking at examples as depicted in Fig. 4: If boundaries are invisible or poorly defined, CPN models exploit the learned knowledge of boundary shapes to find highly plausible separations (e.g. Fig. 4 e, g, h). Very small and touching objects, which are often overseen by pixel-based methods, are well detected (e.g. Fig. 4 a, b). Separation of clusters of kissing objects is typically modelled quite accurately, reproducing gaps between touching shapes very consistently (e.g. Fig. 4 c, d). Thin structures, such as dendrites, can be modeled accurately (e.g. Fig. 4f). Discontinuities that may occur with pixel-based methods can be avoided, as proposed contours are continuous and closed by design.

While leading to accurate object representations, experiments on cross-dataset generalization showed that the learned shape priors are not overly restrictive and transfer well to different data distributions. Even more, the CPN models are able to produce plausible contours for previously unseen objects as long as their basic morphology is consistent with the training examples. In particular the F margin between CPN-U22 and U-Net of suggests that the better performing CPN formed a more universal intrinsic understanding of what an instance is. In this context, we also observed that the CPN and its objective have a positive influence on the backbone CNNs to produce a feature space with good generalization properties - a pixel-classifying U-Net showed significantly better performance on our cross-dataset evaluation when its encoder and decoder were trained as a CPN backbone.

By modeling the contour representation in the frequency domain, CPNs can bypass several sampling problems occurring in previous works, like selecting the optimal sampling rate in pixel space. Instead, by setting the order of the Fourier series, the user can specify different levels of contour complexity in a natural way. Furthermore, the representation allows to generate arbitrary output resolutions without compromising detection accuracy.

The local refinement step, which is an integrated and fully trainable part of the CPN, supports contour proposals to achieve high pixel precision despite the regularization imposed by the shape model. We measured a notable increase in performance for high IoU thresholds when applying refinement, indicating that high contour frequencies can be modeled efficiently using a residual field. While the refinement can improve contour details, it only had a minor influence on inference speed in our experiment. We see the refinement as an important complementary module of the CPN framework.

In terms of inference speed, CPN-R50-FPN outperforms all other tested methods when applied with normal single-precision (float32). With FPS it is even suitable for online processing tasks, especially as it produces ready-to-use object instance descriptions, not requiring additional post-processing steps like connected component labeling. The experiments also showed that local refinement adds little time overhead, in the case of R50-FPN based CPN four refinement iterations increased pixel-precision while costing less than half a frame per second. For automatic mixed precision CPN-U22 with strided heads showed fasted inference speed among all CPNs with FPS.

Since the only assumption of the proposed approach are closed object contours, it is applicable to a wide range of detection problems, also outside the biomedical domain that have not been investigated in the present work.

An implementation of the model architecture in PyTorch is available at https://github.com/FZJ-INM1-BDA/celldetection.

Acknowledgments

This project received funding from the European Union’s Horizon 2020 Research and Innovation Programme, grant agreement 945539 (HBP SGA3), and Priority Program 2041 (SPP 2041) ”Computational Connectomics” of the German Research Foundation (DFG). Computing time was granted through JARA-HPC on the supercomputer JURECA at Juelich Supercomputing Centre (JSC) as part of the project CJINM14.

References

  • A. Bochkovskiy, C. Wang, and H. M. Liao (2020) YOLOv4: optimal speed and accuracy of object detection. External Links: 2004.10934 Cited by: §1.2.
  • J. C. Caicedo, J. Roth, A. Goodman, T. Becker, K. W. Karhohs, M. Broisin, C. Molnar, C. McQuin, S. Singh, F. J. Theis, and et al. (2019)

    Evaluation of deep learning strategies for nucleus segmentation in fluorescence images

    .
    Cytometry Part A 95 (9), pp. 952–965. External Links: ISSN 1552-4922, 1552-4930, Document Cited by: §1.2, item 1.
  • H. Chen, X. Qi, L. Yu, and P. Heng (2016) DCAN: deep contour-aware networks for accurate gland segmentation. External Links: 1604.02677 Cited by: §1.2.
  • N. Dietler, M. Minder, V. Gligorovski, A. M. Economou, D. A. H. L. Joly, A. Sadeghi, C. H. M. Chan, M. Koziński, M. Weigert, A. Bitbol, and S. J. Rahi (2020)

    A convolutional neural network segments yeast microscopy images with high accuracy

    .
    Nature Communications 11 (1), pp. 5723 (en). External Links: ISSN 2041-1723, Link, Document Cited by: §1.2.
  • F. A. Guerrero-Pena, P. D. Marrero Fernandez, T. Ing Ren, M. Yui, E. Rothenberg, and A. Cunha (2018) Multiclass weighted loss for instance segmentation of cluttered cells. 2018 25th IEEE International Conference on Image Processing (ICIP). External Links: ISBN 9781479970612, Link, Document Cited by: §1.2.
  • S. Gur, T. Shaharabany, and L. Wolf (2019) End to end trainable active contours via differentiable rendering. arXiv:1912.00367 [cs]. Note: arXiv: 1912.00367 External Links: Link Cited by: §1.2, §1.3.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In

    2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

    ,
    Vol. , pp. 770–778. External Links: Document, ISSN 1063-6919 Cited by: item 1.
  • K. He, G. Gkioxari, P. Dollar, and R. Girshick (2017) Mask r-cnn. In 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2980–2988. External Links: ISBN 978-1-5386-1032-9, Link, Document Cited by: §1.2, §2.4, item 2.
  • S. Jetley, M. Sapienza, S. Golodetz, and P. H. S. Torr (2017) Straight to shapes: real-time detection of encoded shapes. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4207–4216. External Links: ISBN 978-1-5386-0457-1, Link, Document Cited by: §1.2, §1.2, §1.3.
  • F. P. Kuhl and C. R. Giardina (1982) Elliptic fourier features of a closed contour. Computer Graphics and Image Processing 18, pp. 236–258. Cited by: §1.3, §2.2, §3.3.
  • T. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and S. Belongie (2017) Feature Pyramid Networks for Object Detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, pp. 936–944. External Links: ISBN 978-1-5386-0457-1, Link, Document Cited by: item 1.
  • T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar (2017) Focal loss for dense object detection. In 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2999–3007. External Links: ISBN 978-1-5386-1032-9, Link, Document Cited by: §1.2, §2.4.
  • T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick (2014) Microsoft coco: common objects in context. In Computer Vision – ECCV 2014, D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars (Eds.), Cham, pp. 740–755. External Links: ISBN 978-3-319-10602-1 Cited by: §1.2.
  • W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu, and A. C. Berg (2016) SSD: single shot multibox detector. In Computer Vision – ECCV 2016, B. Leibe, J. Matas, N. Sebe, and M. Welling (Eds.), Lecture Notes in Computer Science, Vol. 9905, pp. 21–37. External Links: ISBN 978-3-319-46447-3, Link, Document Cited by: §1.2.
  • V. Ljosa, K. L. Sokolnicki, and A. E. Carpenter (2012) Annotated high-throughput microscopy image sets for validation. Nature Methods 9 (7), pp. 637–637 (en). External Links: ISSN 1548-7091, 1548-7105, Link, Document Cited by: §3.1, §3.1.
  • B. Merker (1983) Silver staining of cell bodies by means of physical development. Journal of Neuroscience Methods 9 (3), pp. 235–241 (en). External Links: ISSN 0165-0270, Document Cited by: §3.1.
  • L. Miksys, S. Jetley, M. Sapienza, S. Golodetz, and P. H. S. Torr (2019) Straight to shapes++: real-time instance segmentation made more accurate. External Links: 1905.11358 Cited by: §1.2.
  • J. Redmon, S. Divvala, R. Girshick, and A. Farhadi (2016) You only look once: unified, real-time object detection. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788. External Links: ISBN 978-1-4673-8851-1, Link, Document Cited by: §1.2, §2.4.
  • J. Redmon and A. Farhadi (2018) YOLOv3: an incremental improvement. External Links: 1804.02767 Cited by: §1.2.
  • S. Ren, K. He, R. Girshick, and J. Sun (2017) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (6), pp. 1137–1149. External Links: ISSN 0162-8828, 2160-9292, Document Cited by: §1.2.
  • O. Ronneberger, P. Fischer, and T. Brox (2015) U-Net: convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi (Eds.), Lecture Notes in Computer Science, Vol. 9351, pp. 234–241. External Links: ISBN 978-3-319-24573-7, Link, Document Cited by: §1.2, item 1.
  • S. Sabour, N. Frosst, and G. E. Hinton (2017) Dynamic Routing Between Capsules. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, USA, pp. 3859–3869. External Links: ISBN 978-1-5108-6096-4 Cited by: §4.
  • U. Schmidt, M. Weigert, C. Broaddus, and G. Myers (2018) Cell detection with star-convex polygons. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, A. F. Frangi, J. A. Schnabel, C. Davatzikos, C. Alberola-López, and G. Fichtinger (Eds.), Lecture Notes in Computer Science, Vol. 11071, pp. 265–273. External Links: ISBN 978-3-030-00933-5, Link, Document Cited by: §1.2, §3.2.
  • K. Stacke, G. Eilertsen, J. Unger, and C. Lundström (2019) A closer look at domain shift for deep learning in histopathology. CoRR abs/1909.11575. External Links: Link, 1909.11575 Cited by: §1.1.
  • K. Thierbach, P. Bazin, W. de Back, F. Gavriilidis, E. Kirilina, C. Jäger, M. Morawski, S. Geyer, N. Weiskopf, and N. Scherf (2018) Combining deep learning and active contours opens the way to robust, automated analysis of brain cytoarchitectonics. In Machine Learning in Medical Imaging, Y. Shi, H. Suk, and M. Liu (Eds.), Lecture Notes in Computer Science, Vol. 11046, pp. 179–187. External Links: ISBN 978-3-030-00918-2, Link, Document Cited by: §1.2.
  • E. Xie, P. Sun, X. Song, W. Wang, X. Liu, D. Liang, C. Shen, and P. Luo (2020) PolarMask: single shot instance segmentation with polar representation. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12190–12199. External Links: ISBN 978-1-72817-168-5, Link, Document Cited by: §1.2.
  • Y. Yagi (2011) Color standardization and optimization in Whole Slide Imaging. Diagnostic Pathology 6 (S1), pp. S15 (en). External Links: ISSN 1746-1596, Link, Document Cited by: §1.1.
  • L. Yang, R. P. Ghosh, J. M. Franklin, S. Chen, C. You, R. R. Narayan, M. L. Melcher, and J. T. Liphardt (2020) NuSeT: A deep learning tool for reliably separating and analyzing crowded cells. PLOS Computational Biology 16 (9), pp. e1008193 (en). External Links: ISSN 1553-7358, Document Cited by: §1.2.
  • L. Zabawa, A. Kicherer, L. Klingbeil, R. Töpfer, H. Kuhlmann, and R. Roscher (2020) Counting of grapevine berries in images via semantic segmentation using convolutional neural networks. ISPRS Journal of Photogrammetry and Remote Sensing 164, pp. 73–83. External Links: ISSN 0924-2716, Link, Document Cited by: §1.2.
  • L. Zhang, M. Bai, R. Liao, R. Urtasun, D. Marcos, D. Tuia, and B. Kellenberger (2018) Learning deep structured active contours end-to-end. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8877–8885. External Links: ISBN 978-1-5386-6420-9, Link, Document Cited by: §1.2.

Acknowledgments

This project received funding from the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement 945539 (HBP SGA3) and from Priority Program 2041 (SPP 2041) "Computational Connectomics" of the German Research Foundation (DFG). Computing time was granted through JARA-HPC on the supercomputer JURECA at Jülich Supercomputing Centre (JSC) as part of project CJINM14.
