Superpixels have become very popular in computer vision and image processing applications such as object localization (Fulkerson et al., 2009), contour detection (Arbelaez et al., 2011), face labeling (Kae et al., 2013), data association across views (Sawhney et al., 2014), and multi-class object segmentation (Giraud et al., 2017b; Gould et al., 2008, 2014; Tighe and Lazebnik, 2010; Yang et al., 2010). Superpixel decomposition methods group pixels into homogeneous regions, providing a low-level representation that tries to respect the image contours. For image segmentation, where the goal is to split the image into similar regions according to object, color or texture priors, the decomposition into superpixels may improve the segmentation accuracy and decrease the computational burden (Gould et al., 2014). Contrary to multi-resolution approaches, which decrease the image size, superpixels preserve the image geometry, since their boundaries follow the image contours. Hence, the results obtained at the superpixel level may be closer to the ground truth result at the pixel level.
Many superpixel methods have been proposed using various techniques. Although the definition of an optimal decomposition depends on the tackled application, most methods tend to achieve the following properties. First, the boundaries of the decomposition should adhere to the image contours, and superpixels should not overlap with multiple objects. Second, the superpixel clustering must group pixels with homogeneous colors. Third, the superpixels should have compact shapes and consistent sizes. The shape regularity helps to visually analyze the image decomposition and has been proven to impact application performances (Reso et al., 2013; Veksler et al., 2010; Strassburg et al., 2015). Finally, since superpixels are usually used as a pre-processing step, the decomposition should be obtained in limited computational time and allow the control of the number of produced elements.
To achieve the aforementioned properties, most state-of-the-art methods compute a trade-off between color homogeneity and shape regularity of the superpixels. Nevertheless, some approaches give less weight to the regularity property and can produce superpixels of highly irregular shapes and sizes. In the following, we present an overview of the most popular superpixel methods, classified as either irregular or regular. Note that although some methods can include terms in their models to generate, for instance, more regular results, e.g., Van den Bergh et al. (2012), we here consider methods in their default settings, as described by the authors.
Irregularity can be understood as the tendency of a method to frequently produce irregular regions, in terms of both shape and size (Giraud et al., 2017c). Methods such as Felzenszwalb and Huttenlocher (2004) and Vedaldi and Soatto (2008) generate very irregular regions in terms of both size and shape, while SLIC can generate a few irregular shapes, but their sizes are constrained to a fixed-size window.
Irregular Superpixel Methods
With irregular methods, superpixels can have very different sizes and stretched shapes. For instance, small superpixels can be produced, without enough pixels to compute a significant descriptor. Conversely, superpixels that are too large may overlap with several objects contained in the image. Early segmentation methods, such as the watershed approach, e.g., Vincent and Soille (1991), compute decompositions of highly irregular size and shape. Methods such as Mean shift (Comaniciu and Meer, 2002) or Quick shift (Vedaldi and Soatto, 2008) consider an initial decomposition and perform a histogram-based segmentation. However, they are very sensitive to parameters and have a high computational cost (Vedaldi and Soatto, 2008). Another approach considers pixels as nodes of a graph to perform a faster agglomerative clustering (Felzenszwalb and Huttenlocher, 2004). These methods share an important drawback: they do not allow direct control of the number of superpixels. This is particularly an issue when superpixels are used as a low-level representation to reduce the computational time.
The SEEDS method (Van den Bergh et al., 2012) proposes a coarse-to-fine approach starting from a regular grid. However, this method may provide superpixels with irregular shapes. Although a compactness constraint can be set to compute regular superpixels, the authors report degraded decomposition accuracy with such an approach.
Regular Superpixel Methods
For superpixel-based object recognition methods, e.g., Gould et al. (2008, 2014), or video tracking, e.g., Reso et al. (2013); Wang et al. (2011), the use of regular decompositions is mandatory, i.e., decompositions with superpixels having approximately the same size and compact shapes. For instance, for superpixel-based video tracking applications, the tracking of object trajectories within a scene is improved with consistent decompositions over time (Chang et al., 2013; Reso et al., 2013).
Most of the regular methods consider an initial regular grid, which makes it possible to set the number of superpixels, and update the superpixel boundaries while applying spatial constraints. Classical methods are based on region growing, such as Turbopixels (Levinshtein et al., 2009) using geometric flows, or eikonal-based methods, e.g., ERGC (Buyssens et al., 2014), while other approaches use graph-based energy models (Liu et al., 2011; Veksler et al., 2010). In Machairas et al. (2015), a watershed algorithm is adapted to produce regular decompositions using a spatially regularized image gradient. Similarly to SEEDS (Van den Bergh et al., 2012), a coarse-to-fine approach has recently been proposed in Yao et al. (2015), producing highly regular superpixels.
The SLIC method (Achanta et al., 2012) performs an accurate iterative clustering, while providing regular superpixels, an order of magnitude faster than graph-based approaches (Liu et al., 2011; Veksler et al., 2010). The SLIC method has been extended in several recent works, e.g., Chen et al. (2017); Huang et al. (2016); Rubio et al. (2016); Zhang et al. (2016); Zhang and Zhang (2017). However, like other regular methods, e.g., Levinshtein et al. (2009); Yao et al. (2015), it can fail to adhere to image contours, since it is based on simple local color features and globally enforces the decomposition regularity using a fixed trade-off between color and spatial distances.
In the literature, several works have attempted to improve the decomposition performance in terms of contour adherence by using gradient or contour prior information. In Mori et al. (2004), a contour detection algorithm is used to compute a pre-segmentation using the normalized cuts algorithm (Shi and Malik, 2000). The segmentation may accurately guide the superpixel decomposition, but such approaches based on normalized cuts are computationally expensive (Mori et al., 2004). Moreover, the contour adherence of the produced decompositions is far from state-of-the-art results (Achanta et al., 2012). In Moore et al. (2008), the superpixel decomposition is constrained to fit a grid, also called a superpixel lattice. The decomposition is then refined using graph cuts. However, this method is very dependent on the contour prior used. Moreover, although the superpixels have approximately the same size, they have quite irregular shapes and may appear visually unsatisfactory.
In Machairas et al. (2015), the image gradient information is used to constrain the superpixel boundaries, but the results on superpixel evaluation metrics are lower than those of SLIC (Achanta et al., 2012). In Zhang et al. (2016), the local gradient information is considered to improve the evolution of the superpixel boundaries. However, the computational cost of the method is increased by an order of magnitude compared to SLIC.
Segmentation from Contour Detection
Contour detection methods generally do not enforce the contour closure. To produce an image segmentation, a contour completion step is hence necessary. Many contour completion methods have been proposed (see for instance Arbelaez et al. (2011) and references therein). This step may improve the accuracy of the contour detection, since objects are generally segmented by closed curves.
Arbelaez and Cohen (2008) and Arbelaez et al. (2009) propose a hierarchical image segmentation based on contour detection. This can be considered as a contour probability map, which produces a set of closed curves for any threshold. Although such methods make it possible to segment an image from a contour map, they do not allow control of the size, the shape and the number of the produced regions, while most superpixel methods make it possible to set the number of superpixels, which have approximately the same size. Moreover, the performance of the contour detection is extremely dependent on the fixed threshold parameter, which depends on the image content (Arbelaez et al., 2009). Hence, they are mainly considered as segmentation methods and cannot be considered as relevant frameworks to compute superpixel decompositions.
Robustness to Noise
Superpixel decompositions are usually used as a pre-processing step in many computer vision applications. Therefore, they tend to be applied to heterogeneous images that can suffer from noise. Moreover, image textures and high local gradients may also mislead the superpixel decomposition. Most of the state-of-the-art superpixel methods are not robust to noise, and provide degraded decompositions when applied to slightly noised images or images with low resolution. With such approaches, a denoising step is necessary to compute a relevant decomposition. For instance, the watershed approach of Machairas et al. (2015) uses a pre-filtering step to smooth local gradients according to the given size of superpixels. Nevertheless, this step is only designed to smooth local gradients of initial images and the impact of this filtering is not reported (Machairas et al., 2015).
In this paper, we propose a method that produces accurate, regular and robust Superpixels with Contour Adherence using Linear Path (SCALP). An implementation of the proposed SCALP method is available at: www.labri.fr/~rgiraud/research/scalp.php. Our decomposition approach aims to jointly improve all superpixel properties: color homogeneity, respect of image objects and shape regularity. In Figure 1, we compare the proposed approach to state-of-the-art methods on an example result. SCALP provides a more satisfying result that respects the image contours. Moreover, contrary to most state-of-the-art methods, SCALP is robust to noise, since it provides accurate and regular decompositions on the noisy part of the image.
Most state-of-the-art methods have very degraded performances when applied to even slightly noised images (see Figure 1). We propose to consider the neighboring pixels information during the decomposition process. We show that these features can be integrated at the same computational complexity, while they improve the decomposition accuracy and the robustness to noise.
To further enforce the color homogeneity within a regular shape, we define the linear path between the pixel and the superpixel barycenter, and we consider color features along the path. Contrary to geodesic distances that can allow irregular paths leading to non convex shapes, our linear path naturally enforces the decomposition regularity. A contour prior can also be used to enforce the respect of image objects and prevent the crossing of image contours when associating a pixel to a superpixel.
We propose a framework to generate superpixels within an initial segmentation computed from a contour prior completion. The produced superpixels are regular in terms of size and shape although they are constrained by the segmentation to obtain higher contour adherence performances.
We provide an extensive evaluation of SCALP on the Berkeley segmentation dataset (BSD). Our results outperform recent state-of-the-art methods, on initial and noisy images, in terms of superpixel and contour detection metrics.
Finally, we naturally extend SCALP to supervoxel decomposition and provide results on magnetic resonance imaging (MRI) segmentation.
This paper is an extension of the work proposed in Giraud et al. (2016), with substantial new improvements such as the use in constant time of the neighboring pixels information, the use of contour prior by considering the maximum intensity on the linear path, or the extension to supervoxels. We show that these new contributions improve the decomposition performances, and by performing the clustering in a high dimensional feature space (Chen et al., 2017), SCALP substantially outperforms Giraud et al. (2016) and the recent state-of-the-art methods.
2 SCALP Framework
The SCALP framework is based on the simple linear iterative clustering framework (SLIC) (Achanta et al., 2012), and is summarized in Figure 2. In this section, we first present SLIC and then propose several improvements: a robust distance on pixel neighborhood, the use of features along the linear path to the superpixel barycenter and a framework considering an initial segmentation as constraint while producing regular superpixels.
2.1 Iterative Clustering Framework
The iterative clustering framework introduced in Achanta et al. (2012) proposes a fast iterative framework using simple color features (average in CIELab colorspace). The decomposition is initialized by a regular grid with blocks of size . This size is computed by the ratio between the number of pixels and the number of desired superpixels , such that . A color clustering is then iteratively performed into fixed windows of size pixels centered on the superpixel barycenter. The superpixel is thus constrained into this window, which limits its size. Each superpixel is described by a cluster , that contains the average CIELab color feature on pixels , , and , the spatial barycenter of such that . The iterative clustering consists, for each cluster , in testing all pixels within a pixels window centered on , by computing a spatial distance , and a color distance :
with the regularity parameter that sets the trade-off between spatial and color distances. High values of produce more regular superpixels, while small values allow better adherence to image boundaries, producing superpixels of more variable sizes and shapes. The pixel is associated to the superpixel minimizing (3).
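The clustering rule described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the names `grid_step`, `slic_distance` and `assign` are hypothetical, the grid step is assumed to be the square root of the ratio between the number of pixels and the number of superpixels, and the distance is assumed to take the usual SLIC form (squared color distance plus squared spatial distance weighted by `m**2 / s**2`). The restriction of the search to a window around each barycenter is omitted for brevity.

```python
import math

def grid_step(num_pixels, num_superpixels):
    """Side of the initial grid blocks, assumed to be sqrt(N / K)."""
    return math.sqrt(num_pixels / num_superpixels)

def slic_distance(pixel_lab, pixel_xy, cluster_lab, cluster_xy, m, s):
    """Assumed SLIC-style distance: color term plus weighted spatial term."""
    d_c = sum((a - b) ** 2 for a, b in zip(pixel_lab, cluster_lab))
    d_s = sum((a - b) ** 2 for a, b in zip(pixel_xy, cluster_xy))
    return d_c + d_s * (m ** 2) / (s ** 2)

def assign(pixel_lab, pixel_xy, clusters, m, s):
    """Index of the cluster minimizing the distance.

    clusters is a list of (average_lab, barycenter_xy) pairs.
    """
    return min(
        range(len(clusters)),
        key=lambda k: slic_distance(
            pixel_lab, pixel_xy, clusters[k][0], clusters[k][1], m, s
        ),
    )

# Toy usage: a distant cluster with the same color versus a close cluster
# with a different color. A larger m favors the spatially close cluster.
clusters = [((50, 0, 0), (20, 20)), ((0, 0, 0), (5, 5))]
label_low_m = assign((50, 0, 0), (5, 5), clusters, m=10, s=10)
label_high_m = assign((50, 0, 0), (5, 5), clusters, m=40, s=10)
```

In this toy case, raising the regularity parameter switches the assignment from the color-consistent cluster to the spatially closest one, which is the trade-off discussed in the text.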
Nevertheless, since a parameter is set to enforce the regularity in (3), SLIC can fail to both produce regular superpixels and to adhere to the image contours. In the following, we show how the decomposition accuracy can be improved with a more robust distance, by considering neighboring color features and information of pixels along the linear path to the superpixel barycenter.
2.2 Robust Distance on Pixel Neighborhood
Natural images may present high local image gradients or noise, that can highly degrade the decomposition into superpixels. In this section, we propose to consider the pixel neighborhood to improve both accuracy and robustness, and we give a method to integrate this information in the decomposition process at a constant complexity.
2.2.1 Distance on Neighborhood
We propose to integrate the neighboring pixels information in our framework when computing the clustering distance between a pixel and a cluster . Similarly to patch-based approaches, the pixels in a square area centered on , of size pixels, are considered in the proposed color distance :
To be robust to high local gradients while preserving the image contours, we define such that , with the normalization factor such that , and .
2.2.2 Fast Distance Computation
The complexity of the proposed distance (4) is , with , the number of pixels in the neighborhood. We propose a method that drastically reduces the computational burden of (4). Since the distance is computed between a set of pixels and a cluster, it can be decomposed and partially pre-computed.
Eq. (4) can be computed at complexity .
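The decomposition idea can be illustrated with a short sketch. It assumes, as a hypothesis, that the neighborhood distance is a weighted sum of squared color differences between the neighboring pixels and the cluster, with weights that sum to one; the Gaussian form of the weights in `patch_weights` and all function names are illustrative choices. Expanding the square shows that two quantities per pixel (the weighted mean of the features and the weighted mean of their squared norms) can be pre-computed once, after which each pixel-to-cluster distance no longer depends on the neighborhood size.

```python
import math

def patch_weights(center, neighbors, sigma):
    """Normalized weights (assumed Gaussian on the color difference to the center)."""
    raw = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(center, f)) / (2 * sigma ** 2))
        for f in neighbors
    ]
    z = sum(raw)  # normalization factor so that the weights sum to one
    return [r / z for r in raw]

def naive_distance(neighbors, weights, cluster):
    """Direct evaluation: one squared difference per neighbor."""
    return sum(
        w * sum((a - b) ** 2 for a, b in zip(f, cluster))
        for w, f in zip(weights, neighbors)
    )

def precompute(neighbors, weights):
    """Per-pixel terms that do not depend on the cluster."""
    sq = sum(w * sum(a * a for a in f) for w, f in zip(weights, neighbors))
    dim = len(neighbors[0])
    mean = [sum(w * f[i] for w, f in zip(weights, neighbors)) for i in range(dim)]
    return sq, mean

def fast_distance(pre, cluster):
    """sum_q w_q |F_q - F_k|^2 = sum_q w_q |F_q|^2 - 2 F_k . sum_q w_q F_q + |F_k|^2."""
    sq, mean = pre
    dot = sum(a * b for a, b in zip(mean, cluster))
    return sq - 2 * dot + sum(b * b for b in cluster)

neighbors = [(50.0, 0.0, 0.0), (52.0, 1.0, -1.0), (10.0, 5.0, 5.0)]
weights = patch_weights(neighbors[0], neighbors, sigma=10.0)
cluster = (40.0, 2.0, 2.0)
slow = naive_distance(neighbors, weights, cluster)
fast = fast_distance(precompute(neighbors, weights), cluster)
```

Both evaluations agree up to floating-point error; only the second one reuses per-pixel terms across all clusters and iterations.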
2.3 Color and Contour Features on Linear Path
A superpixel decomposition is considered as satisfying according to the homogeneity of the color clustering and the respect of image contours. To enforce these aspects, we propose to consider color and contour features on the linear path between the pixel and the superpixel barycenter. We define the linear path , that contains the pixels starting from , the position of a pixel , to , the barycenter of a superpixel .
2.3.1 Linear Path between Pixel and Superpixel Barycenter
The considered linear path between a pixel and the barycenter of a superpixel is illustrated in Figure 3. The pixels (red) are those that intersect with the segment (arrow) between , the position of pixel (black), and , the barycenter of the superpixel (green). Pixels are selected such that each one only has neighbors belonging to the path within a pixels neighborhood.
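A discrete segment of this kind is classically obtained with Bresenham's line algorithm, which the parameter settings section names as the method used. The sketch below is a standard integer variant given for illustration only; the function name `linear_path` is hypothetical, and the barycenter is assumed to have been rounded to integer coordinates beforehand.

```python
def linear_path(p, q):
    """8-connected discrete segment from pixel p to pixel q (both included)."""
    x0, y0 = p
    x1, y1 = q
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx - dy
    path = []
    while True:
        path.append((x0, y0))
        if (x0, y0) == (x1, y1):
            break
        e2 = 2 * err
        if e2 > -dy:  # step along x
            err -= dy
            x0 += sx
        if e2 < dx:   # step along y
            err += dx
            y0 += sy
    return path

path = linear_path((0, 0), (4, 2))
```

Each pixel of the returned path differs from the previous one by at most one step in each direction, so the path contains one pixel per step along the dominant axis.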
Other works consider a geodesic distance to enforce the color homogeneity (Rubio et al., 2016; Wang et al., 2013) or the respect of object contours (Zhang and Zhang, 2017). The colors along the geodesic distance must be close to the average superpixel color to enable the association of the pixel to the superpixel, leading to potential irregular shapes. We illustrate this aspect in Figure 4. We compare a geodesic distance and average color distance on the linear path. While the geodesic can find a sinuous path to connect distant pixels, our linear path penalizes the crossing of regions with different colors.
A decomposition example for SCALP and a method based on a geodesic color distance (Rubio et al., 2016) is given in Figure 5. By considering the proposed linear path, we limit the computational cost, which can be substantial for geodesic distances, and we enforce the decomposition compactness, since features are considered on the direct path to the superpixel barycenter. More precisely, our linear path encourages the star-convexity property (Gulshan et al., 2010), i.e., for a given shape, there exists a specific point, in our case the superpixel barycenter, from which each point of the shape can be reached by a linear path that does not leave the shape.
Finally, note that despite the large number of pixel information considered during the decomposition process, the computational cost can be very limited. In practice, at a given iteration, for a given superpixel, the distance between a pixel and the superpixel has only to be computed once. The color distance can indeed be stored for each pixel and directly used for another linear path containing this pixel. Moreover, a very slight approximation can be made by directly storing for each pixel the average distance on the linear path to the superpixel barycenter, and using it when crossing an already processed pixel on a new linear path.
Figure 4: (a) Image. (b) Geodesic distance. (c) Linear path distance.
Figure 5: (a) Image. (b) Rubio et al. (2016). (c) SCALP.
2.3.2 Color Distance to Cluster
The distance to minimize during the decomposition is composed of a color and a spatial term. Nevertheless, the color distance is now also computed on , i.e, between the cluster and the pixels on the linear path to the superpixel barycenter. We define the new color distance as:
where weights the influence of the color distance along the path. With the proposed distance (6), colors on the path to the barycenter should be close to the superpixel average color.
The distance (6) naturally enforces the regularity and also prevents irregular shapes from appearing. Figure 6 shows two examples of irregular shapes that can be computed with SLIC (Achanta et al., 2012), for instance in areas of color gradation. The barycenters of these irregular superpixels are not contained within the shapes. The linear path hence captures pixels with colors that are far from the average one of . Therefore, (6) penalizes the clustering of all pixels to this superpixel during the current iteration, so they are associated to neighboring superpixels.
Figure 6: (a) SLIC irregular shapes. (b) SCALP regular shapes.
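The effect of the path term can be illustrated with a small sketch. The exact form of the distance is an assumption here: the color distance of the pixel itself and the average color distance of the pixels on the linear path are combined with a weight `lam`, and which of the two terms receives `lam` is an illustrative choice. The function names are hypothetical.

```python
def color_dist(feature, cluster_feature):
    """Squared color distance between a pixel feature and a cluster feature."""
    return sum((a - b) ** 2 for a, b in zip(feature, cluster_feature))

def path_color_distance(pixel_feat, path_feats, cluster_feat, lam):
    """Assumed form: lam * d(pixel) + (1 - lam) * mean of d over the path."""
    own = color_dist(pixel_feat, cluster_feat)
    if not path_feats:
        return own
    along = sum(color_dist(f, cluster_feat) for f in path_feats) / len(path_feats)
    return lam * own + (1 - lam) * along

cluster = (50, 0, 0)
# The pixel matches the cluster color, but its path crosses a dark region.
crossing = path_color_distance((50, 0, 0), [(50, 0, 0), (0, 0, 0)], cluster, lam=0.5)
# Same pixel, with a path that stays within the homogeneous region.
inside = path_color_distance((50, 0, 0), [(50, 0, 0), (50, 0, 0)], cluster, lam=0.5)
```

A pixel whose own color matches the cluster is still penalized when the segment to the barycenter crosses a differently colored region, which is the behavior described for the irregular shapes of Figure 6.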
2.3.3 Adherence to Contour Prior
Since the optimal color homogeneity may be not in line with the respect of image objects, or fail to catch thin edges, we propose to consider the information of a contour prior map on the linear path. Such map sets to if a contour is detected at pixel , and to otherwise. We propose a fast and efficient way to integrate a contour prior by weighting the distance between a pixel and a superpixel cluster by , considering the maximum of contour intensity on :
Figure 7 illustrates the selection of the maximum contour intensity on the linear path. When a high contour intensity is found on the path between a pixel and the barycenter of a superpixel, this term prevents the pixel from being associated to the superpixel, so that the superpixel boundaries follow the image contours more accurately. The proposed framework can consider either soft contour maps, i.e., maps with continuous contour intensities, or binary maps. It also adapts well to thick contour priors, since only the maximum intensity on the path is considered. Finally, we multiply the color and spatial distances by this term to ensure the respect of the image contours, and the proposed distance to minimize during the decomposition is defined as:
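The weighting by the contour prior can be sketched as follows. The form `1 + gamma * max` is an assumption, chosen so that a weight `gamma` of zero leaves the distance unchanged, which is consistent with the way the contour prior is disabled in the experiments. The function names and the row-major `contour_map[y][x]` layout are illustrative.

```python
def contour_term(contour_map, path, gamma):
    """Assumed weighting: 1 + gamma * (maximum contour intensity on the path)."""
    return 1.0 + gamma * max(contour_map[y][x] for x, y in path)

def weighted_distance(d_color, d_spatial, m, s, contour_map, path, gamma):
    """Color and spatial terms multiplied by the contour term."""
    base = d_color + d_spatial * (m ** 2) / (s ** 2)
    return base * contour_term(contour_map, path, gamma)

contour_map = [[0.0, 0.2, 0.9, 0.0]]   # one image row with a strong contour at x = 2
path = [(0, 0), (1, 0), (2, 0)]        # (x, y) pixels of a linear path crossing it
with_prior = contour_term(contour_map, path, gamma=50)
without_prior = contour_term(contour_map, path, gamma=0)
```

Because only the maximum along the path is used, a thick contour and a thin contour of the same intensity produce the same penalty.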
Figure 7: (a) Initial grid decomposition. (b) Contour prior. (c) Linear path. (d) Maximum contour on the linear path.
2.4 Initialization Constraint from Contour Prior
In this section, we propose a framework that uses an initial segmentation, computed from a contour prior completion, to constrain the superpixel decomposition. Generating an image segmentation into regions from a contour map requires additional steps but may help to improve the decomposition accuracy. As stated in the introduction, although methods such as Arbelaez and Cohen (2008); Arbelaez et al. (2009) make it possible to segment an image into partitions from a contour map, they do not allow control of the size, the shape and the number of the produced regions. We here propose a framework that uses an initial segmentation and produces a regular superpixel decomposition within pre-segmented regions, with control of the number of elements. This way, we take advantage of the initial segmentation accuracy while providing an image decomposition into superpixels of regular sizes and shapes. By initializing the decomposition within the computed regions, the initial superpixels better fit the image content. For instance, small regions can be initially segmented into one or several superpixels, whereas with a regular grid they may fall between two initial superpixel barycenters and would not be accurately segmented during the decomposition process.
2.4.1 Hierarchical Segmentation from Contour Detection
In order to adapt an initial segmentation to produce regular superpixels, we propose to use a hierarchical segmentation, that can be computed from a contour map with methods such as Arbelaez and Cohen (2008); Arbelaez et al. (2009).
Let be a hierarchical segmentation that defines a contour probability map. For any threshold, produces a set of closed curves. Regions segmented with low probability, i.e., with low intensity contours in can be deleted with a thresholding step. The thresholded closed contour map is denoted , for a threshold , and its corresponding decomposition into regions is denoted . Figure 8, illustrates the result obtained from a hierarchical segmentation for several thresholds.
2.4.2 Regular Decomposition into Superpixels from a Hierarchical Segmentation
Once the hierarchical segmentation is obtained and thresholded, a merging step can be performed to remove the smallest areas. Such small regions should be merged to an adjacent one to respect the size regularity of the decomposition. With the number of superpixels and the number of pixels of an image , the superpixel average size is . A threshold is set to merge regions containing less pixels than . The segmentation probability of a region is , i.e., the lowest intensity among its boundary pixels . The region is hence merged to its adjacent region that shares the boundary with the lowest segmentation probability:
These steps are illustrated in Figure 9, where the thresholding removes areas segmented with low probability and the merging prevents the segmentation of small regions.
Figure 9: (a) Image. (b) Contour map. (c) Hierarchical segmentation. (d) Thresholding. (e) Merging.
A partition step then adds initial superpixels in the remaining regions. If the resulting number of regions is lower than the number of superpixels , superpixels are added according to the region size . In a region , sub-regions are initialized by a spatial K-means approach (Lloyd, 1982), regardless of the color information.
The proposed approach thus adapts well to the superpixel size, and is not sensitive to threshold settings. The framework using the contour prior as a hard constraint is illustrated in Figure 10, and will be denoted SCALP+HC in the following. Note that although we here consider the segmentation as a hard constraint to enforce the respect of image objects, the image partition can be used to only initialize the superpixel repartition, instead of using a regular grid.
3.1 Validation Framework
We use the standard Berkeley segmentation dataset (BSD) (Martin et al., 2001) to evaluate our method and compare to state-of-the-art ones. This dataset contains 200 various test images of size pixels. At least 5 human ground truth decompositions are provided per image to compute evaluation metrics in terms of consistency to the image objects, and contour adherence.
To evaluate our method and compare to other state-of-the-art frameworks, we use standard superpixel evaluation metrics. The achievable segmentation accuracy (ASA) measures the consistency of the decomposition to the image objects. Boundary recall (BR) and contour density (CD) are used to measure the detection accuracy according to the ground truth image contours. We also propose to evaluate the contour detection performance of the superpixel methods by computing the precision-recall (PR) curves (Martin et al., 2004). Finally, we report the shape regularity criteria (SRC) (Giraud et al., 2017a) that measures the regularity of the produced superpixels.
For each image of the dataset, human ground truth segmentations are provided. The reported results are averaged on all segmentations. A ground truth decomposition is denoted , with a segmented region, and we consider a superpixel segmentation .
Respect of image objects
For each superpixel of the decomposition result, the largest possible overlap with a ground truth region can be computed with ASA, which computes the average overlap percentage for all superpixels:
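As an illustration, the sketch below computes this measure in its commonly used form: for each superpixel, the size of its largest overlap with a ground truth region is accumulated, and the total is divided by the number of pixels. The function name `asa` and the representation of label maps as nested lists are illustrative choices.

```python
from collections import Counter, defaultdict

def asa(superpixels, ground_truth):
    """Achievable segmentation accuracy from two label maps of the same size."""
    overlaps = defaultdict(Counter)  # superpixel label -> counts per ground truth label
    num_pixels = 0
    for sp_row, gt_row in zip(superpixels, ground_truth):
        for sp, gt in zip(sp_row, gt_row):
            overlaps[sp][gt] += 1
            num_pixels += 1
    best = sum(max(counts.values()) for counts in overlaps.values())
    return best / num_pixels

superpixels = [[0, 0, 1, 1],
               [0, 0, 1, 1]]
ground_truth = [[0, 0, 0, 1],
                [0, 0, 0, 1]]
score = asa(superpixels, ground_truth)
```

In this toy example, the first superpixel lies entirely within one ground truth region while the second one straddles two regions, so only half of its pixels count towards the score.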
The BR metric measures the detection of ground truth contours by the computed superpixel boundaries . If a ground truth contour pixel has a decomposition contour pixel at an -pixel distance, it is considered as detected, and BR is defined as the percentage of detected ground truth contours:
with when is true and otherwise, and set to as in, e.g., Van den Bergh et al. (2012). However, this measure only considers true positive detection, and does not consider the number of produced superpixel contours. Therefore, methods that produce very irregular superpixels are likely to have high BR results. To overcome this limitation, as in Machairas et al. (2015); Zhang et al. (2016), the contour density (CD) can be considered to penalize a large number of superpixel boundaries . In the following, we report CD over BR results, with CD defined as:
When considering decompositions with the same CD, i.e., the same number of superpixel boundaries, BR results can be relevantly compared. Higher BR with the same CD indicates that the produced superpixels better detect image contours.
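Both measures can be sketched on sets of boundary pixel coordinates. The code below is a minimal illustration: the use of the Euclidean distance for the tolerance test is an assumption, the contour density is assumed to be the number of superpixel boundary pixels divided by the number of image pixels, and the function names are hypothetical.

```python
def boundary_recall(gt_boundary, sp_boundary, eps):
    """Fraction of ground truth boundary pixels with a superpixel boundary
    pixel within distance eps (gt_boundary must be non-empty)."""
    detected = sum(
        1
        for (x, y) in gt_boundary
        if any((x - u) ** 2 + (y - v) ** 2 <= eps ** 2 for (u, v) in sp_boundary)
    )
    return detected / len(gt_boundary)

def contour_density(sp_boundary, num_pixels):
    """Number of superpixel boundary pixels relative to the image size."""
    return len(sp_boundary) / num_pixels

gt_boundary = {(0, 0), (5, 5), (10, 10)}
sp_boundary = {(1, 1), (5, 8)}
br = boundary_recall(gt_boundary, sp_boundary, eps=2)
cd = contour_density(sp_boundary, num_pixels=100)
```

Marking every pixel as a boundary would give a recall of one but also the maximal contour density, which is why the two values are reported together.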
The PR framework (Martin et al., 2004) enables to measure the contour detection performances. PR curves consider both boundary recall (BR) (11), i.e., true positive detection, or percentage of detected ground truth contours, and precision , i.e., percentage of accurate detection on produced superpixel boundaries. They are computed from an input map, where the intensity in each pixel represents the confidence of being on an image boundary. As in Van den Bergh et al. (2012), we consider the average of superpixel boundaries obtained at different scales, ranging from 25 to 1000 superpixels, to provide a contour detection. In the following, to summarize the contour detection performances, we report the maximum F-measure defined as:
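The F-measure is usually defined as the harmonic mean of precision and recall, and the reported value is its maximum along the PR curve. The sketch below assumes this usual definition; the function names and the toy curve are illustrative and are not results from the paper.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def max_f_measure(curve):
    """Maximum F-measure over a list of (precision, recall) points."""
    return max(f_measure(p, r) for p, r in curve)

toy_curve = [(1.0, 0.1), (0.6, 0.6), (0.2, 0.9)]  # hypothetical PR points
best = max_f_measure(toy_curve)
```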
To evaluate the regularity of a decomposition in terms of superpixel shape, we use the shape regularity criteria (SRC) introduced in Giraud et al. (2017a), and defined for a decomposition as follows:
where , evaluates the balanced repartition of the shape with and the square root of standard deviations of pixel positions and in , is the convex hull containing , and CC measures the ratio between the perimeter and the area of the considered shape. The SRC measure has been proven to be more robust and accurate than the circularity metric (Schick et al., 2012) used in several superpixel works.
3.1.3 Parameter Settings
SCALP was implemented with MATLAB using single-threaded C-MEX code, on a standard Linux computer. We consider in and more advanced spectral features introduced in Chen et al. (2017). They are designed in a high dimensional space ( for color, and for spatial features). The linear path between a pixel and the barycenter of a superpixel is computed with the algorithm of Bresenham (1965). In (4), the parameter is empirically set to and is defined as a pixel neighborhood around a pixel , so . In the proposed color distance (6), is set to , and to in (7). The compactness parameter is set to in the final distance (8), as in Chen et al. (2017). This parameter offers a good trade-off between adherence to contour prior and compactness. The number of clustering iterations is set to , contrary to Chen et al. (2017), which uses iterations, since SCALP converges faster. Unless otherwise mentioned, when used, the contour prior is computed with Dollár and Zitnick (2013). Finally, when using the contour prior as a hard constraint (SCALP+HC), we respectively set parameters and during the region fusion (9) to and , and compute a hierarchical segmentation with Arbelaez et al. (2009). In the following, when reporting results on noisy images, we use a white additive Gaussian noise of variance 20.
Figure 12: Initial image and decompositions obtained with four settings of the distance parameters.
3.2 Influence of Parameters
3.2.1 Distance Parameters
We first measure the influence of the distance parameters in (8) on SCALP performances. In Figure 11, we report results on PR, ASA, CD over BR and SRC curves for different distance settings, on both initial and noisy BSD images. First, we note that the neighboring pixels ( in (4)) increase the decomposition accuracy. The color features ( in (6)) also improve the results, in terms of respect of image objects and regularity. Finally, the contour prior ( in (7)) along the linear path enables to reach high contour detection (PR) and also increases the performances on superpixel metrics. On noisy images, the accuracy of the contour prior is degraded, but it still provides higher ASA performances on respect of image objects. Note that if in (4), in (6) and in (7), the method is reduced to the framework of Chen et al. (2017).
Figure 12 illustrates the decomposition result for these distance parameters on a BSD image. With only the features used in Chen et al. (2017), i.e., with , , , the decomposition boundaries are very irregular. The neighborhood information greatly reduces the noise at the superpixel boundaries. The color distance on the linear path improves the superpixel regularity and provides more compact shapes. Finally, the contour information enables to more efficiently catch the object structures and to respect the image contours.
3.2.2 Contour Prior
We also investigate the influence of the contour prior. The computation of the contour information should not be sensitive to textures and high local image gradients, and many efficient methods have been proposed in the literature (see for instance references in Arbelaez et al. (2011)). The performance of our method is correlated with the contour detection accuracy, but we demonstrate that improvements are obtained even with basic contour detections.
A fast way to obtain such a basic contour detection, robust to textures and high gradients, is to average the boundaries of superpixel decompositions obtained at multiple scales. We propose to consider the same set of scales as the one used for computing the PR curves. All resulting superpixel boundaries of a decomposition , computed at scale are averaged:
The average can then be thresholded to remove low confidence boundaries and provide an accurate contour prior . Figure 13 illustrates the computation of the contour prior from superpixel boundaries. Note that the decompositions at multiple scales are independent and can be computed in parallel.
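The averaging and thresholding steps can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the experiments: regular grids stand in for real superpixel decompositions, a pixel is marked as a boundary when its label differs from its right or bottom neighbor, and the threshold value 0.5 as well as the helper names `grid_labels`, `boundary_map` and `contour_prior` are hypothetical.

```python
import numpy as np

def grid_labels(h, w, cell):
    """Stand-in decomposition: a regular grid of square cells of side `cell`."""
    rows = np.arange(h) // cell
    cols = np.arange(w) // cell
    n_cols = (w + cell - 1) // cell
    return rows[:, None] * n_cols + cols[None, :]

def boundary_map(labels):
    """1.0 where a pixel's label differs from its right or bottom neighbor."""
    b = np.zeros(labels.shape, dtype=bool)
    b[:, :-1] |= labels[:, :-1] != labels[:, 1:]
    b[:-1, :] |= labels[:-1, :] != labels[1:, :]
    return b.astype(np.float64)

def contour_prior(decompositions, threshold=0.5):
    """Average the boundary maps, then zero out low-confidence boundaries."""
    avg = np.mean([boundary_map(l) for l in decompositions], axis=0)
    prior = np.where(avg >= threshold, avg, 0.0)
    return avg, prior

# Three scales; each boundary map is independent and could be computed in parallel.
decompositions = [grid_labels(8, 8, c) for c in (2, 4, 8)]
avg, prior = contour_prior(decompositions, threshold=0.5)
```

Boundaries that appear at several scales receive a high average and survive the threshold, while boundaries present at a single scale are discarded.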
In Figure 14, we provide results obtained with different contour priors: the contour detection from multiple-scale decompositions, using Achanta et al. (2012) with the threshold of the boundary map (15) set to , the globalized probability of boundary algorithm (Maire et al., 2008), a method using learned sparse codes of patch gradients (Xiaofeng and Bo, 2012), and a structured forests approach (Dollár and Zitnick, 2013). The results on all metrics improve with the accuracy of the provided contour detection. Nevertheless, we note that even simple contour priors improve the adherence of the superpixel decomposition to boundaries. In the following, reported results are computed using Dollár and Zitnick (2013).
|Initial image||Boundary average||Contour map|
3.3 Comparison with State-of-the-Art Methods
We compare the proposed SCALP approach to the following state-of-the-art methods: ERS (Liu et al., 2011), SLIC (Achanta et al., 2012), SEEDS (Van den Bergh et al., 2012), ERGC (Buyssens et al., 2014), Waterpixels (WP) (Machairas et al., 2015), ETPS (Yao et al., 2015) and LSC (Chen et al., 2017). Reported results are computed with the code provided by the authors, in its default settings.
In Figure 15, we provide PR curves with the maximum F-measure, and report the standard ASA (10), CD (12) over BR (11) and SRC (14) metrics on both initial (top) and noisy (bottom) images. SCALP outperforms the compared methods on the respect of image objects and contour detection metrics, providing for instance a higher F-measure (), while producing regular superpixels. The regularity is indeed increased compared to SLIC and LSC, and is among the highest of state-of-the-art methods. The ASA evaluates the consistency of a superpixel decomposition with respect to the image objects by considering, for each superpixel, its largest possible overlap with an object. Therefore, the best ASA results, obtained with SCALP, indicate that its superpixels are better contained in the image objects. Using the contour prior as a hard constraint (SCALP+HC), our method reaches even higher performance, for instance with . Moreover, SCALP results obtained without using a contour prior, i.e., setting to in (7), still outperform the ones of the most accurate compared methods, LSC and ERS. Overall, SCALP outperforms all the compared state-of-the-art methods on contour detection and respect of image objects metrics while producing regular superpixels. The performance gain is further assessed with a paired Student's t-test on the ASA result sets. A very low p-value () is obtained when comparing the result set of SCALP to that of ERS (Liu et al., 2011), the best compared method in terms of accuracy, which demonstrates the significant performance increase obtained with SCALP. Generally, enforcing regularity may reduce the contour adherence (Van den Bergh et al., 2012), but SCALP succeeds in providing superpixels that are both regular and accurate. This regularity property has been proven crucial for object recognition (Gould et al., 2014), tracking (Reso et al., 2013) and segmentation and labeling applications (Strassburg et al., 2015).
Therefore, the use of SCALP may increase the accuracy of such superpixel-based methods.
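For readers unfamiliar with the paired Student's t-test mentioned above, the following sketch computes its test statistic from two per-image result sets. The ASA values are hypothetical and serve only as an illustration (they are not results from this work), and the function name `paired_t_statistic` is an assumption. The p-value follows from the Student t distribution with n - 1 degrees of freedom and can be obtained with a library routine such as `scipy.stats.ttest_rel`.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for two paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    # The test works on the per-image differences between the two methods.
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical per-image ASA values, for illustration only.
asa_method_a = [0.95, 0.96, 0.94, 0.97, 0.95]
asa_method_b = [0.94, 0.94, 0.93, 0.95, 0.94]
t, dof = paired_t_statistic(asa_method_a, asa_method_b)
```

Pairing matters because both methods are evaluated on the same images: the test removes the image-to-image variability and only looks at the consistency of the per-image gain.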
|Method||F (13)||ASA (10)||CD/BR (12)||SRC (14)|
The gain over state-of-the-art methods is much larger when computing superpixels on noisy images. Methods such as Buyssens et al. (2014); Chen et al. (2017); Liu et al. (2011) obtain very degraded performance when applied to slightly noisy images, while Van den Bergh et al. (2012) is the only compared method that is robust to noise on all evaluated aspects. The state-of-the-art methods can indeed behave very differently when applied to noisy images. They generally produce very noisy superpixel boundaries (see Figure 1). This aspect is expressed by the lower performance on CD over BR in the bottom part of Figure 15. The regularity is also degraded for all methods, except Buyssens et al. (2014), which tends to generate more regular superpixels but fails at grouping homogeneous pixels. Finally, on the ASA metric, SCALP provides slightly higher results than SCALP+HC for these images. The presence of noise may mislead the contour detection, which should then not be used as a hard constraint to ensure the respect of object segmentation. These results are summarized in Table 1, where we report the performance of all compared methods on both initial and noisy images for superpixels.
Despite the large number of features used in SCALP, the computational time remains reasonable, i.e., less than s on BSD images, on a single CPU, without any multi-threading architecture, contrary to implementations of methods such as ETPS (Yao et al., 2015). This computational time is comparable to that of standard superpixel methods, and SCALP is even faster than methods such as Levinshtein et al. (2009); Liu et al. (2011), whose computational time can be up to s.
In this work, we focus on the decomposition performance and do not extensively compare the processing times, since this measurement is highly dependent on the implementation and optimization, and does not necessarily reflect the computational potential of each method (Stutz et al., 2017). Nevertheless, our method is based on the iterative clustering framework (Achanta et al., 2012), and recent works have demonstrated that such algorithms can be implemented to run in real time (Ban et al., 2016; Choi and Oh, 2016; Neubert and Protzel, 2014). Therefore, since SCALP has the same complexity as SLIC, our method can reach such computational times with an optimized implementation or multi-threading architectures.
Finally, Figures 16 and 17 respectively illustrate the superpixel decomposition results obtained with SCALP and the best compared methods on initial and noisy images. SCALP provides more regular superpixels while tightly following the image contours. SCALP+HC guides the decomposition more accurately by constraining superpixels to previously segmented regions. While most of the compared methods produce inaccurate and irregular results with slightly noisy images (see Figure 17), SCALP is robust to noise and produces regular superpixels that adhere well to the image contours.
3.4 Extension to Supervoxels
Finally, we naturally extend the SCALP method to the computation of supervoxels on 3D volumes, for the segmentation of 3D objects or medical images. Many supervoxel methods are dedicated to video segmentation, see for instance Xu and Corso (2012), and references therein. These methods segment the volume into temporal superpixel tubes and are therefore only adapted to the context of video processing. Other methods propose to perform superpixel tracking, e.g., Chang et al. (2013); Reso et al. (2013); Wang et al. (2011), which can result in similar tubular supervoxel segmentation, and may require the computation of optical flow to be efficient (Chang et al., 2013). Contrary to other methods that necessitate substantial adaptations for 3D data, we naturally extend SCALP to compute 3D volume decompositions. We start from a 3D regular grid and perform the decomposition by adding one dimension to the previous equations presented in Section 2.
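The starting point of the 3D extension, a regular grid over the volume, can be sketched as below. This only illustrates the initialization under simple assumptions (cubic cells, seeds placed at the cell barycenters); the function name `init_supervoxel_grid` and its parameters are hypothetical, and the clustering iterations with the distance terms of Section 2 are not shown.

```python
import numpy as np

def init_supervoxel_grid(shape, n_supervoxels):
    """Assign each voxel to a cell of a regular 3D grid.

    Returns the initial label volume and the (z, y, x) barycenter of each cell.
    """
    d, h, w = shape
    # Cubic cell side such that roughly n_supervoxels cells tile the volume.
    step = max(1, int(round((d * h * w / n_supervoxels) ** (1.0 / 3.0))))
    nz, ny, nx = (int(np.ceil(s / step)) for s in shape)
    z, y, x = np.indices(shape)
    labels = (z // step) * (ny * nx) + (y // step) * nx + (x // step)
    # Barycenter of each cell, used as the initial seed position.
    n_cells = nz * ny * nx
    flat = labels.ravel()
    counts = np.bincount(flat, minlength=n_cells)
    centers = np.stack(
        [np.bincount(flat, weights=c.ravel(), minlength=n_cells) / counts
         for c in (z, y, x)],
        axis=1,
    )
    return labels, centers

# Usage on a small synthetic volume.
labels, centers = init_supervoxel_grid((8, 8, 8), n_supervoxels=8)
```

Compared to the 2D case, the only change is the additional axis: positions become (z, y, x) triplets and the grid cells become cubes.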
To validate our extension to supervoxels, we consider 3D magnetic resonance imaging (MRI) data from the Brain Tumor Segmentation (BRATS) dataset (Menze et al., 2015). This dataset contains 80 brain MRI scans of patients suffering from tumors. The images are segmented into three labels: background, tumor, and the edema surrounding the tumor. We illustrate examples of SCALP supervoxel segmentation with the ground truth segmentation in Figure 18, where the tumor and edema are respectively shown in green and red. This dataset is particularly challenging since the resolution of the images is very low and the ground truth segmentation is not necessarily in line with the image gradients. Finally, note that SCALP obtains an average 3D ASA measure of , and outperforms the state-of-the-art methods with available implementations, SLIC (Achanta et al., 2012) and ERGC (Buyssens et al., 2014), which respectively obtain a 3D ASA of and .
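The 3D ASA measure can be computed with the same code as in 2D, since it only depends on label overlaps. The sketch below assumes the standard ASA definition (for each supervoxel, the size of its largest overlap with a ground truth region, summed and divided by the number of voxels); the function name `asa` and the toy volume are illustrative assumptions.

```python
import numpy as np

def asa(superpixels, ground_truth):
    """Achievable segmentation accuracy for integer label arrays of any dimension."""
    # Relabel both inputs to contiguous indices 0..K-1.
    s = np.unique(superpixels.ravel(), return_inverse=True)[1]
    g = np.unique(ground_truth.ravel(), return_inverse=True)[1]
    # overlap[k, i] = number of elements shared by superpixel k and region i.
    overlap = np.zeros((s.max() + 1, g.max() + 1), dtype=np.int64)
    np.add.at(overlap, (s, g), 1)
    # Each superpixel contributes its largest overlap with a single region.
    return overlap.max(axis=1).sum() / s.size

# Toy 3D volume: the ground truth splits the x axis in two halves, and the
# middle supervoxel (x in {1, 2}) straddles the two regions.
x = np.indices((2, 4, 4))[2]
gt = (x >= 2).astype(int)
sv = np.array([0, 1, 1, 2])[x]
score = asa(sv, gt)  # 24 of the 32 voxels can be labeled correctly
```

A supervoxel that overlaps two regions can be assigned to only one of them, so the straddling supervoxel loses half of its voxels and the score drops to 0.75.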
|Image||Ground truth||Supervoxels||Image||Ground truth||Supervoxels|
In this work, we generalize the superpixel clustering framework proposed in Achanta et al. (2012); Giraud et al. (2016), by considering color features and contour intensity on the linear path from the pixel to the superpixel barycenter. Our method is robust to noise, and the use of features along such a path improves the respect of image objects and the shape regularity of the superpixels. The considered linear path naturally enforces the superpixel convexity, while other geodesic distances would provide irregular superpixels. Our fast integration of these features within the framework makes it possible to compute the decomposition in a limited computational time. SCALP obtains state-of-the-art results, outperforming the most recent methods of the literature on superpixel and contour detection metrics. Image processing and computer vision pipelines would benefit from using such regular, yet accurate decompositions.
This study has been carried out with financial support from the French State, managed by the French National Research Agency (ANR) in the frame of the GOTMI project (ANR-16-CE33-0010-01) and the Investments for the future Program IdEx Bordeaux (ANR-10-IDEX-03-02) with the Cluster of excellence CPU.
- Achanta et al. (2012) Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Süsstrunk, S., 2012. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. and Mach. Intell. 34, 2274–2282.
- Arbelaez and Cohen (2008) Arbelaez, P., Cohen, L., 2008. Constrained image segmentation from hierarchical boundaries, in: Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1–8.
- Arbelaez et al. (2009) Arbelaez, P., Maire, M., Fowlkes, C., Malik, J., 2009. From contours to regions: an empirical evaluation, in: Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp. 2294–2301.
- Arbelaez et al. (2011) Arbelaez, P., Maire, M., Fowlkes, C., Malik, J., 2011. Contour detection and hierarchical image segmentation. IEEE Trans. Pattern Anal. and Mach. Intell. 33, 898–916.
- Ban et al. (2016) Ban, Z., Liu, J., Fouriaux, J., 2016. GLSC: LSC superpixels at over 130 FPS. J. of Real-Time Image Process. , 1–12.
- Bresenham (1965) Bresenham, J.E., 1965. Algorithm for computer control of a digital plotter. IBM Syst. J. 4, 25–30.
- Buyssens et al. (2014) Buyssens, P., Gardin, I., Ruan, S., Elmoataz, A., 2014. Eikonal-based region growing for efficient clustering. Image and Vision Computing 32, 1045–1054.
- Chang et al. (2013) Chang, J., Wei, D., Fisher, J.W., 2013. A video representation using temporal superpixels, in: Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp. 2051–2058.
- Chen et al. (2017) Chen, J., Li, Z., Huang, 2017. Linear spectral clustering superpixel. IEEE Trans. Image Process.
- Choi and Oh (2016) Choi, K.S., Oh, K.W., 2016. Subsampling-based acceleration of simple linear iterative clustering for superpixel segmentation. Comput. Vis. Image Underst. 146, 1–8.
- Comaniciu and Meer (2002) Comaniciu, D., Meer, P., 2002. Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. and Mach. Intell. 24, 603–619.
- Dollár and Zitnick (2013) Dollár, P., Zitnick, C.L., 2013. Structured forests for fast edge detection, in: Proc. of IEEE International Conference on Computer Vision, pp. 1841–1848.
- Felzenszwalb and Huttenlocher (2004) Felzenszwalb, P.F., Huttenlocher, D.P., 2004. Efficient graph-based image segmentation. Int. J. Comput. Vis. 59, 167–181.
- Fulkerson et al. (2009) Fulkerson, B., Vedaldi, A., Soatto, S., 2009. Class segmentation and object localization with superpixel neighborhoods, in: Proc. of IEEE International Conference on Computer Vision, pp. 670–677.
- Giraud et al. (2016) Giraud, R., Ta, V.T., Papadakis, N., 2016. SCALP: Superpixels with contour adherence using linear path, in: Proc. of International Conference on Pattern Recognition, pp. 2374–2379.
- Giraud et al. (2017a) Giraud, R., Ta, V.T., Papadakis, N., 2017a. Robust shape regularity criteria for superpixel evaluation, in: Proc. of IEEE International Conference on Image Processing.
- Giraud et al. (2017b) Giraud, R., Ta, V.T., Bugeau, A., Coupé, P., Papadakis, N., 2017b. SuperPatchMatch: An algorithm for robust correspondences using superpixel patches. IEEE Trans. Image Process. (in press).
- Giraud et al. (2017c) Giraud, R., Ta, V.T., Papadakis, N., 2017c. Evaluation framework of superpixel methods with a global regularity measure. J. of Electronic Imaging 26, 26–18.
- Gould et al. (2008) Gould, S., Rodgers, J., Cohen, D., Elidan, G., Koller, D., 2008. Multi-class segmentation with relative location prior. Int. J. Comput. Vis. 80, 300–316.
- Gould et al. (2014) Gould, S., Zhao, J., He, X., Zhang, Y., 2014. Superpixel graph label transfer with learned distance metric, in: Proc. of European Conference on Computer Vision, pp. 632–647.
- Gulshan et al. (2010) Gulshan, V., Rother, C., Criminisi, A., Blake, A., Zisserman, A., 2010. Geodesic star convexity for interactive image segmentation, in: Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp. 3129–3136.
- Huang et al. (2016) Huang, C.R., Wang, W.A., Lin, S.Y., Lin, Y.Y., 2016. USEQ: Ultra-fast superpixel extraction via quantization, in: International Conference on Pattern Recognition.
- Kae et al. (2013) Kae, A., Sohn, K., Lee, H., Learned-Miller, E., 2013. Augmenting CRFs with Boltzmann machine shape priors for image labeling, in: Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp. 2019–2026.
- Levinshtein et al. (2009) Levinshtein, A., Stere, A., Kutulakos, K.N., Fleet, D.J., Dickinson, S.J., Siddiqi, K., 2009. Turbopixels: Fast superpixels using geometric flows. IEEE Trans. Pattern Anal. and Mach. Intell. 31, 2290–2297.
- Liu et al. (2011) Liu, M.Y., Tuzel, O., Ramalingam, S., Chellappa, R., 2011. Entropy rate superpixel segmentation, in: Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp. 2097–2104.
- Lloyd (1982) Lloyd, S., 1982. Least squares quantization in PCM. IEEE Trans. on Information Theory 28, 129–137.
- Machairas et al. (2015) Machairas, V., Faessel, M., Cárdenas-Peña, D., Chabardes, T., Walter, T., Decencière, E., 2015. Waterpixels. IEEE Trans. Image Process. 24, 3707–3716.
- Maire et al. (2008) Maire, M., Arbelaez, P., Fowlkes, C., Malik, J., 2008. Using contours to detect and localize junctions in natural images, in: Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1–8.
- Martin et al. (2001) Martin, D., Fowlkes, C., Tal, D., Malik, J., 2001. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics, in: Proc. of IEEE International Conference on Computer Vision, pp. 416–423.
- Martin et al. (2004) Martin, D., Fowlkes, C., Malik, J., 2004. Learning to detect natural image boundaries using local brightness, color, and texture cues. IEEE Trans. Pattern Anal. and Mach. Intell. 26, 530–549.
- Menze et al. (2015) Menze, B., Jakab, A., Bauer, S., et al., 2015. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans. Med. Imaging 34, 1993–2024.
- Moore et al. (2008) Moore, A.P., Prince, S.J.D., Warrell, J., Mohammed, U., Jones, G., 2008. Superpixel lattices, in: Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1–8.
- Mori et al. (2004) Mori, G., Ren, X., Efros, A.A., Malik, J., 2004. Recovering human body configurations: combining segmentation and recognition, in: Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp. 326–333.
- Neubert and Protzel (2012) Neubert, P., Protzel, P., 2012. Superpixel benchmark and comparison, in: Forum Bildverarbeitung, pp. 1–12.
- Neubert and Protzel (2014) Neubert, P., Protzel, P., 2014. Compact watershed and preemptive SLIC: on improving trade-offs of superpixel segmentation algorithms, in: Proc. of International Conference on Pattern Recognition, pp. 996–1001.
- Reso et al. (2013) Reso, M., Jachalsky, J., Rosenhahn, B., Ostermann, J., 2013. Temporally consistent superpixels, in: Proc. of IEEE International Conference on Computer Vision, pp. 385–392.
- Rubio et al. (2016) Rubio, A., Yu, L., Simo-Serra, E., Moreno-Noguer, F., 2016. BASS: Boundary-aware superpixel segmentation, in: Proc. of International Conference on Pattern Recognition.
- Sawhney et al. (2014) Sawhney, R., Li, F., Christensen, H.I., 2014. GASP: Geometric association with surface patches, in: Proc. of International Conference on 3D Vision.
- Schick et al. (2012) Schick, A., Fischer, M., Stiefelhagen, R., 2012. Measuring and evaluating the compactness of superpixels, in: Proc. of International Conference on Pattern Recognition, pp. 930–934.
- Shi and Malik (2000) Shi, J., Malik, J., 2000. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. and Mach. Intell. 22, 888–905.
- Strassburg et al. (2015) Strassburg, J., Grzeszick, R., Rothacker, L., Fink, G.A., 2015. On the influence of superpixel methods for image parsing, in: Proc. of the Int. Conf. on Computer Vision Theory and Applications (VISAPP), pp. 518–527.
- Stutz et al. (2017) Stutz, D., Hermans, A., Leibe, B., 2017. Superpixels: An evaluation of the state-of-the-art. Comput. Vis. Image Underst. (in press).
- Tighe and Lazebnik (2010) Tighe, J., Lazebnik, S., 2010. SuperParsing: Scalable nonparametric image parsing with superpixels, in: Proc. of European Conference on Computer Vision, pp. 352–365.
- Van den Bergh et al. (2012) Van den Bergh, M., Boix, X., Roig, G., de Capitani, B., Van Gool, L., 2012. SEEDS: Superpixels extracted via energy-driven sampling, in: Proc. of European Conference on Computer Vision, pp. 13–26.
- Vedaldi and Soatto (2008) Vedaldi, A., Soatto, S., 2008. Quick shift and kernel methods for mode seeking, in: Proc. of European Conference on Computer Vision, pp. 705–718.
- Veksler et al. (2010) Veksler, O., Boykov, Y., Mehrani, P., 2010. Superpixels and supervoxels in an energy optimization framework, in: Proc. of European Conference on Computer Vision, pp. 211–224.
- Vincent and Soille (1991) Vincent, L., Soille, P., 1991. Watersheds in digital spaces: an efficient algorithm based on immersion simulations. IEEE Trans. Pattern Anal. and Mach. Intell. 13, 583–598.
- Wang et al. (2013) Wang, P., Zeng, G., Gan, R., Wang, J., Zha, H., 2013. Structure-sensitive superpixels via geodesic distance. Int. J. Comput. Vis. 103, 1–21.
- Wang et al. (2011) Wang, S., Lu, H., Yang, F., Yang, M.H., 2011. Superpixel tracking, in: Proc. of IEEE International Conference on Computer Vision, pp. 1323–1330.
- Xiaofeng and Bo (2012) Xiaofeng, R., Bo, L., 2012. Discriminatively trained sparse code gradients for contour detection, in: Proc. of Conf. on Neural Information Processing Systems, pp. 584–592.
- Xu and Corso (2012) Xu, C., Corso, J.J., 2012. Evaluation of super-voxel methods for early video processing, in: Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1202–1209.
- Yang et al. (2010) Yang, Y., Hallman, S., Ramanan, D., Fowlkes, C., 2010. Layered object detection for multi-class segmentation, in: Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp. 3113–3120.
- Yao et al. (2015) Yao, J., Boben, M., Fidler, S., Urtasun, R., 2015. Real-time coarse-to-fine topologically preserving segmentation, in: Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp. 2947–2955.
- Zhang and Zhang (2017) Zhang, N., Zhang, L., 2017. SSGD: Superpixels using the shortest gradient distance, in: Proc. of the IEEE International Conference on Image Processing.
- Zhang et al. (2016) Zhang, Y., Li, X., Gao, X., Zhang, C., 2016. A simple algorithm of superpixel segmentation with boundary constraint. IEEE Trans. Circuits and Syst. for Video Technol. PP, 1–1.