The constant growth of image data can strongly impact the computational cost of computer vision pipelines. Among the approaches used to reduce the processing load, dimension reduction and multi-resolution methods have been widely used over the past years. In this context, decompositions into superpixels are particularly interesting, since the created regions tend to respect the boundaries of the image objects. The relations between these irregular regions at different resolution levels can still be inferred, and superpixel neighborhoods can be used as in standard regular patch-wise and multi-resolution processing. Therefore, many superpixel-based methods have been proposed in the literature for different image processing and analysis applications, e.g., semantic segmentation, tracking with optical flow, depth estimation, and color or style transfer.
Since their introduction and popularization, the majority of superpixel methods decompose the image into regions approximately containing the same number of pixels with homogeneous colors. To compute this clustering, most methods, such as [9, 10, 8, 11, 12, 13, 14, 15], consider a trade-off distance between spatial and color spaces at the pixel scale. The spatial distance yields a relatively regular decomposition of the image domain, while the color distance associates pixels with a superpixel of similar average color. Most state-of-the-art methods only use the pixel spatial and color features in their clustering model, since information at the pixel scale may be sufficient to detect the object boundaries in a natural color image. Consequently, recent works such as [16, 17] have highlighted the lack of robustness of pixel-wise state-of-the-art methods, for instance to noise or texture. All methods relying on pixel-wise information may indeed fail badly at grouping textures and may provide very inconsistent decompositions. Even recent methods using advanced feature spaces [18, 13] or additional information such as features on the path to the superpixel barycenter [19, 12, 15] do not explicitly capture texture patterns and fail to detect texture changes.
In the recent method TASP, a straightforward extension of the SLIC framework is proposed to compute texture-aware superpixels. Patch comparisons are performed within the superpixel to provide a texture term in the clustering model. Nevertheless, such a method has a high computational complexity. The SLIC framework being based on a K-means clustering, each superpixel iteratively computes its distance to all pixels in a restricted area. In TASP, the texture term must be computed for each superpixel at each iteration, by a nearest neighbor search performed for all pixels in this area, leading to a heavy computational burden, i.e., more than s for images of pixels. Hence, it appears necessary to propose an approach that is more efficient in terms of complexity, while still able to accurately cluster textures.
In this paper, we propose a new Nearest Neighbor-based Superpixel Clustering (NNSC) method to generate accurate and texture-aware superpixels. We introduce a new clustering framework using patch-based nearest neighbor matching, while most existing methods are based on a K-means clustering. Hence, we directly group pixels in the patch space, while previous methods such as TASP combine both approaches at the expense of a heavy computational load. We also propose a new method to merge several decomposition estimations obtained from different nearest neighbor searches.
In the following, we first show the interest of considering patches for texture clustering, and the limitations of their use in a K-means-based clustering algorithm. Then, we present our new decomposition method relying on a patch-based nearest neighbor clustering, with much lower complexity. Finally, we study the parameters of our method and compare its segmentation performance to that of state-of-the-art methods on both natural color and texture datasets.
II Texture Superpixels using Patches
In this section, we first demonstrate the ability of patches to easily cluster image textures. Then, we present the standard pixel-wise K-means-based clustering algorithm and the limitations of its patch-based extension for generating texture-aware superpixels.
II-A Texture Clustering using Patches
Patches capture the neighborhood of each image pixel. Non-local patch-based approaches, which first became popular for texture synthesis and denoising applications, use this structure to find similar patterns in the same or other images. The distance between fixed-size patches reflects the similarity in terms of both intensity and texture patterns. This distance between two patches describing the neighborhoods of two pixels is generally computed with an L2 norm.
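As a minimal sketch, with notation assumed here rather than taken from the original equation, let $P(p)$ and $P(q)$ denote the fixed-size patches centered on pixels $p$ and $q$, each containing $n$ values. A squared L2 patch distance can then be written as:

```latex
d\big(P(p), P(q)\big) \;=\; \left\| P(p) - P(q) \right\|_2^2 \;=\; \sum_{i=1}^{n} \big( P(p)_i - P(q)_i \big)^2
```

A normalization by $n$ may be applied so that the value does not depend on the patch size; whether the original formulation does so is not assumed here.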
In the context of texture-aware superpixel clustering, the texture homogeneity between a pixel and a superpixel is not easy to measure, since a pixel neighborhood must be compared to a superpixel of variable size. Texture classification approaches may require prior information on the image type, or additional parameter settings, to be consistent with the pixel-wise color information that must also be taken into account in the clustering model. Moreover, such approaches can be computationally costly. Therefore, using patches appears to be an interesting solution, but it requires a selection strategy to determine which patches to compare.
II-B Texture Superpixels from K-means-based Framework
The recent TASP method proposes to generate texture-aware superpixels using the K-means-based SLIC framework, which is very popular due to its simplicity of use and understanding. The image is first split into regular blocks whose size depends on the input number of desired superpixels. Superpixels are then sequentially processed, and each one tries to gather neighboring pixels within a restricted area around it. The clustering distance between a pixel and a superpixel is composed of a spatial and a color distance. The pixel features are compared to the average features over all pixels in the superpixel. At the end of each iteration, pixels are associated with the superpixel providing the lowest distance.
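The assignment step described above can be sketched as follows. This is an illustrative NumPy version written under stated assumptions, not the reference implementation: the function name `kmeans_superpixel_step`, the half-width `s` of the restricted area, and the compactness weight `m` are hypothetical names, and the distance follows the usual SLIC form (squared color distance plus a weighted squared spatial distance).

```python
import numpy as np

def kmeans_superpixel_step(img, labels, s, m=10.0):
    """One K-means-like assignment pass (sketch).

    img:    (h, w, c) float array of pixel features (e.g., CIELab colors).
    labels: (h, w) int array, current superpixel label of each pixel.
    s:      assumed grid step; each superpixel only looks at a window of
            half-width s around its barycenter.
    m:      assumed weight balancing color and spatial terms.
    """
    h, w = labels.shape
    yy, xx = np.mgrid[0:h, 0:w]
    best = np.full((h, w), np.inf)      # lowest distance found so far per pixel
    new_labels = labels.copy()          # pixels seen by no window keep their label
    for k in np.unique(labels):         # superpixels are processed sequentially
        mask = labels == k
        cy, cx = yy[mask].mean(), xx[mask].mean()   # spatial barycenter
        cc = img[mask].mean(axis=0)                 # average feature
        y0, y1 = max(int(cy) - s, 0), min(int(cy) + s + 1, h)
        x0, x1 = max(int(cx) - s, 0), min(int(cx) + s + 1, w)
        d_col = np.sum((img[y0:y1, x0:x1] - cc) ** 2, axis=-1)
        d_sp = (yy[y0:y1, x0:x1] - cy) ** 2 + (xx[y0:y1, x0:x1] - cx) ** 2
        d = d_col + (m / s) ** 2 * d_sp
        win_best = best[y0:y1, x0:x1]         # views: edits update the full maps
        win_lab = new_labels[y0:y1, x0:x1]
        closer = d < win_best
        win_best[closer] = d[closer]
        win_lab[closer] = k
    return new_labels
```

Because the windows of adjacent superpixels overlap, the same pixel is evaluated by several superpixels within one pass, which is the redundancy that becomes costly once a nearest neighbor search is added to each evaluation.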
The TASP method adds a texture homogeneity term to this clustering distance. It uses fixed-size patches as descriptors to easily capture texture patterns while staying in the same feature space as the color distance between pixels and superpixels. Figure 1 shows that patch distances may be high even within the same texture area. Therefore, comparing a pixel neighborhood described by a patch to a reference one, for instance at the superpixel barycenter, would not guarantee a relevant texture measure. Hence, TASP performs a patch-based nearest neighbor (NN) search to find similar patches in the superpixel. Finding similar patches then implies texture homogeneity, and favors the association of the pixel with the superpixel.
With such an approach, the NN search must be performed for all pixels in the restricted area of each superpixel at each iteration, leading to overlapping areas and repetition of the NN matching process. In the K-means-based clustering, a given pixel is indeed considered by several superpixels at each iteration. Therefore, the complexity of TASP depends on the number of image pixels, the number of K-means iterations, and the number of NN search iterations.
These limitations motivate the introduction of our new clustering framework, significantly reducing this complexity while preserving the ability to generate texture-aware superpixels.
III Texture Superpixels from Patch-based Nearest Neighbor Matching
In this section, we first introduce our new clustering framework directly based on NN matching. Then, we present in detail the algorithm used to search for similar patches. Finally, we propose a method to merge several superpixel decompositions obtained from different NN searches.
III-A Nearest Neighbor Superpixel Clustering Framework
III-A1 Clustering Algorithm
The proposed NNSC method directly clusters pixels using a patch-based NN matching process, which we show to be necessary to provide texture-aware superpixels. The search is sequentially performed for all image pixels, to iteratively refine the initial superpixel grid decomposition. Therefore, it differs from the standard K-means-based framework that sequentially processes superpixels, leading the same pixel to be considered several times within the same iteration.
The NNSC decomposition process to obtain a label map is illustrated in Figure 2. At a given iteration, the label of the superpixel containing the patch correspondence is assigned to the considered pixel position for the next iteration. With its clustering framework directly based on NN matching, the complexity of NNSC is reduced, since each pixel is processed only once per iteration. Note that the search for similar patches can be performed by any NN method; the proposed search strategy is presented in Section III-B.
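The label update described above can be illustrated with a short NumPy sketch. The names `update_labels` and `nnf` (a nearest neighbor field storing, for each pixel, the row and column of its matched patch center) are assumptions made for this example; the update is written synchronously, so that all new labels are read from the label map of the previous iteration.

```python
import numpy as np

def update_labels(labels, nnf):
    """Assign to each pixel the label of the superpixel containing its match.

    labels: (h, w) int array, label map at the current iteration.
    nnf:    (h, w, 2) int array; nnf[y, x] = (row, col) of the patch matched
            to pixel (y, x) by the nearest neighbor search (assumed layout).
    Returns the label map used at the next iteration.
    """
    return labels[nnf[..., 0], nnf[..., 1]]


# Toy usage: a 2x2 label map in which every pixel is matched to position (0, 1).
labels = np.array([[0, 1],
                   [2, 3]])
nnf = np.zeros((2, 2, 2), dtype=int)
nnf[..., 1] = 1
next_labels = update_labels(labels, nnf)
```

Each pixel is visited once per iteration in this scheme, in contrast with the K-means-based pass where a pixel is evaluated by every superpixel whose window contains it.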
III-A2 Patch-based Clustering Distance
To capture the similarity in terms of both intensity and texture patterns, patch intensities in the feature space (e.g., colors in the CIELab color space) are considered in the patch-based distance (2), computed between the patch of a pixel and a patch at another position. This distance involves a regularity parameter, automatically set for each superpixel, and a spatial weighting function that favors the search near the superpixel barycenter, which prevents a superpixel from clustering different textures.
Finally, the global patch-based clustering distance considers the patch distance term (2), but also the standard color and spatial distances at the pixel scale. These terms respectively adapt superpixel borders to object contours and ensure the shape regularity of superpixels. Hence, patch correspondences are computed by minimizing the combination (3) of these terms.
III-B Nearest Neighbor Search using PatchMatch
Since computing exact NN would be too costly, we choose to use the approximate NN search algorithm PatchMatch (PM). PM was initially proposed to provide, for each patch of one image, a correspondence in another image. The algorithm starts from random correspondences and iteratively refines the patch associations using fast propagation of good matches from adjacent neighbors, and random tests. We adapt this algorithm to our context, i.e., finding similar patches within the same image and within a restricted area around each patch.
First, to ensure the regularity of the decomposition, we limit the search to a restricted area around the pixel position. Figure 3 illustrates this area and the algorithm steps for a given patch. To avoid matching a patch with itself, a neighborhood is defined around it, within which patches cannot be selected. Random associations are first computed in these restricted areas (Figure 3(a)) after the grid initialization. Then, the propagation step considers the correspondences of the recently processed adjacent patches to obtain new potential correspondences (Figure 3(b)). Finally, random selections are performed in areas of decreasing size around the best current correspondence (Figure 3(c)). This adaptation of PM finds similar patches in the same image while spatially constraining the search area to ensure superpixel regularity.
Figure 3: (a) Initialization, (b) Propagation, (c) Random search.
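The three steps above (initialization, propagation, random search) can be sketched for a single grayscale image as follows. This is a simplified illustration under several assumptions, not the authors' implementation: only a plain sum of squared differences is used as patch distance, instead of the full clustering distance (3), and the names `constrained_patchmatch`, `search` (half-width of the restricted area), and `excl` (half-width of the neighborhood in which matches are forbidden) are hypothetical. The image must be large enough for every pixel to have at least one valid candidate.

```python
import numpy as np

def patch_ssd(img, p, q, r):
    """Sum of squared differences between the (2r+1)x(2r+1) patches at p and q."""
    a = img[p[0] - r:p[0] + r + 1, p[1] - r:p[1] + r + 1]
    b = img[q[0] - r:q[0] + r + 1, q[1] - r:q[1] + r + 1]
    return float(np.sum((a - b) ** 2))

def valid(q, p, shape, r, search, excl):
    """q keeps its patch inside the image, lies in the search area of p,
    and lies outside the excluded neighborhood of p."""
    h, w = shape
    dist = max(abs(q[0] - p[0]), abs(q[1] - p[1]))
    inside = r <= q[0] < h - r and r <= q[1] < w - r
    return inside and excl < dist <= search

def constrained_patchmatch(img, r=1, search=6, excl=1, n_iter=3, seed=0):
    img = np.asarray(img, dtype=float)
    rng = np.random.default_rng(seed)   # fixed seed: reproducible matches
    h, w = img.shape
    nnf = np.zeros((h, w, 2), dtype=int)
    cost = np.full((h, w), np.inf)

    def try_candidate(p, q):
        # Keep q as the new match of p if it is valid and improves the cost.
        if valid(q, p, (h, w), r, search, excl):
            c = patch_ssd(img, p, q, r)
            if c < cost[p]:
                cost[p], nnf[p] = c, q

    def jitter(center, radius):
        # Random position in a square of half-width `radius` around `center`.
        return (center[0] + int(rng.integers(-radius, radius + 1)),
                center[1] + int(rng.integers(-radius, radius + 1)))

    pixels = [(y, x) for y in range(r, h - r) for x in range(r, w - r)]
    for p in pixels:                            # (a) random initialization
        while not np.isfinite(cost[p]):
            try_candidate(p, jitter(p, search))
    for it in range(n_iter):
        forward = it % 2 == 0                   # alternate the scan order
        step = -1 if forward else 1
        for p in (pixels if forward else pixels[::-1]):
            for dy, dx in ((step, 0), (0, step)):   # (b) propagation
                n = (p[0] + dy, p[1] + dx)
                if r <= n[0] < h - r and r <= n[1] < w - r:
                    try_candidate(p, (int(nnf[n][0]) - dy, int(nnf[n][1]) - dx))
            radius = search                     # (c) random search
            while radius >= 1:
                best = (int(nnf[p][0]), int(nnf[p][1]))
                try_candidate(p, jitter(best, radius))
                radius //= 2
    return nnf, cost
```

Rejecting invalid candidates inside `try_candidate` keeps the spatial constraint in a single place, so that propagated and randomly drawn candidates are filtered identically.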
III-C Aggregation of Multiple Clustering Estimations
PM being partly random, several clustering estimations can be computed and averaged to improve the performances. These independent estimations can easily be launched in parallel using a multi-threaded implementation. Since variations between estimations are limited, the multiple label maps can be aggregated by counting, for each pixel, the label agreements across estimations, using an indicator function that equals 1 when two labels are equal, and 0 otherwise.
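One plausible reading of this aggregation is a per-pixel majority vote over the label maps, the indicator function counting how many estimations agree on each label. The sketch below implements that reading under assumptions: the function name `aggregate_label_maps` is hypothetical, labels are assumed to be comparable across estimations (all runs starting from the same grid), and ties are broken in favor of the smallest label.

```python
import numpy as np

def aggregate_label_maps(label_maps):
    """Per-pixel majority vote over several (h, w) integer label maps."""
    stack = np.stack(label_maps)                 # (n_maps, h, w)
    candidates = np.unique(stack)                # every label that appears
    # votes[k, y, x] = number of maps assigning candidates[k] to pixel (y, x)
    votes = np.stack([(stack == k).sum(axis=0) for k in candidates])
    return candidates[np.argmax(votes, axis=0)]  # ties: smallest label wins


# Toy usage with three 1x3 estimations.
maps = [np.array([[0, 1, 2]]),
        np.array([[0, 1, 1]]),
        np.array([[3, 1, 2]])]
merged = aggregate_label_maps(maps)
```

The vote does not guarantee connected superpixels, which is consistent with the connectivity post-processing applied to the final label map.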
Finally, a post-processing step ensuring superpixel connectivity is performed on the final label map.
IV Evaluation of Performances
IV-A Validation Framework
To evaluate the segmentation performances, we consider a standard composite texture image (CTI) dataset, which is composed of grayscale images that combine different textures (dataset available at: http://rgiraud.vvv.enseirb-matmeca.fr/nnsc/). High performances on these images demonstrate the ability to detect texture changes. We also report the performances on the test images of the standard natural color Berkeley Segmentation Dataset (BSD).
IV-A2 Compared methods
NNSC performances are compared to those of the recent state-of-the-art methods SLIC, ERGC, ETPS, LSC, SNIC, SCALP, and TASP, used with the parameters recommended by the authors. Performances are measured with the standard Achievable Segmentation Accuracy (ASA), which evaluates the accuracy of superpixels according to a ground truth segmentation.
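For reference, ASA is commonly computed by assigning each superpixel to the ground truth region it overlaps the most, and reporting the fraction of pixels that are correctly covered. A minimal sketch of this standard definition follows; the function name `achievable_segmentation_accuracy` and the argument names are chosen for this example.

```python
import numpy as np

def achievable_segmentation_accuracy(superpixels, ground_truth):
    """ASA between two (h, w) integer label maps.

    Each superpixel contributes the size of its largest overlap with a
    ground truth region; the sum is normalized by the number of pixels.
    """
    covered = 0
    for k in np.unique(superpixels):
        # Ground truth labels of the pixels inside superpixel k.
        _, counts = np.unique(ground_truth[superpixels == k], return_counts=True)
        covered += counts.max()
    return covered / superpixels.size


# Toy usage: the first superpixel straddles two ground truth regions.
sp = np.array([[0, 0, 1, 1]])
gt = np.array([[0, 1, 1, 1]])
score = achievable_segmentation_accuracy(sp, gt)
```

A score of 1 means that every superpixel lies entirely inside one ground truth region, so the metric is an upper bound on the accuracy reachable by any segmentation built from these superpixels.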
In NNSC default settings, patches of size pixels are selected outside a neighborhood. label maps are aggregated, and the number of iterations is set to . Features and parameters in (3) are computed as in . These parameters are empirically set, and results in section IV-C are obtained using the same settings. Finally, note that the random sequence of PM is controlled to provide the same decomposition for the same image and parameters.
IV-B Influence of Parameters
IV-B1 Patch Size
The influence of the patch size in (2) on the performances is shown in Figure 4(a). On the CTI dataset, large patches efficiently capture textures, while on the BSD dataset, patches beyond a certain size do not provide more information, object contours being mainly detected by color changes. In NNSC default settings, the patch size is chosen as a good trade-off between accuracy and computational time. Nevertheless, parameters could be manually optimized.
IV-B2 Number of Clustering Estimations
IV-C Comparison to the State-of-the-Art Methods
IV-C1 Segmentation Performances
Performances are reported for several superpixel scales in Figure 5. NNSC obtains performances similar to TASP on the CTI dataset (Figure 5(a)), showing its capacity to produce texture-aware superpixels, while performing as well as or better than the best compared methods on the BSD (Figure 5(b)). Note that these results are obtained using the same parameters.
NNSC is also visually compared to the most recent state-of-the-art approaches in Figure 6. On the natural color image, NNSC provides relevant superpixels that accurately detect structures, e.g., the tree or the bear's arm. On the complex composite texture image, NNSC provides a more accurate segmentation, with much less fuzzy superpixel shapes.
Figure 6: Initial image, LSC, SNIC, SCALP, TASP, NNSC.
IV-C2 Computational Complexity
NNSC presents a significantly reduced complexity compared to the TASP texture-aware superpixel approach, whose complexity depends on the number of image pixels, the number of K-means iterations, and the number of NN search iterations. NNSC avoids this combined cost since it is directly based on a NN clustering framework.
NNSC takes around s in its default settings, while TASP requires on average s to decompose a BSD image of pixels on a Linux computer with 4 cores at 1.90GHz and 16GB of RAM. With costly patch-based distances to handle textures, and without advanced code optimizations, NNSC achieves computational times similar to those of accurate methods such as [9, 15]. Finally, NNSC could reach real-time performances, since several works have proposed such PM implementations using GPU architectures.
In this work, we propose a new superpixel method considering information at the patch scale to cluster pixels having similar local texture properties. The proposed approach iteratively clusters pixels using a locally constrained patch-based nearest neighbor matching. This way, it significantly reduces the complexity of existing texture-aware approaches, while preserving the accuracy of segmentation. Future work will focus on the extension of the proposed method to 3D supervoxel decomposition, with real-time processing, for applications such as object tracking in videos.
-  K. Nakamura, and B.-W. Hong, “Hierarchical image segmentation via recursive superpixel with adaptive regularity,” in International Conference on Computer Vision, vol. 26, 2017.
-  R. Giraud, V.-T. Ta, A. Bugeau, P. Coupé, and N. Papadakis, “SuperPatchMatch: an algorithm for robust correspondences using superpixel patches,” in Trans. on Image Processing, vol. 26, pp. 4068–4078, 2017.
-  M. Mostajabi, P. Yadollahpour, and G. Shakhnarovich, “Feedforward semantic segmentation with zoom-out features,” in Int. Conf. on Computer Vision and Pattern Recognition, pp. 3376–3385, 2015.
-  M. Menze, and A. Geiger, “Object scene flow for autonomous vehicles,” in Int. Conf. on Computer Vision and Pattern Recognition, pp. 3061–3070, 2015.
-  C. L. Zitnick, and S. B. Kang, “Stereo for image-based rendering using image over-segmentation,” in International Journal of Computer Vision, pp. 49–65, 2007.
-  J. Rabin, and N. Papadakis, “Non-convex relaxation of optimal transport for color transfer,” in Int. Conf. on Neural Information Processing Systems, 2014.
-  J. Liu, W. Yang, X. Sun, and W. Zeng, “Photo stylistic brush: robust style transfer via superpixel-based bipartite graph,” in Trans. on Multimedia, vol. 20, pp. 1724–1737, 2018.
-  R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, “SLIC superpixels compared to state-of-the-art superpixel methods,” Trans. on Pattern Analysis and Machine Intelligence, vol. 34, pp. 2274–2282, 2012.
-  M. Y. Liu, O. Tuzel, S. Ramalingam, and R. Chellappa, “Entropy rate superpixel segmentation,” in Int. Conf. on Computer Vision and Pattern Recognition, pp. 2097–2104, 2011.
-  M. Van den Bergh, X. Boix, G. Roig, B. de Capitani, and L. Van Gool, “SEEDS: superpixels extracted via energy-driven sampling,” in European Conference on Computer Vision, pp. 13–26, 2012.
-  J. Yao, M. Boben, S. Fidler, and R. Urtasun, “Real-time coarse-to-fine topologically preserving segmentation,” in Int. Conf. on Computer Vision and Pattern Recognition, pp. 2947–2955, 2015.
-  N. Zhang, and L. Zhang, “SSGD: superpixels using the shortest gradient distance,” in International Conference on Image Processing, pp. 3869–3873, 2017.
-  J. Chen, Z. Li, and B. Huang, “Linear spectral clustering superpixel,” in Trans. on Image Processing, vol. 26, pp. 3317–3330, 2017.
-  R. Achanta, and S. Süsstrunk, “Superpixels and polygons using simple non-iterative clustering,” in Int. Conf. on Computer Vision and Pattern Recognition, pp. 4895–4904, 2017.
-  R. Giraud, V.-T. Ta, and N. Papadakis, “Robust superpixels using color and contour features along linear path,” in Computer Vision and Image Understanding, vol. 170, pp. 1–13, 2018.
-  D. Stutz, A. Hermans, and B. Leibe, “Superpixels: an evaluation of the state-of-the-art,” in Computer Vision and Image Understanding, vol. 166, pp. 1–27, 2018.
-  R. Giraud, V.-T. Ta, N. Papadakis, and Y. Berthoumieu, “Texture-aware superpixel segmentation,” International Conference on Image Processing, 2019.
-  Y.-J. Liu, C.-C. Yu, M.-J. Yu, and Y. He, “Manifold SLIC: a fast method to compute content-sensitive superpixels,” in Int. Conf. on Computer Vision and Pattern Recognition, pp. 651–659, 2016.
-  P. Buyssens, I. Gardin, S. Ruan, and A. Elmoataz, “Eikonal-based region growing for efficient clustering,” in Image and Vision Computing, vol. 32, pp. 1045–1054, 2014.
-  A. Efros, and T. Leung, “Texture synthesis by non-parametric sampling,” in International Conference on Computer Vision, vol. 2, pp. 1033–1038, 1999.
-  A. Buades, B. Coll, and J.-M. Morel, “A non-local algorithm for image denoising,” in Int. Conf. on Computer Vision and Pattern Recognition, vol. 2, pp. 60–65, 2005.
-  C. Barnes, E. Shechtman, A. Finkelstein, and D. B. Goldman, “PatchMatch: a randomized correspondence algorithm for structural image editing,” in Trans. on Graphics, vol. 28, 2009.
-  R. Giraud, V.-T. Ta, N. Papadakis, J. V. Manjón, L. Collins, and P. Coupé, “An optimized PatchMatch for multi-scale and multi-feature label fusion,” in Neuroimage, vol. 124, pp. 770–782, 2016.
-  T. Randen, and J. H. Husoy, “Filtering for texture classification: a comparative study,” in Trans. on Pattern Analysis and Machine Intelligence, vol. 21, pp. 291–310, 1999.
-  D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A Database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in International Conference on Computer Vision, vol. 2, pp. 416–423, 2001.
-  H. Nover, S. Achar, and D. Goldman, “ESPReSSo: efficient slanted PatchMatch for real-time spacetime stereo,” in Int. Conf. on 3D Vision, pp. 578–586, 2018.