Superpixel segmentation approaches, which locally group pixels into regions, have become very popular in image processing and computer vision applications. The aim is to exploit the local redundancy of information to lower the computational burden and potentially improve performance by reducing the noise inherent to processing at the pixel level. Superpixels can also be seen as a multi-resolution approach that preserves image contours, contrary to standard regular downsampling methods. They are thus a very useful pre-processing step for applications such as visual saliency estimation [liu2014superpixel, he2015supercnn], data association across views [sawhney2014], segmentation and classification [gould2014, gadde2016superpixel, giraud2017_spm], and object detection [arbelaez2011, shu2013improving, yan2015object] or tracking [chang2013, reso2013].
Over the past years, most superpixel methods have tended to produce equally-sized regions with homogeneous pixels in terms of color. This paradigm is usually in line with the segmentation of natural image objects, whose contours can be detected by color changes. Hence, to cluster the pixels into regions, state-of-the-art methods such as [liu2011, vandenbergh2012, achanta2012, buyssens2014, achanta2017superpixels] only use distance terms in spatial and color (e.g., CIELab) spaces. In [liu2016manifold, chen2017], more advanced feature spaces are defined to improve the segmentation performance. More recently, [giraud2018_scalp] proposes to consider contour information in the clustering to enforce the respect of object boundaries. Nevertheless, such a framework requires a prior contour detection, at the expense of a higher overall complexity.
Most of these state-of-the-art methods only use pixel-level information as clustering feature. Therefore, they can be severely impacted by high-frequency contrast variations and fail to produce equally-sized regions having the same textural properties. The proposed method TASP is compared in Figure 1 to the state-of-the-art approaches on a synthetic texture image. While TASP produces a relevant segmentation, all other methods are strongly misled by the texture patterns. We used the regularity parameters recommended by the authors for tunable methods [achanta2012, buyssens2014, yao2015, chen2017, giraud2018_scalp], but no other setting would allow them to capture texture information. Superpixel methods are indeed generally optimized and evaluated on noise-free natural color images, although specific tasks require decomposing highly textured or low-resolution grayscale images, for instance in medical applications [tian2016superpixel].
Figure 1 panels: Initial image, ERS [liu2011], SEEDS [vandenbergh2012], SLIC [achanta2012], ERGC [buyssens2014], ETPS [yao2015], LSC [chen2017], SNIC [achanta2017superpixels], SCALP [giraud2018_scalp], TASP.
To overcome the limitations of handcrafted color spaces, deep learning approaches have been proposed [jampani2018superpixel, liu2018learning]. Nevertheless, the gain obtained with features learned on a training dataset may come at the expense of the usual deep learning limitations, i.e., long training time, need for a substantial training database and hardware resources, and direct applicability limited to similar images. Moreover, these approaches do not directly extend to supervoxels and do not allow the shape regularity to be set, which can strongly impact the performance of superpixel-based tasks. It is thus still necessary to increase the robustness of non-deep learning superpixel methods to textures, while preserving their desired properties: adaptability, low complexity and limited parameter settings.
Contributions. In this work, we propose a new Texture-Aware SuperPixel (TASP) clustering method able to accurately segment highly textured images, but also any input image, e.g., natural color ones, using the same parameters.
To generate relevant superpixels on textured images (see Figure 1), TASP adjusts its spatial constraint according to the feature variance within each superpixel. This way, TASP also removes the need for fine manual tuning of the regularity. Most recent state-of-the-art methods set this parameter globally according to the nature of the image, so that default or sub-optimal settings can strongly impact the results [giraud2017_gef].
Then, to ensure texture homogeneity, we introduce a new patch-based framework that makes it easy to evaluate the similarity of a pixel neighborhood to a superpixel.
We validate TASP on natural color images from a standard segmentation dataset [martin2001], and on two new datasets proposed to evaluate texture segmentation performances. TASP significantly outperforms the state-of-the-art methods on texture segmentation performances, while performing as well, or better, on natural images, using the same parameters.
2 Texture-Aware SuperPixels
The TASP method improves the superpixel decomposition approach used in [achanta2012, chen2017, giraud2018_scalp], which is first presented in this section. Then, we propose a method to locally set the spatial regularity of superpixels, in order to automatically adapt to the image content. Finally, we introduce a new pixel to superpixel texture homogeneity measure to group pixels in terms of texture.
2.1 K-means-based Iterative Clustering
The standard framework of [achanta2012] only requires the number of superpixels to produce and a regularity parameter. The algorithm is based on an iteratively constrained K-means clustering of pixels. Superpixels are first regularly set over the image domain as square blocks, and each one is described by its average intensity feature (CIELab colors for [achanta2012]) and the spatial barycenter of its pixels. The clustering relies on a feature distance term and a spatial distance term. At each iteration, each superpixel is compared to all pixels, described by their feature and position, within a fixed-size region around its barycenter. A pixel is associated to the superpixel minimizing the distance of model (1), which combines the feature term and the spatial term, the latter being weighted by the parameter setting the superpixel shape regularity. A post-processing step finally ensures region connectivity.
Although this method can accurately gather pixels having similar colors, the regularity parameter is globally set and cannot adapt to all local image contours. The method also largely fails to capture texture patterns, as it only considers feature information at the pixel level.
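To make the assignment step described above concrete, the following is a minimal sketch of one pass of a SLIC-like constrained K-means. It assumes the usual SLIC-style combination of a squared feature distance and a squared spatial distance scaled by `m**2 / s**2`; the function name, the argument names and this exact weighting are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def assign_pixels(features, centers_f, centers_xy, s, m):
    """One assignment pass of a SLIC-like constrained K-means (sketch).

    features   : (H, W, C) float array of pixel features
    centers_f  : (K, C) average feature of each superpixel
    centers_xy : (K, 2) barycenters as (row, col)
    s          : initial block size (pixels)
    m          : global regularity parameter
    Returns an (H, W) array of superpixel labels (-1 where no window applies).
    """
    h, w, _ = features.shape
    best = np.full((h, w), np.inf)
    labels = np.full((h, w), -1, dtype=int)
    for k, (cf, (cy, cx)) in enumerate(zip(centers_f, centers_xy)):
        # Restrict the comparison to a (2s+1) x (2s+1) window around the barycenter.
        y0, y1 = max(int(cy) - s, 0), min(int(cy) + s + 1, h)
        x0, x1 = max(int(cx) - s, 0), min(int(cx) + s + 1, w)
        yy, xx = np.mgrid[y0:y1, x0:x1]
        d_f = ((features[y0:y1, x0:x1] - cf) ** 2).sum(axis=-1)  # feature term
        d_s = (yy - cy) ** 2 + (xx - cx) ** 2                    # spatial term
        dist = d_f + d_s * (m ** 2) / (s ** 2)  # ASSUMED SLIC-style weighting
        win = best[y0:y1, x0:x1]                # view on the running minimum
        better = dist < win
        win[better] = dist[better]
        labels[y0:y1, x0:x1][better] = k
    return labels
```

In a full pipeline, this pass would alternate with an update of `centers_f` and `centers_xy` from the new labels, followed by the connectivity post-processing.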
2.2 Local Adaptation of Superpixel Regularity
For most methods, including [achanta2012, chen2017, giraud2018_scalp], the regularity parameter must be manually set according to the dynamic range of the feature distance term. Hence, default parameters for natural color images may lead model (1) to generate a highly irregular clustering on textures, and the post-processing step enforcing connectivity to irrelevantly merge regions (see Figure 1).
We address this issue by using, for each superpixel, a regularity parameter defined in (2) from the feature variance of all pixels of the superpixel, with a scaling parameter. Such a regularity term increases the spatial constraint in the TASP model (5) for superpixels having high feature variances, and reduces it in smooth areas, so that the superpixel boundaries can capture image objects that are only perceptible through limited feature variations.
This way, without manually adapting the regularity parameter, TASP can compute relevant superpixels on both highly textured images (see Figure 2(b)-top) and natural color ones (see Figure 2(b)-bottom), since (2) automatically adjusts the trade-off between the feature and spatial terms in (5). Nevertheless, the clustering accuracy still has to be improved to capture texture information.
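The local adaptation of the regularity can be sketched as follows. The text only states that the per-superpixel regularity is derived from the feature variance with a scaling parameter; the exponential form `m * exp(var / beta)` used here, as well as the names `local_regularity`, `m` and `beta`, are assumptions chosen for illustration so that the regularity grows with the variance and stays at `m` in flat areas.

```python
import numpy as np

def local_regularity(features, labels, m, beta):
    """Per-superpixel regularity from the feature variance (sketch).

    features : (H, W, C) float array of pixel features
    labels   : (H, W) integer superpixel labels in [0, K)
    m        : base regularity parameter
    beta     : scaling parameter applied to the variance
    Returns a (K,) array of regularity values.
    """
    n_sp = int(labels.max()) + 1
    m_k = np.full(n_sp, float(m))
    for k in range(n_sp):
        mask = labels == k
        if not mask.any():
            continue  # empty superpixel: keep the base value
        # Variance of each channel over the superpixel, averaged over channels.
        var_k = features[mask].var(axis=0).mean()
        m_k[k] = m * np.exp(var_k / beta)  # ASSUMED form, not the paper's (2)
    return m_k
```

With such a rule, a superpixel covering a striped area receives a stronger spatial constraint than one covering a flat area, without any manual change of `m` between images.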
2.3 Texture Homogeneity Measure
2.3.1 Pixel to Superpixel Patch-based Distance
In this section, we propose a method to measure the texture similarity between a pixel neighborhood and the content of a superpixel, that is, between two regions of different sizes. Using a dedicated texture descriptor at both patch and superpixel levels would increase the complexity and require additional parameter settings. Moreover, contrary to color and spatial information, texture cannot be well preserved by a global average over the whole superpixel. The framework must keep its limited complexity and be able to adapt to any image content without prior information. Such constraints also rule out costly dictionary or learning-based approaches.
To address these issues, we propose a new framework using square patches to naturally capture texture information. For a pixel, described by its surrounding patch, and a given superpixel, a nearest neighbor algorithm (see Section 2.3.2) is used to find similar patches centered on pixels that belong to the superpixel and lie outside a small neighborhood around the considered pixel (see Figure 3). The new texture term (3) computes the average distance to the selected patches, which are compared to the patch of the pixel with a patch distance in the feature space, computed over patches of fixed size.
Any feature can be used in term (3). This way, we propose a general model that can easily evaluate the texture compliance of a pixel neighborhood to a superpixel, while avoiding the need for complex texture classification approaches.
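The pixel to superpixel patch-based term can be illustrated with the sketch below on a grayscale image. It uses an exhaustive search instead of an approximate nearest neighbor algorithm, a mean squared difference as patch distance, and a Chebyshev exclusion radius around the pixel; the function name and the parameters `half`, `n_best` and `excl` are hypothetical and only serve the example.

```python
import numpy as np

def texture_distance(img, labels, p, k, half=1, n_best=3, excl=1):
    """Average distance from the patch of pixel p to its most similar
    patches inside superpixel k (brute-force sketch).

    img    : (H, W) grayscale image
    labels : (H, W) integer superpixel labels
    p      : (row, col) of the pixel to cluster
    k      : label of the candidate superpixel
    half   : patch half-size, patches are (2*half+1) x (2*half+1)
    n_best : number of most similar patches that are averaged
    excl   : candidates within this Chebyshev radius of p are skipped
    """
    padded = np.pad(img.astype(float), half, mode="edge")
    size = 2 * half + 1

    def patch(y, x):
        # Patch centered on (y, x) in the original image coordinates.
        return padded[y:y + size, x:x + size]

    py, px = p
    ref = patch(py, px)
    dists = []
    for y, x in zip(*np.nonzero(labels == k)):
        if max(abs(y - py), abs(x - px)) <= excl:
            continue  # ignore the immediate neighborhood of p
        # ASSUMED patch distance: mean squared difference over the patch.
        dists.append(((ref - patch(y, x)) ** 2).mean())
    if not dists:
        return float("inf")
    return float(np.mean(np.sort(dists)[:n_best]))
```

The exhaustive loop is quadratic in practice and is only meant to show what is being measured; a fast approximate search would replace it in any realistic setting.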
2.3.2 Patch-based Nearest Neighbor Search
The search for similar patches can be performed by any nearest neighbor (NN) method. We choose to use PatchMatch, a fast iterative approximate-NN algorithm based on the propagation of good matches from adjacent neighbors [barnes2009]. The computation of the texture term can be directly performed for all pixels in the search area around the barycenter of a superpixel. Since the algorithm is partly random, several patches in the superpixel can be selected in parallel for each pixel, to increase the robustness of the texture homogeneity term (3).
2.3.3 Texture Unicity within Superpixels
In the texture term (3), the patch similarity is computed regardless of any spatial information. Hence, a pixel to cluster may find similar local textures in restricted areas, leading a superpixel to potentially group several textures (Figure 2(c)).
To ensure the texture unicity within a superpixel, we include in the texture term the spatial distance between the position of each selected patch and the spatial barycenter of the superpixel, weighted by a scaling function (4). Such a term iteratively contributes to restricting the search area and the diversity of textures within the superpixel by strongly penalizing similar patches found far from its barycenter. Hence, the barycenter is encouraged to move to a homogeneous textured area and to be contained within the superpixel (see Figure 2(d)). This also increases the shape regularity, which is a desirable property [giraud2017_gef].
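The effect of the spatial penalty on the texture term can be sketched as follows. The multiplicative weight `1 + d_s / scale**2` is a purely hypothetical scaling function chosen for illustration, as are the function and argument names; the only property taken from the text is that patches selected far from the barycenter contribute a larger cost.

```python
import numpy as np

def penalized_texture_term(patch_dists, positions, barycenter, scale):
    """Texture term with a spatial penalty on the selected patches (sketch).

    patch_dists : (N,) patch distances of the selected patches
    positions   : (N, 2) positions (row, col) of the selected patches
    barycenter  : (2,) spatial barycenter of the superpixel
    scale       : spatial scale, e.g. the initial superpixel size
    """
    patch_dists = np.asarray(patch_dists, dtype=float)
    offsets = np.asarray(positions, dtype=float) - np.asarray(barycenter, dtype=float)
    d_s = (offsets ** 2).sum(axis=1)     # squared distance to the barycenter
    weights = 1.0 + d_s / scale ** 2     # ASSUMED scaling function
    return float((patch_dists * weights).mean())
```

For two equally similar patches, the one located at the barycenter keeps its original distance while the distant one is inflated, which discourages a superpixel from matching a texture that only appears in a remote part of its search area.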
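The clustering distance in TASP, given in (5), finally combines the feature term, the texture term and the spatial term, the latter being weighted by the local regularity parameter (2).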
This way, TASP becomes a very general method, able to handle textures, and efficient on various image types with the exact same parameters, as demonstrated in the next section.
Figure panels: Initial image, LSC [chen2017], SNIC [achanta2017superpixels], SCALP [giraud2018_scalp], TASP.
3.1 Validation Framework
Similarly to [randen1999filtering], we create two new datasets to evaluate texture segmentation performance. A highly challenging synthetic stripe dataset (mix-Stripes) of 10 images is created by placing stripes similar to the ones in Figure 1 in regions of variable shape and bounded minimum size. Natural textures with normalized intensity are also taken from the Brodatz dataset [brodatz1966] to create 100 composite images (mix-Brodatz), each containing several different textures. Finally, we consider the standard Berkeley Segmentation Dataset (BSD) [martin2001], which contains natural color test images.
Most parameters are empirically set once and for all, and their tuning has a moderate impact on performance. For the patch search, square patches of fixed size are selected outside a fixed neighborhood of the considered pixel. In the clustering model, the base regularity and the scaling parameter in (2) are kept fixed for all experiments. Color features are computed as in [giraud2018_scalp]. Finally, the whole clustering process of TASP is performed with the same number of iterations as in [achanta2012].
TASP is compared to the recent state-of-the-art methods SLIC [achanta2012], ERGC [buyssens2014], ETPS [yao2015], LSC [chen2017], SNIC [achanta2017superpixels], and SCALP [giraud2018_scalp]. Performance is evaluated with the standard Achievable Segmentation Accuracy (ASA) and the contour detection F-measure (F), as defined in [giraud2017_gef], and we report quantitative results for an average number of superpixels.
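For reference, the ASA metric can be sketched as below, using its commonly used definition: each superpixel is assigned to the ground-truth region it overlaps most, and the resulting fraction of correctly labeled pixels is reported. The function name is illustrative, and the definition in the cited reference may differ in details.

```python
import numpy as np

def achievable_segmentation_accuracy(superpixels, ground_truth):
    """ASA: upper bound on the accuracy reachable by labeling each
    superpixel with its best-overlapping ground-truth region (sketch).

    superpixels  : integer label map with labels in [0, K)
    ground_truth : integer label map of the same shape, labels in [0, G)
    """
    sp = np.asarray(superpixels).ravel()
    gt = np.asarray(ground_truth).ravel()
    # overlap[k, g] = number of pixels shared by superpixel k and region g.
    overlap = np.zeros((sp.max() + 1, gt.max() + 1), dtype=np.int64)
    np.add.at(overlap, (sp, gt), 1)
    return overlap.max(axis=1).sum() / sp.size
```

A decomposition whose superpixels never cross a ground-truth boundary reaches an ASA of 1, and every superpixel straddling two regions lowers the score by the size of its smaller part.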
3.2 Influence of Contributions
The visual impact of contributions is shown in Figure 2, and ASA and F measures are reported in Table 1 on the three considered datasets. Our new texture homogeneity term (3) and constraint of texture unicity within superpixels (4) both significantly improve performances on each data type.
3.3 Comparison to the State-of-the-Art Methods
A quantitative evaluation is also performed on the three datasets in Table 2. TASP significantly increases performance on the synthetic (mix-Stripes) and natural (mix-Brodatz) texture datasets, demonstrating its ability to provide texture-aware superpixels. TASP also obtains the best results on natural color images (BSD), using the same parameters, while the other methods fail to provide accurate results on the three data types at the same time. Note that the compared methods are used with the parameters recommended by their authors. Nevertheless, no other approach explicitly captures texture information, so TASP with default parameters still outperforms state-of-the-art methods manually optimized for each dataset.
In this paper, we address the severe lack of robustness of superpixel approaches to textured images by proposing a texture-aware decomposition method. A new patch-based framework is introduced to gather pixels having both similar color and local textural properties. The proposed method is general, removes the need for manual setting of the regularity constraint, and can naturally be extended to generate supervoxels.
We outperform state-of-the-art methods in terms of segmentation on color, synthetic texture and natural texture datasets, using the same parameters. This work opens the way for a wider use of superpixels and for the efficient application of our approach to medical image segmentation or video object tracking.