Texture-Aware Superpixel Segmentation

01/30/2019 ∙ by Remi Giraud, et al.

Most superpixel methods are based on spatial and color measures at the pixel level. Therefore, they may fail to group pixels with similar local texture properties, and they need fine parameter tuning to balance the two measures. In this paper, we address these issues with a new Texture-Aware SuperPixel (TASP) segmentation method. TASP locally adjusts its spatial regularity constraint according to the feature variance to accurately segment both smooth and textured areas. A new pixel-to-superpixel patch-based distance is also proposed to ensure texture homogeneity within created regions. TASP substantially outperforms state-of-the-art methods in segmentation accuracy on both natural color and texture images.




1 Introduction

Superpixel segmentation approaches that locally group pixels into regions have become very popular in image processing and computer vision applications. The aim is to exploit the local redundancy of information to lower the computational burden, and to potentially improve performance by reducing the noise of pixel-level processing. Superpixels can also be considered as a multi-resolution approach that preserves image contours, contrary to standard regular downsampling methods. They are thus a very interesting pre-processing for applications such as visual saliency estimation

[liu2014superpixel, he2015supercnn], data association across views [sawhney2014], segmentation and classification [gould2014, gadde2016superpixel, giraud2017_spm] and object detection [arbelaez2011, shu2013improving, yan2015object] or tracking [chang2013, reso2013].

In recent years, most superpixel methods have tended to produce equally-sized regions with homogeneous pixels in terms of color. This paradigm is usually in line with the segmentation of natural image objects, whose contours can be detected by color changes. Hence, to cluster the pixels into regions, state-of-the-art methods such as [liu2011, vandenbergh2012, achanta2012, buyssens2014, achanta2017superpixels] only use distance terms in spatial and color (e.g., CIELab) spaces. In [liu2016manifold, chen2017], more advanced feature spaces are defined to improve segmentation performance. More recently, [giraud2018_scalp] proposes to consider contour information in the clustering to ensure the respect of object boundaries. Nevertheless, such a framework requires a prior contour detection step, at the expense of higher overall complexity.

Most of these state-of-the-art methods only use pixel-level information as clustering features. Therefore, they can be severely impacted by high frequency contrast variations and fail to produce equally-sized regions having the same textural properties. The proposed method TASP is compared in Figure 1 to the state-of-the-art approaches on a synthetic texture image. While TASP produces a relevant segmentation, all other methods are severely misled by the texture patterns. We used the regularity parameters recommended by the authors for tunable methods [achanta2012, buyssens2014, yao2015, chen2017, giraud2018_scalp], but no other setting would enable them to capture texture information. Superpixel methods are indeed generally optimized and evaluated on noise-free natural color images, although specific tasks require decomposing highly textured or low resolution grayscale images, for instance in medical applications [tian2016superpixel].

Figure 1: Comparison of the TASP method to state-of-the-art approaches on a synthetic texture image for superpixels. Panels: initial image, ERS [liu2011], SEEDS [vandenbergh2012], SLIC [achanta2012], ERGC [buyssens2014], ETPS [yao2015], LSC [chen2017], SNIC [achanta2017superpixels], SCALP [giraud2018_scalp], and TASP. Only TASP succeeds in capturing the textures, while the other methods are severely misled by high frequency contrast variations.

To overcome the limitations of handcrafted color spaces, deep learning approaches have been proposed [jampani2018superpixel, liu2018learning]. Nevertheless, the gain obtained with features learned on a training dataset may come at the expense of the usual deep learning limitations, i.e., long training times, the need for a substantial training database and hardware resources, and direct applicability limited to similar images. Moreover, these approaches do not directly extend to supervoxels and prevent setting the shape regularity, which can strongly impact the performance of superpixel-based tasks. It is thus still necessary to increase the robustness of non-deep-learning superpixel methods to textures, while preserving their desired properties: adaptability, low complexity, and limited parameter settings.

Contributions. In this work, we propose a new Texture-Aware SuperPixel (TASP) clustering method able to accurately segment highly textured images, but also any input image, e.g., natural color ones, using the same parameters.

To be able to generate relevant superpixels on textured images (see Figure 1), TASP adjusts its spatial constraint according to the feature variance within the superpixel. This way, TASP also addresses the need for fine manual regularity settings. Most recent state-of-the-art methods set this parameter globally according to the nature of the image, so that default or sub-optimal settings can strongly degrade the results [giraud2017_gef].

Then, to ensure texture homogeneity, we introduce a new patch-based framework that easily evaluates the similarity of a pixel neighborhood to a superpixel.

We validate TASP on natural color images from a standard segmentation dataset [martin2001], and on two new datasets proposed to evaluate texture segmentation performance. TASP significantly outperforms the state-of-the-art methods in texture segmentation performance, while performing as well, or better, on natural images, using the same parameters.

2 Texture-Aware SuperPixels

The TASP method improves the superpixel decomposition approach used in [achanta2012, chen2017, giraud2018_scalp], that is first presented in this section. Then, we propose a method to locally set the spatial regularity of superpixels, to automatically adapt to the image content. Finally, we introduce a new pixel to superpixel texture homogeneity measure to group pixels in terms of texture.

2.1 K-means-based Iterative Clustering

The standard framework of [achanta2012] only requires the number of superpixels to produce and a regularity parameter. The algorithm is based on an iteratively constrained K-means clustering of pixels. Superpixels S_k are first regularly set over the image domain as square blocks of size s×s, and are described by their average feature F_k (CIELab colors for [achanta2012]) and the spatial barycenter X_k of their pixels. The clustering relies on a feature distance d_c(p, S_k) = ‖F_p − F_k‖² and a spatial distance d_s(p, S_k) = ‖X_p − X_k‖². At each iteration, each superpixel S_k is compared to all pixels p, of feature F_p at position X_p, within a square window around its barycenter X_k. A pixel p is associated to the superpixel S_k minimizing the distance D defined as:

D(p, S_k) = d_c(p, S_k) + d_s(p, S_k) m²/s²,     (1)

with m the parameter setting the superpixel shape regularity. A post-processing step finally ensures region connectivity.
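As a reference point, one assignment pass of this constrained K-means can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the (2s+1)×(2s+1) window and the m²/s² normalization follow the standard SLIC formulation, and all names are ours.

```python
import numpy as np

def slic_assignment_pass(feat, centers, m, s):
    """One SLIC-style assignment pass (sketch).

    feat    : (H, W, C) per-pixel features (e.g. CIELab).
    centers : list of (Fk, Xk) with Fk a C-vector and Xk = (y, x).
    m       : regularity parameter; s : initial superpixel size.
    Returns the (H, W) label map of the nearest cluster per pixel.
    """
    H, W, _ = feat.shape
    labels = -np.ones((H, W), dtype=int)
    best = np.full((H, W), np.inf)
    ys, xs = np.mgrid[0:H, 0:W]
    for k, (Fk, Xk) in enumerate(centers):
        # restrict the comparison to a (2s+1)x(2s+1) window around the barycenter
        y0, y1 = max(0, int(Xk[0]) - s), min(H, int(Xk[0]) + s + 1)
        x0, x1 = max(0, int(Xk[1]) - s), min(W, int(Xk[1]) + s + 1)
        dc = np.sum((feat[y0:y1, x0:x1] - Fk) ** 2, axis=-1)           # feature term
        ds = (ys[y0:y1, x0:x1] - Xk[0]) ** 2 + (xs[y0:y1, x0:x1] - Xk[1]) ** 2
        D = dc + ds * (m ** 2) / (s ** 2)                              # distance (1)
        win = D < best[y0:y1, x0:x1]
        best[y0:y1, x0:x1][win] = D[win]
        labels[y0:y1, x0:x1][win] = k
    return labels
```

In the full algorithm, the barycenters F_k and X_k would be recomputed from the new labels after each such pass.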

Although this method can accurately gather pixels having similar colors, the regularity parameter m is globally set and cannot adapt to all local image contours. It also largely fails to capture texture patterns, as it only considers feature information at the pixel level.

2.2 Local Adaptation of Superpixel Regularity

For most methods, including [achanta2012, chen2017, giraud2018_scalp], the regularity parameter must be manually set according to the dynamic of the feature term. Hence, default parameters for natural color images may lead model (1) to generate a highly irregular clustering on textures, and the post-processing step enforcing connectivity to irrelevantly merge regions (see Figure 1). We address this issue by using, for each superpixel S_k, a regularity parameter m_k defined according to the variance of the features of all its pixels, such that:

with a scaling parameter. Such a regularity term increases the spatial constraint in the TASP model (5) for superpixels having high feature variances, and reduces it in smooth areas, so that the superpixel boundaries can capture image objects that are perceptible from limited feature variations.
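The local adjustment can be illustrated by the following sketch. The linear scaling and the clipping bounds here are our assumptions for illustration, not the exact form of (2):

```python
import numpy as np

def adaptive_regularity(feat, labels, scale=1.0, m_min=0.1, m_max=10.0):
    """Per-superpixel regularity m_k from the feature variance (sketch).

    Superpixels with high feature variance (textured areas) get a larger
    spatial constraint; smooth areas get a smaller one. The linear
    scaling and the clipping bounds are illustrative assumptions.
    """
    m_k = {}
    for k in np.unique(labels):
        F = feat[labels == k].reshape(-1, feat.shape[-1])
        var = F.var(axis=0).sum()          # total feature variance in S_k
        m_k[k] = float(np.clip(scale * var, m_min, m_max))
    return m_k
```

Recomputing m_k at each clustering iteration lets the constraint follow the superpixel content as it evolves.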

This way, without manually adapting the scaling in (2), TASP can compute relevant superpixels on both highly textured images (see Figure 2(b)-top) and natural color ones (see Figure 2(b)-bottom), since (2) automatically adjusts the trade-off between the feature and spatial terms in (5). Nevertheless, the clustering accuracy still has to be improved to capture texture information.

Figure 2: SLIC [achanta2012] with optimal regularity for color images (a), vs. the TASP contributions: (b) with (2); (c) with (2) and (3); (d) with (2), (3), and (4). TASP accurately decomposes both texture and color images with the same parameters (d).

2.3 Texture Homogeneity Measure

2.3.1 Pixel to Superpixel Patch-based Distance

In this section, we propose a method to measure the texture similarity between a pixel neighborhood and the content of a superpixel, thus between two regions of different sizes. A texture descriptor at patch and superpixel levels would yield higher complexity and additional parameter settings. Moreover, unlike color and spatial information, texture cannot be preserved by a global average over the whole superpixel. The framework must keep a limited complexity, and be able to adapt to any image content without prior information. These constraints also prevent the use of costly dictionary- or learning-based approaches.

To address these issues, we propose a new framework using square patches to naturally capture texture information. For a pixel p with patch P_p, and a superpixel S_k, a nearest neighbor algorithm (see Section 2.3.2) is used to find similar patches centered within S_k, outside a given neighborhood around p (see Figure 3). The new term computes the average distance to the selected patches:


with the set of selected pixels compared using a patch distance in the feature space, normalized by the patch size.

Any feature can be used in the texture term (3). This way, we propose a general model that can easily evaluate the texture compliance of a pixel neighborhood to a superpixel, while removing the need for complex texture classification approaches.
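A brute-force version of this pixel-to-superpixel patch distance can be sketched as follows. The patch size, the number n of selected patches, and the eps exclusion radius are illustrative parameters of ours; the paper relies on an approximate NN search (Section 2.3.2) instead of this exhaustive loop:

```python
import numpy as np

def texture_distance(img, p, label_mask, patch=3, n=4, eps=2):
    """Pixel-to-superpixel patch distance (brute-force sketch).

    Compares the patch around pixel p to patches centered inside the
    superpixel (label_mask) but outside an eps-neighborhood of p, and
    averages the n smallest patch distances.
    """
    h = patch // 2
    H, W = img.shape[:2]

    def patch_at(y, x):
        return img[y - h:y + h + 1, x - h:x + h + 1].ravel()

    Pp = patch_at(*p)
    dists = []
    ys, xs = np.nonzero(label_mask)
    for y, x in zip(ys, xs):
        if h <= y < H - h and h <= x < W - h:
            if max(abs(y - p[0]), abs(x - p[1])) > eps:   # outside the eps-neighborhood
                d = np.mean((patch_at(y, x) - Pp) ** 2)   # patch distance, normalized by size
                dists.append(d)
    dists.sort()
    return float(np.mean(dists[:n])) if dists else np.inf
```

A pixel whose neighborhood matches the superpixel texture gets a small value; a pixel from a different texture gets a large one.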

2.3.2 Patch-based Nearest Neighbor Search

The search for similar patches can be performed by any nearest neighbor (NN) method. We choose PatchMatch, a fast iterative approximate-NN algorithm based on the propagation of good matches between adjacent neighbors [barnes2009]. The computation can be directly performed for all pixels in the search area around the superpixel barycenter. Since the algorithm is partly random, several patches in the superpixel can be selected in parallel for each pixel, to increase the robustness of the texture homogeneity term (3).
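The random-search component of such an approximate search can be sketched as follows. This is a deliberately simplified illustration of ours: it keeps the best of a few random draws and omits the propagation step, which is the key ingredient of full PatchMatch [barnes2009]:

```python
import numpy as np

def random_search_match(img, p, candidates, patch=3, iters=20, seed=0):
    """Approximate-NN patch search (random-search step only, sketch).

    candidates : (M, 2) array of valid patch centers inside the superpixel.
    Returns the best candidate position found and its patch distance.
    """
    rng = np.random.default_rng(seed)
    h = patch // 2
    Pp = img[p[0] - h:p[0] + h + 1, p[1] - h:p[1] + h + 1]
    best, best_d = None, np.inf
    for _ in range(iters):
        y, x = candidates[rng.integers(len(candidates))]
        d = np.mean((img[y - h:y + h + 1, x - h:x + h + 1] - Pp) ** 2)
        if d < best_d:
            best, best_d = (int(y), int(x)), d
    return best, float(best_d)
```

In PatchMatch proper, each pixel would also test the (shifted) matches of its already-processed neighbors, which makes good matches spread quickly across coherent regions.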

Figure 3: Selection of similar patches within a superpixel, outside a neighborhood of the considered patch, to compute the texture homogeneity term (3).

2.3.3 Texture Unicity within Superpixels

In the texture term (3), the patch similarity is computed regardless of any spatial information. Hence, a pixel to cluster may find similar local textures in restricted areas, leading a superpixel to potentially group several textures (Figure 2(c)).

To ensure the texture unicity within a superpixel, we consider in the texture term the spatial distance between the positions of the selected patches and the spatial barycenter of the superpixel, such that:


with a scaling function applied to this spatial distance. Such a term iteratively restricts the search area and the diversity of textures within the superpixel by highly penalizing similar patches found far from the barycenter. Hence, the barycenter is encouraged to move towards a homogeneous textured area and to remain contained within the superpixel (see Figure 2(d)). This also increases the shape regularity, which is a desirable property [giraud2017_gef].
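The spatial penalization can be illustrated by a simple weighting on each selected patch. The quadratic scaling below is an assumption of ours standing in for the paper's scaling function in (4):

```python
def unicity_weight(match_pos, barycenter, s):
    """Spatial penalty on a selected patch (illustrative form).

    Patches found far from the superpixel barycenter are penalized,
    which discourages a superpixel from gathering several distinct
    textures. The quadratic form is an illustrative assumption.
    """
    dy = match_pos[0] - barycenter[0]
    dx = match_pos[1] - barycenter[1]
    return 1.0 + (dy * dy + dx * dx) / (s * s)
```

Multiplying each patch distance in (3) by such a weight makes far-away matches contribute more, so the clustering prefers pixels whose similar patches lie near the barycenter.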

The clustering distance in TASP is finally computed as:


This way, TASP becomes a very general method, able to handle textures, and efficient on various image types with the exact same parameters, as demonstrated in the next section.
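The final combination can be sketched as follows, assuming the spatial term is normalized as in the base model (1); the exact normalization of (5) is not reproduced here:

```python
def tasp_distance(d_c, d_T, d_s, m_k, s):
    """Final TASP clustering distance (sketch of the combination (5)).

    Combines the color term d_c, the texture homogeneity term d_T
    from (3)-(4), and the spatial term d_s, weighted by the
    per-superpixel regularity m_k from (2). The m_k**2 / s**2
    normalization mirrors the base model and is an assumption here.
    """
    return d_c + d_T + d_s * (m_k ** 2) / (s ** 2)
```

A pixel is then assigned, as in Section 2.1, to the superpixel minimizing this distance over the candidates whose search window contains it.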

Figure 4: Comparison between TASP and the most recent state-of-the-art methods (initial image, LSC [chen2017], SNIC [achanta2017superpixels], SCALP [giraud2018_scalp], TASP). TASP produces a more relevant result on a natural texture composite image (top) and on a natural color image example from the BSD (bottom), for superpixels.

3 Results

3.1 Validation Framework

Similarly to [randen1999filtering], we create two new datasets to evaluate texture segmentation performance. A highly challenging synthetic stripe dataset (mix-Stripes) of 10 images of size pixels is created by putting stripes similar to the ones in Figure 1 in variably shaped regions of minimum size pixels. Natural textures with normalized intensity are also taken from the Brodatz dataset [brodatz1966] to create 100 composite images (mix-Brodatz), which can contain up to different textures. Finally, we consider the standard Berkeley Segmentation Dataset (BSD) [martin2001], containing natural color test images of size pixels.

Most parameters are empirically set once and for all, and their tuning has a moderate impact on performance. For the patch search, patches of size pixels are selected outside a neighborhood. In the clustering model, is set to and to in (2). Color features are computed as in [giraud2018_scalp]. Finally, the whole clustering process of TASP is performed in iterations as in [achanta2012].

TASP is compared to the recent state-of-the-art methods SLIC [achanta2012], ERGC [buyssens2014], ETPS [yao2015], LSC [chen2017], SNIC [achanta2017superpixels], and SCALP [giraud2018_scalp]. Performance is evaluated with the standard Achievable Segmentation Accuracy (ASA) and the contour detection F-measure (F), as defined in [giraud2017_gef], and we report quantitative results for an average number of superpixels.
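ASA is a standard upper bound on the accuracy of any region labeling built on top of the superpixels; a minimal implementation:

```python
import numpy as np

def asa(sp_labels, gt_labels):
    """Achievable Segmentation Accuracy (ASA).

    For each superpixel, counts the pixels of its best-overlapping
    ground-truth region; the sum over all superpixels, normalized by
    the image size, upper-bounds the accuracy of any labeling of the
    superpixels.
    """
    total = 0
    for k in np.unique(sp_labels):
        gt_in_k = gt_labels[sp_labels == k]
        # best achievable assignment of superpixel k to one GT region
        total += np.bincount(gt_in_k).max()
    return total / sp_labels.size
```

A perfect decomposition reaches ASA = 1; superpixels straddling ground-truth boundaries lower the score.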

3.2 Influence of Contributions

The visual impact of the contributions is shown in Figure 2, and the ASA and F measures are reported in Table 1 on the three considered datasets. Our new texture homogeneity term (3) and the constraint of texture unicity within superpixels (4) both significantly improve the performance on each data type.

mix-Stripes mix-Brodatz BSD
TASP w/o (3),(4)
TASP w/o (4)
Table 1: Influence of contributions in the TASP method.

3.3 Comparison to the State-of-the-Art Methods

TASP is compared to state-of-the-art approaches on an image similar to the mix-Stripes dataset in Figure 1, and to the most recent methods on mix-Brodatz and BSD images in Figure 4.

A quantitative evaluation is also performed on the three datasets in Table 2. TASP significantly increases the performance on the synthetic (mix-Stripes) and natural (mix-Brodatz) texture datasets, demonstrating its ability to provide texture-aware superpixels. TASP also obtains the best results on natural color images (BSD) using the same parameters, while other methods fail to provide accurate results on the three data types at the same time. Note that the compared methods are used with the parameters recommended by their authors. Nevertheless, no other approach explicitly captures texture information, so TASP with default parameters still outperforms state-of-the-art methods manually optimized for each dataset.

mix-Stripes mix-Brodatz BSD
SLIC [achanta2012]
ERGC [buyssens2014]
ETPS [yao2015]
LSC [chen2017]
SNIC [achanta2017superpixels]
SCALP [giraud2018_scalp]
Table 2: TASP compared to the state-of-the-art methods. Best and second results are respectively bold and underlined.

4 Conclusion

In this paper, we address the severe non-robustness of superpixel approaches to texture images by proposing a texture-aware decomposition method. A new patch-based framework is introduced to gather pixels having both similar color and local textural properties. The proposed method is general, removes the need for a manual setting of the regularity constraint, and naturally extends to the generation of supervoxels.

We outperform the segmentation accuracy of state-of-the-art methods on color, synthetic texture, and natural texture datasets, using the same parameters. This work opens the way to a larger use of superpixels, and to the efficient application of our approach to medical image segmentation or video object tracking.