I Introduction
Image segmentation is a useful tool to analyze the image content. The goal of segmentation is to decompose the image into meaningful segments, for instance, to separate objects from the background. A segmentation is computed with respect to some priors such as shape, color or texture. To reduce the computational cost, superpixel decomposition methods have been developed for grouping pixels into homogeneous regions, while respecting the image contours (for instance see [1] and references therein). Superpixels are able to drastically decrease the number of elements to process while keeping all the geometrical information that is lost with multiresolution approaches. Small objects disappear at low resolution levels, whereas they can still be represented with one or several superpixels. Nevertheless, superpixels remain underexploited due to their irregular decomposition of the image content.
Many image processing and computer vision methods use reference images. For instance, for labeling applications, these images can be provided with their ground truth segmentation, labels, or semantic information, which are used to process the input image. In this context, matching algorithms can be useful to find associations between the considered elements. In most frameworks, patch-based approximate nearest neighbor (ANN) search methods are used to find correspondences. Numerous methods have been proposed to find ANN [2, 3, 4, 5] within the same image, and between an image and one or several reference ones. Among these methods, the PatchMatch (PM) method [2] was designed to compute correspondences between pixel-based patches.
When applying PM to large images, or when looking for ANN in a database, the search for good ANN may require many iterations. Therefore, multiresolution PM [6] can be considered to initialize the ANN correspondence map. However, as usually observed with such coarse-to-fine frameworks, details are lost and a poor ANN is estimated for small-scale patterns. A regular decomposition of the image could decrease the problem dimension, but it would not respect the object contours, leading to inaccurate processing. In this context, superpixels are of interest because they preserve the image geometry and respect the object contours. Local superpixel-based matching models have been proposed for many applications, e.g., video tracking [7, 8]. However, superpixel-based ANN search algorithms have been little investigated in the literature, and recent works such as [9, 10], which compute superpixel correspondences between the decompositions of two images, use complex models with prohibitive reported computational times.
Finally, for ANN matching, the neighborhood information greatly helps in finding good correspondences, as demonstrated in the patch-based literature. Therefore, to jointly decrease the number of elements to process, keep the geometrical information, and find accurate matches, it appears necessary to consider superpixels and to describe them using their neighborhoods in a structure that includes spatial information. Nevertheless, the lack of regularity between two superpixel decompositions makes it difficult to use the neighborhood for computing relevant correspondences. Some attempts to use superpixel neighborhood information have been proposed [11, 12]. However, these methods are not adapted to the search for ANN, since they perform a regularization on a graph built from superpixel neighbors but do not include the relative spatial information between superpixels in a dedicated structure.
I-A Contributions
In this paper, we propose a novel structure of superpixel neighborhood called SuperPatch. Since the superpixel neighborhoods of two superpatches are not necessarily the same (in terms of shape or number of elements), a generic framework for comparing superpatches is introduced. A novel method, called SuperPatchMatch (SPM), that generalizes the PM algorithm [2], is proposed to perform fast and accurate searches of ANN superpatches within images.
To the best of our knowledge, the specific combination of PM with superpixels has only been proposed in [14, 15], which match single superpixels using moves similar to PM. For instance in [14], the superpixel features are precomputed using a learned distance metric, while the reported labeling results do not reach those of state-of-the-art methods. In [16], a more restricted framework is considered for optical flow estimation: PM is used to refine the results within selected superpixel bounding boxes. The purpose of our work is thus completely different since we compare neighborhoods of structures defined on irregular image subdomains.
To emphasize the interest of our method, we propose a framework to perform fast segmentation and labeling from an image database. SuperPatchMatch is well adapted to deal with huge and constantly growing databases since no learning phase is required, contrary to most existing approaches based on supervised machine learning [17, 18], or recent neural network methods [19, 20]. We apply SuperPatchMatch to the challenging Labeled Faces in the Wild (LFW) database [21], where the goal is to extract hair, face, and background within images decomposed into superpixels, and to the segmentation of tumors on non-registered Magnetic Resonance Images (MRI). Finally, SuperPatchMatch outperforms state-of-the-art methods in terms of computational cost and accuracy.
Fig. 1 and 2 consider several experiments to demonstrate that superpatches make it possible to find more reliable superpixel ANN than the ones obtained with single superpixel matching [14]. In Fig. 1, two decompositions are computed on the same image using [1] and [13] (Fig. 1(a) and (b)). The aim is to find the best superpixel match between (a) and (b) in terms of superpixel feature (here, a norm on normalized color histograms in RGB space). We display the displacement magnitude of matches with optical flow representation (Fig. 1(h)). When matching only superpixels, as in [14], many outliers are obtained (Fig. 1(c)), while the matching of superpatches provides very accurate ANN (Fig. 1(d)). The same experiment on a sheared image decomposed with [1] (Fig. 1(e)) provides a uniform displacement (Fig. 1(g)) that indicates relevant superpatch matching, and robustness of the proposed structure to geometrical deformations. In Fig. 2, two different parts of two close textures are combined in Fig. 2(a) and (b), and wrongly matched textures are shown as red superpixels in Fig. 2(c) and (d). Finally, in Fig. 2, we show that for a natural image containing texture (Fig. 2(e) and (f)), the combination of color and texture features (histogram of oriented gradients [22]) can provide more accurate matching (Fig. 2(h)).
I-B Outline
In this paper, we first present related works in Section II. Then, we define the new superpatch structure and a comparison framework between superpatches in Section III. The SuperPatchMatch algorithm is next designed to perform superpixel-based ANN search in Section IV. We further emphasize the interest of our method by proposing in Section V a framework to perform labeling from an image database. Finally, we present experiments on face labeling and segmentation of medical images, in which SuperPatchMatch outperforms state-of-the-art methods.
II Related Works
II-A Superpixel Methods
Superpixel decomposition approaches try to group the pixels of an image into meaningful homogeneous regions. They were progressively introduced, for instance, from watershed [23] to Quick shift [24] approaches. In recent years, most decomposition methods have started from an initial regular grid and refined the superpixel boundaries by computing a trade-off between color distance and superpixel shape regularity, e.g., [1, 25]. Recently, works such as [26, 27] have proposed to use gradient and contour information in the process to further increase the superpixel decomposition accuracy with respect to the image content. Finally, the computational cost matters since superpixels are mainly used as preprocessing, and recent implementations report real-time performance, e.g., [28].
By considering features at the superpixel scale, the computational complexity of computer vision and image processing tasks can be drastically reduced, while still considering the image geometry and content. Superpixels have therefore become key building blocks of many recent image processing and computer vision pipelines such as multiclass object segmentation [29, 30, 31, 32], body model estimation [33], face and hair labeling [18], data associations across views [12], object localization [34] or contour detection [35]. With these considerations, we propose in this work to use the superpixel representation as the basis of our framework.
II-B Including Spatial Information within Image Features
Pixel-based patches make it possible to describe the pixel neighborhood and to find similar patterns with the same geometric structure. They have progressively proven their efficiency for several applications such as texture synthesis [36] and image denoising [37], and in the design of computer vision descriptors [38, 39] that include spatial information.
Recent works in object retrieval have demonstrated that describing the objects with spatial information yields higher detection accuracy. In [40, 41], Force-Histogram Decomposition descriptors are used to encode the pairwise spatial relations between objects. Deformable part models [42, 43] or adaptive bounding boxes of poselets [44] have also been successfully applied to image retrieval, segmentation or recognition. Finally, the necessity for including spatial information is also studied in [45], which investigates fuzzy approaches to define spatial relationships.
The superpixel itself is not sufficient to provide a robust image descriptor, since the consistency of its neighborhood is not considered. The superpixel neighborhood has been used in [11] for saliency detection based on energy minimization. For each superpixel, the first two adjacent neighbor rings are used in a regularization term. However, the superpixel features are separately included in a data term, leading to a lack of spatial information consistency. The approach is thus dependent on the superpixel decomposition and not robust to very irregular decompositions. Consequently, we propose to go further in this work and to take advantage of the superpixel neighborhood to construct a novel representation, namely the superpatch, that naturally includes spatial information.
II-C Patch Matching Methods
Patch-based methods have demonstrated state-of-the-art results over various computer vision and image processing applications such as texture synthesis [36], denoising [37] [46]. These approaches rely on the search for ANN, i.e., similar patches. Many methods were proposed to find ANN within the image itself, between two images or in an entire database [2, 3, 4, 5]. When facing huge databases, dimension reduction methods are usually considered to speed up the computation of ANN, but they depend on the size of the data. In this context, the PatchMatch (PM) algorithm [2] is an efficient tool to compute ANN. Within an image itself, the ANN found enable several processing tasks such as image retargeting or completion [2]. Nevertheless, PM can also find matches between several images, and easily handles large databases, since its complexity only depends on the size of the image to process, as shown in [47, 48] where the ANN are used for exemplar-based segmentation of 3D medical images.
In this work, we introduce the SuperPatchMatch method (SPM), which combines the advantages of the PM algorithm and of the superpixel decomposition of an image, to compute robust correspondences of superpixels using superpatches. The proposed superpatch structure makes it possible to match similar patterns at the superpixel level since it considers the geometrical information between the contained superpixels, which are described by image features such as color or texture.
III Superpatch
III-A Superpatch Definition
Similarly to a patch of pixels, a superpatch is a patch (a set) of neighboring superpixels. Let be an image, decomposed by any superpixel decomposition method, into superpixels such that , where denotes the cardinality, and for two superpixels , . A superpatch is centered on a superpixel and is composed of its neighboring superpixels such that: , with the spatial barycenter of the pixels contained in . In other words, the superpatch centered on a superpixel is defined by considering all superpixels within a fixed radius . Note that each superpatch contains at least the superpixel . Fig. 3 illustrates the superpatch definition.
For the sake of clarity, we denote = , the index set of superpixels . Each superpixel is described by a set of features . These features can be, for instance, the coordinates of , the mean color, or any superpixel descriptors that can be found in the literature.
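The superpatch construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `barycenters` and `superpatches`, the inclusive comparison with the radius, and the toy label map are all assumptions made for the example.

```python
import numpy as np

def barycenters(labels):
    """Spatial barycenter (row, col) of every superpixel in an integer label map.

    Assumes labels are 0..n-1 and that every label occurs at least once.
    """
    n = int(labels.max()) + 1
    rows, cols = np.indices(labels.shape)
    flat = labels.ravel()
    counts = np.bincount(flat, minlength=n)
    cy = np.bincount(flat, weights=rows.ravel(), minlength=n) / counts
    cx = np.bincount(flat, weights=cols.ravel(), minlength=n) / counts
    return np.stack([cy, cx], axis=1)

def superpatches(centers, radius):
    """For each superpixel i, indices of the superpixels whose barycenter
    lies within `radius` of the barycenter of i (inclusive: an assumption)."""
    diff = centers[:, None, :] - centers[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    return [np.flatnonzero(row <= radius) for row in dist]

# Toy decomposition: a 4x4 image split into four 2x2 superpixels.
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [2, 2, 3, 3],
                   [2, 2, 3, 3]])
centers = barycenters(labels)
patches = superpatches(centers, radius=2.0)
```

Each list entry always contains its own central superpixel, since the distance of a barycenter to itself is zero.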
III-B Superpatch Comparison Framework
The comparison between two regular square patches is commonly performed using the sum of squared differences (SSD), computed in a scan order. When considering two superpatches, their number of elements and geometry are generally different, which makes their comparison difficult. In the following, we consider two superpatches and , in different images and . We propose to first register the relative positions of all superpixels within the superpatches. To overlap two superpatches, all positions of superpixels are registered with the vector , where and are the spatial barycenters of and , respectively. Contrary to the classical pixel setting, the number of elements and geometry of two superpatches are likely to differ since their construction depends on the initial superpixel decomposition. Therefore, a registered superpixel can overlap with several superpixels , and this information has to be considered.
To compute a distance between irregular structures, such as superpixels, [12] proposes to use the edit distance. However, such a distance computes a one-to-one matching between the structure elements and cannot accurately deal with the overlap of superpixels, which requires a one-to-many mapping. Another limitation is that it mixes two different kinds of information: superpixel similarities and the cost of removing or adding superpixels. Therefore, this distance should be carefully tuned with respect to the considered application. Consequently, to define a relevant metric between superpatches, it is necessary to consider the geometry of the superpatches within the distance. We propose to define the symmetric distance between two superpatches and as:
(1) 
where is the Euclidean distance between the superpixel features and , for instance, average superpixel color or normalized cumulative color histogram, and is a weight depending on the relative position of the superpixels and . Note that we consider a Euclidean distance, but any distance on superpixel features can be computed with .
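To make the structure of such a distance concrete, the sketch below computes a weighted average of pairwise feature distances between the superpixels of two superpatches. It is only one plausible reading of the description: the normalization by the sum of weights and the use of the squared Euclidean distance (chosen so that the pixel limit case yields an SSD) are assumptions, and `superpatch_distance` is a hypothetical name.

```python
import numpy as np

def superpatch_distance(feat_a, feat_b, weights):
    """Weighted average of pairwise feature distances (illustrative sketch).

    feat_a:  (n_a, f) features of the superpixels of the first superpatch
    feat_b:  (n_b, f) features of the superpixels of the second superpatch
    weights: (n_a, n_b) non-negative spatial weights between superpixel pairs
    """
    diff = feat_a[:, None, :] - feat_b[None, :, :]
    pairwise = np.sum(diff ** 2, axis=2)   # squared Euclidean (assumption)
    total = weights.sum()
    if total == 0:
        return float("inf")                # no overlapping pair at all
    return float((weights * pairwise).sum() / total)

feats = np.array([[0.0, 0.0], [1.0, 1.0]])
d_identity = superpatch_distance(feats, feats, np.eye(2))       # one-to-one pairs only
d_uniform = superpatch_distance(feats, feats, np.ones((2, 2)))  # every pair contributes
```

With identity weights only corresponding superpixels are compared, whereas uniform weights let every pair contribute, which illustrates the one-to-many behavior discussed above.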
III-B1 Fast distance between superpixels
To compute the weight between two superpixels and , we would ideally like to measure their relative overlapping area, i.e., setting . Nevertheless, this computation requires the expensive count of overlapping pixels that cancels the computational advantage of the superpixel representation. A fast method would be to compare a superpixel to the spatially closest , but we propose a more robust framework that considers a spatial distance between the superpixel barycenters. We define the symmetric spatial weight between two superpixels and as:
(2) 
where is the relative distance between and , weights the influence of according to its spatial distance to such that , and and are two scaling parameters. The setting of depends on the superpixel decomposition scale. Since the superpatches have been registered, and the aim is to compare a superpixel to the closest ones , can be set to half the average superpixel size, i.e., half the average distance between superpixel barycenters, such that , for an image of size pixels decomposed into superpixels. Finally, depends on the superpatch size and can be set to to weight the contribution of closest superpixels.
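A barycenter-based weight of this kind can be sketched as below. The Gaussian kernels are an assumption made for illustration, as are the names `spatial_weights`, `default_sigma1`, `sigma1` and `sigma2`; only the idea of combining a registered barycenter distance with a per-superpixel weight relative to the superpatch center, and the "half the average superpixel size" rule, come from the text.

```python
import numpy as np

def spatial_weights(pos_a, pos_b, center_a, center_b, sigma1, sigma2):
    """Symmetric spatial weights between the superpixels of two superpatches.

    pos_a, pos_b:       (n_a, 2) and (n_b, 2) superpixel barycenters
    center_a, center_b: barycenters of the two central superpixels
    """
    # Registration: express every barycenter relative to its superpatch center.
    rel_a = pos_a - center_a
    rel_b = pos_b - center_b
    gap = rel_a[:, None, :] - rel_b[None, :, :]
    # Assumed Gaussian kernel on the registered distance between barycenters.
    overlap = np.exp(-np.sum(gap ** 2, axis=2) / sigma1 ** 2)
    # Assumed Gaussian kernel on the distance of each superpixel to its center.
    w_a = np.exp(-np.sum(rel_a ** 2, axis=1) / sigma2 ** 2)
    w_b = np.exp(-np.sum(rel_b ** 2, axis=1) / sigma2 ** 2)
    return overlap * w_a[:, None] * w_b[None, :]

def default_sigma1(height, width, n_superpixels):
    """Half the average superpixel size, taken as sqrt(area / count)."""
    return 0.5 * np.sqrt(height * width / n_superpixels)

pos_a = np.array([[0.0, 0.0], [0.0, 3.0], [3.0, 0.0]])
pos_b = np.array([[10.0, 10.0], [10.0, 12.0]])
w_ab = spatial_weights(pos_a, pos_b, pos_a[0], pos_b[0], sigma1=2.0, sigma2=4.0)
w_ba = spatial_weights(pos_b, pos_a, pos_b[0], pos_a[0], sigma1=2.0, sigma2=4.0)
```

The two central superpixels always receive the maximal weight, and swapping the two superpatches transposes the weight matrix, which is what makes the resulting distance symmetric.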
III-B2 Generalization of pixel-based patches
In the limit case where each superpixel only contains one pixel, i.e., =, =, and have the same regular structure and the same number of elements. With and in (2), if and otherwise, and the proposed distance (1) is a generalization of the distance between patches, since it reduces to a normalized standard SSD between two pixel-based patches:
(3) 
where when is true and otherwise.
IV SuperPatchMatch
IV-A The SuperPatchMatch Algorithm
We propose the SuperPatchMatch method (SPM), an extension of the PatchMatch (PM) algorithm [2] dedicated to our superpatch framework, for fast matching of irregular structures from superpixel decompositions. In this section, only the direct adjacency relationship between superpixels needs to be considered to design our algorithm. Nevertheless, as for pixels described by regular patches, we demonstrate in Section V that the proposed framework is significantly more efficient using superpatches. In the following, as in Section III-B, we illustrate the proposed method by considering two superpatches and , in different images and , but our approach can be applied to an entire database, as demonstrated in Section V.
PM is a method that computes pixel-based patch correspondences between two images. The key point of this method is that good correspondences can be propagated to the adjacent patches within an image. The algorithm has three steps: initialization, propagation and random search. The initialization consists in randomly associating each patch of the image with a patch of the image , leading to an initial ANN field. The following steps are then iteratively performed to improve the correspondences. The propagation uses the assumption that when a patch in corresponds to a patch in , then the adjacent patches in should also match the adjacent patches in . The random search consists in a sampling around the current ANN to escape from possible local minima.
The lack of regular geometry between superpatches is the main issue for adapting the PM algorithm. The notion of adjacent patches has to be defined for an irregular superpixel decomposition. For the sake of clarity, the current best ANN of a superpixel , is denoted as , with the ANN map which stores, for superpixels in , the index in of their corresponding ANN.
IV-A1 SPM initialization step
For each superpixel , we assign a random superpixel . Fig. 4 shows initialization examples. After this step, propagation and random search are iteratively performed to improve the initial matches.
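The random initialization can be sketched in a few lines. The array-based ANN map and the name `init_ann` are assumptions made for this example; the map simply stores, for each superpixel of the first image, the index of a superpixel of the second one.

```python
import numpy as np

def init_ann(n_a, n_b, seed=None):
    """Random ANN map: entry i holds the index, in the second image,
    of the superpixel initially matched to superpixel i of the first."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, n_b, size=n_a)

ann = init_ann(n_a=10, n_b=4, seed=0)
```

Propagation and random search then only ever replace an entry of this map when a candidate yields a smaller superpatch distance.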
IV-A2 SPM propagation step
In [2], the propagation step tries to improve the current ANN by using the ones of the directly adjacent neighbors in . Pixels are processed according to a scan order, e.g., from top-left to bottom-right. The propagation only considers previously processed and directly adjacent neighbors, e.g., top and left. Their ANN are shifted to respect the relative position of pixels in , providing ANN candidates to the currently processed pixel. With superpixels, the selection of top, left, bottom and right adjacent superpixels is not direct, since there is no regular geometry between them. We propose to define the superpixel scan order from the raw pixel order on (left to right, top to bottom), and to consider in the propagation step all adjacent superpixels that were processed during the current iteration. The selection of candidates from adjacent neighbors is illustrated in Fig. 5.
When an adjacent superpixel and its ANN are considered, a neighbor of is selected as a candidate to improve the correspondence of . However, the ANN cannot be shifted as done for regular patches, since the superpixels are defined on irregular domains. Therefore, to improve the ANN of , one particular neighbor of , denoted as is tested. It is given by the superpixel whose relative position to is the most similar to , the angle between and . Hence, the ANN candidate to test is obtained as:
(4) 
with the angle between and its set of adjacent superpixels . Note that all angles are computed from superpixel barycenters. The selection of the candidate for , which is in the top-left position of , is illustrated in Fig. 6. To keep the same relative position, the new superpixel to test as ANN for is the neighbor of that is the closest to its bottom-right position, according to the angle .
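The angle-based candidate selection can be sketched as follows. The data layout (barycenter arrays, an adjacency dictionary, an ANN list) and all function names are assumptions made for this illustration; the only element taken from the text is the rule of picking the neighbor of the propagated match whose direction best agrees with the direction from the adjacent superpixel to the current one.

```python
import numpy as np

def angle(src, dst):
    """Angle (radians) of the vector going from barycenter src to barycenter dst."""
    d = np.asarray(dst, dtype=float) - np.asarray(src, dtype=float)
    return np.arctan2(d[0], d[1])

def angular_gap(a, b):
    """Absolute difference between two angles, wrapped to [0, pi]."""
    return abs((a - b + np.pi) % (2 * np.pi) - np.pi)

def propagation_candidate(i, i_adj, ann, centers_a, centers_b, adjacency_b):
    """Candidate match for superpixel i, propagated from its neighbor i_adj."""
    theta = angle(centers_a[i_adj], centers_a[i])   # direction i_adj -> i
    j = ann[i_adj]                                  # current match of i_adj
    neighbors = adjacency_b[j]
    gaps = [angular_gap(theta, angle(centers_b[j], centers_b[k]))
            for k in neighbors]
    return neighbors[int(np.argmin(gaps))]

# Superpixel 0 lies to the top-left of superpixel 1 in the first image.
centers_a = np.array([[0.0, 0.0], [1.0, 1.0]])
# In the second image, superpixel 0 has a right, a bottom-right and a top-left neighbor.
centers_b = np.array([[5.0, 5.0], [5.0, 7.0], [7.0, 7.0], [3.0, 3.0]])
adjacency_b = {0: [1, 2, 3]}
ann = [0, 3]
candidate = propagation_candidate(1, 0, ann, centers_a, centers_b, adjacency_b)
```

In this toy configuration the bottom-right neighbor of the propagated match is selected, mirroring the top-left to bottom-right relation in the first image.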
IV-A3 SPM random search step
The random search step consists in a sampling around the current ANN to escape from possible local minima [2]. Candidates are selected at an exponentially decreasing distance from the barycenter of the best current match. A random pixel position is computed within decaying boxes, and the superpixel containing this pixel is the candidate to test. Fig. 7 illustrates the random search step, where the boxes are depicted in dotted lines.
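A possible form of this sampling is sketched below, assuming the compared image is available as an integer superpixel label map. The function name, the halving factor `decay=0.5` and the initial box size (the largest image dimension) are assumptions borrowed from common PatchMatch practice rather than values stated in the text.

```python
import numpy as np

def random_search_candidates(center, labels_b, rng, decay=0.5):
    """Superpixel candidates drawn in boxes of decaying half-size around `center`.

    center:   (row, col) barycenter of the current best match
    labels_b: integer label map of the compared image
    """
    h, w = labels_b.shape
    half = float(max(h, w))            # assumed initial box half-size
    candidates = []
    while half >= 1.0:
        # Clip the box to the image domain.
        r0 = max(0, int(np.floor(center[0] - half)))
        r1 = min(h - 1, int(np.ceil(center[0] + half)))
        c0 = max(0, int(np.floor(center[1] - half)))
        c1 = min(w - 1, int(np.ceil(center[1] + half)))
        # Draw a random pixel; the superpixel containing it is the candidate.
        r = rng.integers(r0, r1 + 1)
        c = rng.integers(c0, c1 + 1)
        candidates.append(int(labels_b[r, c]))
        half *= decay
    return candidates

# 8x8 label map made of four 4x4 superpixels.
labels_b = np.repeat(np.repeat(np.arange(4).reshape(2, 2), 4, axis=0), 4, axis=1)
cands = random_search_candidates((4.0, 4.0), labels_b, np.random.default_rng(0))
```

Each candidate would then be compared with the current match using the superpatch distance, and kept only if it improves it.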
IV-B Library of Training Images
One advantage of the SPM algorithm is that its complexity only depends on the size of the image to process and not on the size of the compared image database. This enables SPM to perform fast ANN searches within a large database with no increase in computational time.
All example images within the database are grouped into a single library . In the case of a training database, SPM steps are adapted so the ANN can be found within all images. The initialization is extended: the ANN is randomly selected within . The propagation step still tests the shifted ANN of the neighbors, that are not necessarily in the same training image. Finally, the random search is performed within the current best image, and within a random image in , as in [47].
IV-C Multiple SPM
Contrary to PM, that only estimates one ANN, SPM computes ANN matches in the library , since the diversity of information from multiple ANN may help to perform more accurate processing. In the literature, an extension of the original PM algorithm to the ANN case has been proposed in [6]. The suggested strategy is to build a constantly updated data structure of the best visited correspondences. However, to parallelize such an approach, the current test image must be split into several parts which leads to boundary issues. Therefore, we chose to implement the ANN search by fully independent SPM, leading to a simpler scheme.
V Application to Image Labeling
To demonstrate the interest of the superpatch structure and the SPM algorithm, we adapt our approach to exemplar-based labeling. We consider two experiments: face labeling on the LFW dataset [21], and segmentation and labeling on non-registered medical images from the BRATS dataset [49].
V-A Label Fusion Method
The proposed algorithm is particularly interesting for labeling applications. The superpixel decomposition segments the image into homogeneous regions that try to respect existing contours, and SPM finds superpixel-based correspondences whose labels can be transferred. In this application, a library of training images with their label ground truths is considered, and SPM provides ANN matches. We denote as the label of the training superpixel contained in . The labels of the selected ANN within are merged by a patch-based label fusion [50], inspired from [37].
At the end of the ANN search, ANN are estimated for all superpixels in the test image . To obtain the final labeling, for a superpixel and the set of its ANN matches with label , its label fusion map is defined by:
(5) 
where is the weight contributing to label , and depends on the similarity between the superpatch , and the ANN superpatch . This label map gives the probability of assigning the label to the superpixel .
Some applications can also deal with registered images, where structures of interest between and images of are spatially close. Therefore, good superpatch matches should not be spatially too far in the image domain. In this case, to enforce the spatial coherency of the selected ANN, each labeling contribution is weighted by the spatial distance between the central superpixel barycenters and :
(6) 
where , with , and and are scaling parameters. With the function , the distance of the current contribution is divided by the minimal distance among all ANN contributions. For each superpixel , the final labeling map is obtained with the label of highest probability:
(7) 
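The fusion and decision steps can be sketched for a single superpixel as below. The exponential kernel on the distance ratio is an assumption made for illustration (the text only states that each distance is divided by the minimal distance among all ANN contributions), the spatial term used for registered images is omitted, and `fuse_labels` is a hypothetical name.

```python
import numpy as np

def fuse_labels(distances, labels, n_labels, eps=1e-8):
    """Label probabilities of one superpixel from its ANN matches.

    distances: superpatch distances to each ANN match
    labels:    integer label of each ANN match
    """
    distances = np.asarray(distances, dtype=float)
    labels = np.asarray(labels)
    # Assumed kernel: distances are normalized by the smallest one.
    weights = np.exp(1.0 - distances / (distances.min() + eps))
    votes = np.bincount(labels, weights=weights, minlength=n_labels)
    return votes / votes.sum()

probs = fuse_labels(distances=[1.0, 1.0, 2.0], labels=[0, 0, 1], n_labels=3)
decision = int(np.argmax(probs))   # label of highest probability
```

Labels that are never proposed by any match receive a zero probability, and the final decision is the label with the largest accumulated weight.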
The relation (7) gives a superpixel-wise decision that may have some irregularities. As in [11], we can use (5) as a multi-label data term and consider the following regularization problem, that consists in minimizing the energy , defined on the graph built from adjacent superpixels:
(8) 
where is a regularization parameter, the data term is close to (respectively ) when the probability of label is high (respectively low), and when and otherwise.
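The structure of such an energy can be illustrated by evaluating it for a given labeling. The sketch assumes a data term equal to one minus the label probability and a pairwise term that counts adjacent superpixels with different labels; both choices, and the name `labeling_energy`, are assumptions consistent with the description rather than the exact published formula. Minimizing the energy requires a dedicated solver, which is not shown here.

```python
import numpy as np

def labeling_energy(assignment, probs, edges, lam):
    """Energy of a labeling on the superpixel adjacency graph.

    assignment: label index chosen for each superpixel
    probs:      (n_superpixels, n_labels) label probabilities
    edges:      list of (i, j) pairs of adjacent superpixels
    lam:        regularization parameter
    """
    assignment = np.asarray(assignment)
    # Data term: low when the chosen label is probable (assumed 1 - probability).
    data = np.sum(1.0 - probs[np.arange(len(assignment)), assignment])
    # Pairwise term: one unit of cost per pair of neighbors with different labels.
    smooth = sum(int(assignment[i] != assignment[j]) for i, j in edges)
    return float(data + lam * smooth)

probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])
edges = [(0, 1), (1, 2)]
e_argmax = labeling_energy([0, 0, 1], probs, edges, lam=0.5)
e_other = labeling_energy([0, 1, 1], probs, edges, lam=0.5)
```

Comparing the two values shows how the data and smoothness terms trade off for competing labelings of the same graph.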
V-B Face Labeling Experiments
Face segmentation and labeling are challenging tasks due to several issues such as the diversity of hair styles, backgrounds, skin colors, or occlusions. We evaluate the proposed SPM approach for face labeling on the funneled version of the Labeled Faces in the Wild (LFW) dataset [21]. The dataset contains 2927 images of size pixels, that have been coarsely aligned [51], and segmented into 225 to 250 superpixels. LFW is a widely used database for validating new methods based on superpixels since it contains decompositions with associated superpixel-wise ground truths, and comparisons with state-of-the-art methods are not biased by the ground truth superpixel decomposition one would have to compute.
V-B1 Parameter settings
SPM was implemented with MATLAB using C-MEX code. Our experiments are performed on a standard Linux server with 16 cores at 2.6 GHz and 100 GB of RAM. To compare to [18, 19], we use the same 1500 training images, and the same 927 images for testing. Nevertheless, we could use all images in a leave-one-out procedure since our method does not need any training step.
The number of SPM iterations is set to 5, as in [2]. We only use a norm between histograms of oriented gradients (HoG) [22] as distance in (1). In Eq. (2), since the images are pixels, and decomposed into approximately superpixels, is set to . In Eq. (6), parameters and are respectively set to 2 and 4. Finally, we set to 0.5 and use the expansion algorithm [52] to minimize (8). The reported times for SPM in Fig. 8(b) include ANN searches, label fusion and the complete labeling with regularization.
V-B2 Influence of the superpatch size
We first investigate the influence of the superpatch size and number of ANN. Fig. 8 shows the superpixel-wise labeling accuracy and computational time. The labeling accuracy is increased with our superpatch structure. Best results are obtained with = pixels ( with = ANN). Such a superpatch size corresponds on average to the capture of the three neighboring rings of superpixels, since superpixels are approximately of size pixels. Fig. 8 also shows the corresponding ROC curves obtained with = ANN for the three classes (face, background, hair). Without the superpatch structure, i.e., only computing the distance on central superpixels (= pixels), worse ANN are found, decreasing the labeling accuracy ( with = ANN). The superpatch size must be large enough to capture the information contained within the superpixel neighborhood. However, with too large superpatches, i.e., pixels, too many neighboring superpixels contribute, leading to less relevant ANN and less accurate labeling. Note that we propose in (6) a slight improvement of the label fusion step to take into account the LFW database registration, but we obtain very comparable results ( instead of ) without any position prior, i.e., = in (6).
Fig. 9 illustrates the regularization process. Labeling probabilities (5) obtained from SPM are displayed for each label (Fig. 9(d)). The spatial regularization (8) gives more consistent results (Fig. 9(f)) than taking the label of highest probability (Fig. 9(e)). Finally, Fig. 10 shows the superpatch influence on labeling for various examples. Labeling failures are mostly due to high similarity between hair and background, or inaccurate superpixel segmentation.
Fig. 9 panels: (a) Image, (b) Superpixels, (c) Ground truth, (d) Labeling probabilities (Face, Background, Hair), (e) Highest probability, (f) Regularization.
Fig. 10 columns: Superpixels, Ground truth, SPM = , SPM = .
Table I
Method | Superpixel accuracy | Pixel accuracy | Computational time
PatchMatch |  |  | s
Spatial CRF [18] |  | not reported | not reported
CRBM [18] |  | not reported | not reported
GLOC [18] |  | not reported | s
DCNN [19] | not reported |  | not reported
SuperPatchMatch |  |  | s
Computational times are given per subject. SPM results are obtained with = ANN, and = pixels. The presented values are the published results; therefore, some evaluation metrics could not be reported.
V-B3 Comparison with the state-of-the-art methods
SPM is compared to the recent methods applied to the LFW database in Table I. In [18], the GLOC (GLObal and LOCal) method uses a restricted Boltzmann machine as a complement to a conditional random field labeling [53]. This combination reduces the face labeling error of single models that do not use global shape priors, at the expense of a higher computational cost. In [19], a method based on a deep convolutional neural network (DCNN) is proposed. For all compared methods, learning steps, which can take up to several hours, are necessary to train the models. Moreover, they include priors learned from semantic information in the process, e.g., the hair label should be on top of the face label in the segmentation. We also provide the results of a pixel-wise PM applied with the same framework, where an SSD between patches of size pixels in RGB color space is used as distance.
To compare to all methods, we provide in Table I superpixel-wise and pixel-wise accuracy results. The presented values are the results published by the authors; therefore, not all evaluation metrics could be reported. SPM superpixel-wise labeling accuracy outperforms the ones of the compared methods (), while being performed on basic features, and faster (s per subject) than the best compared method with reported computational time. The pixel-wise accuracy of SPM () also outperforms the reported result of the DCNN architecture [19], which has been optimized to perform on the LFW dataset. Note that the increase of pixel accuracy over superpixel accuracy demonstrates that our method mostly fails at labeling small and stretched superpixels. This comes from the initial LFW segmentation that may produce inaccurate color clustering and allows irregular superpixel shapes.
The global computational time is another important comparison point. SPM outperforms the compared methods in terms of labeling accuracy without any training step. Contrary to other methods, SPM requires no computational effort for learning, and new training images are directly added to the library. To illustrate this point, for each processed image, we add the remaining test ones to the library. This way, SPM reaches of superpixel-wise labeling accuracy. This result highlights the impact of the image diversity within the database, which leads to more accurate ANN. Moreover, results are obtained with no computational time increase, since the algorithm complexity only depends on the test image size. Hence, SPM easily integrates new images in the database, and provides very competitive results in limited computational time, without model or shape priors.
V-B4 Robustness to superpixel decomposition method
To emphasize the robustness of our method to the superpixel method used, we have segmented the test images with another method [1] that produces more regular superpixels (see an example in Fig. 11). The new decompositions are computed with respect to the ground truth label mask of each image. Hence, they are still constrained by the initial segmentations provided with LFW but only on the edges of each class (hair, face, background). Even with test and training decompositions computed with different methods, we get similar superpixel-wise labeling accuracy (), showing that our method can compare superpixel neighborhoods of various shapes.
[Fig. 11: panels (a), (b), (c), and (d).]

V-C Non-Registered MRI Segmentation Experiments
To demonstrate the robustness of the superpatch structure and the proposed framework, we apply SPM to brain tumor segmentation on multimodal non-registered Magnetic Resonance Images (MRI). Classical patch-based and multi-atlas structure segmentation methods are based on registered subjects. Consequently, they cannot be efficiently applied in this non-registered context, due to the substantial variation in tumor shape and location. Superpixels make it possible to better capture the tumor geometry, thus increasing the segmentation accuracy. Superpixel- and supervoxel-based approaches have been applied to tumor segmentation [54]. However, in [54], the neighborhood is not considered, and the ANN search is exhaustive and computed on a large multimodal histogram descriptor, leading to prohibitive computational time.
SPM can be efficiently applied to tumor segmentation since it quickly finds good correspondences without image registration, and uses the superpixel neighborhood to improve the matching. In this application, the segmentation is computed from a superpixel decomposition [1], then each region (tumor or background) is labeled with SPM.
We present results obtained on the MICCAI multimodal Brain Tumor Segmentation (BRATS) dataset [49]. This challenging dataset contains real and simulated patient data, with overall poor resolution and large variation of tumor shape and position. For both types, high grade (HG) and low grade (LG) tumors are provided with four modalities: T1, contrast enhanced T1 (T1C), T2, and FLAIR. Overall, there are 20 HG and 10 LG real patient cases, and 25 images each for HG and LG simulated tumor data. We use the same SPM parameters as in Section V-B. The descriptor used for superpatch matching is a multimodal histogram containing the gray intensity levels of all MRI modalities, and the regularization (8) is performed at the pixel scale to compare with pixelwise ground truths. Each subject is segmented using the remaining subjects of its type in a leave-one-out procedure.
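The descriptor and the evaluation protocol described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: the number of bins, the intensity range, the concatenation of per-modality histograms, the dictionary layout, and all function names are hypothetical. The notion of "type" is assumed here to combine the data source (real or simulated) and the tumor grade (HG or LG).

```python
MODALITIES = ("T1", "T1C", "T2", "FLAIR")


def modality_histogram(intensities, n_bins, max_value=255):
    """Normalized gray-level histogram of one modality inside a superpixel."""
    hist = [0] * n_bins
    for v in intensities:
        # Map the intensity to a bin index, clamped to the last bin.
        b = min(int(v * n_bins / (max_value + 1)), n_bins - 1)
        hist[b] += 1
    total = len(intensities)
    return [h / total for h in hist] if total else [0.0] * n_bins


def multimodal_descriptor(superpixel, n_bins=4):
    """Concatenate the histograms of all modalities (assumed layout)."""
    descriptor = []
    for m in MODALITIES:
        descriptor.extend(modality_histogram(superpixel[m], n_bins))
    return descriptor


def leave_one_out(subjects):
    """Yield each subject with the library of remaining subjects of its type."""
    for i, subject in enumerate(subjects):
        library = [other for j, other in enumerate(subjects)
                   if j != i and other["type"] == subject["type"]]
        yield subject, library


# Hypothetical superpixel: intensities of its pixels in each modality.
superpixel = {"T1": [0, 64, 128, 255], "T1C": [10, 20],
              "T2": [200, 210, 220], "FLAIR": [100]}
descriptor = multimodal_descriptor(superpixel)

# Subject counts taken from the dataset description above.
counts = {"real_HG": 20, "real_LG": 10, "sim_HG": 25, "sim_LG": 25}
subjects = [{"id": f"{t}_{k}", "type": t}
            for t, n in counts.items() for k in range(n)]
library_sizes = {s["type"]: len(lib) for s, lib in leave_one_out(subjects)}
print(len(descriptor), library_sizes)
```

With these counts, each real HG subject is matched against 19 library subjects, each real LG subject against 9, and each simulated subject against 24.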
Table II
Method | Simulated Data HG | Simulated Data LG | Real Data HG | Real Data LG | Computational time
Superpixel-based |  |  |  |  | s
Patch-based |  |  |  |  | s
Superpatch-based |  |  |  |  | s
[Fig. 12: tumor segmentation examples on Simulated HG, Simulated LG, Real HG, and Real LG data. Panels: FLAIR, T1, T1C, T2, Ground truth, Patch-based, SPM = , SPM = .]
In Fig. 12, we show several tumor segmentation results for all data types. In Table II, we compare results obtained using different descriptor structures: patch-based [48], superpixel-based [54], and superpatch-based (= pixels). We use the Dice coefficient [55] as the evaluation metric, measuring the overlap between the automatically segmented structure and the ground truth. The superpixel-based approach appears very limited, since it fails to capture the context of the tumors and their location in other images. Regular patches are also limited in this context, due to the variations in structure shape. Superpatches provide a robust descriptor, since they follow image intensities and capture the superpixel neighborhood, leading to more accurate segmentation. These experiments demonstrate that superpatches within the SPM framework provide fast and accurate segmentation results, even on non-registered multimodal images with poor resolution.
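For reference, the Dice coefficient between a segmentation S and a ground truth G is 2|S ∩ G| / (|S| + |G|). The following Python sketch computes it on flattened binary masks. The function name and the toy masks are hypothetical, and returning 1 when both masks are empty is a convention assumed here rather than something stated in the text.

```python
def dice_coefficient(segmentation, ground_truth):
    """Dice overlap between two binary masks given as flat sequences."""
    if len(segmentation) != len(ground_truth):
        raise ValueError("masks must have the same number of elements")
    # Number of elements labeled as tumor in both masks.
    intersection = sum(1 for s, g in zip(segmentation, ground_truth) if s and g)
    # Sum of the sizes of the two labeled regions.
    size_sum = (sum(1 for s in segmentation if s)
                + sum(1 for g in ground_truth if g))
    if size_sum == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / size_sum


# Toy masks: 1 marks a tumor pixel, 0 marks background.
segmentation = [0, 1, 1, 1, 0, 0]
ground_truth = [0, 0, 1, 1, 1, 0]
print(dice_coefficient(segmentation, ground_truth))  # 2 * 2 / (3 + 3)
```

A value of 1 corresponds to a perfect overlap and a value of 0 to disjoint regions.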
VI Conclusion and Perspectives
In this paper, we propose a new structure based on patches of superpixels that can use irregular and unstable image decompositions. These superpatches include neighborhood information and lead to more accurate matching. We also introduce SuperPatchMatch, a general and novel correspondence algorithm for superpatches.
We have demonstrated the interest of our framework by obtaining state-of-the-art results for face labeling and tumor segmentation on non-registered MRI. SuperPatchMatch does not need any learning phase, which can take up to several hours for many methods in the literature. By including spatial consistency, superpatches are able to reach the accuracy of highly tuned approaches and provide more reliable descriptors than single superpixels.
Our work opens new perspectives for future adaptations of superpixel-based methods, e.g., segmentation [34, 43], labeling [14], saliency detection [11], or color and style transfer [10]. For instance, SuperPatchMatch can be considered for defining good ANN initializations at the pixel level when the size of the database is too large. A possible application is optical flow initialization, instead of multiresolution schemes, to better capture large displacements of small objects.
References
 [1] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, “SLIC superpixels compared to stateoftheart superpixel methods,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 11, pp. 2274–2282, 2012.
 [2] C. Barnes, E. Shechtman, A. Finkelstein, and D. Goldman, “PatchMatch: A randomized correspondence algorithm for structural image editing,” ACM Trans. Graph., vol. 28, no. 3, 2009.
 [3] M. Muja and D. G. Lowe, “Fast approximate nearest neighbors with automatic algorithm configuration,” vol. 2, 2009, pp. 331–340.
 [4] S. Korman and S. Avidan, “Coherency sensitive hashing,” in Proc. IEEE ICCV, 2011, pp. 1607–1614.
 [5] I. Olonetsky and S. Avidan, “TreeCANN  kd tree coherence approximate nearest neighbor algorithm,” in Proc. ECCV, 2012, pp. 602–615.
 [6] C. Barnes, E. Shechtman, D. B. Goldman, and A. Finkelstein, “The generalized PatchMatch correspondence algorithm,” in Proc. ECCV, 2010, pp. 29–43.
 [7] S. Wang, H. Lu, F. Yang, and M. H. Yang, “Superpixel tracking,” in Proc. IEEE ICCV, 2011, pp. 1323–1330.
 [8] M. Reso, J. Jachalsky, B. Rosenhahn, and J. Ostermann, “Temporally consistent superpixels,” in Proc. IEEE ICCV, 2013, pp. 385–392.
 [9] J. Rabin, S. Ferradans, and N. Papadakis, “Adaptive color transfer with relaxed optimal transport,” in Proc. IEEE ICIP, 2014, pp. 4852–4856.
 [10] J. Liu, W. Yang, X. Sun, and W. Zeng, “Photo stylistic brush: Robust style transfer via superpixelbased bipartite graph,” arXiv preprint arXiv:1606.03871, 2016.
 [11] S.C. Pei, W.W. Chang, and C.T. Shen, “Saliency detection using superpixel belief propagation,” in Proc. IEEE ICIP, 2014, pp. 1135–1139.
 [12] R. Sawhney, F. Li, and H. I. Christensen, “GASP: Geometric association with surface patches,” in Proc. 3DV, 2014, pp. 107–114.
 [13] P. Buyssens, M. Toutain, A. Elmoataz, and O. Lézoray, “Eikonalbased vertices growing and iterative seeding for efficient graphbased segmentation,” in Proc. IEEE ICIP, 2014, pp. 4368–4372.
 [14] S. Gould, J. Zhao, X. He, and Y. Zhang, “Superpixel graph label transfer with learned distance metric,” in Proc. ECCV, 2014, pp. 632–647.
 [15] J. Zheng and Z. Li, “Superpixel based patch match for differently exposed images with moving objects and camera movements,” in IEEE ICIP, 2015, pp. 4516–4520.
 [16] J. Lu, H. Yang, D. Min, and M. N. Do, “PatchMatch filter: Efficient edgeaware filtering meets randomized search for fast correspondence field estimation,” in Proc. IEEE CVPR, 2013, pp. 1854–1861.
 [17] X. He, R. Zemel, and M. CarreiraPerpiñán, “Multiscale conditional random fields for image labeling,” in Proc. IEEE CVPR, vol. 2, 2004.
 [18] A. Kae, K. Sohn, H. Lee, and E. LearnedMiller, “Augmenting CRFs with Boltzmann machine shape priors for image labeling,” in Proc. IEEE CVPR, 2013, pp. 2019–2026.
 [19] S. Liu, J. Yang, C. Huang, and M. Yang, “Multiobjective convolutional learning for face labeling,” in Proc. IEEE CVPR, 2015, pp. 3451–3459.
 [20] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proc. IEEE CVPR, 2015, pp. 3431–3440.

 [21] G. Huang, M. Ramesh, T. Berg, and E. LearnedMiller, “Labeled faces in the wild: A database for studying face recognition in unconstrained environments,” Tech. Rep. 0749, Univ. of Massachusetts, Amherst, vol. 1, no. 2, 2007.
 [22] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proc. IEEE CVPR, 2005, pp. 886–893.
 [23] L. Vincent and P. Soille, “Watersheds in digital spaces: An efficient algorithm based on immersion simulations,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 13, no. 6, pp. 583–598, 1991.
 [24] A. Vedaldi and S. Soatto, “Quick shift and kernel methods for mode seeking,” in Proc. ECCV, 2008, pp. 705–718.

 [25] Z. Li and J. Chen, “Superpixel segmentation using linear spectral clustering,” in Proc. IEEE CVPR, 2015, pp. 1356–1363.
 [26] V. Machairas, M. Faessel, D. CárdenasPeña, T. Chabardes, T. Walter, and E. Decencière, “Waterpixels,” IEEE Trans. Image Process., vol. 24, no. 11, pp. 3707–3716, 2015.
 [27] R. Giraud, V.T. Ta, and N. Papadakis, “SCALP: Superpixels with contour adherence using linear path,” in Proc. ICPR, 2016, pp. 2374–2379.
 [28] Z. Ban, J. Liu, and J. Fouriaux, “GLSC: LSC superpixels at over 130 FPS,” J. RealTime Image Process., pp. 1–12, 2016.
 [29] S. Gould, J. Rodgers, D. Cohen, G. Elidan, and D. Koller, “Multiclass segmentation with relative location prior,” Int. J. Comput. Vis., vol. 80, no. 3, pp. 300–316, 2008.
 [30] J. Tighe and S. Lazebnik, “SuperParsing: Scalable nonparametric image parsing with superpixels,” in Proc. ECCV, 2010, pp. 352–365.
 [31] Y. Yang, S. Hallman, D. Ramanan, and C. Fowlkes, “Layered object detection for multiclass segmentation,” in Proc. IEEE CVPR, 2010, pp. 3113–3120.
 [32] M. Mostajabi, P. Yadollahpour, and G. Shakhnarovich, “Feedforward semantic segmentation with zoomout features,” in Proc. IEEE CVPR, 2015, pp. 3376–3385.
 [33] G. Mori, “Guiding model search using segmentation,” in Proc. IEEE ICCV, 2005, pp. 1417–1423.
 [34] B. Fulkerson, A. Vedaldi, and S. Soatto, “Class segmentation and object localization with superpixel neighborhoods,” in Proc. IEEE ICCV, 2009, pp. 670–677.
 [35] P. Arbeláez, M. Maire, C. Fowlkes, and J. Malik, “Contour detection and hierarchical image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 5, pp. 898–916, 2011.
 [36] A. Efros and T. Leung, “Texture synthesis by nonparametric sampling,” in Proc. IEEE ICCV, 1999, pp. 1033–1038.
 [37] A. Buades, B. Coll, and J.M. Morel, “A nonlocal algorithm for image denoising,” in Proc. IEEE CVPR, 2005, pp. 60–65.
 [38] D. G. Lowe, “Distinctive image features from scaleinvariant keypoints,” Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, 2004.
 [39] H. Bay, T. Tuytelaars, and L. Van Gool, “SURF: Speeded up robust features,” in Proc. ECCV, 2006, pp. 404–417.
 [40] M. Garnier, T. Hurtut, and L. Wendling, “Object description based on spatial relations between levelsets,” in Proc. DICTA, 2012, pp. 1–7.
 [41] M. Clément, M. Garnier, C. Kurtz, and L. Wendling, “Color object recognition based on spatial relations between image layers,” in Proc. VISAPP, 2015, pp. 427–434.
 [42] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part based models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 9, pp. 1627–1645, 2010.
 [43] E. Trulls, S. Tsogkas, I. Kokkinos, A. Sanfeliu, and F. MorenoNoguer, “Segmentationaware deformable part models,” in Proc. IEEE CVPR, 2014, pp. 168–175.
 [44] G. Sharma, F. Jurie, and C. Schmid, “Expanded parts model for human attribute and action recognition in still images,” in Proc. IEEE CVPR, 2013, pp. 652–659.
 [45] I. Bloch, “Fuzzy spatial relationships for image processing and interpretation: A review,” Image and Vision Comp., vol. 23, no. 2, pp. 89–110, 2005.
 [46] W. Freeman, T. Jones, and E. Pasztor, “Examplebased superresolution,” IEEE Trans. Comp. Graph. App., vol. 22, no. 2, pp. 56–65, 2002.
 [47] W. Shi, J. Caballero, C. Ledig, X. Zuang, W. Bai, K. Bhatia, A. Marvao, T. Dawes, D. O’Regan, and D. Rueckert, “Cardiac image superresolution with global correspondence using multiatlas PatchMatch,” in Proc. MICCAI, 2013, pp. 9–16.
 [48] R. Giraud, V.T. Ta, N. Papadakis, J. V. Manjón, D. L. Collins, P. Coupé, and the Alzheimer’s Disease Neuroimaging Initiative, “An optimized PatchMatch for multiscale and multifeature label fusion,” NeuroImage, vol. 124, pp. 770–782, 2016.
 [49] B. H. Menze, A. Jakab, S. Bauer, J. KalpathyCramer, K. Farahani, J. Kirby, Y. Burren, N. Porz, J. Slotboom, R. Wiest et al., “The multimodal brain tumor image segmentation benchmark (BRATS),” IEEE Trans. Med. Imaging, vol. 34, no. 10, pp. 1993–2024, 2015.
 [50] P. Coupé, J. V. Manjón, V. Fonov, J. Pruessner, M. Robles, and D. L. Collins, “Patchbased segmentation using expert priors: application to hippocampus and ventricle segmentation,” NeuroImage, vol. 54, no. 2, pp. 940–954, 2011.
 [51] G. Huang, V. Jain, and E. LearnedMiller, “Unsupervised joint alignment of complex images,” in Proc. IEEE ICCV, 2007, pp. 1–8.
 [52] Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 11, pp. 1222–1239, 2001.
 [53] J. Lafferty, A. McCallum, and F. Pereira, “Conditional random fields: Probabilistic models for segmenting and labeling sequence data,” in Proc. ICML, 2001, pp. 282–289.
 [54] H. Wang and P. A. Yushkevich, “Multiatlas segmentation without registration: A supervoxelbased approach,” in Proc. MICCAI, 2013, pp. 535–542.
 [55] A. Zijdenbos, B. Dawant, R. Margolin, and A. Palmer, “Morphometric analysis of white matter lesions in MR images: method and validation,” IEEE Trans. Med. Imaging, vol. 13, no. 4, pp. 716–724, 1994.