Superpixel Segmentation Using Gaussian Mixture Model
Superpixel segmentation algorithms partition an image into perceptually coherent atomic regions by assigning every pixel a superpixel label. Those algorithms have been widely used as a preprocessing step in computer vision, as they can greatly reduce the number of entries that subsequent algorithms must process. In this work, we propose an alternative superpixel segmentation method based on the Gaussian mixture model (GMM). We assume that each superpixel corresponds to a Gaussian distribution, and that each pixel is generated by first randomly choosing one distribution from the several Gaussian distributions related to that pixel and then drawing the pixel from the selected distribution. Under this assumption, each pixel is drawn from a mixture of Gaussian distributions with unknown parameters (GMM). An algorithm based on the expectation-maximization method is applied to estimate the unknown parameters. Once the parameters are obtained, the superpixel label of a pixel is determined by a posterior probability. The success of applying GMM to superpixel segmentation depends on two major differences between traditional GMM-based clustering and the proposed method: data points in our model may be non-identically distributed, and we present an approach to control the shape of the estimated Gaussian functions by adjusting their covariance matrices. Our method is of linear complexity with respect to the number of pixels. The proposed algorithm is inherently parallel and can be sped up by adding simple OpenMP directives to our implementation. According to our experiments, our algorithm outperforms the state-of-the-art superpixel algorithms in accuracy and presents a competitive performance in computational efficiency.
Partitioning an image into superpixels can be used as a preprocessing step for complex computer vision tasks, such as segmentation [1, 2, 3], visual tracking [4], image matching [5, 6], etc. Sophisticated algorithms benefit from working with superpixels instead of individual pixels, because superpixels reduce the number of input entries and enable feature computation on more meaningful regions.
Like many terms in computer vision, superpixel has no rigorous mathematical definition. The commonly accepted description of a superpixel is “a group of connected, perceptually homogeneous pixels which does not overlap any other superpixel.” For superpixel segmentation, the following properties are generally desirable.
Prop. 1. Accuracy. Superpixels should adhere well to object boundaries. Superpixels that cross object boundaries arbitrarily may lead to bad or even catastrophic results for subsequent algorithms [7, 8, 9, 10].
Prop. 2. Regularity. The shape of superpixels should be regular. Superpixels with regular shapes make it easier to construct a graph for subsequent algorithms. Moreover, such superpixels are visually pleasant, which helps algorithm designers in their analysis [11, 12, 13].
Prop. 3. Similar size. Superpixels should have a similar size. This property enables subsequent algorithms to deal with each superpixel without bias [14, 15, 16]. As pixels have the same “size” and the term “superpixel” originates from “pixel”, this property is also intuitively reasonable. It is a key property that distinguishes superpixels from other over-segmented regions.
Under the constraint of Prop. 3, the requirements on accuracy and regularity are to a certain extent opposed. Intuitively, if a superpixel of limited size needs to adhere well to object boundaries, it has to adapt its shape to that object, which may be irregular. Existing superpixel algorithms have not yet found a satisfactory compromise between regularity and accuracy. Four typical algorithms are shown in Fig. 6: the superpixels generated by NC [17, 18] and LRW are more regular in shape than those extracted by SEEDS and ERS. Nonetheless, the superpixels generated by SEEDS and ERS adhere to object boundaries better than those of NC and LRW. In this work, a Gaussian mixture model (GMM) and an algorithm derived from the expectation-maximization algorithm are built. It turns out that the proposed method can strike a balance between regularity and accuracy. An example is displayed in the GMMSP panel of Fig. 6: the compromise is that superpixels in regions with complex textures take irregular shapes to adhere to object boundaries, while superpixels in homogeneous regions are regular.
Computational efficiency is a matter of both algorithmic complexity and implementation. Our algorithm has linear complexity with respect to the number of pixels. As an algorithm has to read all pixels, linear time is theoretically the best time complexity for the superpixel problem. Generally, algorithms can be categorized into two major groups: parallel algorithms, which can be implemented with parallel techniques and whose performance scales with the number of parallel processing units, and serial algorithms, whose implementations are usually executed sequentially and can use only part of the system resources on a parallel computer. Modern computer architectures are parallel, and applications benefit from parallel algorithms because parallel implementations generally run faster than serial implementations of the same algorithm. The proposed algorithm is inherently parallel, and our serial implementation can easily achieve speedups by adding a few simple OpenMP directives.
The proposed method is constructed by associating each superpixel with one Gaussian distribution; modeling each pixel with a mixture of the Gaussian distributions related to that pixel; and estimating the unknown parameters of the proposed mixtures via an approach modified from the expectation-maximization algorithm. The superpixel label of a pixel is then determined by a posterior probability. The proposed approach was tested on the Berkeley Segmentation Data Set and Benchmarks 500 (BSDS500). The results show that the proposed method outperforms state-of-the-art methods in accuracy and presents a competitive performance in computational efficiency. Our main contributions are summarized as follows:
Our model is novel for superpixel segmentation, as GMM has not yet been well explored for the superpixel problem.
We present a pixel-related GMM for each individual pixel, in which case pixels may be non-identically distributed, meaning that two pixels may have different GMMs.
The proposed algorithm offers an option for controlling the regularity of superpixel shapes.
Our algorithm is a parallel algorithm.
The proposed approach gives better accuracy than state-of-the-art algorithms.
Our method strikes a balance between superpixel regularity and accuracy (see the GMMSP panel of Fig. 6).
The concept of superpixel was first introduced by Xiaofeng Ren and Jitendra Malik in 2003. Over the last decades, the superpixel problem has been well studied [22, 23]. Existing superpixel algorithms extract superpixels either by optimizing superpixel boundaries, such as finding paths and evolving curves, or by grouping pixels, e.g. the well-known SLIC. In this section, we briefly review how existing algorithms solve the superpixel problem from these two aspects.
Optimize boundaries. Algorithms that extract superpixels not by labeling pixels directly but by marking superpixel boundaries, or by updating only the labels of pixels on superpixel boundaries, are in this category. Rohkohl et al. present a superpixel method that iteratively assigns superpixel boundaries to their most similar neighboring superpixel. A superpixel is represented by a group of pixels randomly selected from that superpixel. The similarity between a pixel and a superpixel is defined as the average similarity between the pixel and all the selected representatives. Aiming to extract lattice-like superpixels, or “superpixel lattices”,  partitions an image into superpixels by gradually adding horizontal and vertical paths in strips of a pre-computed boundary map. The paths are formed by two different methods: s-t min-cut and dynamic programming. The former finds paths by graph cuts and the latter constructs paths directly. The paths are designed so that parallel paths never cross and perpendicular paths cross only once. The idea of modeling superpixel boundaries as paths (or seam carving) and the use of dynamic programming were borrowed by later variations or improvements [26, 27, 28, 29, 30, 31]. In TurboPixels, Levinshtein et al. model the boundary of each superpixel as a closed curve, so connectivity is naturally guaranteed. Based on level-set evolution, the curves gradually sweep over the unlabeled pixels to form superpixels under the constraints of two velocities. In VCells, a superpixel is represented as the mean color vector of the pixels in that superpixel. With the designed distance, VCells iteratively reassigns superpixel boundary pixels to their nearest neighboring superpixel. The iteration stops when no more pixels need to be updated. SEEDS [32, 8] exchanges superpixel boundaries using a hierarchical structure. At the first iteration, the biggest blocks on superpixel boundaries are updated for a better energy. The size of the pixel blocks becomes smaller as the number of iterations increases. The iteration stops after the boundary exchange at the pixel level. Improved from SLIC,  and  present more complex energy. To minimize their corresponding energy,  and  update boundary pixels instead of assigning a label for all pixels in each iteration. Based on ,  adds the connectivity and superpixel size into their energy. For the pixel updating,  uses a hierarchical structure like SEEDS, while  exchanges labels only in pixel level. Zhu et al. propose a speedup of SLIC by moving only unstable boundary pixels, i.e. those whose labels changed in the previous iteration. Besides, based on pre-computed line segments or edge maps of the input image,  and  extract superpixels by aligning superpixel boundaries to the lines or the edges.
Grouping pixels. Superpixel algorithms that assign labels to all pixels in each iteration are in this category. With an affinity matrix constructed based on boundary cue, the algorithm developed in , which is usually abbreviated as NC, uses normalized cut to extract superpixels. In Quick shift (QS), the pixel density is estimated on a Parzen window with a Gaussian kernel. A pixel is assigned to the same group as its parent, which is the nearest pixel with a greater density within a specified distance. QS does not guarantee connectivity; in other words, pixels with the same label may not be connected. Veksler et al. propose an approach that distributes a number of overlapping square patches over the input image and extracts superpixels by finding a label for each pixel from the patches that cover it. The expansion algorithm in  is gradually adapted to modify pixel label within local regions with a fixed size in each iteration. A similar solution in  is to formulate the superpixel problem as a two-label problem and build an algorithm through grouping pixels into vertical and horizontal bands. By doing this, pixels in the same vertical and horizontal group form a superpixel. Starting from an empty graph edge set, ERS sequentially adds edges to the set until the desired number of superpixels is reached. At each addition, ERS takes the edge that results in the greatest increase of an objective function. The number of generated superpixels is exactly equal to the desired number. SLIC is the most well-known superpixel algorithm due to its efficiency and simplicity. In SLIC, a pixel corresponds to a five-dimensional vector consisting of color and spatial location, and k-means is employed to cluster those vectors locally, i.e. each pixel is compared only with superpixels that fall within a specified spatial distance and is assigned to the nearest one.
Many variations follow the idea of SLIC in order to either decrease its run-time [41, 42, 43] or improve its accuracy [44, 33]. LSC also uses a k-means method to refine superpixels. Instead of directly using the 5D vector used in SLIC, LSC [10, 45] maps them to a feature space, and a weighted k-means is adopted to extract superpixels. Based on marker-based watershed transform,  and  incorporate spatial constraints to an image gradient in order to produce superpixels with regular shape and similar size. LRW groups pixels using an improved random walk algorithm. By using texture features to optimize an initial superpixel map, this method can produce regular superpixels in regions with complex texture. However, this method is very slow.
Although FH, mean shift, and watersheds have been referred to as “superpixel” algorithms in the literature, they are not covered in this paper because the sizes of the regions they produce vary enormously. This is mainly because these algorithms do not offer direct control over the size of the segmented regions. Structure-sensitive or content-sensitive superpixels in [49, 50] are also not considered to be superpixels, as they do not aim to extract regions with similar size (see Prop. 3 in section I).
A large number of superpixel algorithms have been proposed; however, few models have been presented, and most of the existing energy functions are variations of the objective function of k-means. In our work, we propose an alternative model to tackle the superpixel problem. With an elaborately designed algorithm, the underlying segmentation from the model is well revealed.
Let stands for the pixel index of an input image with its width and height in pixels. Hence, the total number of pixels of image is , and . Let denotes pixel ’s position on the image plane, where and , and denotes pixel ’s intensity or color. If color image is used, is a vector, otherwise, is a scalar. The number of elements in is ignored for now and it will be discussed later. We use vector to represent pixel .
Most existing superpixel algorithms require the desired number of superpixels as an input. However, instead of using directly, we use and as essential inputs. If is specified, and are obtained by the following equation.
If and are preferred, it is encouraged to assign the same value to the two variables. Using equation (2), the desired number of superpixels is computed when and are directly specified, or re-computed in the case when and are obtained by equation (1).
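A minimal sketch of this input handling, assuming a square grid with v_x = v_y = floor(sqrt(W H / K)) for equation (1) and K = floor(W / v_x) * floor(H / v_y) for equation (2). Both formulas and the function names `grid_intervals` and `superpixel_count` are assumptions made for illustration and may differ from the paper's exact definitions.

```python
import math

def grid_intervals(width, height, k):
    """Assumed form of equation (1): a square grid cell of area about W*H/K."""
    v = max(1, math.isqrt((width * height) // k))
    return v, v

def superpixel_count(width, height, vx, vy):
    """Assumed form of equation (2): number of whole grid cells in the image."""
    return (width // vx) * (height // vy)

# A BSDS500-sized image with a requested count of 400 superpixels.
vx, vy = grid_intervals(481, 321, 400)
k = superpixel_count(481, 321, vx, vy)
```

Under these assumptions the recomputed count can differ from the requested one whenever the intervals do not divide the image size evenly, which is why the count is recomputed after the intervals are fixed.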
For simplicity of discussion, we assume that and . We define the superpixel set as .
Each superpixel corresponds to a Gaussian distribution with p.d.f. , where and
in which is the number of components in .
If pixel is drawn from superpixel , we assume that pixel can be only in pixel set which is defined in equation (III-A). Fig. 7 gives a visual illustration for . The definition of is one of the key points in our method.
and for any given superpixel , we have
For each pixel , the possible superpixels from which pixel may be generated form a superpixel set . Let stand for the unknown superpixel label of pixel , and
are treated as random variables whose possible values are in, . We now treat as observations of random variables of each random variables is defined as a mixture of Gaussian functions, known as Gaussian mixture model (GMM).
in which , the probability that takes value , are defined to be for , where is the number of elements in a given set. Therefore, become
Note that pixels may have different distributions when which is the most common case. This is the main difference between our GMM and the traditional GMM. The usage of results in superpixels with similar size.
Once an estimator of is found, superpixel label of pixel can be obtained by
By Bayes’ theorem, we have the posterior probability of each,
Therefore, superpixel labels can be obtained by
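The labeling rule can be sketched as follows for a grayscale pixel z = (x, y, c). The sketch assumes that the prior over a pixel's candidate superpixels is uniform, so it cancels in Bayes' theorem, and that each Gaussian has a 2 by 2 spatial covariance block and a scalar intensity variance. The function names and the dictionary layout of `candidates` are hypothetical.

```python
import math

def log_gaussian(z, mean, cov_s, var_c):
    """Log-density of z = (x, y, c) under a Gaussian with a 2x2 spatial
    covariance cov_s = ((a, b), (b, d)) and a scalar intensity variance var_c."""
    dx, dy, dc = z[0] - mean[0], z[1] - mean[1], z[2] - mean[2]
    (a, b), (_, d) = cov_s
    det = a * d - b * b
    # Quadratic form using the closed-form inverse of the 2x2 spatial block.
    quad_s = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    quad_c = dc * dc / var_c
    log_norm = 1.5 * math.log(2.0 * math.pi) + 0.5 * math.log(det * var_c)
    return -0.5 * (quad_s + quad_c) - log_norm

def posterior(z, candidates):
    """Posterior over the candidate superpixels of one pixel.
    candidates maps label -> (mean, cov_s, var_c); uniform priors cancel."""
    logs = {k: log_gaussian(z, *theta) for k, theta in candidates.items()}
    top = max(logs.values())  # subtract the maximum for numerical stability
    weights = {k: math.exp(v - top) for k, v in logs.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

def label(z, candidates):
    """Superpixel label with the largest posterior probability."""
    post = posterior(z, candidates)
    return max(post, key=post.get)

cands = {
    0: ((5.0, 5.0, 50.0), ((16.0, 0.0), (0.0, 16.0)), 64.0),
    1: ((15.0, 5.0, 120.0), ((16.0, 0.0), (0.0, 16.0)), 64.0),
}
z = (9.0, 5.0, 60.0)
chosen = label(z, cands)
```

Here the pixel is closer to superpixel 0 in both position and intensity, so label 0 receives the larger posterior.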
Maximum likelihood estimation is used to estimate the parameters in . Suppose that , , are independently distributed. For all observed vectors , , the logarithmic likelihood function will be
Because is constant, the value of that maximizes will be the same as the value of that maximizes
According to Jensen’s inequality, is greater than or equal to as shown below.
where , for and , and . We now use the expectation-maximization algorithm to iteratively find the value of that maximizes to approach the maximum of with two steps: the expectation step (E-step) and the maximization step (M-step).
E-step: once a guess of is given, is expected to be tightly attached to . To this end, is required to ensure . Equation (13) is a sufficient condition for Jensen’s inequality to hold the equality of inequality .
where is a constant. Since , can be eliminated and hence can be updated by equation (14) to hold the equality to be true.
M-step: in this step, is derived by maximizing with a given . To do this, we first calculate the derivatives of with respect to mean vectors and covariance matrices , and set the derivatives to zero, as shown in equations (15)-(17). Then the parameters are obtained by solving equation (17).
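For orientation, when the mixing weights are fixed and uniform over each pixel's candidate set, the two steps take the familiar EM form written below. The notation is assumed for illustration only: z_i is the feature vector of pixel i, K_i its candidate superpixel set, I_k the pixel set of superpixel k, \theta_k = \{\mu_k, \Sigma_k\}, and R_{i,k} the posterior weight. The numbering and exact form of the paper's own equations may differ.

```latex
\begin{align}
R_{i,k} &= \frac{p(z_i;\theta_k)}{\sum_{j \in K_i} p(z_i;\theta_j)}, \qquad k \in K_i, \\
\mu_k &= \frac{\sum_{i \in I_k} R_{i,k}\, z_i}{\sum_{i \in I_k} R_{i,k}}, \\
\Sigma_k &= \frac{\sum_{i \in I_k} R_{i,k}\,(z_i-\mu_k)(z_i-\mu_k)^{T}}{\sum_{i \in I_k} R_{i,k}}.
\end{align}
```

The first line is the E-step, and the last two lines are the M-step solutions obtained by setting the derivatives of the lower bound to zero.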
Although the estimate of in section III-B supports full covariance matrices, i.e., a covariance matrix with all its elements as shown in equation (19), only block diagonal matrices are used in this work (see equation (20)). This is because computing on block diagonal matrices is more efficient than computing on full matrices, and full matrices will also not bring better performance in accuracy.
where and respectively represent the spatial covariance matrices and the color covariance matrices for . For color images, it is encouraged to split their color covariance matrices into lower dimensional matrices to save computation. For example, if the input image is in the CIELAB color space, it is better to put the color-opponent dimensions a and b into a 2 by 2 covariance matrix. In this case, in equation (20) will become
However, we will keep using (20) to discuss the proposed algorithm for simplicity.
The covariance matrices will be updated according to equations (22) and (23) which are derived by replacing in equation (III-B) with the block diagonal matrices in equation (20), and by further solving (17).
where and are the spatial components of and , and and are, for grayscale images, the intensity components, or, for color image, the color components of and .
Since and are positive semi-definite in practice, they may sometimes be singular. To avoid this problem, we first compute the eigendecompositions of the covariance matrices as shown in equations (24) and (25), then the eigenvalues on the major diagonals of and are modified using equations (26) and (27), and finally and are reconstructed via equations (28) and (29).
where and are diagonal matrices with eigenvalues on their respective major diagonals, and and are orthogonal matrices. We use and to denote the respective eigenvalues on major diagonals of and , where and . If the input image is grayscale, then we will have that , and are scalars, and .
where and are two constants. Although these two constants were originally designed to prevent covariance matrices from being singular, they also make it possible to control the regularity of the generated superpixels by weighing the relative importance of spatial proximity and color similarity. For instance, a larger produces more regular superpixels, and the opposite is true for a smaller . As and are opposite to each other, we set and leave for detailed description in section IV.
where and are diagonal matrices with and on their respective major diagonals.
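The decompose, modify, and reconstruct sequence can be sketched for the 2 by 2 spatial block as follows. The sketch assumes that the modification in equations (26) and (27) replaces any eigenvalue below the corresponding constant with that constant; this floor rule, the closed-form eigendecomposition, and the names `eig_sym2` and `regularize` are assumptions for illustration.

```python
import math

def eig_sym2(a, b, d):
    """Eigenvalues and orthonormal eigenvectors of the symmetric matrix [[a, b], [b, d]]."""
    half_trace = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    lam1, lam2 = half_trace + radius, half_trace - radius
    # Orientation of the eigenvector that belongs to the larger eigenvalue.
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    q1 = (math.cos(theta), math.sin(theta))
    q2 = (-math.sin(theta), math.cos(theta))
    return (lam1, lam2), (q1, q2)

def regularize(a, b, d, eps):
    """Floor both eigenvalues at eps (assumed rule) and rebuild Q diag(lam) Q^T."""
    (lam1, lam2), (q1, q2) = eig_sym2(a, b, d)
    lam1, lam2 = max(lam1, eps), max(lam2, eps)
    new_a = lam1 * q1[0] * q1[0] + lam2 * q2[0] * q2[0]
    new_b = lam1 * q1[0] * q1[1] + lam2 * q2[0] * q2[1]
    new_d = lam1 * q1[1] * q1[1] + lam2 * q2[1] * q2[1]
    return new_a, new_b, new_d

# A singular covariance (eigenvalues 5 and 0) becomes invertible after the floor.
fixed = regularize(4.0, 2.0, 1.0, 2.0)
```

With a floor of this kind, a larger constant pushes every spatial covariance toward a rounder, wider Gaussian, which is consistent with the observation that a larger value yields more regular superpixels.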
In the proposed algorithm, are initialized using center pixels over the input image uniformly at fixed horizontal and vertical intervals and , i.e. , where
We initialize with so that neighboring superpixels can be well overlapped at the beginning. The initialization of is less straightforward; the basic idea is to set their main diagonals to the square of a small color distance within which two pixels appear perceptually uniform. The effect of different values for will be discussed in section IV.
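The initialization can be sketched as below for a grayscale image. The sketch assumes that the centers sit in the middle of each grid cell, that the initial spatial covariance is diag(v_x^2, v_y^2), and that the intensity variance is the square of a small distance `color_dist`. These choices, the dictionary layout, and the function name are assumptions for illustration; the initial mean intensity, which would be read from the image at each center, is omitted.

```python
def initial_parameters(width, height, vx, vy, color_dist):
    """One parameter record per grid cell, listed row by row."""
    nx, ny = width // vx, height // vy
    params = []
    for iy in range(ny):
        for ix in range(nx):
            # Assumed half-cell offset so the center lies in the middle of the cell.
            center = (ix * vx + (vx - 1) / 2.0, iy * vy + (vy - 1) / 2.0)
            # Assumed wide spatial covariance so neighboring Gaussians overlap.
            cov_s = ((float(vx * vx), 0.0), (0.0, float(vy * vy)))
            var_c = float(color_dist * color_dist)
            params.append({"center": center, "cov_s": cov_s, "var_c": var_c})
    return params

params = initial_parameters(8, 4, 4, 4, 8)
```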
Once parameter is initialized, it is estimated by iteratively updating (14), (18), (28), and (29) until converges. As a preprocessing step for subsequent applications, a superpixel algorithm should run as fast as possible. We have found that 10 iterations are sufficient for most images without checking convergence; we use this iteration number for all our experiments and denote it with to avoid confusion.
As the connectivity of superpixels cannot be guaranteed, a postprocessing step is required to enforce connectivity of the generated superpixels. This is done by sorting the isolated superpixels in ascending order according to their sizes, and sequentially merging small isolated superpixels, which are less than one fourth of the desired superpixel size, to their nearest neighboring superpixels, with only intensity or color being taken into account. Once an isolated superpixel (source) is merged to another superpixel (destination), the size of the source superpixel is cleared to zero, and the size of the destination superpixel will be updated by adding the size of the source superpixel. This size updating trick will prevent the size of the produced superpixels from significantly varying.
The proposed algorithm is summarized in Algorithm 1.
As the clock frequency of a single processor is difficult to improve, modern processors are designed with parallel architectures. If an algorithm can be implemented with parallel techniques, its performance generally scales with the number of parallel processing units, and its computational efficiency can be significantly improved on multi-core or many-core systems. Fortunately, the most expensive part of our algorithm, namely the iterative updating of and , can be executed in parallel, as each can be updated independently, and so can and . In our experiments, we show that our C++ implementation easily achieves speedups on multi-core CPUs with only a few OpenMP directives inserted.
By the definition of , we have for . Therefore, the updating of has a complexity of . Because we use as a constant in the proposed algorithm, the complexity of is . By the definition of , we have . Based on equations (18), (22), and (23), the complexity of updating is . Since , the updating of has a complexity of . In the worst case, the sorting procedure in the postprocessing step requires operations, where is the number of isolated superpixels. The merging step needs operations, where is the number of small isolated superpixels and represents the average number of their adjacent neighbors. In practice, , the operations required for the postprocessing step can be ignored. Therefore, the proposed superpixel algorithm is of a linear complexity .
In this section, algorithms are evaluated in terms of accuracy, computational efficiency, and visual effects. Like many state-of-the-art superpixel algorithms, we also use CIELAB color space for our experiments because it is perceptually uniform for small color distance.
Accuracy: three commonly used metrics are adopted: boundary recall (BR), under-segmentation error (UE), and achievable segmentation accuracy (ASA). To assess the performance of the selected algorithms, experiments are conducted on the Berkeley Segmentation Data Set and Benchmarks 500 (BSDS500), which is an extension of BSDS300. These two data sets have been widely used to evaluate superpixel algorithms. BSDS500 contains 500 images, each of size 481×321 or 321×481, with at least four ground-truth human annotations.
BR measures the percentage of ground-truth boundaries correctly recovered by the superpixel boundary pixels. A true boundary pixel is considered to be correctly recovered if it falls within two pixels from at least one superpixel boundary. A high BR indicates that very few true boundaries are missed.
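A minimal sketch of BR on binary boundary maps stored as nested lists. It assumes that "within two pixels" means a square window of radius two around each ground-truth boundary pixel; the window shape and the function name are assumptions.

```python
def boundary_recall(gt_boundary, sp_boundary, tol=2):
    """Fraction of ground-truth boundary pixels that have a superpixel
    boundary pixel inside a (2*tol+1) x (2*tol+1) window (assumed shape)."""
    height, width = len(gt_boundary), len(gt_boundary[0])
    hit = total = 0
    for y in range(height):
        for x in range(width):
            if not gt_boundary[y][x]:
                continue
            total += 1
            found = any(
                sp_boundary[v][u]
                for v in range(max(0, y - tol), min(height, y + tol + 1))
                for u in range(max(0, x - tol), min(width, x + tol + 1))
            )
            hit += found
    return hit / total if total else 1.0

# One-row example: true boundaries at x = 1 and x = 7, superpixel boundary at x = 3.
gt = [[0, 1, 0, 0, 0, 0, 0, 1]]
sp = [[0, 0, 0, 1, 0, 0, 0, 0]]
br = boundary_recall(gt, sp)
```

In the example only the boundary pixel at x = 1 is within two pixels of a superpixel boundary, so half of the true boundary is recovered.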
A superpixel should not cross ground-truth boundary, or, in other words, it should not cover more than one object. To quantify this notion, UE calculates the percentage of superpixels that have pixels “leak” from their covered object as shown in equation (31).
where and are pixel sets of superpixel and ground-truth segment . is generally accepted.
If we assign every superpixel with the label of a ground-truth segment into which the most pixels of the superpixel fall, how much segmentation accuracy can we achieve, or how many pixels are correctly segmented? ASA is designed to answer this question. Its formula is defined in equation (32) in which is the set of ground-truth segments.
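The description above translates directly into a short computation for a single ground-truth annotation. The flat label lists and the function name `asa` are assumptions of this sketch.

```python
from collections import Counter

def asa(sp_labels, gt_labels):
    """Achievable segmentation accuracy for two flat, equally long label lists."""
    overlap = {}
    for s, g in zip(sp_labels, gt_labels):
        # Count how many pixels of superpixel s fall into ground-truth segment g.
        overlap.setdefault(s, Counter())[g] += 1
    # Each superpixel takes the segment that covers most of its pixels.
    best = sum(max(counts.values()) for counts in overlap.values())
    return best / len(sp_labels)

sp_labels = [0, 0, 0, 1, 1, 1]
gt_labels = [0, 0, 1, 1, 1, 1]
score = asa(sp_labels, gt_labels)
```

In the example superpixel 0 leaks one pixel into the second segment, so five of the six pixels can be labeled correctly.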
Computational efficiency: execution time is used to quantify this property.
As shown in Fig. 11, the effect of follows no obvious pattern. In Fig. 11, the maximum difference between two lines is around 0.001 to 0.006, which is very small. Although it seems that a small leads to a better BR result, this is not true for UE and ASA. For instance, in the enlarged region of panel (b), the result of is slightly better than . Visual results with different are plotted in Fig. 17; it is hard for a human to distinguish among the five results.
can be used to control the regularity of the generated superpixels. As shown in Fig. 21, a small difference in does not produce an obvious variation in UE and ASA, but it does affect the results of BR. In other words, a small variation of affects the boundary of the produced superpixels much more than their content. Generally, a larger leads to more regular superpixels with smoother boundaries. Conversely, the shape of superpixels generated with a smaller is relatively irregular (see Fig. 27). Because superpixels with irregular shapes produce more boundary pixels, the BR result with a small is better than that with a greater .
We will use and in the following experiments. Although this setting does not give the best performance in accuracy, the shape of superpixels using this setting is regular and visually pleasant (see Fig. 27). Moreover, it is enough to outperform state-of-the-art algorithms as shown in Fig. 31.
To evaluate how the run-time scales with the number of processors, we test our implementation on a machine equipped with an Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz and 8 GB RAM. The source code is not optimized for any specific architecture. Only two OpenMP directives are added for the updating of , , and , as they can be computed independently (see section III-D). As listed in Table I, for a given image, using more cores gives better performance.
|Resolution||1 core||2 cores||4 cores||6 cores|
We compare the proposed algorithm to eight state-of-the-art superpixel segmentation algorithms: LSC (http://jschenthu.weebly.com/projects.html), SLIC (http://ivrl.epfl.ch/research/superpixels), SEEDS (http://www.mvdblive.org/seeds/), ERS (https://github.com/mingyuliutw/ers), TurboPixels (http://www.cs.toronto.edu/ babalex/research.html), LRW (https://github.com/shenjianbing/lrw14), VCells (http://www-personal.umich.edu/ jwangumi/software.html), and Waterpixels (http://cmm.ensmp.fr/ machairas/waterpixels.html). The results of the eight algorithms are all generated from implementations provided by the authors on their respective websites with their default parameters, except for the desired number of superpixels, which is decided by users.
As shown in Fig. 31, our method outperforms the selected state-of-the-art algorithms, especially for UE and ASA. It is not easy to distinguish between our result and that of LSC in the BR panel of Fig. 31. However, if we use , our result clearly outperforms LSC, as displayed in Fig. 32.
To compare the run-time of the selected algorithms, we test them on a desktop machine equipped with an Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz and 8 GB RAM. The results are plotted in Fig. 35. According to Fig. 35, as the size of the input image increases, the run-time of our algorithm grows linearly, which confirms experimentally that our algorithm is of linear complexity.
A visual comparison is displayed in Fig. 40. According to the zoomed-in regions, only the proposed algorithm correctly reveals the segmentations. Our superpixel boundaries adhere to object boundaries very well. LSC gives a really competitive result; however, parts of the objects are still under-segmented. The superpixels extracted by SEEDS and ERS are very irregular and their sizes vary tremendously. The remaining five algorithms can generate regular superpixels, but they adhere to object boundaries poorly.
This paper presents an alternative method for superpixel segmentation. The method associates each superpixel with a Gaussian distribution with unknown parameters, constructs a Gaussian mixture model for each pixel, and, after the unknown parameters are estimated by the proposed algorithm derived from the expectation-maximization method, determines the superpixel label of each pixel by a posterior probability. The main difference between the traditional GMM method and the proposed one is that data points in our model are not assumed to be identically distributed. Another important contribution is the use of eigendecomposition in the updating of covariance matrices.
The proposed algorithm is of linear complexity, as shown by both theoretical analysis and experimental results. Moreover, it can be implemented using parallel techniques, and its run-time scales with the number of processors. The comparison with the state-of-the-art algorithms shows that the proposed algorithm outperforms the selected methods in accuracy and presents a competitive performance in computational efficiency.
As a contribution to the open source community, we will make our test code publicly available at https://github.com/ahban.
J. Ma, H. Zhou, J. Zhao, Y. Gao, J. Jiang, and J. Tian, “Robust feature matching for remote sensing image registration via locally linear transforming,” TGRS, vol. 53, no. 12, pp. 6469–6481, 2015.
Z. Li and J. Chen, “Superpixel segmentation using linear spectral clustering,” in CVPR, 2015, pp. 1356–1363.
Joint Pattern Recognition Symposium, 2007, pp. 254–263.