Image segmentation is considered one of the most difficult tasks in image processing [Gonzalez2008]. It is the process of dividing an image into parts, identifying objects or other relevant information [Shapiro2001]. Fully automatic segmentation is still very difficult to accomplish and the existing techniques are usually domain-dependent. Therefore, interactive image segmentation, in which the segmentation process is partially supervised, has experienced increasing interest in the last decades [Boykov2001, Grady2006, Protiere2007, Blake2004, Ducournau2014, Ding2010, Rother2004, Paiva2010, Li2010, Artan2010, Artan2011, Xu2008, Breve2015IJCNN, Breve2015ICCSA].
Semi-supervised learning (SSL) is an important field in machine learning, usually applied when unlabeled data is abundant but the process of labeling is expensive, time consuming, and/or requires intensive work by human specialists [Zhu2005, Chapelle2006]. These characteristics make SSL an interesting approach to perform interactive image segmentation, which may be seen as a pixel classification process. In this scenario, there are often many unlabeled pixels to be classified. A human specialist can easily classify some of them, which are away from the borders, but the process of defining the borders manually is difficult and time consuming.
Many interactive image segmentation techniques are, in fact, based on semi-supervised learning. The user may label some pixels from each object, away from the boundaries where the task is easier. Then, the SSL algorithm will iteratively propagate the labels from the labeled pixels to the unlabeled pixels, finding the boundaries. This paper proposes a different SSL-based interactive image segmentation approach. It is simpler than many other techniques, but it still achieves significant classification accuracy in the image segmentation task. In particular, it was applied to some real-world images, including some images extracted from the Microsoft GrabCut dataset [Rother2004]. The segmentation results show the effectiveness of the proposed approach.
1.1 Related work
The approach proposed in this paper may be classified in the category of graph-based semi-supervised learning. Algorithms in this category rely on the idea of building a graph whose nodes are data items (both labeled and unlabeled) and whose edges represent similarities between them. Label information from the labeled nodes is propagated through the graph to classify all the nodes [Chapelle2006]. Many graph-based methods [Blum2001, Zhu2003, Zhou2004, Belkin2004, Belkin2005, Joachims2003] are similar and share the same regularization framework [Zhu2005]. They usually employ weighted graphs and spread labels globally, unlike the proposed approach, where label spreading is limited to neighboring nodes and the graph is undirected and unweighted.
Another graph-based method, known as Label Propagation through Linear Neighborhoods [Wang2008], also uses a $k$-nearest neighbors graph to propagate labels. However, its edges have weights, which must be calculated by solving quadratic programming problems before the iterative label propagation process. In contrast, the proposed approach uses only unweighted edges.
1.2 Technique overview
In the proposed method, an unweighted and undirected graph is generated by connecting each node (data item) to its $k$-nearest neighbors. Then, in an iterative process, unlabeled nodes receive contributions from all their neighbors (either labeled or unlabeled) to define their own labels. The algorithm usually converges quickly, and each unlabeled node is labeled after the class from which it received the most contributions. Unlike many other graph-based methods, no calculation of edge weights or of a Laplacian matrix is required.
2 The Proposed Model
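In this section, the proposed technique is detailed. Given a bidimensional digital image, the set of pixels is reorganized as $\mathfrak{X} = \{x_1, x_2, \dots, x_l, x_{l+1}, \dots, x_n\}$, such that $\mathfrak{X}_L = \{x_i\}_{i=1}^{l}$ is the labeled pixel subset and $\mathfrak{X}_U = \{x_i\}_{i=l+1}^{n}$ is the unlabeled pixel subset. $\mathfrak{L} = \{1, \dots, C\}$ is the set containing the labels. $y: \mathfrak{X} \rightarrow \mathfrak{L}$ is the function associating each $x_i \in \mathfrak{X}$ to its label $y(x_i)$ as the algorithm output. The algorithm will estimate $y(x_i)$ for each unlabeled pixel $x_i \in \mathfrak{X}_U$.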
2.1 $k$-NN Graph Generation
A large number of features may be extracted from each pixel to build the graph. In this paper, 23 features are used. They are shown in Table 1. These are the same features used in [Breve2015WVC].
| # | Feature description |
|---|---|
| 1 | Pixel row location |
| 2 | Pixel column location |
| 3 | Red (R) component of the pixel |
| 4 | Green (G) component of the pixel |
| 5 | Blue (B) component of the pixel |
| 6 | Hue (H) component of the pixel |
| 7 | Saturation (S) component of the pixel |
| 8 | Value (V) component of the pixel |
| 9 | ExR component of the pixel |
| 10 | ExG component of the pixel |
| 11 | ExB component of the pixel |
| 12 | Average of R on the pixel and its neighbors (MR) |
| 13 | Average of G on the pixel and its neighbors (MG) |
| 14 | Average of B on the pixel and its neighbors (MB) |
| 15 | Standard deviation of R on the pixel and its neighbors (SDR) |
| 16 | Standard deviation of G on the pixel and its neighbors (SDG) |
| 17 | Standard deviation of B on the pixel and its neighbors (SDB) |
| 18 | Average of H on the pixel and its neighbors (MH) |
| 19 | Average of S on the pixel and its neighbors (MS) |
| 20 | Average of V on the pixel and its neighbors (MV) |
| 21 | Standard deviation of H on the pixel and its neighbors (SDH) |
| 22 | Standard deviation of S on the pixel and its neighbors (SDS) |
| 23 | Standard deviation of V on the pixel and its neighbors (SDV) |

Table 1: List of features extracted from each image to be segmented
For measures 12 to 23, the pixel neighbors are the 8-connected neighborhood, except on the borders, where no wraparound is applied. All components are normalized to have mean 0 and standard deviation 1. They are also scaled by a vector of weights $\lambda$ in order to emphasize or deemphasize each feature during the graph generation. ExR, ExG, and ExB components are obtained from the RGB components using the method described in [Lichman2013]. The HSV components are obtained from the RGB components using the method described in [Smith1978].
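The neighborhood statistics and the normalization step can be illustrated with a minimal sketch. It assumes an 8-connected neighborhood truncated at the image borders, the population standard deviation, and a plain list-of-lists image layout; the function names `neighborhood_stats` and `standardize_and_weight` are hypothetical and do not come from the original implementation.

```python
from statistics import mean, pstdev


def neighborhood_stats(channel, row, col):
    """Mean and standard deviation of one channel over a pixel and its
    8-connected neighbors. Neighbors outside the image are skipped, so
    there is no wraparound at the borders."""
    height, width = len(channel), len(channel[0])
    values = [
        channel[r][c]
        for r in range(max(0, row - 1), min(height, row + 2))
        for c in range(max(0, col - 1), min(width, col + 2))
    ]
    return mean(values), pstdev(values)


def standardize_and_weight(features, weights):
    """Normalize each feature column to mean 0 and standard deviation 1,
    then multiply it by its weight."""
    scaled_columns = []
    for column, weight in zip(zip(*features), weights):
        mu, sigma = mean(column), pstdev(column)
        sigma = sigma or 1.0  # assumption: constant features are mapped to 0
        scaled_columns.append([weight * (v - mu) / sigma for v in column])
    # Transpose back to one feature vector per pixel.
    return [list(row) for row in zip(*scaled_columns)]
```

A weight of zero removes a feature from the distance computation entirely, while a larger weight makes differences in that feature count more when neighbors are selected.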
The undirected and unweighted graph is defined as $G = (V, E)$, where $V = \{v_1, v_2, \dots, v_n\}$ is the set of nodes, and $E$ is the set of edges $(v_i, v_j)$. Each node $v_i$ corresponds to a pixel $x_i$. Two nodes $v_i$ and $v_j$ are connected if $v_j$ is among the $k$-nearest neighbors of $v_i$, or vice-versa, considering the Euclidean distance between the features of $x_i$ and $x_j$. Otherwise, $v_i$ and $v_j$ are disconnected.
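The connection rule can be sketched as follows. This is only an illustration under simplifying assumptions: it uses a brute-force quadratic search for clarity rather than k-d trees, and the function name `build_knn_graph` and the adjacency-set representation are hypothetical.

```python
import math


def build_knn_graph(features, k):
    """Return the undirected, unweighted k-NN graph as a list of
    adjacency sets. Brute-force search, used here only for clarity."""
    n = len(features)
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        # Rank all other nodes by Euclidean distance in feature space.
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(features[i], features[j]),
        )
        for j in ranked[:k]:
            # Symmetrize: the edge exists if either node selects the other.
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors
```

Because of the "or vice-versa" rule, a node may end up with more than $k$ neighbors: it keeps its own $k$ nearest neighbors and also every node that selected it.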
2.2 Label Propagation
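For each node $v_i$, a domination vector $\mathbf{v}_i^{\omega}(t) = \{v_i^{\omega_1}(t), v_i^{\omega_2}(t), \dots, v_i^{\omega_C}(t)\}$ is created. Each element $v_i^{\omega_c}(t) \in [0, 1]$ corresponds to the domination level of class $c$ over the node $v_i$. The sum of the domination vector in each node is always constant, $\sum_{c=1}^{C} v_i^{\omega_c}(t) = 1$.
The domination levels are constant in nodes corresponding to labeled pixels, with full domination by the corresponding class. On the other hand, domination levels are variable in nodes corresponding to unlabeled pixels and they are initially set equally among classes. Therefore, for each node $v_i$, the domination vector is set as follows:
$$v_i^{\omega_c}(0) = \begin{cases} 1 & \text{if } x_i \text{ is labeled and } y(x_i) = c \\ 0 & \text{if } x_i \text{ is labeled and } y(x_i) \neq c \\ \frac{1}{C} & \text{if } x_i \text{ is unlabeled} \end{cases}$$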
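The initialization described above can be sketched in a few lines. The function name `init_domination` is hypothetical, and classes are indexed from 0 here for convenience, with `None` marking an unlabeled pixel.

```python
def init_domination(labels, num_classes):
    """Build the initial domination vectors.

    labels[i] is a class index in range(num_classes), or None when the
    corresponding pixel is unlabeled."""
    domination = []
    for label in labels:
        if label is None:
            # Unlabeled node: domination is split equally among classes.
            domination.append([1.0 / num_classes] * num_classes)
        else:
            # Labeled node: full domination by its own class.
            domination.append(
                [1.0 if c == label else 0.0 for c in range(num_classes)]
            )
    return domination
```

Every vector sums to 1 by construction, which is the invariant the propagation step preserves.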
In the iterative phase, at each iteration each unlabeled node gets contributions from all its neighbors to calculate its new domination levels. Thus, for each unlabeled node $v_i$, the domination levels are updated as follows:
$$\mathbf{v}_i^{\omega}(t+1) = \frac{1}{|N(v_i)|} \sum_{v_j \in N(v_i)} \mathbf{v}_j^{\omega}(t),$$
where $N(v_i)$ is the set of neighbors of $v_i$ and $|N(v_i)|$ is its size. In this way, the new domination vector of a node is the arithmetic mean of its neighbors' domination vectors, no matter if they are labeled or unlabeled.
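A single iteration of this averaging rule might look like the sketch below. The name `update_step` and the adjacency-set graph representation are assumptions made for illustration; the key point is that all new vectors are computed from the values of the previous iteration.

```python
def update_step(domination, neighbors, unlabeled):
    """One synchronous iteration: every unlabeled node takes the mean of
    its neighbors' domination vectors from the previous iteration."""
    # Copy first, so labeled nodes keep their fixed vectors and reads
    # always come from the old state.
    new_domination = [vector[:] for vector in domination]
    for i in unlabeled:
        if not neighbors[i]:
            continue  # assumption: an isolated node keeps its vector
        num_classes = len(domination[i])
        new_domination[i] = [
            sum(domination[j][c] for j in neighbors[i]) / len(neighbors[i])
            for c in range(num_classes)
        ]
    return new_domination
```

Since each neighbor vector sums to 1, their mean also sums to 1, so the constant-sum property holds at every iteration.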
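The average maximum domination level is defined as follows:
$$\langle v^{\omega_m} \rangle = \frac{1}{n_u} \sum_{i} \max_{c} v_i^{\omega_c}(t),$$
considering all $v_i$ representing unlabeled nodes, where $n_u$ is the number of such nodes. $\langle v^{\omega_m} \rangle$ is checked at fixed intervals of iterations (checkpoints), and the algorithm stops when its increase between consecutive checkpoints falls below a small threshold.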
At the end of the iterative process, each unlabeled pixel is assigned to the class that has the highest domination level on it:
$$y(x_i) = \arg\max_{c} v_i^{\omega_c}(t).$$
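The convergence measure and the final assignment can be sketched as two small helpers. The names `average_max_domination` and `assign_labels` are hypothetical, and domination vectors are assumed to be stored as one list of class levels per node.

```python
def average_max_domination(domination, unlabeled):
    """Mean, over unlabeled nodes, of the highest domination level."""
    return sum(max(domination[i]) for i in unlabeled) / len(unlabeled)


def assign_labels(domination, unlabeled):
    """Map each unlabeled node to the class with the highest domination
    level (ties go to the lowest class index, an arbitrary choice)."""
    return {
        i: max(range(len(domination[i])), key=domination[i].__getitem__)
        for i in unlabeled
    }
```

The measure grows from $1/C$ toward 1 as unlabeled nodes become more clearly dominated by one class, which is why a stalled increase is a reasonable signal to stop.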
2.3 The Algorithm
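Overall, the proposed algorithm can be outlined as follows:
Algorithm 1
1. Build the $k$-NN graph, as described in Section 2.1;
2. Set the initial domination levels of all nodes, as described in Section 2.2;
3. repeat
4.     for each unlabeled node do
5.         Update the node domination levels with the mean of its neighbors' domination vectors;
6. until the stopping criterion is satisfied;
7. Label each unlabeled pixel with the class that has the highest domination level on it.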
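As a rough end-to-end illustration of the propagation procedure from Section 2.2, the sketch below runs the iterative phase on a graph that has already been built. The function name `propagate_labels` and the values of `check_every`, `tol`, and `max_iter` are hypothetical choices for this example, not the settings used in the experiments.

```python
def propagate_labels(neighbors, labels, num_classes,
                     check_every=10, tol=1e-4, max_iter=10000):
    """Label propagation on an unweighted graph given as adjacency sets.

    labels[i] is a class index in range(num_classes) or None (unlabeled).
    check_every, tol, and max_iter are illustrative assumptions."""
    unlabeled = [i for i, label in enumerate(labels) if label is None]
    if not unlabeled:
        return {}
    # Labeled nodes: full domination by their class; unlabeled: uniform.
    dom = [
        [1.0 / num_classes] * num_classes if label is None
        else [1.0 if c == label else 0.0 for c in range(num_classes)]
        for label in labels
    ]
    previous = 1.0 / num_classes  # average maximum domination at t = 0
    for iteration in range(1, max_iter + 1):
        new_dom = [vector[:] for vector in dom]
        for i in unlabeled:
            if neighbors[i]:
                new_dom[i] = [
                    sum(dom[j][c] for j in neighbors[i]) / len(neighbors[i])
                    for c in range(num_classes)
                ]
        dom = new_dom
        if iteration % check_every == 0:
            current = sum(max(dom[i]) for i in unlabeled) / len(unlabeled)
            if current - previous < tol:
                break  # increase between checkpoints is below the threshold
            previous = current
    return {i: max(range(num_classes), key=dom[i].__getitem__)
            for i in unlabeled}


# Toy example: a chain of six nodes with one seed of each class at the ends.
chain = [{1}, {0, 2}, {1, 3}, {2, 4}, {3, 5}, {4}]
seeds = [0, None, None, None, None, 1]
result = propagate_labels(chain, seeds, num_classes=2)
```

On this toy chain, each unlabeled node ends up with the class of the seed it is closer to, which matches the intuition that labels spread outward from the seeds through neighboring nodes.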
In order to reduce the computational resources required by the proposed method, the following implementation strategy is applied.
The iterative step of the algorithm is very fast in comparison with the graph generation step, i.e., the graph generation dominates the execution time. Therefore, the graph is generated using the k-d trees method [Friedman1977], so the algorithm runs in linearithmic time ($O(n \log n)$).
In the iterative step, each iteration runs in $O(n_u k)$, where $n_u$ is the number of unlabeled nodes and $k$ is usually proportional to the number of neighbors each node has (not equal because the graph is undirected). $n_u$ is usually a fraction of $n$ in practical problems, and often $k \ll n$. By increasing $k$, one also increases the execution time of each iteration. On the other hand, the number of iterations required to converge decreases as the graph becomes more connected and the labels propagate faster, as was empirically observed in computer simulations.
The iterative steps are synchronous, i.e., the contributions any node receives to produce its domination vector at time $t+1$ refer to the domination levels its neighbors had at time $t$. Therefore, parallelization of this step, corresponding to the inner loop of Algorithm 1, is possible. Nodes can calculate their new domination vectors in parallel without running into race conditions. Synchronization is only required between iterations of the outer loop.
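The structure that makes this parallelization safe can be sketched with a thread pool: every worker only reads the domination vectors of the previous iteration, and the results are written to a fresh copy afterwards. The names `neighbor_mean` and `parallel_update` are hypothetical, and the sketch assumes every unlabeled node has at least one neighbor. In CPython, threads mainly illustrate the pattern; an actual speedup would likely require processes or vectorized code.

```python
from concurrent.futures import ThreadPoolExecutor


def neighbor_mean(i, domination, neighbors):
    """Mean of the domination vectors of node i's neighbors (read-only)."""
    num_classes = len(domination[i])
    return [
        sum(domination[j][c] for j in neighbors[i]) / len(neighbors[i])
        for c in range(num_classes)
    ]


def parallel_update(domination, neighbors, unlabeled, workers=4):
    """One synchronous iteration with the per-node work spread over a pool.

    Workers only read the old state, so no locking is needed; the pool
    is joined before the new state is assembled."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(
            lambda i: neighbor_mean(i, domination, neighbors), unlabeled))
    new_domination = [vector[:] for vector in domination]
    for i, vector in zip(unlabeled, results):
        new_domination[i] = vector
    return new_domination
```

Leaving the `with` block acts as the synchronization point between outer-loop iterations: the next iteration starts only after every node has produced its new vector.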
The efficacy of the proposed technique is first tested using the real-world image shown in Fig. 1(a), extracted from [Breve2015IJCNN]. A trimap providing seed regions is presented in Fig. 1(b). Black (0) represents the background, ignored by the algorithm; dark gray (64) is the labeled background; light gray (128) is the unlabeled region, whose labels will be estimated by the proposed method; and white (255) is the labeled foreground.
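The trimap encoding can be turned into node labels with a short sketch. The gray levels are the ones listed above; the function name `decode_trimap`, the class indices (0 for background, 1 for foreground), and the choice to leave ignored pixels out of the node list are assumptions made for illustration.

```python
def decode_trimap(trimap):
    """Convert a trimap (rows of gray levels) into node coordinates and
    labels: 0 = labeled background, 1 = labeled foreground,
    None = unlabeled. Pixels with value 0 are skipped entirely."""
    code_to_label = {64: 0, 128: None, 255: 1}
    nodes, labels = [], []
    for r, row in enumerate(trimap):
        for c, value in enumerate(row):
            if value == 0:
                continue  # ignored background: assumed not to become a node
            nodes.append((r, c))
            labels.append(code_to_label[value])
    return nodes, labels
```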
The efficacy of the proposed technique is then verified in a series of computational experiments using nine images selected from the Microsoft GrabCut database [Rother2004] (available at http://web.archive.org/web/20161203110733/research.microsoft.com/en-us/um/cambridge/projects/visionimagevideoediting/segmentation/grabcut.htm). The selected images are shown in Fig. 2. The corresponding trimaps providing seed regions are shown in Fig. 3. Finally, the ground truth images are shown in Fig. 4.
For each image, $k$ and the vector of weights $\lambda$ were optimized using the genetic algorithm available in the Global Optimization Toolbox of MATLAB, with its default parameters.
5 Results and Discussion
First, the proposed method was applied to the image shown in Fig. 1(a). The best segmentation result is shown in Fig. 1(c). By comparing this output with the segmentation result achieved in [Breve2015IJCNN] for the same image, one can notice that the proposed method achieved slightly better results, by eliminating some misclassified pixels and better defining the borders.
Then, the proposed method was applied to the nine images shown in Fig. 2, as described in Section 4. The best segmentation results achieved with the proposed method are shown in Fig. 5. Error rates are computed as the ratio between the number of incorrectly classified pixels and the total number of unlabeled pixels (light gray in the trimap images shown in Fig. 3). Notice that the ground truth images (Fig. 4) have a thin contour of gray pixels, which corresponds to uncertainty, i.e., pixels that received different labels from the different people who did the manual classification. These pixels are not used in the classification error calculation.
Segmentation error rates are also summarized in Table 2. Some results from other methods [Ducournau2014, Ding2008, Breve2015IJCNN] are also included for reference. By analyzing them, one can notice that the proposed method has comparable results. The results from the other methods were extracted from the respective references.
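The error measure can be sketched as follows. Several details are assumptions made for illustration: the function name `error_rate`, the gray level 128 for uncertain ground truth pixels, the 0/255 encoding of predictions and ground truth, and the choice to exclude uncertain pixels from both the numerator and the denominator.

```python
def error_rate(predicted, ground_truth, trimap,
               unlabeled_value=128, uncertain_value=128):
    """Fraction of misclassified pixels among the unlabeled trimap
    pixels, skipping ground truth pixels marked as uncertain."""
    errors = total = 0
    for pred_row, truth_row, tri_row in zip(predicted, ground_truth, trimap):
        for pred, truth, tri in zip(pred_row, truth_row, tri_row):
            if tri != unlabeled_value or truth == uncertain_value:
                continue  # seed pixel, ignored pixel, or uncertain contour
            total += 1
            errors += pred != truth
    return errors / total if total else 0.0
```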
It is also important to notice that the proposed method is deterministic. Given the same parameters, it will always output the same segmentation result on different executions. Other methods, like Particle Competition and Cooperation [Breve2015IJCNN], are stochastic. Therefore, they may output different segmentation results on each execution.
The optimized parameter $k$ and the feature weights ($\lambda$) are shown in Table 3. Considering the images evaluated in this paper, pixel location features (Row and Col) are the most important features, followed by the ExB component, intensity (V), and the mean of green (MG). The least important features were hue (H), saturation (S), and all those related to standard deviation. However, no single feature received a high weight in all images. The optimal weights and $k$ seem to be highly dependent on image characteristics.
In this paper, a new SSL graph-based approach is proposed to perform interactive image segmentation. It employs undirected and unweighted $k$-NN graphs to propagate labels from nodes representing labeled pixels to nodes representing unlabeled pixels. Computer simulations with some real-world images show that the proposed approach is effective, achieving segmentation accuracy similar to that achieved by some state-of-the-art methods.
As future work, the method will be applied to more images and more features may be extracted. Methods to automatically define the parameters $k$ and $\lambda$ may also be explored. Graph generation may also be improved to provide a further increase in segmentation accuracy.
Moreover, the proposed method works for multiple labels simultaneously at no extra cost, which is an interesting property not often exhibited by other interactive image segmentation methods. This feature will also be explored in future work.
The author would like to thank the São Paulo Research Foundation - FAPESP (grant #2016/05669-4) and the National Council for Scientific and Technological Development - CNPq (grant #475717/2013-9) for the financial support.