Visual-hint Boundary to Segment Algorithm for Image Segmentation

10/03/2010 ∙ by Yu Su, et al.

Image segmentation has long been an active research topic in image analysis. Most current image segmentation algorithms are built on the idea that an image is partitioned into a set of regions that are homogeneous within each region and inhomogeneous between regions. However, human visual intuition does not always follow this pattern. This paper introduces a new image segmentation method, Visual-Hint Boundary to Segment (VHBS), which is more consistent with human perception. VHBS abides by two visual hint rules based on human perception: (i) global-scale boundaries tend to be the real boundaries of objects; (ii) two adjacent regions with quite different colors or textures tend to have a real boundary between them. Experiments demonstrate that, compared with traditional image segmentation methods, VHBS performs better while remaining computationally efficient.




1 Introduction

Image segmentation is a vast topic in image analysis. In this chapter, we present a low-level image segmentation method designed to segment images in a way that agrees with human perception. Most recent image segmentation algorithms are based on the idea of partitioning an image into a set of regions that are homogeneous within each region and inhomogeneous between regions. Following this idea, these methods segment images by classification or clustering. However, human visual intuition does not always work this way. The goal of this research is to define a low-level image segmentation algorithm that is consistent with human visual perception.

The proposed method is called Visual-hint Boundary to Segment (VHBS). VHBS abides by two visual hint rules based on human perception: (i) global-scale boundaries tend to be the real boundaries of objects; (ii) two adjacent regions with quite different colors or textures tend to have a real boundary between them. Compared with other unsupervised segmentation methods, the outputs of VHBS are more consistent with human perception. Reducing complexity is another objective of VHBS, since high-performance segmentation methods are usually computationally intensive. The concepts of chaos and non-chaos are therefore introduced in VHBS to prevent the algorithm from descending to pixel-level detail. This guarantees that the segmentation process stays at a coarse level and remains computationally efficient.

VHBS is composed of two phases, Entropy-driven Hybrid Segmentation (EDHS) and Hierarchical Probability Segmentation (HPS). VHBS starts with EDHS, which produces a set of initial segments by combining local regions and boundaries. These local regions and boundaries are generated by a top-down decomposition process, and the initial segments are formed by a bottom-up composition process. The top-down decomposition recursively decomposes the given image into small partitions, with a stopping condition for each branch of the decomposition. We use an entropy measurement as the stopping condition, since a smaller entropy in a local partition implies lower disorder in that partition. To preserve computational efficiency, we set a size threshold on the partitions to prevent the decomposition from going down to pixel level. Based on this threshold, local partitions are grouped into two types: chaos if the size of a partition is less than the threshold, and non-chaos otherwise. Local regions and boundaries are computed within the local partitions. Each local region is described by a vector called a feature descriptor, and the local boundaries are weighted by probabilities. To calculate the probabilities, we design two filters, a scale filter and a similarity filter, which are based on visual hints (i) and (ii) respectively. The boundary between two adjacent regions is weighted by the product of the two filter outputs. A bottom-up composition process then combines these local regions and boundaries to form the set of initial segments.

The second phase of VHBS is Hierarchical Probability Segmentation (HPS), which constructs a Probability Binary Tree (PBT) from the initial segments. The PBT presents a hierarchy of segments based on the boundary probabilities between the initial segments, which form its leaves. The root represents the original image and the internal nodes are segments formed by combining their children. Links are labeled with boundary probabilities. The PBT can be visualized at any chosen number of segments, and can even expose local details. The difference from MST-based methods such as [20, 26, 21] is that those methods generate the tree structure from similarities between pixels, whereas our method generates it from probabilities between regions. This makes the algorithm insensitive to noise and greatly reduces the computational complexity. A similar approach is proposed by [1]. Compared with that approach, VHBS is more efficient, since it prevents the decomposition from going down to pixel level by setting a chaos threshold. The novel aspects of VHBS include:

  1. Visual hints: the algorithm abides by two visual hint rules, which make the outputs of VHBS more consistent with human perception;

  2. Feature detection: VHBS outputs a set of feature descriptors, which describe the features of each segment;

  3. Computational efficiency: VHBS has high computational efficiency since the algorithm does not go down to pixel level;

  4. Hybrid algorithm: VHBS combines edge-, region-, cluster- and graph-based techniques.

2 Related work

Image segmentation is one of the major research areas in image analysis and has been explored for many years. Typically, segmentation methods partition a given image into a set of segments, where each segment is a set of pixels with high similarity to one another and large differences from the pixels of other segments. Examples of classical image segmentation algorithms are k-means clustering [31], histogram thresholding, region growing and the watershed transformation. These methods are efficient and easy to understand, but they have obvious weaknesses that limit their application. These weaknesses include sensitivity to image noise and texture; improper merging of two disjoint areas with similar gray values by histogram threshold methods; and, for watershed transformation methods, incorrect outputs caused by improper initial conditions and a tendency toward excessive over-segmentation [27]. These examples show that image segmentation is an extensive and difficult research area. In recent years, numerous image segmentation methods have been proposed that largely overcome these weaknesses. Segmentation algorithms commonly fall into one or more of the following categories: edge-based, region-based, cluster-based and graph-based.

The idea of edge-based segmentation methods is straightforward. Contours and segments are highly correlated: closed contours give the segments, and segments automatically produce the boundaries. Edge-based segmentation methods locate contours in an image and then use these contours as the boundaries of the segments. Much research has therefore focused on contour detection. The classical approach to edge detection is to look for discontinuities of brightness, as in Canny edge detection [6]. [36] demonstrates that looking for discontinuities of brightness is an inadequate model for locating boundaries in natural images. One reason is that texture is common in natural images and produces unwanted edges. Another reason is that locating segments by their edges requires closed boundaries, whereas these approaches usually produce discontinuous contours, which are inadequate for locating the segments. In recent years, many high-performance contour detectors have been proposed, such as [42, 36, 32, 55, 33]. One category of contour detectors locates the boundaries of an image by measuring local features. To improve edge detection performance on natural images, some approaches use one descriptor, or combine several, for each pixel in several feature channels over different scales and orientations. [36] proposes a learning scheme that considers brightness, color and texture features at each local position and uses a classifier to combine these local features. Building on [36], [33] adds a spectral component to form the so-called globalized probability of boundary, which improves the accuracy of boundary detection. Many boundary detection and segmentation methods follow the oriented energy approach [37]; examples are [51, 40]. To achieve high accuracy, these approaches usually combine more than one local feature channel, and the computational complexity becomes a bottleneck if the application requires high computational efficiency. There are also many boundary detection and segmentation algorithms based on rich texture analysis, such as [4, 28, 22]. However, highly accurate image contour and segment detection methods are computationally intensive [1].

To locate the segments from the contours, the contours must form closed regions. An example of such research is [45], which locates the regions by bridging disconnected contours or by contour tracking. Another recent work is [1], which can be divided into two phases: (i) the Oriented Watershed Transform (OWT) produces a set of initial regions from a contour detector. The paper selects gPb, proposed by [33], as the contour detector, since it achieves high accuracy on the BSDB benchmark [35]; (ii) the Ultrametric Contour Map (UCM) constructs the hierarchical segments. A tree is generated by a greedy graph-based region merging algorithm, where the leaves are the initial regions and the root is the entire image. The segmentation algorithm proposed by [1] achieves high segmentation accuracy, but its disadvantage is obvious: gPb is an expensive contour detector and produces fine gradient initial regions. It can be shown that constructing hierarchical segments over such a fine gradient is also computationally intensive. Other examples of recent contour-to-segment research are [22, 23].

Typically, a region-based algorithm is combined with clustering techniques to assemble sub-regions into final segments, and numerous methods fall into these two schemes, such as [48, 30, 10, 12, 11, 24, 13]. Commonly used techniques include region growing, split-merge, and supervised and unsupervised classification. [24] proposes a region-growing algorithm, called random walk segmentation, which is a multi-label, user-interactive image segmentation method. The algorithm starts from a small number of seeds with user-predefined labels. For each unlabeled pixel, it determines the probability that a random walker starting at that pixel first reaches each of the user-predefined seeds. Each pixel is then assigned the label with the greatest probability. Mean shift [13] is a well-known density estimation clustering algorithm that has been widely used for image segmentation and object tracking. Based on the domain probability distribution, the algorithm iteratively climbs the gradient to locate the nearest peak. [13] demonstrates that mean shift provides good segmentation results and is suitable for real data analysis. However, the quadratic computational complexity of the algorithm is a disadvantage, and the choice of the moving window size is not trivial.

In recent years, much research has been built on graph-theoretic techniques. It has been demonstrated by [44, 20, 16, 26, 19, 18, 15, 21, 52, 5, 39, 50, 8, 34, 17] that these approaches support image segmentation as well. As pointed out by [26], graph-based segmentation can be roughly divided into two groups: tree-structure segmentation and graph-cut segmentation. Treating a 2D image as a space, both approaches view it as a collection of subgraphs, each of which is an undivided partition; the subgraphs are pairwise disjoint and together cover the image. Commonly, each subgraph is a single pixel of the image. Tree-structure segmentation [20, 16, 26, 19, 18, 15, 21] expresses the split-merge process in a hierarchical manner. The links between parents and children indicate the inclusion relationship, and the nodes of the tree denote the subgraphs. Graph-cut segmentation [52, 5, 39, 50, 8, 34, 17] views each element of the collection as a vertex, and the edges of the graph are defined by the similarities between adjacent vertices. This forms a weighted undirected graph, and the approach relies on graph cutting to partition the graph.

A common tree-structure approach is the minimum spanning tree (MST) [41]. [20, 21] propose an MST-based algorithm that uses the local variation of intensities to locate the proper granularity of the segments, based on Kruskal’s minimum spanning tree (KMST). Another recent example of a tree-structure approach is [16]. Its purpose is to find semantically coherent regions based on the $\varepsilon$-neighbor coherence segmentation criterion, using the so-called connected coherence tree algorithm (CCTA). Rather than generating a tree based on pixel similarities, [1] generates a tree structure based on region similarities. A tree structure based on region similarities should have lower computational complexity than one based on pixel similarities, since the number of graph nodes is greatly reduced by replacing pixels with regions.

Graph-cut approaches are also called spectral clustering. The basic idea of the graph-cut approach is to partition the graph into disjoint subsets by removing the edges linking the subsets. Among these approaches, the normalized cut (Ncut) [44] is widely used. Ncut proposes a minimization cut criterion that measures the cut cost as a fraction of the total edge connections to all the nodes in the graph. The paper demonstrates that minimizing the normalized cut criterion is equivalent to solving a generalized eigenvector system. Other recent examples of graph-cut approaches are [52, 5]. Graph-cut problems have been proved to be NP-complete, and their computational complexity is high. These disadvantages are the main barriers for graph-cut methods.

3 Entropy-driven Hybrid Segmentation

Entropy-driven Hybrid Segmentation (EDHS) begins with a top-down decomposition of the original input image and finishes with a bottom-up composition. The top-down decomposition repeatedly partitions a given image into quarters and correspondingly produces a quadtree, based on a stopping condition. EDHS uses an edge detector, such as the Canny detector [6], to locate the boundaries between the local regions in the leaves. These boundaries are weighted by probabilities computed from the two visual hint rules.

Bottom-up composition recursively combines two adjacent local regions whenever they share a boundary with zero probability. This process forms the set of initial segments together with a set of boundary probabilities, one for each pair of adjacent initial segments: every pair that shares a boundary is assigned a real-valued probability in $[0, 1]$. For each initial segment, a feature vector is generated to describe the segment. The feature descriptor, like the clustering feature (CF) in BIRCH [47], summarizes the important features of each area (cluster). Although the specific values used in the feature descriptor may vary, in this chapter we assume that it holds the mean values of the red, green and blue color channels.

3.1 Top-down decomposition

Decomposition is a widely used technique in hierarchical image segmentation [2, 38]. Our decomposition process recursively decomposes the image into four quadrants and is represented by an unbalanced quadtree. The root represents the original image and the other nodes represent partitions. A stopping condition is assigned to each branch of the decomposition, and partitioning stops when that condition is reached. Figure 1 shows an example of the data structures. We summarize the benefits of the top-down decomposition as follows:

  1. Partitioning the images into small pieces reduces the information presented in local images, which helps VHBS conquer the sub-problems at the local position;

  2. Decomposition provides the relative scale descriptor for the scale filter to calculate the probabilities of the boundaries. We will discuss the relative scale descriptor in section 3.2.1;

  3. Divide and conquer schema potentially supports the parallel implementation, which could greatly improve the computational efficiency.

To describe the decomposition, the dyadic rectangle [29] is introduced. Dyadic rectangles form a family of regions whose side lengths are powers of two and whose corners lie at integer multiples of those side lengths. Dyadic rectangles have some nice properties. Every dyadic rectangle is contained in exactly one “parent” dyadic rectangle, and the “child” dyadic rectangles are located inside their “parent”. The area of a “parent” is always an integer power of two times the area of its “child”. Mapping the image onto the Cartesian plane, dyadic rectangles provide a model for uniformly and recursively decomposing an image into sub-images.

Given an image whose width and height are powers of two, the dyadic rectangles from the top level down to the pixel level form a complete quadtree whose root is the dyadic rectangle covering the whole image. Each dyadic rectangle has four children, the dyadic rectangles one level below it that make up its four quadrants. The quadrants are obtained by halving both coordinate intervals of the parent: the first, second, third and fourth quadrants each pair one half of the horizontal interval with one half of the vertical interval.
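The recursive quartering can be sketched in a few lines of Python. This is a minimal illustration, assuming an image whose sides are powers of two, a pixel origin at the top-left corner, and rectangles stored as `(x, y, width, height)`. The names `quadrants` and `dyadic_levels` and the quadrant order are assumptions made for this sketch.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height), origin at top-left


def quadrants(rect: Rect) -> List[Rect]:
    """Split a rectangle with even side lengths into its four quadrants."""
    x, y, w, h = rect
    if w % 2 or h % 2:
        raise ValueError("side lengths must be even to halve them exactly")
    hw, hh = w // 2, h // 2
    # Assumed numbering: counter-clockwise, starting from the upper right.
    return [
        (x + hw, y, hw, hh),       # first quadrant (upper right)
        (x, y, hw, hh),            # second quadrant (upper left)
        (x, y + hh, hw, hh),       # third quadrant (lower left)
        (x + hw, y + hh, hw, hh),  # fourth quadrant (lower right)
    ]


def dyadic_levels(width: int, height: int) -> List[List[Rect]]:
    """List the rectangles of the complete quadtree, level by level."""
    levels = [[(0, 0, width, height)]]
    # Stop once a side has shrunk to a single pixel.
    while levels[-1][0][2] > 1 and levels[-1][0][3] > 1:
        levels.append([q for r in levels[-1] for q in quadrants(r)])
    return levels
```

For an 8 x 8 image, `dyadic_levels(8, 8)` holds 1, 4, 16 and 64 rectangles at its four levels, and the rectangles of each level tile the whole image.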

Figure 1: Top-down decomposition and the quadtree structure

3.1.1 Stopping Condition

In information theory, entropy is a measure of the uncertainty associated with a random variable [14]. We choose entropy [43] as the stopping condition for the top-down decomposition, since entropy measures the disorder of a data set. Each branch of the quadtree has its own stopping condition; when the condition holds, EDHS stops partitioning that branch. By decreasing the size of the images, the decomposition reduces the information present at each local position. The following definitions introduce the concept of a segment set, on which we base the entropy of an image and the K-Color Rule.

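Definition 1 (Segment set).

Given a partition $P = \{x_0 < x_1 < \dots < x_n\}$ of the interval $[f_{\min}, f_{\max}]$, where $f_{\min}$ and $f_{\max}$ are the minimum and maximum feature values of a given image $I$, the partition induces a segment set $S = \{s_1, \dots, s_m\}$, where each $s_i$ is a set of pixels that form a connected region and whose feature values all lie in a single subinterval of $P$. The segments cover the image and are pairwise disjoint: $I = \bigcup_i s_i$ and $s_i \cap s_j = \emptyset$ for all $i \neq j$.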

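Definition 2 (Entropy).

Given a segment set $S = \{s_1, \dots, s_m\}$ based on a partition $P$, the entropy of $S$ to the base $b$ is

$$H_b(S) = -\sum_{i=1}^{m} p(s_i) \log_b p(s_i),$$

where $p$ denotes the probability mass function of the segment set $S$. To keep the analysis simple, we fix the logarithm base and drop the subscript. This gives

$$H(S) = -\sum_{i=1}^{m} p(s_i) \log p(s_i).$$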

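Theorem 1.

Let $K$ be the number of segments of an image. The supremum of the entropy $H(S)$ is a strictly increasing function of $K$, and this supremum is $\log K$.

Proof. Let $H_K$ denote the entropy of an image with segment set $S$ and $|S| = K$. To show that the supremum of $H_K$ is strictly increasing with respect to $K$, we need to show that $\sup H_K < \sup H_{K+1}$ for any $K \ge 1$.

By [14] Theorem 2.6.4, we have

$$H_K \le \log K,$$

with equality when all $K$ segments are equally probable, so $\sup H_K = \log K$.

Then we have

$$\sup H_K = \log K < \log (K+1) = \sup H_{K+1}.$$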

Definition 3 (K-Color Rule).

Use a different color for each segment in the segment set $S$. If the image holds no more than $K$ segments, which means the image can be covered by $K$ colors, we say the condition ‘K-Color Rule’ (K-CR) is true; otherwise, K-CR is false.

Consider an image $I$ with $N$ pixels. If $I$ is a one-color (1-CR) image, then it is a zero-entropy image by Theorem 1. Consider the opposite case: the image is so complicated that no segment holds more than one pixel. This case gives the maximum entropy, $\log N$ ($K = N$ yields $\sup H(S) = \log N$). The range of the entropy $H(S)$ for such an image is therefore $[0, \log N]$. The larger the entropy, the more information the image contains.

Based on this observation, we choose image entropy as the stopping condition for the top-down decomposition, because the entropy $H(S)$ is closely related to the number of segments. Let $K$ denote the number of segments; for an image with $N$ pixels its range is $1 \le K \le N$. If a proper value of $K$ is chosen for a given image, then the stopping condition becomes $H(S) \le \log K$ by Theorem 1. That is, for a given branch of the quadtree, the decomposition keeps partitioning the image until the local entropy is no larger than $\log K$.
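The entropy of a segment set and the resulting stopping test can be illustrated with a short Python sketch. It assumes that the segments are already known and are given by their pixel counts, and it uses base-2 logarithms by default. Both the base and the function names are assumptions made for this illustration.

```python
import math
from typing import Sequence


def segment_entropy(sizes: Sequence[int], base: float = 2.0) -> float:
    """Entropy of a segment set, given the pixel count of each segment."""
    total = sum(sizes)
    return -sum((n / total) * math.log(n / total, base) for n in sizes if n > 0)


def entropy_supremum(k: int, base: float = 2.0) -> float:
    """Largest entropy that an image with k segments can have (Theorem 1)."""
    return math.log(k, base)


def should_stop(sizes: Sequence[int], k: int, base: float = 2.0) -> bool:
    """Stopping test: the local entropy is no larger than log K."""
    # A small tolerance absorbs floating-point rounding at the boundary.
    return segment_entropy(sizes, base) <= entropy_supremum(k, base) + 1e-12
```

Three equally sized segments reach the supremum $\log 3$ exactly, so `should_stop([5, 5, 5], 3)` holds, whereas four equally sized segments exceed it and `should_stop([5, 5, 5, 5], 3)` does not.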

3.1.2 K as An Algorithm Tuning Parameter

The value of $K$ affects the depth of the decomposition. A small value of $K$ results in a deep quadtree because the entropy bound $\log K$ is small. Small leaves do not contain much information, so few boundaries fall within them. $K$ is therefore a key parameter, since it decides the relative weights of the edge-based and region/clustering-based segmentation techniques used in EDHS. In other words, $K$ indicates the degree to which each technique is used. Figure 2 shows that $K$ can be viewed as a sliding block ranging from $1$ to the number of pixels in the image. If $K$ is close to $1$, EDHS is closer to a region/cluster-based segmentation method, since few boundaries are detected in the leaves. The weight of the edge-based technique increases as the value of $K$ grows.

Figure 2: Sliding block and entropy measurement

Suppose $K = 1$; then the stopping condition becomes $H(S) \le \log 1 = 0$. To meet this condition, the decomposition goes down to the pixel level wherever neighboring pixels are inhomogeneous. EDHS is then a pure region/cluster-based segmentation, since there is no need to detect boundaries in one-color images.

Suppose $K = N$, the number of pixels in the image; then the stopping condition becomes $H(S) \le \log N$. No decomposition is performed, since this bound holds for any image by Theorem 1. EDHS is then a pure edge-based segmentation: it just runs an edge detector to locate the boundaries that form the local regions.

For an image with $N$ pixels, the possible values of $K$ range from $1$ to $N$. Are all the integers from $1$ to $N$ valid choices for $K$? The answer is no. Let us take a close look at the cases $K = 1$ and $K = 2$.

Points, lines and regions are the three essential elements of a two-dimensional image plane. We are looking for a value of $K$ that can efficiently recognize lines and regions (we treat a point as noise if it is inhomogeneous with its neighbors). Keep in mind that the aim of the decomposition is to reduce disorder, which suggests that $K$ should be a small integer. When $K = 1$, as discussed above, the leaves are forced to be one color and EDHS becomes a pure region/cluster-based segmentation. Previous algorithms of this type have proved to be computationally expensive [46].

Consider $K = 2$. The decomposition keeps partitioning the image until the local entropy is no larger than $\log 2$, which tends to force the leaves to hold no more than two colors. Suppose a line passes through the sub-images. To recognize this line, one of the local regions in the leaves must be the line or part of it. This makes the leaves quite small and forces the decomposition to go down to pixel level. Under these circumstances, the time and space costs are high. Another fatal drawback is that small leaves make EDHS sensitive to noise: even a single pixel that is inhomogeneous with its neighbors can cause invalid recognition in the surrounding area.

$K = 3$ is a good choice because it can efficiently recognize lines and regions in the 2D plane. An example is shown in Figure 3 (a). With $K = 2$, the top-down decomposition goes down to pixel level to locate the curve. With $K = 3$, no decomposition is needed, since the entropy of Figure 3 (a) cannot exceed $\log 3$ by Theorem 1. This suggests that EDHS is stable and reliable when $K = 3$.

Figure 3: Lines in images

There is an extreme (worst) case that we need to consider, shown in Figure 3 (b). Multiple lines pass through one point, say $p$. Define a closed ball $B(p, r)$, where $p$ is the center and $r$ is the radius of the ball. No matter how small $r$ is, $B(p, r)$ contains all the segments separated by these lines. In other words, partitioning does not help reduce the number of segments in the area $B(p, r)$. Therefore, decomposing the image into small pieces does not decrease the entropy inside $B(p, r)$. To handle this case, we introduce ‘chaos’ leaves.

As shown in Figure 3 (b), the entropy inside $B(p, r)$ does not decrease as the decomposition proceeds. To solve this problem, we introduce a chaos threshold, which is the smallest allowed size of a leaf. If the size of a partition is less than this threshold, the top-down decomposition does not continue even though the desired entropy has not been reached. We call such leaves chaos.

3.1.3 Approximate Image Entropy

To calculate the entropy defined by Definition 2, we need to know the probability distribution over the set of segments of the image. In most cases, we have no prior knowledge of the distribution of the segments of a given image. In other words, we are not able to compute the image entropy of Definition 2 directly. Definition 4 gives an alternative calculation called approximate image entropy, which does not require any prior knowledge of the distribution of the segments but provides an approximate entropy value.

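Definition 4 (Approximate entropy).

Given a partition $P = \{x_0 < x_1 < \dots < x_n\}$ of the interval $[f_{\min}, f_{\max}]$, where $f_{\min}$ and $f_{\max}$ are the minimum and maximum feature values of a given image $I$, the approximate entropy to the base $b$ is defined as

$$\tilde{H}_b(V) = -\sum_{j=1}^{n} p(v_j) \log_b p(v_j),$$

where $p$ denotes the probability mass function of the feature value set $V = \{v_1, \dots, v_n\}$. Each $v_j$ denotes the collection of pixels whose feature values lie in the $j$-th subinterval of $P$, and $v_i \cap v_j = \emptyset$ for $i \neq j$. After fixing the logarithm base, this yields

$$\tilde{H}(V) = -\sum_{j=1}^{n} p(v_j) \log p(v_j).$$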

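Theorem 2.

Given an image $I$ and a partition $P$, the approximate entropy $\tilde{H}(V)$ is less than or equal to the entropy $H(S)$.

Proof. Let $I$ be an image and $P$ a partition of $[f_{\min}, f_{\max}]$, where $f_{\min}$ and $f_{\max}$ are the minimum and maximum pixel feature values of $I$. Let $m$ be the number of segments in the segment set used in Definition 2, and let $n$ be the number of subintervals of $P$ that contain pixels. By the definition of the segment set, $m$ must be greater than or equal to $n$, because every segment lies in exactly one subinterval. Two cases need to be considered: $m = n$ and $m > n$.

Case I: if $m = n$, then every $v_j$ is a single segment, so by Definition 2 and Definition 4, $\tilde{H}(V) = H(S)$, which implies $\tilde{H}(V) \le H(S)$.

Case II: if $m > n$, there must exist at least two segments whose feature values lie in the same subinterval. Without loss of generality, assume $m = n + 1$, with two segments $s_n$ and $s_{n+1}$ whose feature values both lie in the same subinterval, so that $v_n = s_n \cup s_{n+1}$ and $v_j = s_j$ for $j < n$. If we can prove $\tilde{H}(V) \le H(S)$ in this setting, the theorem follows by repeating the argument as many times as needed.

By Definition 2 and 4 respectively, writing $p_n = p(s_n)$ and $p_{n+1} = p(s_{n+1})$ so that $p(v_n) = p_n + p_{n+1}$,

$$H(S) - \tilde{H}(V) = -p_n \log p_n - p_{n+1} \log p_{n+1} + (p_n + p_{n+1}) \log (p_n + p_{n+1}) = p_n \log \frac{p_n + p_{n+1}}{p_n} + p_{n+1} \log \frac{p_n + p_{n+1}}{p_{n+1}} \ge 0,$$

since the argument of each logarithm is at least one.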

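Assume an image $I$ for which K-CR is true. This implies that the entropy satisfies $H(S) \le \log K$ by Theorem 1. By Theorem 2, the approximate entropy $\tilde{H}(V)$ must be less than or equal to $H(S)$. This yields the logical chain: K-CR is true $\Rightarrow$ $H(S) \le \log K$ $\Rightarrow$ $\tilde{H}(V) \le \log K$. Both $H(S) \le \log K$ and $\tilde{H}(V) \le \log K$ are necessary but not sufficient conditions for the truth of K-CR.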
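The approximate entropy needs only a histogram of feature values, which the following Python sketch makes concrete. It assumes a single scalar feature per pixel, a partition into equal-width intervals, and base-2 logarithms; the definition itself allows any partition, and the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def approximate_entropy(values: Iterable[float], lo: float, hi: float,
                        bins: int, base: float = 2.0) -> float:
    """Entropy of the histogram of feature values over equal-width bins."""
    if hi <= lo:
        return 0.0  # a single feature value carries no disorder
    width = (hi - lo) / bins
    counts: Counter = Counter()
    n = 0
    for v in values:
        # Values equal to `hi` are placed in the last bin.
        counts[min(int((v - lo) / width), bins - 1)] += 1
        n += 1
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())


# A 4 x 4 image: a bright stripe separates two dark regions of equal value.
pixels = [10, 200, 10, 10] * 4
h_approx = approximate_entropy(pixels, 0, 255, bins=8)
```

The image in this example has three connected segments of 4, 4 and 8 pixels, so its segment entropy is 1.5 bits. The histogram merges the two dark regions into one bin, and the approximate entropy is about 0.81 bits, in line with Theorem 2.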

3.1.4 Noise Segments

The term noise of an image usually refers to unwanted color or texture information. We do not count a small drop of ink on an A4 paper as a valid segment. In most circumstances, it would be considered as noise. How to distinguish those valid and invalid segments is an important issue.

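Definition 5 (Dominant and noise segments).

Let $p$ denote the probability mass function of the segment set $S$ of an image $I$. Given a threshold $\epsilon$, a segment $s_i$ is a noise segment if $p(s_i) < \epsilon$ and the total probability of all segments below the threshold, $\sum_{p(s_j) < \epsilon} p(s_j)$, is much less than $1$. All other segments are called dominant segments.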

If some segments are small enough and their total area occupies only a small portion of a given image, we call them noise segments. The first requirement is natural because noise segments should be small. The second condition excludes images that are composed entirely of small pieces.

The value of $K$ in the K-CR of Definition 3 refers to the number of dominant segments. When noise segments are present, by Theorems 1 and 2 the supremum of the approximate entropy for the image is no longer $\log K$; the noise supremum should be slightly larger than $\log K$. Denoting the noise redundancy by $\Delta$, the redundancy stopping condition becomes $\tilde{H} \le \log K + \Delta$, where $\tilde{H}$ is the approximate entropy.

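Consider dividing the segments into two groups, noise segments and dominant segments. By Definition 4, the approximate entropy $\tilde{H}$ splits into two sums:

$$\tilde{H} = -\sum_{i \in \text{dominant}} p_i \log p_i - \sum_{j \in \text{noise}} p_j \log p_j.$$

Given an image $I$, let $\lambda$ be the total portion of the dominant segments. Then $\lambda = \sum_{i \in \text{dominant}} p_i$, and the remaining area, $1 - \lambda$, is the portion of the noise segments. By Definition 5, $1 - \lambda$ is small. Let $K$ and $n$ be the numbers of dominant and noise segments respectively. Applying Theorem 2.6.4 of [14] to each group, we obtain the noise supremum of $\tilde{H}$:

$$\sup \tilde{H} = -\lambda \log \frac{\lambda}{K} - (1 - \lambda) \log \frac{1 - \lambda}{n}.$$

The noise redundancy is $\Delta = \sup \tilde{H} - \log K$. The redundancy stopping condition, $\tilde{H} \le \log K + \Delta$, therefore becomes $\tilde{H} \le -\lambda \log \frac{\lambda}{K} - (1 - \lambda) \log \frac{1 - \lambda}{n}$.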

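The following example illustrates the noise redundancy. Consider a 3-CR application with a fixed portion of dominant segments and a fixed number of noise segments. The redundancy stopping threshold for 3-CR is then slightly greater than $\log 3$, and the difference between the two is the noise redundancy.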
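A small Python sketch shows how such a threshold could be computed. It assumes that the noise supremum is reached when the dominant portion is spread evenly over the dominant segments and the noise portion evenly over the noise segments, which is what the uniform-distribution bound of [14] gives for each group. The parameter values in the example (10 noise segments, a dominant portion of 0.95, base-2 logarithms) are hypothetical and chosen only for illustration.

```python
import math


def noise_supremum(k: int, n_noise: int, dominant_portion: float,
                   base: float = 2.0) -> float:
    """Upper bound on the entropy with k dominant and n_noise noise segments."""
    lam = dominant_portion
    if not 0.0 < lam < 1.0 or k < 1 or n_noise < 1:
        raise ValueError("need 0 < dominant_portion < 1, k >= 1, n_noise >= 1")
    # Each group contributes most when its mass is spread evenly.
    dominant = -lam * math.log(lam / k, base)
    noise = -(1.0 - lam) * math.log((1.0 - lam) / n_noise, base)
    return dominant + noise


def noise_redundancy(k: int, n_noise: int, dominant_portion: float,
                     base: float = 2.0) -> float:
    """Amount by which the noise supremum exceeds log K."""
    return noise_supremum(k, n_noise, dominant_portion, base) - math.log(k, base)


# Hypothetical 3-CR setting: 10 noise segments covering 5% of the image.
threshold = noise_supremum(3, 10, 0.95)
redundancy = noise_redundancy(3, 10, 0.95)
```

With these illustrative values the threshold lies above $\log_2 3$, so a leaf holding three dominant segments and a little noise still satisfies the stopping condition.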

We summarize the top-down decomposition in Algorithm 1 and show some examples of the decomposition in Figure 4 for different stopping conditions.

input : $I$: an image
output : $T$: a decomposition quadtree
if size of $I$ < chaos threshold then
       // current $I$ is chaos
       Create a chaos leaf for $I$ and generate a feature descriptor for $I$.
else
       if entropy of $I$ exceeds the stopping condition then
             Partition $I$ into four partitions: $I_1$, $I_2$, $I_3$, $I_4$;
             Append $I_1$, $I_2$, $I_3$ and $I_4$ as children of $I$ in $T$ and decompose each of them recursively;
       else
             Locate the local regions by detecting the boundaries within $I$;
             Create a non-chaos leaf and generate a feature descriptor for each local region;
Algorithm 1 Topdowndecomposition
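The control flow of Algorithm 1 can be sketched in Python as follows. This is a simplified illustration under several assumptions: the image is a square grayscale grid whose side is a power of two, the entropy is the histogram-based approximate entropy with 8 equal bins and base-2 logarithms, and the edge detection inside non-chaos leaves is omitted, with the block mean standing in for the feature descriptor. All names and default values are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import List

Image = List[List[int]]  # grayscale rows with values in 0..255


@dataclass
class Node:
    x: int
    y: int
    size: int
    kind: str = "internal"  # "internal", "chaos" or "non-chaos"
    mean: float = 0.0       # stand-in for the feature descriptor
    children: List["Node"] = field(default_factory=list)


def approx_entropy(values: List[int], bins: int = 8) -> float:
    """Approximate entropy of a block from its gray-level histogram."""
    counts = Counter(v * bins // 256 for v in values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def decompose(img: Image, x: int, y: int, size: int, k: int = 3,
              chaos_size: int = 2, redundancy: float = 0.0) -> Node:
    """Build the decomposition quadtree for the block at (x, y)."""
    values = [img[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    node = Node(x, y, size, mean=sum(values) / len(values))
    if size < chaos_size:
        node.kind = "chaos"  # too small to split further
    elif approx_entropy(values) > math.log2(k) + redundancy:
        half = size // 2
        node.children = [decompose(img, cx, cy, half, k, chaos_size, redundancy)
                         for cy in (y, y + half) for cx in (x, x + half)]
    else:
        node.kind = "non-chaos"  # an edge detector would run on this leaf
    return node


def leaves(node: Node) -> List[Node]:
    """Collect the leaves of the quadtree from left to right."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]
```

Calling `decompose(img, 0, 0, len(img))` returns the root of the quadtree; `leaves` then yields the chaos and non-chaos leaves on which the bottom-up composition operates.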


Figure 4: Quarter decomposition by different stopping conditions

3.2 Bottom-up Composition

Bottom-up composition plays a central role in VHBS, since it combines the local regions at the leaves of the quadtree to form the set of initial segments. It also calculates the probabilities of the boundaries between these initial segments. At the same time, the bottom-up composition generates the feature descriptor of each initial segment by combining the feature descriptors of its local regions.

The probabilities of the boundaries between the initial segments are computed by two filters, which are designed according to visual hint rules (i) and (ii) respectively. The first, called the scale filter, abides by rule (i): the probabilities are measured by the lengths of the boundaries, and longer boundaries result in higher probabilities. The second, called the similarity filter, abides by rule (ii): the probabilities are measured by the differences between two adjacent regions, and larger feature differences between adjacent regions result in higher boundary probabilities. The final weight of a boundary is a trade-off between the two filters, obtained by taking the product of their outputs. If the probability of the boundary between two local regions is zero, the two local regions are combined.
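The merging rule can be sketched in Python with a union-find structure. The sketch assumes that the two filter outputs are already available for each boundary, that local regions are identified by integers, and that when several boundaries separate the same pair of initial segments the largest weight is kept. That last choice, like the function name `compose`, is an assumption made only for this illustration.

```python
from typing import Dict, Iterable, List, Tuple

Boundary = Tuple[int, int, float, float]  # (region_a, region_b, scale, similarity)


def compose(n_regions: int, boundaries: Iterable[Boundary]):
    """Merge regions across zero-weight boundaries; return segments and weights."""
    parent = list(range(n_regions))

    def find(i: int) -> int:
        # Path halving keeps the trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # The weight of a boundary is the product of the two filter outputs.
    weighted = [(a, b, fs * fm) for a, b, fs, fm in boundaries]
    for a, b, w in weighted:
        if w == 0.0:
            parent[find(a)] = find(b)

    segments: Dict[int, List[int]] = {}
    for r in range(n_regions):
        segments.setdefault(find(r), []).append(r)

    probs: Dict[Tuple[int, int], float] = {}
    for a, b, w in weighted:
        ra, rb = find(a), find(b)
        if ra != rb:
            key = (min(ra, rb), max(ra, rb))
            probs[key] = max(probs.get(key, 0.0), w)  # assumed combination rule
    return segments, probs
```

Each value of `segments` lists the local regions that make up one initial segment, and `probs` maps each pair of adjacent initial segments to its boundary weight.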

3.2.1 Scale Filter

Scale filter is defined based on the visual hint (i): the global scale boundaries tend to be the real boundaries of the objects. It suggests that these boundaries caused by the local texture are not likely to be the boundaries of our interesting objects because the objects with large size are more likely to be our interesting objects. To measure the relative length of each boundary, we use the sizes of the decomposition partitions in which the objects are fully located. These local scale boundaries are not likely to extend to a number of partitions since the length of these boundaries are short. By this observation, we define the scale filter based on the sizes of the partitions.

Scale filter is a function which calculates the confidence of the boundary based on the scale observations. The input parameter of is the relative scale descriptor , which is ratio between the sizes of the local partitions and the original images. The relative scale descriptor is the measurement of the relative scale of the boundaries. We assume the sizes of the images are the length of the longer sides. An example of calculating scale descriptor is shown in Figure 5. The boundaries inside the marked partition have the relative scale descriptor , which is defined as:


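$$ s = \frac{\text{size of the partition}}{\text{size of the original image}} \qquad (5) $$

where the numerator is the size of a partition and the denominator is the size of the original image.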

Figure 5: The relative scale descriptor
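The computation of the relative scale descriptor can be sketched in a few lines of Python. This is a minimal illustration that assumes partitions and images are described by (height, width) tuples; the function name `relative_scale_descriptor` is hypothetical and not taken from the paper.

```python
from typing import Tuple


def relative_scale_descriptor(partition_shape: Tuple[int, int],
                              image_shape: Tuple[int, int]) -> float:
    """Ratio of partition size to image size (hypothetical helper).

    Following the text, the size of a partition or image is taken to be
    the length of its longer side.
    """
    partition_size = max(partition_shape)
    image_size = max(image_shape)
    return partition_size / image_size


# A 64 x 64 partition inside a 256 x 512 image.
s = relative_scale_descriptor((64, 64), (256, 512))
```

A partition that covers the whole image yields a descriptor of one, and smaller partitions yield proportionally smaller values.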

Based on visual hint (i), the scale filter must be a strictly increasing function of s, and its range lies within [0, 1]. If s is small, the confidence of the boundary should be small, since the boundary is confined to a small area. If s is close to one, the boundary has high confidence. The gradient of the scale filter is decreasing, because human perception does not depend linearly on the relative scale descriptor: for the same difference in length, human perception is more sensitive when both boundaries are short than when both are long. Therefore we define the scale filter as follows:


Here the scale filter is parameterized by the scale damping coefficient and takes the relative scale descriptor s as its input. Figure 6 shows the scale filter for different scale damping coefficients.

Figure 6: Scale filter with different scale damping coefficients.
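The requirements placed on the scale filter (strictly increasing in s, with a decreasing gradient) can be checked numerically for any candidate function. The Python sketch below does this on a uniform grid. The candidate `power_filter` is a hypothetical stand-in chosen only because it meets these requirements; it is not the paper's definition of the scale filter. The names `satisfies_scale_requirements` and `damping`, and the assumption that outputs lie in [0, 1], are likewise illustrative.

```python
from typing import Callable


def power_filter(s: float, damping: float = 2.0) -> float:
    """Hypothetical candidate, NOT the paper's filter: s ** (1 / damping)."""
    return s ** (1.0 / damping)


def satisfies_scale_requirements(f: Callable[[float], float],
                                 steps: int = 100) -> bool:
    """Check on a grid over (0, 1] that f is increasing with a shrinking slope."""
    grid = [k / steps for k in range(1, steps + 1)]
    values = [f(s) for s in grid]
    # Assumed range: confidences are treated as values in [0, 1].
    if any(v < 0.0 or v > 1.0 for v in values):
        return False
    slopes = [b - a for a, b in zip(values, values[1:])]
    increasing = all(d > 0 for d in slopes)
    flattening = all(d2 < d1 for d1, d2 in zip(slopes, slopes[1:]))
    return increasing and flattening


ok = satisfies_scale_requirements(power_filter)
```

A convex function such as the square of s fails the check because its slope grows with s, which would make perception more sensitive to long boundaries than to short ones.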

3.2.2 Similarity Filter

Compared with regions of similar colors or textures, regions with markedly different features make a stronger impression on human perception. Visual hint (ii) suggests that two adjacent regions with different colors or textures tend to produce a high-confidence boundary between them. Based on this observation, we define a similarity filter to filter out the boundary signals that pass through similar regions.

The similarity filter is a function that calculates the confidence of a boundary based on the similarity measurement between two adjacent regions. Examples of similarity measurements are Dice, Jaccard, Cosine, and Overlap. The similarity measurement is a real number in the interval [0, 1]. The higher the value, the more similar the two regions are: one implies that they are exactly the same and zero means they are totally different. Based on visual hint (ii), a small similarity measurement should result in a high-confidence boundary. The similarity filter is therefore a strictly decreasing function of the similarity measurement, with its range inside [0, 1]. Human perception is not linearly related to the similarity measure; it is sensitive to regions with obviously different colors or textures. For example, two small similarity values make little difference to human vision, because in both cases the regions are obviously different. This fact suggests that the gradient of the similarity filter is decreasing over its domain. We then define the similarity filter as follows to satisfy the requirements above.


Here the similarity filter is parameterized by the similarity damping coefficient, and its input is the similarity measurement between two adjacent regions, computed from the feature descriptors of the two local regions. Figure 7 gives the curves of the similarity filter for different similarity damping coefficients.

Figure 7: Similarity filter with different similarity damping coefficients.
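The similarity measurements named in this section (Dice, Jaccard, Cosine, Overlap) have standard definitions, sketched below in Python. The text does not say how feature descriptors are represented, so this sketch assumes non-empty sets for Dice, Jaccard, and Overlap, and non-negative, non-zero vectors for Cosine. Under those assumptions every measure lies in [0, 1], with one for identical descriptors and zero for disjoint or orthogonal ones.

```python
import math
from typing import Sequence, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Size of the intersection over size of the union."""
    return len(a & b) / len(a | b)


def dice(a: Set[int], b: Set[int]) -> float:
    """Twice the intersection size over the sum of the set sizes."""
    return 2 * len(a & b) / (len(a) + len(b))


def overlap(a: Set[int], b: Set[int]) -> float:
    """Intersection size over the size of the smaller set."""
    return len(a & b) / min(len(a), len(b))


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


a, b = {1, 2, 3}, {2, 3, 4}
scores = (jaccard(a, b), dice(a, b), overlap(a, b))
```

For the same pair of descriptors, Jaccard is never larger than Dice, and Dice is never larger than Overlap, so the choice of measure shifts where a fixed similarity threshold takes effect.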

We implemented VHBS using the filter given by equation 6 and found that the boundary signals are over-damped by the similarity filter. The algorithm assigns low confidence values to global-scale boundaries whose adjacent regions have similar feature descriptors, yet human vision is also sensitive to these boundaries. To avoid such cases, we redesign the similarity filter to take the relative scale descriptor into account as well, so that global-scale boundaries can still receive high confidence weights. We also introduce a threshold similarity. If the similarity measurement is higher than this threshold, the algorithm sets the confidence to zero, which means that there is no boundary between two adjacent regions when human vision cannot tell them apart.


Here the parameters are the similarity damping coefficient, the amplitude modulation, and the similarity measurement between two feature descriptors. Figure 8 demonstrates the redefined similarity filter for one choice of these parameters.

Figure 8: Similarity filter with and .

3.2.3 Partition Combination

The bottom-up composition starts from the bottom leaves. The composition process iteratively combines partitions from the next lower layer and continues until it reaches the root of the quadtree.

For the leaves of the quadtree, each boundary is marked by a confidence value, given by the product of the scale and similarity filters defined by equations 6 and 8 respectively. The relative scale descriptor s is computed by equation 5, and the second input is the similarity of two adjacent local segments. Figure 9 demonstrates the resulting confidence function.

Figure 9: with , and .

For non-chaos leaves, the contours that form closed areas, together with the borders of the leaves, form the boundaries of local regions. For chaos leaves, the boundaries are only the leaf borders. In each pass of the bottom-up combination, four leaves are combined into the next lower layer. Figure 10 shows the possible cases that arise when combining four leaves:

  1. No interconnection happens during the combination;

  2. A new segment is formed by connecting several local regions which locate in different leaves;

  3. The boundaries of leaves happen to be the boundaries of segments.

Figure 10: Three cases of partition combination.

In case (i), such as region A in Figure 10, the boundary confidence is calculated while the region is in the leaf, and there is no need to recalculate it during the combination. In case (ii), such as region B, the region is formed by connecting four local regions, each of whose confidence values was calculated separately. The confidence of region B must be recalculated after the four local regions are combined, since the new segment lies in a larger partition and its relative scale descriptor has increased. In addition, a feature descriptor for region B is generated from the feature descriptors of the local regions. In case (iii), the boundaries of partitions happen to be the boundaries of the regions; one example is region C in Figure 10. During the combination process, the algorithm also calculates the confidence of the leaf boundaries. After the combination reaches the root of the quadtree, the process has generated the initial segment set, the boundary confidence set, whose elements give the boundary probability between pairs of segments, and the feature descriptor set. Algorithm 2 gives the iterative bottom-up composition.


input : a decomposition quadtree
        the root of the quadtree
output : the initial segment set
         the boundary confidence set
         the feature descriptor set
node ← read a node of the quadtree;
if node has children then
       read the four children of node;
       // Combine partitions. Recompute the boundary confidence if needed
else
       // This node is a leaf of the quadtree
       img ← computelocalcnf(node);
return img;
Algorithm 2 Bottomupcomposition


4 Hierarchical Probability Segmentation

Hierarchical segmentation is a widely used technique for image segmentation. Regular hierarchical segmentation is modeled in layer-built structures. In contrast, Hierarchical Probability Segmentation (HPS) presents the hierarchical segmentation as a Probability Binary Tree (PBT), in which the links are weighted by the confidence values. The root represents an image. Nodes represent segments, and the children of a node are its sub-segments. The initial segments compose the leaves of the PBT. Since the PBT is generated in a greedy manner, higher-level nodes always have higher probabilities than lower-level nodes. One can visualize the PBT with an arbitrary number of segments, bounded by the number of initial segments.

4.1 Probability Binary Tree (PBT)

Definition 6 (Probability Binary Tree (PBT)).

Let denote the root of a PBT and represents the original images . Nodes of PBT denote the segments and links represent the relationship of inclusion. Assume nodes , , and links , , where and are children of linked by and respectively. , , , , preserve the following properties:

  1. Let denote the set of all the possible pairs of sub-segments of . Function gives the of segments and . Assume and . Therefore are weighted by ;

  2. for any element of , , and .

Definition 6 recursively gives the definition of PBT, which has the following properties:

  1. Every PBT node (except root) is contained in exactly one parent node;

  2. every PBT node (except the leaves) is spanned by two child nodes;

  3. a node may be spanned by a number of candidate pairs of nodes. Each candidate pair is labeled with the boundary probability between its two nodes. The PBT chooses the pair with the highest probability as the children of the node, and the two links to these children are weighted by this probability;

  4. assume a node has a link pointing in from its parent and two links pointing out to its children. The weights of the two outgoing links must be no larger than the weight of the incoming link;

  5. if two nodes (segments) overlap, one of them must be a child of the other.

By presenting the segmentation as a PBT, the image is recursively partitioned into the two segments with the highest probability among all possible pairs of segments. Figure 11 gives an example of a PBT.

Figure 11: Probability Binary Tree.

Consider the set of boundary probabilities, in which each element is the boundary probability between two initial segments. HPS constructs the PBT in a bottom-up manner: the leaves are created first and the root is the last node created. To generate the PBT greedily, the set of boundary probabilities is sorted in ascending order. Algorithm 3 describes how to generate a PBT from this sorted sequence.


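input : the set of boundary probabilities, sorted in ascending order
output : a probability binary tree, PBT
while the sorted set is not empty do
       read the first element of the sorted set and remove it from the set;
       read the index of the first segment from this element;
       read the index of the second segment from this element;
       if the first segment does not exist in the PBT then
              create a node for the first segment in the PBT;
              read the corresponding initial segment;
       if the second segment does not exist in the PBT then
              create a node for the second segment in the PBT;
              read the corresponding initial segment;
       create a new node, which is the parent of the two segment nodes;
       create the links from the new node to the two segment nodes, weighted by the boundary probability of this element;
       replace the indices of the two segments in the current sorted set by the index of the new node and remove the duplicate elements;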
Algorithm 3 Generating a probability binary tree.
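The steps of Algorithm 3 can be sketched in Python as follows. This is an illustrative reading rather than the authors' implementation: the names `PBTNode` and `build_pbt` are hypothetical, boundary probabilities are assumed to arrive as a dictionary keyed by pairs of distinct integer segment indices, and the segment adjacency is assumed to be connected so that a single root results. The text does not say which probability survives when replacing indices creates duplicate pairs; the sketch keeps the smaller one, which is the first occurrence in ascending order.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class PBTNode:
    name: int
    left: Optional["PBTNode"] = None
    right: Optional["PBTNode"] = None
    link_weight: Optional[float] = None  # weight of the links to the children


def build_pbt(boundaries: Dict[Tuple[int, int], float]) -> PBTNode:
    """Greedy bottom-up construction; assumes a non-empty, connected input."""
    work = {frozenset(pair): p for pair, p in boundaries.items()}
    nodes: Dict[int, PBTNode] = {}
    next_id = max(i for pair in boundaries for i in pair) + 1
    root = None
    while work:
        pair = min(work, key=work.get)  # first element in ascending order
        p = work.pop(pair)
        i, j = sorted(pair)
        for k in (i, j):  # create leaf nodes on first use
            nodes.setdefault(k, PBTNode(k))
        parent = PBTNode(next_id, nodes[i], nodes[j], p)
        nodes[next_id] = parent
        # Replace i and j by the parent index; keep the smaller duplicate (assumption).
        merged: Dict[frozenset, float] = {}
        for other, q in work.items():
            renamed = frozenset(next_id if k in (i, j) else k for k in other)
            merged[renamed] = min(q, merged.get(renamed, q))
        work = merged
        root = parent
        next_id += 1
    return root


tree = build_pbt({(0, 1): 0.2, (1, 2): 0.5, (2, 3): 0.1, (0, 3): 0.9})
```

Because the lowest remaining probability is consumed first, every parent created later carries a link weight at least as large as those of its descendants, which matches property 4 of the PBT. The repeated minimum search keeps the sketch short; a priority queue would be the natural choice for large inputs.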


4.2 Visualization of the Segmentation

Generally, there are two ways to visualize the segments: threshold-based visualization and number-based visualization. As discussed in the previous section, the root of the PBT represents the original image. For the other nodes, the shallower the position, the coarser the segment. For example, in Figure 11, visualizing the image as two segments amounts to combining three of the initial segments into one segment and the remaining two initial segments into the other.

Suppose a threshold is selected for visualization. By the properties of the PBT, the weights of the links are the probabilities of the segments, and the weights decrease as the depth increases. Given a threshold, threshold-based visualization displays only the segments whose link weights are greater than the threshold.

The number-based visualization displays a given number of segments. Its implementation is straightforward: the algorithm sorts the nodes in descending order of link weight and picks the first nodes, up to the requested number, to display.

5 Algorithm Complexity Analysis

Since the algorithm is divided into two stages, EDHS and HPS, we discuss the computational complexity of them separately.

5.1 EDHS Computational Complexity

Assume the depth of the quadtree generated by top-down decomposition is and the depth of the root is zero. The maximum is , where is the size of original images and is the chaos threshold. Depending on the different images and the chosen stopping condition , decomposition process generates an unbalance quadtree with depth of . To analysis the complexity of the decomposition, we assume the worst cases that the images are fully decomposed. It implies that depth of all the leaves is . At the th level of the quadtree, there are numbers of nodes and the size of each node is . Then the running time of computing the stopping condition of the ith depth is and the total running time of decomposition is . Commonly, the time complexity of an edge detector is such as Canny Edge Detection [6]. Plus the time complexity of generating the feature descriptors . The running time of top-down decomposition is , which gives the time complexity of top-down decomposition .

The combination process starts from the leaves to calculate the boundary confidence by , which gives the running time for each leaf (leaf size is ) and total running time is since there are numbers of leaves totally. At the th level of the quadtree, the composition process combines the four quadrants into one, which gives the running time and total running time of the th depth is . Then the total running time of bottom-up composition is . It can be proved when is large enough, term of dominates the running time. It gives the time complexity . Then the time complexity of the EDHS is , where is the depth of the quadtree and mn is the size of the input images.

5.2 HPS Computational Complexity

To make the analysis simple, we assume maximum is (under the worst situation.) It suggests that the maximum number leaves is