 # Hierarchy construction schemes within the Scale set framework

Segmentation algorithms based on an energy minimisation framework often depend on a scale parameter which balances a fit to data and a regularising term. Irregular pyramids are defined as a stack of successively reduced graphs. Within this framework, the scale is often defined implicitly as the height in the pyramid. However, each level of an irregular pyramid cannot usually be readily associated with the global optimum of an energy or with a global criterion on the base level graph. This last drawback is addressed by the scale set framework designed by Guigues. The methods designed by this author make it possible to build a hierarchy and to design cuts within this hierarchy which globally minimise an energy. This paper studies the influence of the construction scheme of the initial hierarchy on the resulting optimal cuts. We propose one sequential and one parallel method, each with two variations. Our sequential methods provide partitions near the global optima, while our parallel methods require less execution time than the sequential method of Guigues, even on sequential machines.

## 1 Introduction

Despite considerable effort and significant progress in recent years, image segmentation remains a notoriously challenging computer vision problem. It is usually a preliminary step towards image interpretation and plays a major role in many applications.

The use of an energy minimisation scheme within the region based segmentation framework makes it possible to define criteria which should be globally optimised over a partition. Several types of methods such as the level set [1], the Bayesian [2], the minimum description length [3] and the minimal cut [4] frameworks are based on this approach. Within these frameworks the energy of a partition $P$ is usually defined as $E_\lambda(P) = D(P) + \lambda C(P)$, where $D$ and $C$ denote respectively the fit to data and the regularising term. The energy $E_\lambda(P)$ corresponds to the Lagrangian of the constrained problem: minimise $D(P)$ subject to an upper bound on $C(P)$, where the bound is a function of $\lambda$. Under broad assumptions, minimising $E_\lambda(P)$ is also equivalent to the dual problem: minimise $C(P)$ subject to an upper bound on $D(P)$, where this bound is also a function of $\lambda$. Therefore $\lambda$ may be interpreted as the amount of freedom allowed to minimise $C(P)$ while keeping $D(P)$ as low as possible. Since the bound on $D(P)$ is an increasing function of $\lambda$, as $\lambda$ grows the constraint on $D$ is progressively relaxed while the term $C$ becomes more and more important. This parameter may thus be interpreted as a scale parameter which represents the relative weighting between the two energy terms.

In many approaches the parameter $\lambda$ is fixed experimentally and a minimisation algorithm determines, for a given value of $\lambda$, a locally optimal partition from the set of all the possible partitions on the image $I$. A sequence of values of $\lambda$ may also be defined a priori in order to compute the optimal partition for each sampled value of $\lambda$.

The scale set framework proposed by Guigues [5] is based on a different approach. Instead of performing the minimisation scheme on the whole set of possible partitions of an image $I$, Guigues proposes to restrict the search to a hierarchy $H$. The advantages of this approach are twofold. Firstly, as shown by Guigues, the globally optimal partition on $H$ may be found efficiently, whereas the search on the whole set of partitions only provides local minima. Secondly, Guigues showed that if the energy satisfies some basic properties, the whole set of solutions on $H$ when $\lambda$ describes $\mathbb{R}^+$ corresponds to a sequence of increasing cuts within the hierarchy, thereby providing a contiguous representation of the solutions for the parameter $\lambda$. A method to build the hierarchy has been proposed by Guigues. Since the search space used by Guigues is restricted to the initial hierarchy $H$, the construction scheme of this hierarchy is of crucial importance for the optimal partitions within $H$ built in the second step.

This paper explores different heuristics to build the initial hierarchy. These heuristics represent different compromises between the energy of the final partitions and the execution times. We first present the scale set framework in Section 2. The different heuristics are then presented in Section 3. These heuristics are evaluated and compared to the method of Guigues in Section 4.

## 2 The Scale Set framework

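Given an image $I$ and two partitions $P$ and $Q$ on $I$, we will say that $P$ is finer than $Q$ (or that $Q$ is coarser than $P$) iff $Q$ may be deduced from $P$ by merging operations. This relationship is denoted by $P \triangleleft Q$. Let us now consider a theoretical segmentation algorithm parametrised by $\lambda$. We will say that it is an unbiased multi-scale segmentation algorithm iff, for any pair of scales and any image $I$, the partition produced at the lower scale is finer than the partition produced at the higher scale. If the algorithm is an unbiased multi-scale segmentation algorithm, the partition it produces gets coarser as $\lambda$ increases, and the set of partitions obtained for all values of $\lambda$ defines a hierarchy as a union of nested partitions. Note that, the set of partitions on $I$ being finite, this hierarchy must also be finite.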

Unbiased multi-scale segmentation algorithms follow a well-known causality principle: increasing the scale of observation should not create new information. In other words, any phenomenon observed at one scale should be caused by objects defined at finer scales. In our framework, increasing the scale should not create new contours.

The family of energies considered by Guigues corresponds to the set of Affine Separable Energies (ASE), which can be written for any partition $P$ of $I$ in $n$ regions $R_1, \dots, R_n$ as:

$$E(P) = D(P) + \lambda C(P) = \sum_{i=1}^{n} D(R_i) + \lambda \sum_{i=1}^{n} C(R_i) = \sum_{i=1}^{n} \bigl( D(R_i) + \lambda C(R_i) \bigr)$$

Let us consider a hierarchy $H$ and the sequence of optimal cuts $C^*_\lambda(H)$ within $H$. The approach of Guigues is based on the following result: if the energy is an ASE and if $C$ is decreasing within the considered set of partitions $\mathcal{P}$:

$$\forall (P, Q) \in \mathcal{P}, \quad P \triangleleft Q \;\Rightarrow\; C(P) > C(Q)$$

then the sequence of optimal cuts $C^*_\lambda(H)$ is an unbiased multi-scale segmentation. The union of all optimal cuts thus defines a new hierarchy within $H$. The tree corresponding to the structure of this new hierarchy may be deduced from $H$ by merging with their fathers all the nodes which do not belong to any optimal cut. Note that an equivalent result may be obtained if no condition is imposed on $C$ but $D$ is increasing with respect to the same relationship.

The restriction by Guigues of the search space to a hierarchy may thus be justified by the fact that the set of partitions produced by any unbiased multi-scale segmentation algorithm describes a hierarchy. Conversely, given a hierarchy $H$, if the energy $E_\lambda$ is an ASE with a decreasing term $C$, the sequence of optimal cuts of $H$ according to $E_\lambda$ is an unbiased multi-scale segmentation algorithm.

Given a partition $P$, the decrease of $C$ may be equivalently expressed as a sub-additivity relationship:

$$\forall (R, R') \in P \mid R \text{ is adjacent to } R', \quad C(R \cup R') < C(R) + C(R') \qquad (1)$$

Note that the sub-additivity of the regularising term is common in many applications. For example, if $C$ is proportional to some quantity summed up along contours, $C$ is sub-additive due to the removal of the common boundaries between the two merged regions. Moreover, the term $C$ may be interpreted within the Minimum Description Length framework [3] as the amount of information required to encode a partition. Therefore, one can expect $C$ to decrease when the partition gets coarser.

Given a hierarchy $H$, the sequence of optimal cuts within $H$ has to be computed. Let us consider one region $R$ at the second level of the hierarchy (counted from the base) and its set of sons $\{R_1, \dots, R_n\}$. Let us additionally consider the tree rooted at $R$ within $H$ (Fig. 1(a)). Since $R$ is a second-level node, this tree allows only two cuts: one encoding the partition made of the sons of $R$, whose energy is equal to $\sum_{i=1}^{n} D(R_i) + \lambda \sum_{i=1}^{n} C(R_i)$, and one encoding the partition reduced to the single region $R$, whose energy is equal to $D(R) + \lambda C(R)$. Due to the sub-additivity of $C$ we have $C(R) < \sum_{i=1}^{n} C(R_i)$. Therefore, since both energies are linear in $\lambda$, if $D(R) > \sum_{i=1}^{n} D(R_i)$ the line associated with the sons is below the line associated with $R$ until a value $\lambda^+(R)$ of $\lambda$ at which the two lines cross (Fig. 1(b)). If $D(R) \le \sum_{i=1}^{n} D(R_i)$, the energy of the sons is always greater than or equal to the energy of $R$, in which case we set $\lambda^+(R)$ to $0$. Therefore, in both cases the partition made of the sons is associated with a lower energy than $R$ until $\lambda^+(R)$. Above this value the partition reduced to $R$ is associated with the lowest energy. In terms of optimal cuts, the set of sons corresponds to the optimal cut of the tree rooted at $R$ until $\lambda^+(R)$, and $\{R\}$ is the optimal cut above this value (Fig. 1(c)). The value $\lambda^+(R)$ is called the scale of appearance of the region $R$.
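The crossing of the two lines described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `scale_of_appearance` and its argument names are hypothetical, and the numerical values are made up.

```python
def scale_of_appearance(d_parent, c_parent, d_sons, c_sons):
    """Scale at which the single region R becomes cheaper than its sons.

    The energy of the sons is sum(d_sons) + lam * sum(c_sons) and the energy
    of R is d_parent + lam * c_parent; both are lines in lam.
    """
    d_gain = d_parent - sum(d_sons)      # increase of the fit-to-data term
    c_gain = sum(c_sons) - c_parent      # decrease of the regularising term
    if c_gain <= 0:
        raise ValueError("C must be sub-additive: C(R) < sum of C(R_i)")
    # If merging does not increase D, R is optimal from scale 0 onwards.
    return max(0.0, d_gain / c_gain)


# Hypothetical values: two sons merged into one region.
lam_plus = scale_of_appearance(d_parent=10.0, c_parent=4.0,
                               d_sons=[2.0, 3.0], c_sons=[3.0, 3.0])
```

With these values both cuts have the same energy at `lam_plus`, which is the crossing point of the two lines.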

Guigues showed that the above process may be generalised to the whole tree. Each node of $H$ is then valuated by a scale of appearance. Some of the nodes of $H$ may get a greater scale of appearance than their father. Such nodes do not belong to any optimal cut and are removed from $H$ during a cleaning step which merges them with their fathers. Each node $R$ of the resulting hierarchy belongs to an optimal cut from its own scale of appearance $\lambda^+(R)$ until the scale of appearance of its father. The value $\lambda^+(R)$ may be set for each node of the tree using a bottom-up process. The optimal cut for a given value of $\lambda$ may then be determined using a top-down process which selects in each branch of the tree the first node with a scale of appearance lower than $\lambda$. The set of selected nodes constitutes a cut of $H$ which is optimal by construction according to $E_\lambda$. The function $\lambda \mapsto E_\lambda(C^*_\lambda(H))$ is a concave piecewise linear function, each linear interval of which corresponds to the energy of an optimal cut within $H$ (Fig. 1(d)).
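For a single fixed value of $\lambda$, the optimal cut of a tree under a separable energy can also be obtained by a plain bottom-up dynamic programme, which may help to check an implementation of the scale-of-appearance procedure. The sketch below is such a check and not the procedure of Guigues; the `Node` class, its fields and the toy values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    name: str
    d: float                      # fit-to-data term D(R)
    c: float                      # regularising term C(R)
    sons: List["Node"] = field(default_factory=list)


def optimal_cut(node: Node, lam: float) -> Tuple[float, List[str]]:
    """Return (energy, region names) of the best cut of the subtree at `node`."""
    own = node.d + lam * node.c
    if not node.sons:
        return own, [node.name]
    sub_energy, sub_cut = 0.0, []
    for son in node.sons:
        e, cut = optimal_cut(son, lam)
        sub_energy += e
        sub_cut += cut
    # Keep the node itself when it is at least as cheap as its best sub-cut.
    if own <= sub_energy:
        return own, [node.name]
    return sub_energy, sub_cut


# Toy hierarchy: leaves a, b, c; a and b merge into ab; ab and c into abc.
leaves = [Node("a", 1.0, 3.0), Node("b", 1.0, 3.0), Node("c", 2.0, 3.0)]
ab = Node("ab", 4.0, 4.0, leaves[:2])
root = Node("abc", 12.0, 5.0, [ab, leaves[2]])
```

On this toy tree the cuts returned for increasing values of `lam` go from the leaves, to `ab` and `c`, to the root, which is the nesting expected from an unbiased multi-scale segmentation.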

Given a hierarchy $H$ and the function $\lambda \mapsto E_\lambda(C^*_\lambda(H))$ encoding the energy of the sequence of optimal cuts, the optimality of $H$ may be measured as the area under this curve for a given range of scales, or as the area of the surface (Fig. 1(d)) between this curve and the energy of the coarsest cut $E_\lambda(P_{max})$, where $P_{max}$ denotes the partition composed of a single region encoding the whole image. We propose in Section 4 an alternative measure of the quality of a hierarchy which reduces the influence of the initial image.

Guigues proposed to build a hierarchy by using an initial partition and a strategy called scale climbing. This strategy merges at each step the two adjacent regions $R$ and $R'$ such that:

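$$\lambda^+(R \cup R') = \frac{D(R \cup R') - D(R) - D(R')}{C(R) + C(R') - C(R \cup R')} = \min_{(R_1, R_2) \in P^2,\; R_1 \sim R_2} \frac{D(R_1 \cup R_2) - D(R_1) - D(R_2)}{C(R_1) + C(R_2) - C(R_1 \cup R_2)} \qquad (2)$$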

where $P$ denotes the current partition and $R_1 \sim R_2$ indicates that $R_1$ and $R_2$ are adjacent in $P$.

This process thus merges at each step the two regions whose union would appear at the lowest scale. Such a construction scheme is coherent with the subsequent processes applied to the hierarchy. However, there is no evidence that the resulting hierarchy is optimal according to any of the previously mentioned criteria. We indeed show in the following sections that other construction schemes of a hierarchy may lead to lower energies.
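The scale climbing strategy of equation 2 can be sketched on a one-dimensional signal, where adjacency reduces to consecutive segments. This is an illustrative toy under stated assumptions, not the implementation of Guigues: $D$ is taken as the squared error, $C$ as a constant cost per segment (`boundary_cost`, a hypothetical parameter), and the function names are made up.

```python
def squared_error(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def scale_climbing(signal, boundary_cost=2.0):
    """Merge adjacent 1D segments by increasing scale of appearance (eq. 2).

    Each segment is a list of samples; C(R) is the constant `boundary_cost`,
    so merging two segments always lowers C by `boundary_cost`.
    Returns the list of (scale, merged segment) in merge order.
    """
    segments = [[v] for v in signal]
    merges = []
    while len(segments) > 1:
        best_i, best_scale = None, None
        for i in range(len(segments) - 1):
            left, right = segments[i], segments[i + 1]
            d_gain = (squared_error(left + right)
                      - squared_error(left) - squared_error(right))
            scale = d_gain / boundary_cost   # C(R) + C(R') - C(R u R')
            if best_scale is None or scale < best_scale:
                best_i, best_scale = i, scale
        merged = segments[best_i] + segments[best_i + 1]
        segments[best_i:best_i + 2] = [merged]
        merges.append((best_scale, merged))
    return merges


merges = scale_climbing([1.0, 1.0, 5.0, 5.0])
```

On this signal the two constant plateaus are built first, at scale 0, and are merged together only at a much higher scale.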

## 3 Construction of the initial hierarchy

Many energies have been designed in order to encode different types of homogeneity criteria (piecewise constant [3, 6], linear or polynomial variations, ...). This paper being devoted to the construction schemes of the hierarchy, we restrict our topic to the piecewise constant model described by Leclerc [3] and Mumford and Shah [6]. The energy of this model may be written as:

$$E_\lambda(P) = D(P) + \lambda C(P) = \sum_{i=1}^{n} SE(R_i) + \lambda |\delta(R_i)| \qquad (3)$$

where $P = \{R_1, \dots, R_n\}$ represents the partition of the image, $SE(R_i)$ is the squared error of region $R_i$ and $|\delta(R_i)|$ is the total length of its boundaries.

Within the Minimum Description Length framework, $D(P)$ may be understood as the amount of information required to encode the deviation of the data against the model, while $C(P)$ is proportional to the amount of information required to encode the shape of the model. Within the statistical framework, the squared error may also be understood, up to sign and scale, as the log of the probability that the region satisfies the model (i.e. is constant) using a Gaussian assumption, while $C(P)$ is a regularising term.
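For this piecewise constant model, the scale of appearance of equation 2 takes a closed form for two adjacent regions. The worked expression below is a sketch under stated assumptions: $|R|$ denotes the number of pixels of $R$, $\mu_R$ its mean value and $|\delta(R, R')|$ the length of the boundary shared by $R$ and $R'$. These three symbols are introduced here for illustration only, and the second line assumes that the shared boundary is counted once in the perimeter of each region.

```latex
\begin{aligned}
SE(R \cup R') - SE(R) - SE(R') &= \frac{|R|\,|R'|}{|R| + |R'|}\,(\mu_R - \mu_{R'})^2 \\
|\delta(R)| + |\delta(R')| - |\delta(R \cup R')| &= 2\,|\delta(R, R')| \\
\lambda^+(R \cup R') &= \frac{|R|\,|R'|\,(\mu_R - \mu_{R'})^2}{2\,(|R| + |R'|)\,|\delta(R, R')|}
\end{aligned}
```

Under these assumptions, two regions with close mean values and a long common boundary appear merged at a low scale, whereas strongly contrasted regions sharing a short boundary are merged late.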

Our approach follows the scale climbing strategy proposed by Guigues (equation 2). Given a set $W$ of regions within a partition, we thus consider the scale of appearance $\lambda^+(R_W)$ of the region $R_W$ defined as the union of the regions in $W$. The heuristics below use this basic approach but differ on the sets $W$ which are considered and on the ordering of the merge operations.

### 3.1 Sequential Merging

Given a current partition $P$, let us consider for each region $R$ of $P$ its set $V(R)$, defined as $\{R\}$ union its set of neighbours, and the set $P^*(V(R))$ of all possible subsets of $V(R)$ including $R$. Each subset $W \in P^*(V(R))$ encodes a possible merging of the region $R$ with at least one of its neighbours. Let us denote by $R_W$ the region formed by the union of the regions in $W$. Note that the region $R_W$ is connected since $R$ belongs to $W$ and all the regions of $W$ are adjacent to $R$. Let us additionally consider the two partitions of $R_W$: the single region $\{R_W\}$ and the partition $P_W$ made of the regions of $W$. The energies associated with these partitions are respectively equal to $E_\lambda(R_W) = D(R_W) + \lambda C(R_W)$ and:

$$E_\lambda(P_W) = D(W) + \lambda C(W) = \sum_{R' \in W} D(R') + \lambda \sum_{R' \in W} C(R')$$

where $D(W)$ and $C(W)$ denote respectively the fit to data and the regularising terms of the partition $P_W$.

Since $C$ is sub-additive (equation 1) we have $C(R_W) < C(W)$. The energy $E_\lambda(P_W)$ is thus lower than $E_\lambda(R_W)$ until a value $\lambda^+(R_W)$ called the scale of appearance of $R_W$ (Section 2). Using the scale climbing principle, our sequential merging algorithm computes for each region $R$ of the partition the minimal scale of appearance of a region $R_W$:

$$\lambda^+_{\min}(R) = \min_{W \in P^*(V(R))} \frac{D(R_W) - D(W)}{C(W) - C(R_W)}$$

The set $W$ which realises the minimum is denoted $W_{\min}(R)$.

Given the quantities $\lambda^+_{\min}(R)$ and $W_{\min}(R)$, our sequential algorithm iterates the following steps:

1. Let $P$ denote the current partition, initialised with an initial partition.

2. For each region $R$ of $P$, compute $\lambda^+_{\min}(R)$ and $W_{\min}(R)$.

3. Select the region $R$ of $P$ whose $\lambda^+_{\min}(R)$ is minimal and merge all the regions of $W_{\min}(R)$.

4. If more than one region remains, go to step 2.

5. Output the final hierarchy encoding the sequence of merge operations.
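Step 2 of the steps above can be sketched as an enumeration of the subsets of $V(R)$ which contain $R$. This is a minimal sketch under stated assumptions, not the authors' code: regions are represented as frozensets of sample indices, `d` and `c` are user-supplied callables, `max_card` is a hypothetical parameter bounding the cardinal of the subsets, and the toy regularising term (square root of the region size) is only chosen because it is strictly sub-additive.

```python
from itertools import combinations
from math import sqrt


def min_scale_of_appearance(region, neighbours, d, c, max_card=None):
    """Return (lambda_min, W_min) over subsets W of V(R) that contain R.

    `d` and `c` map a frozenset of sample indices to D and C.
    `max_card` optionally bounds |W|; max_card=2 gives pairwise merges only.
    """
    limit = len(neighbours) + 1 if max_card is None else max_card
    best = (float("inf"), None)
    for k in range(1, limit):                     # k neighbours join R
        for chosen in combinations(neighbours, k):
            parts = [region, *chosen]
            union = frozenset().union(*parts)
            d_gain = d(union) - sum(d(p) for p in parts)
            c_gain = sum(c(p) for p in parts) - c(union)  # > 0 if C sub-additive
            scale = max(0.0, d_gain / c_gain)
            if scale < best[0]:
                best = (scale, parts)
    return best


values = [1.2, 1.0, 1.2, 9.0]                     # toy data, one sample per region


def d(ids):
    vals = [values[i] for i in ids]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals)


def c(ids):
    return sqrt(len(ids))                         # toy sub-additive term


R = frozenset({1})
neighbours = [frozenset({0}), frozenset({2}), frozenset({3})]
scale_all, w_all = min_scale_of_appearance(R, neighbours, d, c)
scale_pair, w_pair = min_scale_of_appearance(R, neighbours, d, c, max_card=2)
```

On this toy example the unrestricted search merges `R` with its two similar neighbours at once, at a lower scale than any pairwise merge, which illustrates why considering larger subsets may lead to lower energies.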

This algorithm thus performs one merge operation at each step. Note that all the regions of the merged set are adjacent to the selected region $R$. Therefore, within the irregular pyramid framework, the merge operation may be encoded by a contraction kernel of depth one composed of a single tree whose root is equal to $R$. The computation of $\lambda^+_{\min}(R)$ for each region of the partition requires traversing $P^*(V(R))$, whose cardinal is exponential in the number of neighbours of $R$: a region with $d$ neighbours has $2^d$ subsets of $V(R)$ containing it. Therefore, if the partition is encoded by a graph, the complexity of each step of our algorithm grows linearly with the number of vertices (i.e. the number of regions) and exponentially with the maximal vertex degree of the graph. The number of regions decreases by $|W| - 1$ at each iteration, where $W$ is the merged set. Since $|W|$ is at least equal to 2, the number of regions decreases by at least 1. The computation of $\lambda^+_{\min}(R)$ for each region of the partition may induce long execution times when the vertices of the graph have a high degree. However, experiments presented in Section 4 show that the cardinal of the subsets may be bounded without significantly altering the energy of the optimal cuts. Let us finally note that this algorithm includes the scale climbing approach proposed by Guigues. Indeed, the merge operations studied by Guigues (Section 2) correspond to the subsets $W$ with $|W| = 2$, which are considered by our algorithm.

### 3.2 Parallel Merge algorithm

Our parallel merge algorithm is based on the notion of maximal matching. A set $M$ of edges of a graph $G$ is called a maximal matching if each vertex of $G$ is incident to at most one edge of $M$ and if $M$ is maximal according to this property. Moreover, we would like to design a maximal matching $M$ such that the scale of appearance of the regions produced by the contraction of $M$ is as low as possible. Using the same approach as in Section 3.1, we associate with each edge $e$ of the graph the scale of appearance $\lambda^+(e)$ (equation 2) of the region defined as the union of the regions encoded by the two vertices incident to $e$. Following the same approach as Haxhimusa [7], we define our maximal matching as a Maximal Independent Set on the set of edges of the graph. The iterative process which builds the maximal independent set selects at each step edges whose scale of appearance is locally minimal. This process may be formulated thanks to two boolean variables $p_e$ and $q_e$ attached to each edge $e$ such that:

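$$\begin{cases} p^1_e = \left( \lambda^+(e) = \min_{e' \in \Gamma(e)} \{\lambda^+(e')\} \right) \\ q^1_e = \bigwedge_{e' \in \Gamma(e)} \overline{p^1_{e'}} \end{cases} \quad \text{and} \quad \begin{cases} p^{k+1}_e = p^k_e \vee \left( q^k_e \wedge \lambda^+(e) = \min_{e' \in \Gamma(e) \mid q^k_{e'}} \{\lambda^+(e')\} \right) \\ q^{k+1}_e = \bigwedge_{e' \in \Gamma(e)} \overline{p^{k+1}_{e'}} \end{cases} \qquad (4)$$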

where $\Gamma(e)$ denotes the neighbourhood of the edge $e$, defined as the set of edges of the graph sharing at least one vertex with $e$.

This iterative process stops when no change occurs between two iterations. At the final iteration, the set of edges $e$ whose variable $p_e$ is true defines a maximal matching [7] which encodes the set of edges to be contracted. Moreover, the set of selected edges corresponds to local minima according to the scale of appearance $\lambda^+(e)$. Roughly speaking, if $\lambda^+(e)$ is understood as a merge score, one edge between two vertices will be marked at iteration $k$ if, among all the remaining possible merge operations involving these two vertices, the one merging them has the best merge score. Note that the construction of a maximal matching is only the first step of the method of Haxhimusa, which completes this maximal matching in order to reach a given decimation ratio. The restriction of our method to a maximal matching restricts the merge operations to edges which become locally optimal at a given iteration. We thus favour the energy criterion over the reduction factor. As shown by Biedl [8], the reduction factor in terms of edges induced by the use of a maximal matching admits a lower bound which depends on the maximal vertex degree of the graph. The edge decimation ratio may thus be very low for graphs with high vertex degrees. Nevertheless, experiments performed on 100 natural images of the Berkeley database showed that the mean vertex decimation ratio between levels on this database is comparable to the decimation ratio obtained by Haxhimusa.

The local minima selected in equation 4 are computed on decreasing sets along the iterations in order to complete the maximal matching. We can thus consider that the detected minima are less and less significant as the iterations progress. We thus propose an alternative solution which consists in contracting at each step only the edges selected at the first iteration (the edges $e$ such that $p^1_e$ is true). These edges correspond to minima computed on the whole neighbourhood of each edge. This method may be understood as a combination of the method proposed by Haxhimusa [7] and the stochastic decimation process of Jolion [9], which consists in merging immediately vertices corresponding to local minima.
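The boolean recurrences of equation 4, together with the first-iteration variation, can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the function name and the `first_iteration_only` flag are hypothetical, the neighbourhood of an edge is assumed to contain every edge sharing a vertex with it (the edge itself included), and the scales are assumed to be pairwise distinct since ties are not broken.

```python
def locally_minimal_matching(edges, scale, first_iteration_only=False):
    """Select edges with the boolean recurrences of equation 4.

    `edges` is a list of (u, v) pairs and `scale[e]` is lambda+(e).
    Gamma(e) is taken as all edges sharing a vertex with e, e included.
    Ties between equal scales are not broken: scales are assumed distinct.
    """
    gamma = {e: [f for f in edges if set(e) & set(f)] for e in edges}
    p = {e: False for e in edges}          # e is selected
    q = {e: True for e in edges}           # e is still a candidate
    while True:
        new_p = {}
        for e in edges:
            # The minimum is only evaluated when e is itself a candidate.
            is_min = q[e] and scale[e] == min(scale[f] for f in gamma[e] if q[f])
            new_p[e] = p[e] or is_min
        new_q = {e: not any(new_p[f] for f in gamma[e]) for e in edges}
        if first_iteration_only or (new_p == p and new_q == q):
            return [e for e in edges if new_p[e]]
        p, q = new_p, new_q


# Toy path graph a - b - c - d - e with made-up scales of appearance.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
scale = {("a", "b"): 3.0, ("b", "c"): 1.0, ("c", "d"): 2.0, ("d", "e"): 4.0}
full = locally_minimal_matching(edges, scale)
first = locally_minimal_matching(edges, scale, first_iteration_only=True)
```

On this path graph the edge `("b", "c")` is selected at the first iteration, and `("d", "e")` is added at the second one only because its competitor `("c", "d")` has been discarded, which illustrates why later minima may be considered less significant.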

## 4 Experiments

Figure 2: Partitions of the mushroom and the fisherman images at different scales. Each row of the array corresponds to a heuristic whose acronym is indicated in the first column.

Figure 3(a): Execution time.

The different heuristics presented in this paper have been evaluated on the Berkeley database. The evaluated heuristics include our parallel merge heuristic based on a maximal matching (MM) and the variation of this method which merges at each step only the edges selected during the first iteration (Section 3.2). We also evaluated our sequential method (SM) and two variations of this method. The first variation considers for each region $R$ of the partition only the subsets of cardinal 2 of $P^*(V(R))$; this method corresponds to the heuristic proposed by Guigues. The second, intermediate variation restricts the cardinal of the subsets of $V(R)$ including $R$ to an upper threshold, fixed to five in these experiments. All the experiments used an initial partition obtained by a Watershed algorithm [10].

Fig. 2 shows optimal cuts obtained for increasing values of $\lambda$ on the Mushroom and Fisherman images of the Berkeley database (colour plates are available at http://www.greyc.ensicaen.fr/jhpruvot/Cut/). The heuristics used to build the hierarchies are indicated in the first column of Fig. 2. The original images are displayed in Fig. 4(a).

Fig. 3(a) shows the influence of the number of initial regions on the execution time. These curves have been obtained on the Mushroom image with different initial partitions obtained by varying the smoothing parameter of the gradient within our Watershed algorithm.

Fig. 3(b) makes it possible to compare the performance of each heuristic on the whole Berkeley database. However, a direct comparison of the energies obtained by the different heuristics on different images would be meaningless, since the shape of the function $\lambda \mapsto E_\lambda(C^*_\lambda(H))$ depends both on the intrinsic performance of the heuristic used to build $H$ and on the image on which $H$ has been built. We thus have to normalise the energies produced by the different heuristics before any comparison.

Given a hierarchy $H$, since the sequence of optimal cuts $C^*_\lambda(H)$ is an unbiased multi-scale segmentation (Section 2), the hierarchy obtained by each of our methods may be associated with a value $\lambda_{max}$ above which the optimal partition is reduced to a single region encoding the whole image. The energy of this partition $P_{max}$ is defined as $E_\lambda(P_{max}) = D_I + \lambda C_I$, where $D_I$ denotes the global squared error of the image and $C_I$ the perimeter of the image. Since the energy of the optimal cuts of a hierarchy is a piecewise linear concave function of $\lambda$, the function $E_\lambda(C^*_\lambda(H))$ is below the energy associated with the coarsest partition $P_{max}$ (Fig. 3(c)). Moreover, the point associated with the initial partition at $\lambda = 0$ and the point $(\lambda_{max}, E_{\lambda_{max}}(P_{max}))$ both belong to the curve. Therefore, being concave, the curve lies above the line connecting these two points. Finally, since the line connecting the origin to $(\lambda_{max}, E_{\lambda_{max}}(P_{max}))$ is below the line joining these two points, we have for any hierarchy $H$ and any scale $\lambda \le \lambda_{max}$ (Fig. 3(c)):

$$\frac{\lambda}{\lambda_{max}} E_{\lambda_{max}}(P_{max}) \le E_\lambda(C^*_\lambda(H)) \le E_\lambda(P_{max})$$

We obtain from this last inequality, after some algebra, the following relation:

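$$\forall \lambda \in [0, \lambda_{max}] \quad x_\lambda \le 1 + \frac{x_\lambda - 1}{1 + x_\lambda E_I} \le \frac{E_\lambda(C^*_\lambda(H))}{E_\lambda(P_{max})} \le 1 \quad \text{with } x_\lambda = \frac{\lambda}{\lambda_{max}} \text{ and } E_I = \frac{\lambda_{max} C_I}{D_I} \qquad (5)$$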

Therefore, using the normalised energy $E_\lambda(C^*_\lambda(H)) / E_\lambda(P_{max})$ and the normalised scale $x_\lambda$, any curve lies in the upper left part of the unit square $[0,1]^2$. Note that this result is valid for any hierarchy and thus any heuristic.
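The algebra leading to equation 5 can be checked as follows. This is a sketch of the derivation, assuming $E_\lambda(P_{max}) = D_I + \lambda C_I$ with $D_I > 0$ and $0 \le \lambda \le \lambda_{max}$, so that $0 \le x_\lambda \le 1$.

```latex
\begin{aligned}
\frac{\frac{\lambda}{\lambda_{max}} E_{\lambda_{max}}(P_{max})}{E_\lambda(P_{max})}
  &= x_\lambda \frac{D_I + \lambda_{max} C_I}{D_I + \lambda C_I}
   = x_\lambda \frac{1 + E_I}{1 + x_\lambda E_I}
   = 1 + \frac{x_\lambda - 1}{1 + x_\lambda E_I} \\
1 + \frac{x_\lambda - 1}{1 + x_\lambda E_I} - x_\lambda
  &= \frac{x_\lambda (1 - x_\lambda) E_I}{1 + x_\lambda E_I} \ge 0
\end{aligned}
```

The first line gives the middle term of equation 5 after division by $E_\lambda(P_{max})$, and the second line shows that this term is never below the diagonal $x_\lambda$ of the unit square.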

Using our piecewise constant model (equation 3), the energy $E_\lambda(P_{max})$ is roughly equal to the squared error of the image for small values of $\lambda$ and may be interpreted as the global variation of the image. The normalised energy thus reduces the influence of the global variation of the images on the energy and makes it possible to compare energies computed with the same heuristic on different images. Note however that the use of the normalised scale $x_\lambda$ discards the absolute value of $\lambda_{max}$. We thus do not take into account the range of scales for which the optimal cut is not reduced to the trivial partition $P_{max}$. However, the absolute value of $\lambda_{max}$ varies according to each image and each heuristic. The normalised scale thus removes the influence of the image. Moreover, our experiments showed that, for each image, our different heuristics obtain close values of $\lambda_{max}$.

Fig. 3(b) represents, for each value of the normalised scale $x_\lambda$ and each heuristic, the mean value of the normalised energy computed on the whole set of images of the Berkeley database.
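The normalisation used for these curves can be sketched as below. This is a minimal illustration with hypothetical function names and made-up numerical values; it only assumes, as in equation 5, that the energy of the single-region partition is $D_I + \lambda C_I$.

```python
def normalise(lam, energy, lam_max, d_image, c_image):
    """Map (lambda, E_lambda(C*_lambda(H))) into the unit square of equation 5."""
    x = lam / lam_max
    e_max = d_image + lam * c_image      # energy of the single-region partition
    return x, energy / e_max


def lower_bound(x, lam_max, d_image, c_image):
    """Lower bound of equation 5: 1 + (x - 1) / (1 + x * E_I)."""
    e_i = lam_max * c_image / d_image
    return 1.0 + (x - 1.0) / (1.0 + x * e_i)


# Made-up values: D_I = 100, C_I = 10, lambda_max = 4, optimal cut energy 90 at lambda = 2.
x, y = normalise(lam=2.0, energy=90.0, lam_max=4.0, d_image=100.0, c_image=10.0)
bound = lower_bound(x, lam_max=4.0, d_image=100.0, c_image=10.0)
```

Averaging the normalised energies over images at a fixed `x` then gives one point of a mean curve such as those described for Fig. 3(b).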

As shown in Fig. 3(b), the energy of the optimal cuts obtained by the first-iteration variation of the maximal matching heuristic is lower than the one obtained by the maximal matching heuristic (MM). This result is confirmed by Fig. 2, where the MM heuristic removes more details of the mushroom at a given scale. This result is connected to the greater decimation ratio of the MM heuristic, which merges at each step regions with a high scale of appearance without considering regions which may appear at further steps. The two algorithms induce equivalent execution times on a sequential machine: in Fig. 3(a), the execution times of one method are overlaid by those of the other due to the vertical scale of this figure.

The subjective quality of the partitions obtained by our parallel heuristic and by the heuristic of Guigues (Fig. 2) seems roughly similar, although one of the two seems to produce slightly coarser partitions at each scale. However, considering Fig. 3(b), the optimal energies obtained by the heuristic of Guigues are lower than those obtained by the parallel heuristic. Note that the parallel heuristic produces lower execution times than the heuristic of Guigues even on a sequential machine (Fig. 3(a)).

As shown by Fig. 3(b), the optimal energies produced by our sequential heuristic (SM) are always below those produced by the heuristic of Guigues. Note that the curve of SM is close to the diagonal of the unit square. This last point indicates that, on most of the images of the Berkeley database, the hierarchies produced by the SM heuristic provide optimal cuts whose normalised energy is close to the lower bound of the optimal cut energies (equation 5). This result is confirmed by Fig. 2, where the SM heuristic preserves more details of the image at each scale. However, the SM heuristic is the one which requires the longest execution times on a sequential machine (Fig. 3(a)).

The variation of SM restricted to subsets of at most five regions may be understood as a compromise between the heuristic of Guigues and SM. As shown by Fig. 3(b), the optimal energies obtained by this variation are close to those obtained by SM and below those obtained by the heuristic of Guigues. Moreover, as shown by Fig. 3(a), the execution times it requires lie between those required by these two heuristics. Finally, the partitions obtained by this variation in Fig. 2 are close to those obtained by SM.

Fig. 4 shows results obtained using another fit to data criterion based on the intuitive notion of contrast. The basic idea of this criterion [11] states that a region should have a higher contrast with its neighbours (called external contrast) than within its possible subparts (called internal contrast). Both contrasts are computed from the mean gradient along the contour associated with each edge $e$. The internal contrast $Int(R)$ of a region $R$ is computed over the set of edges which have been contracted to define $R$, and its external contrast $Ext(R)$ over the set of edges incident to $R$. Our new energy combines the contrast and the squared error criteria as follows:

$$E_\lambda(P) = \sum_{i=1}^{n} SE(R_i) \left( 1 + f\left( \frac{Int(R_i)}{Ext(R_i)} \right) \right) + \lambda |\delta(R_i)| \qquad (6)$$

where $f$ denotes a sigmoid function.

A contrasted region will thus have a low ratio between its internal and external contrast. Conversely, a poorly contrasted region may have a fit to data term close to twice its squared error. As shown by Fig. 4(b) and (c), this energy favours highly contrasted regions. For example, the cloud merged with the sky in Fig. 4(b) remains in Fig. 4(c). Moreover, experiments not reported here showed that the same discussion about the advantages and drawbacks of the different heuristics may be conducted on this new energy, with the same conclusions.

## 5 Conclusion

The Scale Set framework is based on two steps: the determination of a hierarchy according to an energy criterion and the determination of optimal cuts within this hierarchy. We have presented in this article parallel and sequential heuristics to build such hierarchies. The normalised energy of the optimal cuts associated with these hierarchies is bounded below by the diagonal of the unit square $[0,1]^2$. Our experimental results suggest that our sequential heuristic provides hierarchies whose normalised energies are close to this lower bound. This method may however require long execution times. We thus propose an alternative heuristic providing lower execution times at the price of generally slightly higher optimal cut energies. Our parallel methods provide greater energies than those produced by Guigues's heuristic. However, these methods require less execution time, even on a sequential machine.

Hierarchies encoding a sequence of optimal cuts are usually composed of fewer levels and regions than the initial hierarchies built by our merge heuristics. In the future, we would like to use these hierarchies of optimal cuts in order to match two hierarchies encoding the content of two images sharing a significant part of the same scene.

## References

1. Lecellier, F., Jehan-Besson, S., Fadili, M., Aubert, G., Revenu, M., Saloux, E.: Region-based active contours with noise and shape priors. In: Proceedings of ICIP'2006. (2006) 1649–1652
2. Geman, S., Geman, D.: Stochastic relaxation, Gibbs distribution, and the Bayesian restoration of images. IEEE Transactions on PAMI 6(6) (1984) 721–741
3. Leclerc, Y.G.: Constructing simple stable descriptions for image partitioning. International Journal of Computer Vision 3(1) (1989) 73–102
4. Boykov, Y., Kolmogorov, V.: An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Transactions on PAMI 26(9) (2004) 1124–1137
5. Guigues, L., Cocquerez, J.P., Le Men, H.: Scale-sets image analysis. International Journal of Computer Vision 68(3) (2006) 289–317
6. Mumford, D., Shah, J.: Optimal approximation by piecewise smooth functions and associated variational problems. Communications on Pure and Applied Mathematics 42 (1989) 577–685
7. Haxhimusa, Y., Glantz, R., Kropatsch, W.: Constructing stochastic pyramids by MIDES - maximal independent directed edge set. In Hancock, E., Vento, M., eds.: Proc. of GbR'2003. Volume 2726 of LNCS. (2003) 35–46
8. Biedl, T., Demaine, E.D., Duncan, C.A., Fleischer, R., Kobourov, S.G.: Tight bounds on maximal and maximum matching. Discrete Mathematics 285(1-3) (2004) 7–15
9. Jolion, J.M.: Data driven decimation of graphs. In Jolion, J.M., Kropatsch, W., Vento, M., eds.: Proceedings of IAPR-TC15 Workshop on Graph based Representation in Pattern Recognition, Ischia, Italy (2001) 105–114
10. Brun, L., Mokhtari, M., Meyer, F.: Hierarchical watersheds within the combinatorial pyramid framework. In: Proc. of DGCI 2005. Volume 3429 of LNCS, IAPR-TC18 (2005) 34–44
11. Felzenszwalb, P., Huttenlocher, D.: Image segmentation using local variation. In: Proceedings of IEEE Conference on CVPR, Santa Barbara, CA. (1998) 98–104