Matching-Constrained Active Contours

07/24/2013 · by Junyan Wang et al., Nanyang Technological University

In object segmentation by active contours, an initial contour is often required, and it is conventionally provided by the user. This paper extends the conventional active contour model by incorporating feature matching into the formulation, giving rise to a novel matching-constrained active contour. The numerical solution of the new optimization model provides an automated framework for object segmentation without user intervention. The main idea is to incorporate feature point matching as a constraint in active contour models. To this end, we derive a mathematical model relating interior points to the boundary contour, such that matching the interior feature points also aligns the contour, and we formulate the matching score as a constraint on the active contour model: the maximum-score feature matching, which yields the contour alignment, provides the initial feasible solution to the constrained optimization model of segmentation. The constraint also ensures that the optimal contour does not deviate too far from the initial contour. Projected-gradient descent equations are derived to solve the constrained optimization. In the experiments, we show that our method is capable of automatic object segmentation, and that it outperforms the related methods.







I Introduction

Automatic object segmentation is desirable in many higher-level computer vision tasks, such as object detection, tracking, and scene understanding. The active contour model is one of the best-known models for object segmentation. The active contour tries to find the boundary contour of the target object. However, it generally requires the user to input a contour curve sufficiently close to the object boundary as the initial contour. Hence, existing active contour frameworks are generally semi-automatic. This paper presents a new contour optimization model for object segmentation based on active contours and feature matching. The numerical optimization of our model leads to an automatic object segmentation algorithm.

I-A Related works

Almost all active contours fall into two categories: edge-based and region-based models. The edge-based active contour models, e.g. the Geodesic Active Contour [1], require the initial contours provided by the user to be sufficiently close to the boundaries. This is because edges are local image features distributed over the entire image domain, and it is crucial to determine which of the edges are of interest. Region-based active contours, e.g. the Chan-Vese model [2], are often insensitive to initialization. However, several image regions may share similar regional properties, and the regional property that discriminates an object of interest against various backgrounds in the image is difficult to model exactly beforehand. This modeling difficulty may reduce the suitability of region-based active contours for object segmentation in real images. It is generally recognized that the two types of models suit different images. In this work, we focus on the edge-based active contours.

Relaxation of the user initialization in edge-based active contours for object segmentation has become a research topic. Xu and Prince [3] proposed the gradient vector field (GVF) to extend the localized gradients, so that the large gradients at the boundary edges can influence the active contour far away from the edges. This allows the initial contours to be far from the edges. However, the active contour in GVF cannot find the boundary of interest when boundaries of other objects are present. Paragios et al. [4] also applied the GVF to level-set-based active contours to extract multiple objects; however, they reported that their method may not perform well when the initialization is arbitrary. Li et al. [5] proposed splitting the contour for extracting multiple objects by evolving the curve in a segmented GVF. Xie and Mirmehdi [6] proposed a curve evolution formulation based on edge detection, which allows more flexible contour initialization, as an alternative to the conventional edge-based active contours. In our previous work [7], we also addressed the problem of restrictive initialization, based on observations on the geometry of the gradient field. However, even though the initial contours can be more flexible, the user must still indicate the object of interest. A previous attempt at automatic initialization of edge-based active contours [8] selects the initial contours that have approximately the minimum energy. In other words, it is assumed that most of the edges in the image are of interest, so that they should all be considered in the optimization. This assumption is valid for the images studied in [8], but it cannot be generalized.

Global optimization techniques for some region-based and edge-based active contour models have been proposed in [9][10]. In [9], Cremers et al. proposed a branch-and-bound method for approximating the exhaustive search in a region-based active contour with shape prior modeling. Schoenemann and Cremers in [11] and [10] proposed a functional ratio energy for characterizing the object boundaries, optimized via the minimum ratio cycle algorithm of Lawler [12]. The global search methods do not require an initial contour; hence, they achieve automatic object segmentation. The idea behind these methods is to approximate the exhaustive search for the globally optimal contour curve over the entire image domain. These methods provide efficacious numerical solutions to the global optimization of many active contour models. However, the object of interest sometimes corresponds only to a locally optimal solution of the active contour model, as shown in the experiment section of this paper.

I-B Methodology

Fig. 1: Illustration of the feature point matching (with errors).

We address automatic object segmentation following a divide-and-conquer strategy. We consider the problem of automatic object segmentation as the union of two subproblems: object detection and boundary location. Unlike the global search methodology in [9][10], object detection can provide the coarse location of the object with relatively high confidence. We expect to obtain the segmentation with the initialization provided by the detector. The philosophy is that if the object detection cannot be done properly, then the object segmentation is also hopeless. We aim at extending the formulation of active contour models by using object detection. There exist many effective object detectors based on classification, e.g. the Haar-like feature based detectors [13] and the HOG descriptor based detectors [14], as well as object matching [15].

Object matching based on locally discriminative feature matching finds a set of points on the object of interest in the image [15][16][17], achieving coarse object localization. The principle of feature point matching is illustrated in Fig. 1, where the dots in the two images are the feature points, and the lines across the images visualize the matching correspondences. In object matching, the feature point matching often starts from interest point detection on a pair of images, as in [15]. Then the similarity between all possible pairs of matches is computed. Finally, the matching algorithm assigns a set of feature points in one image, possibly a subset of the entire feature set, to a set of feature points in the other image according to the point-wise similarity. A correspondence is often represented by a binary variable $m_{ij}$, where $m_{ij} = 1$ means matched and $m_{ij} = 0$ means unmatched.
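As a toy illustration of this binary representation (the descriptors and the greedy nearest-neighbour assignment below are our own assumptions for the sketch, not the matching algorithm used in the paper):

```python
import numpy as np

# Toy descriptors: each row of F (template) and G (target) is one feature vector.
F = np.array([[1.0, 0.0], [0.0, 1.0]])
G = np.array([[0.0, 1.0], [0.9, 0.1]])

# Pairwise matching cost between every template/target feature pair.
cost = np.linalg.norm(F[:, None, :] - G[None, :, :], axis=-1)

# Binary correspondence indicator: m[i, j] = 1 if template point i is matched
# to target point j, 0 otherwise (here, greedy nearest-neighbour assignment).
m = np.zeros_like(cost, dtype=int)
m[np.arange(len(F)), cost.argmin(axis=1)] = 1
```

A real matcher would also reject ambiguous or one-to-many assignments; the greedy rule above is only meant to show how the binary indicator encodes correspondences.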

In this paper, the formulation of the active contour model is extended by using the formulations of object matching for automatic object segmentation. A difficulty is that the scatter of matched feature points cannot directly determine the object boundary, or even approximate the boundary reasonably. We propose to estimate the boundary contour of the object of interest by using the feature point matching. The boundary estimate can be used as the initial contour for active contours, and the initial contour is then optimized in the active contour framework for object segmentation. This framework is also formulated as a constrained optimization model. In the experiments, we show that our method is capable of achieving accurate object extraction when the assumption in [9][11][10] does not hold, i.e., when the object of interest is not the globally optimal solution to the active contour model.

I-C Contributions

Our contributions mainly lie in three aspects. a) We obtain a mathematical model of the points-to-shape relation. We assume that this relation is invariant to affine transformation. We use this model to estimate the boundary contour given the matched interior points. b) Using the model of the points-to-shape relation, we also obtain an affine active contour in which the contour motion is determined by the motion of the inner points, and we assume the point motion is an affine transformation, so as to preserve the shape initialized by the matching. c) These two contributions readily lead to the segmentation framework, and we further formulate the framework as a novel constrained optimization problem, called the matching-constrained active contour model. The initial contour generated by point matching is the initial feasible solution to the constrained optimization. We derive the projected-gradient descent equations for solving the constrained optimization.

I-D Organization

The rest of the paper is organized as follows. In Section II, we introduce our mathematical model of the affine-invariant interior-points-to-shape relation, which leads to the affine points-to-shape alignment. In Section III, we present the unified constrained optimization model of the framework, namely the matching-constrained active contour, and the projected gradient descent algorithm. In Section IV, we demonstrate our method for automatic object segmentation on real images of cluttered scenes, compare it with state-of-the-art methods on example-based object segmentation, and present the associated quantitative analysis with discussions. We conclude the paper with discussions on future directions in Section V.

II Modeling affine-invariant interior-points-to-shape relation

In this section, we present our mathematical model of the relation between interior points and the outlining contour shape, which enables the contour alignment based on feature matching.

II-A Points-to-shape relation as a binary classifier

The reference object shape can be represented by its silhouette. The shape silhouette of an object can be defined as a binary function as follows:

$$S(\mathbf{x}) = \begin{cases} 1, & \mathbf{x} \in \Omega_o \\ 0, & \mathbf{x} \in \Omega \setminus \Omega_o \end{cases} \qquad (1)$$

where $\Omega_o$ is the object region and $\Omega \setminus \Omega_o$ is the non-object region. This definition can be used to define the object boundary contour as follows:

$$S(\mathbf{x}) = H(\phi(\mathbf{x})), \qquad (2)$$

where $H$ is a Heaviside function and $\phi$ is a signed distance function whose zero level set is the contour $\mathbf{C}$. Generally, $H'(\phi) = \delta(\phi)$, where $\delta$ is a Dirac delta function. In other words, we employ the implicit shape representation in this work, which is motivated by the level set method [18]. The contour in the reference image is a set of discrete points, and we propose to estimate the continuous contour shape by function approximation as follows:

$$\min_{\hat{s} \in \mathcal{S}} \int_{\Omega} \left( S(\mathbf{x}) - \hat{s}(\mathbf{x}) \right)^2 d\mathbf{x}, \qquad (3)$$

where $\Omega$ is the entire image domain, $\hat{s}$ is the approximation of $S$ and $\mathcal{S}$ denotes the solution space of $\hat{s}$.
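A minimal numerical sketch of this implicit representation (the disc shape, grid resolution and band width are our own illustrative choices): the silhouette is the Heaviside step of the signed distance, and the contour is its zero level set.

```python
import numpy as np

# Signed distance to the unit circle: negative inside, positive outside.
xs, ys = np.meshgrid(np.linspace(-2, 2, 201), np.linspace(-2, 2, 201))
phi = np.hypot(xs, ys) - 1.0

# Heaviside step gives the binary silhouette S (1 inside, 0 outside);
# the contour is the zero level set, here sampled as a narrow band.
silhouette = (phi <= 0).astype(float)
contour_band = np.abs(phi) < 0.02
```

Summing the silhouette times the pixel area recovers the disc area, which is a quick sanity check on the representation.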

In our context, $\hat{s}$ is generated by the set of position vectors of the feature points $\{\mathbf{p}_i\}_{i=1}^{n}$. If we consider the feature points as randomly distributed points, we may adopt radial basis functions (RBF) for the function approximation:

$$\hat{s}(\mathbf{x}) = H(f(\mathbf{x})), \qquad f(\mathbf{x}) = \sum_{i=1}^{n} w_i\, k(\mathbf{x}, \mathbf{p}_i) + b, \qquad (4)$$

where $H$ is a Heaviside function and we may call $f$ the shape decision function, in which $k(\cdot, \cdot)$ is a kernel function and $w_i$, $b$ are the weights and bias to be determined. The points $\{\mathbf{p}_i\}$ are the center points of the kernel functions. The sign of the shape decision function determines whether a point belongs to the shape. An example of the model is shown in Fig. 2.
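A sketch of evaluating such a shape decision function (a Gaussian kernel is assumed, and the centers, weights and bias below are hand-picked toy values, not trained ones):

```python
import numpy as np

def shape_decision(x, centers, weights, bias, sigma=1.0):
    """f(x) = sum_i w_i * k(x, p_i) + b with a Gaussian RBF kernel.
    A point x is classified as inside the shape when f(x) > 0."""
    d2 = np.sum((centers - x) ** 2, axis=1)      # squared distances to centers
    return weights @ np.exp(-d2 / sigma**2) + bias

centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # kernel centers p_i
weights = np.array([1.0, 1.0, 1.0])
bias = -0.5

near = shape_decision(np.array([0.3, 0.3]), centers, weights, bias) > 0
far = shape_decision(np.array([10.0, 10.0]), centers, weights, bias) > 0
```

A point near the kernel centers gets a positive decision value (inside the shape), while a distant point falls below the bias threshold (outside).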

Combining the previous formulations in Eqs. (1), (3) and (4), we obtain a binary classification problem based on an RBF neural network, in which the decision function is formed by only some of the positive samples, i.e. the feature points, but the training is accomplished by using both positive and negative samples over the entire image domain. We consider that the training is done by direct minimization of the fitting error in (3) with respect to the parameters $w_i$ and $b$. The gradient descent equations for learning the parameters are given in APPENDIX-A. The detailed learning strategy is presented in the experiment section.

Fig. 2: A 2D visualization of the trained shape decision function, overlaid with the corresponding implicit contour (in red). The higher the intensity of the image, the more likely the point belongs to the shape region.

II-B Imposing affine invariance in the points-to-shape relation

In real images, the object shape and the positions of the interior feature points are often different from those given in the training session. Given the matched set of feature points in the image, we are left to determine the object shape. We assume that the contour shape of the object undergoes the same transformation as the interior feature points, and we consider the affine transformation in particular. This property is illustrated in Fig. 3. This assumption is common in vision. The behavior that the shape and points undergo the same affine transformation is the affine invariance property. However, we have not yet examined whether the implicit contour of the trained model, or the shape decision function, has the affine invariance property. In the following, we discuss this issue.

Fig. 3: Illustration of the need of affine invariance. The point correspondences should lead to the shape alignment.

We choose the Gaussian function as the RBF kernel. Thus, the shape decision function can be written as follows:

$$f(\mathbf{x}) = \sum_{i=1}^{n} w_i \exp\left(-\frac{\|\mathbf{x} - \mathbf{p}_i\|^2}{\sigma^2}\right) + b. \qquad (5)$$

The contour curves can be defined by the shape decision function according to $\mathbf{C} = \{\mathbf{x} : f(\mathbf{x}) = 0\}$, which is equivalent to Eq. (2) where $\phi$ is replaced by $f$.

Let us consider the affine transformation of the kernel centers as follows:

$$\mathbf{p}_i' = A\mathbf{p}_i + \mathbf{t}, \qquad (6)$$

where $A$ is an invertible matrix and $\mathbf{t}$ is a translation vector. The corresponding shape decision function in terms of the transformed kernel centers can be written as follows:

$$f'(\mathbf{x}) = \sum_{i=1}^{n} w_i \exp\left(-\frac{\|\mathbf{x} - (A\mathbf{p}_i + \mathbf{t})\|^2}{\sigma^2}\right) + b. \qquad (7)$$

The affine transformation of the contour points can be represented by $\mathbf{x}' = A\mathbf{x} + \mathbf{t}$, where $\mathbf{x} \in \mathbf{C}$, such that $f(\mathbf{x}) = 0$ in (5). By substituting $\mathbf{x}'$ into Eq. (7), we obtain the following:

$$f'(A\mathbf{x} + \mathbf{t}) = \sum_{i=1}^{n} w_i \exp\left(-\frac{\|A(\mathbf{x} - \mathbf{p}_i)\|^2}{\sigma^2}\right) + b.$$

Regarding the affine invariance property, the above leads to the following.

Proposition II.1

There exist infinitely many $A$ and $\mathbf{t}$ such that, for each $\mathbf{x} \in \mathbf{C}$, we have $f'(A\mathbf{x} + \mathbf{t}) \neq 0$, if $A$ is not an orthogonal matrix.

The proof is deferred to APPENDIX-B.

This means that the affine transformation of the contour curve defined by the trained shape decision function $f$ may not be the contour curve defined by the transformed shape decision function $f'$, which is a violation of the aforementioned affine invariance property. To address this problem, we propose a revised shape representation as follows:

$$f_A(\mathbf{x}) = \sum_{i=1}^{n} w_i \exp\left(-\frac{\|A^{-1}(\mathbf{x} - \mathbf{t}) - \mathbf{p}_i\|^2}{\sigma^2}\right) + b. \qquad (8)$$

Substituting $\mathbf{x}' = A\mathbf{x} + \mathbf{t}$ into (8), we can verify that for general $A$ and $\mathbf{t}$ we have the following affine invariance:

$$f_A(A\mathbf{x} + \mathbf{t}) = f_I(\mathbf{x}),$$

where $f_I$ is (8) with $A = I$ and $\mathbf{t} = \mathbf{0}$, which defines an identity transformation. We consider that (8) is parameterized by $A^{-1}$ rather than $A$, to simplify the later derivation for optimizing the transformation.

We may further appreciate the significance of the affine invariance property through the example shown in Fig. 4. Suppose we are given the pair of initial points and shape in Fig. 2; the interior feature points of the leaf are then transformed by a predetermined affine transformation. The implicit contour curve defined by Eq. (7) is shown in Fig. 4(a). Note that Eq. (7) does not have the affine invariance property. The resultant shape, which tends to be a circle, differs significantly from the shape of a leaf. With the same transformation of the feature points, the implicit contour curve defined by the affine-invariant shape decision function in Eq. (8) is shown in Fig. 4(b). The affine-invariant implicit contour in Fig. 4(b) is clearly a leaf shape resulting from the same transformation applied to the feature points, which is the affine-invariance property. With the affine-invariant interior-points-to-shape relation in Eq. (8), we can achieve the affine points-to-shape alignment. A result of the points-to-shape alignment based on the feature matching of Fig. 1 is shown in Fig. 5.
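The invariance property can also be checked numerically. In the sketch below (random toy centers and weights; the matrix `A` is deliberately non-orthogonal), the Eq. (8)-style form reproduces the original response at affinely transformed points, while the Eq. (7)-style form, which only moves the kernel centers, does not.

```python
import numpy as np

rng = np.random.default_rng(0)
centers = rng.normal(size=(5, 2))          # kernel centers p_i (toy values)
weights = rng.normal(size=5)
bias, sigma = -0.2, 1.0

A = np.array([[2.0, 0.3], [0.1, 0.5]])     # a non-orthogonal affine matrix
t = np.array([1.0, -2.0])

def f(x, C):
    """Gaussian-RBF decision function with kernel centers C (Eq. 7 style)."""
    return weights @ np.exp(-np.sum((C - x) ** 2, axis=1) / sigma**2) + bias

def f_affine(y):
    """Affine-invariant form (Eq. 8 style): undo (A, t) before evaluating."""
    return f(np.linalg.solve(A, y - t), centers)

x = np.array([0.4, -0.1])
y = A @ x + t                              # the affinely transformed point
same = np.isclose(f_affine(y), f(x, centers))           # invariant form agrees
naive = np.isclose(f(y, centers @ A.T + t), f(x, centers))  # Eq. 7 form drifts
```

Because the invariant form measures distances in the un-transformed frame, its value at `A x + t` equals the original value at `x` up to floating-point error, whereas the naive form distorts the kernel metric whenever `A` is not orthogonal.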

Fig. 4: Significance of affine invariance of the shape representation. The dots are the transformed feature points. The outlining curves are the implicit shape contours.

Fig. 5: A demo of points-to-shape alignment.

Lastly, we enumerate some of the advantages of the established interior-points-to-shape model over the explicit transformation of the contour shape given in the object template.

  1. The affine invariant interior-points-to-shape relation provides continuous implicit contours without reparametrization, which can simplify the implementation.

  2. Sometimes, the entire region of object shape, namely the shape silhouette, is of interest. However, the explicit transformation generally cannot provide a uniform region of shape. The proposed interior-points-to-shape relation provides the shape silhouette by definition.

III The matching-constrained active contour

The proposed model of the affine-invariant interior-points-to-shape relation can be used for contour alignment given point matching. The implicit contours defined by this relation are continuous, and they do not require reparametrization when the shape changes. Such a continuous geometric shape representation is also desired by the active contour framework for segmentation. Therefore, we can reformulate the conventional active contour model by using our affine-invariant points-to-shape relation. Due to the affine invariance, the solution to the reformulated active contour model is an affine transformation of the initial contour; hence, the reformulated model carries an implicit affine shape prior.

In this section, we propose a general active contour model based on the affine-invariant interior-points-to-shape relation, with a constraint of matching.

III-A Active contour model with matching constraint

Given the interior-points-to-shape relation, we are able to achieve affine contour alignment by affine point alignment. The affine point alignment can be obtained by using the matching correspondences. The aligned contour can then be used to initialize the curve evolution in the affine active contour. In this section, we propose an optimization model that unifies the matching, alignment and curve evolution in a single constrained optimization framework. The motivation of our constrained optimization model lies in the notion of an initial feasible solution. Specifically, the contour curve given by the feature point matching and alignment provides the initial feasible solution to the constrained optimization model of the active contour. The local feature matching can also be used to constrain the optimization. The model is as follows:


$$\min_{A, \mathbf{t}} \; E_{AC}(\mathbf{C}(A, \mathbf{t})) \quad \text{s.t.} \quad E_M(A, \mathbf{t}, M) \leq \epsilon, \qquad (9)$$

where $\{\mathbf{p}_i\}$ and $\{A\mathbf{p}_i + \mathbf{t}\}$ are the feature points on the template and the transformed set of these feature points, $\mathbf{C}(A, \mathbf{t})$ denotes the implicit relation between the interior points and the contour shape defined by (8), $E_{AC}$ is the abstract form of the active contour energy, $E_M$ is the abstract form of the cost of joint matching and alignment, and $\epsilon$ is a tolerance level. Note that the $(A, \mathbf{t}, M)$ that corresponds to the minimal value of $E_M$ must be feasible for the inequality if there exists at least one feasible solution. By this formulation, we assume that the optimal contour can be obtained by transforming the boundary of the template object, and we assume that the transformation is affine.

III-B Projected gradient descent algorithm

Without knowing the actual formulations of the objective functional $E_{AC}$ and the constraint $E_M$, we are still able to derive the solution to the abstract optimization problem based on the idea of the projected gradient descent method [19]. Our projected-gradient descent equations are as follows:

$$\frac{dA}{d\tau} = -\left(\nabla_A E_{AC} - \frac{\langle \nabla E_{AC}, \nabla E_M \rangle}{\|\nabla E_M\|^2}\, \nabla_A E_M\right), \qquad (10)$$

$$\frac{d\mathbf{t}}{d\tau} = -\left(\nabla_{\mathbf{t}} E_{AC} - \frac{\langle \nabla E_{AC}, \nabla E_M \rangle}{\|\nabla E_M\|^2}\, \nabla_{\mathbf{t}} E_M\right). \qquad (11)$$

Note that these equations are not exactly the same as those in the conventional projected gradient descent method, in which the inversion of a large matrix is needed. The rationale of these equations lies in the following property:

$$\frac{dE_M}{d\tau} = 0, \qquad \frac{dE_{AC}}{d\tau} \leq 0. \qquad (12)$$

The derivation of this property is deferred to APPENDIX-C. The above indicates that the projected gradient descent algorithm governed by Eqs. (10) and (11) can reduce $E_{AC}$ while leaving $E_M$ unchanged. This might be too strong for the inequality-constrained optimization; in fact, we only require $E_M$ to be smaller than a predefined tolerance $\epsilon$. Therefore, we implement the original gradient descent if $E_M < \epsilon$, and we implement the full projected gradient algorithm if $E_M \geq \epsilon$. To implement the projected gradient descent algorithm, we require the explicit forms of the gradients $\nabla_A E_{AC}$, $\nabla_{\mathbf{t}} E_{AC}$, $\nabla_A E_M$ and $\nabla_{\mathbf{t}} E_M$.
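A compact sketch of one such update step on a generic parameter vector (the step size, tolerance and toy gradients are our own illustrative choices): plain gradient descent while the constraint value is below the tolerance, and a first-order projection of the gradient otherwise.

```python
import numpy as np

def projected_gradient_step(theta, grad_E, grad_g, g_val, eps, tau=0.1):
    """If g(theta) < eps: plain gradient descent on E.
    Otherwise: remove the component of grad_E along grad_g, so the step
    decreases E while, to first order, leaving the constraint g unchanged."""
    if g_val < eps:
        return theta - tau * grad_E
    proj = grad_E - (grad_E @ grad_g) / (grad_g @ grad_g) * grad_g
    return theta - tau * proj

grad_E = np.array([1.0, 2.0])
grad_g = np.array([0.0, 1.0])
step = projected_gradient_step(np.zeros(2), grad_E, grad_g, g_val=1.0, eps=0.5)
```

The projected step is orthogonal to the constraint gradient by construction, which is exactly the first-order statement that the constraint value stays fixed while the objective decreases.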

III-C Gradient of the active contour parameterized by the affine motion of interior points

The projected gradient descent algorithm requires the explicit forms of the gradients of the active contour energy, i.e. $\nabla_A E_{AC}$ and $\nabla_{\mathbf{t}} E_{AC}$. The derivation of the explicit form of $\nabla_A E_{AC}$ is complex; alternatively, we can obtain it numerically by finite differences. By the chain rule, $\nabla_A E_{AC}$ and $\nabla_{\mathbf{t}} E_{AC}$ can be written as follows:

$$\nabla_A E_{AC} = \oint_{\mathbf{C}} \left\langle \frac{\delta E_{AC}}{\delta \mathbf{C}}, \frac{\partial \mathbf{C}}{\partial A} \right\rangle ds, \qquad \nabla_{\mathbf{t}} E_{AC} = \oint_{\mathbf{C}} \left\langle \frac{\delta E_{AC}}{\delta \mathbf{C}}, \frac{\partial \mathbf{C}}{\partial \mathbf{t}} \right\rangle ds, \qquad (13), (14)$$

in which $\delta E_{AC} / \delta \mathbf{C}$, directed along the normal $\mathbf{N}$ of the contour, can be computed either numerically or exactly. The closed-form expressions of $\partial \mathbf{C} / \partial A$ and $\partial \mathbf{C} / \partial \mathbf{t}$ are needed for the computation; for a contour point $\mathbf{x} = A\mathbf{x}_0 + \mathbf{t}$, they are

$$\frac{\partial \mathbf{x}}{\partial A_{kl}} = (\mathbf{x}_0)_l\, \mathbf{e}_k, \qquad \frac{\partial \mathbf{x}}{\partial \mathbf{t}} = I. \qquad (15)$$
The derivation is deferred to APPENDIX-D. Our derivation is not restricted to any specific active contour model. For the Geodesic Active Contour (GAC) [1], which is a well-known edge-based active contour that can locate the object boundary accurately given a good initialization, the functional gradient is the following:

$$\frac{\delta E_{AC}}{\delta \mathbf{C}} = \left( g\kappa - \langle \nabla g, \mathbf{N} \rangle \right) \mathbf{N},$$

where $g$ is an edge indicator function, for which a stronger edge corresponds to a smaller value, $\mathbf{N}$ is the normal of the contour, and $\kappa$ is the contour curvature.
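For reference, the GAC descent direction can be evaluated on a grid in its level-set form, $\partial\phi/\partial t = g\,\kappa\,|\nabla\phi| + \langle\nabla g, \nabla\phi\rangle$. This is the standard level-set implementation of the flow, not the paper's gradient with respect to $(A, \mathbf{t})$, and the finite-difference scheme below is our own simplification:

```python
import numpy as np

def gac_speed(phi, g, h=1.0):
    """Level-set GAC flow d(phi)/dt = g*kappa*|grad phi| + <grad g, grad phi>,
    with central finite differences (a numerical sketch, not an upwind scheme)."""
    gy, gx = np.gradient(phi, h)                 # gradient of the level set
    norm = np.sqrt(gx**2 + gy**2) + 1e-12
    nx, ny = gx / norm, gy / norm                # unit normal components
    kappa = np.gradient(nx, h, axis=1) + np.gradient(ny, h, axis=0)  # curvature
    ggy, ggx = np.gradient(g, h)                 # gradient of the edge indicator
    return g * kappa * norm + ggx * gx + ggy * gy
```

For a planar level set (zero curvature) and a constant edge indicator, the speed vanishes everywhere, which is a quick correctness check.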

III-D Joint formulation of point matching and alignment

To obtain the explicit forms of $\nabla_A E_M$ and $\nabla_{\mathbf{t}} E_M$, we require the explicit formulation of $E_M$ in the constraint. $E_M$ is a measure of the optimality of feature matching and point alignment. Our formulation is a slight variant of the simplest linear model of feature matching [20]. Our joint optimization model of matching and alignment can be written as follows:

$$\min_{M, A, \mathbf{t}} \; E_M(A, \mathbf{t}, M) = \sum_{i}\sum_{j} m_{ij}\, c_{ij}, \qquad m_{ij} = \hat{\delta}_\sigma\!\left(\|A\mathbf{p}_i + \mathbf{t} - \mathbf{q}_j\|\right),$$

where $c_{ij}$ is the so-called matching cost measuring the distance between all pairs of features across the two images; we use the SIFT interest point detector and the SIFT feature [15]. $\hat{\delta}_\sigma$ is an approximation of the Dirac delta with a parameter $\sigma$, $\{\mathbf{q}_j\}$ is the set of target feature points in the target image, and $c_{ij}$ is the cost of the matching between features $\mathbf{f}_i$ and $\mathbf{g}_j$. $m_{ij}$ is the relaxed matching indicator. $\hat{\delta}_\sigma$ has the property $\int \hat{\delta}_\sigma(u)\, du = c$, where $c$ is a constant. In this model, the optimal $m_{ij}$ is determined by the closeness between $A\mathbf{p}_i + \mathbf{t}$ and $\mathbf{q}_j$. Thus, by aligning the points $\{A\mathbf{p}_i + \mathbf{t}\}$ toward proper target points $\{\mathbf{q}_j\}$, the matching cost can be minimized. However, the above model is easily trapped by degenerate solutions where $m_{ij} = 0$ for all $i, j$. To avoid such degeneration, we have the following reformulation:


The constraint in terms of $E_M$ is therefore $E_M(A, \mathbf{t}, M) \leq \epsilon$.

The gradients of the energy function in (16) are as follows:


We adopt the Gaussian function to approximate the Dirac delta, i.e. $\hat{\delta}_\sigma(u) \propto \exp(-u^2/\sigma^2)$, and we normalize the Gaussian functions according to the constraint $\int \hat{\delta}_\sigma(u)\, du = c$.

Given the detailed formulations presented previously, we present the entire framework of the matching-constrained active contour in the pseudo code form in Algorithm 1. We also visualize the algorithm in a diagram in Fig. 6.

Input : input image, source points $\{\mathbf{p}_i\}$, initial $(A, \mathbf{t})$, trained weights and bias $(w, b)$
Output : the segmented contour $\mathbf{C}$
       /*     Points-to-shape alignment */
       repeat
             Update $A$ and $\mathbf{t}$ by Eqs. (17) and (18)
       until convergence
       Evaluate the contour on the image domain by Eq. (8)
       /*     Constrained optimization */
       repeat
             if $E_M \geq \epsilon$ then
                   Update $\nabla E_{AC}$ and $\nabla E_M$ by using (13) and (14)
                   Update $A$ and $\mathbf{t}$ by Eqs. (10) and (11)
             Evaluate the contour on the image domain by Eq. (8)
       until convergence
———————————————————————– $F$ is a matrix of feature vectors in the template image; $G$ is the matrix of the corresponding feature vectors in the target image.
Algorithm 1 Projected gradient descent active contour

Fig. 6: Flow chart of the matching-constrained active contour framework.

The initial $A$ and $\mathbf{t}$ are computed by using the established point correspondences. We adopt the locally affine matching [17] to provide the initial point correspondences. The matching score due to the locally affine matching is guaranteed to be sufficiently high, and the locally affine matching is robust to outliers.

IV Experiments

In our implementation, we use the SIFT interest point detector and the SIFT feature [15], and we adopt the locally affine matching for producing the initial point correspondences. We are not confined to this choice of feature representation and matching algorithm. The tolerance $\epsilon$ in (9) is set in relation to the matching score from the initial object matching.

We experiment on real images taken from Mikolajczyk's homepage, the Caltech computer vision archive, and the ETHZ Toys dataset to evaluate our method for automatic object segmentation.

IV-A Affine invariant shape modeling

Following the shape training process presented in Section II, we can obtain the affine invariant points-to-shape model. We present the template objects and the trained shape contours in Fig. 7. We also present in Fig. 8 the fitting errors during the training of the RBF for the shapes in Fig. 7. The training of the shape models converges stably.

Fig. 7: Three pairs of template objects (left) and the corresponding trained shape contours overlaying the training shape (right)
Fig. 8: Minimization of the errors of shape modeling corresponding to Figs. 7 (a) (b) and (c).

There is a kernel-width parameter $\sigma$ in the RBF-based shape representation. We may use the fitting errors with fixed initial weights and threshold for selecting $\sigma$. Specifically, we select the $\sigma$ which corresponds to the highest fitting score from a set of candidate values. Fig. 9(a) shows the fitting scores w.r.t. $\sigma$ for the three shapes. We also randomly select values of $\sigma$ and implement the gradient descent learning to obtain the convergent shape models of the leaf shape corresponding to these random values. The scores of the optimal fitting, in terms of the Jaccard shape similarity, given the random values of $\sigma$ are shown in Fig. 9(b). We may observe that the peaks of the two curves are quite close, which means that the criterion of selecting $\sigma$ with fixed weights and threshold is effective. The major benefit of selecting $\sigma$ before the shape model fitting is computational efficiency.

Fig. 9: Experimental results on the selection of the parameter $\sigma$. (a) is the plot of fitting scores (vertical) vs. $\sigma$ (horizontal), given the fixed initial weights and thresholds. (b) compares the fitting scores for different values of $\sigma$ given the fixed initial weights and thresholds (squares with curve) with the fitting scores due to the optimization of the weights and threshold given randomly selected $\sigma$ (crosses with curve).

Upon the completion of the shape model training, we then validate the claimed affine invariance of the shape model defined in Eq. (8). We take the trained leaf shape model for evaluation. The principle is to compare the contour shape implicitly defined by the transformed interior points with the explicitly transformed contour shape, which can be considered the ground truth. We choose the normalized Hausdorff distance, i.e. the Hausdorff distance normalized by the maximum distance between the two point sets, as the shape distance. We randomly generate sets of parameters for the affine transformation without translation, leading to randomly generated leaves. Some examples of the implicit contours overlaying the explicitly transformed contour points are shown in Figs. 10(a)-(i). We can observe that the proposed interior-points-to-shape relation defined in Eq. (8) is affine invariant. In Figs. 10(j)-(r), we also present the implicit contours due to the shape model without affine invariance, i.e. Eq. (7). The shape distances corresponding to Fig. 10 are summarized in Table I. In Fig. 11, we plot the shape distances corresponding to the two shape models for all the examples. We observe that even when the shape transformation is large, the distance between the explicitly and implicitly transformed contour shapes remains small for the affine-invariant shape model, whereas the shape model without affine invariance deviates substantially from the explicitly transformed shapes.
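For reproducibility, the shape distance used above can be sketched as follows (the point-set inputs are assumed to be sampled contour points; the normalization by the maximum pairwise distance follows the description in the text):

```python
import numpy as np

def normalized_hausdorff(P, Q):
    """Hausdorff distance between 2-D point sets P (n,2) and Q (m,2),
    normalized by the maximum pairwise distance between the two sets."""
    D = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)  # all pairwise distances
    hausdorff = max(D.min(axis=1).max(), D.min(axis=0).max())
    return hausdorff / D.max()
```

Identical point sets give a distance of 0, and the distance approaches 1 when the worst unmatched point dominates the overall spread of the two sets.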

Fig. 10: The significance of the affine invariance of the interior-points-to-shape relation. The top row shows shapes generated by the affine-invariant model, i.e. Eq. (8). The bottom row shows the shapes generated by the model without affine invariance, i.e. Eq. (7). The dots are the interior feature points, the curves are the implicit contours, and the circles are the explicitly transformed contours.
Fig. 10 (a) (b) (c) (d) (e) (f) (g) (h) (i)
NHD 0.108 0.086 0.046 0.054 0.053 0.033 0.049 0.056 0.036
Fig. 10 (j) (k) (l) (m) (n) (o) (p) (q) (r)
NHD 0.180 0.116 0.113 0.130 0.154 0.080 0.100 0.114 0.126

NHD = Normalized Hausdorff distance

TABLE I: Shape distances corresponding to Fig. 10.

Fig. 11: The normalized Hausdorff distances between the shapes implicitly defined by the transformed interior points and the original shapes (horizontal) vs. the distance between the explicitly transformed contour shapes and the original shape (vertical).

IV-B The consequence of globally minimizing improper active contour energies

Various global optimization strategies exist for active contours, such as those reported in [9][10]. In this subsection, we show that the globally optimal solution to an improper active contour energy may not correspond to the target object. This claim does not depend on the choice of optimization method; hence, we adopt exhaustive search for the optimization. To examine the global optimality of the object shape in an image for a given active contour energy, we ensure that the object shape has been included in the search space.

In this experiment, we perform exhaustive search for both the Chan-Vese model and the GAC model with hard but correct shape priors. In the implementation, we search over 8 orientations of the given shape at all possible locations within the image domain. We make sure that this relatively small shape space roughly covers the correct object shape. A result is shown in Fig. 12. In this experiment, we have fixed the size of the shapes.

Fig. 12: The object segmentation by exhaustive search in simple image. (a) is the result by global search with Chan-Vese model (b) is the result by global search with GAC model

We also show in Fig. 13 that, given the correct fixed size, the global search may still not provide a reliable object segmentation. We can observe that the globally optimal solution to the Chan-Vese model (at the top-left corner) selects the region whose image values contrast most with the outer region, and the GAC model locates the group of strongest edges that best fit the shape prior, which is not necessarily the target object. Neither result is satisfactory. This is because we are not able to ensure that the object of interest corresponds to the globally optimal solution of the formulated energy minimization problem.

Fig. 13: The object segmentation by exhaustive search in a cluttered image. (a) is the result by global search with Chan-Vese model (b) is the result by global search with GAC model

There certainly exist cases in which global search outputs a satisfactory segmentation in relatively complex images, given the correct object size. An example is shown in Fig. 14(b), which is a result of global optimization of the GAC model. However, the global optimization of the Chan-Vese model is unsatisfactory, since the object of interest does not contrast significantly with the background. If we allow the size to vary in the search space, the results are often undesirable, as in Fig. 15. In this experiment, we include the correct size along with one smaller and one larger size in the search space. The boundary of the template deviates slightly from the object boundary in the result of global optimization of the GAC model in Fig. 15(b), since the gradients on the object boundary are small while the gradients inside the object are significant. The result of segmentation by the Chan-Vese model is shown in Fig. 15(a).

Fig. 14: Object segmentation by exhaustive search in a cluttered image containing a single object, with the correct object size. (a) result of global search with the Chan-Vese model; (b) result of global search with the GAC model.
Fig. 15: Object segmentation by exhaustive search in a cluttered image containing a single object, without knowing the size. (a) result of global search with the Chan-Vese model; (b) result of global search with the GAC model.

The experimental results in this subsection show that if the energy measure is improper, the globally optimal solution of the active contour does not necessarily correspond to the desired object boundary.

Iv-C Automatic object segmentation by matching-constrained active contour

In this subsection, we evaluate the matching-constrained active contour. The centroids of the template objects are set to be the origin. This position can be anywhere; it does not affect the feature point matching, since the formulation of feature point matching does not involve absolute positions.

From the point correspondences (obtained by SIFT matching) shown in the left-most columns of Figs. 16, 17 and 18, we obtain the initial contours shown in the middle columns of the figures, and we finally obtain the segmentation results shown in the right-most columns. We can observe that the initial contours produced by our interior-points-to-shape relation are close to the object boundaries, and the active contour further improves the boundary location. Our method is capable of achieving the desired segmentation results. We also compare the initial and final segmentation results with manually labeled ground truth shapes using the Jaccard similarity measure, to validate our visual observations quantitatively. The results are summarized in Table II. We can see that the final contour improves on the initial contour, and the region enclosed by the final contour overlaps well with the ground truth. We also present the computational cost of the entire framework on these images in Table III for reference (the entire process is run in MATLAB on a PC with an Intel® Core i5-450M processor and 4 GB of memory).
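The Jaccard similarity used in Table II can be computed directly from two binary masks; a minimal sketch:

```python
import numpy as np

def jaccard(mask_a, mask_b):
    """Jaccard similarity |A intersect B| / |A union B| of two binary masks."""
    a, b = np.asarray(mask_a, bool), np.asarray(mask_b, bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as identical
    return float(np.logical_and(a, b).sum() / union)
```

The measure is 1 for a perfect overlap with the ground truth and decreases toward 0 as the segmented region and the ground truth diverge.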

Fig. 16: Segmentation of an object under different poses in noisy images, in the presence of occlusion by leaves, cluttered background, and an indefinite boundary at the top. We label each triple of figures from top to bottom as (a), (b), (c) and (d).
Fig. 17: Segmentation of an object of relatively complex shape under different poses and scales, with non-ideal initial matchings. The top and bottom triples of figures are labeled (a) and (b).
Fig. 18: Segmentation of an object of complex shape under different scale, cluttered background and indefinite boundary.
         Fig. 16                     Fig. 17        Fig. 18
         (a)    (b)    (c)    (d)    (a)    (b)
Initial  0.73   0.75   0.74   0.63   0.53   0.68    0.65
Final    0.75   0.84   0.86   0.86   0.90   0.93    0.75

TABLE II: Jaccard similarity against the ground truth.

Figures        Fig. 16(a)  Fig. 16(b)  Fig. 16(c)  Fig. 16(d)  Fig. 17(a)  Fig. 17(b)  Fig. 18
Size (pixels)  320×400     320×400     320×400     320×400     296×448     282×448     411×408
M&R            17.83       18.3        18.53       17.9        18.44       16.27       10.05
GC             4.26        4.21        4.20        4.32        7.14        7.10        2.29
AC             98.14       40.93       82.03       650.56      504.97      401.95      82.95
Total          120.23      63.46       104.75      672.78      530.56      425.32      95.29

M&R = matching and registration, GC = generation of initial contour, AC = active contour before convergence, Total = total running time.

TABLE III: Computational costs (seconds).

Throughout the paper, we have implicitly assumed that a better shape model leads to a better segmentation result. We verify this experimentally, using the randomly generated shape models from the earlier experiments for extraction of the leaf in the top image of Fig. 17. In Fig. 19, we show the strong correlation between the quality of shape modeling, in terms of fitting score, and the quality of segmentation, in terms of Jaccard similarity between the result and the ground truth.

Fig. 19: Quality of segmentation (vertical) vs. score of model fitting (horizontal).

Iv-D Robustness to noise

We also evaluate our method under different noise levels, using the image in Fig. 17(a). We add Gaussian noise to the image, with zero mean and standard deviations varied from 1 to 20. For each noise level, we create 30 images and apply our method to them. Boxplots of the segmentation results are shown in Fig. 20. We can observe that our method is robust when the standard deviation of the noise is small, but the accuracy decays sharply at larger standard deviations. Hence, the proposed method may be sensitive to noise. This is because we adopt the SIFT feature for matching; the sensitivity of object matching to noise has been addressed by robust feature representations such as PCA-SIFT [21].
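The evaluation protocol above can be sketched as a small harness. This is an illustrative Python version of our own; `segment` and `jaccard` are hypothetical callables standing in for the full segmentation pipeline and the similarity measure:

```python
import numpy as np

def noise_robustness(image, segment, ground_truth, jaccard,
                     sigmas=range(1, 21), trials=30, seed=0):
    """Re-run a segmentation routine on noisy copies of an image and
    collect per-noise-level Jaccard scores (the data behind a boxplot)."""
    rng = np.random.default_rng(seed)
    scores = {}
    for sigma in sigmas:
        scores[sigma] = [
            jaccard(segment(image + rng.normal(0.0, sigma, image.shape)),
                    ground_truth)
            for _ in range(trials)]
    return scores
```

The returned dictionary maps each noise standard deviation to a list of `trials` Jaccard scores, ready to be plotted as one box per noise level.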

Fig. 20: Segmentation under different noise levels. (a) boxplot of accuracy vs. noise level; (b) and (c) are noisy images, with noise standard deviations of 5 and 20, respectively.

Iv-E Experimental comparison

In this subsection, we conduct experiments comparing our method with related methods, including global optimization of the GAC model (global search, or GS), co-segmentation based on discriminative clustering (DCCoSeg) [22], and co-segmentation based on submodular optimization (SOCoSeg) [23]. We use eight orientations and three scales in the global search. The implementations of the co-segmentation algorithms are taken from the authors' websites. All the methods in the comparison require an object example, and the task is the same: to outline the same or a similar object in the image of interest.

We conduct a quantitative experimental comparison. To eliminate as many extraneous factors as possible, we use images generated by random transformations of the same image, namely the leaf image shown in Fig. 12. We use 50 random affine transformation matrices to generate 50 random images. The object example is shown in Fig. 7. The quantitative results are shown in Fig. 21. We can observe that our method, the matching-constrained active contour (MCAC), significantly outperforms the others. Some visual results are shown in Fig. 22. Our method locates the object boundary satisfactorily. In contrast, the results of global search may deviate from the object of interest because of insufficient information about the object, and the co-segmentation methods may produce many spurious contours and sometimes miss the object altogether. This is because the formulation of co-segmentation generally does not include a sensible object model, whereas our model integrates feature matching, shape prior modeling and object boundary modeling, all of which are practically useful object models.
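The random affine test images can be generated from random 2×2 affine matrices. One possible parameterization (rotation · shear · scale, with ranges of our own choosing, since the paper does not state the exact sampling distribution):

```python
import numpy as np

def random_affine(rng, max_rot=np.pi, scale_range=(0.8, 1.25),
                  max_shear=0.2):
    """Draw a random 2x2 affine matrix as rotation @ shear @ scale."""
    theta = rng.uniform(-max_rot, max_rot)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])   # rotation
    S = np.diag(rng.uniform(*scale_range, size=2))    # anisotropic scale
    K = np.array([[1.0, rng.uniform(-max_shear, max_shear)],
                  [0.0, 1.0]])                        # shear
    return R @ K @ S
```

Applying each sampled matrix (plus a translation) to the source image with any standard image-warping routine yields the randomly transformed test set.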

Fig. 21: Quantitative comparison
Fig. 22: Visual comparison. The rows from top to bottom show the results of our method, global search, discriminative clustering based co-segmentation [22], and submodular optimization based co-segmentation [23].

Iv-F Discussions

The formulation of the matching-constrained active contour contains a tolerance level that must be determined by the user. We provide some observations to help users select an appropriate value. A small tolerance allows only small deviations from the initial solution, which is undesirable when the object boundary is not very close to the initial contour; hence, we may not need to constrain the optimization tightly. In the images used for evaluation, the boundaries of the objects of interest correspond to local optimal solutions of the active contour energy. Hence, we only need gradient descent from the initial feasible solution produced by the optimal matching, without the constraint becoming active. The constraint should be more useful during the search for the optimal solution when the boundary is not a local optimal solution of the active contour model.

Objects may deform both rigidly and non-rigidly. In this paper, we have considered a general type of deformation, namely the affine transformation. More general non-rigid deformations can be estimated by following non-rigid shape prior modeling, e.g. [24], [25], [26], once the affine transformation is known.

Robust detection of the convergence of gradient descent based methods is a research problem in many areas, such as optimization and machine learning. In our experiments, we terminate the algorithm when the average change of the affine parameters falls below a threshold or the maximum number of iterations is reached.
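The stopping rule just described can be written as a small predicate. A literal sketch, with the threshold and iteration cap as illustrative defaults of ours (the paper does not report the exact values):

```python
import numpy as np

def converged(param_history, tol=1e-4, max_iter=500):
    """Stop when the average absolute change of the affine parameters
    falls below `tol`, or when `max_iter` iterations have been run.
    `param_history` is the list of parameter vectors, one per iteration."""
    if len(param_history) >= max_iter:
        return True
    if len(param_history) < 2:
        return False
    delta = np.abs(np.asarray(param_history[-1])
                   - np.asarray(param_history[-2]))
    return bool(delta.mean() < tol)
```

The gradient descent loop simply appends the current parameter vector each iteration and exits once `converged` returns `True`.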

The computational cost depends on the number of selected feature points on the template object. A careful selection of the feature points can reduce the computational cost drastically while preserving the segmentation accuracy.

V Conclusion and future work

In this paper, we proposed a novel constrained optimization formulation of active contours. The numerical optimization of this new active contour model leads to an automatic object segmentation algorithm. This work expands the capacity of the conventional active contour approach for object segmentation, and the extension has practical significance in that the conventional semi-automatic framework has been automated.

There are several possible future directions. 1) The shape modeling in our affine-invariant interior-point-to-shape relation can be improved, and the computations can be accelerated by cleverly selecting the feature points. 2) The object matching is based on a single template; however, the appearance of the object of interest could vary significantly across images, and robust object matching is crucial to our method. 3) There could be interesting applications of this method, such as in the training phase of general object recognition tasks.

Appendix-a Gradient descent equations for shape training

The gradient descent equations for learning the parameters and in the shape decision function in Eq. (4) are as follows:

where is the first order derivative of . The trained contour curve is defined by Eq. (2) in which is replaced by the trained .

Appendix-B Proof of Proposition ii.1


Let , we have that . Our objective is to show in general. Since , and is a nonzero symmetric matrix. We can write

where and . This is a result of eigen-decomposition.

Obviously, there exists one vector such that the above does not equal for any nonzero . By almost arbitrary scaling of , we obtain infinitely many such vectors, which completes the proof.       

Appendix-C Derivations of Eq. (12)

In the following, we present the derivations of Eq. (12).

Appendix-D Derivation of Eqs. (13) and (14)

Our derivation is based on the following equality for minimizing a general active contour energy.


Note that the equality holds true for geometric active contours.

The differential of the shape decision function at the implicit contour leads to the following.


Substituting Eq. (D-1) into the above, we obtain the expression for .


Substituting the (D-3) into (D-1) we obtain the curve evolution as follows.


To minimize a general active contour energy , we require the derivative of to be non-positive as


where is the inner product of two vector functions in which is the vector inner product. Substituting (D-4) into (D-5), considering , we obtain the following:




which gives Eqs. (13) and (14). In the gradient descent process, we can set and .


  • [1] V. Caselles, R. Kimmel, and G. Sapiro, “Geodesic active contours,” International Journal of Computer Vision, vol. 22, no. 1, pp. 61–79, 1997.
  • [2] T. Chan and L. Vese, “Active contours without edges,” IEEE Transactions on Image Processing, vol. 10, no. 2, pp. 266–277, 2001.
  • [3] C. Xu and J. L. Prince, “Snakes, shapes, and gradient vector flow,” IEEE Transactions on Image Processing, vol. 7, no. 3, pp. 359–369, 1998.
  • [4] N. Paragios, O. Mellina-Gottardo, and V. Ramesh, “Gradient vector flow fast geometric active contours,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 3, pp. 402–407, 2004.
  • [5] C. Li, J. Liu, and M. D. Fox, “Segmentation of edge preserving gradient vector flow: An approach toward automatically initializing and splitting of snakes,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005.
  • [6] X. Xie and M. Mirmehdi, “MAC: Magnetostatic active contour model,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 4, pp. 632–646, 2008.
  • [7] J. Wang, K. L. Chan, and Y. Wang, “On the stationary solution of PDE based curve evolution,” in Proceedings of the 19th British Machine Vision Conference, 2008.
  • [8] B. Li and S. T. Acton, “Automatic active model initialization via poisson inverse gradient,” IEEE Transactions on Image Processing, vol. 17, no. 8, pp. 1406–1420, 2008.
  • [9] D. Cremers, F. R. Schmidt, and F. Barthel, “Shape priors in variational image segmentation: Convexity, lipschitz continuity and globally optimal solutions,” in IEEE Conference on Computer Vision and Pattern Recognition, 2008.
  • [10] T. Schoenemann and D. Cremers, “A combinatorial solution for model-based image segmentation and real-time tracking,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, pp. 1153–1164, 2010.
  • [11] T. Schoenemann and D. Cremers, “Globally optimal image segmentation with an elastic shape prior,” in Proceedings of the 12th International Conference on Computer Vision, 2007.
  • [12] E. L. Lawler, “Optimal cycles in doubly weighted linear graphs,” in Theory of Graphs: International Symposium, 1966, pp. 209–213.
  • [13] P. Viola and M. J. Jones, “Robust real-time face detection,” International Journal of Computer Vision, vol. 57, pp. 137–154, May 2004.
  • [14] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005.
  • [15] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, pp. 91–110, 2004.
  • [16] H. Jiang and S. Yu, “Linear solution to scale and rotation invariant object matching,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2009.
  • [17] H. Li, E. Kim, X. Huang, and L. He, “Object matching with a locally affine-invariant constraint.” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
  • [18] S. Osher and J. A. Sethian, “Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations,” Journal of Computational Physics, vol. 79, pp. 12–49, 1988.
  • [19] D. G. Luenberger and Y. Ye, Linear and Nonlinear Programming, 3rd ed.   Springer, 2008.
  • [20] H. Jiang, M. S. Drew, and Z.-N. Li, “Matching by linear programming and successive convexification,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, pp. 959–975, June 2007.
  • [21] Y. Ke and R. Sukthankar, “PCA-SIFT: A more distinctive representation for local image descriptors,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition.   Los Alamitos, CA, USA: IEEE Computer Society, 2004.
  • [22] A. Joulin, F. Bach, and J. Ponce, “Multi-class cosegmentation,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2012.
  • [23] G. Kim, E. P. Xing, L. Fei-Fei, and T. Kanade, “Distributed cosegmentation via submodular optimization on anisotropic diffusion,” in Proceedings of the 2011 International Conference on Computer Vision, 2011, pp. 169–176.
  • [24] P. Etyngier, F. Segonne, and R. Keriven, “Shape priors using manifold learning techniques,” in Proceedings of the Eleventh IEEE International Conference on Computer Vision, 2007.
  • [25] J. Wang and K. L. Chan, “Shape evolution for rigid and nonrigid shape registration and recovery,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2009.
  • [26] S. H. Joshi and A. Srivastava, “Intrinsic bayesian active contours for extraction of object boundaries in images,” International Journal of Computer Vision, vol. 81, no. 3, pp. 331–355, 2009.