Contour is one of the most important object descriptors, along with texture and color. The boundary of an object in an image is encoded in contour description, which is useful in various applications, such as image retrieval [chuang1996, zhang2002, zhang2004review], recognition [mokhtarian1992, shotton2008, xu2012], and segmentation [xie2020, xu2019, maninis2017, zhen2020joint, peng2020]. It is desirable to represent object boundaries compactly, as well as faithfully, but it is challenging to design such contour descriptors due to the diversity and complexity of object shapes.
Early contour descriptors were developed mainly for image retrieval [mokhtarian1992, chuang1996, zhang2002, zhang2004review]. An object contour can be simply represented based on the area, circularity, and/or eccentricity of the object [young1974]. For more precise description, there are several approaches, including shape signature [davies2004, van1991, xie2020], structural analysis [freeman1978, perez1994, dierckx1995, cinque1998, xu2019], spectral analysis [chuang1996, zhang2002], and curvature scale space (CSS) [mokhtarian1992, dudek1997].
Recently, contour descriptors have been incorporated into deep-learning-based object detection, tracking, and segmentation systems. In [zhou2019bottom], bounding boxes are replaced by polygons to enclose objects more tightly. In [xin2019fast], ellipse fitting is done to produce a rotated box of a target object to be tracked. For instance segmentation, contour-based techniques have been proposed that represent pixelwise masks by contour descriptors based on shape signature [xie2020] or polynomial fitting [xu2019]. Even though these descriptors can localize an object effectively, they may fail to reconstruct the object boundary faithfully. Also, they consider the structural information of an individual object only, without exploiting the shape correlation between different objects.
In this paper, we propose novel contour descriptors, called eigencontours, based on low-rank approximation. First, we construct a contour matrix containing all object boundaries in a training set. Second, we decompose the contour matrix into eigencontours, based on the best rank-$M$ approximation of singular value decomposition (SVD) [y2015SVD]. Then, each contour is represented by a linear combination of the eigencontours, as illustrated in Figure 1. Also, we incorporate the eigencontours into an instance segmentation framework. Experimental results demonstrate that the proposed eigencontours can represent object boundaries more effectively and more efficiently than the existing contour descriptors [xie2020, xu2019]. Moreover, utilizing the existing framework of YOLOv3 [redmon2018], the proposed algorithm yields promising instance segmentation performances on various datasets: KINS [qi2019kins], SBD [hariharan2011], and COCO2017 [lin2014].
This work has the following contributions:
We propose the notion of eigencontours — data-driven contour descriptors based on SVD — to represent object boundaries as faithfully as possible with a limited number of coefficients.
The proposed algorithm can represent object boundaries more effectively and more efficiently than the existing contour descriptors.
The proposed algorithm outperforms conventional contour-based techniques in instance segmentation.
2 Related Work
The goal of contour description is to represent the boundary of an object in an image compactly and faithfully. Simple contour descriptors are based on the area, circularity, and/or eccentricity of an object [young1974], and basic geometric shapes, such as rectangles and ellipses, can also be used. However, these simple descriptors cannot preserve the original shape of an object faithfully [zhang2020mask, shen2021dct]. For more sophisticated description, there are four types of approaches: shape signature [davies2004, van1991, xie2020], structural analysis [freeman1978, perez1994, cinque1998, xu2019], spectral analysis [chuang1996, zhang2002], and CSS [mokhtarian1992, dudek1997]. First, a shape signature is a one-dimensional function derived from the boundary coordinates of an object. For example, a polar coordinate system is set up with respect to the centroid of an object. Then, the object boundary is represented by the graph of the radial distance as a function of the angle, called the centroidal profile [davies2004]. Also, an object shape can be represented by the angle between the tangent vector at each contour point and the $x$-axis [van1991]. Second, structural methods divide an object boundary into segments and approximate each segment to encode the whole boundary. In [freeman1978], the boundary is represented by a sequence of unit vectors with a few possible directions. In [perez1994], polygonal approximation is performed to globally minimize the errors from an approximated polygon to the original boundary. In [cinque1998], segments of an object contour are represented by cubic polynomials. Third, in spectral methods, boundary coordinates are transformed to a spectral domain. In [chuang1996], a wavelet transform is used for contour description. In [zhang2002], the Fourier descriptors are derived from the Fourier series of centroidal profiles. Fourth, in CSS [mokhtarian1992], a boundary is smoothed by a Gaussian filter with a varying standard deviation. Then, the boundary is represented by the curvature zero-crossing points of the smoothed curve at each standard deviation.
Recently, attempts have been made to exploit contour description to improve the performances of deep-learning-based vision systems. In [zhou2019bottom], a bounding box for object detection is replaced by an octagon to enclose an object more tightly via polygonal approximation. In [xin2019fast], a rotated box for a target object is determined based on ellipse fitting, in order to cope with object deformation in a visual tracking system. For instance segmentation, contour-based approaches [xu2019, xie2020] have been developed, which reformulate the pixelwise classification task as the boundary regression of an object. To this end, these methods encode segmentation masks into contour descriptors. In [xie2020], centroidal profiles are used to describe object boundaries. In [xu2019], each segment of a boundary is represented by a few coefficients based on polynomial fitting. Although these methods are computationally efficient for localizing object instances, they often fail to reconstruct the boundaries of object shapes faithfully.
The proposed algorithm aims to represent an object boundary as faithfully as possible by employing as few coefficients as possible. To this end, we develop eigencontours based on the best low-rank approximation property of SVD.
3 Proposed Algorithm
Instead of deriving contour descriptors based on prior assumptions on object boundaries, such as rectangular, elliptical, or polynomial models, we develop eigencontours by analyzing boundary data in a training set. In this sense, the proposed eigencontours are data-driven descriptors. Figure 2 is an overview of the proposed algorithm. First, we compose a contour matrix, containing all object boundaries in a training set. Second, we approximate the matrix, by performing the best rank-$M$ approximation, to determine eigencontours. Third, we represent an object boundary by a linear combination of the eigencontours.
3.1 Mathematical Formulation
SVD and principal component analysis (PCA) are used in various fields to achieve dimensionality reduction and represent data concisely [Linear2012, y2015SVD, jin2022]. In this paper, we use SVD to represent object boundaries compactly and reliably. More specifically, we adopt a data-driven approach to exploit the distribution of object contours in a training set, instead of performing curve fitting [cinque1998] or Fourier analysis [zhang2002], in order to represent object boundaries efficiently in a low-dimensional space.
Star-convex contour generation: There is a tradeoff between accuracy and simplicity of a contour representation scheme: an accurate representation yields a high-dimensional feature vector, while too simple a representation cannot describe complicated boundaries precisely. To strike a good balance, we adopt the star-convexity assumption of object shapes. A regional set (or shape) is star-convex [stanek1977characterization] if it contains a point such that the line segment from the point to any point in the set is contained in the set. Then, a star-convex contour is defined as the set of boundary points of a star-convex set. For example, Figure 3(a) is not a star-convex contour, but Figure 3(b) is a star-convex one.
To represent star-convex contours, we use centroidal profiles [davies2004]. Given an object shape, we find the inner-center, which is the center of the circle of the maximum size wholly contained in the shape, as done in [xu2019]. Then, with respect to the inner-center, we describe the boundary using the polar coordinates $(r(\theta_i), \theta_i)$, $i = 1, \ldots, N$. The angular coordinates are sampled uniformly, so only the radial coordinates are recorded to represent the contour
$$\mathbf{x} = [r(\theta_1), r(\theta_2), \ldots, r(\theta_N)]^\top. \qquad (1)$$
As in Figure 3(c), $r(\theta_i)$ is set to be the distance of the farthest object point from the center along the direction of $\theta_i$. By construction, $\mathbf{x}$ describes a star-convex contour.
Figure 3(d) shows more star-convex contours. With infinite sampling ($N \to \infty$), a star-convex contour is guaranteed to enclose all object points, since it is the boundary of the star-convex hull of the object. With a finite $N$, however, the star-convex contour may miss some object points, as well as include some non-object points. Nevertheless, we see that the contours in Figure 3(d) represent object shapes quite faithfully.
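As an illustrative sketch, a star-convex contour vector can be sampled from a set of boundary points as described above. The function name, the angular-binning strategy, and the gap-filling rule here are our own assumptions, not necessarily the authors' exact implementation:

```python
import numpy as np

def star_convex_contour(boundary, center, n_angles=360):
    """Sample a star-convex contour from boundary points (sketch).

    For each uniformly sampled angle, record the distance of the
    farthest boundary point whose direction falls into that angular
    bin; bins that receive no point are filled from the nearest
    non-empty bin.
    """
    pts = np.asarray(boundary, dtype=float) - np.asarray(center, dtype=float)
    angles = np.arctan2(pts[:, 1], pts[:, 0]) % (2 * np.pi)
    radii = np.hypot(pts[:, 0], pts[:, 1])
    bins = (angles / (2 * np.pi) * n_angles).astype(int) % n_angles

    r = np.zeros(n_angles)
    for b, rad in zip(bins, radii):
        r[b] = max(r[b], rad)   # farthest object point along each direction

    # fill angular bins that received no boundary point
    filled = np.nonzero(r)[0]
    for i in np.nonzero(r == 0)[0]:
        j = filled[np.argmin(np.minimum((filled - i) % n_angles,
                                        (i - filled) % n_angles))]
        r[i] = r[j]
    return r
```

For a circular boundary of radius 2 around its center, every entry of the returned 360-dimensional vector is approximately 2, as expected.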
Eigencontour space: In general, object shapes are well structured and thus highly correlated to one another, especially between objects in the same class. By exploiting this structural relationship using big data, we design effective contour descriptors. Specifically, we first construct a star-convex contour matrix $\mathbf{A} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_L] \in \mathbb{R}^{N \times L}$ from $L$ training objects. Then, we perform the SVD of the matrix $\mathbf{A}$,
$$\mathbf{A} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^\top, \qquad (2)$$
where $\mathbf{U} \in \mathbb{R}^{N \times N}$ and $\mathbf{V} \in \mathbb{R}^{L \times L}$ are orthogonal matrices and $\mathbf{\Sigma} \in \mathbb{R}^{N \times L}$ is a diagonal matrix, composed of the singular values $\sigma_1 \geq \sigma_2 \geq \cdots \geq 0$. It is known that
$$\mathbf{A}_M = \sum_{i=1}^{M} \sigma_i \mathbf{u}_i \mathbf{v}_i^\top \qquad (3)$$
is the best rank-$M$ approximation of $\mathbf{A}$ [y2015SVD], where $\mathbf{u}_i$ and $\mathbf{v}_i$ denote the $i$th columns of $\mathbf{U}$ and $\mathbf{V}$, respectively.
In (3), each approximate contour is given by a linear combination of the first $M$ left singular vectors $\mathbf{u}_1, \ldots, \mathbf{u}_M$. In other words,
$$\mathbf{A}_M = \mathbf{U}_M [\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_L], \qquad (4)$$
where $\mathbf{U}_M = [\mathbf{u}_1, \ldots, \mathbf{u}_M]$ and $\mathbf{b}_l$ is the coefficient vector of the $l$th contour.
We refer to these vectors $\mathbf{u}_1, \ldots, \mathbf{u}_M$ as eigencontours, and the space spanned by them as the eigencontour space.
Given a contour $\mathbf{x}$, we project it onto the eigencontour space to obtain the low-rank approximation
$$\tilde{\mathbf{x}} = \mathbf{U}_M \mathbf{b}, \qquad (5)$$
where the coefficient vector $\mathbf{b}$ is given by
$$\mathbf{b} = \mathbf{U}_M^\top \mathbf{x}. \qquad (6)$$
In (6), an $N$-dimensional contour $\mathbf{x}$ is optimally approximated by an $M$-dimensional vector $\mathbf{b}$ in the eigencontour space, where $M \ll N$. Also, the approximate $\tilde{\mathbf{x}}$ can be reconstructed from $\mathbf{b}$ via (5). Note that eigencontours may have negative elements. Thus, in rare cases, the approximate $\tilde{\mathbf{x}}$ has negative elements. In such cases, we truncate the negative elements to 0 to ensure the star-convexity of $\tilde{\mathbf{x}}$.
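The construction of the eigencontour space and the projection in (5)–(6) can be sketched in a few lines of NumPy. The function names are ours, and the negative-value truncation follows the rule stated above:

```python
import numpy as np

def build_eigencontours(contours, M):
    """Stack N-dimensional contour vectors into A (N x L) and return
    the first M left singular vectors U_M, i.e., the eigencontours."""
    A = np.stack(contours, axis=1)                   # A in R^{N x L}
    U, S, Vt = np.linalg.svd(A, full_matrices=False)  # singular values sorted
    return U[:, :M]                                  # U_M in R^{N x M}

def encode(x, U_M):
    """Project a contour onto the eigencontour space: b = U_M^T x."""
    return U_M.T @ x

def decode(b, U_M):
    """Reconstruct x~ = U_M b, truncating negative radial values to 0
    to preserve star-convexity."""
    return np.maximum(U_M @ b, 0.0)
```

When $M$ equals the rank of the training matrix, a training contour is recovered exactly; for $M \ll N$, `encode` compresses it to an $M$-dimensional coefficient vector.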
Clustering in eigencontour space: To discover typical contour patterns in a dataset, contour clustering can be performed. Instead of the original contour space of dimension $N$, contours can be grouped more effectively and more efficiently in the eigencontour space of dimension $M$. This is because the mapping $\mathbf{b} \mapsto \mathbf{U}_M \mathbf{b}$ from the eigencontour space to the original space is an isometry, as the columns of $\mathbf{U}_M$ are orthonormal. Specifically, let $\mathbf{x}_1, \ldots, \mathbf{x}_L$ be object contours, which are approximated via (4). Then, it can be easily shown that
$$\|\tilde{\mathbf{x}}_i - \tilde{\mathbf{x}}_j\| = \|\mathbf{b}_i - \mathbf{b}_j\| \qquad (7)$$
for all $i$ and $j$.
In other words, the distances between contours in the original space are equal to those between the corresponding coefficient vectors in the eigencontour space. Hence, the clustering can be performed to yield the same results in both spaces, but it can be done more reliably and more efficiently in the eigencontour space because $M \ll N$. Note that, as the dimension of a space gets higher, clustering becomes more difficult because of the curse of dimensionality [bellman1966dynamic].
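The distance-preservation property in (7) is easy to verify numerically; the script below (with our own synthetic data, not the paper's) checks that the distance between two rank-$M$ approximated contours equals the distance between their coefficient vectors:

```python
import numpy as np

# Synthetic "training set": 50 contours of dimension N = 360.
rng = np.random.default_rng(1)
A = 1.0 + rng.random((360, 50))

# Eigencontours: first M = 16 left singular vectors.
U, _, _ = np.linalg.svd(A, full_matrices=False)
U_M = U[:, :16]

B = U_M.T @ A        # coefficient vectors b_1, ..., b_50  (16 x 50)
X_tilde = U_M @ B    # approximated contours x~_1, ..., x~_50  (360 x 50)

# Distances match because the columns of U_M are orthonormal.
d_orig = np.linalg.norm(X_tilde[:, 0] - X_tilde[:, 1])
d_coef = np.linalg.norm(B[:, 0] - B[:, 1])
```

Consequently, running a distance-based method such as $K$-means on the 16-dimensional coefficient vectors yields the same objective values as running it on the 360-dimensional approximated contours.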
Regression in eigencontour space: Furthermore, it is also beneficial to find object contours in the eigencontour space. A contour regressor can be designed to detect object boundaries in images. To detect a star-convex contour $\mathbf{x}$ in (1) in the original space, we should regress $N$ variables. However, we can approximate all ground-truth contours of training objects using the first $M$ eigencontours and train a network to regress the $M$ coefficients of $\mathbf{b}$ in (6) in the eigencontour space. This approach requires the regression of fewer variables. Hence, the regression network also needs fewer parameters and is more efficient in both training and inference stages. The efficacy of the regression in the eigencontour space is demonstrated in Sections 4.2 and 4.3.
3.2 Examples and Analysis
Eigencontours: In this example, we use the KINS dataset [qi2019kins], the instances of which are divided into seven categories. We determine the eigencontours for the six categories of ‘cyclist,’ ‘pedestrian,’ ‘tram,’ ‘car,’ ‘truck,’ and ‘van,’ respectively, excluding ‘misc,’ which contains miscellaneous instances with unspecified classes. We also obtain the eigencontours for the universal set of all instances in the six categories. Each object boundary is represented by a 360-dimensional star-convex contour vector, by uniformly quantizing the 360 degrees with an interval of 1°, i.e. $N = 360$.
Figure 4 shows the first six eigencontours $\mathbf{u}_1, \ldots, \mathbf{u}_6$. For each category, the first eigencontour $\mathbf{u}_1$ describes rough outlines of typical instances. For example, most pedestrians stand or walk on sidewalks, as implied by the vertical shape of $\mathbf{u}_1$ for ‘pedestrian.’ By weighting $\mathbf{u}_1$, the size of the shape can be controlled. Next, $\mathbf{u}_2$ is more complicated and represents detailed parts of instances. For ‘pedestrian,’ $\mathbf{u}_2$ is used to reconstruct a pair of legs, as shown in Figure 5(a). Also, $\mathbf{u}_2$ for ‘car’ generates a streamlined shape by refining the four sides of a car in Figure 5(b); its coefficient affects the horizontal and vertical sizes of the car. Similarly, $\mathbf{u}_2$ for ‘cyclist’ recovers bike wheels in Figure 5(c). In general, the coefficients for $\mathbf{u}_1$ and $\mathbf{u}_2$ are larger than those for the other eigencontours, and they are major factors for determining overall shapes. To represent those shapes more precisely, more eigencontours are required. Note that, for the three related categories of ‘car,’ ‘truck,’ and ‘van,’ $\mathbf{u}_1$ and $\mathbf{u}_2$ are similar to one another. Also, $\mathbf{u}_1$ for the universal set is a round shape to describe various instances in different categories.
Rank-$M$ approximation: Figure 6 shows two object boundaries in the COCO2017 dataset [lin2014] and their rank-$M$ approximations. In this test, the eigencontours are determined for all training instances in all categories. The rank-1 approximations are not good enough; they represent the overall sizes of the objects only. The rank-2 approximations reconstruct object shapes better, but only roughly. As $M$ gets larger, more faithful contours are restored. In this example, the objects have relatively complex shapes. Hence, to represent their boundaries well, the rank-20 approximations are required, which are almost identical to the 360-dimensional star-convex contours. Although they cannot reconstruct the original contours perfectly, this is not because of the low-rank approximation, but because of the star-convex conversion. Note that, compared to the 360-dimensional star-convex contours, the rank-20 approximations reduce the dimensionality by a factor of 18.
Clustering in eigencontour space: For each of the six categories in the KINS dataset, we cluster the object boundaries in the 16-dimensional eigencontour space ($M = 16$) using the $K$-means algorithm, where $K$ is set to 100. Figure 7 shows examples of contour centroids. We see that the centroids represent typical object shapes in the categories from different views. This indicates that eigencontours are effective not only for representing individual contours faithfully, but also for clustering contours into typical patterns in a lower-dimensional space.
4 Experiments
4.1 Datasets
We use three datasets: KINS, SBD, and COCO2017. All these datasets were approved by institutional review boards.
KINS [qi2019kins]: It is a dataset for amodal instance segmentation, built on the KITTI dataset [geiger2012]. It consists of 7,474 training and 7,517 test images. All instances are classified into seven categories, and an amodal segmentation mask is annotated for each instance.
SBD [hariharan2011]: It is a semantic boundary dataset, re-annotated from the PASCAL VOC dataset [everingham2010]. Its 11,355 images are split into 5,623 training and 5,732 validation images. All instances are classified into 20 object categories. Each instance is annotated with its shape boundary without holes.
COCO2017 [lin2014]: It is a large dataset for various tasks, such as object detection and segmentation. It contains 118K training images, 5K validation images, and 41K test images. The instance segmentation masks for objects in 80 categories are provided.
4.2 Comparative Assessment
Contour descriptors: It is desirable for contour descriptors to represent an object boundary compactly, as well as to reconstruct it faithfully. We compare the proposed eigencontours with the conventional contour descriptors [xu2019, xie2020]. For contour description, centroidal profiles are used in PolarMask [xie2020], while polynomial fitting is performed to approximate the shape signature of a boundary in ESE-Seg [xu2019]. In this test, the proposed eigencontours are determined for all instances in all categories in a training dataset.
For the quantitative assessment of contour descriptors, we employ the F-measure [perazzi2016]. Specifically, bipartite matching is performed between the boundary points of a ground-truth contour and those of its approximated version. Then, the F score is defined as the harmonic mean of the precision (P) and the recall (R) of the matching results,
$$F = \frac{2PR}{P + R}.$$
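A simplified version of this boundary measure can be sketched as follows. Note that, for brevity, the sketch matches a point whenever any point on the other contour lies within a distance tolerance, which is an approximation of the full bipartite matching in [perazzi2016]; the function name and the `tol` parameter are our own:

```python
import numpy as np

def boundary_f_measure(gt, pred, tol=2.0):
    """Simplified boundary F-measure between two point sets (sketch).

    A predicted point counts as a true positive if some ground-truth
    point lies within `tol` pixels, and vice versa for recall.
    """
    gt = np.asarray(gt, dtype=float)
    pred = np.asarray(pred, dtype=float)
    # pairwise distances between ground-truth and predicted points
    d = np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=2)
    precision = np.mean(d.min(axis=0) <= tol)  # matched predicted points
    recall = np.mean(d.min(axis=1) <= tol)     # matched ground-truth points
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Two identical contours yield F = 1, and completely disjoint contours (beyond the tolerance) yield F = 0.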
Figure 8 compares the F score curves of the proposed eigencontours with those of the conventional descriptors according to the dimension $M$ of the descriptors. In PolarMask, $M$ radial coordinates in a centroidal profile are sampled to describe a contour. In ESE-Seg, $M$ is the number of Chebyshev polynomial coefficients for approximating a contour. For all three datasets of KINS, SBD, and COCO2017, the proposed algorithm outperforms both PolarMask and ESE-Seg at every $M$. For KINS, the proposed algorithm achieves an F score higher than 0.9 with about half the dimension that the conventional descriptors need to yield a similar score. For SBD, similar tendencies are observed. COCO2017 contains diverse instances with complicated shapes, so its instances require higher-dimensional description than those in KINS and SBD. However, the proposed algorithm is still superior to the conventional ones.
Table 1 compares the area-under-curve performances (AUC-F) of the F score curves in Figure 8. The proposed algorithm outperforms the conventional algorithms by significant margins on all datasets. In other words, the proposed algorithm represents object boundaries more faithfully than the conventional algorithms, when the same number of coefficients is used for the contour description.
Figure 9 compares object boundaries approximated by the three contour descriptors at the same dimension $M$. PolarMask fails to reconstruct curved parts. ESE-Seg provides better results, but it blurs complicated parts, especially the leg boundaries in the second and third columns. In contrast, the proposed eigencontour descriptors represent the object boundaries more accurately and more reliably.
Clustering in low-dimensional space: As mentioned in Section 3.1, it is possible to cluster object contours in a lower-dimensional descriptor space and obtain contour centroids there. To validate the effectiveness of the clustering in the proposed eigencontour space, we compare the clustering performances of the proposed algorithm on the COCO2017 dataset with those of PolarMask and ESE-Seg. To this end, we employ each algorithm to approximate all training boundaries by $M$-dimensional descriptors and obtain $K$ centroids via $K$-means clustering. Then, each contour in the dataset is matched with the nearest centroid, and the matching performance is computed in terms of P, R, and F.
Table 2 compares the matching performances. The proposed algorithm yields the best results in all three metrics, which indicates that the proposed algorithm can process object contours more reliably in a low-dimensional space. Qualitative comparison results of the clustering are available in the supplemental document.
Instance segmentation: Both PolarMask and ESE-Seg were proposed for instance segmentation. To localize each instance, these methods reformulate the pixelwise classification as the regression of an object contour. The proposed eigencontours are more effective for this instance segmentation task as well. To demonstrate this, as done in ESE-Seg, we adopt YOLOv3 [redmon2018] as an object detector and modify its components. Given an input image, we predict an output map, in which each element contains an $M$-dimensional coefficient vector, as well as the original YOLOv3 vector for bounding box regression and object classification. Then, we use the coefficient vector to linearly combine the eigencontours to reconstruct the contour and shape mask of an object. The supplemental document describes the implementation details and the training procedure.
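The decoding step at inference time can be sketched as follows: a regressed coefficient vector is expanded into a radial profile via (5), then converted to a Cartesian polygon around the predicted center, from which a mask can be rasterized. The function name and interface are our own illustration, not the authors' exact implementation:

```python
import numpy as np

def coefficients_to_polygon(b, U_M, center):
    """Decode a regressed coefficient vector into a contour polygon.

    x~ = U_M b holds radial distances at uniformly sampled angles;
    converting to Cartesian coordinates around the predicted center
    yields the object polygon.
    """
    r = np.maximum(U_M @ b, 0.0)             # radial profile, kept star-convex
    n = r.shape[0]
    theta = 2 * np.pi * np.arange(n) / n     # uniformly sampled angles
    cx, cy = center
    return np.stack([cx + r * np.cos(theta),
                     cy + r * np.sin(theta)], axis=1)
```

For example, with a single constant eigencontour and a coefficient producing a uniform radius of 2, the decoded polygon is a regular set of points at distance 2 from the center.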
Table 3 compares the instance segmentation results on the SBD validation dataset. The average precision (AP) performances, based on the two intersection-over-union (IoU) thresholds of 0.5 and 0.75 and an F score threshold of 0.3, are reported. The proposed algorithm performs better than PolarMask and ESE-Seg in terms of all three metrics. Figure 10 shows boundary regression results. PolarMask and ESE-Seg fail to reconstruct object boundaries reliably. In contrast, the proposed algorithm represents the boundaries more faithfully. Figure 11 shows more instance segmentation results.
Dimension of eigencontour space ($M$): Table 4 lists the AUC-F performances of the proposed algorithm on the SBD validation dataset according to the dimension $M$ of the eigencontour space. At small $M$, the proposed algorithm yields poor F scores, since object boundaries are too simplified and not sufficiently accurate. The scores improve as $M$ increases, up to an intermediate dimension that provides the best results. At excessively large $M$, however, the performances are degraded, which indicates that a high-dimensional space does not always lead to better results; it is more challenging to regress more variables reliably. There is thus a tradeoff between accuracy and reliability, and an intermediate $M$ achieves a good tradeoff in this test.
Categorical eigencontour space: The proposed eigencontours are data-driven descriptors, which depend on the distribution of object contours in a dataset. Thus, different eigencontours are obtained for different data. Let us consider two options for constructing eigencontour spaces: categorical construction and universal construction. In the categorical construction, eigencontours are determined for each category in a dataset. In the universal construction, they are determined for all instances in all categories.
For the two options, F score curves according to the dimension $M$ are presented in the supplemental document. Table 5 compares the area-under-curve performances (AUC-F) of these curves. The categorical construction provides better performances than the universal construction, because it considers similar shapes in the same category only. In COCO2017, the gap between the two options is the smallest. This is because some object shapes are not properly represented due to occlusions, so COCO2017 objects exhibit low intra-category correlation. In contrast, in KINS, whole contours are well represented because occluded regions are also annotated. Hence, the gap between the two options is the largest.
Limitations: The proposed eigencontours represent typical contour patterns in a dataset. Thus, if object contour patterns differ among datasets, the eigencontours for a dataset may be effective for that particular dataset only. To assess the dependency of eigencontours on a dataset, we conduct cross-validation tests between datasets in the supplemental document.
5 Conclusions
We proposed novel contour descriptors, called eigencontours, based on low-rank approximation. First, we constructed a contour matrix containing all contours in a training set. Second, we approximated the contour matrix by performing the best rank-$M$ approximation. Third, we represented an object boundary by a linear combination of the eigencontours. Experimental results demonstrated that the proposed eigencontours can represent object boundaries more effectively and more faithfully than the existing methods. Moreover, the proposed algorithm yields meaningful instance segmentation performances.
This work was supported by the National Research Foundation of Korea (NRF) grants funded by the Korea government (MSIT) (No. NRF-2021R1A4A1031864 and No. NRF-2022R1A2B5B03002310).