Shape Recognition by Bag of Skeleton-associated Contour Parts

05/20/2016 ∙ by Wei Shen, et al.

Contour and skeleton are two complementary representations for shape recognition. However, combining them in a principled way is nontrivial, as they are generally abstracted by different structures (closed string vs. graph). This paper addresses the shape recognition problem by combining contour and skeleton according to the correspondence between them. The correspondence provides a straightforward way to associate skeletal information with a shape contour. More specifically, we propose a new shape descriptor, named Skeleton-associated Shape Context (SSC), which captures the features of a contour fragment associated with skeletal information. Benefiting from this association, the proposed descriptor provides complementary geometric information from both contour and skeleton parts, including the spatial distribution and the thickness change along the shape part. To form a meaningful feature vector for an overall shape, the Bag of Features framework is applied to the SSC descriptors extracted from it. Finally, the shape feature vector is fed into a linear SVM classifier to recognize the shape. The encouraging experimental results demonstrate that the proposed way of combining contour and skeleton is effective for shape recognition, achieving state-of-the-art performance on several standard shape benchmarks.






Code Repositories


Shape recognition based on the combination of contour and skeleton (codes for our paper on PRL2016)


1 Introduction

Shape is a significant cue in human perception for object recognition. The objects shown in Fig. 1 have lost their brightness, color and texture information and are represented only by their silhouettes; however, it is not difficult for humans to recognize their categories. This simple demonstration indicates that shape is robust to variations in object color, texture and lighting conditions. Owing to such advantages, recognizing objects by their shapes has been a long-standing problem in the literature. Shape recognition is usually considered a classification problem: given a test shape, determine its category label based on a set of training shapes and their category labels. The main challenges in shape recognition are the large intra-class variations induced by deformation, articulation and occlusion. Therefore, the main focus of the research efforts made in the last decade Belongie et al. (2002); Ling and Jacobs (2007); Sun and Super (2005); Wang et al. (2010a); Bai et al. (2009, 2014) has been how to form an informative and discriminative shape representation.

Figure 1: The human biological vision system is able to recognize these objects without any appearance information (brightness, color and texture).

Generally, the existing mainstream shape representations can be classified into two classes: contour based Belongie et al. (2002); Ling and Jacobs (2007); Felzenszwalb and Schwartz (2007) and skeleton based Aslan et al. (2008); Bai and Latecki (2008); Siddiqi et al. (1999); Sebastian et al. (2004); Xie et al. (2008); Shen et al. (2013b). The former describes how the spatial distribution of the boundary points varies along the object contour. Therefore, it captures more informative shape information and is stable under affine transformation. However, it is sensitive to non-rigid deformation and articulation. On the contrary, the latter describes how the thickness of the object changes along the skeleton. Therefore, it is invariant to non-rigid deformation and articulation, although it only carries rough geometric features of the object. Consequently, the two representations are complementary. Nevertheless, very few works have tried to combine them for shape recognition. The reason might be that combining data of different structures is not trivial, as the contour is always abstracted by a closed string, while the skeleton is abstracted by either a graph or a tree. Consequently, the matching methods Duchon (1977); Demirci et al. (2006); Cormen et al. (2001); Ma et al. (2015a, b, 2016, 2014) for these two data abstractions are different. ICS Bai et al. (2009) is the first work to explicitly discuss how to combine contour and skeleton to improve the performance of shape recognition. However, the combination proposed in that work is just a weighted sum of the outputs of two generative models trained individually on contour features and skeleton features, respectively. Therefore, how to combine contour and skeleton into a shape representation in a principled way is still an open problem.

In this paper, our goal is to address the above combination issue and exploit the complementarity between contour and skeleton to improve the performance of shape recognition. The main obstacle to the combination is that the data structures of contour and skeleton are different (closed string vs. graph). A contour is usually described by the features of its parts (contour fragments) Sun and Super (2005); Felzenszwalb and Schwartz (2007). As the correspondence between contour points and skeleton points can be obtained easily, we can associate with each contour point the geometric information of its corresponding skeleton point. In this way, we can record the change of the object thickness, i.e., the skeleton radius, along each contour fragment. Such association actually leads to the combination of contour and skeleton at the part level (Fig. 2 shows some corresponding contour and skeleton parts; note that a contour fragment may correspond to more than one skeleton segment, as in the second example in Fig. 2). Therefore, combining contour and skeleton at the part level is a feasible approach.

Figure 2: Some corresponding contour and skeleton parts, marked in green and red, respectively. The corresponding contour and skeleton points are linked by blue lines.

With the extra information provided by the skeleton, and inspired by the well-known Shape Context (SC) descriptor Belongie et al. (2002), we propose to encode the features of a contour point into a 3D tensor, in which the three dimensions describe the Euclidean distances, orientations and thickness differences between the contour point and the other points in the fragment, respectively. Intuitively, the proposed descriptor extends SC by including extra information, the object thickness, provided by the skeleton; therefore, it is more informative. Essentially, this new descriptor is formed by concatenating the SC descriptors of the sub-parts of the contour fragment, separated according to thickness information. Such a sub-part based representation captures fine-level geometric information, so it is more discriminative. Fig. 3 illustrates the new descriptor for a contour point in a contour fragment, in which the sub-parts of the contour fragment are marked by different colors, and each sub-part and its SC descriptor are marked by the same color. This new shape descriptor is termed Skeleton-associated Shape Context (SSC), as it associates skeletal information with the contour descriptor.

Figure 3: The Skeleton-associated Shape Context descriptor of a contour point in a contour fragment, which is a 3D tensor describing the Euclidean distances, orientations and thickness differences between the contour point and the other points in the fragment. It is equivalent to the concatenated shape context descriptors Belongie et al. (2002) computed on sub-parts (marked by different colors), which are separated according to the object thickness differences between the contour point and the other points in the fragment.

Following the framework of the recent work Bag of Contour Fragments (BCF) Wang et al. (2014), we obtain a feature vector for an overall shape by encoding and then pooling the SSC descriptors extracted from it. We term our method Bag of Skeleton-associated Contour Parts (BSCP), as it associates skeletal information with contour fragments and encodes shape features at the shape part level. Fig. 4 shows the pipeline of building a shape feature vector by BSCP. Given a shape, a normalization step is first performed to align the shape according to its major axis (Fig. 4(b)), because the Spatial Pyramid Matching (SPM) Lazebnik et al. (2006) step (Fig. 4(g)) is not rotation invariant. Then, the skeleton of the shape is extracted and the contour of the shape is decomposed into contour fragments (Fig. 4(c)). Each contour point is associated with an object thickness value, i.e., the radius of its corresponding skeleton point. A shape part is then described by the contour fragment associated with the object thickness values provided by its corresponding skeleton segments (Fig. 4(d)). After that, each shape part is represented by concatenating the SSC descriptors extracted at its reference points (Fig. 4(e)), and then encoded into a shape code (Fig. 4(f)). To encode shape parts, we adopt the locality-constrained linear coding (LLC) scheme Wang et al. (2010b), as it has been shown to be efficient and effective for image classification. Finally, the shape codes are pooled into a compact shape feature vector by SPM (Fig. 4(h)). The obtained shape feature vectors can be fed into any discriminative model, such as an SVM or a random forest, to perform shape classification. Using such discriminative models for shape recognition is more efficient than traditional shape classification methods, as the latter require time-consuming matching and ranking steps.

Figure 4: The pipeline of building a shape feature vector by bag of skeleton-associated contour parts

Our contributions can be summarized in three aspects. First, we propose a natural way to associate a shape contour with skeletal information. Second, we propose a new shape descriptor that encodes the shape features of a contour fragment associated with skeletal information. Last, our method, Bag of Skeleton-associated Contour Parts, achieves state-of-the-art performance on several shape benchmarks.

The remainder of this paper is organized as follows. Sec. 2 reviews the works related to shape recognition. Sec. 3 introduces the proposed shape descriptor as well as our framework for shape recognition. Experimental results and analysis on several shape benchmarks are shown in Sec. 4. Finally, we draw the conclusion in Sec. 5.

Our preliminary work Shen et al. (2014) also combines contour and skeleton for shape recognition, but it differs clearly from this paper. Rather than simply concatenating contour and skeleton features at the mid level, this paper associates skeletal information with a shape contour at the low level by making full use of the natural correspondence between a contour and its skeleton.

2 Related Work

There has been a rich body of work on shape recognition in recent years Belongie et al. (2002); Ling and Jacobs (2007); Sun and Super (2005); Grigorescu and Petkov (2003); Baseski et al. (2009); Daliri and Torre (2010); Wang et al. (2010a); Erdem and Tari (2010); Bicego and Lovato (2015); Bharath et al. (2015). In early work, the exemplar-based strategy was widely used, e.g., Belongie et al. (2002); Ling and Jacobs (2007). Generally, there are two key steps in this strategy. The first is extracting informative and robust shape descriptors. For example, Belongie et al. Belongie et al. (2002) introduce a shape descriptor named shape context (SC), which describes the relative spatial distribution (distance and orientation) of landmark points sampled on the object contour around feature points. Ling and Jacobs Ling and Jacobs (2007) use the inner distance to extend shape context to capture articulation. As for skeleton-based shape descriptors, their reliability depends to a large extent on effective skeletonization Saha et al. (2015); Borgefors et al. (1999) or skeleton pruning Bai et al. (2007); Shen et al. (2011) methods. Among them, the shock graph and its variants Siddiqi et al. (1999); Sebastian et al. (2004); Macrini et al. (2011) are the most popular; they are abstracted from skeletons by a designed shape grammar. The second step is finding the correspondences between two sets of shape descriptors by matching algorithms such as the Hungarian algorithm, thin plate splines (TPS) and dynamic programming (DP). A test shape is classified into the class of its nearest neighbor, ranked by the matching costs. The exemplar-based strategy requires a large amount of training data to capture the large intra-class variations of shapes. However, when the training set becomes very large, searching for the nearest neighbor becomes intractable due to the high time cost of pairwise matching.

Generative models are also used for shape recognition. Sun and Super Sun and Super (2005) propose a Bayesian model that uses normalized contour fragments as the input features for shape classification. Wang et al. Wang et al. (2010a) model the shapes of one class by a skeletal prototype tree learned by skeleton graph matching; Bayesian inference is then used to compute the similarity between a test skeleton and each skeletal prototype tree. Bai et al. Bai et al. (2009) propose to integrate contour and skeleton by a Gaussian mixture model, in which contour fragments and skeleton paths are used as the input features. Unlike their method, ours encodes the contour and skeleton features into one shape descriptor according to the association between contour and skeleton. Therefore, we avoid the difficult step of fine-tuning the weight between the contour and skeleton models.

Recently, researchers have begun to apply powerful discriminative models to shape classification. Daliri and Torre Daliri and Torre (2008, 2010) transform the contour into a string-based representation according to the order of the corresponding contour points found during contour matching. They then apply an SVM to the kernel space built from the pairwise distances between strings to obtain classification results. Erdem and Tari Erdem and Tari (2010) transform a skeleton into a similarity vector, in which each element is the similarity between the skeleton and a skeletal prototype of one shape category; they then apply a linear SVM to the similarity vector to determine the category of the skeleton. Wang et al. Wang et al. (2014) utilize the LLC strategy to extract the mid-level representation BCF from contour fragments, and also use a linear SVM for classification. Such coding-based methods have also been used for 2D and 3D shape retrieval Bai et al. (2014, 2015). Shen et al. Shen et al. (2014) propose a skeleton-based mid-level representation named Bag of Skeleton Paths (BSP), and concatenate BCF and BSP for shape recognition, with the weights between BCF and BSP learned automatically by the SVM. This method implicitly combines contour and skeleton according to the weights learned by the SVM, whereas this paper explicitly combines contour and skeleton by using the correspondence between them, which is a more natural way to combine them.

3 Methodology

In this section, we will introduce our method for shape recognition, including the steps of shape normalization, SSC descriptor and shape classification by BSCP.

3.1 Shape Normalization

As the SPM strategy assumes that the parts of shapes falling in the same subregion are similar, it is not rotation invariant. To apply SPM to shape classification, a normalization step is required to align shapes roughly. One straightforward solution is to align each shape with its major axis. Here, we use principal component analysis (PCA) to compute the orientation of the major axis of each shape. Formally, given a shape $D$, we apply PCA to its point set $\{\mathbf{x}_i\}_{i=1}^{n}$, $\mathbf{x}_i \in \mathbb{R}^2$. First, the covariance matrix is computed by $\boldsymbol{\Sigma} = \frac{1}{n}\tilde{\mathbf{X}}\tilde{\mathbf{X}}^\top$, where $\tilde{\mathbf{X}} = [\mathbf{x}_1 - \bar{\mathbf{x}}, \dots, \mathbf{x}_n - \bar{\mathbf{x}}]$ and $\bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i$. Then, the two eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$ of $\boldsymbol{\Sigma}$ form the columns of the matrix $\mathbf{V} = [\mathbf{v}_1, \mathbf{v}_2]$, and the two eigenvalues of $\boldsymbol{\Sigma}$ are $\lambda_1$ and $\lambda_2$. The orientation of the major axis of the shape $D$ is the orientation of the eigenvector with the larger eigenvalue. All shapes are rotated so that their estimated major axes are aligned with the horizontal line, as in the example given in Fig. 4(b).
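The major-axis alignment described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the shape is assumed to be given as an `(n, 2)` array of point coordinates, and the function name `align_major_axis` is hypothetical.

```python
import numpy as np

def align_major_axis(points):
    """Rotate a 2-D point set so that its PCA major axis is horizontal.

    points: (n, 2) array of shape point coordinates (assumed input format).
    Returns the centred and rotated points.
    """
    points = np.asarray(points, dtype=float)
    centred = points - points.mean(axis=0)          # subtract the mean point x_bar
    cov = centred.T @ centred / len(points)         # (1/n) X~ X~^T, a 2x2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigh: symmetric matrix, ascending eigenvalues
    major = eigvecs[:, np.argmax(eigvals)]          # eigenvector with the larger eigenvalue
    angle = np.arctan2(major[1], major[0])          # orientation of the major axis
    c, s = np.cos(-angle), np.sin(-angle)           # rotate by -angle to reach the horizontal
    rotation = np.array([[c, -s], [s, c]])
    return centred @ rotation.T                     # apply the rotation to every row

# Example: an ellipse elongated along x, rotated by 30 degrees and shifted.
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
ellipse = np.stack([4 * np.cos(t), np.sin(t)], axis=1)
theta = np.deg2rad(30)
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
aligned = align_major_axis(ellipse @ R.T + np.array([5.0, -2.0]))
```

Since an eigenvector is only defined up to sign, the aligned shape may come out rotated by 180 degrees; this sketch does not resolve that ambiguity.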


3.2 Skeleton-associated Shape Context

In this section, we show how to compute the SSC descriptor for a given contour point step by step.

3.2.1 Skeleton-associated Contour

For a given shape $D$, let $C$ and $S$ denote its contour and skeleton, respectively. The skeleton can be obtained by the method introduced in Shen et al. (2013a), which does not require parameter tuning for skeleton computation. Our goal is to find the corresponding skeleton point of each contour point and assign an object thickness value to it. To describe our method clearly, we first briefly review some skeleton-related definitions. According to the definition of skeleton Blum (1973), a skeleton is the set of centers of the maximal discs of a shape. A maximal disc has at least two points of tangency on the contour, which are called Generating Points (GPs).

Formally, for a skeleton point $s \in S$, let $r(s)$ be the radius of the maximal disc of the shape centered at $s$, and let $G(s)$ be the set of GPs of $s$. On the discrete domain, $r(s)$ can be approximated by the Distance Transform (DT) value of $s$ to the contour $C$:

$$r(s) = \min_{c \in C} \|s - c\|_2, \tag{1}$$

where $\|\cdot\|_2$ is the $\ell_2$-norm. $G(s)$ can be obtained approximately by

$$G(s) = \Big\{ \arg\min_{c \in C} \|p - c\|_2 \;:\; p \in N(s) \Big\}, \tag{2}$$

where $N(s)$ denotes the eight neighbors of $s$. Note that $G(s) \subset C$. Now we have a one-to-many correspondence between a skeleton point $s$ and a set of contour points $G(s)$. For each contour point $c \in G(s)$, we associate the object thickness value $r(s)$ with it, and use the notation $\phi$ to denote the function mapping it to the skeleton point $s$, i.e., $\phi(c) = s$ if $c \in G(s)$. Now, considering the overall shape, let $G$ be the set of all the GPs of $S$:

$$G = \bigcup_{s \in S} G(s). \tag{3}$$

Note that $G \subsetneq C$ in general, so the function $\phi$ cannot be applied to all the contour points. However, we can define a unified function $t(\cdot)$ to compute the associated object thickness value for each contour point $c \in C$:

$$t(c) = r\big(\phi(c^*)\big), \tag{4}$$

where $c^* = \arg\min_{g \in G} d_C(c, g)$ and $d_C(\cdot, \cdot)$ denotes the minimum contour curve length between two contour points. Eq. 4 means that for each contour point $c$, we search for its closest GP $c^*$ along the contour (if $c \in G$, then $c^* = c$), and assign the associated object thickness value of $c^*$ to $c$.
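The thickness association described above (distance-transform radius, generating points, and the nearest-GP rule of Eq. 4) can be sketched with brute-force distance computations. This is a hypothetical illustration, not the paper's code: it assumes an ordered, closed contour given as an `(n, 2)` array of pixel coordinates and a skeleton given as an `(m, 2)` array of skeleton pixels. The eight-neighbour approximation of the generating points and the tie rule for contour points shared by several skeleton points are assumptions of this sketch, and `contour_thickness` is a hypothetical name.

```python
import numpy as np

# Offsets of the eight neighbours of a pixel.
EIGHT_NEIGHBOURS = np.array([(-1, -1), (-1, 0), (-1, 1), (0, -1),
                             (0, 1), (1, -1), (1, 0), (1, 1)])

def contour_thickness(contour, skeleton):
    """Assign an object thickness value t(c) to every contour point (sketch)."""
    contour = np.asarray(contour, dtype=float)
    skeleton = np.asarray(skeleton, dtype=float)
    n = len(contour)
    # Radius r(s): distance from each skeleton point to its nearest contour point.
    dist = np.linalg.norm(skeleton[:, None, :] - contour[None, :, :], axis=2)
    radius = dist.min(axis=1)
    # Generating points (assumed approximation): nearest contour point of each 8-neighbour.
    gp_radius = np.full(n, np.nan)            # r(phi(c)) for c in G, NaN for other points
    for s, r in zip(skeleton, radius):
        nbrs = s + EIGHT_NEIGHBOURS
        nearest = np.linalg.norm(nbrs[:, None, :] - contour[None, :, :], axis=2).argmin(axis=1)
        for idx in nearest:
            if np.isnan(gp_radius[idx]):      # first skeleton point wins (arbitrary tie rule)
                gp_radius[idx] = r
    # Arc-length position of every contour point on the closed curve.
    seg = np.linalg.norm(np.roll(contour, -1, axis=0) - contour, axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg)[:-1]])
    perimeter = seg.sum()
    # Eq. 4: copy the thickness of the GP that is closest along the contour.
    gp_idx = np.flatnonzero(~np.isnan(gp_radius))
    along = np.abs(arc[:, None] - arc[None, gp_idx])
    along = np.minimum(along, perimeter - along)   # take the shorter way round
    return gp_radius[gp_idx[along.argmin(axis=1)]]
```

For a horizontal strip of height 6 whose skeleton runs along its centre line, every contour point receives thickness 3. The quadratic distance matrices are fine for a single small shape; larger images would call for a distance transform or a spatial index instead.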

3.2.2 Shape Descriptor Computation

Part-based methods Sun and Super (2005); Felzenszwalb and Schwartz (2007); Bai et al. (2009); Wang et al. (2014) have been widely used for shape recognition, as shape parts are the basic meaningful elements of a shape. We want to build a discriminative and informative shape representation based on shape parts. The shape parts can be obtained by any contour decomposition method, such as Discrete Contour Evolution (DCE) Latecki and Lakämper (1999). Given a shape contour $C$, we apply DCE to obtain its critical points $\{u_i\}_{i=1}^{T}$, where $T$ is the number of critical points. We build a shape part set $\mathcal{P}$, which consists of the contour fragments between every pair of critical points $(u_i, u_j)$. Let $c_{ij}$ denote the contour fragment from $u_i$ to $u_j$ (anticlockwise direction); then we have

$$\mathcal{P} = \{ c_{ij} : i \neq j,\; i, j \in \{1, \dots, T\} \}. \tag{5}$$

Note that we do not force $u_i$ and $u_j$ to be adjacent points in the critical point set, and that $c_{ij}$ and $c_{ji}$ are two different parts. Also, we have $|\mathcal{P}| = T(T-1)$. Using the method described in the previous section, any contour part can be transformed into a skeleton-associated contour part. In the remainder of this paper, unless otherwise specified, we treat these two concepts as equivalent.
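The shape part set can be enumerated directly from the indices of the critical points. The sketch below assumes the contour points are stored in anticlockwise order and that the critical points (e.g., from DCE) are given as indices into that array; `contour_parts` is a hypothetical helper.

```python
from itertools import permutations

def contour_parts(n_points, critical_idx):
    """Enumerate every contour fragment c_ij between two distinct critical points.

    n_points: number of points on the closed contour (stored anticlockwise, assumed).
    critical_idx: contour indices of the T critical points u_1..u_T.
    Returns {(i, j): [contour indices of c_ij from u_i to u_j]} for all i != j.
    """
    parts = {}
    for i, j in permutations(range(len(critical_idx)), 2):   # ordered pairs, so c_ij != c_ji
        start, end = critical_idx[i], critical_idx[j]
        length = (end - start) % n_points                    # steps walking anticlockwise
        parts[(i, j)] = [(start + k) % n_points for k in range(length + 1)]
    return parts

parts = contour_parts(12, [0, 3, 7])
print(len(parts))        # 6 = T * (T - 1) with T = 3
print(parts[(1, 0)])     # [3, 4, 5, 6, 7, 8, 9, 10, 11, 0]
```

Because the walk always goes anticlockwise, each fragment and its reverse-indexed partner together cover the whole contour.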

We now describe how to compute the SSC descriptor at a reference contour point of a skeleton-associated contour part. Each point on a skeleton-associated contour part can be represented by a triplet $(x, y, t)$, where $(x, y)$ is the relative coordinate and $t$ is the associated object thickness value (to ensure scale invariance, this value should be normalized by dividing it by the mean thickness value of the points on the contour part). From this view, the point actually lies in a 3D space. Given a contour part, we uniformly sample $N_p$ points $\{p_j\}_{j=1}^{N_p}$ on it; then, for a given reference contour point $p_i$, we define its descriptor by the distribution of the relative differences to the sampled points in Euclidean distance, orientation and associated object thickness value. We compute a coarse histogram $h_i$ for $p_i$:

$$h_i(k) = \#\{ p_j \neq p_i : (p_j - p_i) \in \mathrm{bin}(k) \}. \tag{6}$$

Here, $p_j - p_i = (d_{ij}, \theta_{ij}, t_j - t_i)$, where $d_{ij}$ and $\theta_{ij}$ are the Euclidean distance between $p_i$ and $p_j$ and the orientation angle of the ray from $p_i$ to $p_j$, defined in log-polar space, respectively. We use bins that are uniform in this 3D space, following the strategy used in SC Belongie et al. (2002), which makes the descriptor more sensitive to nearby sample points than to those farther away. The histogram $h_i$ is defined to be the SSC of $p_i$.

Finally, we concatenate the SSC descriptors of the reference points on a contour part $c_{ij}$ to form its descriptor vector $\mathbf{x} = [h_{(1)}; \dots; h_{(R)}]$, where $h_{(r)}$ is the SSC of the $r$-th reference point, $R$ is the number of reference points, and $\mathbf{x} \in \mathbb{R}^{R \cdot n_d n_\theta n_t}$, with $n_d$, $n_\theta$ and $n_t$ the numbers of distance, orientation and thickness-difference bins.
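A minimal NumPy sketch of the SSC histogram and of the per-part concatenation is given below. The bin counts default to the 5 × 12 × 5 setting used in the experiments, but several details are assumptions of this sketch rather than details taken from the paper: distances are normalised by the mean pairwise distance and binned between 1/8 and 2 on a log scale (a common shape-context setting), out-of-range values are clipped into the outermost bins, and thickness differences are binned uniformly over a hypothetical range of plus or minus 1 (in units of the part's mean thickness). The function names are hypothetical.

```python
import numpy as np

def ssc_histogram(points, ref, n_d=5, n_theta=12, n_t=5, t_range=1.0):
    """SSC of sampled point `ref`, flattened to n_d * n_theta * n_t bins.

    points: (N, 3) array of (x, y, t) for the sampled points of one contour part.
    """
    xy, t = points[:, :2], points[:, 2]
    others = np.arange(len(points)) != ref
    diff = xy[others] - xy[ref]
    # Log-polar distance bins; distances normalised by the mean pairwise distance (assumed).
    pair = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    dist = np.linalg.norm(diff, axis=1) / pair[np.triu_indices(len(xy), 1)].mean()
    d_edges = np.logspace(np.log10(1 / 8), np.log10(2), n_d + 1)
    d_bin = np.clip(np.searchsorted(d_edges, dist) - 1, 0, n_d - 1)
    # Uniform orientation bins over [0, 2*pi).
    theta = np.mod(np.arctan2(diff[:, 1], diff[:, 0]), 2 * np.pi)
    theta_bin = np.minimum((theta / (2 * np.pi) * n_theta).astype(int), n_theta - 1)
    # Uniform thickness-difference bins over [-t_range, t_range] (hypothetical range).
    dt = np.clip(t[others] - t[ref], -t_range, t_range)
    t_bin = np.minimum(((dt + t_range) / (2 * t_range) * n_t).astype(int), n_t - 1)
    hist = np.zeros((n_d, n_theta, n_t))
    np.add.at(hist, (d_bin, theta_bin, t_bin), 1)   # count the other points per 3D bin
    return hist.ravel()

def part_descriptor(points, ref_indices, **bins):
    """Concatenate the SSCs of a part's reference points into one descriptor vector."""
    pts = np.array(points, dtype=float)              # copy, so the caller's array is untouched
    pts[:, 2] /= pts[:, 2].mean()                    # scale-normalise thickness by the part mean
    return np.concatenate([ssc_histogram(pts, i, **bins) for i in ref_indices])
```

When every point has the same thickness, all counts land in the middle thickness bin (index 2 of 5), so the histogram reduces to an ordinary shape context embedded in one slice of the 3D tensor.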

3.3 Bag of Skeleton-associated Contour Parts

In this section, we introduce how to perform shape classification by BSCP.

3.3.1 Contour Parts Encoding

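Encoding a skeleton-associated contour part $\mathbf{x} \in \mathbb{R}^d$ means transforming it into a new space via a given codebook with $M$ entries, $\mathbf{B} = [\mathbf{b}_1, \dots, \mathbf{b}_M] \in \mathbb{R}^{d \times M}$. In the new space, the contour part is represented by a shape code $\mathbf{z} \in \mathbb{R}^M$.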

Codebook construction is usually achieved by unsupervised learning, such as k-means. Given a set of contour parts randomly sampled from all the shapes in a dataset as well as their flipped mirrors, we apply the k-means algorithm to cluster them into $M$ clusters and construct a codebook $\mathbf{B}$. Each cluster center forms an entry $\mathbf{b}_m$ of the codebook $\mathbf{B}$.
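Codebook learning as described can be sketched with a plain Lloyd's k-means in NumPy. The random initialisation, the iteration count and the name `kmeans_codebook` are assumptions for illustration; for the default codebook size of 2500 on 1500-dimensional descriptors, a library implementation such as scikit-learn's `KMeans` would be the practical choice.

```python
import numpy as np

def kmeans_codebook(X, M, n_iter=50, seed=0):
    """Plain Lloyd's k-means returning an (M, d) codebook (one entry b_m per row).

    X: (n, d) descriptor vectors of sampled contour parts and their flipped mirrors.
    """
    rng = np.random.default_rng(seed)
    B = X[rng.choice(len(X), size=M, replace=False)].astype(float)   # random initial centres
    for _ in range(n_iter):
        # Squared distances via ||x||^2 - 2 x.b + ||b||^2, avoiding an (n, M, d) tensor.
        d2 = (X ** 2).sum(1)[:, None] - 2 * X @ B.T + (B ** 2).sum(1)[None, :]
        labels = d2.argmin(axis=1)
        for m in range(M):
            members = X[labels == m]
            if len(members):                 # keep the old centre if a cluster is empty
                B[m] = members.mean(axis=0)
    return B
```

On two well-separated blobs with M = 2, the two returned entries land at the blob centres.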

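To encode a contour part $\mathbf{x}$, we adopt the LLC scheme Wang et al. (2010b), as it has been shown to be effective for image classification. Encoding is usually achieved by minimizing the reconstruction error. LLC additionally incorporates a locality constraint; its approximated version solves the following constrained least-squares fitting problem:

$$\min_{\mathbf{a}} \|\mathbf{x} - \mathbf{B}_{\mathbf{x}}\mathbf{a}\|_2^2 \quad \text{s.t.} \quad \mathbf{1}^\top \mathbf{a} = 1, \tag{7}$$

where $\mathbf{B}_{\mathbf{x}}$ contains the local bases formed by the $k$ nearest neighbors of $\mathbf{x}$ in $\mathbf{B}$, and $\mathbf{a} \in \mathbb{R}^k$ contains the reconstruction coefficients. Such a locality constraint leads to several favorable properties, such as local smooth sparsity and better reconstruction. The code $\mathbf{z}$ of $\mathbf{x}$ encoded by the codebook $\mathbf{B}$ can easily be obtained from $\mathbf{a}$ by setting the entries of $\mathbf{z}$ that correspond to the $k$ nearest bases to the entries of $\mathbf{a}$, and all other entries to zero.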
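The approximated LLC step can be sketched with the closed-form solution commonly used for this constrained least-squares problem: shift the $k$ nearest bases to $\mathbf{x}$, form their local covariance, regularise it, solve a $k \times k$ linear system and normalise the coefficients to sum to one. The values `k=5` and `beta=1e-4` are illustrative assumptions, not settings taken from this paper, and `llc_encode` is a hypothetical name. Unlike the column convention in the text, the codebook here is stored with one entry per row.

```python
import numpy as np

def llc_encode(x, B, k=5, beta=1e-4):
    """Approximated LLC code of descriptor x (d,) against codebook B (M, d)."""
    d2 = ((B - x) ** 2).sum(axis=1)
    idx = np.argsort(d2)[:k]               # the k nearest codebook entries (local bases)
    z = B[idx] - x                         # shift the local bases so that x is the origin
    C = z @ z.T                            # local covariance, k x k
    C += np.eye(k) * beta * np.trace(C)    # regularise for numerical stability (assumed beta)
    a = np.linalg.solve(C, np.ones(k))
    a /= a.sum()                           # enforce the sum-to-one constraint 1^T a = 1
    code = np.zeros(len(B))
    code[idx] = a                          # scatter the k coefficients into an M-dim code
    return code
```

For the flip invariance described next, the same function would be applied to a part and to its mirrored descriptor and the two codes summed element-wise.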

Note that the SSC descriptors of a contour part and its flipped mirror are different, as shown in Fig. 5. To make our shape code invariant to the flip transformation, for each contour part we add the shape code of its flipped mirror to its own shape code element-wise (as shown in Fig. 5). In this way, a contour part and its flipped mirror have the same shape code. Both contour parts and their flipped mirrors can be encoded reliably thanks to the sufficient samples used for building the codebook (recall that our codebook is generated by clustering a set of contour parts randomly sampled from all the shapes in a dataset as well as their flipped mirrors).

Figure 5: The shape codes of a contour part and its flipped mirror are added in an element-wise manner to form the final shape code for it, which is invariant to flip transformation.

3.3.2 Shape Code Pooling

Given a shape $D$, its skeleton-associated contour parts are encoded into shape codes $\{\mathbf{z}_l\}_{l=1}^{L}$, where $L$ is the number of contour parts in $D$. We now describe how to obtain a compact shape feature vector by pooling the shape codes. SPM is commonly used to incorporate spatial layout information when pooling image codes: it divides an image into subregions and pools the features in each subregion separately. For aligned shapes belonging to one category, the contour parts falling in the same subregions should be similar. Here, the position of a contour part is defined as its median point. More specifically, we divide a shape into $2^l \times 2^l$ subregions at pyramid levels $l = 0, 1, 2$, i.e., 21 subregions in total. Let $\mathbf{z}_\pi$ denote the shape code of a contour part at position $\pi$. To obtain a shape feature vector $\mathbf{f}$, for each subregion $\Omega_r$, $r \in \{1, \dots, 21\}$, we perform max pooling as follows:

$$\mathbf{f}_r = \max\big(\mathbf{z}_\pi \mid \pi \in \Omega_r\big), \tag{8}$$

where the "max" function is applied element-wise, i.e., for each codeword, we take the maximum value over all shape codes in a subregion. Max pooling is robust to noise and has been successfully applied to image classification. $\mathbf{f}_r$ is an $M$-dimensional feature vector of the subregion $\Omega_r$. The BSCP vector is the concatenation of the feature vectors of all subregions:

$$\mathbf{f} = [\mathbf{f}_1^\top, \dots, \mathbf{f}_{21}^\top]^\top. \tag{9}$$

Finally, $\mathbf{f}$ is normalized by its $\ell_2$-norm: $\mathbf{f} \leftarrow \mathbf{f} / \|\mathbf{f}\|_2$.
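Max pooling over the three-level pyramid and the final normalisation can be sketched as follows. It assumes that part positions (median points) have already been scaled into the unit square, for example by the bounding box of the aligned shape, and that empty subregions contribute zero vectors; both are assumptions, and `spm_max_pool` is a hypothetical name.

```python
import numpy as np

def spm_max_pool(codes, positions, levels=(1, 2, 4)):
    """Pool (L, M) shape codes into one l2-normalised vector of length 21 * M.

    positions: (L, 2) part positions (x, y) scaled into [0, 1)^2 (assumed).
    """
    pooled = []
    for g in levels:                                   # a g x g grid at this pyramid level
        cell = np.minimum((positions * g).astype(int), g - 1)
        for row in range(g):
            for col in range(g):
                mask = (cell[:, 1] == row) & (cell[:, 0] == col)
                if mask.any():
                    pooled.append(codes[mask].max(axis=0))     # element-wise max per codeword
                else:
                    pooled.append(np.zeros(codes.shape[1]))    # empty cell (assumption)
    f = np.concatenate(pooled)
    return f / np.linalg.norm(f)                       # l2 normalisation
```

The first M entries (pyramid level 0) are the element-wise maximum over all parts of the shape, up to the common normalisation factor.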

3.4 Shape Classification by BSCP

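Given a training set $\{(\mathbf{f}_i, y_i)\}_{i=1}^{N}$ consisting of $N$ shapes from $K$ classes, where $\mathbf{f}_i$ and $y_i \in \{1, \dots, K\}$ are the BSCP vector and the class label of the $i$-th shape, respectively, we train a multi-class linear SVM Crammer and Singer (2001) as the classifier:

$$\min_{\mathbf{w}_1, \dots, \mathbf{w}_K} \frac{1}{2}\sum_{k=1}^{K}\|\mathbf{w}_k\|_2^2 + \lambda\sum_{i=1}^{N}\max\big(0,\, 1 + \mathbf{w}_{r_i}^\top\mathbf{f}_i - \mathbf{w}_{y_i}^\top\mathbf{f}_i\big), \tag{10}$$

where $r_i = \arg\max_{k \in \{1, \dots, K\},\, k \neq y_i} \mathbf{w}_k^\top\mathbf{f}_i$ and $\lambda$ is a parameter that balances the regularization term (left part) and the multi-class hinge-loss term (right part). For a test shape vector $\mathbf{f}$, its class label is given by

$$y = \arg\max_{k \in \{1, \dots, K\}} \mathbf{w}_k^\top\mathbf{f}. \tag{11}$$

We adopt a linear SVM because the proposed BSCP feature vector, computed by LLC coding, is a high-dimensional sparse vector. The normalization in LLC makes the inner product of any vector with itself equal to one, which is desirable for linear kernels Wang et al. (2010b). Using nonlinear classifiers, such as kernel SVMs and random forests, instead leads to decreased performance.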
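The multi-class hinge-loss objective and the decision rule can be transcribed directly into NumPy to check the formulation; the sketch below only evaluates the objective and does not solve it. The function names are hypothetical and labels are 0-based here. For actual training, an off-the-shelf Crammer and Singer solver could be used, for example scikit-learn's `LinearSVC` with `multi_class="crammer_singer"`, whose `C` plays a role analogous to the balancing weight (set to 10 in the experiments).

```python
import numpy as np

def cs_objective(W, F, y, lam):
    """Crammer-Singer objective. W: (K, D) class weights, F: (N, D) vectors, y: (N,) labels."""
    scores = F @ W.T                                   # (N, K) matrix of w_k^T f_i
    rows = np.arange(len(y))
    true = scores[rows, y]                             # score of the correct class
    rival = scores.copy()
    rival[rows, y] = -np.inf                           # exclude the correct class
    hinge = np.maximum(0.0, 1.0 + rival.max(axis=1) - true)   # uses the best rival r_i
    return 0.5 * (W ** 2).sum() + lam * hinge.sum()

def predict(W, F):
    """Decision rule: pick the class with the largest linear score."""
    return (F @ W.T).argmax(axis=1)
```

With two orthogonal unit vectors and weights of norm 2 on the matching class, every margin exceeds 1, so the hinge term vanishes and the objective is just the regulariser, 0.5 * (4 + 4) = 4.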

4 Experimental Results

In this section, we evaluate our method on several shape benchmarks in comparison with the state of the art. We also investigate the effects of two important parameters of our method on classification accuracy: the number of object thickness difference bins $n_t$ for computing SSC, and the codebook size $M$.

Figure 6: Shapes of two classes from the Animal dataset Bai et al. (2009). The first row shows 5 shapes of the Cat class, with large intra-class variations caused by viewpoint changes and the various poses of the cats. Moreover, the leopards in the second row are similar to the cats in the first row, which makes these two kinds of shapes much more difficult to distinguish.

4.1 Experimental Setup

For each contour part, we form a descriptor vector by concatenating the SSC descriptors computed at 5 reference points. Unless otherwise specified, we set the number of bins for computing SSC to 300 (5 Euclidean distance bins, 12 orientation bins and 5 object thickness difference bins). Thus, the dimension of the descriptor vector of a contour part is 1500. The numbers of Euclidean distance bins and orientation bins are set to the default values used in SC Belongie et al. (2002); hence, we only discuss separately the effect of the number of object thickness difference bins on classification accuracy. When learning the codebook, the number of cluster centers (codebook size) is set to 2500 by default; we also study the performance of BSCP with varying codebook sizes. To encode a contour part, we adopt the approximated LLC with $k$ nearest neighbors. For pooling, a shape is divided into $1 \times 1$, $2 \times 2$ and $4 \times 4$ subregions, 21 regions in total. The weight between the regularization term and the multi-class hinge-loss term in the multi-class linear SVM formulation is set to 10. The default parameter settings reported in Shen et al. (2013a) are used to extract skeletons.
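The default settings fix the feature dimensions, and the arithmetic can be checked in a few lines. The count of five reference points is inferred from 1500 / 300, and the BSCP length assumes one pooled codebook-sized vector per pyramid subregion; the variable names are illustrative.

```python
n_d, n_theta, n_t = 5, 12, 5      # distance, orientation, thickness-difference bins
n_ref = 5                         # reference points per part (inferred: 1500 / 300)
M = 2500                          # default codebook size
ssc_bins = n_d * n_theta * n_t                  # bins in one SSC histogram
part_dim = n_ref * ssc_bins                     # length of one part descriptor
n_regions = sum(g * g for g in (1, 2, 4))       # 1x1 + 2x2 + 4x4 pyramid cells
bscp_dim = n_regions * M                        # length of the pooled BSCP vector
print(ssc_bins, part_dim, n_regions, bscp_dim)  # prints: 300 1500 21 52500
```

So, under these assumptions, each shape is represented by a 52,500-dimensional BSCP vector before classification.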

All the experiments were carried out on a workstation (3.1 GHz 32-core CPU, 128 GB RAM, Ubuntu 14.04 64-bit OS). It takes about ms to compute our SSC descriptor for one contour fragment, and s to encode the BSCP feature vector for one shape. The whole training process takes about hours (including feature computation and codebook learning), and the testing process for one shape takes ms (excluding feature computation).

We evaluate our method on several shape classification benchmark datasets, including the MPEG-7 dataset Latecki et al. (2000), the Animal dataset Bai et al. (2009) and the ETH-80 dataset Leibe and Schiele (2003). For every dataset except ETH-80, in each round we randomly select half of the shapes in each class for training and use the remaining shapes for testing. To avoid biases caused by randomness, this procedure is repeated 10 times, and the average classification accuracy and standard deviation are reported to evaluate the performance of different shape classification methods. On the ETH-80 dataset, following previous methods Leibe and Schiele (2003); Ling and Jacobs (2007); Daliri and Torre (2008, 2010); Wang et al. (2014), we use all shapes except the current one for training and the current one for testing (the leave-one-out setting Devijver and Kittler (1982)). Experimental results and analysis are given in the rest of this section.
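The repeated half-split protocol used for the Animal and MPEG-7 datasets can be sketched as below. `fit_predict` is a hypothetical callback standing in for the whole BSCP pipeline (train on the selected shapes, return predicted labels for the rest), and the standard deviation uses NumPy's default population formula, since the text does not say which variant is reported.

```python
import numpy as np

def half_split(labels, rng):
    """Boolean training mask selecting a random half of the shapes in every class."""
    train = np.zeros(len(labels), dtype=bool)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        train[rng.choice(idx, size=len(idx) // 2, replace=False)] = True
    return train

def evaluate(labels, fit_predict, rounds=10, seed=0):
    """Mean and standard deviation of accuracy over repeated random half splits."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(rounds):
        train = half_split(labels, rng)
        pred = fit_predict(train)            # hypothetical: predicted labels for ~train
        accs.append(np.mean(pred == labels[~train]))
    return np.mean(accs), np.std(accs)       # population std (ddof=0), an assumption
```

For the Animal dataset (20 classes of 100 shapes), each round trains on exactly 50 shapes per class, matching the split described in the next subsection.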

4.2 Animal Dataset

We first test our method on the Animal dataset introduced in Bai et al. (2009). This dataset contains 2000 shapes divided into 20 animal classes, including cat, spider, leopard, etc. It is the most challenging shape dataset due to the large intra-class variations caused by viewpoint changes and the various poses of the animals (as shown in Fig. 6). We randomly choose 50 shapes per class for training and use the remaining 50 shapes for testing. The comparison between BSCP and other shape classification methods is shown in Table 1.

As shown in Table 1, the proposed method achieves a classification accuracy of 89.04%, which significantly outperforms the previous state-of-the-art method, Contextual BOW Bharath et al. (2015), by over 3%. This result indicates that introducing the object thickness information extracted from skeletons indeed helps shape recognition. Our method also performs much better than BCF+BSP Shen et al. (2014), which suggests that associating a shape contour with skeletal information in a principled way is more effective than the previous method, which combines contour and skeleton implicitly according to weights learned by the SVM. The comparison between our method and BCF Wang et al. (2014) directly shows that the SSC descriptor captures not only the geometric information of the object contour but also the object thickness information of a shape. Combining these two kinds of complementary information improves robustness to the interference caused by intra-class variations.

Algorithm | Classification accuracy
Skeleton Paths Bai et al. (2009) | 67.90%
Contour Segments Bai et al. (2009) | 71.70%
IDSC Ling and Jacobs (2007) | 73.60%
ICS Bai et al. (2009) | 78.40%
BCF Wang et al. (2014) | 83.40 ± 1.30%
Bioinformatic Bicego and Lovato (2015) | 83.70%
ShapeVocabulary Bai et al. (2014) | 84.30 ± 1.01%
BCF+BSP Shen et al. (2014) | 85.50 ± 0.88%
Contextual BOW Bharath et al. (2015) | 86.00%
BSCP | 89.04 ± 0.95%
Table 1: Classification accuracy comparison on the Animal dataset Bai et al. (2009)
Figure 7: Typical shapes of some classes from MPEG-7 dataset Latecki et al. (2000).

4.3 MPEG-7 Dataset

Then we evaluate our method on the MPEG-7 dataset Latecki et al. (2000), the best-known dataset for shape analysis in computer vision (see Fig. 7). Its 1400 images are divided into 70 classes with high shape variability, each containing 20 different shapes. The average classification accuracy and standard deviation are reported in Table 2.

As shown in Table 2, our method achieves the best performance on the MPEG-7 dataset. BCF Wang et al. (2014) already obtains a good result, since it applies the Bag of Features framework to obtain a mid-level shape representation, which is more robust and accurate. BCF+BSP Shen et al. (2014) combines skeleton and contour information in a simple but effective way and performs better than BCF, which suggests that both skeleton and contour features are important for shape classification. By adopting the SSC descriptor to combine contour and skeleton information, our method achieves a better result than BCF+BSP on this dataset. The improvement on this dataset is less significant than that on the Animal dataset, because the accuracies of state-of-the-art methods on this dataset are already close to saturation.

Algorithm Classification accuracy
Skeleton Paths Bai et al. (2009) 86.70%
Contour Segments Bai et al. (2009) 90.90%
Bioinformatic Bicego and Lovato (2015) 96.10%
ICS Bai et al. (2009) 96.60%
BCF Wang et al. (2014) 97.16 ± 0.79%
BCF+BSP Shen et al. (2014) 98.35 ± 0.63%
BSCP 98.41 ± 0.44%
Table 2: Classification accuracy comparison on MPEG-7 dataset Latecki et al. (2000)

4.4 ETH-80 Dataset

The ETH-80 dataset Leibe and Schiele (2003) contains 80 objects divided into 8 categories. For each object, 41 color photographs are taken from different viewpoints, giving 3,280 images in total. We use the segmentation masks provided with the dataset to evaluate our method. The result is shown in Table 3.

Compared with other methods, ours achieves a classification accuracy of 93.05%, outperforming the previous best results, BCF Wang et al. (2014) (91.49%) and Bioinformatic Bicego and Lovato (2015) (91.50%), by more than 1.5 percentage points.

Algorithm Classification accuracy
Color histogram Leibe and Schiele (2003) 64.86%
PCA gray Leibe and Schiele (2003) 82.99%
PCA masks Leibe and Schiele (2003) 83.41%
SC+DP Leibe and Schiele (2003) 86.40%
IDSC+DP Ling and Jacobs (2007) 88.11%
Robust symbolic Daliri and Torre (2008) 90.28%
Kernel-edit Daliri and Torre (2010) 91.33%
BCF Wang et al. (2014) 91.49%
Bioinformatic Bicego and Lovato (2015) 91.50%
BSCP 93.05%
Table 3: Classification accuracy comparison on ETH-80 dataset Leibe and Schiele (2003)
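The evaluation protocol for Table 3 is not restated here. Leave-one-object-out cross-validation is commonly used with ETH-80, following Leibe and Schiele (2003); assuming BSCP follows it, the hypothetical helper below generates the folds, with the reported accuracy then averaged over the held-out objects.

```python
def leave_one_object_out(object_ids):
    """Yield (train_indices, test_indices) folds.

    object_ids -- one entry per image, naming the physical object it shows.
    Each fold holds out every view of a single object, so no view of the
    test object is ever seen during training.
    """
    for held_out in sorted(set(object_ids)):
        test = [i for i, o in enumerate(object_ids) if o == held_out]
        train = [i for i, o in enumerate(object_ids) if o != held_out]
        yield train, test


# ETH-80 layout: 80 objects x 41 viewpoints = 3,280 segmentation masks.
ids = [obj for obj in range(80) for _ in range(41)]
folds = list(leave_one_object_out(ids))
print(len(folds), len(folds[0][0]), len(folds[0][1]))  # 80 3239 41
```

Holding out whole objects, rather than individual images, keeps near-duplicate viewpoints of the same object from appearing on both sides of a split.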

4.5 Parameter Discussion

In this section, we investigate the effects of three important parameters on shape classification accuracy.
The number of object thickness difference bins for computing SSC. Since the SSC shape descriptor is a key contribution of this work, it is necessary to study how its settings affect shape classification performance.

As an extension of Shape Context, SSC adds one more histogram dimension that describes thickness differences; its resolution is given by the number of object thickness difference bins. To investigate the influence of this parameter, we set it to different values and observe the performance change on the Animal dataset, keeping the other parameters at their default values. The result is reported in Fig. 8.

As Fig. 8 shows, our method achieves the best performance when the number of thickness difference bins is set to 5. Smaller settings lead to a performance decrease, possibly because an SSC with few thickness bins gives only a coarse representation of the thickness information and loses most of the information a skeleton provides. A larger setting yields a result close to the best one, but it significantly increases the cost of SSC descriptor computation, codebook learning and feature encoding. We therefore select 5 bins as the best trade-off between accuracy and efficiency, use it as the default value in our experiments, and obtain state-of-the-art performance on several datasets (see Tables 1, 2 and 3).

Figure 8: Classification accuracies on Animal dataset Bai et al. (2009) by varying the number of object thickness difference bins for computing SSC.
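To make the dimension argument concrete, here is a minimal sketch of a Shape Context histogram extended with a third, thickness-difference axis, so that its length is n_r × n_θ × n_t. This is not the paper's exact SSC definition: the per-point thickness (for example, the radius of the maximal disc at the associated skeleton point), the 5 radial and 12 angular bins borrowed from Belongie et al. (2002), the mean-distance scale normalization, and the linear binning of thickness differences over [-t_max, t_max] are all illustrative assumptions, and the function name is hypothetical.

```python
import math


def ssc_histogram(points, thickness, ref_idx, n_r=5, n_theta=12, n_t=5,
                  r_inner=0.125, r_outer=2.0, t_max=1.0):
    """Log-polar Shape Context histogram with an extra thickness-difference axis.

    points    -- sampled contour points [(x, y), ...]
    thickness -- assumed per-point thickness values, same length as points
    ref_idx   -- index of the reference point the histogram is centred on
    Returns a flat, L1-normalized list of n_r * n_theta * n_t bin values.
    """
    x0, y0 = points[ref_idx]
    t0 = thickness[ref_idx]
    hist = [0.0] * (n_r * n_theta * n_t)
    others = [j for j in range(len(points)) if j != ref_idx]
    dists = {j: math.hypot(points[j][0] - x0, points[j][1] - y0) for j in others}
    mean_d = sum(dists.values()) / len(others) if others else 0.0
    if mean_d == 0.0:
        return hist  # degenerate input: nothing to bin
    log_lo, log_hi = math.log(r_inner), math.log(r_outer)
    for j in others:
        r = dists[j] / mean_d  # scale-normalized distance
        if not (r_inner <= r < r_outer):
            continue  # outside the log-polar support
        rb = min(int((math.log(r) - log_lo) / (log_hi - log_lo) * n_r), n_r - 1)
        theta = math.atan2(points[j][1] - y0, points[j][0] - x0) % (2 * math.pi)
        ab = min(int(theta / (2 * math.pi) * n_theta), n_theta - 1)
        # Thickness difference relative to the reference point, linearly binned
        # over [-t_max, t_max]; values outside that range go to the end bins.
        dt = thickness[j] - t0
        tb = min(max(int((dt + t_max) / (2 * t_max) * n_t), 0), n_t - 1)
        hist[(rb * n_theta + ab) * n_t + tb] += 1.0
    total = sum(hist)
    return [v / total for v in hist] if total else hist
```

Under these assumed defaults, each additional thickness bin adds n_r × n_θ = 60 dimensions to every descriptor, which in turn enlarges the cost of codebook learning and feature encoding; this is consistent with the accuracy/efficiency trade-off discussed above.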

The number of reference points for computing SSC. Fig. 9 shows how performance changes as we vary the number of reference points used to compute our SSC descriptor. Unsurprisingly, classification accuracy improves as the number of reference points increases, since more shape details are taken into account. However, using more reference points makes the shape feature computation significantly more time-consuming. To balance performance and computational cost, we choose the number of reference points as a compromise between the two.

Figure 9: Classification accuracies on Animal dataset Bai et al. (2009) by varying the number of reference points for computing SSC.
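How reference points are placed is not restated in this section. Assuming they are sampled uniformly by arc length along the closed contour, as is common for Shape Context, the hypothetical helper below performs that resampling for a polygonal contour.

```python
import math


def resample_closed_contour(contour, n):
    """Return n points spaced uniformly by arc length along a closed polygon.

    contour -- polygon vertices [(x, y), ...]; the last vertex joins the first
    n       -- number of reference points to sample (hypothetical parameter)
    """
    m = len(contour)
    if n <= 0 or m < 2:
        raise ValueError("need n > 0 and at least two contour vertices")
    seg_len = [math.hypot(contour[(i + 1) % m][0] - contour[i][0],
                          contour[(i + 1) % m][1] - contour[i][1]) for i in range(m)]
    perimeter = sum(seg_len)
    if perimeter == 0.0:
        raise ValueError("degenerate contour with zero length")
    step = perimeter / n
    out, seg, walked = [], 0, 0.0  # walked = arc length at the start of `seg`
    for k in range(n):
        target = k * step
        # Advance to the segment that contains the target arc length.
        while seg < m - 1 and walked + seg_len[seg] < target:
            walked += seg_len[seg]
            seg += 1
        (x0, y0), (x1, y1) = contour[seg], contour[(seg + 1) % m]
        t = (target - walked) / seg_len[seg] if seg_len[seg] > 0 else 0.0
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(resample_closed_contour(square, 8))
```

If every reference point's descriptor is built from all other sampled points, as in Shape Context, extraction touches on the order of n(n-1) point pairs per shape, so doubling the number of reference points roughly quadruples that cost.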

Codebook size. In this experiment, we adopt codebooks of different sizes (500, 1000, 1500, 2000, 2500 and 3000) to classify shapes on the Animal dataset, with the other parameters fixed to their default values. Fig. 10 shows the classification accuracies of BSCP for the different codebook sizes. Shape classification accuracy generally improves as the codebook size increases, which was also reported in Wang et al. (2014).

Figure 10: Classification accuracies on Animal dataset Bai et al. (2009) by varying codebook size.
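The codebook size sets the length of the encoded shape feature. The encoder is not restated in this section, so the sketch below assumes approximated Locality-constrained Linear Coding (LLC, Wang et al. (2010b), which appears in the reference list) followed by max pooling: each descriptor is reconstructed from its k nearest codewords with weights summing to one. The values of k and eps and all helper names are illustrative assumptions, not the paper's settings.

```python
def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def llc_code(x, codebook, k=5, eps=1e-4):
    """Approximate LLC code of descriptor x: one weight per codeword, k of them non-zero."""
    k = min(k, len(codebook))
    sq = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in codebook]
    nn = sorted(range(len(codebook)), key=lambda i: sq[i])[:k]  # k nearest codewords
    z = [[codebook[i][d] - x[d] for d in range(len(x))] for i in nn]  # shift to x
    cov = [[sum(p * q for p, q in zip(zi, zj)) for zj in z] for zi in z]  # local covariance
    reg = eps * sum(cov[i][i] for i in range(k)) or eps  # conditioning term
    for i in range(k):
        cov[i][i] += reg
    w = _solve(cov, [1.0] * k)
    s = sum(w)
    code = [0.0] * len(codebook)
    for i, wi in zip(nn, w):
        code[i] = wi / s  # enforce the sum-to-one constraint
    return code


def max_pool(codes):
    """Element-wise max over the codes of all descriptors in one shape (or one region)."""
    return [max(col) for col in zip(*codes)]


cb = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 4-word codebook
codes = [llc_code(d, cb, k=2) for d in ([0.5, 0.0], [0.0, 0.5])]
print(max_pool(codes))  # approximately [0.5, 0.5, 0.5, 0.0]
```

The pooled vector has one entry per codeword. If it is additionally pooled per spatial pyramid cell (Lazebnik et al. (2006)) and concatenated, the final feature length would be the codebook size times the number of cells, so going from 500 to 3000 codewords would make the SVM input six times longer.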

4.6 Limitation

Our SSC descriptor relies on the quality of the extracted skeleton, and it also requires that the object can be well represented by its skeleton. Some objects in the MPEG-7 dataset, such as the “device” classes shown in Fig. 11, are not suitable for skeleton-based representation, and in such cases our SSC descriptor does not perform well. We have applied our SSC descriptor within the shape retrieval framework of “Shape Vocabulary” Bai et al. (2014) and tested it on the MPEG-7 dataset; unfortunately, we did not observe a performance increase. This may be another reason why our method does not achieve an obvious classification improvement on the MPEG-7 dataset, as shown in Table 2.

Figure 11: Each row shows four shape examples from one kind of “device” class in the MPEG-7 dataset. The skeleton of each shape is visualized by black curves. The envelope contours of the shapes in each row are similar, while their skeletons are totally different.

5 Conclusion

In this paper, we present a novel shape representation called BSCP, which combines contour and skeleton in a principled way. This is achieved through a novel low-level shape descriptor, SSC, which makes full use of the natural correspondence between a contour and its skeleton. Both the normalization step and SPM are adopted to make our method effective and accurate without losing invariance to rotation. We have tested BSCP on several benchmarks, and the results show that our method achieves state-of-the-art performance. We also discuss the parameter settings as a reference for other researchers. In the future, we will further study how to apply BSCP to recognize objects in natural images, which requires reliable object contour detection Shen et al. (2015) and symmetry detection Shen et al. (2016).

Acknowledgement. This work was supported in part by the National Natural Science Foundation of China under Grant 61303095, in part by the Research Fund for the Doctoral Program of Higher Education of China under Grant 20133108120017, in part by the Innovation Program of Shanghai Municipal Education Commission under Grant 14YZ018, in part by the Innovation Program of Shanghai University under Grant SDCX2013012 and in part by the Cultivation Fund for the Young Faculty of Higher Education of Shanghai under Grant ZZSD13005.


  • Aslan et al. (2008) Aslan, C., Erdem, A., Erdem, E., Tari, S., 2008. Disconnected skeleton: shape at its absolute scale. IEEE Trans. Pattern Analysis and Machine Intelligence 30, 2188–2203.
  • Bai et al. (2015) Bai, X., Bai, S., Zhu, Z., Latecki, L.J., 2015. 3D shape matching via two layer coding. IEEE Trans. Pattern Anal. Mach. Intell. 37, 2361–2373.
  • Bai and Latecki (2008) Bai, X., Latecki, L., 2008. Path similarity skeleton graph matching. IEEE Trans. Pattern Analysis and Machine Intelligence 30, 1282–1292.
  • Bai et al. (2007) Bai, X., Latecki, L.J., Liu, W., 2007. Skeleton pruning by contour partitioning with discrete curve evolution. IEEE Trans. Pattern Anal. Mach. Intell. 29, 449–462.
  • Bai et al. (2009) Bai, X., Liu, W., Tu, Z., 2009. Integrating contour and skeleton for shape classification, in: ICCV Workshops, pp. 360–367.
  • Bai et al. (2014) Bai, X., Rao, C., Wang, X., 2014. Shape vocabulary: A robust and efficient shape representation for shape matching. IEEE Transactions on Image Processing 23, 3935–3949.
  • Baseski et al. (2009) Baseski, E., Erdem, A., Tari, S., 2009. Dissimilarity between two skeletal trees in a context. Pattern Recognition 42, 370–385.
  • Belongie et al. (2002) Belongie, S., Malik, J., Puzicha, J., 2002. Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Analysis and Machine Intelligence 24, 509–522.
  • Bharath et al. (2015) Bharath, R., Xiang, C., Lee, T.H., 2015. Shape classification using invariant features and contextual information in the bag-of-words model. Pattern Recognition 48, 894–906.
  • Bicego and Lovato (2015) Bicego, M., Lovato, P., 2015. A bioinformatics approach to 2D shape classification. Computer Vision and Image Understanding.
  • Blum (1973) Blum, H., 1973. Biological shape and visual science. J. Theor. Biol. 38, 205–287.
  • Borgefors et al. (1999) Borgefors, G., Nyström, I., di Baja, G.S., 1999. Computing skeletons in three dimensions. Pattern Recognition 32, 1225–1236.
  • Cormen et al. (2001) Cormen, T., Leiserson, C., Rivest, R., Stein, C., 2001. Introduction to Algorithms, second ed. MIT Press.
  • Crammer and Singer (2001) Crammer, K., Singer, Y., 2001. On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research 2, 265–292.

  • Daliri and Torre (2008) Daliri, M.R., Torre, V., 2008. Robust symbolic representation for shape recognition and retrieval. Pattern Recognition 41, 1782–1798.
  • Daliri and Torre (2010) Daliri, M.R., Torre, V., 2010. Shape recognition based on kernel-edit distance. Computer Vision and Image Understanding 114, 1097–1103.
  • Demirci et al. (2006) Demirci, M., Shokoufandeh, A., Keselman, Y., Bretzner, L., Dickinson, S., 2006. Object recognition as many-to-many feature matching. Int’l J. Computer Vision 69, 203–222.
  • Devijver and Kittler (1982) Devijver, P.A., Kittler, J., 1982. Pattern Recognition: A Statistical Approach. London, GB: Prentice-Hall.
  • Duchon (1977) Duchon, J., 1977. Splines Minimizing Rotation-Invariant Semi-Norms in Sobolev Spaces. Berlin: Springer-Verlag.
  • Erdem and Tari (2010) Erdem, A., Tari, S., 2010. A similarity-based approach for shape classification using Aslan skeletons. Pattern Recognition Letters 31, 2024–2032.
  • Felzenszwalb and Schwartz (2007) Felzenszwalb, P.F., Schwartz, J., 2007. Hierarchical matching of deformable shapes, in: CVPR.
  • Grigorescu and Petkov (2003) Grigorescu, C., Petkov, N., 2003. Distance sets for shape filters and shape recognition. IEEE Transactions on Image Processing 12, 1274–1286.
  • Latecki and Lakämper (1999) Latecki, L.J., Lakämper, R., 1999. Convexity rule for shape decomposition based on discrete contour evolution. Computer Vision and Image Understanding 73, 441–454.
  • Latecki et al. (2000) Latecki, L.J., Lakämper, R., Eckhardt, U., 2000. Shape descriptors for non-rigid shapes with a single closed contour, in: CVPR, pp. 1424–1429.
  • Lazebnik et al. (2006) Lazebnik, S., Schmid, C., Ponce, J., 2006. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, in: CVPR, pp. 2169–2178.
  • Leibe and Schiele (2003) Leibe, B., Schiele, B., 2003. Analyzing appearance and contour based methods for object categorization, in: 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2003), 16-22 June 2003, Madison, WI, USA, pp. 409–415.
  • Ling and Jacobs (2007) Ling, H., Jacobs, D.W., 2007. Shape classification using the inner-distance. IEEE Trans. Pattern Analysis and Machine Intelligence 29, 286–299.
  • Ma et al. (2015a) Ma, J., Qiu, W., Zhao, J., Ma, Y., Yuille, A.L., Tu, Z., 2015a. Robust L2E estimation of transformation for non-rigid registration. IEEE Transactions on Signal Processing 63, 1115–1129.
  • Ma et al. (2014) Ma, J., Zhao, J., Tian, J., Yuille, A.L., Tu, Z., 2014. Robust point matching via vector field consensus. IEEE Trans. Image Process. 23, 1706–1721.
  • Ma et al. (2016) Ma, J., Zhao, J., Yuille, A.L., 2016. Non-rigid point set registration by preserving global and local structures. IEEE Transactions on Image Processing 25, 53–64.
  • Ma et al. (2015b) Ma, J., Zhou, H., Zhao, J., Tian, J., 2015b. Robust feature matching for remote sensing image registration via locally linear transforming. IEEE Transactions on Geoscience and Remote Sensing 53, 6469–6481.
  • Macrini et al. (2011) Macrini, D., Dickinson, S.J., Fleet, D.J., Siddiqi, K., 2011. Object categorization using bone graphs. Computer Vision and Image Understanding 115, 1187–1206.
  • Saha et al. (2015) Saha, P.K., Borgefors, G., di Baja, G.S., 2015. A survey on skeletonization algorithms and their applications. Pattern Recognition Letters.
  • Sebastian et al. (2004) Sebastian, T., Klein, P., Kimia, B., 2004. Recognition of shapes by editing their shock graphs. IEEE Trans. Pattern Analysis and Machine Intelligence 26, 550–571.
  • Shen et al. (2011) Shen, W., Bai, X., Hu, R., Wang, H., Latecki, L.J., 2011. Skeleton growing and pruning with bending potential ratio. Pattern Recognition 44, 196–209.
  • Shen et al. (2016) Shen, W., Bai, X., Hu, Z., Zhang, Z., 2016. Multiple instance subspace learning via partial random projection tree for local reflection symmetry in natural images. Pattern Recognition.
  • Shen et al. (2013a) Shen, W., Bai, X., Yang, X., Latecki, L.J., 2013a. Skeleton pruning as trade-off between skeleton simplicity and reconstruction error. SCIENCE CHINA Information Sciences 56, 1–14.
  • Shen et al. (2015) Shen, W., Wang, X., Wang, Y., Bai, X., Zhang, Z., 2015. DeepContour: A deep convolutional feature learned by positive-sharing loss for contour detection, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pp. 3982–3991.
  • Shen et al. (2014) Shen, W., Wang, X., Yao, C., Bai, X., 2014. Shape recognition by combining contour and skeleton into a mid-level representation, in: Pattern Recognition - 6th Chinese Conference, CCPR 2014, Changsha, China, November 17-19, 2014. Proceedings, Part I, pp. 391–400.
  • Shen et al. (2013b) Shen, W., Wang, Y., Bai, X., Wang, H., Latecki, L.J., 2013b. Shape clustering: Common structure discovery. Pattern Recognition 46, 539–550.
  • Siddiqi et al. (1999) Siddiqi, K., Shokoufandeh, A., Dickinson, S., Zucker, S., 1999. Shock graphs and shape matching. Int’l J. Computer Vision 35, 13–32.
  • Sun and Super (2005) Sun, K.B., Super, B.J., 2005. Classification of contour shapes using class segment sets, in: CVPR, pp. 727–733.
  • Wang et al. (2010a) Wang, B., Shen, W., Liu, W., You, X., Bai, X., 2010a. Shape classification using tree-unions, in: ICPR, pp. 983–986.
  • Wang et al. (2010b) Wang, J., Yang, J., Yu, K., Lv, F., Huang, T.S., Gong, Y., 2010b. Locality-constrained linear coding for image classification, in: CVPR, pp. 3360–3367.
  • Wang et al. (2014) Wang, X., Feng, B., Bai, X., Liu, W., Latecki, L.J., 2014. Bag of contour fragments for robust shape classification. Pattern Recognition 47, 2116–2125.
  • Xie et al. (2008) Xie, J., Heng, P., Shah, M., 2008. Shape matching and modeling using skeletal context. Pattern Recognition 41, 1756–1767.