Shape is a distinctive object attribute which is frequently utilized in image processing and computer vision applications. Measuring similarity of objects via their shapes is a difficult task due to high within-class and low between-class variations. Within-class variations may be due to transformations such as rotation, scaling and deformation of articulations. Articulated shapes can be successfully represented by structural representations which are organized in the form of graphs of shape components. However, it is challenging to build and compare structural representations. Moreover, measuring similarity of shapes through their structural representations requires finding a correspondence between a pair of graphs, which is an intricate process entailing advanced algorithms.
In this work, we propose a representation scheme for articulated shapes which involves neither building a graph of shape components nor matching a pair of graphs. The proposed representation is used to measure pairwise shape similarity, which we then use to cluster a set of shapes. The clustering results obtained on three articulated shape datasets show that our method performs comparably to state-of-the-art methods that utilize component graphs or trees, even though we do not explicitly model component relations.
2 The Method
Our representation scheme relies on first constructing multiple high-dimensional feature spaces in which shape points (pixels in the 2D discrete setting) are represented, and then determining the distinctness of the shape points in each space separately via Robust Principal Component Analysis (RPCA).
The distinctness values deduced from each feature space are utilized for two main purposes. First, their spatial distribution on the 2D shape domain is used to partition the shape into a set of regions. Second, each region is described by the normalized probability distribution of the corresponding distinctness values. The dissimilarity between a pair of shapes via each feature space is defined as the cost of the optimal assignment between their regions. Notice that we do not build any graphs to model the shape structure and the optimal assignment problem does not involve matching a pair of graphs. The final shape dissimilarity is computed by combining the dissimilarities deduced from multiple feature spaces. Below, we present the details of our representation scheme.
2.1 Construction of a High-dimensional Feature Space
Consider a planar shape discretized using a grid of dimension . We construct a -dimensional feature vector for each shape pixel where and . In order to compute the feature component at each slot for , we first solve a linear system of equations in which the feature value of each shape pixel is related to the feature values of its four-neighboring pixels via (1), and we then normalize the obtained values as in (2).
is solved only for the shape pixels, and it is considered for the pixels outside the shape. is a scalar parameter that controls the dependence between the feature value of each shape pixel and the feature values of its four-neighbors.
In Fig. 1 (left), we show the features computed for a one-dimensional signal using three different values of corresponding to , and times the whole signal length. We normalize each feature to have the maximum value of (see Fig. 1 (right)). We observe that the feature values monotonically increase towards the center.
By varying , we obtain a collection of features each encoding a different degree of local interaction between the shape pixels and their surroundings. We determine for as where represents the extent of the maximum interaction between the shape pixels and their surroundings. In order to represent different shapes in a common feature space, we determine for each shape individually as a measurement of the same shape property.
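Equations (1) and (2) are not reproduced in the text above, so the following Python sketch rests on explicit assumptions: (1) is taken to be a screened-Poisson relation, discretized on the 4-neighborhood as $\sum_q v_q - 4v_p - v_p/\rho^2 = -1/\rho^2$ for each shape pixel $p$ with $v = 0$ outside the shape, and (2) is taken to rescale each channel to a maximum value of 1. The names `screened_poisson_feature` and `feature_matrix` are hypothetical, and the dense solver only suits small grids; a sparse solver would be the natural choice for real images.

```python
import numpy as np

def screened_poisson_feature(mask, rho):
    """One feature channel under an ASSUMED screened-Poisson form of (1).

    For every shape pixel p with 4-neighbours q:
        sum_q v_q - 4 v_p - v_p / rho**2 = -1 / rho**2,   v = 0 outside the shape.
    """
    mask = np.asarray(mask, dtype=bool)
    coords = np.argwhere(mask)                    # shape pixels in row-major order
    index = -np.ones(mask.shape, dtype=int)
    index[mask] = np.arange(len(coords))          # boolean indexing is also row-major
    n = len(coords)
    A = np.zeros((n, n))
    b = np.full(n, -1.0 / rho**2)
    for k, (i, j) in enumerate(coords):
        A[k, k] = -4.0 - 1.0 / rho**2
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            in_grid = 0 <= ni < mask.shape[0] and 0 <= nj < mask.shape[1]
            if in_grid and mask[ni, nj]:
                A[k, index[ni, nj]] = 1.0
            # neighbours outside the shape carry v = 0 and contribute nothing
    v = np.zeros(mask.shape)
    v[mask] = np.linalg.solve(A, b)
    return v / v.max()                            # assumed normalization (2): maximum value 1

def feature_matrix(mask, rhos):
    """Stack one channel per rho into an (N shape pixels) x len(rhos) matrix."""
    mask = np.asarray(mask, dtype=bool)
    return np.stack([screened_poisson_feature(mask, r)[mask] for r in rhos], axis=1)
```

On a one-pixel-high strip, the resulting channel is symmetric and increases monotonically towards the center, in line with the behavior described for Fig. 1. Passing an increasing list of $\rho$ values to `feature_matrix` yields one row per shape pixel, which is the matrix layout used for RPCA in Section 2.3.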
2.2 Determining Multiple High-dimensional Feature Spaces
We utilize two different shape measurements, related to the thickness of the shape body and to the maximum distance between the shape extremities. The first measurement is computed as the maximum value of the shape's distance transform, which gives the distance of each shape point from the nearest boundary. The second measurement is computed as the maximum of the pairwise geodesic distances between the boundary points, where the geodesic distance between a pair of points is the length of the shortest path connecting them within the shape domain.
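Both shape measurements can be sketched in a few lines of Python. This is a brute-force illustration, not the authors' implementation: the distance transform is computed exhaustively against all background pixels, boundary pixels are assumed to be shape pixels with a 4-neighbor outside the shape, and geodesic distances are approximated by Dijkstra's algorithm on the 8-connected pixel graph (step costs 1 and $\sqrt{2}$). That approximation slightly overestimates continuous path lengths and permits diagonal steps between shape pixels that touch only at a corner. The function names are hypothetical.

```python
import heapq
import math
import numpy as np

def max_distance_transform(mask):
    """Largest Euclidean distance from a shape pixel to the nearest non-shape pixel (brute force)."""
    padded = np.pad(np.asarray(mask, dtype=bool), 1)      # guarantees background around the shape
    fg = np.argwhere(padded)
    bg = np.argwhere(~padded)
    d = np.sqrt(((fg[:, None, :] - bg[None, :, :]) ** 2).sum(axis=-1)).min(axis=1)
    return float(d.max())

def max_boundary_geodesic(mask):
    """Largest geodesic distance between boundary pixels (8-connected Dijkstra approximation)."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape

    def inside(i, j):
        return 0 <= i < h and 0 <= j < w and bool(mask[i, j])

    four = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    eight = four + [(1, 1), (1, -1), (-1, 1), (-1, -1)]
    boundary = [(int(i), int(j)) for i, j in np.argwhere(mask)
                if any(not inside(i + di, j + dj) for di, dj in four)]
    best = 0.0
    for src in boundary:                                   # one Dijkstra run per boundary pixel
        dist = {src: 0.0}
        heap = [(0.0, src)]
        while heap:
            d, (i, j) = heapq.heappop(heap)
            if d > dist[(i, j)]:
                continue                                   # stale heap entry
            for di, dj in eight:
                nb = (i + di, j + dj)
                if inside(*nb):
                    nd = d + (math.sqrt(2.0) if di and dj else 1.0)
                    if nd < dist.get(nb, math.inf):
                        dist[nb] = nd
                        heapq.heappush(heap, (nd, nb))
        best = max(best, max(dist.get(b, 0.0) for b in boundary))
    return best
```

For a 3 x 10 rectangle, the thickness measurement is 2 and the extremity measurement is the corner-to-corner path of 7 straight and 2 diagonal steps, i.e. $7 + 2\sqrt{2}$. For articulated shapes the geodesic value exceeds the straight-line span, which is what makes it sensitive to the articulations.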
As shown in Fig. 2, and provide characteristic shape information which can be used to define the extent of the local interactions between the shape pixels during the feature space construction. We construct six different feature spaces for which is selected as multiples of or , namely, , , , , and .
2.3 Computing Distinctness of Shape Pixels via Each Feature Space
We organize the feature vectors in the form of a matrix $F$ with one row per shape pixel, each row being the feature vector computed for that pixel; $F$ therefore has $N$ rows, where $N$ is the total number of shape pixels. The matrix $F$ is decomposed into a low-rank matrix $L$ and a sparse matrix $S$ via RPCA, which seeks to solve the following convex optimization problem:

$$\min_{L,\,S}\ \|L\|_* + \lambda\,\|S\|_1 \quad \text{subject to} \quad F = L + S \qquad (3)$$

where $\|L\|_*$ denotes the sum of the singular values of $L$, $\|S\|_1$ is the sum of the absolute values of all entries of $S$, and $\lambda$ is the weight penalizing the denseness of the sparse matrix $S$. Various algorithms have been proposed to solve the optimization problem in (3). We use the inexact augmented Lagrange multipliers method for RPCA Lin et al. (2010), which is efficient and highly accurate. We set $\lambda$ to the value suggested by the available implementation of Lin et al. (2010).
The correlation between the feature vectors, and hence between the shape pixels, is encoded by the low-rank matrix, whereas their discrimination is contained in the sparse matrix. Thus, we define the distinctness of each shape pixel as the norm of the corresponding row in the sparse matrix. The shape pixels whose feature components vary more are found to be more distinct. The shape articulations are associated with larger distinctness because they are thinner than the shape body, so the constant value coming from the shape boundary propagates faster into these regions during the feature computation.
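The inexact augmented Lagrange multiplier iteration of Lin et al. (2010) for problem (3) can be sketched in NumPy as below, followed by the distinctness computation for a feature matrix $F$. Assumptions: $\lambda$ defaults to $1/\sqrt{\max(m, n)}$, a common choice in the RPCA literature that may differ from the value used here; the step-size constants ($\mu_0 = 1.25/\|F\|_2$, growth factor 1.5, cap $10^7\mu_0$) are typical settings for this method rather than values taken from the text; and the row norm used for distinctness is assumed to be Euclidean. The function names are hypothetical.

```python
import numpy as np

def rpca_inexact_alm(D, lam=None, tol=1e-7, max_iter=1000):
    """Split D into low-rank L and sparse S by inexact ALM (sketch)."""
    D = np.asarray(D, dtype=float)
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))               # assumed default weight
    norm_two = np.linalg.norm(D, 2)                   # largest singular value
    Y = D / max(norm_two, np.abs(D).max() / lam)      # dual variable initialization
    mu = 1.25 / norm_two
    mu_bar, growth = mu * 1e7, 1.5
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    d_norm = np.linalg.norm(D, "fro")
    for _ in range(max_iter):
        # sparse update: entry-wise soft thresholding
        T = D - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        # low-rank update: singular value thresholding
        U, s, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # dual ascent on the constraint D = L + S
        Z = D - L - S
        Y = Y + mu * Z
        mu = min(mu * growth, mu_bar)
        if np.linalg.norm(Z, "fro") / d_norm < tol:
            break
    return L, S

def pixel_distinctness(F):
    """Distinctness of each shape pixel = Euclidean norm of its row in the sparse part."""
    _, S = rpca_inexact_alm(F)
    return np.linalg.norm(S, axis=1)
```

As a toy check, a rank-one matrix with a single corrupted entry is split so that the corrupted row carries by far the largest distinctness, mirroring how pixels whose features deviate from the common low-rank structure stand out.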
2.4 Partitioning Shapes into a Set of Regions via Each Feature Space
We utilize the aforementioned property of the distinctness values to partition shapes into a set of regions. We first divide the shape domain into two disjoint sets by thresholding at the mean distinctness value. We further partition each set into multiple regions by dilating the two sets one after another in descending distinctness order. In this way, we remove the connections between different regions of each set. The radius of the structuring element used for dilating each pixel is determined by the distance of the pixel from the nearest boundary.
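Only the first step of this partitioning is sketched here: thresholding at the mean distinctness and labeling the connected components of each resulting set. The subsequent dilation in descending distinctness order with boundary-distance-dependent structuring elements, which separates regions that still touch, is not reproduced. The function name and the use of 4-connectivity are assumptions.

```python
from collections import deque
import numpy as np

def threshold_and_label(mask, distinctness):
    """Split the shape at the mean distinctness, then label 4-connected components of each set."""
    mask = np.asarray(mask, dtype=bool)
    high = mask & (distinctness > distinctness[mask].mean())   # candidate articulation pixels
    low = mask & ~high                                          # candidate body pixels
    labels = np.zeros(mask.shape, dtype=int)                    # 0 marks pixels outside the shape
    next_label = 0
    for part in (high, low):
        for start in zip(*np.nonzero(part)):
            if labels[start]:
                continue                                        # already reached by a flood fill
            next_label += 1
            labels[start] = next_label
            queue = deque([start])
            while queue:                                        # breadth-first flood fill
                i, j = queue.popleft()
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < mask.shape[0] and 0 <= nj < mask.shape[1]
                            and part[ni, nj] and not labels[ni, nj]):
                        labels[ni, nj] = next_label
                        queue.append((ni, nj))
    return labels
```

For a strip whose two ends are highly distinct, the sketch returns two high-distinctness regions (the ends) and one low-distinctness region (the middle), a one-dimensional caricature of articulations attached to a body.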
2.5 Measuring Pairwise Shape Dissimilarity via Each Feature Space
We describe each shape region by the normalized probability distribution of the distinctness values of its constituent pixels, where the normalization makes the probabilities sum to the ratio of the region area to the total shape area. To estimate the probability distribution, we simply use the histogram of the distinctness values with a constant bin size. The dissimilarity between a pair of shapes is defined as the cost of the optimal assignment between their regions, and we solve this optimal assignment problem with Hungarian matching. We do not assume any relation between the regions of each shape. Hungarian matching finds a one-to-one correspondence between the regions of the two shapes, possibly leaving some regions unmatched. The cost of assigning two regions is simply the sum of the absolute differences between their normalized probability distributions, i.e. their L1 distance. The cost of leaving a region unmatched is the sum of its normalized probability distribution, which equals the ratio of its area to the total shape area.
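A sketch of the region descriptor and of the assignment cost, assuming SciPy's `scipy.optimize.linear_sum_assignment` as the Hungarian-type solver. To let regions stay unmatched, the cost matrix is padded with dummy rows and columns in a standard way: assigning a region to its own dummy costs its unmatched cost, all other region-to-dummy entries are forbidden via a large constant, and dummy-to-dummy pairs cost nothing. The shared `bin_edges` argument and the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def region_descriptor(values, n_shape_pixels, bin_edges):
    """Histogram of a region's distinctness values, scaled to sum to the region's area ratio."""
    hist, _ = np.histogram(values, bins=bin_edges)
    area_ratio = len(values) / n_shape_pixels
    return hist / hist.sum() * area_ratio

def assignment_cost(desc_a, desc_b, big=1e6):
    """Optimal assignment cost between two lists of region descriptors, allowing unmatched regions."""
    m, n = len(desc_a), len(desc_b)
    C = np.zeros((m + n, n + m))
    for i in range(m):
        for j in range(n):
            C[i, j] = np.abs(desc_a[i] - desc_b[j]).sum()   # L1 distance between descriptors
    C[:m, n:] = big                                          # forbid region -> foreign dummy
    C[m:, :n] = big
    for i in range(m):
        C[i, n + i] = desc_a[i].sum()                        # leave region i of A unmatched
    for j in range(n):
        C[m + j, j] = desc_b[j].sum()                        # leave region j of B unmatched
    # the dummy-to-dummy block (bottom right) stays zero
    rows, cols = linear_sum_assignment(C)
    return float(C[rows, cols].sum())
```

For example, if shape A has two regions of half the area each and shape B has only the first of them, the cheapest solution matches that region at zero cost and leaves the other one unmatched, giving a dissimilarity of 0.5.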
2.6 Combining Pairwise Shape Dissimilarities Deduced from Multiple Feature Spaces
In order to define the final dissimilarity of a pair of shapes, we compute a weighted average of the dissimilarities deduced from the six feature spaces. The weight is for each of the dissimilarities via the feature spaces constructed using , whereas it is for each of the dissimilarities via the feature spaces constructed using . The non-uniform weighting reflects the fact that is more reliable than , since the shape body is a more stable structure than the articulations.
3 Experimental results
As shown in Fig. 3, the distribution of distinctness values varies both across representations of different shapes via the same feature space and across representations of a single shape via different feature spaces. Grouping the distinctness values on the shape domain partitions the shape into meaningful regions, such as the shape body (gray) and the articulations (black), via simple operations.
In order to observe the clustering effect implied by the proposed dissimilarity measure, we utilize t-Distributed Stochastic Neighbor Embedding (t-SNE) van der Maaten and Hinton (2008), which maps objects into a plane based on their pairwise dissimilarities. In Fig. 4, we show the t-SNE mapping result for the 56shapes Aslan and Tari (2005) dataset, which consists of shape categories each with shapes, where the within-category variations are due to transformations such as rotation, scaling and deformation of articulations. We see that the shapes from the same category cluster together and the shapes from similar categories (e.g., horse and cat shapes) are close to each other.
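We compare our clustering results with those of state-of-the-art methods using Normalized Mutual Information (NMI). NMI measures the degree of agreement between the ground-truth category partition and the obtained clustering partition in terms of entropy. We use the formulation of NMI given in Shen et al. (2013). Let $n_{i,j}$ denote the number of shapes in cluster $i$ and category $j$, $a_i$ denote the number of shapes in cluster $i$, and $b_j$ denote the number of shapes in category $j$. Then NMI can be computed as follows:

$$\mathrm{NMI} = \frac{\sum_{i=1}^{K}\sum_{j=1}^{C} n_{i,j}\,\log\!\left(\dfrac{n\, n_{i,j}}{a_i\, b_j}\right)}{\sqrt{\left(\sum_{i=1}^{K} a_i \log\dfrac{a_i}{n}\right)\left(\sum_{j=1}^{C} b_j \log\dfrac{b_j}{n}\right)}}$$

where $K$ is the number of clusters, $C$ is the number of categories, $n$ is the total number of shapes, and terms with $n_{i,j} = 0$ are taken to be zero.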
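The standard NMI of Shen et al. (2013), $\sum_{i,j} n_{i,j}\log\big(n\,n_{i,j}/(a_i b_j)\big)$ divided by $\sqrt{\big(\sum_i a_i\log(a_i/n)\big)\big(\sum_j b_j\log(b_j/n)\big)}$ with $a_i$ the cluster sizes, $b_j$ the category sizes and $n_{i,j}$ their overlaps, translates directly into Python. Empty overlaps never appear in the counter, which implements the convention $0 \log 0 = 0$. The function name is hypothetical, and the sketch assumes at least two clusters and two categories, since otherwise the denominator is zero.

```python
import math
from collections import Counter

def nmi(clusters, categories):
    """Normalized mutual information between a clustering and the ground-truth categories."""
    n = len(clusters)
    n_ij = Counter(zip(clusters, categories))   # joint counts (only non-empty cells)
    a = Counter(clusters)                       # cluster sizes a_i
    b = Counter(categories)                     # category sizes b_j
    num = sum(c * math.log(n * c / (a[i] * b[j])) for (i, j), c in n_ij.items())
    den = math.sqrt(sum(x * math.log(x / n) for x in a.values())
                    * sum(x * math.log(x / n) for x in b.values()))
    return num / den
```

A clustering that reproduces the categories up to a renaming of cluster ids scores 1, while a clustering that is statistically independent of the categories scores 0.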
A high value of NMI indicates that the obtained clustering matches well with the ground-truth category partition. In order to compute NMI of our clustering result, we need to assign a cluster id to each shape. Given the t-SNE mapping of a dataset obtained using our proposed dissimilarity measure, we apply affinity propagation Frey and Dueck (2007) to partition the dataset into a number of clusters (which is chosen equal to the number of categories in the dataset).
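The affinity propagation step can be sketched from scratch in NumPy, following the responsibility/availability message passing of Frey and Dueck (2007). This is an illustration under assumptions: the 2D t-SNE coordinates are taken as given, similarities are negative squared Euclidean distances, and the shared `preference` value is a free parameter. Affinity propagation does not take the number of clusters as input; matching it to the number of categories is typically done by tuning the preference (for instance by bisection), which is not shown. The function name and the tiny tie-breaking noise are assumptions.

```python
import numpy as np

def affinity_propagation(points, preference, damping=0.5, max_iter=500, conv_iter=30, seed=0):
    """Cluster 2D points by affinity propagation; returns (exemplar indices, labels)."""
    X = np.asarray(points, dtype=float)
    n = len(X)
    S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)   # negative squared distances
    np.fill_diagonal(S, preference)                            # self-similarity = preference
    S = S + 1e-12 * np.random.default_rng(seed).standard_normal((n, n))  # break exact ties
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    rows = np.arange(n)
    last, stable = None, 0
    for _ in range(max_iter):
        # responsibilities r(i,k) = s(i,k) - max_{k' != k} (a(i,k') + s(i,k'))
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # availabilities a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = diag
        A = damping * A + (1 - damping) * A_new
        # stop once the exemplar set has stayed unchanged for conv_iter iterations
        exemplars = np.flatnonzero(np.diag(A) + np.diag(R) > 0)
        stable = stable + 1 if last is not None and np.array_equal(exemplars, last) else 0
        last = exemplars
        if stable >= conv_iter and exemplars.size > 0:
            break
    if exemplars.size == 0:
        raise ValueError("no exemplars found; try a larger preference")
    labels = np.argmax(S[:, exemplars], axis=1)                # nearest exemplar for every point
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels
```

On two well-separated groups of three points with a moderate negative preference, the sketch is expected to select one exemplar per group; making the preference more negative favors fewer clusters, and making it less negative favors more.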
In Table 1, we present the NMIs of our proposed method and other state-of-the-art methods on the 56shapes Aslan and Tari (2005), 180shapes Aslan et al. (2008) and 1000shapes Baseski et al. (2009) datasets. The 180shapes dataset consists of categories each with shapes. The 1000shapes dataset consists of categories each with shapes. The method of common structure discovery (CSD) Shen et al. (2013) employs hierarchical clustering in which a common shape structure is constructed each time two clusters are merged into a single cluster, where building a common shape structure requires matching skeleton graphs. The method (skeleton path+spectral) presented in the work Bai et al. (2016) combines the skeleton path distance Bai and Latecki (2008) with spectral clustering. The performance of these two skeleton-based methods decreases on the 1000shapes dataset, which contains unarticulated shape categories such as the face category. For the 1000shapes dataset, the highest performance is obtained via the method (shape context+spectral) in Bai et al. (2016), which uses the shape context descriptor Belongie et al. (2002). As the shape context descriptor is not robust to deformation of shape articulations, its performance decreases on the highly articulated 56shapes and 180shapes datasets. The inner distance shape context (IDSC) descriptor Ling and Jacobs (2007) is an articulation-invariant alternative to the shape context descriptor. In the work Shen et al. (2013), the performance of IDSC combined with the normalized cuts (Ncuts) algorithm is reported for the three datasets. Overall, we accurately cluster the shapes from the 56shapes dataset, and our proposed method has the highest average NMI over the three datasets. We observe that, without constructing and matching graphs of shape components, our method performs comparably to the structural methods.
4 Summary and Conclusion
We presented a novel representation scheme that does not involve any relational/structural model of the shape components. Our representation scheme is based on a pixel-wise distinctness measure, which is obtained by applying RPCA to the shape pixels represented in a high-dimensional feature space. The distinctness measure proves useful in two ways: its spatial distribution on the shape domain provides an easy partitioning of the shape into meaningful regions, and its probability distribution provides a description of each region. We define a pairwise dissimilarity measure as the cost of the optimal assignment between the regions of the shapes. The results of the clustering experiments on highly articulated shape datasets show that our proposed method performs comparably to state-of-the-art methods.
A.G. and S.T. contributed to the design and development of the proposed method and to the writing of the manuscript. A.G. contributed additionally to the software implementation and testing of the proposed method.
This research was funded by TUBITAK grant number 112E208.
The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.
The following abbreviations are used in this manuscript:
| Abbreviation | Definition |
|---|---|
| RPCA | Robust Principal Component Analysis |
| t-SNE | t-Distributed Stochastic Neighbor Embedding |
| NMI | Normalized Mutual Information |
| CSD | Common Structure Discovery |
| IDSC | Inner Distance Shape Context |
- Lin et al. (2010) Lin, Z.; Chen, M.; Ma, Y. The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. arXiv preprint arXiv:1009.5055 2010.
- van der Maaten and Hinton (2008) van der Maaten, L.; Hinton, G. Visualizing High-Dimensional Data Using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
- Aslan and Tari (2005) Aslan, C.; Tari, S. An axis-based representation for recognition. Proc. of International Conference on Computer Vision, ICCV, 2005.
- Shen et al. (2013) Shen, W.; Wang, Y.; Bai, X.; Wang, H.; Latecki, L. Shape clustering: Common structure discovery. Pattern Recognition 2013, 46, 539–550. doi:10.1016/j.patcog.2012.07.023.
- Frey and Dueck (2007) Frey, B.J.; Dueck, D. Clustering by Passing Messages Between Data Points. Science 2007, 315, 972–976. doi:10.1126/science.1136800.
- Aslan et al. (2008) Aslan, C.; Erdem, A.; Erdem, E.; Tari, S. Disconnected skeleton: shape at its absolute scale. IEEE Trans. on Pattern Anal. and Mach. Intell. 2008, 30, 2188–2203. doi:10.1109/TPAMI.2007.70842.
- Baseski et al. (2009) Baseski, E.; Erdem, A.; Tari, S. Dissimilarity between two skeletal trees in a context. Pattern Recognition 2009, 42, 370–385. doi:10.1016/j.patcog.2008.05.022.
- Bai et al. (2016) Bai, S.; Liu, Z.; Bai, X. Co-spectral for robust shape clustering. Pattern Recognition Letters 2016, 83, 388–394. doi:10.1016/j.patrec.2016.07.014.
- Bai and Latecki (2008) Bai, X.; Latecki, L. Path Similarity Skeleton Graph Matching. IEEE Trans. on Pattern Anal. and Mach. Intell. 2008, 30, 1282–1292. doi:10.1109/TPAMI.2007.70769.
- Belongie et al. (2002) Belongie, S.; Malik, J.; Puzicha, J. Shape matching and object recognition using shape contexts. IEEE Trans. on Pattern Anal. and Mach. Intell. 2002, 24, 509–522. doi:10.1109/34.993558.
- Ling and Jacobs (2007) Ling, H.; Jacobs, D.W. Shape Classification Using the Inner-Distance. IEEE Trans. on Pattern Anal. and Mach. Intell. 2007, 29, 286–299. doi:10.1109/TPAMI.2007.41.