Diffusion-geometric maximally stable component detection in deformable shapes

12/17/2010 ∙ by Roee Litman, et al. ∙ 0

Maximally stable component detection is a very popular method for feature analysis in images, mainly due to its low computation cost and high repeatability. With the recent advance of feature-based methods in geometric shape analysis, there is significant interest in finding analogous approaches in the 3D world. In this paper, we formulate a diffusion-geometric framework for stable component detection in non-rigid 3D shapes, which can be used for geometric feature detection and description. A quantitative evaluation of our method on the SHREC'10 feature detection benchmark shows its potential as a source of high-quality features.




1 Introduction

Over the past decade, feature-based methods have become a ubiquitous tool in image analysis and a de facto standard in many computer vision and pattern recognition problems. More recently, there has been an increased interest in developing similar methods for the analysis of 3D shapes. Feature descriptors play an important role in many shape analysis applications, such as finding shape correspondence Thorstensen and Keriven (2009) or assembling fractured models Huang et al. (2006) in computational archaeology. Bags of features Sivic and Zisserman (2003); Ovsjanikov et al. (2009); Toldo et al. (2009) and similar approaches Mitra et al. (2006) were introduced as a way to construct global shape descriptors that can be efficiently used for large-scale shape retrieval.

Many shape feature detectors and descriptors draw inspiration from and follow analogous methods in image analysis. For example, detection of geometric structures analogous to corners Sipiran and Bustos (2010) and edges Kolomenkin et al. (2009) in images has been studied. The histogram of intrinsic gradients used in Zaharescu et al. (2009) is similar in principle to the scale invariant feature transform (SIFT) Lowe (2004) which has recently become extremely popular in image analysis. In Gelfand et al. (2005), the integral invariant signatures Manay et al. (2004) successfully employed in 2D shape analysis were extended to 3D shapes.

Examples of 3D-specific descriptors include the popular spin image Johnson and Hebert (1999), based on representation of the shape normal field in a local system of coordinates. Recent studies introduced versatile and computationally efficient descriptors based on the heat kernel Sun et al. (2009); Bronstein and Kokkinos (2010) describing the local heat propagation properties on a shape. The advantage of these methods is the fact that heat diffusion geometry is intrinsic and thus deformation-invariant, which makes descriptors based on it applicable in deformable shape analysis.

1.1 Related work

A different class of feature detection methods tries to find stable components or regions in the analyzed image or shape. In the image processing literature, the watershed transform is the precursor of many algorithms for stable component detection Couprie and Bertrand (1997); Vincent and Soille (2002). In the computer vision and image analysis community, stable component detection is used in the maximally stable extremal regions (MSER) algorithm Matas et al. (2004). MSER represents intensity level sets as a component tree and attempts finding level sets with the smallest area variation across intensity; the use of area ratio as the stability criterion makes this approach affine-invariant, which is an important property in image analysis, as it approximates viewpoint transformations. Alternative stability criteria based on geometric scale-space analysis have been recently proposed in Kimmel et al. (2010).

In the shape analysis community, shape decomposition into characteristic primitive elements was explored in Mortara et al. (2003). Methods similar to MSER have been explored in the works on topological persistence Edelsbrunner et al. (2002). Persistence-based clustering Chazal et al. (2009) was used by Skraba et al. (2010) to perform shape segmentation. Digne et al. (2010) extended the notion of vertex-weighted component trees to meshes and proposed to detect MSER regions using the mean curvature. The approach was tested only in a qualitative way, and not evaluated as a feature detector.

1.2 Main contribution

The main contribution of our framework is three-fold. First, in Section 3 we introduce a generic framework for stable component detection, which unites vertex- and edge-weighted graph representations (as opposed to vertex-weighting used in image and shape maximally stable component detectors Matas et al. (2004); Digne et al. (2010)). Our results (see Section 6) show that the edge-weighted formulation is more versatile and outperforms its vertex-weighted counterpart in terms of feature repeatability. Second, in Section 4 we introduce diffusion geometric weighting functions suitable for both vertex- and edge-weighted component trees. We show that such functions are invariant under a large class of transformations, in particular, non-rigid inelastic deformations, making them especially attractive in non-rigid shape analysis. We also show several ways of constructing scale-invariant weighting functions. Third, in Section 6 we show a comprehensive evaluation of different settings of our method on a standard feature detection benchmark comprising shapes undergoing a variety of transformations (also see Figures 1 and 2).

2 Diffusion geometry

Diffusion geometry is an umbrella term referring to geometric analysis of diffusion or random walk processes. We model a shape as a compact two-dimensional Riemannian manifold $X$. In its simplest setting, a diffusion process on $X$ is described by the partial differential equation

$$\left(\frac{\partial}{\partial t} + \Delta\right) u(x,t) = 0, \qquad (1)$$

called the heat equation, where $\Delta$ denotes the positive-semidefinite Laplace-Beltrami operator associated with the Riemannian metric of $X$. The heat equation describes the propagation of heat on the surface, and its solution $u(x,t)$ is the heat distribution at a point $x$ at time $t$. The initial condition of the equation is some initial heat distribution $u(x,0) = u_0(x)$; if $X$ has a boundary, appropriate boundary conditions must be added.

The solution of (1) corresponding to a point initial condition $u(x,0) = \delta(x,y)$ is called the heat kernel, denoted $h_t(x,y)$, and represents the amount of heat transferred from $x$ to $y$ in time $t$ due to the diffusion process. The value of the heat kernel $h_t(x,y)$ can also be interpreted as the transition probability density of a random walk of length $t$ from the point $x$ to the point $y$.

Using spectral decomposition, the heat kernel $h_t(x,y)$ can be represented as

$$h_t(x,y) = \sum_{i \ge 0} e^{-\lambda_i t}\, \phi_i(x)\, \phi_i(y). \qquad (2)$$

Here, $\phi_i$ and $\lambda_i$ denote, respectively, the eigenfunctions and eigenvalues of the Laplace-Beltrami operator $\Delta$ satisfying $\Delta \phi_i = \lambda_i \phi_i$ (without loss of generality, we assume $\lambda_i$ to be sorted in increasing order starting with $\lambda_0 = 0$). Since the Laplace-Beltrami operator is an intrinsic geometric quantity, i.e., it can be expressed solely in terms of the metric of the shape, its eigenfunctions and eigenvalues as well as the heat kernel are invariant under isometric transformations (bending) of the shape.

The time parameter $t$ can be given the meaning of scale, and the family $\{h_t\}_{t>0}$ of heat kernels can be thought of as a scale-space of functions on the shape. By integrating over all scales, a scale-invariant version of (2) is obtained,

$$c(x,y) = \sum_{i \ge 1} \frac{1}{\lambda_i}\, \phi_i(x)\, \phi_i(y). \qquad (3)$$

This kernel is referred to as the commute-time kernel and can be interpreted as the transition probability density of a random walk of any length.
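The spectral forms of the heat kernel and the commute-time kernel translate directly into truncated sums over eigenpairs. The following is a minimal sketch, not the authors' implementation: the function names, the list-of-lists eigenvector layout and the two-vertex toy spectrum are assumptions made for illustration, and unit area elements are assumed so that the eigenvectors are orthonormal in the ordinary sense.

```python
import math

def heat_kernel(evals, evecs, x, y, t):
    """Truncated spectral sum: sum_i exp(-lambda_i * t) * phi_i(x) * phi_i(y)."""
    return sum(math.exp(-lam * t) * phi[x] * phi[y]
               for lam, phi in zip(evals, evecs))

def commute_time_kernel(evals, evecs, x, y, eps=1e-12):
    """Sum of phi_i(x) * phi_i(y) / lambda_i over the non-zero eigenvalues."""
    return sum(phi[x] * phi[y] / lam
               for lam, phi in zip(evals, evecs) if lam > eps)

# Toy spectrum (assumed example): two vertices joined by one unit-weight edge.
# The graph Laplacian [[1, -1], [-1, 1]] has eigenvalues 0 and 2.
r = 1.0 / math.sqrt(2.0)
evals = [0.0, 2.0]
evecs = [[r, r], [r, -r]]          # evecs[i][x] holds phi_i(x)

auto = heat_kernel(evals, evecs, 0, 0, t=0.5)    # auto-diffusivity at vertex 0
cross = heat_kernel(evals, evecs, 0, 1, t=0.5)   # heat moved from vertex 0 to 1
```

On this toy example the two heat kernel values sum to one, which matches the random-walk interpretation: after time t the walker is at one of the two vertices.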

By setting $y = x$, both the heat and the commute time kernels, $h_t(x,x)$ and $c(x,x)$, express the probability density of remaining at a point $x$, respectively after time $t$ and after any time. The value $h_t(x,x)$, sometimes referred to as the auto-diffusivity function, is related to the Gaussian curvature $K(x)$ through

$$h_t(x,x) = \frac{1}{4\pi t} + \frac{K(x)}{12\pi} + \mathcal{O}(t). \qquad (4)$$

This relation coincides with the well-known fact that heat tends to diffuse slower at points with positive curvature, and faster at points with negative curvature.

For any $t > 0$, the values of the heat kernel $h_t(x,y)$ at every point $x$ of the shape and $y$ in a small neighborhood around $x$ contain full information about the intrinsic geometry of the shape. Furthermore, Sun et al. (2009) show that under mild technical conditions, the set $\{h_t(x,x)\}_{t>0}$ is also fully informative (note that the auto-diffusivity function has to be evaluated at all values of $t$ in order to contain full information about the shape metric).

2.1 Numerical computation

In the discrete setting, we assume that the shape is sampled at a finite number of points $V = \{v_1, \dots, v_N\}$, upon which a simplicial complex (triangular mesh) with vertices $V$, edges $E$ and faces $F$ is constructed. The computation of the discrete heat kernel and the associated diffusion geometry constructs is performed using formula (2), in which a finite number of eigenvalues and eigenfunctions of the discrete Laplace-Beltrami operator are taken. The latter can be computed directly using the finite elements method (FEM) Reuter et al. (2005), or by discretization of the Laplace operator on the mesh followed by its eigendecomposition. Here, we adopt the second approach, according to which the discrete Laplace-Beltrami operator is expressed in the following generic form,

$$(\Delta f)_i = \frac{1}{a_i} \sum_j w_{ij}\, (f_i - f_j), \qquad (5)$$

where $f_i = f(v_i)$ is a scalar function defined on $V$, $w_{ij}$ are weights, and $a_i$ are normalization coefficients. In matrix notation, (5) can be written as $\Delta f = A^{-1} W f$, where $f$ is an $N \times 1$ vector, $A = \mathrm{diag}(a_i)$, and $W = \mathrm{diag}\left(\sum_{l \neq i} w_{il}\right) - (w_{ij})$. The discrete eigenfunctions and eigenvalues are found by solving the generalized eigendecomposition Lévy (2006) $W \Phi = A \Phi \Lambda$, where $\Lambda$ is a diagonal matrix of eigenvalues and $\Phi$ is the matrix of the corresponding eigenvectors.

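Different choices of the weights $w_{ij}$ and the normalization coefficients $a_i$ have been studied, depending on which continuous properties of the Laplace-Beltrami operator one wishes to preserve Floater and Hormann (2005); Wardetzky et al. (2008). For triangular meshes, a popular choice adopted in this paper is the cotangent weight scheme Pinkall and Polthier (1993); Meyer et al. (2003), in which

$$w_{ij} = \begin{cases} \frac{1}{2}\left(\cot \alpha_{ij} + \cot \beta_{ij}\right), & (v_i, v_j) \in E, \\ 0, & \text{otherwise}, \end{cases} \qquad (6)$$

where $\alpha_{ij}$ and $\beta_{ij}$ are the two angles opposite to the edge between vertices $v_i$ and $v_j$ in the two triangles sharing the edge, and $a_i$ are the discrete area elements.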
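The cotangent scheme can be illustrated on a small mesh. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the area element of a vertex is taken to be one third of the area of its incident triangles (one common choice; the text does not specify which area element is used), and degenerate triangles are not handled.

```python
import math

def cotangent_weights(vertices, faces):
    """Return (w, a): edge weights w[(i, j)] with i < j, and vertex areas a[i]."""
    w = {}
    a = [0.0] * len(vertices)
    for tri in faces:
        for k in range(3):
            # Vertex o is the corner opposite to the edge (i, j).
            i, j, o = tri[k], tri[(k + 1) % 3], tri[(k + 2) % 3]
            u = [vertices[i][c] - vertices[o][c] for c in range(3)]
            v = [vertices[j][c] - vertices[o][c] for c in range(3)]
            dot = sum(p * q for p, q in zip(u, v))
            cross = (u[1] * v[2] - u[2] * v[1],
                     u[2] * v[0] - u[0] * v[2],
                     u[0] * v[1] - u[1] * v[0])
            twice_area = math.sqrt(sum(c * c for c in cross))
            key = (min(i, j), max(i, j))
            # cot(angle at o) = dot / |cross|; each triangle contributes half.
            w[key] = w.get(key, 0.0) + 0.5 * dot / twice_area
            # Assumed area element: one third of each incident triangle.
            a[o] += twice_area / 6.0
    return w, a

def apply_laplacian(w, a, f):
    """Evaluate (1 / a_i) * sum_j w_ij * (f_i - f_j) at every vertex."""
    out = [0.0] * len(f)
    for (i, j), wij in w.items():
        out[i] += wij * (f[i] - f[j])
        out[j] += wij * (f[j] - f[i])
    return [out[i] / a[i] for i in range(len(f))]

# Unit square split into two right triangles along the diagonal (0, 2).
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2), (0, 2, 3)]
w, a = cotangent_weights(vertices, faces)
```

In this example the diagonal edge receives zero weight because both opposite angles are right angles, and a constant function is mapped to zero, as expected of a Laplacian.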

3 Maximally stable components

Let us now focus on the undirected graph with the vertex set $V$ and edge set $E$ underlying the discretization of a shape, which with some abuse of notation we will henceforth denote as $X = (V, E)$. We say that two vertices $v_1$ and $v_2$ are adjacent if $(v_1, v_2) \in E$. An ordered sequence $\pi = \{v_1, \dots, v_k\}$ of vertices is called a path if for any $i = 1, \dots, k-1$, $v_i$ is adjacent to $v_{i+1}$. In this case, we say that $v_1$ and $v_k$ are linked in $X$. The graph is said to be connected if every pair of vertices in it is linked. A graph $Y = (V' \subseteq V, E' \subseteq E)$ is called a subgraph of $X$ and denoted by $Y \subseteq X$. We say that $C$ is a (connected) component of $X$ if $C$ is a connected subgraph of $X$ that is maximal for this property (i.e., for any connected subgraph $Z$, $C \subseteq Z \subseteq X$ implies $Z = C$). Given $E' \subseteq E$, the graph induced by $E'$ is the graph $Y = (V', E')$ whose vertex set is made of all vertices belonging to an edge in $E'$, i.e., $V' = \{v \in V : \exists v' \in V, (v, v') \in E'\}$.

A scalar function $f: V \to \mathbb{R}$ is called a vertex weight, and a graph equipped with it is called vertex-weighted. Similarly, a graph equipped with a function $d: E \to \mathbb{R}$ defined on the edge set is called edge-weighted. In what follows, we will assume both types of weights to be non-negative. Grayscale images are often represented as vertex-weighted graphs with some regular (e.g., four-neighbor) connectivity and weights corresponding to the intensity of the pixels. Edge weights can be obtained, for example, by considering a local distance function measuring the dissimilarity of pairs of adjacent pixels. While vertex weighting is limited to scalar (grayscale) images, edge weighting is more general.

3.1 Component trees

Let $(X, f)$ be a vertex-weighted graph. For $\ell \ge 0$, the $\ell$-cross-section of $X$ is defined as the graph induced by $\{v \in V : f(v) \le \ell\}$. Similarly, a cross-section of an edge-weighted graph $(X, d)$ is induced by the edge subset $\{e \in E : d(e) \le \ell\}$. A connected component of the cross-section is called an $\ell$-level set of the weighted graph.

For any component $C$ of the graph, we define the altitude $\ell(C)$ as the minimal $\ell$ for which $C$ is a component of the $\ell$-cross-section of the graph. Altitudes establish a partial order relation on the connected components, as any component $C$ is contained in a component with higher altitude. The set of all such pairs $(\ell(C), C)$ therefore forms a tree called the component tree. Note that the above definitions are valid for both vertex- and edge-weighted graphs.

3.2 Maximally stable components

Since in our discussion undirected graphs are used as a discretization of smooth manifolds, we can associate with every component $C$ (or every subset of the vertex set in general) a measure of area, $A(C)$. In the simplest setting, the area of $C$ can be thought of as its cardinality. In a better discretization, each vertex $v$ in the graph is associated with a discrete area element $a(v)$, and the area of a component is defined as

$$A(C) = \sum_{v \in C} a(v). \qquad (7)$$

Let now $\{(\ell, C_\ell)\}$ be a sequence of nested components forming a branch in the component tree. We define the instability of $C_\ell$ as

$$s(\ell) = \frac{dA(C_\ell)}{d\ell}. \qquad (8)$$

In other words, the more the area of a component changes with the change of $\ell$, the less stable it is. A component $C_{\ell^*}$ is called maximally stable if the instability function has a local minimum at $\ell^*$. Maximally stable components are widely known in the computer vision literature under the name of maximally stable extremal regions or MSERs for short Matas et al. (2004), with $s(\ell^*)$ usually referred to as the region score.

It is important to note that in their original definition, MSERs were defined on a component tree of a vertex-weighted graph, while our definition is more general and allows for edge-weighted graphs as well. The importance of such an extension will become evident in the sequel. Also, the original MSER algorithm Matas et al. (2004) assumes the vertex weights to be quantized, while our formulation is suitable for scalar fields whose dynamic range is unknown a priori.

3.3 Computational aspects

We use the quasi-linear time algorithm detailed in Najman and Couprie (2006) for the construction of vertex-weighted component trees, and its straightforward adaptation to the edge-weighted case. The algorithm is based on the observation that the vertex set can be partitioned into disjoint sets which are merged together as one goes up in the tree. Maintaining and updating such a partition can be performed very efficiently using the union-find algorithm and related data structures. The resulting tree construction complexity is quasi-linear in the number of vertices.
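The merging idea can be sketched for the vertex-weighted case as follows. This is a simplified illustration and not the algorithm of Najman and Couprie (2006): it sorts the vertices first (so it is not quasi-linear), uses plain path halving without union by rank, and the function name and node layout (`altitude`, `area`, `children`) are assumptions made here.

```python
def component_tree(weights, edges, areas=None):
    """Sweep vertices by increasing weight and record every component created."""
    n = len(weights)
    area = list(areas) if areas is not None else [1.0] * n
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)

    parent = list(range(n))
    node_of = [None] * n               # tree node of a root whose set is unchanged
    pending = [[] for _ in range(n)]   # child nodes absorbed at the current level
    added = [False] * n
    nodes = []

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    def kids(r):
        return [node_of[r]] if node_of[r] is not None else pending[r]

    order = sorted(range(n), key=lambda v: weights[v])
    k = 0
    while k < n:
        level = weights[order[k]]
        batch = []
        while k < n and weights[order[k]] == level:   # all vertices at this level
            batch.append(order[k])
            k += 1
        for v in batch:
            added[v] = True
            for u in nbrs[v]:
                if not added[u]:
                    continue
                ru, rv = find(u), find(v)
                if ru != rv:                          # merge two disjoint sets
                    merged = kids(ru) + kids(rv)
                    parent[ru] = rv
                    area[rv] += area[ru]
                    node_of[rv], pending[rv] = None, merged
        for r in sorted({find(v) for v in batch}):
            if node_of[r] is None:                    # component changed at this level
                nodes.append({"altitude": level, "area": area[r],
                              "children": pending[r]})
                node_of[r], pending[r] = len(nodes) - 1, []
    return nodes

# Path graph with five vertices; the weights play the role of a scalar field.
weights = [0.0, 2.0, 1.0, 3.0, 0.5]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
nodes = component_tree(weights, edges)
```

Each returned node corresponds to one component together with its altitude, and following the `children` indices from the last node downwards traverses the tree from the root to the leaves.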

The derivative (8) of the component area with respect to $\ell$ constituting the stability function is computed using finite differences in each branch of the tree. For example, in a branch $C_{\ell_1} \subseteq C_{\ell_2} \subseteq \dots \subseteq C_{\ell_K}$,

$$s(\ell_k) \approx \frac{A(C_{\ell_{k+1}}) - A(C_{\ell_{k-1}})}{\ell_{k+1} - \ell_{k-1}}. \qquad (9)$$

The function $s$ is evaluated and its local minima are detected in a single pass over the branches of the component tree starting from the leaf nodes. We further filter out maximally stable regions with too high values of $s$. In cases where two nested regions overlapping by more than a predefined threshold are detected as maximally stable, only the bigger one is kept.
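The per-branch computation can be sketched as below. The sketch assumes central finite differences along a single branch with strictly increasing levels; the function names and the `max_instability` parameter are illustrative, and the filtering of overlapping nested regions is left out.

```python
def instability(levels, areas):
    """Central differences of area with respect to level (interior nodes only)."""
    s = [None] * len(levels)
    for k in range(1, len(levels) - 1):
        s[k] = (areas[k + 1] - areas[k - 1]) / (levels[k + 1] - levels[k - 1])
    return s

def maximally_stable(levels, areas, max_instability=float("inf")):
    """Indices of strict local minima of the instability below a threshold."""
    s = instability(levels, areas)
    picks = []
    for k in range(2, len(s) - 2):
        if s[k] < s[k - 1] and s[k] < s[k + 1] and s[k] <= max_instability:
            picks.append(k)
    return picks

# One branch of a component tree, ordered from leaf to root (assumed toy data).
levels = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
areas = [1.0, 4.0, 8.0, 9.0, 10.0, 15.0, 30.0]
stable = maximally_stable(levels, areas)
```

In the toy branch the area grows slowly around the fourth node and quickly elsewhere, so that node is the one reported as maximally stable.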

4 Weighting functions

Unlike images, where methods based on the analysis of the component tree have been shown to be extremely successful, e.g., for segmentation or affine-invariant feature detection (namely, the MSER feature detector), similar techniques have been only scarcely explored for 3D shapes (with the notable exceptions of Digne et al. (2010) and Skraba et al. (2010)). One possible reason is the fact that while images readily offer pixel intensities as the trivial vertex weight field, 3D shapes are not generally equipped with any such field. While the use of the mean curvature was proposed in Digne et al. (2010), it lacks most of the invariance properties required in deformable shape analysis. Here, we follow Skraba et al. (2010) in adopting the diffusion geometry framework and show that it allows the construction of both vertex and edge weights suitable for the definition of maximally stable components with many useful properties.

Given a vertex $v \in V$, the values of the discrete auto-diffusivity function can be directly used as the vertex weights,

$$f(v) = h_t(v, v). \qquad (10)$$

Maximally stable components defined this way are intrinsic and, thus, invariant to non-rigid bending. Such strong invariance properties are particularly useful in the analysis of deformable shapes. However, unlike images where the intensity field contains all information about the image, the above weighting function does not describe the intrinsic geometry of the shape entirely. It furthermore depends on the selection of the scale parameter $t$.

Edge weights constitute a more flexible alternative, allowing fuller geometric information to be incorporated. The simplest edge weighting scheme can be obtained from a vector-valued field defined on the vertices of the graph. For example, associating $h_t(v, v)$ for $t > 0$ with each vertex $v$, one can define an edge weighting function

$$d(v_1, v_2) = \| h_t(v_1, v_1) - h_t(v_2, v_2) \|_t \qquad (11)$$

(here, we write $\|\cdot\|_t$ to make explicit that the norm is taken with respect to the variable $t$). The function $d$ has a closed-form expression that can be obtained by substituting the spectral decomposition (2) of the heat kernel. The advantage of this approach stems from its ability to incorporate multiple scales. Theoretically, the set of $h_t(v, v)$ at all scales $t > 0$ contains full information about the intrinsic geometry of the shape.

In a more general setting, edge weights do not necessarily need to stem from any finite- or infinite-dimensional vector field defined on the vertices. For example, since the discrete heat kernel $h_t(v_1, v_2)$ represents “proximity” between $v_1$ and $v_2$, a function inversely proportional to the value of the heat kernel, e.g.

$$d(v_1, v_2) = \frac{1}{h_t(v_1, v_2)}, \qquad (12)$$

can be used as an edge weight. For sufficiently small values of $t$, this function also contains full information about the shape’s intrinsic geometry.

Another way of creating edge weights inversely related to the heat kernel $h_t(v_1, v_2)$ is by integrating the squared difference between the kernels centered at $v_1$ and $v_2$ over the entire shape,

$$d^2(v_1, v_2) = \sum_{v \in V} a(v)\, \left( h_t(v_1, v) - h_t(v_2, v) \right)^2. \qquad (13)$$

This construction has been previously introduced in Coifman and Lafon (2006) under the name of diffusion distance, which constitutes an intrinsic metric on the shape and is fully informative for small $t$’s.
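Both kernel-based edge weights can be evaluated from the eigenpairs. The sketch below relies on the known spectral form of the diffusion distance, in which the squared distance equals the sum over eigenpairs of exp(-2 * lambda_i * t) times the squared difference of the eigenfunction values; this holds when the eigenvectors are orthonormal with respect to the area elements. Function names and the toy spectrum are assumptions for illustration.

```python
import math

def heat_kernel(evals, evecs, x, y, t):
    """Truncated spectral sum of the heat kernel between vertices x and y."""
    return sum(math.exp(-lam * t) * phi[x] * phi[y]
               for lam, phi in zip(evals, evecs))

def inverse_kernel_weight(evals, evecs, x, y, t):
    """Edge weight taken as the reciprocal of the heat kernel value."""
    return 1.0 / heat_kernel(evals, evecs, x, y, t)

def diffusion_distance_sq(evals, evecs, x, y, t):
    """Squared diffusion distance in its spectral form."""
    return sum(math.exp(-2.0 * lam * t) * (phi[x] - phi[y]) ** 2
               for lam, phi in zip(evals, evecs))

# Toy spectrum (assumed example): two vertices joined by one unit-weight edge.
r = 1.0 / math.sqrt(2.0)
evals = [0.0, 2.0]
evecs = [[r, r], [r, -r]]

t = 0.5
spectral = diffusion_distance_sq(evals, evecs, 0, 1, t)
# Direct evaluation: squared kernel difference summed over all vertices
# (unit area elements assumed).
direct = sum((heat_kernel(evals, evecs, 0, z, t)
              - heat_kernel(evals, evecs, 1, z, t)) ** 2 for z in range(2))
```

The spectral and the direct evaluations agree on the toy example, which is a convenient sanity check when truncating the spectrum on a real mesh.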

4.1 Scale invariance

The vertex weighting function (10) and the edge weighting functions (11), (12) and (13) based on the heat kernel are not scale invariant, since a global scaling of the shape by a factor $\gamma > 0$ influences the heat kernel as $\gamma^{-2} h_{\gamma^{-2} t}(x, y)$, scaling by $\gamma^{-2}$ both the time parameter and the kernel itself. A possible remedy is to replace the heat kernel by the scale invariant commute time kernel. However, due to the slow decay of the expansion coefficients $1/\lambda_i$ in (3) compared to $e^{-\lambda_i t}$ in (2), the numerical computation of the commute time kernel is more difficult as it requires many more eigenfunctions of the Laplacian to achieve the same accuracy.
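The effect of global scaling on the two kernels can be traced through the spectral expansion. The short derivation below is a sketch under two assumptions: the eigenfunctions are normalized to unit $L^2$ norm with respect to the area element, and the commute time kernel is taken as the sum of $\phi_i(x)\phi_i(y)/\lambda_i$ over the non-zero eigenvalues. The symbol $\gamma$ for the scaling factor and the primes for quantities on the scaled shape are notation chosen here.

```latex
% Scaling all lengths by \gamma multiplies areas by \gamma^2.
\begin{align*}
\Delta' &= \gamma^{-2}\Delta
  &&\Rightarrow\quad \lambda_i' = \gamma^{-2}\lambda_i, \\
\int_X (\phi_i')^2 \, da' &= \gamma^{2}\int_X (\phi_i')^2 \, da = 1
  &&\Rightarrow\quad \phi_i' = \gamma^{-1}\phi_i, \\
h_t'(x,y) &= \sum_{i \ge 0} e^{-\gamma^{-2}\lambda_i t}\, \gamma^{-2}\phi_i(x)\phi_i(y)
  &&= \gamma^{-2}\, h_{\gamma^{-2} t}(x,y), \\
c'(x,y) &= \sum_{i \ge 1} \frac{\gamma^{2}}{\lambda_i}\, \gamma^{-2}\phi_i(x)\phi_i(y)
  &&= c(x,y).
\end{align*}
```

The factors of $\gamma$ cancel in the commute time kernel, which is why it is scale invariant, whereas the heat kernel changes both in amplitude and in its time argument.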

As an alternative, it is possible to use a sequence of transformations of the heat kernel $h_t$ that renders it scale invariant Bronstein and Kokkinos (2010). First, the heat kernel is sampled logarithmically in time. Next, the logarithm and a derivative with respect to time of the heat kernel values are taken to undo the multiplicative constant. Finally, taking the magnitude of the Fourier transform allows the scaling of the time variable to be undone. This yields the modified heat kernel of the form

$$\hat{h}_\omega(v_1, v_2) = \left| \mathcal{F}\left\{ \frac{\partial \log h_t(v_1, v_2)}{\partial \log t} \right\}(\omega) \right|, \qquad (14)$$

where $\omega$ denotes the frequency variable of the Fourier transform. The transform is computed numerically using the FFT as detailed in Bronstein and Kokkinos (2010). Substituting the modified heat kernel into (11)–(12) yields scale invariant edge weighting functions. (Since the component inclusion relations giving rise to the component tree are invariant to any monotonic transformation of the weighting functions, it is sufficient to undo just the scaling of the time parameter without undoing the scaling of the kernel itself. However, such a transformation affects the scores of the detected regions. We found that the logarithmic transformation and derivative improve repeatability. Furthermore, by completely undoing the effect of scaling, the modified heat kernel can be used both in the weighting function and in descriptors of the maximally stable components as detailed in the following section.) By selecting a single frequency $\omega$, one can construct a scale invariant vertex weight similar to (10). Another way of constructing a scale invariant vertex weight is by integrating over a range of frequencies, e.g., by taking a norm of the modified heat kernel with respect to $\omega$ (15).

Figure 1: Maximally stable regions detected on different shapes from the TOSCA dataset. Note the invariance of the regions to strong non-rigid deformations. Also observe the similarity of the regions detected on the female shape and the upper half of the centaur (compare to the male shape from Figure 2). Regions were detected using as vertex weight function, with
Figure 2: Maximally stable regions detected on shapes from the SHREC’10 dataset using the vertex weight with . First row: different approximate isometries of the human shape. Second row: different transformations (left-to-right): holes, localscale, noise, shotnoise and scale.
Figure 3: Maximally stable regions detected on shapes from the SHREC’10 dataset using the edge weight with . Region coloring is arbitrary.

5 Descriptors

5.1 Point descriptors

Once the regions are detected, their content can be described using any standard point-wise descriptor of the form $p: V \to \mathbb{R}^q$. In particular, here we consider point-wise heat kernel descriptors proposed in Sun et al. (2009). The heat kernel descriptor (or heat kernel signature, HKS) is computed by taking the values of the discrete auto-diffusivity function at vertex $v$ at multiple times, $p(v) = (h_{t_1}(v, v), \dots, h_{t_q}(v, v))$, where $t_1, \dots, t_q$ are some fixed time values. Such a descriptor is a vector of dimensionality $q$ at each point. Since the heat kernel is an intrinsic quantity, the HKS is invariant to isometric transformations of the shape.

A scale-invariant version of the HKS descriptor (SI-HKS) can be obtained as proposed in Bronstein and Kokkinos (2010) by replacing the heat kernel $h_t$ with the modified heat kernel $\hat{h}_\omega$ from (14), yielding $p(v) = (\hat{h}_{\omega_1}(v, v), \dots, \hat{h}_{\omega_q}(v, v))$, where $\omega_1, \dots, \omega_q$ are some fixed frequency values. In the following experiments, the heat kernel was sampled at time values . The first six discrete frequencies of the Fourier transform were taken, repeating the settings of Bronstein and Kokkinos (2010).

5.2 Region descriptors

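Given a descriptor $p(v)$ at each vertex $v \in V$, the simplest way to define a region descriptor of a component $C$ is by computing the average of $p$ in $C$,

$$\rho(C) = \frac{1}{A(C)} \sum_{v \in C} a(v)\, p(v), \qquad (16)$$

where $a(v)$ are the discrete area elements and $A(C)$ is the area of the component. The resulting region descriptor $\rho(C)$ is a vector of the same dimensionality as the point descriptor $p$.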

An alternative construction considered here follows Ovsjanikov et al. (2009), where global shape descriptors were obtained from point-wise descriptors using the bag of features paradigm Sivic and Zisserman (2003). In this approach, a fixed “geometric vocabulary” $p_1, \dots, p_m$ is computed by means of an off-line clustering of the descriptor space. Next, each point descriptor $p(v)$ at $v$ is represented in the vocabulary using vector quantization, yielding a point-wise $m$-dimensional distribution of the form

$$\theta_l(v) \propto e^{-\| p(v) - p_l \|^2 / 2\sigma^2}, \quad l = 1, \dots, m. \qquad (17)$$

The distribution is normalized in such a way that the elements of $\theta(v)$ sum to one. In the case of $\sigma = 0$, hard vector quantization is used, and $\theta_l(v) = 1$ for $p_l$ being the closest element of the geometric vocabulary to $p(v)$ in the descriptor space, and zero elsewhere. Given a component $C$, we can define a local bag of features by computing the distribution of geometric words over the region,

$$\nu(C) = \sum_{v \in C} a(v)\, \theta(v). \qquad (18)$$

Such a bag of features is used as a region descriptor of dimensionality $m$.
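The two region descriptors can be sketched as follows. The sketch makes several assumptions that the text does not fix: the average is weighted by vertex area elements, only hard vector quantization is shown, the bag of features is normalized to sum to one, and all names and toy values are hypothetical.

```python
def average_descriptor(region, desc, area):
    """Area-weighted mean of the point descriptors over a region."""
    total = sum(area[v] for v in region)
    dim = len(desc[region[0]])
    return [sum(area[v] * desc[v][k] for v in region) / total
            for k in range(dim)]

def hard_bag_of_features(region, desc, area, vocab):
    """Area-weighted histogram of nearest vocabulary words over a region."""
    hist = [0.0] * len(vocab)
    for v in region:
        d2 = [sum((p - q) ** 2 for p, q in zip(desc[v], word)) for word in vocab]
        hist[d2.index(min(d2))] += area[v]      # hard vector quantization
    total = sum(hist)
    return [h / total for h in hist]            # normalization is an assumption

# Toy data: three vertices with 2-D point descriptors and a two-word vocabulary.
desc = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.1]]
area = [1.0, 1.0, 2.0]
vocab = [[0.0, 0.0], [1.0, 1.0]]
region = [0, 1, 2]

avg = average_descriptor(region, desc, area)
bof = hard_bag_of_features(region, desc, area, vocab)
```

The averaged descriptor keeps the dimensionality of the point descriptor, while the bag of features has one entry per vocabulary word.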

6 Results

6.1 Dataset

The proposed approach was tested on the data of the SHREC’10 feature detection and description benchmark Bronstein et al. (2010). The SHREC dataset consisted of three shape classes, with simulated transformations applied to them. Shapes are represented as triangular meshes with approximately 10,000 to 50,000 vertices. In our experiments, all meshes were downsampled to at most 10,000 vertices. Each shape class contained nine categories of transformations: isometry (non-rigid almost inelastic deformations), topology (welding of shape vertices resulting in different triangulation), micro holes and big holes simulating missing data and occlusions, global and local scaling, additive Gaussian noise, shot noise, and downsampling (less than 20% of the original points). Each transformation appeared in five different strengths. Vertex-wise correspondence between the transformed and the null shapes was given and used as the ground truth in the evaluation of region detection repeatability. Since all shapes exhibit intrinsic bilateral symmetry, best results over the groundtruth correspondence and its symmetric counterpart were used.

We also used several deformable shapes from the TOSCA dataset Bronstein et al. (2008) for a qualitative evaluation.

6.2 Detector repeatability

Figure 4: Distributions of maximally stable components as a function of the overlap to the groundtruth regions and instability score. Left-to-right top-to-bottom are shown the following weighting function: vertex weight at , vertex weight , edge weight (diffusion-distance) at , edge weight at , vertex weight at and edge weight at . Good detectors are characterized by a large number of high-overlap stable regions (many regions in the upper left corner of the plot) that can be easily separated from the low-overlap regions that should be concentrated in the lower right corner.

The evaluation of the proposed feature detector and descriptor followed the spirit of the influential work by Mikolajczyk et al. (2005). In the first experiment, the repeatability of the detector was evaluated. Let $X$ and $Y$ be the null and the transformed version of the same shape, respectively. Let $X_i$ and $Y_j$ denote the regions detected in $X$ and $Y$, and let $X'_j$ be the image of the region $Y_j$ in $X$ under the ground-truth correspondence. (As some of the transformed shapes had missing data compared to the null shape, comparison was defined single-sidedly. Only regions in the transformed shape that had no corresponding regions in the null counterpart decreased the overlap score, while unmatched regions of the null shape did not.) Given two regions $X_i$ and $X'_j$, their overlap is defined as the following area ratio

$$O(X_i, X'_j) = \frac{A(X_i \cap X'_j)}{A(X_i \cup X'_j)}. \qquad (19)$$

The repeatability at overlap $o$ is defined as the percentage of regions in $Y$ that have corresponding counterparts in $X$ with overlap greater than $o$ Mikolajczyk et al. (2005). An ideal detector has the repeatability of 100%.
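The overlap and repeatability measures can be sketched for regions stored as vertex sets. The function names, the dictionary `corr` mapping transformed-shape vertices to null-shape vertices, and the toy regions are assumptions for illustration; areas are taken on the null shape, and the repeatability is returned as a fraction rather than a percentage.

```python
def region_area(region, area):
    """Total area of a set of vertices."""
    return sum(area[v] for v in region)

def overlap(r1, r2, area):
    """Area of the intersection divided by the area of the union."""
    r1, r2 = set(r1), set(r2)
    union = region_area(r1 | r2, area)
    return region_area(r1 & r2, area) / union if union > 0 else 0.0

def repeatability(transformed_regions, null_regions, corr, area, min_overlap):
    """Fraction of transformed-shape regions matched on the null shape."""
    if not transformed_regions:
        return 0.0
    hits = 0
    for region in transformed_regions:
        # Image of the region on the null shape under the correspondence.
        image = {corr[v] for v in region if v in corr}
        if any(overlap(image, r, area) > min_overlap for r in null_regions):
            hits += 1
    return hits / len(transformed_regions)

# Toy example with six vertices of unit area and identity correspondence.
area = [1.0] * 6
corr = {v: v for v in range(6)}
null_regions = [{0, 1, 2}, {3, 4, 5}]
transformed_regions = [{0, 1, 2, 3}, {5}]
rep = repeatability(transformed_regions, null_regions, corr, area, min_overlap=0.7)
```

Only regions of the transformed shape are counted, which mirrors the single-sided comparison used when the transformed shape has missing data.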

Four vertex weight functions were compared: discrete heat kernel (10) with , commute time kernel (3), modified heat kernel with , and the norm of the modified heat kernel (15). These four scalar fields were also used to construct edge weights according to . Furthermore, since these kernels are functions of a pair of vertices, they were used to define edge weights according to (12). In addition, we also tested edge weights constructed according to (11) and the diffusion distance (13). Unless mentioned otherwise, was used for the heat kernel and for the modified heat kernel, as these settings turned out to give best performance on the SHREC’10 dataset.

Figure 5: The set of maximally stable regions extracted from one of the shapes in Figure 2.

We first evaluated different region detectors qualitatively using shapes from the SHREC’10 and the TOSCA datasets. Figure 1 shows the regions detected using the vertex weight with on a few sample shapes from the TOSCA dataset. Figures 2 and 5 depict the maximally stable components detected with the same settings on several shapes from the SHREC dataset. Figure 3 shows the regions obtained using the edge weighting function . In all cases, the detected regions appear robust and repeatable under the transformations. Surprisingly, many of these regions have a clear semantic interpretation. Moreover, similarly looking regions are detected on the male and female shapes, and the upper half of the centaur. This makes the proposed feature detector a good candidate for partial shape matching and retrieval.

In order to select an optimal cutoff threshold of the instability function (i.e., the maximum region instability value that is still accepted by the detector), we estimated the empirical distributions of the detected regions as a function of the instability score and their overlap with the corresponding groundtruth regions. These histograms are depicted in Figure 4. A good detector should produce many high-overlap regions that have low instability, and as few low-overlap regions as possible; the latter should have very high instability so that they can be separated from the high-overlap regions by means of a threshold. In each of the tested detectors, the instability score threshold was selected to maximize the detection of high-overlap regions.

Table 1 summarizes the repeatability of different weighting functions at overlap of . Figures 6 and 7 depict the repeatability and the number of correctly matching regions as a function of the overlap for the best four of the compared weighting functions. We conclude that scale-dependent weightings generally outperform their scale-invariant counterparts in terms of repeatability. The four scalar fields corresponding to different auto-diffusivity functions perform well both when used as vertex and edge weights. Best repeatability is achieved by the edge weighting function . The best scale invariant weighting is also the edge weight .

Table 1: Repeatability of maximally stable components with different vertex and edge weighting functions. Columns: weighting function; maximal instability score used; average number of detected regions; number of correspondences at the chosen overlap; repeatability at the chosen overlap.

Figure 6: Repeatability of maximally stable components with the vertex weight (first row) and edge weight (second row), .

Figure 7: Repeatability of maximally stable components with the edge weight (first row) and edge weight (second row), .

6.3 Descriptor discriminativity

Figure 8: ROC curves of different regions descriptors (“vs” stands for vocabulary size). The following detectors were used (left-to-right, top-to-bottom): vertex weight , edge weight , edge weight , and edge weight .
Figure 9: Performance of region descriptors with regions detected using the vertex weight , . Shown are the HKS descriptor (first row) and SI-HKS descriptor (second row).
Figure 10: Performance of region descriptors with regions detected using the edge weight , . Shown are the HKS descriptor (first row) and SI-HKS descriptor (second row).
Figure 11: Performance of region descriptors with regions detected using the edge weight , . Shown are the HKS descriptor (first row) and SI-HKS descriptor (second row).
Figure 12: Performance of region descriptors with regions detected using the edge weight . Shown are the HKS descriptor (first row) and SI-HKS descriptor (second row).

In the second experiment, the discriminativity of region descriptors was evaluated by measuring the relation between distance in the descriptor space and the overlap between the corresponding regions.

Using the notation from the previous section, let $Y_j$ be one of the maximally stable components detected on a transformed shape $Y$, $X'_j$ its image on the null shape $X$ under the ground truth correspondence, and let $X_i$ denote one of the maximally stable components detected on the null shape. A groundtruth relation between the regions is established by fixing a minimum overlap $o$ and deeming $X_i$ and $Y_j$ matching if the overlap satisfies $O(X_i, X'_j) \ge o$. Let us now be given a region descriptor $\rho$; for simplicity we assume the distance between the descriptors to be the standard Euclidean distance. By setting a threshold $\tau$ on this distance, $X_i$ and $Y_j$ will be classified as positives if $\| \rho(X_i) - \rho(Y_j) \| \le \tau$. We define the true positive rate as the ratio

$$TPR = \frac{\left| \{ (i,j) : \| \rho(X_i) - \rho(Y_j) \| \le \tau,\ O(X_i, X'_j) \ge o \} \right|}{\left| \{ (i,j) : O(X_i, X'_j) \ge o \} \right|};$$

similarly, the false positive rate is defined as

$$FPR = \frac{\left| \{ (i,j) : \| \rho(X_i) - \rho(Y_j) \| \le \tau,\ O(X_i, X'_j) < o \} \right|}{\left| \{ (i,j) : O(X_i, X'_j) < o \} \right|}.$$

A related quantity is the false negative rate defined as $FNR = 1 - TPR$. By varying the threshold $\tau$, a set of pairs $(FPR, TPR)$ referred to as the receiver operator characteristic (ROC) curve is obtained. The particular point on the ROC curve for which the false positive and false negative rates coincide is called equal error rate (EER). We use EER as a scalar measure for the descriptor discriminativity. Ideal descriptors have $EER = 0$.
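The ROC curve and the equal error rate can be sketched from a list of descriptor distances and groundtruth match labels. This is an illustrative sketch rather than the evaluation code of the benchmark: the function names are hypothetical, both matching and non-matching pairs are assumed to be present, tied distances are swept one at a time, and the EER is approximated by the sampled ROC point where the two error rates are closest.

```python
def roc_points(distances, is_match):
    """Sweep the distance threshold; return (FPR, TPR) pairs in sweep order."""
    pos = sum(1 for m in is_match if m)
    neg = len(is_match) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, match in sorted(zip(distances, is_match)):
        if match:
            tp += 1      # a matching pair falls below the threshold
        else:
            fp += 1      # a non-matching pair falls below the threshold
        points.append((fp / neg, tp / pos))
    return points

def equal_error_rate(points):
    """Approximate EER: ROC point where FPR and FNR = 1 - TPR are closest."""
    fpr, tpr = min(points, key=lambda p: abs(p[0] - (1.0 - p[1])))
    return 0.5 * (fpr + (1.0 - tpr))

# Toy data: descriptor distances of four region pairs and their groundtruth labels.
distances = [0.1, 0.2, 0.3, 0.4]
separable = equal_error_rate(roc_points(distances, [True, True, False, False]))
mixed = equal_error_rate(roc_points(distances, [True, False, True, False]))
```

When the matching pairs all have smaller distances than the non-matching ones, the approximate EER is zero; interleaved labels give a higher value.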

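Another descriptor performance criterion used here considers the first matches produced by the descriptor distance. For that purpose, for each $Y_j$ we define its first match as the $X_{i^*(j)}$ with $i^*(j) = \arg\min_i \| \rho(X_i) - \rho(Y_j) \|$ (nearest neighbor of $Y_j$ in the descriptor space). The matching score is defined as the ratio of correct first matches for a given overlap $o$,

$$\mathrm{score}(o) = \frac{\left| \{ j : O(X_{i^*(j)}, X'_j) \ge o \} \right|}{\left| \{ Y_j \} \right|}.$$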
The following four weighting functions exhibiting the best repeatability scores in the previous experiment were used to define region detectors: the edge weight with (absolute winner in terms of repeatability), the vertex weight (second-best repeatability), its edge-weight counterpart (gives lower repeatability scores but supplies almost twice as many correspondences), and the edge weight (best scale-invariant detector). Given the maximally stable components detected by a selected detector, region descriptors were calculated. We used two types of point descriptors: the heat kernel signature sampled at six time values , and its scale-invariant version , for which we have taken the first six discrete frequencies of the Fourier transform (these settings are identical to those of Bronstein and Kokkinos (2010)). These point descriptors were used to create region descriptors using averaging () and local bags of features (). Bags of features were tested with vocabulary sizes and . Table 2 summarizes the performance in terms of EER of different combinations of weighting functions and region descriptors. Figure 8 depicts the ROC curves of different descriptors of vertex- and edge-weighted maximally stable component detectors.
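The two ways of turning point descriptors into a region descriptor can be sketched as follows. This is a simplified illustration under stated assumptions: vertices are averaged without area weighting, the bag of features uses hard assignment to the nearest vocabulary word, and the vocabulary itself is assumed to be given (for instance, cluster centers computed beforehand). The function names are hypothetical.

```python
import numpy as np

def average_descriptor(point_desc, region):
    """Region descriptor as the plain mean of the point descriptors.

    point_desc -- (num_vertices, d) point descriptors (e.g. HKS values)
    region     -- indices of the vertices belonging to the region
    Assumption: vertices are weighted equally (no area weighting).
    """
    return np.asarray(point_desc, dtype=float)[region].mean(axis=0)

def bof_descriptor(point_desc, region, vocabulary):
    """Local bag of features: normalized histogram of nearest words.

    vocabulary -- (v, d) array of representative point descriptors
    Assumption: hard vector quantization, one vote per vertex.
    """
    points = np.asarray(point_desc, dtype=float)[region]
    vocabulary = np.asarray(vocabulary, dtype=float)

    # Distance from every vertex descriptor to every vocabulary word.
    dist = np.linalg.norm(points[:, None, :] - vocabulary[None, :, :], axis=2)
    words = dist.argmin(axis=1)

    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()
```

Either function returns a fixed-length vector per region, so regions of different sizes can be compared with the Euclidean distance used in the evaluation.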

Figures 9–12 show the number of correct first matches and the matching score as a function of the overlap for different choices of weighting functions and descriptors. Examples of matching regions are depicted in Figure 13.

We conclude that the scale-invariant HKS (SI-HKS) descriptor consistently exhibits higher performance in both the average and bag of features flavors, and that the two flavors perform approximately the same. The HKS descriptor, on the other hand, performs better in the bag of features setting, though it never reaches the scores of SI-HKS. Surprisingly, as can be seen from Figures 9–12, the SI-HKS descriptor is consistently more discriminative even under transformations that do not include scaling.

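| Weighting function | HKS Average | HKS BoF() | HKS BoF() | SI-HKS Average | SI-HKS BoF() | SI-HKS BoF() |
|---|---|---|---|---|---|---|
| | 0.311 | 0.273 | 0.278 | 0.093 | 0.091 | 0.086 |
| | 0.304 | 0.275 | 0.281 | 0.104 | 0.093 | 0.090 |
| | 0.213 | 0.212 | 0.222 | 0.085 | 0.091 | 0.094 |
| | 0.260 | 0.284 | 0.294 | 0.147 | 0.157 | 0.148 |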
Table 2: Equal error rate (EER) performance of different maximally stable component detectors and descriptors ( was used in all cases). denotes the vocabulary size in the bag of features region descriptors.
Figure 13: Examples of closest matches found for different query regions on the TOSCA dataset. Shown from left to right are: query, 1st, 2nd, 4th, 10th, and 14th matches. Vertex weight with was used as the detector; average SI-HKS was used as the descriptor.

7 Conclusions

We presented a generic framework for the detection of stable regions in non-rigid shapes. Our approach is based on the maximization of a stability criterion in a component tree representation of the shape with vertex or edge weights. Using diffusion-geometric weighting functions yields a feature detection algorithm that is invariant to a wide class of shape transformations, in particular non-rigid bending and global scaling, which makes our approach applicable in the challenging setting of deformable shape analysis. In follow-up studies, we are going to explore the uses of the proposed feature detectors and descriptors in shape matching and retrieval problems.


References

  • Bronstein et al. (2008) Bronstein, A., Bronstein, M., Kimmel, R., 2008. Numerical geometry of non-rigid shapes. Springer.
  • Bronstein et al. (2010) Bronstein, A., Bronstein, M., Bustos, B., Castellani, U., Cristani, M., Falcidieno, B., Guibas, L., Kokkinos, I., Murino, V., Ovsjanikov, M., et al., 2010. SHREC 2010: robust feature detection and description benchmark.
  • Bronstein and Kokkinos (2010) Bronstein, M. M., Kokkinos, I., 2010. Scale-invariant heat kernel signatures for non-rigid shape recognition. In: Proc. CVPR.
  • Chazal et al. (2009) Chazal, F., Guibas, L., Oudot, S., Skraba, P., 2009. Persistence-based clustering in Riemannian manifolds.
  • Coifman and Lafon (2006) Coifman, R., Lafon, S., 2006. Diffusion maps. Applied and Computational Harmonic Analysis 21 (1), 5–30.
  • Couprie and Bertrand (1997) Couprie, M., Bertrand, G., 1997. Topological grayscale watershed transformation. In: SPIE Vision Geometry V Proceedings. Vol. 3168. pp. 136–146.
  • Digne et al. (2010) Digne, J., Morel, J., Audfray, N., Mehdi-Souzani, C., 2010. The Level Set Tree on Meshes.
  • Edelsbrunner et al. (2002) Edelsbrunner, H., Letscher, D., Zomorodian, A., 2002. Topological persistence and simplification. Discrete and Computational Geometry 28 (4), 511–533.
  • Floater and Hormann (2005) Floater, M. S., Hormann, K., 2005. Surface parameterization: a tutorial and survey. Advances in Multiresolution for Geometric Modelling 1.
  • Gelfand et al. (2005) Gelfand, N., Mitra, N. J., Guibas, L. J., Pottmann, H., 2005. Robust global registration. In: Proc. SGP.
  • Huang et al. (2006) Huang, Q., Flöry, S., Gelfand, N., Hofer, M., Pottmann, H., 2006. Reassembling fractured objects by geometric matching. ACM Trans. Graphics 25 (3), 569–578.
  • Johnson and Hebert (1999) Johnson, A. E., Hebert, M., 1999. Using spin images for efficient object recognition in cluttered 3D scenes. Trans. PAMI 21 (5), 433–449.
  • Kimmel et al. (2010) Kimmel, R., Zhang, C., Bronstein, A. M., Bronstein, M. M., 2010. Are MSER features really interesting? IEEE Trans. PAMI.
  • Kolomenkin et al. (2009) Kolomenkin, M., Shimshoni, I., Tal, A., 2009. On edge detection on surfaces. In: Proc. CVPR.
  • Lévy (2006) Lévy, B., 2006. Laplace-Beltrami eigenfunctions: towards an algorithm that “understands” geometry. In: Proc. Shape Modeling and Applications.
  • Lowe (2004) Lowe, D., 2004. Distinctive image features from scale-invariant keypoints. IJCV 60 (2), 91–110.
  • Manay et al. (2004) Manay, S., Hong, B., Yezzi, A., Soatto, S., 2004. Integral invariant signatures. Lecture Notes in Computer Science, 87–99.
  • Matas et al. (2004) Matas, J., Chum, O., Urban, M., Pajdla, T., 2004. Robust wide-baseline stereo from maximally stable extremal regions. Image and Vision Computing 22 (10), 761–767.
  • Meyer et al. (2003) Meyer, M., Desbrun, M., Schröder, P., Barr, A. H., 2003. Discrete differential-geometry operators for triangulated 2-manifolds. Visualization and Mathematics III, 35–57.
  • Mikolajczyk et al. (2005) Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Van Gool, L., 2005. A comparison of affine region detectors. IJCV 65 (1), 43–72.
  • Mitra et al. (2006) Mitra, N. J., Guibas, L. J., Giesen, J., Pauly, M., 2006. Probabilistic fingerprints for shapes. In: Proc. SGP.
  • Mortara et al. (2003) Mortara, M., Patane, G., Spagnuolo, M., Falcidieno, B., Rossignac, J., 2003. Blowing bubbles for multi-scale analysis and decomposition of triangle meshes. Algorithmica 38 (1), 227–248.
  • Najman and Couprie (2006) Najman, L., Couprie, M., 2006. Building the component tree in quasi-linear time. IEEE Trans. Image Proc. 15 (11), 3531–3539.
  • Ovsjanikov et al. (2009) Ovsjanikov, M., Bronstein, A., Guibas, L., Bronstein, M., 2009. Shape Google: a computer vision approach to invariant shape retrieval. In: Proc. NORDIA.
  • Pinkall and Polthier (1993) Pinkall, U., Polthier, K., 1993. Computing discrete minimal surfaces and their conjugates. Experimental mathematics 2 (1), 15–36.
  • Reuter et al. (2005) Reuter, M., Wolter, F.-E., Peinecke, N., 2005. Laplace-spectra as fingerprints for shape matching. In: Proc. ACM Symp. Solid and Physical Modeling. pp. 101–106.
  • Sipiran and Bustos (2010) Sipiran, I., Bustos, B., 2010. A robust 3D interest points detector based on Harris operator. In: Proc. 3DOR. Eurographics, pp. 7–14.
  • Sivic and Zisserman (2003) Sivic, J., Zisserman, A., 2003. Video Google: a text retrieval approach to object matching in videos. In: Proc. ICCV.
  • Skraba et al. (2010) Skraba, P., Ovsjanikov, M., Chazal, F., Guibas, L., 2010. Persistence-based segmentation of deformable shapes. In: Proc. NORDIA. pp. 45–52.
  • Sun et al. (2009) Sun, J., Ovsjanikov, M., Guibas, L., 2009. A Concise and Provably Informative Multi-Scale Signature Based on Heat Diffusion. In: Computer Graphics Forum. Vol. 28. pp. 1383–1392.
  • Thorstensen and Keriven (2009) Thorstensen, N., Keriven, R., 2009. Non-rigid shape matching using geometry and photometry. In: Proc. CVPR.
  • Toldo et al. (2009) Toldo, R., Castellani, U., Fusiello, A., 2009. Visual vocabulary signature for 3D object retrieval and partial matching. In: Proc. 3DOR.
  • Vincent and Soille (2002) Vincent, L., Soille, P., 2002. Watersheds in digital spaces: an efficient algorithm based on immersion simulations. IEEE Trans. PAMI 13 (6), 583–598.
  • Wardetzky et al. (2008) Wardetzky, M., Mathur, S., Kälberer, F., Grinspun, E., 2008. Discrete Laplace operators: no free lunch. In: Conf. Computer Graphics and Interactive Techniques.
  • Zaharescu et al. (2009) Zaharescu, A., Boyer, E., Varanasi, K., Horaud, R., 2009. Surface feature detection and description with applications to mesh matching. In: Proc. CVPR.