1 Introduction
In the last decade, the amount of geometric data available in the public domain, such as Google 3D Warehouse, has grown dramatically and created the demand for shape search and retrieval algorithms capable of finding similar shapes in the same way a search engine responds to text queries. However, while text search methods are sufficiently developed to be ubiquitously used, the search and retrieval of 3D shapes remains a challenging problem. Shape retrieval based on text metadata, such as annotations and tags added by users, is often incapable of providing the level of relevance required for a reasonable user experience (see Figure 1).
Content-based shape retrieval, which uses the shape itself as a query and compares geometric and topological properties of shapes, is complicated by the fact that many 3D objects manifest rich variability, and shape retrieval must often be invariant under different classes of transformations. A particularly challenging setting is the case of non-rigid shapes, which undergo a wide range of transformations such as bending and articulated motion, rotation and translation, scaling, non-rigid deformation, and topological changes. The main challenge in shape retrieval algorithms is computing a shape descriptor that is unique for each shape, simple to compute and store, and invariant under different types of transformations. Shape similarity is then determined by comparing the shape descriptors.
Prior works. Broadly, shape descriptors can be divided into global and local. The former consider global geometric or topological shape characteristics such as distance distributions [21, 24, 19], geometric moments [14, 30], or spectra [23], whereas the latter describe the local behavior of the shape in a small patch. Popular examples of local descriptors include spin images [3], shape contexts [1], integral volume descriptors [12] and radius-normal histograms [22]. Using the bag of features paradigm common in image analysis [25, 10], a global shape descriptor counting the occurrence of local descriptors in some vocabulary can be computed [7].
Recently, there has been an increased interest in the use of diffusion geometry [11, 16] for constructing invariant shape descriptors. Diffusion geometry is closely related to heat propagation properties of shapes and allows obtaining global descriptors, such as distance distributions [24, 19, 8] and Laplace-Beltrami spectral signatures [23], as well as local descriptors such as heat kernel signatures [26, 9]. In particular, heat kernel signatures [26] showed very promising results in large-scale shape retrieval applications [7].
One limitation of these methods is that, so far, only geometric information has been considered. However, the abundance of textured models in computer graphics and modeling applications, as well as advances in 3D shape acquisition [35, 36] that make it possible to obtain textured 3D shapes of even moving objects, bring forth the need for descriptors that also take photometric information into consideration. Photometric information plays an important role in a variety of shape analysis applications, such as shape matching and correspondence [28, 33]. Considering 2D views of the 3D shape [32, 20], standard feature detectors and descriptors used in image analysis such as SIFT [18] can be employed. More recently, Zaharescu et al. [37] proposed a geometric SIFT-like descriptor for textured shapes, defined directly on the surface.
Main contribution. In this paper, we extend the diffusion geometry framework to include photometric information in addition to its geometric counterpart. This way, we incorporate important photometric properties on one hand, while exploiting a principled and theoretically established approach on the other. The main idea is to define a diffusion process that takes into consideration not only the geometry but also the texture of the shape. This is achieved by considering the shape as a manifold in a higher-dimensional combined geometric-photometric embedding space, similarly to methods in image processing applications [15, 17]. As a result, we are able to construct local descriptors (heat kernel signatures) and global descriptors (diffusion distance distributions). The proposed data fusion can be useful in coping with different challenges of shape analysis where pure geometric and pure photometric methods fail.
2 Background
Throughout the paper, we assume the shape to be modeled as a two-dimensional compact Riemannian manifold $X$ (possibly with a boundary) equipped with a metric tensor $g$. Fixing a system of local coordinates on $X$, the latter can be expressed as a $2 \times 2$ matrix $g_{\mu\nu}$, also known as the first fundamental form. The metric tensor makes it possible to express the squared length of a vector $v$ in the tangent space $T_x X$ at a point $x$ as $g_{\mu\nu} v^\mu v^\nu$, where repeated indices $\mu, \nu = 1, 2$ are summed over following Einstein's convention.
Given a smooth scalar field $f : X \to \mathbb{R}$ on the manifold, its gradient is defined as the vector field $\nabla f$ satisfying $f(x + dr) = f(x) + g_x(\nabla f(x), dr)$ for every point $x$ and every infinitesimal tangent vector $dr \in T_x X$. The metric tensor $g$ defines the Laplace-Beltrami operator $\Delta_g$ that satisfies
$$\int_X f_1 \, \Delta_g f_2 \, da = \int_X g_x(\nabla f_1, \nabla f_2) \, da \tag{1}$$
for any pair of smooth scalar fields $f_1, f_2 : X \to \mathbb{R}$; here $da$ denotes integration with respect to the standard area measure on $X$. Such an integral definition is usually known as the Stokes identity. The Laplace-Beltrami operator is positive semi-definite and self-adjoint. Furthermore, it is an intrinsic property of $X$, i.e., it is expressible solely in terms of $g$. In the case when the metric $g$ is Euclidean, $\Delta_g$ becomes the standard Laplacian.
The Laplace-Beltrami operator gives rise to the heat equation,
$$\left( \Delta_g + \frac{\partial}{\partial t} \right) u = 0, \tag{2}$$
which describes diffusion processes and heat propagation on the manifold. Here, $u(x, t)$ denotes the distribution of heat at time $t$ at point $x$. The initial condition to the equation is some heat distribution $u(x, 0)$, and if the manifold has a boundary, appropriate boundary conditions (e.g. Neumann or Dirichlet) must be specified. The solution of (2) with a point initial heat distribution concentrated at $y$ is called the heat kernel and denoted here by $k_t(x, y)$. Using a signal processing analogy, $k_t$ can be thought of as the "impulse response" of the heat equation.
By the spectral decomposition theorem, the heat kernel can be represented as [13]
$$k_t(x, y) = \sum_{i \ge 0} e^{-\lambda_i t} \phi_i(x) \phi_i(y), \tag{3}$$
where $0 \le \lambda_0 \le \lambda_1 \le \dots$ are the eigenvalues and $\phi_0, \phi_1, \dots$ the corresponding eigenfunctions of the Laplace-Beltrami operator (i.e., solutions to $\Delta_g \phi_i = \lambda_i \phi_i$). The value of the heat kernel $k_t(x, y)$ can be interpreted as the transition probability density of a random walk of length $t$ from the point $x$ to the point $y$. This makes it possible to construct a family of intrinsic metrics known as diffusion metrics,
$$d_t^2(x, y) = \int_X \left( k_t(x, z) - k_t(y, z) \right)^2 da(z) = \sum_{i \ge 0} e^{-2 \lambda_i t} \left( \phi_i(x) - \phi_i(y) \right)^2. \tag{4}$$
(Note that here the term metric is understood in the sense of metric geometry rather than the Riemannian inner product. To avoid confusion, we refer to the latter as the metric tensor throughout the paper.)
These metrics have an inherent multi-scale structure and measure the "connectivity rate" of the two points by paths of length $t$. We will collectively refer to quantities expressed in terms of the heat kernel or diffusion metrics as diffusion geometry. Since the Laplace-Beltrami operator is intrinsic, the diffusion geometry it induces is invariant under isometric deformations of $X$ (incongruent embeddings of $g$ into $\mathbb{R}^3$).
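The spectral forms of the heat kernel (3) and the diffusion metric (4) can be illustrated numerically. The following is a minimal sketch, assuming the eigenpairs of some discrete Laplacian are already available; the path-graph Laplacian, the diffusion time `t`, and all function names are hypothetical stand-ins chosen for illustration, not part of the original method.

```python
import numpy as np

def heat_kernel(evals, evecs, t):
    """Heat kernel matrix k_t(x, y) = sum_i exp(-lambda_i t) phi_i(x) phi_i(y), as in (3)."""
    return (evecs * np.exp(-evals * t)) @ evecs.T

def diffusion_distance_sq(evals, evecs, t):
    """Squared diffusion distances d_t^2(x, y), using the spectral form of (4)."""
    # Scale column i by exp(-lambda_i t); d_t^2 is then the squared
    # Euclidean distance between rows of this spectral embedding.
    emb = evecs * np.exp(-evals * t)
    diff = emb[:, None, :] - emb[None, :, :]
    return (diff ** 2).sum(axis=-1)

# Toy stand-in for a discretized shape: Laplacian of a 6-vertex path graph.
n = 6
lap = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
lap[0, 0] = lap[-1, -1] = 1.0
evals, evecs = np.linalg.eigh(lap)

t = 0.7  # hypothetical diffusion time
K_t = heat_kernel(evals, evecs, t)
D2 = diffusion_distance_sq(evals, evecs, t)
```

With the full eigenbasis and unit area weights, `D2` coincides with the squared Euclidean distance between rows of `K_t`, which is the discrete counterpart of the integral form in (4).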
3 Fusion of geometric and photometric data
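Let us further assume that the Riemannian manifold $X$ is a submanifold of some manifold $\mathcal{E}$ ($\dim(\mathcal{E}) = m > 2$) with the Riemannian metric tensor $h$, embedded by means of a diffeomorphism $\xi : X \to \xi(X) \subseteq \mathcal{E}$. A Riemannian metric tensor on $X$ induced by the embedding is the pullback metric $(\xi^* h)(r, s) = h(d\xi(r), d\xi(s))$ for $r, s \in T_x X$, where $d\xi : T_x X \to T_{\xi(x)} \mathcal{E}$ is the differential of $\xi$. In coordinate notation, the pullback metric is expressed as $(\xi^* h)_{\mu\nu} = h_{ij} \, \partial_\mu \xi^i \, \partial_\nu \xi^j$, where the indices $i, j = 1, \dots, m$ denote the embedding coordinates.
Here, we use the structure of $\mathcal{E}$ to model joint geometric and photometric information. Such an approach has been successfully used in image processing [15]. When considering shapes as geometric objects only, we define $\mathcal{E} = \mathbb{R}^3$ and $h$ to be the Euclidean metric. In this case, $\xi$ acts as a parametrization of $X$ and the pullback metric becomes simply $(\xi^* h)_{\mu\nu} = \langle \partial_\mu \xi, \partial_\nu \xi \rangle_{\mathbb{R}^3}$. In the case considered in this paper, the shape is endowed with photometric information given in the form of a field $\alpha : X \to \mathcal{C}$, where $\mathcal{C}$ denotes some colorspace (e.g., RGB or Lab). This photometric information can be modeled by defining $\mathcal{E} = \mathbb{R}^3 \times \mathcal{C}$ and an embedding $\xi = (\xi_g, \xi_p)$. The embedding coordinates corresponding to geometric information, $\xi_g = (\xi^1, \xi^2, \xi^3)$, are as previously, and the embedding coordinates corresponding to photometric information are given by $\xi_p = (\xi^4, \xi^5, \xi^6) = \eta (\alpha^1, \alpha^2, \alpha^3)$, where $\eta \ge 0$ is a scaling constant. Simplifying further, we assume $\mathcal{C}$ to have a Euclidean structure (for example, the Lab colorspace has a natural Euclidean metric). The metric in this case boils down to $\hat{g}_{\mu\nu} = \langle \partial_\mu \xi_g, \partial_\nu \xi_g \rangle_{\mathbb{R}^3} + \eta^2 \langle \partial_\mu \alpha, \partial_\nu \alpha \rangle_{\mathbb{R}^3}$, which hereinafter we shall denote by $\hat{g}$.
The Laplace-Beltrami operator $\Delta_{\hat{g}}$ associated with such a metric gives rise to diffusion geometry that combines photometric and geometric information (Figure 2).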
Invariance. It is important to mention that the joint metric tensor $\hat{g}$ and the diffusion geometry it induces have inherent ambiguities. Let us denote by $\mathrm{Iso}_g$ and $\mathrm{Iso}_p$ the respective groups of transformations that leave the geometric and the photometric components of the shape unchanged. We will refer to such transformations as geometric and photometric isometries. The diffusion metric induced by $\hat{g}$ is invariant under the joint isometry group $\mathrm{Iso}_{\hat{g}}$. Ideally, we would like $\mathrm{Iso}_{\hat{g}} = \mathrm{Iso}_g \times \mathrm{Iso}_p$ to hold. In practice, $\mathrm{Iso}_{\hat{g}}$ is bigger: while every composition of a geometric isometry with a photometric isometry is a joint isometry, there exist some joint isometries which cannot be obtained as a composition of geometric and photometric isometries. An example of such transformations is uniform scaling of $\xi_g$ combined with compensating scaling of $\xi_p$. Experimental results show that no realistic geometric and photometric transformations lie in $\mathrm{Iso}_{\hat{g}} \setminus (\mathrm{Iso}_g \times \mathrm{Iso}_p)$; however, a formal characterization of the isometry group is an important theoretical question for future research.
4 Numerical implementation
Let $\{x_1, \dots, x_N\} \subseteq X$ denote the discrete samples of the shape, and $\xi(x_1), \dots, \xi(x_N)$ be the corresponding embedding coordinates (three-dimensional in the case we consider only geometry, or six-dimensional in the case of geometry-photometry fusion). We further assume that a triangulation (simplicial complex) is given, consisting of edges $(i, j)$ and faces $(i, j, k)$, where each of $(i, j)$, $(j, k)$, and $(i, k)$ is an edge (here $i, j, k = 1, \dots, N$).
Discrete Laplacian. A function $f$ on the discretized manifold is represented as an $N$-dimensional vector $f = (f(x_1), \dots, f(x_N))^T$. The discrete Laplace-Beltrami operator can be written in the generic form
$$(\Delta f)_i = \frac{1}{a_i} \sum_{j \in \mathcal{N}_i} w_{ij} (f_i - f_j), \tag{5}$$
where $w_{ij}$ are weights, $a_i$ are normalization coefficients, and $\mathcal{N}_i$ denotes a local neighborhood of point $i$. Different discretizations of the Laplace-Beltrami operator can be cast into this form by appropriate definition of the above constants. For shapes represented as triangular meshes, a widely-used method is the cotangent scheme, which preserves many important properties of the continuous Laplace-Beltrami operator, such as positive semi-definiteness, symmetry, and locality [31]. Yet, in general, the cotangent scheme does not converge to the continuous Laplace-Beltrami operator, in the sense that the solution of the discrete eigenproblem does not converge to the continuous one (pointwise convergence exists if the triangulation and sampling satisfy certain conditions [34]).
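Belkin et al. [5] proposed a discretization which is convergent without the restrictions on "good" triangulation required by the cotangent scheme. In this scheme, $\mathcal{N}_i$ is chosen to be the entire sampling $\{x_1, \dots, x_N\}$, and the coefficients $a_i$ and weights $w_{ij}$ are defined through a Gaussian kernel of the distance $\| \xi(x_i) - \xi(x_j) \|$ in the embedding space, where $\rho$ is a scale parameter. In the case of a Euclidean colorspace, the weights can be written explicitly, up to normalization, as
$$w_{ij} \propto \exp \left\{ - \frac{\| x_i - x_j \|^2}{4 \rho} - \frac{\| \alpha_i - \alpha_j \|^2}{4 \sigma} \right\}, \tag{6}$$
where $\sigma = \rho / \eta^2$, which resembles the weights used in the bilateral filter [29]. Experimental results also show that this operator produces an accurate approximation of the Laplace-Beltrami operator under various conditions, such as noisy input data and different sampling [27, 5].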
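The bilateral-filter-like structure of the weights in (6) follows from applying a Gaussian kernel to distances in the joint six-dimensional embedding. The sketch below checks this equivalence numerically; it assumes a Gaussian kernel with exponent divided by 4 times the scale parameter, omits any area or normalization factors, and uses hypothetical random data and parameter values.

```python
import numpy as np

def pairwise_sq_dists(p):
    """Matrix of squared Euclidean distances between rows of p."""
    diff = p[:, None, :] - p[None, :, :]
    return (diff ** 2).sum(axis=-1)

def joint_weights(xyz, color, rho, eta):
    """Gaussian weights on the 6-D embedding xi = (x, eta * alpha)."""
    xi = np.hstack([xyz, eta * color])
    return np.exp(-pairwise_sq_dists(xi) / (4.0 * rho))

def bilateral_weights(xyz, color, rho, eta):
    """The same weights in factored form, with sigma = rho / eta^2 (eta > 0)."""
    sigma = rho / eta ** 2
    return np.exp(-pairwise_sq_dists(xyz) / (4.0 * rho)
                  - pairwise_sq_dists(color) / (4.0 * sigma))

# Hypothetical sample: 5 points with 3-D positions and 3-channel colors.
rng = np.random.default_rng(0)
xyz = rng.normal(size=(5, 3))
color = rng.uniform(size=(5, 3))
w_joint = joint_weights(xyz, color, rho=2.0, eta=0.1)
w_bilat = bilateral_weights(xyz, color, rho=2.0, eta=0.1)
```

The two matrices agree because the squared distance in the joint space splits into a geometric term and a color term scaled by the square of the color weight.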
Heat kernel computation. In matrix notation, equation (5) can be written as $\Delta f = A^{-1} W f$, where $A = \mathrm{diag}(a_i)$ and $W = \mathrm{diag}\left( \sum_{l \ne i} w_{il} \right) - (w_{ij})$. The eigenvalue problem $A^{-1} W \Phi = \Phi \Lambda$ is equivalent to the generalized symmetric eigenvalue problem $W \Phi = A \Phi \Lambda$, where $\Lambda$ is the diagonal matrix of the first $K$ eigenvalues, and $\Phi$ is the matrix of the corresponding eigenvectors stacked as columns. Since typically $W$ is sparse, this problem can be efficiently solved numerically.
Heat kernels can be approximated by taking the $K$ smallest eigenvalues and the corresponding eigenfunctions in (3). Since the coefficients in the expansion of $k_t$ decay as $e^{-\lambda_i t}$, typically only a few eigenvalues ($K$ in the range of to ) are required.
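The generalized eigenproblem described above can be sketched for small dense matrices as follows. This is only an illustration under stated assumptions: it reduces the generalized problem to a standard symmetric one by scaling with the inverse square root of the normalization coefficients, and the function name, random weights, and sizes are hypothetical. For large sparse problems a sparse generalized eigensolver (for example `scipy.sparse.linalg.eigsh` with its `M` argument) would be the more natural choice.

```python
import numpy as np

def laplacian_eigenpairs(w, a, k):
    """k smallest eigenpairs of W phi = lambda A phi.

    w: symmetric non-negative weight matrix (its diagonal is ignored),
    a: positive normalization coefficients, one per sample.
    """
    w = np.array(w, dtype=float)
    np.fill_diagonal(w, 0.0)
    W = np.diag(w.sum(axis=1)) - w           # diag(sum_l w_il) - (w_ij)
    s = 1.0 / np.sqrt(np.asarray(a, dtype=float))
    S = s[:, None] * W * s[None, :]          # A^{-1/2} W A^{-1/2}, symmetric
    evals, U = np.linalg.eigh(S)             # ascending eigenvalues
    Phi = s[:, None] * U                     # phi = A^{-1/2} u, A-orthonormal
    return evals[:k], Phi[:, :k]

# Hypothetical toy data: 8 samples with random symmetric weights.
rng = np.random.default_rng(1)
w = rng.uniform(size=(8, 8))
w = 0.5 * (w + w.T)
a = rng.uniform(0.5, 1.5, size=8)
lam, Phi = laplacian_eigenpairs(w, a, k=4)
```

The returned eigenvectors are orthonormal with respect to the inner product weighted by the normalization coefficients, which is the discrete analogue of orthonormality with respect to the area measure.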
5 Results and applications
In this section, we show the application of the proposed framework to retrieval of textured shapes. We compare two approaches: bags of local features and distributions of diffusion distances.
5.1 Bags of local features
ShapeGoogle framework. Sun et al. [26] proposed using the heat propagation properties as a local descriptor of the manifold. The diagonal of the heat kernel,
$$k_t(x, x) = \sum_{i \ge 0} e^{-\lambda_i t} \phi_i^2(x), \tag{7}$$
referred to as the heat kernel signature (HKS), captures the local properties of $X$ at point $x$ and scale $t$. The descriptor is computed at each point as a vector of the values $p(x) = (k_{t_1}(x, x), \dots, k_{t_n}(x, x))$, where $t_1, \dots, t_n$ are some time values. Such a descriptor is deformation-invariant, easy to compute, and provably informative [26].
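Given eigenpairs of a discrete Laplacian, the heat kernel signature at all points and scales reduces to a single matrix product. The sketch below assumes the eigenpairs are already computed; the cycle-graph Laplacian and the time values are hypothetical and serve only as a sanity check, since all vertices of a cycle are equivalent and must therefore share the same signature.

```python
import numpy as np

def heat_kernel_signature(evals, evecs, times):
    """HKS matrix with entry (x, j) = sum_i exp(-lambda_i t_j) phi_i(x)^2."""
    return (evecs ** 2) @ np.exp(-np.outer(evals, times))

# Toy stand-in for a shape: Laplacian of an 8-vertex cycle graph.
n = 8
shift = np.roll(np.eye(n), 1, axis=1)
lap = 2.0 * np.eye(n) - shift - shift.T
evals, evecs = np.linalg.eigh(lap)

times = np.array([0.5, 1.0, 2.0])  # hypothetical time scales
hks = heat_kernel_signature(evals, evecs, times)
```

Replacing the eigenpairs with those of the operator built on the joint geometric-photometric embedding would give the color variant of the descriptor with no change to this function.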
Ovsjanikov et al. [7] employed the HKS local descriptor for large-scale shape retrieval using the bags of features paradigm [25]. In this approach, the shape is considered as a collection of "geometric words" from a fixed "vocabulary" and is described by the distribution of such words, also referred to as a bag of features or BoF. The vocabulary is constructed offline by clustering the HKS descriptor space. Then, for each point on the shape, the HKS is replaced by the nearest vocabulary word by means of vector quantization. Counting the frequency of each word, a BoF is constructed. The similarity of two shapes $X$ and $Y$ is then computed as the distance between the corresponding BoFs, $d(X, Y) = \| \mathrm{BoF}_X - \mathrm{BoF}_Y \|$.
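The bag-of-features construction just described can be sketched in a few lines. This is a simplified illustration, assuming hard vector quantization (each descriptor mapped to its single nearest word) and an L1 distance between histograms; the tiny two-dimensional vocabulary and descriptors are hypothetical and stand in for clustered HKS vectors.

```python
import numpy as np

def bag_of_features(descriptors, vocabulary):
    """Normalized histogram of nearest vocabulary words (hard quantization)."""
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=-1)
    words = d2.argmin(axis=1)
    counts = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return counts / counts.sum()

def bof_distance(bof_x, bof_y):
    """L1 distance between two bags of features (the norm is an assumption)."""
    return np.abs(bof_x - bof_y).sum()

# Hypothetical vocabulary of 3 words and per-point descriptors of two shapes.
vocabulary = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
desc_x = np.array([[0.1, 0.0], [0.9, 0.1], [1.1, 0.0], [0.0, 0.8]])
desc_y = np.array([[0.0, 0.1], [0.1, 0.1], [0.0, 1.2], [0.2, 0.9]])

bof_x = bag_of_features(desc_x, vocabulary)
bof_y = bag_of_features(desc_y, vocabulary)
dist_xy = bof_distance(bof_x, bof_y)
```

A soft quantization variant would replace the `argmin` step with weights that decay with the distance to each word.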
Using the proposed approach, we define the color heat kernel signature (cHKS) in the same way as the HKS, with the standard Laplace-Beltrami operator replaced by the one resulting from the geometric-photometric embedding. In the following, we show that such descriptors achieve superior retrieval performance.
Evaluation methodology. In order to evaluate the proposed method, we used the SHREC 2010 robust large-scale shape retrieval benchmark methodology [6]. The query set consisted of 270 real-world human shapes from 5 classes acquired by a 3D scanner with real geometric transformations and simulated photometric transformations of different types and strengths, totalling 54 instances per shape (Figure 3). Geometric transformations were divided into isometry+topology (real articulations and topological changes due to acquisition imperfections), and partiality (occlusions and addition of clutter such as the red ball in Figure 3). Photometric transformations included contrast (increase and decrease by scaling of the channel), brightness (brighten and darken by shift of the channel), hue (shift in the channel), saturation (saturation and desaturation by scaling of the channels), and color noise (additive Gaussian noise in all channels). Mixed transformations included isometry+topology transformations in combination with two randomly selected photometric transformations. In each class, the transformation appeared in five different versions numbered 1–5 corresponding to the transformation strength levels. One shape of each of the five classes was added to the queried corpus in addition to 75 other shapes used as clutter (Figure 4).
Retrieval was performed by matching 270 transformed queries to the 75 null shapes. Each query had exactly one correct corresponding null shape in the dataset. Performance was evaluated using the precision-recall characteristic. Precision $P(r)$ is defined as the percentage of relevant shapes in the first $r$ top-ranked retrieved shapes. Mean average precision (mAP), defined as $\mathrm{mAP} = \sum_r P(r) \cdot \mathrm{rel}(r)$, where $\mathrm{rel}(r)$ is the relevance of a given rank, was used as a single measure of performance. Intuitively, mAP can be interpreted as the area under the precision-recall curve. Ideal retrieval performance results in the first match being relevant, with mAP=100%. Performance results were broken down according to transformation class and strength.
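The performance measure can be made concrete with a short sketch. It assumes the common convention of normalizing average precision by the number of relevant items, which coincides with the sum of precision times relevance when each query has exactly one relevant shape, as is the case here; the function names and toy rankings are hypothetical.

```python
def average_precision(ranked_relevance):
    """Average precision of one ranked list of 0/1 relevance flags."""
    hits = 0
    total = 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0

def mean_average_precision(rankings):
    """Mean of the average precision over all queries."""
    return sum(average_precision(r) for r in rankings) / len(rankings)

# Two hypothetical queries: the correct shape is ranked first, then second.
ap_first = average_precision([1, 0, 0])
ap_second = average_precision([0, 1, 0])
map_value = mean_average_precision([[1, 0, 0], [0, 1, 0]])
```

A query whose single correct match is ranked first contributes a full score, and the contribution falls as the reciprocal of the rank otherwise.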
Methods. In addition to the proposed approach, we compared purely geometric, purely photometric, and joint photometric-geometric descriptors. As a purely geometric descriptor, we used bags of features based on HKS according to [7]; the purely photometric shape descriptor was a color histogram. As joint photometric-geometric descriptors, we used bags of features computed with the MeshHOG [37] and the proposed color HKS (cHKS).
For the computation of the bag of features descriptors, we used the Shape Google framework with most of the settings as proposed in [7]. More specifically, HKS were computed at six scales (, and ). Soft vector quantization was applied with variance taken as twice the median of all distances between cluster centers. The approximate nearest neighbor method [2] was used for vector quantization. The Laplace-Beltrami operator discretization was computed using the Mesh-Laplace scheme [4] with scale parameter . Heat kernels were approximated using the first eigenpairs of the discrete Laplacian. The MeshHOG descriptor was computed at prominent feature points (typically 100–2000 per shape), detected using the MeshDOG detector [37]. The vocabulary size in all the cases was set to .
In cHKS, in order to avoid the choice of an arbitrary value $\eta$, we used a set of three different weights () to compute the cHKS and the corresponding BoFs. The distance between two shapes was computed as the sum of the distances between the corresponding BoFs for each $\eta$, weighted by , and 1 in case of , .
Results. Tables 1–4 summarize the results of our experiments. The geometry-only descriptor (HKS) [7] is invariant to photometric transformations, but is somewhat sensitive to topological noise and missing parts (Table 1). The color-only descriptor, on the other hand, works well only for geometric transformations that do not change the shape color; photometric transformations make such a descriptor almost useless (Table 2). MeshHOG, being based on texture gradients, is almost invariant to photometric transformations, but is sensitive to color noise (Table 3). The fusion of the geometric and photometric data using our approach (Table 4) achieves nearly perfect retrieval for mixed and photometric transformations and outperforms the other approaches. Figure 5 visualizes a few examples of the retrieved shapes ordered by relevance, which is inversely proportional to the distance from the query shape.
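Table 1. Performance (mAP in %) of the geometry-only HKS bag of features descriptor [7], by transformation class and strength (1–5).

| Transform. | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Isom+Topo | 100.00 | 100.00 | 96.67 | 95.00 | 90.00 |
| Partial | 66.67 | 60.42 | 63.89 | 63.28 | 63.63 |
| Contrast | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| Brightness | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| Hue | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| Saturation | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| Noise | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| Mixed | 90.00 | 95.00 | 93.33 | 95.00 | 96.00 |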
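Table 2. Performance (mAP in %) of the color-only descriptor (color histogram), by transformation class and strength (1–5).

| Transform. | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Isom+Topo | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| Partial | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| Contrast | 100.00 | 90.83 | 80.30 | 71.88 | 63.95 |
| Brightness | 88.33 | 80.56 | 65.56 | 53.21 | 44.81 |
| Hue | 11.35 | 8.38 | 6.81 | 6.05 | 5.49 |
| Saturation | 17.47 | 14.57 | 12.18 | 10.67 | 9.74 |
| Noise | 100.00 | 100.00 | 93.33 | 85.00 | 74.70 |
| Mixed | 28.07 | 25.99 | 20.31 | 17.62 | 15.38 |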
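Table 3. Performance (mAP in %) of the MeshHOG bag of features descriptor [37], by transformation class and strength (1–5).

| Transform. | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Isom+Topo | 100.00 | 95.00 | 96.67 | 94.17 | 95.33 |
| Partial | 75.00 | 61.15 | 69.93 | 68.28 | 68.79 |
| Contrast | 100.00 | 100.00 | 100.00 | 98.33 | 94.17 |
| Brightness | 100.00 | 100.00 | 100.00 | 100.00 | 99.00 |
| Hue | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| Saturation | 100.00 | 100.00 | 100.00 | 98.75 | 99.00 |
| Noise | 100.00 | 100.00 | 88.89 | 83.33 | 78.33 |
| Mixed | 100.00 | 100.00 | 100.00 | 93.33 | 83.40 |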
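Table 4. Performance (mAP in %) of the proposed color HKS (cHKS) bag of features descriptor, by transformation class and strength (1–5).

| Transform. | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Isom+Topo | 100.00 | 100.00 | 96.67 | 97.50 | 94.00 |
| Partial | 68.75 | 68.13 | 69.03 | 67.40 | 67.13 |
| Contrast | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| Brightness | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| Hue | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| Saturation | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| Noise | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| Mixed | 100.00 | 100.00 | 96.67 | 97.50 | 98.00 |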
5.2 Shape distributions
Spectral shape distances. Recent works [24, 19] showed that global shape descriptors can be constructed considering distributions of intrinsic distances. Given some intrinsic distance metric $d_X$, its cumulative distribution is computed as
$$F_X(\delta) = \int_X \int_X \chi_{d_X(x, y) \le \delta} \, da(x) \, da(y), \tag{8}$$
where $\chi$ denotes an indicator function. Given two shapes $X$ and $Y$ with the corresponding distance metrics $d_X, d_Y$, the similarity (referred to as spectral distance) is computed as a distance between the corresponding distributions $F_X$ and $F_Y$.
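A discrete version of the cumulative distance distribution can be sketched as a weighted count of point pairs. The sketch assumes area weights normalized to sum to one (so the distribution tends to 1 for large thresholds) and compares two distributions with a mean absolute difference; both choices, along with the toy points and thresholds, are assumptions made for illustration.

```python
import numpy as np

def distance_distribution(dist_matrix, thresholds, areas=None):
    """Fraction of (area-weighted) point pairs whose distance is <= each threshold."""
    n = dist_matrix.shape[0]
    if areas is None:
        areas = np.full(n, 1.0 / n)  # uniform weights summing to one
    pair_weights = np.outer(areas, areas)
    return np.array([(pair_weights * (dist_matrix <= d)).sum() for d in thresholds])

def distribution_distance(F_x, F_y):
    """Mean absolute difference between two sampled distributions (an assumption)."""
    return np.abs(F_x - F_y).mean()

# Hypothetical shape: three points on a line with pairwise distances 1, 2, 3.
points = np.array([0.0, 1.0, 3.0])
dist = np.abs(points[:, None] - points[None, :])
thresholds = np.array([0.0, 1.0, 2.0, 3.0])
F = distance_distribution(dist, thresholds)
```

Any intrinsic distance matrix can be supplied in place of `dist`, including diffusion distances computed from the joint geometric-photometric operator.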
Using the proposed framework, we construct diffusion distances according to (4), where the standard Laplace-Beltrami operator is again replaced by the one associated with the geometric-photometric embedding. Such distances account for photometric information and, as we show in the following, achieve superior performance.
Methods. Using the same benchmark as above, we compared shape retrieval approaches that use distance distributions as shape descriptors. Two methods were compared: pure geometric and joint geometric-photometric distances. In the former, we used the average of diffusion distances
$$d_X(x, y) = \frac{1}{|T|} \sum_{t \in T} d_t(x, y) \tag{9}$$
computed at two scales $t \in T$. In the latter, the distances were also computed at multiple scales of the photometric component,
(10) 
The values were used. For the computation of distributions, the shapes were subsampled at points using the farthest point sampling algorithm.
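Farthest point sampling, used above to subsample the shapes, can be sketched as a greedy loop over a precomputed distance matrix. The sketch assumes the pairwise distances are already available (the text does not specify which distance was used) and that the starting index is arbitrary; the toy points are hypothetical.

```python
import numpy as np

def farthest_point_sampling(dist_matrix, m, start=0):
    """Greedily pick m indices, each maximizing the distance to those already chosen."""
    selected = [start]
    min_dist = dist_matrix[start].copy()  # distance of every point to the selected set
    for _ in range(m - 1):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, dist_matrix[nxt])
    return selected

# Hypothetical data: four points on a line.
points = np.array([0.0, 1.0, 2.0, 10.0])
dist = np.abs(points[:, None] - points[None, :])
sample = farthest_point_sampling(dist, m=3)
```

Each new sample is the point currently farthest from the selected set, which tends to spread the samples evenly over the shape.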
Results. Tables LABEL:tab:regular:diff–LABEL:tab:mltscsum summarize the results. Both descriptors appear insensitive to photometric transformations. The joint distance has superior performance in pure geometric and mixed transformations. We conclude that the use of a non-zero weight for the color component adds discriminative power to the distance distribution descriptor, while remaining robust under photometric transformations.
6 Conclusions
In this paper, we explored a way to fuse geometric and photometric information in the construction of shape descriptors. Our approach is based on heat propagation on a manifold embedded into a combined geometry-color space. Such diffusion processes capture both geometric and photometric information and give rise to local and global diffusion geometry (heat kernels and diffusion distances), which can be used as informative shape descriptors. We showed experimentally that the proposed descriptors outperform other geometry-only and photometry-only descriptors, as well as state-of-the-art joint geometric-photometric descriptors. In the future, it would be important to formally characterize the isometry group induced by the joint metric in order to understand the invariance properties of the proposed diffusion geometry, and possibly design application-specific invariant descriptors.
References
[1] J. Amores, N. Sebe, and P. Radeva. Context-based object-class recognition and retrieval by generalized correlograms. Trans. PAMI, 29(10):1818–1833, 2007.
[2] S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Y. Wu. An optimal algorithm for approximate nearest neighbor searching. J. ACM, 45:891–923, 1998.
[3] J. Assfalg, M. Bertini, A. D. Bimbo, and P. Pala. Content-based retrieval of 3D objects using spin image signatures. IEEE Trans. Multimedia, 9(3):589–599, April 2007.
[4] M. Belkin, J. Sun, and Y. Wang. Constructing Laplace operator from point clouds in $\mathbb{R}^d$. In Proc. Symp. Discrete Algorithms, pages 1031–1040, 2009.
[5] M. Belkin, J. Sun, and Y. Wang. Discrete Laplace operator on meshed surfaces. In Proc. Symp. Computational Geometry, pages 278–287, 2009.
[6] A. M. Bronstein, M. M. Bronstein, U. Castellani, B. Falcidieno, A. Fusiello, A. Godil, L. J. Guibas, I. Kokkinos, Z. Lian, M. Ovsjanikov, G. Patané, M. Spagnuolo, and R. Toldo. SHREC 2010: robust large-scale shape retrieval benchmark. In Proc. 3DOR, 2010.
[7] A. M. Bronstein, M. M. Bronstein, M. Ovsjanikov, and L. J. Guibas. Shape Google: a computer vision approach to invariant shape retrieval. In Proc. NORDIA, 2009.
[8] M. M. Bronstein and A. M. Bronstein. Shape recognition with spectral distances. Trans. PAMI, 2010. To appear.
[9] M. M. Bronstein and I. Kokkinos. Scale-invariant heat kernel signatures for non-rigid shape recognition. In Proc. CVPR, 2010.
[10] O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman. Total recall: Automatic query expansion with a generative feature model for object retrieval. In Proc. ICCV, 2007.
[11] R. R. Coifman and S. Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21:5–30, July 2006.
[12] N. Gelfand, N. J. Mitra, L. J. Guibas, and H. Pottmann. Robust global registration. In Proc. SGP, 2005.
[13] P. W. Jones, M. Maggioni, and R. Schul. Manifold parametrizations by eigenfunctions of the Laplacian and heat kernels. PNAS, 105(6):1803, 2008.
[14] M. Kazhdan, T. Funkhouser, and S. Rusinkiewicz. Rotation invariant spherical harmonic representation of 3D shape descriptors. In Proc. SGP, pages 156–164, 2003.
[15] R. Kimmel, R. Malladi, and N. Sochen. Images as embedded maps and minimal surfaces: movies, color, texture, and volumetric medical images. IJCV, 39(2):111–129, 2000.
[16] B. Lévy. Laplace-Beltrami eigenfunctions: towards an algorithm that "understands" geometry. In Proc. Shape Modeling and Applications, 2006.
[17] H. Ling and D. W. Jacobs. Deformation invariant image matching. In Proc. ICCV, pages 1466–1473, 2005.
[18] D. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 2004.
[19] M. Mahmoudi and G. Sapiro. Three-dimensional point cloud recognition via distributions of geometric distances. Graphical Models, 71(1):22–31, January 2009.
[20] R. Ohbuchi, K. Osada, T. Furuya, and T. Banno. Salient local visual features for shape-based 3D model retrieval. Pages 93–102, June 2008.
[21] R. Osada, T. Funkhouser, B. Chazelle, and D. Dobkin. Shape distributions. TOG, 21(4):807–832, 2002.
[22] X. Pan, Y. Zhang, S. Zhang, and X. Ye. Radius-normal histogram and hybrid strategy for 3D shape retrieval. Pages 372–377, June 2005.
[23] M. Reuter, F.-E. Wolter, and N. Peinecke. Laplace-spectra as fingerprints for shape matching. In Proc. ACM Symp. Solid and Physical Modeling, pages 101–106, 2005.
[24] R. M. Rustamov. Laplace-Beltrami eigenfunctions for deformation invariant shape representation. In Proc. SGP, pages 225–233, 2007.
[25] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In Proc. CVPR, 2003.
[26] J. Sun, M. Ovsjanikov, and L. J. Guibas. A concise and provably informative multi-scale signature based on heat diffusion. In Proc. SGP, 2009.
[27] K. Thangudu. Practicality of Laplace operator, 2009.
[28] N. Thorstensen and R. Keriven. Non-rigid shape matching using geometry and photometry. In Proc. CVPR, 2009.
[29] C. Tomasi and R. Manduchi. Bilateral filtering for gray and color images. In Proc. ICCV, pages 839–846, 1998.
[30] D. V. Vranic, D. Saupe, and J. Richter. Tools for 3D-object retrieval: Karhunen-Loeve transform and spherical harmonics. In Proc. Workshop Multimedia Signal Processing, pages 293–298, 2001.
[31] M. Wardetzky, S. Mathur, F. Kälberer, and E. Grinspun. Discrete Laplace operators: no free lunch. In Conf. Computer Graphics and Interactive Techniques, 2008.
[32] C. Wu, B. Clipp, X. Li, J.-M. Frahm, and M. Pollefeys. 3D model matching with viewpoint-invariant patches (VIP). Pages 1–8, June 2008.
[33] J. V. Wyngaerd. Combining texture and shape for automatic crude patch registration. Pages 179–186, October 2003.
[34] G. Xu. Convergence of discrete Laplace-Beltrami operators over surfaces. Technical report, Institute of Computational Mathematics and Scientific/Engineering Computing, China, 2004.
[35] K.-J. Yoon, E. Prados, and P. Sturm. Joint estimation of shape and reflectance using multiple images with known illumination conditions, 2010.
[36] A. Zaharescu, E. Boyer, and R. P. Horaud. TransforMesh: a topology-adaptive mesh-based approach to surface evolution, November 2007.
[37] A. Zaharescu, E. Boyer, K. Varanasi, and R. Horaud. Surface feature detection and description with applications to mesh matching. In Proc. CVPR, 2009.