Unsupervised Deep Learning for Structured Shape Matching

12/10/2018 ∙ by Jean-Michel Roufosse, et al.

We present a novel method for computing correspondences across shapes using unsupervised learning. Our method computes a non-linear transformation of given descriptor functions, while optimizing for global structural properties of the resulting maps, such as their bijectivity or approximate isometry. To this end, we use the functional maps framework, and build upon the recently proposed FMNet architecture for descriptor learning. Unlike the method proposed in that work, however, we show that learning can be done in a purely unsupervised setting, without having access to any ground truth correspondences. This results in a very general shape matching method, which can be used to establish correspondences within shape collections or even just a single shape pair, without any prior information. We demonstrate on a wide range of challenging benchmarks that our method leads to significant improvement compared to the existing axiomatic methods and achieves comparable, and in some cases superior, results even to supervised learning techniques.

1 Introduction

Shape matching is a fundamental problem in computer vision and geometric data analysis more widely, with applications in deformation transfer [35] or statistical shape modeling [5], to name a few.

During the past decades, a large number of techniques have been proposed for both rigid and non-rigid shape matching [37]. The latter case is both more general and more challenging since the shapes can potentially undergo arbitrary deformations, which are not easy to characterize by purely axiomatic approaches. As a result several recent methods have proposed to consider learning-based techniques for addressing the shape correspondence problem, e.g. [21, 9, 22, 44] among many others. Most of these approaches are based on the idea that the underlying correspondence model can be learned from data, typically given in the form of ground truth correspondences between some shape pairs. In the simplest case, this can be formulated as a labeling problem, where different points, e.g., in a template shape, correspond to labels to be predicted [44, 23].

More recently, several methods have been proposed for structured map prediction, which aim to infer an entire map, rather than labeling each point independently [9, 19]. These techniques are based on learning pointwise descriptors, but, crucially, impose a penalty on the entire map, obtained after inference using these descriptors, which results in higher quality, globally consistent correspondences.

At the same time, while learning-based methods have achieved impressive performance, their utility is severely limited by requiring the presence of high-quality ground truth maps between a sufficient number of training examples. This makes it difficult to apply such approaches to new shape classes for which ground truth is not available.

In our paper, we show that this limitation can be lifted and propose a purely unsupervised strategy, which combines the benefits and accuracy of learning-based methods with the generality of axiomatic techniques for structured shape correspondence. Key to our approach is a bi-level optimization scheme, which optimizes given descriptors, but imposes a penalty on the entire map, inferred from them. For this, we use the recently proposed FMNet architecture [19], which exploits the functional map representation [26]. However, rather than penalizing the deviation of the map from the ground truth, we enforce structural properties on the map, such as its bijectivity or approximate isometry. This results in a very general shape matching method, that, perhaps surprisingly, achieves comparable or even superior performance to existing methods, but without any supervision.

2 Related Work

Computing correspondences between 3D shapes is a very well-studied area of computer vision and computer graphics, and its full overview is beyond the scope of our paper. Below we only review the most closely related methods and refer the interested readers to recent surveys including [39, 37, 4] for a more in-depth discussion of other shape matching approaches.

Functional Maps

Our method is built on the functional map representation, which was originally introduced in [26] for solving non-rigid shape matching problems, and then extended significantly in follow-up works, including [17, 2, 16, 30, 13, 8] among many others (see also [27] for a recent overview).

One of the key benefits of this framework is that it allows maps between shapes to be represented as small matrices, which encode relations between basis functions defined on the shapes. Moreover, as observed by several works in this domain [26, 17, 34, 30, 8], many natural properties of the underlying pointwise correspondences can be expressed as objectives on functional maps. This includes orthonormality of functional maps, which corresponds to the local area-preservation nature of pointwise correspondences [26, 17, 34]; commutativity with the Laplacian operators, which corresponds to intrinsic isometries [26]; preservation of inner products of gradients of functions, which corresponds to conformal maps [34, 8, 43]; preservation of pointwise products of functions, which corresponds to functional maps arising from point-to-point correspondences [25, 24]; and slanted diagonal structure of the functional map in the context of partial shapes [30, 20], among others.

Similarly, several other regularizers have been proposed, including exploiting the relation between functional maps in different directions [12], the map adjoint [15], and powerful cycle-consistency constraints [14] in shape collections, among many others. More recently constraints on functional maps have been introduced to promote continuity of the recovered pointwise correspondence [28] and kernel-based techniques for extracting more information from given descriptor constraints [42] among others.

All these methods, however, are based on combining first order penalties, which arise from enforcing descriptor preservation constraints, with these additional desirable structural properties of functional maps. As a result, any error or inconsistency in the pre-computed descriptors will inevitably lead to severe map estimation errors. Several methods have suggested using robust norms on descriptor constraints [17, 16], which can help reduce the influence of certain descriptors but still do not control the global map consistency properties.

Learning, inc. Deep Learning-based Methods

To overcome the inherent difficulty of axiomatic modeling of non-rigid shape correspondence, several methods have been proposed to learn the correct deformation model from data with learning-based techniques. Some early approaches in this direction either learned optimal parameters of spectral descriptors [21] or exploited random forests [32] or metric learning [10] for learning optimal constraints given some ground truth matches.

More recently, with the advent of deep learning methods, several approaches have been proposed for learning transformations in the context of non-rigid shape matching. Most of the proposed methods either use Convolutional Neural Networks (CNNs) on depth maps, e.g. for dense human body correspondence [44], or propose extensions of CNNs directly to curved surfaces, either using the link between convolution and multiplication in the spectral domain [6, 11], or directly defining local parametrizations, for example via the exponential map, which allows convolution in the tangent plane of a point, e.g. [22, 7, 23] among others.

These methods have been applied to non-rigid shape matching, in most cases modeling it as a label prediction problem, with points corresponding to different labels. Although successful in the presence of sufficient amount of training data, such approaches typically do not impose global consistency, which can lead to significant artefacts, such as outliers, and require post-processing to achieve high-quality maps.

Learning for Structured Prediction

Perhaps most closely related to our approach are recent works that apply learning for structured map prediction [9, 19]. These methods learn a transformation of given input descriptors, while optimizing for the deviation of the map computed from them using the functional map framework, from some known ground truth correspondences. As shown in these works [9, 19] imposing a penalty on entire maps, and thus evaluating the ultimate use of the descriptors, can lead to significant accuracy improvements in practice.

Contribution

Unlike these existing methods, we propose an unsupervised learning-based approach that transforms given input descriptors, while optimizing for structural map properties, without any ground truth knowledge. Our method, which can be seen as a bi-level optimization strategy, allows explicit control of the interaction between the pointwise descriptors and the global map consistency, computed via the functional map framework. As a result, our technique is both scalable with respect to shape complexity and, as we show below, leads to significant improvement compared to the standard axiomatic methods, and achieves comparable, and in some cases superior, performance even to supervised approaches.

3 Background & Motivation

3.1 Shape Matching and Functional Maps

Our work is based on the functional map framework and representation. For completeness, we briefly review the basic notions and pipeline for estimating functional maps, and refer the interested reader to a recent course [27] for a more in-depth discussion.

Basic Pipeline

Given a pair of shapes $S_1$ and $S_2$, represented as triangle meshes and containing, respectively, $n_1$ and $n_2$ vertices, the basic pipeline for computing a map between them using the functional map framework consists of the following main steps (see Chapter 2 in [27]):

  1. Compute a small set of $k_1$ and $k_2$ basis functions on each shape, e.g. by taking the first few eigenfunctions of the corresponding Laplace-Beltrami operator.

  2. Compute a set of descriptor functions on each shape that are expected to be approximately preserved by the unknown map. For example, a descriptor function can correspond to a particular dimension (e.g. choice of time parameter of the Heat Kernel Signature [36]) computed at every point. Store their coefficients in the corresponding bases as columns of matrices $A_1$ and $A_2$.

  3. Compute the optimal functional map $C_{\mathrm{opt}}$ by solving the following optimization problem:

    $$C_{\mathrm{opt}} = \arg\min_{C_{12}} E_{\mathrm{desc}}(C_{12}) + \alpha E_{\mathrm{reg}}(C_{12}), \tag{1}$$

    where the first term aims at the descriptor preservation, $E_{\mathrm{desc}}(C_{12}) = \|C_{12} A_1 - A_2\|^2$, whereas the second term regularizes the map by promoting the correctness of its overall structural properties. The simplest approach penalizes the failure of the unknown functional map to commute with the Laplace-Beltrami operators, which can be written as:

    $$E_{\mathrm{reg}}(C_{12}) = \|C_{12} \Lambda_1 - \Lambda_2 C_{12}\|^2, \tag{2}$$

    where $\Lambda_1$ and $\Lambda_2$ are diagonal matrices of the Laplace-Beltrami eigenvalues on the two shapes.

  4. Convert the functional map to a point-to-point map, for example using nearest neighbor search in the spectral embedding, or using other more advanced techniques [31, 13].

One of the strengths of this pipeline is that typically Eq. (1) leads to a simple (e.g., least squares) problem with $k_1 k_2$ unknowns, independent of the number of points on the shapes. This formulation has been extended using e.g. manifold optimization [18], descriptor preservation constraints via commutativity [25] and, more recently, with kernelization [42] among many others (see also Chapter 3 in [27]).
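Steps 3 and 4 of the pipeline can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the default value of `alpha`, and the brute-force nearest-neighbor search (used here in place of a KD-tree) are all choices made for this example. It relies on the fact that, with diagonal eigenvalue matrices, the regularized least squares problem decouples into one small linear system per row of the functional map.

```python
import numpy as np

def solve_functional_map(A1, A2, evals1, evals2, alpha=1e-3):
    """Minimize ||C A1 - A2||^2 + alpha * ||C L1 - L2 C||^2 over C.

    A1: (k1, d) and A2: (k2, d) hold descriptor coefficients as columns.
    evals1, evals2: Laplace-Beltrami eigenvalues (the diagonals of L1, L2).
    Returns C of shape (k2, k1).
    """
    AAt = A1 @ A1.T                     # (k1, k1)
    BAt = A2 @ A1.T                     # (k2, k1)
    C = np.zeros((A2.shape[0], A1.shape[0]))
    for i in range(A2.shape[0]):
        # Row i only interacts with the i-th eigenvalue of the second shape.
        reg = alpha * np.diag((evals1 - evals2[i]) ** 2)
        C[i] = np.linalg.solve(AAt + reg, BAt[i])
    return C

def functional_to_pointwise(C, Phi1, Phi2):
    """For each vertex of shape 2, return the index of the vertex of shape 1
    whose mapped spectral embedding (a row of Phi1 C^T) is closest.

    Phi1: (n1, k1) and Phi2: (n2, k2) hold basis functions as columns.
    Brute force for clarity; a KD-tree would be used for large meshes.
    """
    emb1 = Phi1 @ C.T                   # (n1, k2)
    d2 = ((Phi2[:, None, :] - emb1[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)
```

The direction of the recovered pointwise map depends on the convention adopted for the functional map; the one used above is a common choice, but it is an assumption of this sketch.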

3.2 Deep Functional Maps

Despite its simplicity and efficiency, the functional map estimation pipeline described above is fundamentally dependent on the initial choice of descriptor functions. To alleviate this dependence, several approaches have been proposed to learn the optimal descriptors from data [9, 19]. In our work, we build upon a recent deep learning-based framework, called FMNet, introduced by Litany et al. [19] that aims to transform a given set of descriptors so that the optimal map computed using them is as close as possible to some ground truth map given during training.

In particular, the approach proposed in [19] assumes, as input, a set of shape pairs for which ground truth point-wise maps are given, and aims to solve the following problem:

$$\min_{T} \sum_{(S_1, S_2) \in \mathrm{Train}} l_F\big(\mathrm{Soft}(C_{\mathrm{opt}}), GT_{(S_1, S_2)}\big), \tag{3}$$
$$\text{where } C_{\mathrm{opt}} = \arg\min_{C} \|C A_{T(D_1)} - A_{T(D_2)}\|. \tag{4}$$

Here $T$ is a non-linear transformation, in the form of a neural network, to be applied to some input descriptor functions $D$, Train is the set of training pairs for which the ground truth correspondence $GT_{(S_1, S_2)}$ is known, $l_F$ is the soft error loss, which penalizes the deviation of the computed functional map $C_{\mathrm{opt}}$, after converting it to a soft map $\mathrm{Soft}(C_{\mathrm{opt}})$, from the ground truth correspondence, and $A_{T(D_1)}$ denotes the transformed descriptors written in the basis of shape $S_1$. In other words, the FMNet framework [19] aims to learn a transformation of descriptors, so that the transformed descriptors $T(D_1)$, $T(D_2)$, when used within the functional map pipeline, result in a soft map that is as close as possible to some known ground truth correspondence. Unlike methods based on formulating shape matching as a labeling problem, this approach evaluates the quality of the entire map, obtained using the transformed descriptors, which as shown in [19] leads to significant improvement compared to several strong baselines.

Motivation

Similarly to other supervised learning methods, although FMNet [19] can result in highly accurate correspondences in the presence of sufficient training data, its applicability is limited to shape classes for which high-quality ground truth maps are available. Moreover, perhaps less crucially, the soft map loss in FMNet is based on the knowledge of geodesic distances between all pairs of points, which makes it computationally expensive. Our goal, therefore, is to show that a similar approach can be used more widely, without any training data, while also leading to a more efficient and scalable framework.

4 Our method

4.1 Overview

In this paper, we propose to use a neural network in order to optimize for non-linear transformations of descriptors, in order to obtain high-quality functional, and thus pointwise maps. For this, we follow the same general strategy proposed in the FMNet approach [19]. However, crucially, rather than penalizing the deviation of the computed map from some known ground truth correspondence, we evaluate the structural properties of the inferred functional maps, such as their bijectivity or orthogonality. Importantly, we express all these desired properties, and thus the penalties during optimization, purely in the spectral domain, which allows us to avoid the conversion of functional maps to soft maps during optimization as was done in [19]. Thus, in addition to being purely unsupervised our approach is also more efficient since it does not require pre-computation of geodesic distance matrices or expensive manipulation of large soft map matrices during training.

To achieve these goals, we modify the FMNet problem, described in Eq. (3) and (4), in several ways: first, we propose to consider functional maps in both directions, i.e. by treating the two shapes as both source and target; second, we remove the conversion step from functional to soft maps; and, most importantly, third, we replace the soft map loss with respect to ground truth with a set of penalties on the computed functional maps, which are described in detail below. This means that the optimization problem we aim to solve can be written as:

$$\min_{T} \sum_{(S_1, S_2) \in \mathrm{Pairs}} \; \sum_{i \in \mathrm{penalties}} w_i E_i(C_{12}, C_{21}), \tag{5}$$
$$\text{where } C_{12} = \arg\min_{C} \|C A_{T(D_1)} - A_{T(D_2)}\|, \tag{6}$$
$$C_{21} = \arg\min_{C} \|C A_{T(D_2)} - A_{T(D_1)}\|. \tag{7}$$

Here, similarly to Eq. (3) above, $T$ denotes a non-linear transformation in the form of a neural network, Pairs is a set of pairs of shapes in a given collection, $w_i$ are scalar weights, and $E_i$ are the penalties, described below. In other words, we aim to optimize for a non-linear transformation of some descriptor functions, such that functional maps computed from transformed descriptors possess certain desirable structural properties, expressed via penalty minimization.
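The inner problems in Eq. (6) and (7) are ordinary least squares problems, so both functional maps can be obtained from the transformed descriptor coefficients with a standard solver. The sketch below illustrates this with NumPy; the function names are hypothetical, `A1` and `A2` stand for the coefficient matrices of the transformed descriptors on the two shapes, and `penalties` is assumed to be a list of callables taking both maps.

```python
import numpy as np

def fit_map(A_src, A_tgt):
    """Return argmin_C ||C A_src - A_tgt||_F.

    A_src: (k_src, d), A_tgt: (k_tgt, d), descriptors stored as columns.
    Solved as the transposed system A_src^T C^T = A_tgt^T.
    """
    X, *_ = np.linalg.lstsq(A_src.T, A_tgt.T, rcond=None)
    return X.T

def maps_in_both_directions(A1, A2):
    """Functional maps for both directions, as in Eq. (6) and Eq. (7)."""
    return fit_map(A1, A2), fit_map(A2, A1)

def total_energy(C12, C21, penalties, weights):
    """Weighted sum of penalties for one shape pair (inner sum of Eq. (5))."""
    return sum(w * E(C12, C21) for w, E in zip(weights, penalties))
```

In the actual method the same computation has to be expressed in a framework with automatic differentiation, so that the energy can be back-propagated to the network parameters; plain NumPy is used here only to show the structure of the problem.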

When deriving the penalties used in our approach, we exploit the links between properties of functional maps and associated pointwise maps, that have been established in several previous works [26, 34, 12, 25]. Unlike all these methods, however, we decouple the descriptor preservation constraints from structural map properties. This allows us to optimize for descriptor functions, and thus, gain very strong resilience in the presence of noisy or uninformative descriptors, while still exploiting the compactness and efficiency of the functional map representation.

4.2 Penalties

In our work we propose to use four penalties, all inspired by desirable map properties.

Bijectivity

Given a pair of shapes and the functional maps $C_{12}$ and $C_{21}$ in both directions, perhaps the simplest requirement is for them to be inverses of each other, which can be enforced by penalizing the difference between their composition and the identity. This penalty, used for functional map estimation in [12], can be written simply as:

$$E_1 = \|C_{12} C_{21} - I\|^2 + \|C_{21} C_{12} - I\|^2. \tag{8}$$
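As a small illustration, the bijectivity penalty can be evaluated directly from the two matrices. The sketch below assumes the squared Frobenius norm and uses a hypothetical function name; `C12` and `C21` denote the functional maps in the two directions.

```python
import numpy as np

def bijectivity_penalty(C12, C21):
    """Squared Frobenius distance of both compositions from the identity.

    C12: (k2, k1) and C21: (k1, k2) are the maps in the two directions.
    """
    I2 = np.eye(C12.shape[0])
    I1 = np.eye(C21.shape[0])
    return np.sum((C12 @ C21 - I2) ** 2) + np.sum((C21 @ C12 - I1) ** 2)
```

The penalty vanishes exactly when the two maps are inverses of each other, for example a rotation and its transpose.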

Orthogonality

As observed in several works [26, 34], a point-to-point map is locally area preserving if and only if the corresponding functional map is orthonormal. Thus, for shape pairs approximately satisfying this assumption, a natural penalty to incorporate in our unsupervised pipeline is:

$$E_2 = \|C_{12}^{\top} C_{12} - I\|^2 + \|C_{21}^{\top} C_{21} - I\|^2. \tag{9}$$
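The orthogonality penalty can be sketched in the same way. Again the squared Frobenius norm and the function name are assumptions made for this example, with `C12` and `C21` denoting the functional maps in the two directions.

```python
import numpy as np

def orthogonality_penalty(C12, C21):
    """Penalize deviation of each functional map from an orthonormal matrix."""
    I1 = np.eye(C12.shape[1])
    I2 = np.eye(C21.shape[1])
    return (np.sum((C12.T @ C12 - I1) ** 2)
            + np.sum((C21.T @ C21 - I2) ** 2))
```

Unlike the bijectivity penalty, each term here involves only one of the two maps, so a map can be penalized for scaling even when its counterpart is its exact inverse.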

Laplacian commutativity

Similarly, it is well-known that a pointwise map is an intrinsic isometry if and only if the associated functional map commutes with the Laplace-Beltrami operator [33, 26]. This has motivated using the lack of commutativity as a regularizer for functional map computations, as mentioned in Eq. (2). In our work, we use it to introduce the following penalty:

$$E_3 = \|C_{12} \Lambda_1 - \Lambda_2 C_{12}\|^2 + \|C_{21} \Lambda_2 - \Lambda_1 C_{21}\|^2, \tag{10}$$

where $\Lambda_1$ and $\Lambda_2$ are diagonal matrices of the Laplace-Beltrami eigenvalues on the two shapes.
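Because the eigenvalue matrices are diagonal, the commutator never needs to be formed with explicit matrix products: multiplying by a diagonal matrix on the right scales columns, and on the left scales rows. The following sketch uses this observation; the function name and the squared Frobenius norm are assumptions of the example, and `evals1`, `evals2` are the eigenvalue vectors of the two shapes.

```python
import numpy as np

def laplacian_commutativity_penalty(C12, C21, evals1, evals2):
    """Penalize failure of the maps to commute with the Laplacians.

    C12: (k2, k1), C21: (k1, k2); evals1: (k1,), evals2: (k2,).
    """
    # C12 @ diag(evals1) scales columns; diag(evals2) @ C12 scales rows.
    r12 = C12 * evals1[None, :] - evals2[:, None] * C12
    r21 = C21 * evals2[None, :] - evals1[:, None] * C21
    return np.sum(r12 ** 2) + np.sum(r21 ** 2)
```

Entry $(i, j)$ of the first residual equals the map entry multiplied by the difference of the $j$-th eigenvalue on the first shape and the $i$-th on the second, so the penalty pushes the map towards coupling only basis functions with similar eigenvalues.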

Descriptor preservation via commutativity

The previous three penalties express desirable properties of pointwise correspondences when expressed as functional maps. However, since the space of functional maps is larger than that of pointwise ones, in practice we would like to penalize functional maps that do not arise from any point-to-point correspondences. One approach for this has been proposed in [25], where the authors argued that preserving descriptors as linear operators acting on functions through multiplication both extracts more information from given descriptor functions and results in functional maps that are more likely to arise from point-to-point ones.

Following [25], we incorporate this penalty into our approach via commutativity of the functional map with the multiplicative operators, which can be expressed as follows:

$$E_4 = \sum_{(f_i, g_i)} \|C_{12} M_{f_i} - M_{g_i} C_{12}\|^2 + \|C_{21} M_{g_i} - M_{f_i} C_{21}\|^2, \quad M_{f_i} = \Phi^{+} \mathrm{Diag}(f_i) \Phi, \quad M_{g_i} = \Psi^{+} \mathrm{Diag}(g_i) \Psi. \tag{11}$$

Here $f_i$ and $g_i$ are the optimized descriptors on the source and target shape, obtained by the neural network and expressed in the full (hat) basis, whereas $\Phi$ and $\Psi$ are the fixed basis functions on the two shapes, and $^{+}$ denotes the Moore-Penrose pseudoinverse.
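The multiplicative operators can be built for all descriptors at once with a single tensor contraction, without ever forming an $n \times n$ diagonal matrix. The sketch below is one possible NumPy formulation under stated assumptions: the function names are hypothetical, `Phi` and `Psi` hold the basis functions of the two shapes as columns, `F` and `G` hold the corresponding descriptor functions as columns, and the squared Frobenius norm is used.

```python
import numpy as np

def multiplication_operators(basis, descriptors):
    """Stack of reduced operators pinv(basis) @ diag(f) @ basis.

    basis: (n, k), descriptors: (n, d). Returns an array of shape (d, k, k).
    """
    basis_pinv = np.linalg.pinv(basis)                    # (k, n)
    return np.einsum('kn,nd,nj->dkj', basis_pinv, descriptors, basis)

def descriptor_commutativity_penalty(C12, C21, Phi, Psi, F, G):
    """Sum over descriptor pairs of the commutator norms in both directions."""
    M_f = multiplication_operators(Phi, F)                # (d, k1, k1)
    M_g = multiplication_operators(Psi, G)                # (d, k2, k2)
    r12 = C12 @ M_f - M_g @ C12                           # (d, k2, k1)
    r21 = C21 @ M_g - M_f @ C21                           # (d, k1, k2)
    return np.sum(r12 ** 2) + np.sum(r21 ** 2)
```

The matrix products broadcast over the leading descriptor axis, so the cost grows linearly with the number of descriptor pairs that are used.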

4.3 Optimization

As mentioned in Section 4.1, we incorporate these four penalties into the energy in Eq. (5). Importantly, the only unknowns in this optimization are the parameters of the neural network applied to the descriptor functions. The functional maps $C_{12}$ and $C_{21}$ are fully determined by the optimized descriptors via the solution of the corresponding linear systems in Eq. (6) and Eq. (7), and are thus differentiable with respect to the neural network parameters. Moreover, importantly, all of the penalties are differentiable with respect to the functional maps $C_{12}$ and $C_{21}$. This means that the total energy and thus its gradient can be back-propagated to the neural network in Eq. (5), allowing us to optimize for the descriptors while penalizing the structural properties of the functional maps.
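To make the differentiability claim concrete, the following short derivation writes the solution of the inner least squares problem in closed form. Here $A_1$ and $A_2$ are shorthand (introduced only for this derivation) for the coefficient matrices of the transformed descriptors on the two shapes, and the squared Frobenius norm and full row rank of $A_1$ are assumed.

```latex
\begin{align*}
C_{12} &= \arg\min_{C} \; \|C A_1 - A_2\|_F^2, \\
0 &= \nabla_C \, \|C A_1 - A_2\|_F^2 = 2\,(C A_1 - A_2)\,A_1^{\top}
   \;\;\Longrightarrow\;\; C_{12}\, A_1 A_1^{\top} = A_2 A_1^{\top}, \\
C_{12} &= A_2 A_1^{\top} \left(A_1 A_1^{\top}\right)^{-1} = A_2 A_1^{+}.
\end{align*}
```

Under these assumptions the map is a composition of matrix products and a matrix inverse applied to the network outputs, all of which are differentiable, so gradients of any penalty on $C_{12}$ can flow back to the network parameters through standard automatic differentiation. The same argument applies to $C_{21}$ with the roles of $A_1$ and $A_2$ exchanged.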

5 Implementation & Parameters

Implementation details

We implemented our method in Tensorflow [1] by adapting the open-source implementation of FMNet [19]. Thus, the neural network used for transforming descriptors in our approach, $T$ in Eq. (5), is identical to that used in FMNet, as mentioned in Eq. (3). Namely, this network is based on a residual architecture, consisting of 7 fully connected residual layers with exponential linear units, without dimensionality reduction. Please see Section 5 in [19] for more details.

Following the approach of FMNet [19], we also sub-sample a random set of 1500 points at each training step, for efficiency. However, unlike their method, sub-sampling is done independently on each shape, without enforcing consistency. We also randomly sub-sample 20% of the descriptors to enforce our penalty $E_4$ at each training step, to avoid manipulating a large set of operators. Note that this sub-sampling is random at each step, so different optimized descriptors are used throughout optimization. We observed that this sub-sampling not only increases speed but also improves robustness during optimization. Note also that we do not form large diagonal matrices explicitly, but rather define the multiplicative operators in objective $E_4$ directly via pointwise products and summation using contraction between tensors. Finally, we also tested two approaches for functional map conversion: either using the soft-map approach of FMNet [19] or via the standard KD-tree method in the spectral domain [26], and we report the results with both methods in the ablation study below.
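The per-step sub-sampling described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name is hypothetical, sampling without replacement is an assumption, and the same descriptor indices are used on both shapes on the assumption that descriptors are compared in pairs.

```python
import numpy as np

def subsample_step(n1, n2, num_descriptors, rng, num_points=1500, frac=0.2):
    """Draw the random subsets used for one training step.

    Vertex indices are drawn independently on each shape, so no
    correspondence between the two vertex subsets is assumed.
    """
    idx1 = rng.choice(n1, size=min(num_points, n1), replace=False)
    idx2 = rng.choice(n2, size=min(num_points, n2), replace=False)
    n_desc = max(1, int(round(frac * num_descriptors)))
    desc_idx = rng.choice(num_descriptors, size=n_desc, replace=False)
    return idx1, idx2, desc_idx
```

Calling the function with a fresh draw at every step means that, over the course of optimization, all vertices and all descriptor dimensions contribute to the energy.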

Parameters

Our method has two key parameters: the input descriptors, and the scalar weights $w_i$ in Eq. (5). In all experiments below we used the same SHOT [38] descriptors as in FMNet [19] with the same parameters, which leads to a 352-dimensional vector per point, or equivalently, 352 descriptor functions on each shape.

For the scalar weights $w_i$, we used the same four fixed values for all experiments below, which were obtained by examining the relative penalty values obtained throughout the optimization on a small set of shapes, and setting the weights inversely proportionally to those values.
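The weighting rule can be illustrated with a few lines of code. The sketch below is hypothetical: the function name is invented for this example, and the penalty magnitudes in the usage note are placeholders, not values measured in the experiments.

```python
import numpy as np

def inverse_proportional_weights(penalty_values):
    """Weights inversely proportional to typical penalty magnitudes.

    With these weights every weighted penalty contributes equally (here: 1)
    when the penalties take their typical values.
    """
    values = np.asarray(penalty_values, dtype=float)
    if np.any(values <= 0):
        raise ValueError("typical penalty values must be positive")
    return 1.0 / values
```

For example, placeholder magnitudes `[2.0, 0.5, 4.0, 10.0]` would give the weights `[0.5, 2.0, 0.25, 0.1]`, so that no single penalty dominates the total energy merely because of its scale.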

6 Results


| Method | Mean geodesic error with KDTree | Mean geodesic error with Softmap |
|---|---|---|
| FMNet [19] | 0.018 | 0.025 |
| E1+E2+E3+E4 | 0.027 | 0.044 |
| E3 | 0.073 | 0.073 |
| E1+E2+E3 | 0.079 | 0.081 |
| E1+E3+E4 | 0.082 | 0.077 |
| E1 | 0.083 | 0.111 |
| E2+E3+E4 | 0.087 | 0.079 |
| E1+E2+E4 | 0.138 | 0.126 |
| E2 | 0.152 | 0.135 |
| E4 | 0.252 | 0.330 |
| Ours optimized all | 0.009 | 0.017 |

Table 1: Ablation study of the different penalty terms in our method and comparison with the supervised FMNet approach on the FAUST shape matching benchmark.

Ablation study

We first evaluated our approach by comparing it to the baseline FMNet [19] method on the FAUST shape dataset [5], while also evaluating the relative importance of the different penalties in our method.

The FAUST dataset consists of 100 human shapes in different poses with known correspondences between them. In our first evaluation we split this dataset into training and test set containing 80 and 20 shapes respectively, as done in [19]. We used the training set to train the FMNet architecture using the ground truth correspondences. We used the same set in our unsupervised method to optimize for the non-linear descriptor transformation. We stress that unlike FMNet, our method is purely unsupervised and the “training set” was only used for descriptor optimization with the functional map penalties introduced above. We then applied the optimized network to the shapes in the test set and evaluated the average correspondence error obtained by different methods, with respect to the ground truth maps. Note that for fairness of comparison, we did not refine the computed maps with any post-processing techniques.
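The evaluation measure can be sketched as follows: for every test point, the error is the geodesic distance on the target shape between the predicted match and the ground truth match. The function below is an illustrative assumption, not the evaluation code used for the experiments; in particular the optional `normalizer` (for instance a shape diameter or the square root of the surface area) is a convention of this sketch.

```python
import numpy as np

def mean_geodesic_error(pred, gt, geodesic_dist, normalizer=1.0):
    """Average geodesic distance between predicted and true matches.

    pred, gt: (m,) integer arrays of target-vertex indices.
    geodesic_dist: (n, n) matrix of geodesic distances on the target shape.
    """
    errors = geodesic_dist[pred, gt] / normalizer
    return errors.mean()
```

On a toy path with three vertices at unit spacing, a prediction that is wrong by one vertex for a single point out of three yields a mean error of 1/3.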

Table 1 summarizes the quality of the computed correspondences between shapes in the test set, using FMNet [19] and our approach with different combinations of penalties, and with conversion to pointwise maps using both the soft-map approach of [19] and nearest neighbor search in the spectral domain using a KDTree [26]. We can make several observations: first, KDTree conversion gives, in most cases, better accuracy than the soft map one; second, the combination of all four penalties significantly outperforms any other subset, and comes close to achieving the accuracy obtained with the supervised FMNet; third, among individual penalties used independently, the Laplacian commutativity gives the best result. In the last row of Table 1 we also show the result of our method using all four penalties, while optimizing the neural network on pairs taken from all 100 shapes in the FAUST dataset, and testing on the same subset containing only the last 20. Note that in our case, unlike FMNet, this is reasonable, since we never use ground truth correspondences during optimization. As can be seen, our method, when optimized on all shapes, gives superior performance even compared to FMNet, despite being purely unsupervised. Figure 1 shows a qualitative comparison of correspondences obtained by different methods.

Figure 1: Matches on meshes from FAUST with 6890 vertices. Panels: (a) FMNet, (b) ours with a single penalty only, (c) ours optimized on all shapes, (d) ours optimized on the subset. The comparison shows how using a single penalty compares to using all of them, and how optimizing on more shapes improves matching.

Datasets

We evaluated our method on the following datasets: the original FAUST dataset [5] containing 100 human shapes in 1-1 correspondence and two datasets obtained by independently remeshing each shape in the FAUST and SCAPE [5, 3] shape collections, to approximately 5000 vertices, using the LRVD remeshing method [45]. This algorithm results in a triangle mesh adapted to the structure of each shape, which means that different meshes are no longer in 1-1 correspondence, and indeed can have different number of vertices. The resulting remeshed datasets therefore offer significantly more variability in terms of shape structure, including e.g. point sampling density, compared to the original ones, making them more challenging for existing algorithms. Let us note also that the SCAPE dataset is slightly more challenging since the shapes are less regular (e.g., there are often reconstruction artefacts on hands and feet) and have fewer features than those in FAUST.

Figure 2: Example pair of shapes from the remeshed FAUST dataset. Note the significant changes in point sampling density in various shape regions.

We stress that although we also evaluated on the original FAUST dataset, we view the remeshed datasets as both more realistic and as providing a more faithful representation of the accuracy and generalization power of different techniques. Figure 2 shows an example of a shape pair from the remeshed FAUST dataset. For reference, we also include illustrations of shapes from these datasets in the supplementary material.

Baselines

We compared our method to several techniques, both supervised and fully automatic. In the former category we tested the original FMNet approach [19] and the Geodesic Convolutional Neural Networks (GCNN) method of [22] based on local shape parameterization. Both of these techniques assume, as input, ground truth maps between a subset of the training shapes. For supervised methods we always split the datasets into 80 (resp. 60) shapes for training and 20 (resp. 10) for testing in the FAUST and SCAPE datasets respectively. Among unsupervised methods we used the Product Manifold Filter method with the Gaussian kernel [41] (PMF Gauss) and its variant with the Heat kernel [40] (PMF Heat). Note that FMNet has further been compared and shown to outperform a large number of other baseline methods in [19].

Finally, we also evaluated the basic functional map approach, based on directly optimizing the functional maps as outlined in Section 3.1, but using all four of our energies for regularization. This method, which we call “Fmap basic”, can be viewed as a combination of the approaches of [12] and [24], as it incorporates functional map coupling (via energy E1) and descriptor commutativity (via E4). Unlike our technique, however, it does not optimize the descriptor functions, and uses descriptor preservation constraints with the original, noisy descriptors.

For fairness of comparison, we used SHOT descriptors [38] as input to all methods that we tested, both supervised and unsupervised. Moreover, we did not apply any post-processing to the results obtained by any method, except PMF Gauss and PMF Heat, which are, by nature, iterative refinement algorithms. Therefore, the results that we obtained can likely be improved further using existing map refinement techniques.

Evaluation and Results

| Method | Geodesic error (mean) | Geodesic error (percentile) |
|---|---|---|
| FMNet with KDTree [19] | 0.018 | 0.045 |
| FMNet with Softmap [19] | 0.025 | 0.064 |
| Ours optimized on subset | 0.027 | 0.048 |
| Ours optimized all | 0.009 | 0.028 |
| PMF (Gaussian Kernel) [41] | 0.029 | 0.079 |
| PMF (Heat Kernel) [40] | 0.017 | 0.024 |

Table 2: Geodesic errors of different methods obtained on the FAUST original dataset with 6890 vertices.

| Method | Geodesic error (mean) | Geodesic error (percentile) |
|---|---|---|
| FMNet [19] | 0.171 | 0.771 |
| Ours optimized on subset | 0.112 | 0.686 |
| Ours optimized all | 0.020 | 0.052 |
| GCNN [22] | 0.051 | 0.194 |
| Fmap basic [12, 24] | 0.388 | 0.757 |
| PMF (Gaussian Kernel) [41] | 0.039 | 0.126 |
| PMF (Heat Kernel) [40] | 0.038 | 0.112 |

Table 3: Geodesic errors of different methods obtained on the remeshed FAUST dataset.

| Method | Geodesic error (mean) | Geodesic error (percentile) |
|---|---|---|
| FMNet [19] | 0.218 | 0.825 |
| Ours optimized on subset | 0.139 | 0.737 |
| Ours optimized all | 0.023 | 0.050 |
| GCNN [22] | 0.074 | 0.395 |
| Fmap basic [12, 24] | 0.739 | 1.221 |
| PMF (Gaussian Kernel) [41] | 0.073 | 0.198 |
| PMF (Heat Kernel) [40] | 0.069 | 0.186 |

Table 4: Geodesic errors of different methods obtained on the remeshed SCAPE dataset.

Figure 3: Comparison of our method with texture transfer on shapes from the SCAPE remeshed dataset. Panels: Source, Ground-Truth, FMNet, PMF (heat), PMF (gauss), Ours on subset, Ours all shapes.

Figure 4: Point-to-Point correspondences plot comparing methods trained and tested on FAUST original dataset.

Tables 2, 3 and 4 summarize the accuracy obtained by different methods on the three datasets. Note that in all cases, our method, when optimized on all shapes, gives the lowest mean error, and the gap compared to other methods is especially prominent on the remeshed FAUST and remeshed SCAPE datasets. Remarkably, our method outperforms even the supervised learning techniques, GCNN [22] and FMNet [19], despite being purely unsupervised.

We also plot the error rates of different methods in Figures 4, 5, and 6. Note that on the original FAUST dataset, the results of PMF Heat as shown in Figure 4 start at 70% perfect correspondences and contain mostly low-error matches. This is greatly facilitated by the consistent sampling in the dataset, an assumption exploited by PMF, which aims to find pointwise bijective maps. However, this method also leads to some correspondences with very high error, which leads to slow saturation at 100% and is the reason why the average error for this method, as reported in Table 2, is higher than that of ours.

Note that the remeshed datasets are significantly harder for both supervised and unsupervised methods, since the shapes are no longer identically meshed and in 1-1 correspondence. We also observed this difficulty while training the supervised FMNet and GCNN techniques, which showed very slow error convergence during training. On both of these datasets, our approach achieves the lowest average error, as shown in Tables 3 and 4. Note that on the remeshed FAUST dataset, as shown in Fig. 6, only GCNN [22] produces a similarly large fraction of correspondences with small error. However, this method is supervised, and moreover still results in significantly higher average error than our approach on this dataset, primarily due to strong outliers. On the remeshed SCAPE dataset, summarized in Table 4 and Figure 5, our method leads to the best results across all measures. We find this especially remarkable since our method is unsupervised and no post-processing was applied to the computed correspondences.

Figure 8 shows an example of a pair of shapes and the maps obtained between them using different methods, visualized using texture transfer. Note the continuity and quality of the map obtained using our method.

Figure 5: Comparison of point-to-point correspondence error plots for methods trained and tested on the remeshed SCAPE dataset.
Figure 6: Comparison of point-to-point correspondence error plots for methods trained and tested on the remeshed FAUST dataset.

Runtime

One further advantage of our method is its efficiency, since we do not rely on the computation of geodesic matrices and operate entirely in the spectral domain. FMNet [19] uses pairwise geodesic distance matrices for enforcing the soft map loss, which requires time and memory both for preprocessing and during training. For comparison, running one epoch with the same batch size takes 1.1 seconds using our method, compared to 18.8 seconds with FMNet, on an NVIDIA Tesla P100 GPU.
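As a rough back-of-the-envelope illustration of the memory gap, the sketch below compares the storage of a dense pairwise geodesic distance matrix with that of a functional map expressed in a truncated spectral basis. The vertex count, the basis size, and the 4-byte float entries are assumptions made for illustration, not values taken from the experiments.

```python
def dense_matrix_bytes(rows, cols, bytes_per_entry=4):
    """Storage in bytes of a dense rows x cols matrix."""
    return rows * cols * bytes_per_entry


n_vertices = 5000  # hypothetical number of mesh vertices
k_basis = 100      # hypothetical number of spectral basis functions

# Dense geodesic distance matrix: one entry per pair of vertices.
geodesic_bytes = dense_matrix_bytes(n_vertices, n_vertices)
# Functional map: one entry per pair of basis functions.
fmap_bytes = dense_matrix_bytes(k_basis, k_basis)

ratio = geodesic_bytes // fmap_bytes

print(f"geodesic matrix: {geodesic_bytes / 1e6:.1f} MB per shape")
print(f"functional map:  {fmap_bytes / 1e3:.1f} kB per shape pair")
print(f"ratio: {ratio}x")
```

Under these assumptions, the geodesic matrix takes 100 MB per shape, while the spectral representation takes 40 kB, a factor of 2500.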

7 Conclusion & Future Work

We have presented an unsupervised learning-based method for computing correspondences between shapes. Key to our approach is a bilevel optimization formulation, aimed at optimizing descriptor functions while imposing penalties on the structural properties of the entire map, which is obtained via the functional maps framework from the optimized descriptors. This allows us to achieve high-quality, globally consistent correspondences without relying on any externally provided ground-truth maps. Remarkably, our approach achieves similar, and in some cases superior, performance to even supervised correspondence techniques.
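The two levels of this formulation can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names are hypothetical, the descriptors are random stand-ins for learned ones, the inner problem is a plain unregularized least-squares fit, and only two structural penalties (bijectivity and orthogonality, the latter as a proxy for approximate isometry) are shown.

```python
import numpy as np


def functional_map(a_src, b_tgt):
    """Inner problem: least-squares C such that C @ a_src ~ b_tgt.

    a_src and b_tgt are (k, d) matrices of descriptor coefficients
    expressed in the spectral bases of the source and target shapes.
    """
    # Solve a_src.T @ C.T = b_tgt.T for C.T in the least-squares sense.
    c_transposed, *_ = np.linalg.lstsq(a_src.T, b_tgt.T, rcond=None)
    return c_transposed.T


def structural_penalties(c12, c21):
    """Outer objective terms, evaluated on the maps in both directions."""
    eye = np.eye(c12.shape[0])
    # Bijectivity: composing the two maps should give the identity.
    bijectivity = np.sum((c12 @ c21 - eye) ** 2) + np.sum((c21 @ c12 - eye) ** 2)
    # Orthogonality: each map should be close to an orthonormal matrix.
    orthogonality = np.sum((c12.T @ c12 - eye) ** 2) + np.sum((c21.T @ c21 - eye) ** 2)
    return bijectivity, orthogonality


# Toy data: descriptors on shape 2 are an exact rotation of those on shape 1.
rng = np.random.default_rng(0)
k, d = 10, 30
q, _ = np.linalg.qr(rng.standard_normal((k, k)))
a1 = rng.standard_normal((k, d))
a2 = q @ a1

c12 = functional_map(a1, a2)
c21 = functional_map(a2, a1)
bij, orth = structural_penalties(c12, c21)
print("bijectivity penalty:", bij, "orthogonality penalty:", orth)
```

In a learning setting, these penalties would be differentiated with respect to the parameters that produce the descriptors, so that no ground-truth map enters the objective. In the toy example both penalties vanish up to numerical precision, because the descriptors are related by an exact orthogonal transformation.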

In the future, we plan to incorporate other penalties on functional maps, e.g., those arising from recently proposed kernelization approaches [42], or those promoting orientation-preserving maps [29]. Moreover, it might be beneficial to incorporate cycle consistency constraints [14], going beyond the pairwise map consistency used in our method. Finally, it would be interesting to extend our method to partial shapes, to study its performance for non-isometric shape correspondence, and to match other modalities, such as images or point clouds, since it opens the door to linking the properties of local descriptors to global map consistency, expressed through a very general functional framework.

References

  • [1] M. Abadi et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Software available from tensorflow.org, 2015.
  • [2] Y. Aflalo, A. Dubrovina, and R. Kimmel. Spectral generalized multi-dimensional scaling. International Journal of Computer Vision, 118(3):380–392, 2016.
  • [3] D. Anguelov, P. Srinivasan, D. Koller, S. Thrun, J. Rodgers, and J. Davis. SCAPE: Shape Completion and Animation of People. In ACM Transactions on Graphics (TOG), volume 24, pages 408–416. ACM, 2005.
  • [4] S. Biasotti, A. Cerri, A. Bronstein, and M. Bronstein. Recent trends, applications, and perspectives in 3d shape similarity assessment. In Computer Graphics Forum, volume 35, pages 87–119, 2016.
  • [5] F. Bogo, J. Romero, M. Loper, and M. J. Black. FAUST: Dataset and evaluation for 3D mesh registration. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Piscataway, NJ, USA, June 2014. IEEE.
  • [6] D. Boscaini, J. Masci, S. Melzi, M. M. Bronstein, U. Castellani, and P. Vandergheynst. Learning class-specific descriptors for deformable shapes using localized spectral convolutional networks. In Computer Graphics Forum, volume 34, pages 13–23. Wiley Online Library, 2015.
  • [7] D. Boscaini, J. Masci, E. Rodolà, and M. M. Bronstein. Learning shape correspondence with anisotropic convolutional neural networks. In Proc. NIPS, pages 3189–3197, 2016.
  • [8] O. Burghard, A. Dieckmann, and R. Klein. Embedding shapes with Green’s functions for global shape matching. Computers & Graphics, 68:1–10, 2017.
  • [9] E. Corman, M. Ovsjanikov, and A. Chambolle. Supervised descriptor learning for non-rigid shape matching. In Proc. ECCV Workshops (NORDIA), 2014.
  • [10] L. Cosmo, E. Rodolà, J. Masci, A. Torsello, and M. M. Bronstein. Matching deformable objects in clutter. In 3D Vision (3DV), 2016 Fourth International Conference on, pages 1–10. IEEE, 2016.
  • [11] M. Defferrard, X. Bresson, and P. Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pages 3844–3852, 2016.
  • [12] D. Eynard, E. Rodolà, K. Glashoff, and M. M. Bronstein. Coupled functional maps. In 3D Vision (3DV), pages 399–407. IEEE, 2016.
  • [13] D. Ezuz and M. Ben-Chen. Deblurring and denoising of maps between shapes. In Computer Graphics Forum, volume 36, pages 165–174. Wiley Online Library, 2017.
  • [14] Q. Huang, F. Wang, and L. Guibas. Functional map networks for analyzing and exploring large shape collections. ACM Transactions on Graphics (TOG), 33(4):36, 2014.
  • [15] R. Huang and M. Ovsjanikov. Adjoint map representation for shape analysis and matching. In Computer Graphics Forum, volume 36, pages 151–163. Wiley Online Library, 2017.
  • [16] A. Kovnatsky, M. M. Bronstein, X. Bresson, and P. Vandergheynst. Functional correspondence by matrix completion. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 905–914, 2015.
  • [17] A. Kovnatsky, M. M. Bronstein, A. M. Bronstein, K. Glashoff, and R. Kimmel. Coupled quasi-harmonic bases. In Computer Graphics Forum, volume 32, pages 439–448, 2013.
  • [18] A. Kovnatsky, K. Glashoff, and M. M. Bronstein. MADMM: a generic algorithm for non-smooth optimization on manifolds. In Proc. ECCV, pages 680–696. Springer, 2016.
  • [19] O. Litany, T. Remez, E. Rodolà, A. M. Bronstein, and M. M. Bronstein. Deep functional maps: Structured prediction for dense shape correspondence. 2017 IEEE International Conference on Computer Vision (ICCV), pages 5660–5668, 2017.
  • [20] O. Litany, E. Rodolà, A. M. Bronstein, and M. M. Bronstein. Fully spectral partial shape matching. In Computer Graphics Forum, volume 36, pages 247–258. Wiley Online Library, 2017.
  • [21] R. Litman and A. M. Bronstein. Learning spectral descriptors for deformable shape correspondence. IEEE transactions on pattern analysis and machine intelligence, 36(1):171–180, 2014.
  • [22] J. Masci, D. Boscaini, M. Bronstein, and P. Vandergheynst. Geodesic convolutional neural networks on riemannian manifolds. In Proceedings of the IEEE international conference on computer vision workshops, pages 37–45, 2015.
  • [23] F. Monti, D. Boscaini, J. Masci, E. Rodolà, J. Svoboda, and M. M. Bronstein. Geometric deep learning on graphs and manifolds using mixture model cnns. In CVPR, pages 5425–5434. IEEE Computer Society, 2017.
  • [24] D. Nogneng, S. Melzi, E. Rodolà, U. Castellani, M. Bronstein, and M. Ovsjanikov. Improved functional mappings via product preservation. In Computer Graphics Forum, volume 37, pages 179–190. Wiley Online Library, 2018.
  • [25] D. Nogneng and M. Ovsjanikov. Informative descriptor preservation via commutativity for shape matching. Computer Graphics Forum, 36(2):259–267, 2017.
  • [26] M. Ovsjanikov, M. Ben-Chen, J. Solomon, A. Butscher, and L. Guibas. Functional Maps: A Flexible Representation of Maps Between Shapes. ACM Transactions on Graphics (TOG), 31(4):30, 2012.
  • [27] M. Ovsjanikov, E. Corman, M. Bronstein, E. Rodolà, M. Ben-Chen, L. Guibas, F. Chazal, and A. Bronstein. Computing and processing correspondences with functional maps. In ACM SIGGRAPH 2017 Courses, SIGGRAPH ’17, pages 5:1–5:62, 2017.
  • [28] A. Poulenard, P. Skraba, and M. Ovsjanikov. Topological function optimization for continuous shape matching. In Computer Graphics Forum, volume 37, pages 13–25. Wiley Online Library, 2018.
  • [29] J. Ren, A. Poulenard, P. Wonka, and M. Ovsjanikov. Continuous and orientation-preserving correspondences via functional maps. ACM Transactions on Graphics (TOG), 37(6), 2018.
  • [30] E. Rodolà, L. Cosmo, M. M. Bronstein, A. Torsello, and D. Cremers. Partial functional correspondence. In Computer Graphics Forum, volume 36, pages 222–236. Wiley Online Library, 2017.
  • [31] E. Rodolà, M. Moeller, and D. Cremers. Point-wise map recovery and refinement from functional correspondence. In Proc. Vision, Modeling and Visualization (VMV), 2015.
  • [32] E. Rodolà, S. Rota Bulo, T. Windheuser, M. Vestner, and D. Cremers. Dense non-rigid shape correspondence using random forests. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4177–4184, 2014.
  • [33] S. Rosenberg. The Laplacian on a Riemannian manifold: an introduction to analysis on manifolds, volume 31. Cambridge University Press, 1997.
  • [34] R. Rustamov, M. Ovsjanikov, O. Azencot, M. Ben-Chen, F. Chazal, and L. Guibas. Map-based exploration of intrinsic shape differences and variability. ACM Trans. Graphics, 32(4):72:1–72:12, July 2013.
  • [35] R. W. Sumner and J. Popović. Deformation transfer for triangle meshes. In ACM Transactions on Graphics (TOG), volume 23, pages 399–405. ACM, 2004.
  • [36] J. Sun, M. Ovsjanikov, and L. Guibas. A Concise and Provably Informative Multi-Scale Signature Based on Heat Diffusion. In Computer graphics forum, volume 28, pages 1383–1392, 2009.
  • [37] G. K. Tam, Z.-Q. Cheng, Y.-K. Lai, F. C. Langbein, Y. Liu, D. Marshall, R. R. Martin, X.-F. Sun, and P. L. Rosin. Registration of 3d point clouds and meshes: a survey from rigid to nonrigid. IEEE transactions on visualization and computer graphics, 19(7):1199–1217, 2013.
  • [38] F. Tombari, S. Salti, and L. Di Stefano. Unique signatures of histograms for local surface description. In International Conference on Computer Vision (ICCV), pages 356–369, 2010.
  • [39] O. Van Kaick, H. Zhang, G. Hamarneh, and D. Cohen-Or. A survey on shape correspondence. In Computer Graphics Forum, volume 30, pages 1681–1707, 2011.
  • [40] M. Vestner, Z. Lähner, A. Boyarski, O. Litany, R. Slossberg, T. Remez, E. Rodolà, A. Bronstein, M. Bronstein, R. Kimmel, and D. Cremers. Efficient deformable shape correspondence via kernel matching. In Proc. 3DV, 2017.
  • [41] M. Vestner, R. Litman, E. Rodolà, A. Bronstein, and D. Cremers. Product manifold filter: Non-rigid shape correspondence via kernel density estimation in the product space. In Proc. CVPR, pages 6681–6690, 2017.
  • [42] L. Wang, A. Gehre, M. M. Bronstein, and J. Solomon. Kernel functional maps. In Computer Graphics Forum, volume 37, pages 27–36. Wiley Online Library, 2018.
  • [43] Y. Wang, B. Liu, K. Zhou, and Y. Tong. Vector field map representation for near conformal surface correspondence. In Computer Graphics Forum, volume 37, pages 72–83. Wiley Online Library, 2018.
  • [44] L. Wei, Q. Huang, D. Ceylan, E. Vouga, and H. Li. Dense human body correspondences using convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1544–1553, 2016.
  • [45] D.-M. Yan, G. Bao, X. Zhang, and P. Wonka. Low-resolution remeshing using the localized restricted voronoi diagram. IEEE transactions on visualization and computer graphics, 20(10):1418–1427, 2014.

8 Appendix

Figure 7: Comparison of our method with texture transfer on shapes from the FAUST remeshed dataset. Panels, in order: Source, Ground-Truth, FMNet, PMF (heat), PMF (gauss), Ours on subset, Ours all shapes.
Figure 8: Comparison of our method with texture transfer on shapes from the SCAPE remeshed dataset. Panels, in order: Source, Ground-Truth, FMNet, PMF (heat), PMF (gauss), Ours on subset, Ours all shapes.