Convolution is one of the most widely used operators in applied mathematics, computer science and engineering. It is also the most important building block of Convolutional Neural Networks (CNNs), which are the main driving force behind the recent success of deep learning lecun2015deep ; goodfellow2016deep .
In the Euclidean space $\mathbb{R}^n$, the convolution of a function $f$ with a kernel (or filter) $k$ is defined as

$$(f * k)(x) := \int_{\mathbb{R}^n} k(x - y)\, f(y)\, dy. \tag{1}$$

This operation can be easily calculated in Euclidean spaces because the space is shift-invariant, so that the translates of the filter $k$, i.e. $k(x - y)$, are naturally defined. With the rapid development of data science, more and more non-Euclidean data have emerged in various fields, including network data from social science, 3D shapes from medical images and computer graphics, data from recommender systems, etc. Therefore, geometric deep learning, i.e. deep learning of non-Euclidean data, is now a rapidly growing branch of deep learning 7974879 . In this paper, we discuss how to generalize the definition of convolution to (manifold-structured) point clouds so that it inherits desirable properties of the planar convolution and thus enables the design of convolutional neural networks on point clouds.
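As a small illustration of the Euclidean definition in the discrete 1D setting, the sketch below builds the convolution by summing translated copies of the kernel, which is exactly the step that relies on shift-invariance. The function name `conv_by_translates` and the zero-padded "full" output convention are assumptions made for this example only.

```python
import numpy as np

def conv_by_translates(f, k):
    """Full discrete convolution: out[n] = sum_m k[n - m] * f[m]."""
    f = np.asarray(f, dtype=float)
    k = np.asarray(k, dtype=float)
    out = np.zeros(len(f) + len(k) - 1)
    for m, fm in enumerate(f):
        # Add the kernel translated to position m, weighted by f[m].
        out[m:m + len(k)] += fm * k
    return out
```

On a regular grid the translate `k[n - m]` always exists; on a manifold or point cloud there is no such canonical shift, which is the difficulty discussed next.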
One of the main challenges of defining convolution on manifolds and point clouds (a discrete form of manifolds) is to define translation on the non-Euclidean domain. Other than convolutions, we also need to properly define pooling to enable networks to extract global features and to save memory during training. Multiple types of generalization of convolutions on manifolds, graphs and point clouds have been proposed in recent years. We shall recall some of them and discuss the relation between existing definitions of convolutions and the proposed narrow-band parallel transport convolution.
1.1 Related Work of Convolutions on Non-Euclidean Domains
Spectral methods avoid the direct definition of translation by utilizing the convolution theorem: for any two functions $f$ and $g$, $\widehat{f * g} = \hat{f} \cdot \hat{g}$. Therefore, we have $f * g = \mathcal{F}^{-1}(\mathcal{F}(f) \cdot \mathcal{F}(g))$, where $\mathcal{F}$ and $\mathcal{F}^{-1}$ represent the generalized Fourier transform and inverse Fourier transform provided through the associated Laplace-Beltrami (LB) eigensystem on manifolds. To avoid computing convolution through a full eigenvalue decomposition, polynomial approximation has been proposed, which yields convolution as the action of polynomials of the LB operator HAMMOND2011129 ; DONG2017452 . Thus, convolutional neural networks can be designed bruna2014spectral ; NIPS2016_6081 ; 8521593 . Spectral methods, however, suffer from two major drawbacks. First, these methods define convolution in the frequency domain; as a result, the learned filters are not spatially localized. Second, spectral convolutions are domain-dependent, as deformation of the ground manifold changes the corresponding LB eigensystem. This obstructs the transfer of networks learned on one training domain to a new testing domain DBLP:journals/corr/abs-1805-07857 .
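The spectral construction can be sketched on a small graph, where the graph Laplacian of a cycle stands in for the LB operator (an assumption made purely for illustration; the helper names `ring_laplacian` and `spectral_conv` are hypothetical). The filter is specified directly by its coefficients `g_hat` in the eigenbasis.

```python
import numpy as np

def ring_laplacian(n):
    """Combinatorial Laplacian L = D - A of a cycle graph with n nodes."""
    A = np.zeros((n, n))
    idx = np.arange(n)
    A[idx, (idx + 1) % n] = 1.0
    A[(idx + 1) % n, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

def spectral_conv(L, f, g_hat):
    """Filter f by multiplying its Laplacian-eigenbasis coefficients by g_hat."""
    _, U = np.linalg.eigh(L)      # columns of U: orthonormal eigenvectors
    f_hat = U.T @ f               # generalized Fourier transform
    return U @ (g_hat * f_hat)    # inverse transform of the product
```

Because `U` depends on the Laplacian, the same `g_hat` acts differently once the domain is deformed, which is the domain dependence noted above.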
Spatial mesh-based methods are more intuitive and similar to the Euclidean case, and this is one of the reasons why most of the existing works fall into this category. The philosophy behind these methods is that the tangent plane of a -dimensional manifold is embedded to a -dimensional Euclidean domain where convolution can be easily defined. In this paper, we make the first attempt to interpret some of the existing mesh-based methods in a unified framework. We claim that most of the spatial mesh-based methods can be formulated as
Here, is a convolution kernel and with being the size of the kernel. The mapping is defined as
where , . For simplicity, we will denote . Most of the designs of the existing manifold convolutions focused on the designs of
. We remark that possible singularities will lead to no convolution operation at those points. These are isolated points on a closed manifold and do not affect experimental results. In addition, singularities from a given vector field can be overcome using several pairs of vector fields and pooling DBLP:journals/corr/abs-1805-07857 .
is a local interpolation function whose interpolation domain is an isotropic disc for GCNN and an anisotropic ellipse for ACNN. A local geodesic polar coordinate system on a manifold can also be transformed to a 2-dimensional planar coordinate system on its tangent plane. Such a transformation is the mapping which is defined by the inverse exponential map: with being a point in the local geodesic polar coordinate system at with coordinates . With this, we can easily interpret ACNN within the framework of (2). Indeed, ACNN essentially chooses as the directions of the principal curvature at point
. GCNN, on the other hand, avoids choosing a specific vector field on the manifold by taking max-pooling among all possible directions of at each point. Such a definition of convolution, however, ignores the correspondence of the convolution kernels at different locations.
The newly proposed PTC DBLP:journals/corr/abs-1805-07857 defines convolution directly on the manifold, while using tangent planes to transport kernels by a properly chosen parallel transport. PTC can be equivalently cast into the form of (2) using the inverse exponential map, and the proposed parallel transport is implemented by choosing specific vector fields, guided by an Eikonal equation, for transforming vectors along geodesic curves on manifolds.
Spatial point-based methods have wider applications due to their weaker assumptions on the data structure: a point cloud consists of points in a -dimensional Euclidean space with the coordinates of the points as the only available information. Manifolds can be approximated by point clouds via sampling. Computing the $k$-nearest neighbors of a point, or its neighbors within a fixed radius, can easily convert a point cloud to a local graph or mesh. This is why point clouds are simple and flexible, and have attracted much attention lately.
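The neighborhood construction mentioned above can be sketched with a brute-force search. The function name `knn_indices` is hypothetical, and the quadratic-memory distance matrix is only suitable for small clouds; a spatial index would normally replace it.

```python
import numpy as np

def knn_indices(points, k):
    """Indices of the k nearest neighbors of every point (self excluded)."""
    diff = points[:, None, :] - points[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist2, np.inf)          # never pick the point itself
    return np.argsort(dist2, axis=1)[:, :k]
```

Each row of the result lists the local graph edges of one point, which is the only connectivity a point-based method gets to work with.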
There are mainly two types of point-based convolution. The first type is to combine the information of points directly. These methods can be formulated as
where is a neighborhood of and kernel takes different forms in different methods. PointNet Qi_2017_CVPR is an early attempt to extract features on point clouds. PointNet is a network structure without convolution; alternatively, we can interpret the convolution defined by PointNet as having the simplest kernel where is the Kronecker delta. Various later works attempt to improve PointNet by choosing different forms of the kernel . For example, PointNet++ NIPS2017_7095 introduces a max pooling among local points, i.e. choosing the kernel as an indicator function: . Pointcnn NIPS2018_7362 chooses where and are trainable variables with . DGCNN DBLP:journals/corr/abs-1801-07829 proposes an "edge convolution" that can be viewed as fixing and MLP
, where MLP means the Multi-Layer Perceptron.
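To make the first type of point-based operation concrete, the following sketch aggregates features over each point's neighborhood with a channel-wise maximum, in the spirit of the local max pooling attributed to PointNet++ above. It is a simplified reading and not the architecture of any cited method; `local_max_pool` and its argument layout are assumptions.

```python
import numpy as np

def local_max_pool(features, neighbors):
    """Channel-wise max of each point's features over itself and its neighbors.

    features:  (N, C) array, one feature vector per point.
    neighbors: (N, k) integer array of neighbor indices.
    """
    n = features.shape[0]
    idx = np.concatenate([np.arange(n)[:, None], neighbors], axis=1)  # (N, k+1)
    return features[idx].max(axis=1)                                   # (N, C)
```

The result does not depend on the order of the neighbors, but it also ignores where each neighbor sits relative to the center point, which is the information a geometric kernel is meant to use.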
The second type of convolution is defined by first projecting the point cloud locally onto a Euclidean domain and then employing regular convolution. This type of method can also be formulated as (2). For example, Tangent convolutions Tatarchenko_2018_CVPR define kernels on the tangent plane, and use the 2 principal directions of a local PCA as . Pointconv DBLP:journals/corr/abs-1811-07246 constructs local kernels by interpolation in , i.e. letting which is essentially a local Euclidean convolution.
1.2 The Proposed Convolution: NPTC
In this paper, we propose Narrow-Band Parallel Transport Convolution (NPTC), which is a geometric convolution based on a point cloud discretization of a manifold parallel transport defined in a specific way. As we observed in the previous section, convolutions in many methods can be written in the form of (2) and (3), while the differences mostly lie in the choices of the vector field . As observed in DBLP:journals/corr/abs-1805-07857 , when the vector field is chosen properly, the associated convolution can be interpreted as parallel transporting the kernels using the parallel transport associated with the prescribed vector field. For NPTC, we choose as the projections of the tangent directions of a narrow-band-based distance propagation. Both PTC DBLP:journals/corr/abs-1805-07857 and NPTC define geometric convolutions that can be viewed as translating kernels on the manifold in a parallel fashion. The main difference between NPTC and PTC is how the vector field is defined, which leads to a different parallel transport convolution associated with a different connection. The formal definitions of parallel transport and connection will be presented in Section 2, and detailed descriptions of NPTC will be given in Section 3.
Compared with methods that translate kernels directly in Euclidean space, NPTC translates kernels on the tangent planes which effectively avoids having convolution kernels defined away from the underlying manifold of the point cloud. In other words, NPTC can well reflect point cloud geometry and is a natural generalization of planar convolution in the sense that when the point cloud reduces to planar grids, the NPTC reduces to the planar convolution.
We introduce a new point cloud convolution, NPTC, based on parallel transport defined by a narrow-band approximation of the point cloud. The proposed NPTC is a natural generalization of planar convolution.
Based on NPTC, we designed convolutional neural networks, called NPTC-net, for point cloud classification and segmentation with state-of-the-art performance.
2.1 Manifold and Parallel Transport
Let be a dimensional differential manifold embedded in and be the tangent plane at point . can be defined as the -dimensional linear space formed by the span of tangent vectors. The disjoint union of the tangent planes at each point on the manifold defines the tangent bundle . A vector field X is a smooth assignment: . The collection of all smooth vector fields is denoted as .
An affine connection is a bilinear mapping : , such that for all smooth functions and in and all vector fields and on :
A section of a vector field is called parallel along a curve if for . Suppose we are given a vector at . The parallel transport of along is the extension of to a parallel section X on . More precisely, X is the unique section of along
satisfying the ordinary differential equation with the initial value .
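For reference, the textbook form of these definitions is written out below. The symbols ($\nabla$ for the connection, $f, g$ for smooth functions, $X, Y, Z$ for vector fields, $\gamma$ for the curve and $\xi$ for the initial vector) follow common differential geometry notation and are assumed here; they are not recovered from the original displays.

```latex
% Defining properties of an affine connection (standard form; symbols assumed)
\nabla_{fX + gY} Z = f\,\nabla_X Z + g\,\nabla_Y Z, \qquad
\nabla_X (Y + Z) = \nabla_X Y + \nabla_X Z, \qquad
\nabla_X (fY) = X(f)\,Y + f\,\nabla_X Y .

% Parallel transport of \xi along \gamma : [0, 1] \to \mathcal{M}
\nabla_{\dot\gamma(t)} X = 0 \quad \text{for } t \in [0, 1], \qquad
X(\gamma(0)) = \xi .
```

In this notation a curve $\gamma$ is a geodesic of the connection when $\nabla_{\dot\gamma}\dot\gamma = 0$.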
In differential geometry, a geodesic is a curve representing in some sense the shortest path between two points on a surface, or more generally on a Riemannian manifold. It is a generalization of the notion of a "straight line". Formally, a geodesic is if .
For any two points and on , there will be a geodesic connecting and . A geodesic on a smooth manifold with an affine connection is a curve such that parallel transport along the curve preserves the tangent vector to the curve. That is to say, all tangential directions of one geodesic form a parallel vector field.
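The statement that the tangent directions of a geodesic form a parallel field can be checked numerically on a simple example. The sketch below assumes the unit sphere with its Levi-Civita connection, discretized by projecting onto successive tangent planes; `transport_on_sphere` is a hypothetical helper, and the scheme is a first-order approximation in general.

```python
import numpy as np

def transport_on_sphere(curve, v0):
    """Transport tangent vector v0 along `curve` (points on the unit sphere)
    by projecting onto each successive tangent plane and restoring the length."""
    v = np.asarray(v0, dtype=float)
    length = np.linalg.norm(v)
    for p in curve[1:]:
        v = v - np.dot(v, p) * p              # project onto the tangent plane at p
        v = v * (length / np.linalg.norm(v))  # keep the original length
    return v

# A quarter of the equator, which is a geodesic (great circle) of the sphere.
t = np.linspace(0.0, np.pi / 2, 200)
equator = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
v_end = transport_on_sphere(equator, [0.0, 1.0, 0.0])   # initial tangent of the curve
```

Here `v_end` coincides with the tangent of the equator at its end point, while a vector pointing along the z-axis is carried along unchanged.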
For convenience, instead of transporting the kernel on the manifold, we can locally construct the parallel transported kernel at every point by formulating as , where is defined in (3). It is known in differential geometry that transporting the kernel to every point on the manifold is the same as locally reconstructing the kernel in the aforementioned way.
2.2 The Eikonal Equation
The proposed NPTC relies on the computation of a distance function on a narrow-band voxel-based approximation of the given point cloud. Therefore, we review how distance functions are calculated.
Distance functions can be easily computed by solving the Eikonal equation. The Eikonal equation is a non-linear partial differential equation describing wave propagation:
where and is a strictly positive function. The solution of (6) can be viewed as the shortest time needed to travel from to with being the speed of the wave at . For the special case when , the solution represents the distance from to limited in the . The Eikonal equation can be solved by the fast marching method sethian1996fast .
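A minimal sketch of the fast marching idea on a 2D regular grid is given below, assuming the commonly used form $|\nabla D(x)| = 1/F(x)$ with $D(x_0) = 0$ and unit speed $F \equiv 1$ (the symbols $D$, $F$, $x_0$ are assumed notation). The boolean `mask` plays the role of a narrow band restricting where the front may travel; the function names and the first-order update are illustrative choices, not the implementation used in the paper.

```python
import heapq
import numpy as np

def _upwind(D, done, p, q, axis):
    """Smallest accepted neighbor value of cell (p, q) along one grid axis."""
    best = np.inf
    for s in (-1, 1):
        a, b = (p + s, q) if axis == 0 else (p, q + s)
        if 0 <= a < D.shape[0] and 0 <= b < D.shape[1] and done[a, b]:
            best = min(best, D[a, b])
    return best

def fast_marching(mask, source, h=1.0):
    """First-order fast marching for |grad D| = 1 restricted to mask == True."""
    D = np.full(mask.shape, np.inf)
    done = np.zeros(mask.shape, dtype=bool)
    D[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        _, (i, j) = heapq.heappop(heap)
        if done[i, j]:
            continue                      # stale heap entry
        done[i, j] = True
        for p, q in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            inside = 0 <= p < mask.shape[0] and 0 <= q < mask.shape[1]
            if not inside or done[p, q] or not mask[p, q]:
                continue
            a = _upwind(D, done, p, q, 0)
            b = _upwind(D, done, p, q, 1)
            lo, hi = min(a, b), max(a, b)
            if hi - lo >= h:              # only one axis contributes
                new = lo + h
            else:                         # two-sided quadratic update
                new = 0.5 * (lo + hi + np.sqrt(2.0 * h * h - (hi - lo) ** 2))
            if new < D[p, q]:
                D[p, q] = new
                heapq.heappush(heap, (new, (p, q)))
    return D
```

Cells outside the mask keep the value `inf`, so the computed distance measures paths that stay inside the band rather than straight-line distance.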
3 Narrowband Parallel Transport Convolution (NPTC) and Network Design
Generalization of convolution defined by parallel transport on triangulated surfaces has already been proposed in DBLP:journals/corr/abs-1805-07857 . In this section, we discuss how to transport kernels on point clouds in a similar fashion.
3.1 Narrowband Parallel Transport Convolution (NPTC)
Given a function , the NPTC of with kernel takes the same form as (2). Under such formulation, the key to design a convolution is to design vector fields , . In this subsection, we discuss the general idea of NPTC and the interpretation of it in terms of parallel transport.
3.1.1 General Idea of NPTC
To select a suitable vector field, we first recall the choice of the vector fields of PTC, which defines convolution on triangulated surfaces via parallel transport with respect to the Levi-Civita connection DBLP:journals/corr/abs-1805-07857 . A geodesic curve represents, in some sense, the shortest path between two points on a Riemannian manifold. Given a geodesic connecting two points and , the tangential direction at corresponds to the ascent direction of the geodesic distance from . PTC chooses such a direction as and defines with the normal vector at .
If we want to construct a vector field on point cloud, gradients of some sort of distance function can be a good choice. However, unlike triangulated surfaces, distance function is not easily defined on point clouds due to the lack of connectivity. It is then natural to approximate the point cloud with another data structure with connectivity, so that distance function can be easily calculated. We use voxelization Wu_2015_CVPR to approximate the point cloud in a narrow-band in covering the point cloud. We denote such distance function as . We will elaborate how can be calculated in later parts of this subsection.
Note that if the point cloud is sampled from a plane, the narrow-band is flat as well. Then, by a proper choice of the distance function, the vector fields , can be reduced to the global coordinate on the plane. This means that NPTC is reduced to the traditional planar convolution.
Once the distance function is computed, we choose , where is a projection of on an approximated tangent plane at . Then, can be calculated by the outer product with the normal vector at . The value is computed by where is the closest point to . Note that one may use a more sophisticated method to compute rather than using the closest point interpolation. We choose the closest point interpolation because of its simplicity.
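The closest-point evaluation described above can be sketched as follows. This example assumes a flat stand-in for the mapping in (3), namely that a kernel offset $(v_1, v_2)$ at a point $x$ is sent to $x + v_1 u^1_x + v_2 u^2_x$, and it uses a brute-force nearest-neighbor search; the function name, the square odd-sized kernel and the `spacing` parameter are all assumptions for illustration.

```python
import numpy as np

def frame_convolution(points, f, u1, u2, kernel, spacing=1.0):
    """out[x] = sum_v kernel[v] * f(closest cloud point to x + v1*u1[x] + v2*u2[x])."""
    r = kernel.shape[0] // 2                   # kernel assumed square, odd size
    out = np.zeros(len(points))
    for a in range(kernel.shape[0]):
        for b in range(kernel.shape[1]):
            v1, v2 = (a - r) * spacing, (b - r) * spacing
            targets = points + v1 * u1 + v2 * u2              # (N, 3)
            d2 = ((targets[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
            out += kernel[a, b] * f[d2.argmin(axis=1)]        # closest-point lookup
    return out
```

When the points form a planar grid and the two vector fields are the global coordinate directions, the interior values agree with the usual (unflipped) planar convolution used in CNNs, in line with the reduction noted above.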
We finally note that NPTC defined in the aforementioned way indeed defines a certain parallel transport convolution. In fact, given smooth vector fields
, we can define linear transformations among tangent planes, and then the corresponding parallel transport through the associated infinitesimal connection can be induced knebelman1951spaces . Therefore, convolution defined by NPTC can be viewed as parallel transporting the kernels on the manifold with respect to the connection that is reconstructed from the vector field .
3.1.2 Computing Distance Function on Point Clouds
For simplicity, we consider point clouds in in this subsection, though the arguments are also valid in . A point cloud is entirely discrete without inherent connectivity. Therefore, it is not straightforward to compute distance functions on point clouds, although the local mesh method lai2013local can be applied to solve the Eikonal equation. For simplicity, we use voxels to approximate point clouds and compute distance functions on the voxels using the well-known fast marching method sethian1996fast on the regular grid provided by the voxelization. Note that using voxels to compute distance functions is fast and robust to noise and local deformations.
The solution of the Eikonal equation represents the distance from to limited inside the narrow-band. Here is chosen as a certain point on the point cloud.
Although generating multiple vector fields by selecting different starting points is helpful for eliminating singularities, experiments show that directly selecting one point as the initial point already provides satisfactory results. Finally, we interpolate the distance function from the voxels to the point cloud.
3.1.3 Computing the Vector Fields on Point Clouds
We first compute the tangent plane at each point. Tangent planes are important features of manifolds and have been well-studied in the literature lai2013local using the covariance matrix
where is the set of neighboring points of
. The eigenvectors of the covariance matrix form an orthogonal basis. If the point cloud is sampled from a two dimensional manifold, and the local sampling is dense enough to resolve local features, the eigenvectors corresponding to the largest two eigenvalues provide the two orthogonal directions of the tangent plane, and the remaining vector represents the normal direction at . Here, we denote the space spanned by the two eigenvectors of the covariance matrix at as .
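A minimal sketch of this local PCA is shown below. It assumes the covariance is centered at the mean of the neighborhood (the exact centering of the formula is not recoverable from the text), and `tangent_frame` is a hypothetical helper name.

```python
import numpy as np

def tangent_frame(neighbors):
    """Return (tangent basis (2, 3), unit normal (3,)) from a (k, 3) neighborhood."""
    centered = neighbors - neighbors.mean(axis=0)
    cov = centered.T @ centered                 # 3 x 3 local covariance matrix
    _, vecs = np.linalg.eigh(cov)               # eigenvalues in ascending order
    normal = vecs[:, 0]                         # smallest eigenvalue: normal
    tangent = vecs[:, [2, 1]].T                 # two largest: tangent directions
    return tangent, normal
```

The sign of the returned normal is arbitrary, so a consistent orientation has to be fixed separately if it matters downstream.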
With the computed distance function , it is natural to define the vector field by projecting on the approximated tangent planes of the point cloud. Given a point close enough to , we have
where and are known. If we consider -nearest neighbors of , we have equations with 3 unknowns that are the three components of . We can use least squares to find . We then project the vector onto the tangent plane at . We denote the projected vector , which is the vector we eventually need to define NPTC as described in Section 3.1.1.
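The least-squares step and the projection can be sketched as below. The example assumes the first-order relation $D(y) - D(x) \approx \nabla D(x) \cdot (y - x)$ for each neighbor $y$, and it reads the "outer product with the normal vector" of Section 3.1.1 as the vector cross product; both readings, as well as the name `projected_gradient`, are assumptions.

```python
import numpy as np

def projected_gradient(x, d_x, nbrs, d_nbrs, normal):
    """Least-squares gradient of D at x, projected onto the plane orthogonal to `normal`."""
    A = nbrs - x                                 # (k, 3) offsets y - x
    b = d_nbrs - d_x                             # (k,) differences D(y) - D(x)
    grad, *_ = np.linalg.lstsq(A, b, rcond=None)
    n = normal / np.linalg.norm(normal)
    u1 = grad - np.dot(grad, n) * n              # tangential part of the gradient
    u2 = np.cross(n, u1)                         # second direction, orthogonal to u1 and n
    return u1, u2
```

With fewer than three non-coplanar offsets the system is rank deficient and `lstsq` returns the minimum-norm solution, so neighborhoods should be large enough to avoid that case.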
3.2 NPTC-net: Architecture Design for Classification and Segmentation
In this section, we present how to use NPTC to design convolutional neural networks on point clouds for classification and segmentation tasks. For that, other than the NPTC, we need to define some other operations that are frequently used in neural networks. Note that some point-wise operations like MLP and ReLU are the same on point clouds as in the Euclidean case. Here, we only focus on the operations that are not readily defined on point clouds.
Down-sampling: In our implementation, the sub-sampled set of points of the next layer is generated by farthest point sampling eldar1997farthest .
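Farthest point sampling can be sketched in a few lines as a greedy loop. The function name and the choice of the starting index are assumptions of this example.

```python
import numpy as np

def farthest_point_sampling(points, m, start=0):
    """Greedily pick m indices, each maximizing the distance to those already chosen."""
    chosen = [start]
    dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(m - 1):
        nxt = int(dist.argmax())
        chosen.append(nxt)
        # Distance of every point to its nearest chosen point so far.
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)
```

Each step costs one pass over the cloud, so selecting `m` points from `N` takes on the order of `m * N` distance evaluations.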
Convolution layer: Our -th convolution layer takes points and their corresponding feature maps as input, where is the number of the points and is the number of channels at layer . The corresponding output lives on the points . The NPTC-net has encoding and decoding stages. Normally, during encoding and during decoding. Convolution at the -th layer is only performed on the point set
, which resembles convolution with stride for planar convolutions.
Residual block: One residual block takes the feature maps on the point set as input and outputs features on the same number of points with the same number of channels. One residual layer consists of three components: an MLP from channels to channels, a convolution layer from channels to channels, and an MLP from channels to channels plus the feature maps from the bypass connection. A residual block consists of several residual layers.
NPTC-net consists of the aforementioned operations and its architecture is given in Figure 3. The left half of the NPTC-net is the encoder part of the network for feature extraction. For classification, features at the bottom of the network are directly attached to a classification network; while for segmentation, features are decoded using the right half of the NPTC-net (decoder part of the network) to output the segmentation map.
We implement the model with TensorFlow using the SGD optimizer with an initial learning rate for ModelNet40 and the ADAM optimizer with an initial learning rate for ShapeNet Part on a GTX TITAN Xp GPU. For each experiment, we report the max and average final test accuracy of 3 runs. The average accuracy is put in the brackets.
|ours||92.7 (92.6)||90.2 (89.7)||85.8 (85.6)||83.3 (82.9)|
4.1 Classification on ModelNet40
We test the NPTC-net on ModelNet40 for classification tasks. ModelNet40 contains 12,311 CAD models from 40 categories with 9,842 samples for training and 2,468 samples for testing. For comparison, we use the data provided by Qi_2017_CVPR sampling 2,048 points uniformly and computing the normal vectors from the mesh. During the training procedure, the data is augmented by random rotation, scaling and Gaussian perturbation on the coordinates of the points.
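The augmentation described above might look like the following sketch. The rotation axis (here the z-axis), the scaling range and the noise level are hypothetical values chosen for illustration; the text does not specify them.

```python
import numpy as np

def augment(points, rng, scale_range=(0.8, 1.25), sigma=0.01):
    """Random rotation about the z-axis, isotropic scaling, and Gaussian jitter."""
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    scale = rng.uniform(*scale_range)
    noise = sigma * rng.standard_normal(points.shape)   # per-coordinate jitter
    return scale * points @ R.T + noise
```

If normal vectors are used as input features, they would need the same rotation (but not the scaling or jitter) to stay consistent with the transformed coordinates.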
As shown in Table 1, our networks outperform other state-of-the-art methods. (If a compared method has results on both 2048 (or 1024) and 5000 points, we only compare with the former.)
|method||pointnet Qi_2017_CVPR||pointnet++ NIPS2017_7095||DGCNN DBLP:journals/corr/abs-1801-07829||pointcnn NIPS2018_7362||ours|
As shown in Table 2, we summarize our running statistics based on the model for ModelNet40 with batch size 16. In comparison with several other methods, although we use a ResNet structure, the fewer channels, smaller kernels and simpler (nearest-neighbor) interpolation make NPTC use a similar number of parameters and even fewer FLOPs.
4.2 Part Segmentation on ShapeNet
We evaluate the NPTC-net on ShapeNet Part for segmentation tasks. ShapeNet Part contains 16,680 models from 16 shape categories with 14,006 for training and 2,874 for testing, each annotated with 2 to 6 parts and there are 50 different parts in total. During the training procedure, the data is augmented by scaling and Gaussian perturbation on the coordinates of the points.
We follow the experiment setup of previous works, putting object category into networks as known information. We use point intersection-over-union (IoU) to evaluate our NPTC-net. Table 1 shows that our model ranks second on this dataset and is fairly close to the best known result.
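For concreteness, a per-shape point IoU can be computed as in the sketch below. It assumes the convention, common in ShapeNet Part evaluations, that a part absent from both the prediction and the ground truth counts as IoU 1; the function name and this convention are assumptions and may differ from the evaluation script actually used.

```python
import numpy as np

def shape_iou(pred, gt, part_ids):
    """Mean IoU over the parts of one shape; a part absent from both counts as 1."""
    ious = []
    for part in part_ids:
        inter = np.sum((pred == part) & (gt == part))
        union = np.sum((pred == part) | (gt == part))
        ious.append(1.0 if union == 0 else inter / union)
    return float(np.mean(ious))
```

Here `part_ids` lists the part labels of the shape's category, which is available because the object category is given to the network as known information.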
To visualize the effects of the proposed NPTC in the NPTC-net, we trained the network on ShapeNet Part and visualized the learned features by coloring the points according to their level of activation. In Figure 4, filters from the first convolution layer in the first residual block and the final convolution layer in the second residual block are chosen. In order to easily compare the features at different levels, we interpolate them on the input point cloud. Observe that low-level features mostly represent simple structures like edges (top of (a)) and planes (bottom of (a)) with low variation in their magnitudes. In deeper layers, features are richer and more distinct from each other, like bottleneck (upper left of (b)), "big-head" (upper right of (b)), plane base (lower left of (b)), and bulge (lower right of (b)).
This paper proposed a new way of defining convolution on point clouds, called the narrow-band parallel transport convolution (NPTC), based on a point cloud discretization of a manifold parallel transport. The parallel transport was defined specifically by a vector field generated by the gradient field of a distance function on a narrow-band approximation of the point cloud. The NPTC was used to design a convolutional neural network (NPTC-net) for point cloud classification and segmentation. Comparisons with state-of-the-art methods indicated that the proposed NPTC-net is competitive with the best existing methods.
- (1) Stefan C. Schonsheck, Bin Dong, and Rongjie Lai. Parallel transport convolution: A new tool for convolutional neural networks on manifolds. CoRR, abs/1805.07857, 2018.
- (2) Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. nature, 521(7553):436, 2015.
- (3) Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT press, 2016.
- (4) M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst. Geometric deep learning: Going beyond euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, July 2017.
- (5) David K. Hammond, Pierre Vandergheynst, and Rémi Gribonval. Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis, 30(2):129 – 150, 2011.
- (6) Bin Dong. Sparse representation on graphs by tight wavelet frames and applications. Applied and Computational Harmonic Analysis, 42(3):452 – 479, 2017.
- (7) Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann Lecun. Spectral networks and locally connected networks on graphs. In International Conference on Learning Representations (ICLR2014), CBLS, April 2014, 2014.
- (8) Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 3844–3852. Curran Associates, Inc., 2016.
- (9) R. Levie, F. Monti, X. Bresson, and M. M. Bronstein. Cayleynets: Graph convolutional neural networks with complex rational spectral filters. IEEE Transactions on Signal Processing, 67(1):97–109, Jan 2019.
- (10) Jonathan Masci, Davide Boscaini, Michael M. Bronstein, and Pierre Vandergheynst. Geodesic convolutional neural networks on riemannian manifolds. In The IEEE International Conference on Computer Vision (ICCV) Workshops, December 2015.
- (11) Davide Boscaini, Jonathan Masci, Emanuele Rodolà, and Michael Bronstein. Learning shape correspondence with anisotropic convolutional neural networks. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 3189–3197. Curran Associates, Inc., 2016.
- (12) Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
- (13) Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 5099–5108. Curran Associates, Inc., 2017.
- (14) Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Xinhan Di, and Baoquan Chen. Pointcnn: Convolution on x-transformed points. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 820–830. Curran Associates, Inc., 2018.
- (15) Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E. Sarma, Michael M. Bronstein, and Justin M. Solomon. Dynamic graph CNN for learning on point clouds. CoRR, abs/1801.07829, 2018.
- (16) Maxim Tatarchenko, Jaesik Park, Vladlen Koltun, and Qian-Yi Zhou. Tangent convolutions for dense prediction in 3d. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
- (17) Wenxuan Wu, Zhongang Qi, and Fuxin Li. Pointconv: Deep convolutional networks on 3d point clouds. CoRR, abs/1811.07246, 2018.
- (18) James A Sethian. A fast marching level set method for monotonically advancing fronts. Proceedings of the National Academy of Sciences, 93(4):1591–1595, 1996.
- (19) Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3d shapenets: A deep representation for volumetric shapes. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
- (20) MS Knebelman. Spaces of relative parallelism. Annals of Mathematics, pages 387–399, 1951.
- (21) Rongjie Lai, Jiang Liang, and Hongkai Zhao. A local mesh method for solving pdes on point clouds. Inverse Prob. and Imaging, 7(3):737–755, 2013.
- (22) Yuval Eldar, Michael Lindenbaum, Moshe Porat, and Yehoshua Y Zeevi. The farthest point strategy for progressive image sampling. IEEE Transactions on Image Processing, 6(9):1305–1315, 1997.
- (23) Li Yi, Vladimir G. Kim, Duygu Ceylan, I-Chao Shen, Mengyan Yan, Hao Su, Cewu Lu, Qixing Huang, Alla Sheffer, and Leonidas Guibas. A scalable active framework for region annotation in 3d shape collections. ACM Trans. Graph., 35(6):210:1–210:12, November 2016.
- (24) R. Klokov and V. Lempitsky. Escape from cells: Deep kd-networks for the recognition of 3d point cloud models. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 863–872, Los Alamitos, CA, USA, oct 2017. IEEE Computer Society.
- (25) Yizhak Ben-Shabat, Michael Lindenbaum, and Anath Fischer. 3d point cloud classification and segmentation using 3d modified fisher vector representation for convolutional neural networks. CoRR, abs/1711.08241, 2017.
- (26) Jiaxin Li, Ben M. Chen, and Gim Hee Lee. So-net: Self-organizing network for point cloud analysis. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
- (27) Chu Wang, Babak Samari, and Kaleem Siddiqi. Local spectral graph convolution for point set feature learning. In The European Conference on Computer Vision (ECCV), September 2018.
- (28) Yifan Xu, Tianqi Fan, Mingye Xu, Long Zeng, and Yu Qiao. Spidercnn: Deep learning on point sets with parameterized convolutional filters. In The European Conference on Computer Vision (ECCV), September 2018.