Analysis and classification of 3D point clouds is an important problem in computer vision and graphics, due to its wide applications in robot manipulation, autonomous driving, etc. The challenge of this problem comes from several aspects. Firstly, point clouds are sparsely sampled from 3D surfaces in an irregular and unordered way. Secondly, point clouds usually undergo large geometric transformations and deformations. It is therefore important to achieve robustness to transformation and permutation when analyzing and classifying 3D point clouds.
One direction aims to represent the irregular 3D point cloud using regular data, so that classical convolutional neural networks can be used to process it. Two of the most common regular representations are voxels and multi-view images. However, both representations have limitations. A dense voxel representation is inefficient due to the sparsity of input point clouds, while multi-view images may lose the 3D structure of the points and suffer from occlusion.
Another direction focuses on designing convolution operations for irregular points, inspired by the prominent success of CNNs on regular grid data such as audio and images. PointNet learns a spatial encoding of each point directly in Euclidean space and aggregates all individual point features by max pooling to obtain a global point cloud signature. Max pooling, as a symmetric operation, provides permutation invariance. By its design, PointNet does not directly capture local geometry, which is indispensable to the description of 3D shape. Other works [17, 29] mainly utilize group operations (k-nearest-neighbor grouping or ball-region grouping) to identify local points for convolution. But these group operations only focus on the local neighborhood region in Euclidean space. Despite the discreteness and irregularity of point clouds, these operations mainly account for the local structure around each point and are not efficient at capturing holistic geometric information from distant points. The holistic geometry not only provides discriminative cues for classification but also helps to achieve robustness to transformation. In addition, points with similar geometric structures can be far away from each other in Euclidean space. The previous works mentioned above largely neglect the geometric relationships among these distant points.
Inspired by the above analysis, this paper proposes the Geometry Sharing Network (GS-Net), which aggregates features in both Euclidean space and Eigenvalue space. GS-Net exploits an Eigen-Graph to calculate the structure tensor for measuring local geometric properties of input points, which further allows us to identify points that have similar local structures but are distant in Euclidean space. We prove that the eigenvalues of these structure tensors are invariant to translation and rotation and yield rich local structural information. As shown in Figure 1, given an anchor point for convolution, GS-Net identifies a group of neighbor points in Euclidean space and another group of points with similar local structures (Eigen-Graph features). Convolutions are then performed on both groups to capture local and holistic geometric representations separately. The convolutional features from both groups are integrated for classification or segmentation. We conduct extensive experiments to examine the proposed method. Our method achieves state-of-the-art performance on ModelNet40 classification (93.3%) and shows more robustness to geometric transformations than previous methods.
The main contributions of this paper are summarized as follows.
We propose a novel Geometry Similarity Connection (GSC) module which exploits an Eigen-Graph to group distant points with similar and relevant geometric information, and which aggregates features from neighbors in both Euclidean space and Eigenvalue space, capturing local and holistic geometric information more efficiently.
We introduce the 3D structure tensor and the Eigen-Graph to capture the geometric features of points. Theoretically, we prove that these features are invariant to translation and rotation.
Our GS-Net achieves state-of-the-art performance on two major datasets, ModelNet40 and ShapeNet Part. Moreover, the GSC module can be integrated into different existing pipelines for point cloud analysis.
2 Related Work
2.1 Deep Learning on Point Cloud Analysis
Deep neural networks have enjoyed remarkable success on various vision tasks; however, it remains challenging to apply CNNs to domains lacking a regular structure, such as 3D point clouds. The challenges include: (1) local and holistic geometric information representation; (2) permutation invariance; (3) rotation and translation invariance. Not all networks address these problems fully.
PointNet and DeepSet are pioneering architectures that directly process point clouds. The basic idea is to learn a spatial encoding of each point and then aggregate all individual point features into a holistic signature. By this design, however, relations between points are not sufficiently captured. To remedy this, PointNet++ partitions the point cloud into overlapping local regions using the distance metric of the underlying space and extracts local features that capture fine geometric structures from neighbors, but it still considers every point in its local region independently. In our method, we address this issue by defining a convolution block that groups the features from the neighbors in Euclidean space and Eigenvalue space.
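The claim that a shared per-point encoding followed by a symmetric aggregation is permutation invariant can be checked numerically. The following is a minimal sketch, not PointNet itself: the one-layer encoder, the feature width of 16 and the function name `global_signature` are illustrative assumptions.

```python
import numpy as np

def global_signature(points, weight, bias):
    """Encode each point independently, then max-pool over the point axis."""
    # Shared one-layer encoder with ReLU (a stand-in for a deeper per-point MLP).
    per_point = np.maximum(points @ weight + bias, 0.0)
    # Max pooling is symmetric: it does not depend on the order of the rows.
    return per_point.max(axis=0)

rng = np.random.default_rng(0)
cloud = rng.normal(size=(128, 3))
weight = rng.normal(size=(3, 16))
bias = rng.normal(size=16)

shuffled = cloud[rng.permutation(len(cloud))]
sig_a = global_signature(cloud, weight, bias)
sig_b = global_signature(shuffled, weight, bias)
print(np.allclose(sig_a, sig_b))
```

Because every point is encoded on its own, the sketch also makes the limitation visible: no term in `per_point` depends on a neighboring point.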
DGCNN captures local geometric structure while maintaining permutation invariance, and reconstructs the $k$-nn graph using nearest neighbors in the feature space produced by each layer. Unlike DGCNN, our method does not use a dynamic strategy; instead, we apply eigen-decomposition to choose the nearest neighbors and share local features among distant points with similar geometric information.
One prior work directly uses eigenvalues and functions of eigenvalues as features in a deep-learning setting; another adds eigenvalues to its shape descriptors. Our method aggregates features from nearest neighbors in Eigenvalue space in order to capture holistic geometric information, and it also uses eigenvalues as features in our network settings.
2.2 Classical Geometric Representation
The local geometry of a point cloud is estimated from the distribution of points in the neighborhood. The dimensionality-based scale selection method for 3D lidar point clouds aims at finding the optimal neighborhood radius for each point, working directly and exclusively in the 3D domain, without relying on surface descriptors or structures. First, three dimensionality features are computed for each point, between predefined minimal and maximal neighborhood scales. These features are computed exhaustively, at each point and for each accepted neighborhood scale, from the local covariance matrix. Various geometrical features can be derived from the eigenvalues of the covariance matrix. The three dimensionality features describe linear, planar, and scattered structure, respectively. In GS-Net, we use operations on eigenvalues to improve robustness to rotation and translation. Moreover, the Eigen-Graph (Sec 3.2) enhances the representation of local geometry.
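To make the three dimensionality features concrete, here is a minimal sketch of one common formulation based on the square roots of the covariance eigenvalues. The exact formulas are an assumption, since the symbols are not given in this section, and the function name `dimensionality_features` is illustrative.

```python
import numpy as np

def dimensionality_features(neighborhood):
    """Return (linear, planar, scatter) for one local neighborhood of shape (n, 3)."""
    centered = neighborhood - neighborhood.mean(axis=0)
    cov = centered.T @ centered / len(neighborhood)
    # eigvalsh returns ascending eigenvalues; reverse to get lambda1 >= lambda2 >= lambda3.
    eigvals = np.linalg.eigvalsh(cov)[::-1]
    # Clip tiny negative values caused by round-off before taking square roots.
    sigma = np.sqrt(np.clip(eigvals, 0.0, None))
    # Assumed formulation; undefined if all points coincide (sigma[0] == 0).
    linear = (sigma[0] - sigma[1]) / sigma[0]
    planar = (sigma[1] - sigma[2]) / sigma[0]
    scatter = sigma[2] / sigma[0]
    return linear, planar, scatter

t = np.linspace(0.0, 1.0, 50)
line = np.stack([t, 2.0 * t, -t], axis=1)          # points on a straight line
rng = np.random.default_rng(0)
plane = np.concatenate([rng.uniform(size=(500, 2)), np.zeros((500, 1))], axis=1)

line_feats = dimensionality_features(line)
plane_feats = dimensionality_features(plane)
print(line_feats, plane_feats)
```

Under this formulation the three values sum to one, so they can be read as soft memberships in the linear, planar and scattered classes.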
2.3 Rotation Invariance for Point Cloud Analysis
In comparison to permutation invariance, rotation invariance is a more challenging problem. Previous works have dealt with invariance or equivariance under particular input transformations. PointNet and PointNet++ guarantee permutation invariance through a symmetric pooling operator, and PointNet employs a complex and computationally intensive spatial transformer network to learn 3D alignment. PCPNet also uses a learned transformer block, but these networks (including [25, 29]) are not rotation invariant. Another approach upgrades an existing neural network with the rotation invariance property by designing a special convolutional operation as a basic block of the network. But it causes a loss of information, as there is no bijection between $\mathbb{R}^3$ and the 2-dimensional sphere. In our method, the Eigen-Graph addresses rotation invariance naturally through the eigen-decomposition of the 3D structure tensor.
As shown in Figure 2, we consider an $F$-dimensional point cloud with $N$ points, denoted by $X = \{x_1, \dots, x_N\} \subseteq \mathbb{R}^F$. Usually, each point of the point cloud contains 3D coordinates $x_i = (x_i, y_i, z_i)$, which means that $F = 3$; it is also possible to include other coordinates representing RGB information, normal vectors, and so on. In our network architecture, we use a hierarchical structure to learn local and holistic features of the point cloud. On each level, we use the Geometry Similarity Connection (GSC) module (Sec 3.2) to capture abundant local geometric information of each point and share geometric features with distant points. After that, we adopt the FPS algorithm to down-sample the points and the features (Sec 3.4). Low-level features represent the local geometric information, while high-level features provide semantic information.
For the classification task, instead of using only the last level's features as the encoder's output, we concatenate the features of all levels and extract the holistic features by global max pooling and global average pooling. The concatenation fuses the features from different levels, and the pooling operators aim to capture the most effective features for classification. We then pass the holistic features through fully-connected layers with integrated dropout to calculate the probability of each category. The cross-entropy loss is used for training.
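The classification head described above can be sketched as follows. This is a simplified illustration under stated assumptions: the pooling is applied per level before concatenation, the channel widths (64, 128, 256) are hypothetical, a single linear layer stands in for the fully-connected layers, and dropout is omitted.

```python
import numpy as np

def holistic_features(level_features):
    """Global max and average pooling per level, concatenated into one vector."""
    pooled = []
    for feats in level_features:          # feats: (points_at_level, channels)
        pooled.append(feats.max(axis=0))  # global max pooling
        pooled.append(feats.mean(axis=0)) # global average pooling
    return np.concatenate(pooled)

def softmax_cross_entropy(logits, label):
    """Numerically stable cross-entropy of softmax(logits) against one label."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[label])

rng = np.random.default_rng(0)
# Three levels with 1024, 512 and 256 points; channel widths are assumptions.
levels = [rng.normal(size=(1024, 64)),
          rng.normal(size=(512, 128)),
          rng.normal(size=(256, 256))]
signature = holistic_features(levels)
weight = rng.normal(scale=0.01, size=(signature.size, 40))  # 40 ModelNet40 classes
loss = softmax_cross_entropy(signature @ weight, label=3)
print(signature.shape, loss)
```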
For the segmentation task, our segmentation network uses the same encoder as the classification network. We interpolate the features on each level of the encoder module and then concatenate them. We also concatenate a repeated one-hot category label to the features before the MLP. This mechanism is designed to apply category supervision to the point-wise segmentation.
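Concatenating a repeated one-hot category label to per-point features can be illustrated with a short sketch. The function name `append_category_label` and the feature width of 64 are assumptions made for the example only.

```python
import numpy as np

def append_category_label(point_features, category, num_categories):
    """Tile a one-hot shape-category vector and append it to every point's features."""
    one_hot = np.zeros(num_categories, dtype=point_features.dtype)
    one_hot[category] = 1.0
    repeated = np.tile(one_hot, (point_features.shape[0], 1))  # (N, num_categories)
    return np.concatenate([point_features, repeated], axis=1)

rng = np.random.default_rng(0)
features = rng.normal(size=(2048, 64))   # hypothetical per-point features
labelled = append_category_label(features, category=4, num_categories=16)
print(labelled.shape)
```

Every point of a shape receives the same label vector, so the point-wise MLP that follows can condition its part predictions on the shape category.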
3.2 Geometry Similarity Connection Module
Eigen-Graph. As shown in Figure 4, we use the $k$-nearest neighbors (KNN) search algorithm to get the $k$ nearest neighbors of each point in Euclidean space. Let $\{x_{i1}, \dots, x_{ik}\}$ be the $k$ nearest neighbors of $x_i$. Let $M_i = (x_{i1} - x_i, \dots, x_{ik} - x_i)$, where each $x_{ij}$ belongs to the $k$ nearest neighbors of $x_i$ in Euclidean space.
We define the 3D structure tensor as $C_i = M_i M_i^\top$. Even if the ground-truth surface is locally flat, noisy points make the point cloud sampled from the surface non-flat. As long as the neighbor region of the given point is not flat, $C_i$ is a symmetric positive definite matrix. We have the decomposition $C_i = U \Lambda U^\top$, where $U$ is a rotation matrix and $\Lambda$ is a diagonal, positive definite matrix, known as the eigenvector and eigenvalue matrices respectively. The positive eigenvalues are ordered so that $\lambda_1 \ge \lambda_2 \ge \lambda_3 > 0$. At each point $x_i$, we get the 3D structure tensor $C_i$ and denote the eigenvalues at point $x_i$ by $\lambda^i = (\lambda_1^i, \lambda_2^i, \lambda_3^i)$. We use the $L_2$ norm to calculate the distances between different points.
We choose the indices of the $k$ nearest neighbors of each point according to the Eigen Matrix, whose element is $d_{ij} = \lVert \lambda^i - \lambda^j \rVert_2$.
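The Eigen-Graph construction can be sketched in a few lines of NumPy. This is an illustrative brute-force version, not the authors' implementation: it assumes the structure tensor is the product of the neighbor-offset matrix with its transpose, that a point is excluded from its own neighbor list, and that both searches use the L2 norm; the function names are hypothetical.

```python
import numpy as np

def knn_indices(vectors, k):
    """Indices of the k nearest rows (self excluded) under the L2 norm."""
    diff = vectors[:, None, :] - vectors[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)        # assumption: a point is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]

def structure_tensor_eigenvalues(points, k):
    """Eigenvalues (descending) of the 3x3 structure tensor at every point."""
    neighbors = points[knn_indices(points, k)]              # (N, k, 3)
    offsets = neighbors - points[:, None, :]                # neighbor minus centre
    tensors = np.einsum('nki,nkj->nij', offsets, offsets)   # (N, 3, 3)
    return np.linalg.eigvalsh(tensors)[:, ::-1]             # eigvalsh is ascending

def eigen_graph(points, k):
    """Neighbor indices in Euclidean space and in eigenvalue space."""
    eigvals = structure_tensor_eigenvalues(points, k)
    return knn_indices(points, k), knn_indices(eigvals, k), eigvals

rng = np.random.default_rng(0)
cloud = rng.normal(size=(200, 3))
idx_euclid, idx_eigen, eigvals = eigen_graph(cloud, k=8)
print(idx_euclid.shape, idx_eigen.shape, eigvals.shape)
```

The pairwise distance matrix costs memory quadratic in the number of points, which is acceptable for a sketch at this scale; a spatial index would be the usual choice for larger clouds.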
GroupLayer. Now we have the indices of the $k$ nearest neighbors in Euclidean space and the indices of the $k$ nearest neighbors in Eigenvalue space. As presented in Figure 3, we denote the input features of level $l$ by $F^l$. For convenience, we omit the superscript $l$. In GroupLayer, we take the features of the $k$ nearest neighbors of each point in Euclidean space and the features of the $k$ nearest neighbors of the same point in Eigenvalue space. We group the neighbor features as follows:
where means concatenation. Then we concatenate with as the features at each point:
In the first GSC module shown in Figure 2, the input features are the coordinates and the eigenvalues of the points. We group the coordinates using their $k$ nearest neighbors and group the eigenvalues using their $k$ nearest neighbors. In the other GSC modules, we use the previous level's output as the input features and group features in both Euclidean space and Eigenvalue space.
MLP and MaxPooling. In the GSC module, we take the features at each point from GroupLayer and apply a multilayer perceptron (MLP); we then use max pooling over the neighbor domain to get the features of each point:
And the output of GSC module is denoted by .
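The flow of a GSC module (group in both spaces, shared MLP, max pooling over neighbors) can be sketched as below. The exact grouping formula is not reproduced in this section, so pairing each centre feature with the difference to its neighbor is an assumption borrowed from edge-feature-style grouping; the single-layer MLP, the layer widths and all function names are likewise hypothetical.

```python
import numpy as np

def group_features(features, neighbor_idx):
    """Pair each centre feature with its neighbor differences (assumed grouping)."""
    neighbors = features[neighbor_idx]                               # (N, k, C)
    centers = np.broadcast_to(features[:, None, :], neighbors.shape)
    return np.concatenate([centers, neighbors - centers], axis=-1)   # (N, k, 2C)

def gsc_forward(features, idx_euclid, idx_eigen, weight, bias):
    """Group in both spaces, apply a shared layer, max-pool over the k neighbors."""
    grouped = np.concatenate(
        [group_features(features, idx_euclid),
         group_features(features, idx_eigen)], axis=-1)              # (N, k, 4C)
    hidden = np.maximum(grouped @ weight + bias, 0.0)                # shared MLP layer
    return hidden.max(axis=1)                                        # (N, C_out)

rng = np.random.default_rng(0)
n_points, k, channels, out_channels = 100, 8, 16, 32
features = rng.normal(size=(n_points, channels))
# Random neighbor indices stand in for the Euclidean and eigenvalue-space searches.
idx_euclid = rng.integers(0, n_points, size=(n_points, k))
idx_eigen = rng.integers(0, n_points, size=(n_points, k))
weight = rng.normal(size=(4 * channels, out_channels))
bias = np.zeros(out_channels)

output = gsc_forward(features, idx_euclid, idx_eigen, weight, bias)
print(output.shape)
```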
3.3 Rotation and Translation Invariance
In this subsection we give a theoretical analysis of the rotation and translation invariance of our method. As mentioned in Sec 3.2, the 3D structure tensor is $C = M M^\top$. We denote the eigenvalues of $C$ as $\lambda_1, \lambda_2, \lambda_3$ and the corresponding eigenvectors as $v_1, v_2, v_3$. Thus we have the following equation:
$$C v_j = \lambda_j v_j, \quad j = 1, 2, 3.$$
The way we obtain the 3D structure tensor, from differences between each point and its neighbors, guarantees that the 3D structure tensor of each point is invariant to translation. Let $R$ be an arbitrary rotation matrix in 3D Euclidean space. After applying $R$ to the point cloud, we get the new 3D structure tensor $C' = (R M)(R M)^\top = R C R^\top$. We can get the following equations:
$$C' (R v_j) = R C R^\top R v_j = R C v_j = \lambda_j (R v_j), \quad j = 1, 2, 3.$$
From the equations above, we know that $\lambda_1, \lambda_2, \lambda_3$ are also the eigenvalues of the 3D structure tensor $C'$. So the eigenvalues at each point are invariant to rotation and translation, which ensures that the indices of the $k$ nearest neighbors of each point in Eigenvalue space are invariant (illustrated in Figure 5). This mechanism improves the robustness of our model to rotation and translation. The experimental results also confirm what we have proved theoretically (Sec 4.2).
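The invariance argument can also be checked numerically. The sketch below is an illustration under assumptions: the structure tensor is built from neighbor offsets, the random rotation is drawn via a QR decomposition, and the names `local_eigenvalues` and `random_rotation` are hypothetical.

```python
import numpy as np

def local_eigenvalues(points, k):
    """Eigenvalues of the neighbor-offset structure tensor at every point."""
    dist = np.linalg.norm(points[:, None] - points[None], axis=-1)
    np.fill_diagonal(dist, np.inf)
    idx = np.argsort(dist, axis=1)[:, :k]
    offsets = points[idx] - points[:, None, :]
    tensors = np.einsum('nki,nkj->nij', offsets, offsets)
    return np.linalg.eigvalsh(tensors)

def random_rotation(rng):
    """Random 3D rotation matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))       # fix the sign ambiguity of QR
    if np.linalg.det(q) < 0:          # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(0)
cloud = rng.normal(size=(100, 3))
rotation = random_rotation(rng)
shift = rng.normal(size=3)
moved = cloud @ rotation.T + shift    # rotate, then translate every point

before = local_eigenvalues(cloud, k=10)
after = local_eigenvalues(moved, k=10)
print(np.allclose(before, after))
```

The eigenvalues agree up to floating-point round-off, so a nearest-neighbor search in eigenvalue space returns the same neighbors before and after the rigid motion, barring exact ties.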
3.4 Complements of the Architecture
Hierarchical Feature Learning. Our method follows the design in which the hierarchical structure is composed of a set of abstract layers. In this way, we progressively enlarge the receptive field of each point along the hierarchy. As shown in Figure 2, the hierarchical structure is composed of three abstract levels. An abstract level takes a points matrix and a features matrix as input, and outputs a points matrix and a features matrix. We use the FPS algorithm to down-sample the points and features across the 3 levels (1024-512-256 points in the classification network).
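Farthest point sampling (FPS) can be sketched as a greedy loop that repeatedly picks the point farthest from everything selected so far. This is a minimal illustration; the fixed starting index and the function name are assumptions.

```python
import numpy as np

def farthest_point_sampling(points, num_samples, start=0):
    """Greedy FPS: return indices of num_samples mutually distant points."""
    selected = np.empty(num_samples, dtype=np.int64)
    # Distance from every point to its closest already-selected point.
    min_dist = np.full(len(points), np.inf)
    current = start
    for i in range(num_samples):
        selected[i] = current
        dist = np.linalg.norm(points - points[current], axis=1)
        min_dist = np.minimum(min_dist, dist)
        current = int(np.argmax(min_dist))   # farthest from the selected set
    return selected

rng = np.random.default_rng(0)
cloud = rng.normal(size=(1024, 3))
idx_512 = farthest_point_sampling(cloud, 512)
level2 = cloud[idx_512]                      # 1024 -> 512 points
idx_256 = farthest_point_sampling(level2, 256)
level3 = level2[idx_256]                     # 512 -> 256 points
print(level2.shape, level3.shape)
```

The same index arrays can be used to down-sample the feature matrix, so points and features stay aligned across levels.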
Feature Interpolation for Segmentation Task. In the segmentation task, to obtain a feature map with the same number of points as the original input, we must interpolate features from the coarsest scale back to the original scale. Each feature interpolation level takes a decoder features matrix as input, together with the spatial point sets of the current (coarser) level and of the next (finer) level. To obtain the features of the next level, we simply find the three nearest neighbors of each finer-level point among the coarser-level points and then calculate the weighted sum of their features. The combination weights are acquired from the neighbors' normalized spatial distances.
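The three-nearest-neighbor interpolation can be sketched as follows. The use of inverse-distance weights normalized to sum to one is an assumption (it is the weighting commonly used with this kind of decoder); the function name and the small `eps` guard are also illustrative.

```python
import numpy as np

def interpolate_features(fine_points, coarse_points, coarse_features, eps=1e-8):
    """Propagate features from a coarse point set to a finer one via 3 nearest neighbors."""
    dist = np.linalg.norm(fine_points[:, None] - coarse_points[None], axis=-1)
    idx = np.argsort(dist, axis=1)[:, :3]                 # three nearest coarse points
    nearest = np.take_along_axis(dist, idx, axis=1)       # their distances, (N_fine, 3)
    weights = 1.0 / (nearest + eps)                       # assumed inverse-distance weights
    weights /= weights.sum(axis=1, keepdims=True)         # normalize to sum to one
    return (coarse_features[idx] * weights[..., None]).sum(axis=1)

rng = np.random.default_rng(0)
coarse = rng.normal(size=(256, 3))
fine = rng.normal(size=(1024, 3))
coarse_feats = rng.normal(size=(256, 32))
fine_feats = interpolate_features(fine, coarse, coarse_feats)
print(fine_feats.shape)
```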
In this section, we conduct comprehensive experiments to evaluate GS-Net. In Sec 4.1, we evaluate GS-Net for point cloud analysis on the classification and segmentation tasks. In Sec 4.2, we compare the rotation robustness of GS-Net with state-of-the-art methods.
4.1 Point Cloud Analysis
Classification on ModelNet40. ModelNet40 contains 12,311 CAD models from 40 categories; 9,843 models are used for training and 2,468 models for testing. We evaluate our model on ModelNet40 for the classification task. Following the configuration in PointNet, we use the source code of PointNet to sample points uniformly from the mesh models. The results are summarized in Table 2. Our model achieves state-of-the-art performance (93.3%).
Part Segmentation on ShapeNet Part. Part segmentation is a challenging task for fine-grained shape analysis. We evaluate our method for this task on the ShapeNet Part benchmark. ShapeNet Part consists of 16,880 models from 16 shape categories with 50 different parts in total, split into 14,006 models for training and 2,874 models for testing. Each point cloud is annotated with 2 to 6 parts. We choose mIoU as the evaluation metric, averaged across all classes and instances. The results are summarized in Table 1. The input consists of coordinates and normals. Our method can effectively deal with point clouds with geometric characteristics such as symmetrical structure. Figure 6 shows some segmentation examples.
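For reference, the per-shape IoU that underlies mIoU can be sketched as below. The convention of counting a part absent from both prediction and ground truth as a perfect match is an assumption (it is a common choice, but this section does not state it), and the function name is illustrative.

```python
import numpy as np

def shape_iou(pred, target, part_ids):
    """Mean IoU over the parts of one shape's category."""
    ious = []
    for part in part_ids:
        pred_mask = pred == part
        target_mask = target == part
        union = np.logical_or(pred_mask, target_mask).sum()
        if union == 0:
            ious.append(1.0)   # assumed convention: absent in both counts as perfect
        else:
            ious.append(np.logical_and(pred_mask, target_mask).sum() / union)
    return float(np.mean(ious))

target = np.array([0, 0, 1, 1])
pred = np.array([0, 1, 1, 1])
# Part 0: intersection 1, union 2. Part 1: intersection 2, union 3.
score = shape_iou(pred, target, part_ids=(0, 1))
print(score)
```

Instance mIoU then averages this score over all test shapes, while class mIoU first averages within each category and then across categories.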
4.2 Comparison of Rotation Robustness
We compare GS-Net with state-of-the-art approaches on ModelNet40 classification for rotation-robustness evaluation. The results are summarized in Table 3 with four comparisons: (1) both the training set and the test set are augmented by random-angle rotation about the z axis (z/z); (2) the training set is augmented by random-angle rotation about the z axis and the test set by random-angle rotations about all three axes (x, y, z) (z/s); (3) both the training set and the test set are augmented by random-angle rotations about all three axes (s/s); (4) only the test set is augmented by random-angle rotations about all three axes (0/s).
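The two augmentation modes used in these comparisons can be sketched as follows. The composition order of the three axis rotations and the mode strings `'z'` and `'s'` (mirroring the labels above) are assumptions made for the example.

```python
import numpy as np

def rotation_about_z(angle):
    """Rotation matrix for a rotation by `angle` radians about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rotation_xyz(angles):
    """Compose rotations about the x, y and z axes (assumed order: x, then y, then z)."""
    ax, ay, az = angles
    cx, sx = np.cos(ax), np.sin(ax)
    cy, sy = np.cos(ay), np.sin(ay)
    rot_x = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    rot_y = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    return rotation_about_z(az) @ rot_y @ rot_x

def augment(points, rng, mode):
    """Rotate a point cloud: 'z' for the z axis only, 's' for all three axes."""
    if mode == 'z':
        rot = rotation_about_z(rng.uniform(0.0, 2.0 * np.pi))
    elif mode == 's':
        rot = rotation_xyz(rng.uniform(0.0, 2.0 * np.pi, size=3))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return points @ rot.T

rng = np.random.default_rng(0)
cloud = rng.normal(size=(64, 3))
z_rotated = augment(cloud, rng, 'z')
s_rotated = augment(cloud, rng, 's')
print(z_rotated.shape, s_rotated.shape)
```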
Table 3 consists of two groups of approaches. The first group consists of four approaches: DGCNN, PointNet, PointNet++ and SpiderCNN; the second group is our approach with different settings. Unlike our model shown in Figure 2, the last entry of the second group uses only eigenvalues as the input features, without any coordinate information, and it achieves the best performance in comparisons (2) and (4). Our original model achieves the best performance in comparison (3) and obtains a result comparable with DGCNN in comparison (1). These comparisons validate that the eigenvalues at each point are invariant to rotation and can improve the robustness of our method to rotation.
5 Analysis of GS-Net
In Sec 5.1, we perform the ablation analysis of GS-Net and discuss the effectiveness of the architecture design and the input features. Sec 5.2 compares the complexity of GS-Net with existing methods. Sec 5.3 shows that the Eigen-Graph efficiently captures local and holistic geometric features such as symmetry and connectivity.
5.1 Ablation Analysis
Analysis of Architecture Design. We analyze the effectiveness of our method’s components on ModelNet40 benchmark for classification task. The results are summarized in Table 4. All experiments in the ablation study are conducted using nearest neighbors.
| FPS | $k$-nn space | # Points | Accuracy (%) |
| On | EU + EI | 1024 | 92.8 |
| Off | EU + EI | 1024 | 92.5 |
| On | EU + EI | 2048 | 92.9 |
Input Features. The input features directly affect the representation of local geometry and the relations between points, so how to define the input features is an issue worth exploring. To find the most suitable feature combination, we experiment with six settings, whose results are summarized in Table 5. Using only coordinates, the accuracy reaches 92.5%. Inspired by Attentional ShapeContextNet, we use only shape context as the input feature of the points, and the accuracy reaches 91.9%. Using the differences of coordinates, the accuracy reaches 92.6%; with the combination of coordinates and their differences, it improves to 92.7%. Adding the eigenvalues of the points and their differences to the input features gives an accuracy of 92.8%; on this basis, adding the 3D Euclidean distances between points and their neighbors gives 92.9%. However, with the addition of eigenvectors, the model does not perform as well as in the other settings.
5.2 Complexity Analysis
We evaluate the model complexity in terms of model size and forward time in Table 6. The forward time is recorded with a batch size of 8 on a single GTX 1080 GPU, the same hardware environment as for the comparison models. These models are implemented in PyTorch. As illustrated, our method has competitive performance with great parameter efficiency and acceptable speed.
5.3 Visualization of GS-Net
As shown in Figure 7, we visualize the Eigen-Graph of the anchor points (red) from three point clouds. The blue points in the first row represent the nearest neighbors in Euclidean space, while the green points in the second row indicate the nearest neighbors in Eigenvalue space. As can be seen, the green points have local geometry similar to that of the anchor point. Moreover, the Eigen-Graph is rotation invariant: as Figure 5 shows, the nearest neighbors in Eigenvalue space are not affected by rotations and translations of the point cloud.
We develop the Geometry Sharing Network (GS-Net) for point cloud analysis. The core of GS-Net is the GSC module, which shares similar geometric information among distant points and can be integrated into different existing pipelines for point cloud analysis. Moreover, the Eigen-Graph of the GSC module fundamentally improves robustness to rotation and translation. Experiments show that GS-Net achieves state-of-the-art performance and is robust to geometric transformations.
This work is partially supported by the National Key Research and Development Program of China (No. 2016YFC1400704), and National Natural Science Foundation of China (61876176, U1713208), Shenzhen Basic Research Program (JCYJ20170818164704758, CXB201104220032A), the Joint Lab of CAS-HK, Shenzhen Institute of Artificial Intelligence and Robotics for Society.
-  (2018) Point convolutional neural networks by extension operators. ACM Transactions on Graphics 37 (4), pp. 1–12. Cited by: Table 2.
-  (2018) Local spectral graph convolution for point set feature learning. Cited by: Table 2.
-  (2011) Dimensionality based scale selection in 3d lidar point clouds. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci 38 (5), pp. W12. Cited by: §2.2.
-  (2018) GVCNN: group-view convolutional neural networks for 3d shape recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 264–272. Cited by: §1.
-  (2018) Multiresolution tree networks for 3d point cloud processing. Cited by: Table 2.
-  (2018) Flex-convolution (million-scale point-cloud learning beyond grid-worlds). Cited by: Table 2.
-  (2018) PCPNet learning local shape properties from raw point clouds. In Computer Graphics Forum, Vol. 37, pp. 75–85. Cited by: §2.3.
-  (1991) Approximation capabilities of multilayer feedforward networks. Neural networks 4 (2), pp. 251–257. Cited by: §3.1, §3.2.
-  (2017) Point-wise convolutional neural network. Cited by: Table 2.
-  (2018) Recurrent slice networks for 3d segmentation of point clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2626–2635. Cited by: Table 1.
-  (2017) Escape from cells: deep kd-networks for the recognition of 3d point cloud models. In Proceedings of the IEEE International Conference on Computer Vision, pp. 863–872. Cited by: Table 1, Table 2.
-  (2018) Large-scale point cloud semantic segmentation with superpoint graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4558–4567. Cited by: §2.1.
-  (2018) So-net: self-organizing network for point cloud analysis. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 9397–9406. Cited by: Table 1, Table 2.
-  (2015) VoxNet: a 3d convolutional neural network for real-time object recognition. Cited by: §1.
-  (2018) Frustum pointnets for 3d object detection from rgb-d data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 918–927. Cited by: §1.
-  (2017) Pointnet: deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660. Cited by: §1, §2.1, §2.3, Table 1, §4.1, §4.2, Table 2, Table 3, Table 6.
-  (2017) Pointnet++: deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems, pp. 5099–5108. Cited by: §1, §2.1, §2.3, §3.1, §3.4, §3.4, Table 1, §4.2, Table 2, Table 3, Table 6.
-  (2008) Towards 3d point cloud based object maps for household environments. Robotics and Autonomous Systems 56 (11), pp. 927–941. Cited by: §1.
-  (2018) Mining point cloud local structures by kernel correlation and graph pooling. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4548–4557. Cited by: Table 1, Table 2.
-  (2017) Dynamic edge-conditioned filters in convolutional neural networks on graphs. Cited by: Table 2.
-  (2014) Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15 (1), pp. 1929–1958. Cited by: §3.1.
-  (2015) Multi-view convolutional neural networks for 3d shape recognition. In Proceedings of the IEEE international conference on computer vision, pp. 945–953. Cited by: §1.
-  (2018) Semantic classification of 3d point clouds with multiscale spherical neighborhoods. In 2018 International Conference on 3D Vision (3DV), pp. 390–398. Cited by: §2.1.
-  (2018) Tensor field networks: rotation- and translation-equivariant neural networks for 3d point clouds. Cited by: §2.3.
-  (2018) Dynamic graph cnn for learning on point clouds. arXiv preprint arXiv:1801.07829. Cited by: §2.1, §2.3, §3.1, Table 1, §4.2, Table 2, Table 3, Table 6.
-  (2015) 3D shapenets: a deep representation for volumetric shapes. In IEEE Conference on Computer Vision & Pattern Recognition, Cited by: §1, §4.1.
-  (2018-06) Attentional shapecontextnet for point cloud recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §5.1.
-  (2018) Attentional shapecontextnet for point cloud recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4606–4615. Cited by: Table 1, Table 2.
-  (2018) Spidercnn: deep learning on point sets with parameterized convolutional filters. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 87–102. Cited by: §1, §2.3, Table 1, §4.2, Table 2, Table 3.
-  (2019) Modeling point clouds with self-attention and gumbel subset sampling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3323–3332. Cited by: Table 2.
-  (2016) A scalable active framework for region annotation in 3d shape collections. ACM Transactions on Graphics (TOG) 35 (6), pp. 210. Cited by: §4.1.
-  (2017) Deep sets. Cited by: §2.1.