1 Introduction
Recently, 3D computer vision has played a pivotal role in many applications, e.g., autonomous driving [15, 21, 35], augmented reality [1], and robotics [7, 2]. Massive attention has been paid to the point cloud, a basic type of 3D data representation. As a pioneer of deep learning for point cloud analysis, PointNet [22] employs MLPs to extract features from raw 3D coordinates, a design that has been extensively followed. Most previous works are evaluated on synthetic datasets, i.e., ModelNet40 [33] and ShapeNet [36], where the point cloud models are well aligned. Nevertheless, it is nontrivial to obtain well-aligned point clouds in the real world, where rotation is inevitable: the pose of a point cloud model is arbitrary, including both translation and rotation. PointNet and its variants fail in this case because of the variation of coordinates caused by such transformations. As shown in Fig. 1(a), the classification and segmentation results are significantly confused by rotation.
Considering that the issue of translation can be easily addressed by centering the point cloud models, some attempts have been made to improve rotation robustness. An intuitive solution is to augment the training data with arbitrary rotations. However, augmentation is unable to cover all rotations with three degrees of freedom within a limited model capacity. An alternative is to use the spherical Fourier transform to achieve rotation equivariance [8, 4], which still requires extra processing such as max pooling to achieve rotation invariance, and the loss of information during the projection is inevitable.
The issue of rotation sensitivity can be boiled down to changes of the input coordinates. Inspired by this observation, we transform the raw coordinates into rotation-invariant representations and feed these to the network, making it intrinsically invariant to rotation. Although some rotation-invariant representations have been designed [37, 3], existing methods focus on distances and angles within local regions, lacking the constraint of global information, which limits their distinctiveness. In this regard, we present a simple yet effective solution that combines global and local rotation-invariant features. Specifically, for the local representation, we extend the Darboux feature [25] into a more distinctive feature space, where relative locations are distinguished by measuring the distances and the differences of local coordinate systems between a query point and its neighbors. For the global representation, we estimate a global coordinate system on a down-sampled point cloud model employing singular value decomposition (SVD) [10]; the original points can then be projected into the estimated coordinate system, which is invariant to rotation.
Additionally, in order to extract high-dimensional features from the presented representations, we propose a two-branch network in which the global and local representations are processed individually. A group convolution layer is designed as the basic module, which hierarchically extracts and aggregates features. As illustrated in Fig. 1(b), the presented method is fully invariant to rotation in classification and segmentation tasks. Extensive experiments on both synthetic datasets, i.e., ModelNet40 [33] and ShapeNet [36], and a real-world dataset, i.e., ScanObjectNN [29], show that our method achieves state-of-the-art performance for classification and segmentation on rotation-augmented benchmarks.
In a nutshell, our major contributions are summarized as follows:
• We present a combination of global and local representations which are intrinsically invariant to rotation changes.
• We propose a two-branch network¹ which employs group convolutions to hierarchically extract and aggregate features. (¹ The code will be available at https://github.com/sailorz/GLRNet.)
• Our method achieves state-of-the-art performance on comprehensive evaluation benchmarks which contain rotations in both synthetic and real-world data.
2 Related Work
Spatial transformations.
To alleviate the rotation issue, a straightforward way is augmenting the training data with arbitrary rotation transformations [22]. However, there are three rotational degrees of freedom in the real world, i.e., pitch, yaw, and roll, and each angle varies continuously, which leads to innumerable rotations. Consequently, it is impractical to cover all kinds of rotations with a limited model capacity. In order to improve the robustness to rotation in a more efficient way, an alternative attempt suggests employing deep learning methods to directly learn spatial transformations [22]. Specifically, a T-Net is used in PointNet to regress a spatial transformation and a high-dimensional feature transformation, with the expectation of transforming the point clouds into a canonical coordinate system. Nevertheless, the learned transformations are still vulnerable to the nuisance of rotation, because the regression procedure lacks theoretical support from the perspective of rotation invariance.
Rotation-equivariant convolutions.
Inspired by the tremendous progress of convolutional networks in 2D computer vision, numerous works have been developed to bridge the success of convolutions from images to point clouds [34, 17, 32]. However, most previous works are sensitive to rotation, without taking rotation invariance into account. In this regard, some efforts utilize spherical convolutions to achieve rotation equivariance [8, 4, 18]. First, the 3D mesh or voxel models are projected onto spheres, translating coordinates into angles. Second, a series of spherical convolutions is carried out on the spheres to generate a set of feature maps, accomplished by the Fourier transform. Third, the inverse Fourier transform is employed to recover the angles. However, note that equivariance means the output and the input vary equally, which is not intrinsically invariant to rotation; a global process such as global max pooling is crucial in order to achieve rotation invariance. Additionally, the loss of information is inevitable during the generation of the mesh/voxel and the forward and inverse transformations, which leads to limited performance.
Rotation-invariant representations. For the sake of intrinsic rotation invariance, some approaches attempt to transform the raw point clouds into rotation-invariant representations, where distances and angles are the most widely used features. Specifically, Deng et al. [5] proposed a 4D point pair feature (PPF) for rotation-invariant descriptors, which utilizes the distances and angles between the central reference and its neighbors in a local patch, combined with the information of normal vectors. For the tasks of classification and segmentation, Chen et al. [3] integrated distance and angle features in local k-NN graphs into a cluster network. Zhang et al. [37] combined distance and angle features in local graphs with those of reference points generated by down-sampling. Nevertheless, all previous works concentrate on local features, i.e., relative distances and angles in local graphs, lacking effective global features. Employing only local information makes sense for local descriptors, while local features are prone to being ambiguous for classification and segmentation. For instance, the relative distances and angles tend to be similar among different regions of the same plane of a desk. In this regard, we present a combination of global and local rotation-invariant representations, filling the gap of global constraints. The details are introduced in the following section.
3 Method
3.1 Problem Statement
Our method directly processes raw point clouds as input, represented as a set of 3D points $P = \{p_i\}_{i=1}^{N}$ with $p_i \in \mathbb{R}^3$. The normal vector of each point is also utilized, indicated by $\mathbf{n}_i$. The rotation issue is formulated by transforming $P$ through a $3 \times 3$ orthogonal matrix $\mathbf{R}$ ($\mathbf{R}^{\top}\mathbf{R} = \mathbf{I}$, $\det \mathbf{R} = 1$), which contains three degrees of freedom, i.e., pitch, yaw, and roll. The task of rotation invariance can be boiled down to
$\mathcal{F}(P\mathbf{R}) = \mathcal{F}(P), \qquad (1)$
where $\mathcal{F}$ is the function that generates the presented representations from raw points.
For the classification task with $C$ classes, the output of our approach is a vector of $C$ scores, where the maximum score is expected to correspond to the correct class label. For the semantic segmentation task, our method outputs an $N \times C_s$ score map, which indicates the scores of the $C_s$ categories for all $N$ points. Both tasks are supposed to be invariant to rotation changes.
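To make the invariance condition in Eq. (1) concrete, the following minimal numpy sketch checks it with a toy rotation-invariant feature (sorted pairwise distances); the feature and all names here are illustrative, not the representation proposed in this paper.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def pairwise_distance_feature(points):
    # A toy rotation-invariant representation: the sorted pairwise distances.
    diff = points[:, None, :] - points[None, :, :]
    return np.sort(np.linalg.norm(diff, axis=-1).ravel())

rng = np.random.default_rng(0)
P = rng.standard_normal((128, 3))
R = Rotation.random(random_state=0).as_matrix()  # uniform random rotation in SO(3)
# Eq. (1): the representation of the rotated cloud matches the original one.
assert np.allclose(pairwise_distance_feature(P),
                   pairwise_distance_feature(P @ R), atol=1e-6)
```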
3.2 Local Branch
As aforementioned, local features have been proved to be critical for point cloud classification and segmentation [34, 17, 32]. To this end, we design a local branch to extract local patterns in graph-based structures. Intuitively, distances and angles are two kinds of rotation-invariant features, while the core issue is how to utilize them effectively. Inspired by the classical Darboux frame [25], we dig out local geometrical features by estimating the relative relationships of the local coordinate systems between a central point and its neighbors.
The representation is illustrated in Fig. 3. First, for a query point $p_i$, a local graph is generated by $k$-nearest searching, where $p_j$ denotes one of the neighbors. Second, the Euclidean distance $\|\mathbf{d}_{ij}\|$ is calculated to indicate the local intensity around $p_i$, where $\mathbf{d}_{ij}$ is the relative vector between $p_i$ and $p_j$, i.e., $\mathbf{d}_{ij} = p_j - p_i$. Third, the relationship between the local coordinate systems centred at $p_i$ and $p_j$ is recovered. Note that normal vectors are required to generate the local coordinate systems. Specifically, in order to determine three orthogonal vectors ($(\mathbf{u}_i, \mathbf{v}_i, \mathbf{w}_i)$ at $p_i$ as an example), we set $\mathbf{u}_i = \mathbf{n}_i$ and leverage the cross product to estimate $\mathbf{v}_i$ and $\mathbf{w}_i$ as
$\mathbf{v}_i = \frac{\mathbf{d}_{ij} \times \mathbf{u}_i}{\|\mathbf{d}_{ij} \times \mathbf{u}_i\|}, \qquad (2)$
$\mathbf{w}_i = \mathbf{u}_i \times \mathbf{v}_i. \qquad (3)$
Subsequently, the relationship is represented by a set of angles, which respectively indicate the angles between the relative vector $\mathbf{d}_{ij}$ and the axes of the two frames, as well as between the corresponding axes of the two frames. Each angle ($\angle(\mathbf{u}_i, \mathbf{u}_j)$ as an example) is calculated as
$\angle(\mathbf{u}_i, \mathbf{u}_j) = \operatorname{atan2}\!\left(\|\mathbf{u}_i \times \mathbf{u}_j\|,\ \mathbf{u}_i \cdot \mathbf{u}_j\right), \qquad (4)$
where both the cross product and the dot product are employed to alleviate the ambiguity of the angle. Given the $k$ neighbors of a query point $p_i$, the generated representation is a feature map which fully mines the local pattern around $p_i$.
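The following sketch illustrates the local frame construction and the angle measurement described above; the exact axis convention (here, $\mathbf{u}$ as the normal and $\mathbf{v}$ from a cross product with the relative vector) is one plausible reading of Eqs. (2)-(4), and all names are illustrative.

```python
import numpy as np

def local_frame(normal, d):
    # Build an orthonormal frame (u, v, w) at a point from its unit normal
    # and the relative vector d to a neighbor (one reading of Eqs. 2-3).
    u = normal / np.linalg.norm(normal)
    v = np.cross(d, u)
    v /= np.linalg.norm(v)
    w = np.cross(u, v)
    return u, v, w

def angle(a, b):
    # Eq. (4): atan2 of the cross and dot products yields an angle in
    # [0, pi] without the numerical ambiguity of arccos alone.
    return np.arctan2(np.linalg.norm(np.cross(a, b)), np.dot(a, b))

# Example: the frame relation between a query point and one neighbor.
p_i, p_j = np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.5, 0.2])
n_i, n_j = np.array([0.0, 0.0, 1.0]), np.array([0.1, 0.0, 1.0])
d = p_j - p_i
u_i, v_i, w_i = local_frame(n_i, d)
u_j, v_j, w_j = local_frame(n_j, -d)  # the neighbor sees the reversed vector
alpha = angle(u_i, u_j)               # one entry of the angular representation
```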
3.3 Global Branch
Although local information has been extensively employed in rotation-invariant point cloud analysis, the extraction of global information remains an intractable issue. As mentioned in [37], the limited accuracy of their method is due to the lack of original point coordinates, which reflect the absolute locations in a global coordinate system. Their classification result significantly increases when the presented features are replaced with raw 3D coordinates, but the approach is then no longer invariant to rotation. This observation is reasonable because local features represent relative relationships, which are inevitably ambiguous in some cases. For instance, for points located on the flat top of a table, the local representations, i.e., distances and angles, tend to be similar among different neighbors. To this end, we design a global branch which contains a global feature extraction module, taking rotation invariance into account.
An intuitive solution for extracting global features is establishing a global coordinate system, leveraging singular value decomposition to dig out three main directions which are equivariant to rotation changes. Nevertheless, it is time-consuming to apply SVD to the original point cloud model, which may contain thousands of points, and SVD is also sensitive to missing data. In order to achieve an efficient and robust solution, as shown in Fig. 4, we establish a down-sampled subset $P_s$ of the original model $P$, which contains far fewer points while retaining the major geometrical structure, i.e., the skeleton. The down-sampling procedure is implemented by farthest point sampling in this paper, which increases the robustness against nuisances. SVD is then carried out on $P_s$, formulated as
$P_s = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{\top}, \qquad (5)$
where $\mathbf{V}$ contains the generated three orthogonal axes. The invariance against rotation is achieved by transforming the points of the original model into the established global coordinate system as
$\hat{P} = P\,\mathbf{V}. \qquad (6)$
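A minimal numpy sketch of this pipeline follows, assuming an illustrative subset size; farthest point sampling is implemented greedily, and the sign ambiguity of the singular vectors is left unresolved here.

```python
import numpy as np

def farthest_point_sampling(points, n_sample, seed=0):
    # Greedy FPS: repeatedly pick the point farthest from the points already
    # chosen, preserving the skeleton of the model with few samples.
    rng = np.random.default_rng(seed)
    idx = np.zeros(n_sample, dtype=int)
    idx[0] = rng.integers(len(points))
    min_dist = np.linalg.norm(points - points[idx[0]], axis=1)
    for i in range(1, n_sample):
        idx[i] = np.argmax(min_dist)
        min_dist = np.minimum(
            min_dist, np.linalg.norm(points - points[idx[i]], axis=1))
    return points[idx]

def global_projection(points, n_sample=64):
    # Eq. (5): SVD on the centred sparse subset yields the axes in V.
    subset = farthest_point_sampling(points, n_sample)
    _, _, vt = np.linalg.svd(subset - subset.mean(axis=0), full_matrices=False)
    # Eq. (6): project the full model into the estimated coordinate system.
    return points @ vt.T
```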
3.4 Group Convolution
For the purpose of extracting high-dimensional features from the presented representations, we integrate the RI-feature extraction module into a deep learning framework. Inspired by the success of deep learning in 2D computer vision [6, 27, 12], massive efforts have been carried out for point cloud analysis [22], which mainly employ MLPs as the basic feature extraction module. Further, to aggregate local information, which has been proved to be critical in 2D convolutional neural networks, some approaches such as graph-based convolutions [11, 30, 31] and local max pooling have been developed. However, these previous works have inherent drawbacks: graph-based methods are space-consuming, and local max pooling carried out after an MLP, which still processes each point individually, leads to an inevitable loss of information. To this end, we design a series of group convolution layers which hierarchically extract and aggregate features in an efficient way. Note that our framework is intrinsically invariant to rotation changes owing to the rotation-invariant representations.
As shown in Fig. 5, for a central point $p_i$ (red dot) with feature $\mathbf{f}_i \in \mathbb{R}^{C}$, where $C$ represents the feature dimension, a graph is established by $k$-nearest searching. The neighbors (black dots) are distributed into several groups according to their Euclidean distance from the central point, which transforms the unordered points into a sorted format. The group convolution is then conducted on the groups as
$\mathbf{f}_k = \mathbf{W} \ast \mathcal{G}_k, \qquad (7)$
where $\ast$ denotes convolution, $\mathbf{W}$ is a set of learnable weights shared among different groups, and $\mathbf{f}_k$ is the aggregated feature block of group $\mathcal{G}_k$. The feature map $\mathbf{F}$ is output by concatenating all feature blocks as
$\mathbf{F} = \left[\mathbf{f}_1; \mathbf{f}_2; \cdots; \mathbf{f}_K\right]. \qquad (8)$
Instead of using the combination of MLP and local max pooling, which is responsible for the loss of information, we employ a set of group convolutions to hierarchically aggregate the input representation into a feature map, which intensively mines the effective local information.
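The following numpy sketch illustrates one group convolution step under our reading of Eqs. (7)-(8); the max-aggregation inside each group and all dimensions are illustrative assumptions rather than the paper's stated configuration.

```python
import numpy as np

def group_conv(neighbor_feats, neighbor_dists, n_groups, weights):
    # Sort the neighbors by distance to the central point, split them into
    # n_groups bins, transform each bin with the shared weights, and
    # concatenate the aggregated blocks (Eqs. 7-8).
    order = np.argsort(neighbor_dists)
    groups = np.array_split(neighbor_feats[order], n_groups)
    # Max-aggregation inside each bin is an illustrative choice here.
    blocks = [(g @ weights).max(axis=0) for g in groups]
    return np.concatenate(blocks)

rng = np.random.default_rng(0)
k, c_in, c_out, n_groups = 16, 8, 32, 4
feats = rng.standard_normal((k, c_in))   # features of the k neighbors
dists = rng.random(k)                    # distances to the central point
out = group_conv(feats, dists, n_groups, rng.standard_normal((c_in, c_out)))
print(out.shape)                         # (n_groups * c_out,) = (128,)
```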
3.5 Rotation-Invariant Analysis


As demonstrated in Fig. 6, we visualize the extracted global and local representations in 3D space using t-SNE [19]. Compared with the raw point locations in Fig. 6(a), which are sensitive to orientation changes, the projected locations of our representations in Fig. 6(b) are identical under rotation, which intrinsically guarantees robustness against rotation for the subsequent learning process.
The theoretical demonstration is also introduced as follows.
Distance. Assuming $\|\mathbf{v}\|$ is the norm of a vector $\mathbf{v} \in \mathbb{R}^3$, the invariance against rotation is proved as
$\|\mathbf{R}\mathbf{v}\| = \sqrt{(\mathbf{R}\mathbf{v})^{\top}(\mathbf{R}\mathbf{v})} = \sqrt{\mathbf{v}^{\top}\mathbf{R}^{\top}\mathbf{R}\mathbf{v}} = \sqrt{\mathbf{v}^{\top}\mathbf{v}} = \|\mathbf{v}\|. \qquad (9)$
Angle. Supposing $\theta$ and $\theta'$ are the angles between $\mathbf{v}_1$ and $\mathbf{v}_2$ before and after rotation, the equivalence is formulated as
$\cos\theta' = \frac{(\mathbf{R}\mathbf{v}_1)^{\top}(\mathbf{R}\mathbf{v}_2)}{\|\mathbf{R}\mathbf{v}_1\|\,\|\mathbf{R}\mathbf{v}_2\|} = \frac{\mathbf{v}_1^{\top}\mathbf{v}_2}{\|\mathbf{v}_1\|\,\|\mathbf{v}_2\|} = \cos\theta. \qquad (10)$
Singular Value Decomposition. We define two point clouds as $P_1$ and $P_2$ ($P_2 = P_1\mathbf{R}$) with $P_1, P_2 \in \mathbb{R}^{N \times 3}$. Singular value decomposition is respectively performed as
$P_1 = \mathbf{U}_1\boldsymbol{\Sigma}_1\mathbf{V}_1^{\top}, \qquad (11)$
$P_2 = \mathbf{U}_2\boldsymbol{\Sigma}_2\mathbf{V}_2^{\top}, \qquad (12)$
so the relationship between $\mathbf{V}_1$ and $\mathbf{V}_2$ can be derived as $\mathbf{V}_2 = \mathbf{R}^{\top}\mathbf{V}_1$. The invariance of the point locations transformed by $\mathbf{V}$ is then shown as
$P_2\mathbf{V}_2 = P_1\mathbf{R}\mathbf{R}^{\top}\mathbf{V}_1 = P_1\mathbf{V}_1. \qquad (13)$
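The SVD argument can also be verified numerically; the snippet below is a sketch that compares the projections up to the per-axis sign flips that SVD leaves undetermined, using scipy's uniform rotation sampler.

```python
import numpy as np
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(1)
P1 = rng.standard_normal((256, 3))
R = Rotation.random(random_state=1).as_matrix()
P2 = P1 @ R                               # a rotated copy of the same cloud

_, _, vt1 = np.linalg.svd(P1 - P1.mean(axis=0), full_matrices=False)
_, _, vt2 = np.linalg.svd(P2 - P2.mean(axis=0), full_matrices=False)
# Eq. (13): up to per-axis sign flips, which SVD leaves undetermined,
# the projections of the two clouds into their own frames coincide.
proj1, proj2 = P1 @ vt1.T, P2 @ vt2.T
assert np.allclose(np.abs(proj1), np.abs(proj2), atol=1e-6)
```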
4 Experiments
In this section, we conduct experiments on three datasets designed for different tasks, i.e., ModelNet40 [33] (synthetic shape classification), ScanObjectNN [29] (real-world shape classification), and ShapeNet [36] (part segmentation). An ablation study is also performed to evaluate the effectiveness of our network design.
4.1 Implementation Details
For local graph generation, we use $k$-nearest searching to find the neighbors of each central point. In the global branch, we down-sample the original model into a sparse set of points utilizing farthest point sampling. For the group convolutions, which are independent between the two branches, hierarchically increasing feature dimensions are employed. Each group convolution is followed by Batch Normalization [13] and LeakyReLU [9]. We use three fully connected layers to predict the classification results, and a three-layer MLP to generate the segmentation results, whose output dimensions equal the number of candidate class and part labels, respectively.
4.2 Synthetic Shape Classification
We evaluate our method on ModelNet40, which has been extensively used for synthetic shape classification [16, 14]. ModelNet40 includes 12,311 CAD models from 40 categories, split into 9,843 for training and 2,468 for testing. We randomly sample 1,024 points from each model. These points are then centralized and normalized into a unit sphere.
We divide previous works into two categories, i.e., rotation-sensitive methods and rotation-robust methods. The experiments are performed in three different cases: raw training and testing data, raw training data with 3D rotation-augmented testing data, and 3D rotation-augmented training and testing data, respectively indicated by z/z, z/SO3, and SO3/SO3. (A sketch of these rotation protocols is given after Table 1.) Table 1 lists the experimental results. First, in the z/z case, our method (GLRNet) surpasses the other rotation-robust methods. Compared with SphericalCNN and SCNN, where mesh information is necessary, our method achieves superior performance even though we use raw points as input, which verifies that our framework is more effective than spherical solutions. For ClusterNet and Riconv, which also propose local rotation-invariant representations, the lack of global information leads to inferior performance compared with our method. Second, in the z/SO3 and SO3/SO3 cases, the results of GLRNet are almost identical and exceed the others by a large margin, while the results of the rotation-sensitive algorithms decline considerably. The previous state-of-the-art approach (DGCNN) is vulnerable in z/SO3, where it only achieves 20.6% accuracy. Its performance is still unsatisfactory (81.1%) in SO3/SO3, even though the training data is augmented with 3D rotations. These phenomena show that it is crucial to take rotation robustness into account for real-world applications.
Rotation-sensitive Method  input  # views  z/z(%)  z/SO3(%)  SO3/SO3(%) 
VoxNet [20]  volume  12  83.0    73.0 
Subvolume [23]  volume  20  89.5  45.5  85.0 
MVCNN [28]  image  80  90.2  81.5  86.0 
PointNet [22]  point  1  89.2  16.4  75.5 
PointNet++ [24]  point  1  91.8  18.4  77.4 
PointCNN [17]  point  1  91.3  41.2  84.5 
DGCNN [32]  point  1  92.2  20.6  81.1 
Rotation-robust Method  input  # views  z/z(%)  z/SO3(%)  SO3/SO3(%) 
SphericalCNN [8]  mesh  1  88.9  76.7  86.9 
SCNN [18]  mesh  1  89.6  87.9  88.7 
ClusterNet [3]  point  1  87.1  87.1  87.1 
Riconv [37]  point  1  86.5  86.4  86.4 
GLRNet  point  1  90.2  90.2  89.7 
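For concreteness, the sketch below draws rotations for the two augmentation protocols used in Table 1; the QR-based SO(3) sampler is a standard recipe and an assumption of ours, not a procedure stated in this paper.

```python
import numpy as np

def z_rotation(rng):
    # The "z" protocol: a single random angle about the gravity axis.
    t = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def so3_rotation(rng):
    # The "SO3" protocol: a rotation over all three degrees of freedom,
    # drawn via QR decomposition of a Gaussian matrix.
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q *= np.sign(np.diag(r))   # fix column signs for a unique factorization
    if np.linalg.det(q) < 0:   # ensure a proper rotation (det = +1)
        q[:, 0] *= -1
    return q

rng = np.random.default_rng(0)
cloud = rng.standard_normal((1024, 3))
augmented = cloud @ so3_rotation(rng)  # e.g., test-time rotation for z/SO3
```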
4.3 Real-World Shape Classification
For the purpose of analysing the limitations of our method, we estimate the confusion matrix shown in Fig. 7(a). An unexpected discovery is that ModelNet40 contains inherent ambiguity. Specifically, as illustrated in Fig. 7(a), the two most confusing categories are flower pot and plant, so we show models belonging to these categories in Fig. 7(b), where both models include similar plants and pots. They are so ambiguous that they cannot be explicitly classified even by human beings.
Additionally, considering that the objects in ModelNet40 are man-made CAD models, which are thus well aligned and noise-free, there is a significant gap between synthetic data and real-world data, which tends to include differently oriented objects and various nuisances, e.g., missing data, occlusion, and non-uniform density. In order to reliably evaluate shape classification performance in the real world and robustness against noise, we perform experiments on ScanObjectNN [29], which is collected from real-world indoor scenes on the one hand and discards ambiguous objects on the other hand. This dataset includes 2,902 objects categorized into 15 categories, taking into account rotation around one axis, translation, missing data, background noise, occlusion, and non-uniform density. Some examples from this dataset are shown in Fig. 8.
We conduct experiments on the easiest part, OBJ_BG, without rotation, translation, and scaling, and on the hardest part, PB_T50_RS, which contains bounding box translation, rotation around the gravity axis, and random scaling. The evaluation results are shown in Table 2. Our method achieves the best performance compared with previous works, which indicates that GLRNet is not only invariant to rotation but also robust to common nuisances. Consequently, it is promising to utilize our method for classification tasks in the real world. However, the performance of GLRNet declines considerably compared with that in Table 1, which suggests that there is still considerable room for improvement in terms of robustness and generalization.
Method  OBJ_BG  PB_T50_RS  
z/SO3  SO3/SO3  z/SO3  SO3/SO3  
PointNet [22]  16.7  54.7  17.1  42.2 
PointNet++ [24]  15.0  47.4  15.8  60.1 
SpiderCNN [34]  17.6  58.9  15.4  46.4 
DGCNN [32]  17.7  71.8  16.1  63.4 
PointCNN [17]  14.6  63.7  14.9  51.8 
Riconv [37]  78.4  78.1  67.9  68.3 
GLRNet  79.0  78.8  68.2  68.6 
4.4 Part Segmentation
Given a point cloud model, the target of segmentation is to accurately predict per-point labels. Compared with shape classification, segmentation is a more challenging task which requires the capacity of capturing fine-grained patterns. Consequently, we extend our experiments to ShapeNet [36], which is a widely used dataset for part segmentation evaluation. We use the part-annotated subset of ShapeNet that includes 16,881 3D models from 16 kinds of objects with 50 part categories. The average category mIoU (Cat. mIoU) [26] is utilized to measure segmentation performance, which is calculated by directly averaging the per-category results.
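As a sketch of the metric (with hypothetical inputs; the per-shape IoU computation itself is omitted), Cat. mIoU can be computed as follows.

```python
import numpy as np

def category_miou(shape_ious, shape_categories):
    # Cat. mIoU: average the per-shape IoUs within each object category
    # first, then average the per-category means, so that rare categories
    # (e.g., 'rocket') weigh as much as frequent ones (e.g., 'table').
    shape_ious = np.asarray(shape_ious, dtype=float)
    shape_categories = np.asarray(shape_categories)
    cat_means = [shape_ious[shape_categories == c].mean()
                 for c in np.unique(shape_categories)]
    return float(np.mean(cat_means))

# Toy usage: three shapes from two categories.
print(category_miou([0.8, 0.6, 0.9], ["bag", "bag", "cap"]))  # 0.8
```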
Method  aero  bag  cap  car  chair  earph.  guitar  knife  lamp  laptop  motor  mug  pistol  rocket  skate  table 

#shapes  2690  76  55  898  3758  69  787  392  1547  451  202  184  283  66  152  5271 
PointNet  40.4  48.1  46.3  24.5  45.1  39.4  29.2  42.6  52.7  36.7  21.2  55.0  29.7  26.6  32.1  35.8 
PointNet++  51.3  66.0  50.8  25.2  66.7  27.7  29.7  65.6  59.7  70.1  17.2  67.3  49.9  23.4  43.8  57.6 
PointCNN  21.8  52.0  52.1  23.6  29.4  18.2  40.7  36.9  51.1  33.1  18.9  48.0  23.0  27.7  38.6  39.9 
DGCNN  37.0  50.2  38.5  24.1  43.9  32.3  23.7  48.6  54.8  28.7  17.8  74.4  25.2  24.1  43.1  32.3 
Riconv  79.1  76.4  73.9  69.1  87.0  68.6  89.0  80.0  75.6  75.3  54.1  89.5  74.5  52.9  65.1  77.1 
Ours  81.0  76.9  79.5  73.4  86.0  67.5  89.4  85.6  83.1  83.1  62.7  91.3  76.0  61.4  78.2  78.9 
Method  aero  bag  cap  car  chair  earph.  guitar  knife  lamp  laptop  motor  mug  pistol  rocket  skate  table 

PointNet  81.6  68.7  74.0  70.3  87.6  68.5  88.9  80.0  74.9  83.6  56.5  77.6  75.2  53.9  69.4  79.9 
PointNet++  79.5  71.6  87.7  70.7  88.8  64.9  88.8  78.1  79.2  94.9  54.3  92.0  76.4  50.3  68.4  81.0 
PointCNN  78.0  80.1  78.2  68.2  81.2  70.2  82.0  70.6  68.9  80.8  48.6  77.3  63.2  50.6  63.2  82.0 
DGCNN  77.7  71.8  77.7  55.2  87.3  68.7  88.7  85.5  81.8  81.3  36.2  86.0  77.3  51.6  65.3  80.2 
Riconv  79.2  73.1  75.5  68.8  86.8  68.9  89.0  79.8  76.4  77.5  57.0  89.3  70.9  48.5  66.6  77.8 
Ours  81.2  77.3  79.1  72.5  86.0  71.1  88.7  84.8  82.4  84.3  59.2  91.6  75.8  62.1  77.8  79.1 
Method  z/z (%)  z/SO3 (%)  SO3/SO3 (%) 
PointNet  80.4  37.8  74.4 
PointNet++  81.9  48.2  76.7 
PointCNN  84.6  34.7  71.4 
DGCNN  82.3  37.4  73.3 
SpiderCNN  82.4  42.9  72.3 
Riconv  74.6  74.2  73.7 
GLRNet  78.4  78.4  78.3 
The overall results are reported in Table 5, and the per-category results are listed in Table 3 and Table 4. In the situation without rotation (z/z), our approach considerably surpasses the previous rotation-invariant algorithm (Riconv); in the cases with rotations (z/SO3 and SO3/SO3), GLRNet achieves consistent performance, significantly exceeding the other algorithms, which empirically confirms that GLRNet strikes a good trade-off between rotation invariance and Cat. mIoU.
4.5 Evaluations of Network Design
In order to further verify the effectiveness of our two-branch network design, we perform an ablation study. Specifically, we separate the global branch and the local branch and employ each branch individually to train the classification network on ModelNet40.
As reported in Table 6, a considerable decline occurs when either branch is used individually during training. This confirms that the combination of global and local representations is a promising way to increase the distinctiveness of the embedded feature space. The two branches play complementary roles, incorporating considerations from two different views, i.e., global observation and local fine-grained patterns.
Global Branch  Local Branch  Acc. (%) 

yes    87.4 
  yes  85.3 
yes  yes  90.2 
4.6 Limitation
Although our method achieves state-of-the-art performance, its limitation still cannot be ignored. During the RI-feature extraction in the global branch, the original model is projected into a global coordinate system estimated by SVD, which leads to varying orientations among different models. Considering that the objects in existing datasets are well aligned, these varying orientations reduce the underlying consistency among objects from the same category, which causes a loss of performance. However, due to rotations in the real world, it is impractical to obtain well-aligned instances in practical applications, so our method still shows a promising prospect for classification and segmentation tasks in the real world.
5 Conclusion
We have presented a combination of global and local representations which are intrinsically invariant to rotations. For further high-dimensional feature extraction, we integrate the representations into a two-branch network where a series of group convolutions is designed to hierarchically extract and aggregate features. Both theoretical and empirical proofs of the invariance against rotations are provided. Experiments also demonstrate the superiority of our two-branch network design. Our method shows a promising prospect for real-world applications.
References
 [1] (2017) Towards subjective quality assessment of point cloud imaging in augmented reality. In 2017 IEEE 19th International Workshop on Multimedia Signal Processing (MMSP), pp. 1–6. Cited by: §1.
 [2] (2006) Simultaneous localization and mapping (SLAM): part II. IEEE Robotics & Automation Magazine 13 (3), pp. 108–117. Cited by: §1.

 [3] (2019) ClusterNet: deep hierarchical cluster network with rigorously rotation-invariant representation for point cloud analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4994–5002. Cited by: §1, §2, Table 1.
 [4] (2018) Spherical CNNs. In International Conference on Learning Representations. Cited by: §1, §2.

 [5] (2018) PPF-FoldNet: unsupervised learning of rotation invariant 3D local descriptors. In Proceedings of the European Conference on Computer Vision, pp. 602–618. Cited by: §2.
 [6] (2009) ImageNet: a large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. Cited by: §3.4.
 [7] (2006) Simultaneous localization and mapping: part I. IEEE Robotics & Automation Magazine 13 (2), pp. 99–110. Cited by: §1.
 [8] (2018) Learning SO(3) equivariant representations with spherical CNNs. In Proceedings of the European Conference on Computer Vision, pp. 52–68. Cited by: §1, §2, Table 1.
 [9] (2011) Deep sparse rectifier neural networks. In Proceedings of the International Conference on Artificial Intelligence and Statistics, pp. 315–323. Cited by: §4.1.
 [10] (1971) Singular value decomposition and least squares solutions. In Linear Algebra, pp. 134–151. Cited by: §1.
 [11] (2018) Multi-kernel diffusion CNNs for graph-based learning on point clouds. In Proceedings of the European Conference on Computer Vision, pp. 0–0. Cited by: §3.4.
 [12] (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. Cited by: §3.4.
 [13] (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167. Cited by: §4.1.
 [14] (2018) PointSIFT: a SIFT-like network module for 3D point cloud semantic segmentation. arXiv preprint arXiv:1807.00652. Cited by: §4.2.
 [15] (2019) GS3D: an efficient 3d object detection framework for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1019–1028. Cited by: §1.
 [16] (2018) SO-Net: self-organizing network for point cloud analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9397–9406. Cited by: §4.2.
 [17] (2018) Pointcnn: convolution on xtransformed points. In Advances in Neural Information Processing Systems, pp. 820–830. Cited by: §2, §3.2, Table 1, Table 2.
 [18] (2018) Deep learning 3D shapes using alt-az anisotropic 2-sphere convolution. Cited by: §2, Table 1.

 [19] (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9 (Nov), pp. 2579–2605. Cited by: Figure 6, §3.5.
 [20] (2015) VoxNet: a 3D convolutional neural network for real-time object recognition. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 922–928. Cited by: Table 1.
 [21] (2019) LaserNet: an efficient probabilistic 3d object detector for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 12677–12686. Cited by: §1.
 [22] (2017) Pointnet: deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660. Cited by: §1, §2, §3.4, Table 1, Table 2.
 [23] (2016) Volumetric and multi-view CNNs for object classification on 3D data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5648–5656. Cited by: Table 1.
 [24] (2017) Pointnet++: deep hierarchical feature learning on point sets in a metric space. In Advances in neural information processing systems, pp. 5099–5108. Cited by: Table 1, Table 2.
 [25] (2008) Aligning point cloud views using persistent feature histograms. In 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3384–3391. Cited by: §1, §3.2.
 [26] (2018) Mining point cloud local structures by kernel correlation and graph pooling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4548–4557. Cited by: §4.4.
 [27] (2014) Very deep convolutional networks for largescale image recognition. arXiv preprint arXiv:1409.1556. Cited by: §3.4.
 [28] (2015) Multi-view convolutional neural networks for 3D shape recognition. In Proceedings of the IEEE International Conference on Computer Vision, pp. 945–953. Cited by: Table 1.
 [29] (2019) Revisiting point cloud classification: a new benchmark dataset and classification model on real-world data. arXiv preprint arXiv:1908.04616. Cited by: §1, §4.3, §4.
 [30] (2018) Learning localized generative models for 3d point clouds via graph convolution. Cited by: §3.4.
 [31] (2018) Local spectral graph convolution for point set feature learning. In Proceedings of the European Conference on Computer Vision, pp. 52–66. Cited by: §3.4.
 [32] (2019) Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics 38 (5), pp. 146. Cited by: §2, §3.2, Table 1, Table 2.
 [33] (2015) 3d shapenets: a deep representation for volumetric shapes. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1912–1920. Cited by: §1, §1, §4.
 [34] (2018) Spidercnn: deep learning on point sets with parameterized convolutional filters. In Proceedings of the European Conference on Computer Vision, pp. 87–102. Cited by: §2, §3.2, Table 2.
 [35] (2018) IPOD: intensive point-based object detector for point cloud. arXiv preprint arXiv:1812.05276. Cited by: §1.
 [36] (2016) A scalable active framework for region annotation in 3d shape collections. ACM Transactions on Graphics 35 (6), pp. 210. Cited by: §1, §1, §4.4, §4.
 [37] (2019) Rotation invariant convolutions for 3d point clouds deep learning. arXiv preprint arXiv:1908.06297. Cited by: §1, §2, §3.3, Table 1, Table 2.