Dense-Resolution Network for Point Cloud Classification and Segmentation

05/14/2020 ∙ by Shi Qiu, et al. ∙ CSIRO ∙ Australian National University

Point cloud analysis is attracting attention from Artificial Intelligence research since it can be widely applied in robotics, Augmented Reality, self-driving, etc. However, it remains challenging due to problems such as irregularity, unorderedness, and sparsity. In this article, we propose a novel network named Dense-Resolution Network for point cloud analysis. This network is designed to learn local point features from the point cloud at different resolutions. To learn local point groups more effectively, we present a novel grouping algorithm for local neighborhood searching and an effective error-minimizing model for capturing local features. In addition to validating the network on widely used point cloud segmentation and classification benchmarks, we also test and visualize the performance of its components. Compared with other state-of-the-art methods, our network shows superiority.




1 Introduction

With the help of fast progress in 3D sensing technology, an increasing number of researchers are now focusing on point cloud data. Different from more complex 3D data, e.g., mesh and volumetric data, point clouds are concise. In particular, point clouds are easier to collect using different types of scanners [2], e.g., LiDAR scanners [11], light scanners, sound scanners, etc. Traditional algorithms for point cloud learning [30, 23, 29, 35] used to estimate geometric information and capture indirect clues using complicated models. In contrast, deep learning provides intuitive and effective data-driven approaches to acquire information from 3D point cloud data by leveraging Convolutional Neural Networks (CNN).

In general, CNN-related methods can be divided into two streams [6]. The first one is projection-based, which involves some intermediate data representations and 2D/3D CNNs for learning, e.g., MVCNN [32] using multi-view 2D images, and VoxNet [22] taking volumetric grids. The other one is point-based, which directly processes points. It has become popular since the multi-layer perceptron (MLP) operation was introduced by PointNet [26]. Subsequently, others [27, 37, 33, 28] proposed to learn local features in various ways.

Figure 1: A bird's-eye view of the Dense-Resolution Network.

For local areas of point clouds, Qi et al. [27] and Liu et al. [18] apply the Ball Query algorithm [25] to group local points, while [37, 28] use k-nearest neighbors (knn) to construct neighborhoods. With these methods, performance is affected by the extent of the pre-defined neighborhood, i.e. the searching radius of Ball Query or the k of knn. If the area is too small, it cannot cover sufficient local patterns; if it is too large, the overlap between neighborhoods may introduce redundancy. The recent DPC [5] proposes dilated point convolution to increase the size of the receptive field without extra computational cost. Different from previous works, we attempt to adaptively define such a local area for each point w.r.t. the density distribution around it. With fewer manual and empirical settings, a more reasonable neighborhood can be set up for each point in the point cloud.

Previously, the idea of error feedback has been applied in 2D human pose estimation [3] and image Super-Resolution (SR) [7, 19]. In contrast to 3D works [14, 28] that utilize complex error-correcting structures, here we propose an error-minimizing module that leverages the properties of both error-feedback and CNN training mechanisms, so that network learning can be guided while complexity is reduced. In terms of the architecture, we present a new model called Dense-Resolution Network with two branches: a Full-Resolution (FR) branch and a Multi-Resolution (MR) branch. By collecting features from different resolutions of the point cloud and merging the feature maps of the FR and MR branches with a novel fusion method, we can obtain more information for a comprehensive analysis. The main contributions are:

  • We propose a point grouping algorithm to find neighbors for each point considering the density distribution adaptively.

  • We design an error-minimizing module for local feature learning on point clouds.

  • We introduce a network to learn point clouds comprehensively in different resolutions.

  • We conduct thorough experiments to validate the properties and abilities of our proposals. Our results demonstrate that the approach outperforms state-of-the-art methods on some point cloud segmentation and classification benchmarks.

Figure 2: Dense-resolution network architecture. In the FR branch, we learn the features in a series of modules at full resolution. In the MR branch, point features of different resolutions are investigated in a downsampling/upsampling manner. By merging the feature maps of the two branches, we handle point cloud classification and segmentation tasks. (One operator symbol in the figure denotes concatenation along channels, and another denotes merging feature maps as in Equation 9. The upper-right part presents our error-minimizing module; please refer to Section 3.2.)

2 Related Work

Local points grouping. Different from the pioneering PointNet [26], which relied on the global feature, subsequent work captured more local features in detail. PointNet++ [27] first introduced Ball Query, an algorithm that collects possible neighbors of a particular point through a ball-like searching space centered at the point itself, to group the local neighbors of the point. Another, simpler algorithm, knn, gathers the nearest neighbors based on a distance metric; it is applied for local feature learning in [37, 5, 28].
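To make the difference between the two grouping strategies concrete, the following is a minimal sketch in plain Python. It is an illustration under our own assumptions, not the implementation of [27] or [37]: the function names are hypothetical, the center point is excluded from its own neighborhood, and the sample cap that practical Ball Query implementations apply is omitted.

```python
import math

def knn(points, center_idx, k):
    """Indices of the k points nearest to points[center_idx] (center excluded)."""
    center = points[center_idx]
    others = [i for i in range(len(points)) if i != center_idx]
    others.sort(key=lambda i: math.dist(points[i], center))
    return others[:k]

def ball_query(points, center_idx, radius):
    """Indices of all points within `radius` of points[center_idx] (center excluded)."""
    center = points[center_idx]
    return [i for i in range(len(points))
            if i != center_idx and math.dist(points[i], center) <= radius]

# Toy 2D cloud: three points close together, two far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (1.0, 1.0), (3.0, 3.0)]

print(knn(points, 0, k=3))             # [1, 2, 3]: always k neighbors, however far
print(ball_query(points, 0, 0.5))      # [1, 2]: count depends on local density
print(ball_query(points, 4, 0.5))      # []: an isolated point finds nothing
```

The toy output shows the trade-off discussed in the text: knn reaches arbitrarily far to fill its quota, while Ball Query may return very few neighbors in sparse regions.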

Although Ball Query and knn grouping are intuitive, the size of the neighborhood (i.e. the receptive field of the point) is sometimes limited by the range of searching (i.e. the radius of the query ball, or the value of $k$). Meanwhile, merely increasing the searching range may incur substantial computational cost. To solve this problem, DPC [5] extended regular knn to dilated-knn, which gathers local points over a dilated neighborhood obtained by computing the $k \cdot d$ nearest neighbors ($d$ is the dilation factor) and preserving only every $d$-th point. Other works [27, 18, 41] also group neighbors through query balls of different scales (e.g., multi-scale grouping) to capture information from local areas of various sizes.
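The dilated neighborhood can be sketched in a few lines of plain Python. This is a hedged illustration with a fixed dilation factor for all points; the function name `dilated_knn` is hypothetical, and we assume that the center point is excluded and that the selection starts from the nearest candidate.

```python
import math

def dilated_knn(points, center_idx, k, d):
    """Take the k*d nearest neighbors, then keep every d-th one (k indices)."""
    center = points[center_idx]
    order = sorted((i for i in range(len(points)) if i != center_idx),
                   key=lambda i: math.dist(points[i], center))
    candidates = order[:k * d]      # k*d nearest neighbors, nearest first
    return candidates[::d][:k]      # preserve only every d-th candidate

# Ten points on a line, one unit apart; the center is point 0.
points = [(float(x), 0.0) for x in range(10)]

print(dilated_knn(points, 0, k=3, d=1))   # [1, 2, 3]  (regular knn)
print(dilated_knn(points, 0, k=3, d=2))   # [1, 3, 5]
print(dilated_knn(points, 0, k=3, d=3))   # [1, 4, 7]
```

With the same number of returned neighbors, a larger dilation factor spreads them over a wider area, which is how the receptive field grows without enlarging the group that later layers have to process.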

However, the existing methods have some issues in common. On the one hand, the performance of grouping algorithms relies heavily on pre-defined settings. For example, DGCNN [37] provided results under different $k$ conditions, DPC [5] compared the effects of $d$ values, and PointNet++ [27] discussed the influence of the query ball radius. On the other hand, the grouping algorithms act on all points of the point clouds without taking the distinct condition of each point or model into account. In our view, it is necessary to find an intelligent point-level adaptive grouping algorithm.

Input: Feature map $F$ in $D$-dimensional space
Parameters: The number of neighbors $k$, and the maximum dilation factor $d_{max}$
Output: The matrix $idx$, indices of the selected neighbors for the point cloud

1: Calculate pairwise metrics $M$ for the point cloud based on $F$
2: Choose $k \cdot d_{max}$ candidate neighbors for the point cloud based on metrics $M$
3: Record the indices $idx_{cand}$ and metrics $M_{cand}$ of the candidates
4: Learn the dilation factor $d$ for the point cloud from the candidate metrics $M_{cand}$, where $d$ is an integer and $1 \le d \le d_{max}$
5: Select the indices $idx$ of $k$ neighbors for the point cloud from all candidate indices $idx_{cand}$ based on the dilation factor $d$
Algorithm 1 Adaptive Dilated Point Grouping (ADPG) forward pass

Error feedback structure. Previously in 2D, Carreira et al. [3] proposed a framework called Iterative Error Feedback (IEF): by minimizing the error loss between current and desired outputs in the back-propagation procedure, the network is guided to approach the target. In contrast to minimizing the error during back-propagation, the methods in [7, 19] complement the output with a back-projection unit in the forward procedure. For 3D point clouds, PU-GAN [14] leveraged a similar idea for point cloud generation, while [28] presented a structure with specially designed paths for learning prominent features.

Network architecture for point cloud learning. To tackle problems in 2D computer vision tasks, many classical architectures have been introduced, e.g., VGG [31], ResNet [8], etc. Besides, some works tried different image resolutions for more clues: for example, the fully convolutional network [20] keeps the full size of an image, the deconvolution network [24] steps into lower resolutions, and HRNet [36] shares the features among different resolutions.

As for 3D point clouds, there are two popular architectures. Some works follow the form of PointNet++ [27], which learns in lower resolutions using Farthest Point Sampling (FPS) in the Set Abstraction (SA) module for downsampling and the Feature Propagation (FP) module for upsampling the point features. Meanwhile, DGCNN [37] works as a fully convolutional network because it dynamically updates the crafted point graph around each point of the model. Different from them, our approach exploits more clues learned from various resolutions for better representations of point-wise fine-grained features.

3 Approach

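Since PointNet [26] introduced multi-layer perceptrons (MLPs) that directly process point clouds, CNN-based learning on 3D data has become more intuitive. Basically, an MLP operation ($\mathcal{M}$) can be described as a 1-by-1 convolution ($\mathcal{C}_{1 \times 1}$) with a possible batch normalization [10] layer ($\mathcal{B}$) and an activation function ($\sigma$) on a feature map $F$:

$$\mathcal{M}(F) = \sigma\big(\mathcal{B}\big(\mathcal{C}_{1 \times 1}(F)\big)\big)$$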
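The key property of such an operation is that the same weights are applied to every point independently. The following plain-Python sketch illustrates this under simplifying assumptions: the function name `shared_mlp` and the toy weights are hypothetical, ReLU is chosen as the activation, and the optional batch normalization layer is left out.

```python
def shared_mlp(features, weight, bias):
    """Apply one linear map + ReLU to every point (a 1-by-1 convolution).

    features: N rows, each with C_in channels
    weight:   C_out rows, each with C_in weights
    bias:     C_out values
    """
    out = []
    for f in features:                        # each point is processed on its own
        row = []
        for w_row, b in zip(weight, bias):    # one output channel at a time
            z = sum(w * x for w, x in zip(w_row, f)) + b
            row.append(max(0.0, z))           # ReLU activation
        out.append(row)
    return out

features = [[1.0, 2.0], [3.0, 4.0]]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]   # 2 -> 3 channels
bias = [0.0, 0.0, 0.0]

print(shared_mlp(features, weight, bias))   # [[1.0, 2.0, 0.0], [3.0, 4.0, 0.0]]
```

Because no point looks at any other point, reordering the input rows only reorders the output rows, which is why this operation suits unordered point sets.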

In addition, many works craft regional patterns to record more local details. Wang et al. [37] dynamically draw a graph around each point in $D$-dimensional feature space, encoding both the absolute position of the centroid and the relative positions of its neighbors in feature space. Specifically, the crafted graph ($\mathcal{G}(f_i)$) at the centroid $f_i$, with neighbors $f_j \in \mathcal{N}(f_i)$, is:

$$\mathcal{G}(f_i) = \big\{ \mathrm{concat}(f_i,\ f_j - f_i) \;\big|\; f_j \in \mathcal{N}(f_i) \big\}$$

Therefore, the quality of the information that $\mathcal{G}(f_i)$ can provide depends heavily on the neighbors (i.e. $\mathcal{N}(f_i)$) that the grouping algorithm can find. Starting from this point, we investigate a better grouping algorithm for $\mathcal{N}(f_i)$.
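As a small illustration of how such a graph is crafted, the sketch below builds one edge feature per neighbor by concatenating the centroid feature with the relative feature of the neighbor. The function name `local_graph` is hypothetical, the neighbor indices are assumed to be given by some grouping algorithm, and any learned layer on top of the edge features is left out.

```python
def local_graph(features, neighbor_idx):
    """For each point i, build one edge feature concat(f_i, f_j - f_i) per neighbor j."""
    graph = []
    for f_i, nbrs in zip(features, neighbor_idx):
        edges = []
        for j in nbrs:
            f_j = features[j]
            relative = [b - a for a, b in zip(f_i, f_j)]   # f_j - f_i
            edges.append(list(f_i) + relative)             # absolute part + relative part
        graph.append(edges)
    return graph

features = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
neighbor_idx = [[1, 2], [0, 2], [0, 1]]      # k = 2 neighbors per point

graph = local_graph(features, neighbor_idx)
print(graph[0])   # [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 2.0]]
print(graph[1])   # [[1.0, 0.0, -1.0, 0.0], [1.0, 0.0, -1.0, 2.0]]
```

The result has shape N x k x 2D, so everything downstream sees only the neighbors that the grouping step selected.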

3.1 Adaptive Dilated Point Grouping

As mentioned in Section 1, two main grouping algorithms are commonly applied: Ball Query and k-nearest neighbors (knn). Although they are popular, they share some common issues, as analyzed in Section 2. To solve these problems, we propose an algorithm called Adaptive Dilated Point Grouping (ADPG). The pipeline is described in Algorithm 1.

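We take pairwise Euclidean distances in feature space as our metrics since they can indicate the point density distribution to a certain extent. With $F$ as a feature map of size $N \times D$, $\operatorname{diag}(\cdot)$ taking the diagonal of a matrix as a row vector, and $\mathbf{1}$ as a row vector of all ones with $N$ entries, we calculate the metrics $M$ (written here in squared form, which preserves the ordering of the distances) as:

$$M = \operatorname{diag}(FF^{\top})^{\top}\mathbf{1} - 2FF^{\top} + \mathbf{1}^{\top}\operatorname{diag}(FF^{\top})$$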
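The pairwise metrics can be obtained from the Gram matrix of the feature map without looping over coordinate differences. The sketch below shows this identity in plain Python; the function names are hypothetical, and a practical implementation would use batched matrix operations on the GPU instead of lists.

```python
import math

def pairwise_sq_dists(F):
    """Squared Euclidean distances via ||f_i||^2 - 2 f_i.f_j + ||f_j||^2."""
    n = len(F)
    gram = [[sum(a * b for a, b in zip(F[i], F[j])) for j in range(n)]
            for i in range(n)]
    sq_norm = [gram[i][i] for i in range(n)]       # diagonal of F F^T
    # Clamp at zero to absorb tiny negative values caused by rounding.
    return [[max(0.0, sq_norm[i] - 2.0 * gram[i][j] + sq_norm[j])
             for j in range(n)] for i in range(n)]

def pairwise_dists(F):
    """Euclidean distances; same ordering as the squared version."""
    return [[math.sqrt(v) for v in row] for row in pairwise_sq_dists(F)]

F = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
M = pairwise_dists(F)
print(M[0])   # [0.0, 5.0, 10.0]
print(M[1])   # [5.0, 0.0, 5.0]
```

Sorting each row of the result in ascending order then yields the candidate neighbors of each point, nearest first.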


By sorting the metrics in ascending order, we can easily identify the $k \cdot d_{max}$ nearest points (i.e. the elements with the $k \cdot d_{max}$ smallest values in each row of $M$) as candidate neighbors for each point. Next, we select the $k$ qualified neighbors from all candidates, whose indices are $idx_{cand}$ and metrics are $M_{cand}$. To be specific, we apply an MLP $\mathcal{M}$ and an activation function $\sigma$ (e.g., logistic function) on the metrics of the candidates to summarize the point distribution of the local areas. Then, a projection function $\phi$ (e.g., linear function) maps the activated values to a certain range. Finally, we take a scale function $\rho$ (e.g., round function) to assign a dilation factor $d$ to each point according to the summarized information:

$$d = \rho\big(\phi\big(\sigma\big(\mathcal{M}(M_{cand})\big)\big)\big)$$
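The chain of activation, projection, and scaling can be illustrated with a toy example. The sketch below is only an assumption-laden stand-in: a fixed weighted sum replaces the learned layer, the linear projection maps to the range from 1 to the maximum dilation factor, rounding is done half-up, and the names `dilation_factor`, `weights`, and `bias` are hypothetical.

```python
import math

def dilation_factor(cand_metrics, weights, bias, d_max):
    """Map the candidate metrics of one point to an integer in [1, d_max]."""
    # Stand-in for the learned layer: a fixed weighted sum of the metrics.
    z = sum(w * m for w, m in zip(weights, cand_metrics)) + bias
    s = 1.0 / (1.0 + math.exp(-z))          # logistic activation, value in (0, 1)
    projected = 1.0 + s * (d_max - 1)       # linear projection to [1, d_max]
    return int(projected + 0.5)             # round half up to an integer factor

weights = [1.0, 1.0, 1.0]                   # hypothetical, not learned
print(dilation_factor([0.0, 0.0, 0.0], weights, 0.0, d_max=4))      # 3
print(dilation_factor([5.0, 9.0, 12.0], weights, 0.0, d_max=4))     # 4
print(dilation_factor([0.1, 0.2, 0.3], weights, -20.0, d_max=4))    # 1
```

With these toy weights, larger candidate distances push the factor towards the maximum; in the network, the mapping from metrics to factor is learned rather than fixed.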


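As each point has a corresponding dilation factor $d$, we pick every $d$-th index of the candidate indices $idx_{cand}$ to form the final $k$ neighbors for each point. Following the behavior of dilated-knn in [5], we have the indices of the final point groups, where $idx_{cand}[j]$ is the $j$-th nearest candidate of a point:

$$idx = \big\{ idx_{cand}[1],\ idx_{cand}[1 + d],\ \dots,\ idx_{cand}[1 + (k-1)d] \big\}$$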
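The selection step differs from fixed dilation only in that every point uses its own factor. A minimal plain-Python sketch follows, assuming that each row of candidate indices is sorted nearest first and holds the number of neighbors times the maximum dilation factor entries; the function name `select_neighbors` is hypothetical.

```python
def select_neighbors(cand_idx, dilation, k):
    """Keep every d-th candidate index per point, using that point's own d."""
    return [row[::d][:k] for row, d in zip(cand_idx, dilation)]

# k = 2 neighbors, maximum dilation factor 3, so 6 candidates per point.
cand_idx = [
    [10, 11, 12, 13, 14, 15],
    [20, 21, 22, 23, 24, 25],
    [30, 31, 32, 33, 34, 35],
]
dilation = [1, 2, 3]          # one dilation factor per point

print(select_neighbors(cand_idx, dilation, k=2))
# [[10, 11], [20, 22], [30, 33]]
```

Every point ends up with exactly k neighbors, so the resulting index matrix stays rectangular and can be used for gathering features in one batched operation.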


3.2 Error-minimizing Module for Local Point Graph

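Once the neighbors are selected by ADPG, the local graph of point $f_i$, where $idx_i$ denotes the selected neighbor indices of point $i$, will be:

$$\mathcal{G}(f_i) = \big\{ \mathrm{concat}(f_i,\ f_j - f_i) \;\big|\; j \in idx_i \big\}$$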


Assuming that the crafted local graph embeds the full information about the neighborhood, it would be possible to restore the previous features by a back-projection. For the back-projection feature $f^{bp}_i$, we adopt a 1-by-$k$ convolution ($\mathcal{C}_{1 \times k}$) over the local graph $\mathcal{G}(f_i)$ as in [28], since it aggregates the nodes based on learned weights of the edges in the graph, which implicitly simulates a reverse process of crafting the graph:

$$f^{bp}_i = \mathcal{C}_{1 \times k}\big(\mathcal{G}(f_i)\big)$$


Therefore, the error feature $e_i$ is defined as the difference between the original input feature $f_i$ and the back-projection feature $f^{bp}_i$:

$$e_i = f_i - f^{bp}_i$$
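To illustrate the back-projection and the resulting error feature for a single point, the sketch below treats the 1-by-k convolution as a weighted sum over all k edges and all edge channels, one weight set per restored channel. The kernel values are hand-picked for the example rather than learned, and the names `back_project`, `error_feature`, and `kernel` are hypothetical.

```python
def back_project(graph_i, kernel):
    """1-by-k convolution over the k edge features of one point.

    graph_i: k edge features, each with C channels
    kernel:  kernel[o][j][c] weights edge j, channel c, for output channel o
    """
    return [sum(kernel[o][j][c] * graph_i[j][c]
                for j in range(len(graph_i))
                for c in range(len(graph_i[j])))
            for o in range(len(kernel))]

def error_feature(f_i, f_bp):
    """Difference between the input feature and its back-projection."""
    return [a - b for a, b in zip(f_i, f_bp)]

f_i = [1.0, 2.0]
# Two edges of the form concat(f_i, f_j - f_i), so 4 channels each.
graph_i = [[1.0, 2.0, 0.5, 0.0],
           [1.0, 2.0, 0.0, -1.0]]
# Hand-picked kernel that averages the copy of f_i stored in each edge.
kernel = [[[0.5, 0.0, 0.0, 0.0], [0.5, 0.0, 0.0, 0.0]],
          [[0.0, 0.5, 0.0, 0.0], [0.0, 0.5, 0.0, 0.0]]]

f_bp = back_project(graph_i, kernel)
print(f_bp)                        # [1.0, 2.0]
print(error_feature(f_i, f_bp))    # [0.0, 0.0]
```

A kernel that restores the input exactly gives a zero error feature; during training, an untrained kernel produces a non-zero error that the additional loss pushes towards zero.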


Different from the methods in [14, 28, 7, 19] that correct the error by extra computations in the forward pass, we use an additional loss on the error feature (Equation 7) to minimize the error during back-propagation.


As the network training continues, this loss constrains the feature learning by forcing the back-projection feature to approach the original input inside this module, especially in the early stages of training. Moreover, compared with the general cross-entropy loss alone, it is expected to provide further guidance for the grouping of our ADPG algorithm.

With a max-pooling function $\max(\cdot)$ applied channel-wise on the crafted local graph $\mathcal{G}(f_i)$ over its neighbors, we aggregate a prominent local feature $f'_i$ as the output of the centroid $f_i$:

$$f'_i = \max\big(\mathcal{G}(f_i)\big)$$
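The aggregation step reduces the k edge features of a point to a single feature by taking, for each channel, the maximum over the neighbors. A one-function sketch in plain Python (the name `max_pool_neighbors` is hypothetical):

```python
def max_pool_neighbors(graph_i):
    """Channel-wise maximum over the k edge features of one point."""
    return [max(channel) for channel in zip(*graph_i)]

graph_i = [[1.0, 5.0, 2.0],
           [3.0, 0.0, 2.0],
           [2.0, 4.0, 9.0]]      # k = 3 neighbors, 3 channels

print(max_pool_neighbors(graph_i))         # [3.0, 5.0, 9.0]
print(max_pool_neighbors(graph_i[::-1]))   # [3.0, 5.0, 9.0], order does not matter
```

Since the maximum is symmetric, the output does not depend on the order in which the neighbors are listed.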


3.3 Dense-Resolution Network Architecture

Although the ADPG algorithm and the error-minimizing module seem promising for local feature extraction, we still need a robust network architecture to leverage the potential offered by both. The basic fully convolutional network architecture in [26, 37, 28] keeps the same number of points (i.e. the full resolution of the point cloud) even in different scales of feature spaces. Even though it can retain the features point-wise without any confusion caused by upsampling, the output may lack channel-wise clues about semantic/shape information, which could be collected from different resolutions of the point cloud.

| method | overall mIoU | airplane | bag | cap | car | chair | earphone | guitar | knife | lamp | laptop | motorbike | mug | pistol | rocket | skateboard | table |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| # shapes | 16881 | 2690 | 76 | 55 | 898 | 3758 | 69 | 787 | 392 | 1547 | 451 | 202 | 184 | 283 | 66 | 152 | 5271 |
| PointNet [26] | 83.7 | 83.4 | 78.7 | 82.5 | 74.9 | 89.6 | 73.0 | 91.5 | 85.9 | 80.8 | 95.3 | 65.2 | 93.0 | 81.2 | 57.9 | 72.8 | 80.6 |
| A-SCN [39] | 84.6 | 83.8 | 80.8 | 83.5 | 79.3 | 90.5 | 69.8 | 91.7 | 86.5 | 82.9 | 96.0 | 69.2 | 93.8 | 82.5 | 62.9 | 74.4 | 80.8 |
| SO-Net [13] | 84.6 | 81.9 | 83.5 | 84.8 | 78.1 | 90.8 | 72.2 | 90.1 | 83.6 | 82.3 | 95.2 | 69.3 | 94.2 | 80.0 | 51.6 | 72.1 | 82.6 |
| PointNet++ [27] | 85.1 | 82.4 | 79.0 | 87.7 | 77.3 | 90.8 | 71.8 | 91.0 | 85.9 | 83.7 | 95.3 | 71.6 | 94.1 | 81.3 | 58.7 | 76.4 | 82.6 |
| PCNN [1] | 85.1 | 82.4 | 80.1 | 85.5 | 79.5 | 90.8 | 73.2 | 91.3 | 86.0 | 85.0 | 95.7 | 73.2 | 94.8 | 83.3 | 51.0 | 75.0 | 81.8 |
| DGCNN [37] | 85.2 | 84.0 | 83.4 | 86.7 | 77.8 | 90.6 | 74.7 | 91.2 | 87.5 | 82.8 | 95.7 | 66.3 | 94.9 | 81.1 | 63.5 | 74.5 | 82.6 |
| P2Sequence [16] | 85.2 | 82.6 | 81.8 | 87.5 | 77.3 | 90.8 | 77.1 | 91.1 | 86.9 | 83.9 | 95.7 | 70.8 | 94.6 | 79.3 | 58.1 | 75.2 | 82.8 |
| SpiderCNN [40] | 85.3 | 83.5 | 81.0 | 87.2 | 77.5 | 90.7 | 76.8 | 91.1 | 87.3 | 83.3 | 95.8 | 70.2 | 93.5 | 82.7 | 59.7 | 75.8 | 82.8 |
| PointASNL [41] | 86.1 | 84.1 | 84.7 | 87.9 | 79.7 | 92.2 | 73.7 | 91.0 | 87.2 | 84.2 | 95.8 | 74.4 | 95.2 | 81.0 | 63.0 | 76.3 | 83.2 |
| RS-CNN [18] | 86.2 | 83.5 | 84.8 | 88.8 | 79.6 | 91.2 | 81.1 | 91.6 | 88.4 | 86.0 | 96.0 | 73.7 | 94.1 | 83.4 | 60.5 | 77.7 | 83.6 |
| Ours | 86.4 | 84.3 | 85.0 | 88.3 | 79.5 | 91.2 | 79.3 | 91.8 | 89.0 | 85.2 | 95.7 | 72.2 | 94.2 | 82.0 | 60.6 | 76.8 | 84.2 |

Table 1: Part segmentation results (mIoU(%)) on ShapeNet Part dataset.

To overcome the above limitation, another branch learns the necessary information from different resolutions of the point cloud. In contrast to the full-resolution (FR) branch, a multi-resolution (MR) branch is able to capture point-wise channel-related information from different scales, which contributes to a comprehensive channel-wise understanding. After the feature map of FR is enhanced by the feature map of MR (please see Section 4.3 and Table 4 for more details), the final output of our dense-resolution (DR) network is formulated with element-wise multiplication (Equation 9).


4 Experiments

In this section, the details of our implementation are provided, including network parameters, training settings, datasets, etc. By comparing the experimental results with other state-of-the-art methods, we analyze the performance quantitatively. Besides, some ablation studies and visualizations are presented to illustrate the properties of our approach.

4.1 Implementation

Network details. Generally, our dense-resolution network consists of two branches: a full-resolution (FR) branch and a multi-resolution (MR) branch. Specifically, the FR branch is a series of error-minimizing modules extracting features in different scales of feature spaces, i.e. 64, 128, 256, etc. The FR output is a projected concatenation of the modules' outputs. As for the MR branch, we adopt farthest point sampling (FPS) and feature propagation (FP) from [27] for downsampling and upsampling, respectively. The MR branch starts from the first output of FR at resolution $N$; after that, lower resolutions, i.e. $N/4$ and $N/16$, are investigated. Different from other architectures, more propagated features and skip links are densely connected to enhance the relations between different point resolutions and feature spaces. Empirically, we adopt the settings of the number of neighbors $k$ and the maximum dilation factor $d_{max}$ as in [37, 5]. For the error-minimizing modules in MR, we use regular knn (equivalent to ADPG with $d_{max} = 1$) since the points are sparse.
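For readers unfamiliar with the downsampling step, farthest point sampling can be sketched as a greedy loop that repeatedly picks the point farthest from everything selected so far. This is a plain-Python illustration, not the implementation of [27]; the function name and the fixed start index are assumptions, and practical versions run batched on the GPU.

```python
import math

def farthest_point_sampling(points, m, start=0):
    """Greedily pick m point indices that are spread out over the cloud."""
    selected = [start]
    # Distance from every point to its nearest selected point so far.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < m:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected

points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 0.0)]
print(farthest_point_sampling(points, 3))     # [0, 2, 3]

# Going from N = 16 points to N/4 = 4 points on a 4 x 4 grid.
grid = [(float(x), float(y)) for x in range(4) for y in range(4)]
print(len(farthest_point_sampling(grid, len(grid) // 4)))   # 4
```

Applying the same routine twice, each time keeping a quarter of the points, gives the two lower resolutions mentioned above.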

| method | input type | #points | ModelNet40 | ScanObjectNN |
|---|---|---|---|---|
| PointNet [26] | coords | | 89.2 | 68.2 |
| A-SCN [39] | coords | | 90.0 | - |
| PointNet++ [27] | coords | | 90.7 | 77.9 |
| SO-Net [13] | coords | | 90.9 | - |
| PointCNN [15] | coords | | 92.2 | 78.5 |
| PCNN [1] | coords | | 92.3 | - |
| SpiderCNN [40] | coords | | 92.4 | 73.7 |
| P2Sequence [16] | coords | | 92.6 | - |
| DensePoint [17] | coords | | 92.8 | - |
| RS-CNN [18] | coords | | 92.9 | - |
| DGCNN [37] | coords | | 92.9 | 78.1 |
| KP-Conv [33] | coords | | 92.9 | - |
| PointASNL [41] | coords | | 92.9 | - |
| Ours | coords | | 93.1 | 80.3 |

Table 2: Overall classification accuracies (%) on ModelNet40 and ScanObjectNN datasets. (coords: 3D coordinates, -: unknown)

The output is obtained by following Equation 9. For the classification task, we apply a max-pooling function and Fully Connected (FC) layers to regress confidence scores for all possible categories. For the segmentation task, we attach the max-pooled feature to each point feature of the output and further predict the semantic label of each point with FC layers. We implement the project with PyTorch and Python; all experiments are trained and tested on Linux with GeForce RTX 2080Ti GPUs.

1 The code and models will be made available.

Training strategy. Stochastic Gradient Descent (SGD) with a momentum of 0.9 is adopted as the optimizer for classification. The learning rate decreases from 0.1 to 0.001 by cosine annealing [21] during the 300 epochs. For segmentation, we use Adam [12] optimization for 200 epochs of training. The learning rate begins at 0.001 and decays by a factor of 0.5 every 20 epochs. The batch size for both tasks is 32. Besides, training data is augmented with random scaling and translation; the total loss is the sum of the regular cross-entropy loss and the weighted error-minimizing loss (see Equation 7). Part segmentation is evaluated with the ten-votes strategy used by state-of-the-art approaches [26, 27, 18].
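The two learning-rate schedules can be written down as closed-form functions of the epoch index. The sketch below is a hedged illustration in plain Python: the function names are hypothetical, cosine annealing is assumed to run once without restarts, and the epoch count is assumed to start at zero.

```python
import math

def cosine_annealing_lr(epoch, total_epochs=300, lr_max=0.1, lr_min=0.001):
    """Classification schedule: cosine decay from lr_max to lr_min."""
    cos_term = 1.0 + math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * cos_term

def step_decay_lr(epoch, lr_start=0.001, gamma=0.5, step=20):
    """Segmentation schedule: multiply by gamma after every `step` epochs."""
    return lr_start * gamma ** (epoch // step)

for epoch in (0, 150, 300):
    print(epoch, cosine_annealing_lr(epoch))   # about 0.1, 0.0505, 0.001

for epoch in (0, 19, 20, 199):
    print(epoch, step_decay_lr(epoch))         # 0.001, 0.001, 0.0005, about 1.95e-06
```

In PyTorch, comparable behavior is available through `torch.optim.lr_scheduler.CosineAnnealingLR` and `torch.optim.lr_scheduler.StepLR`; whether the authors used these exact classes is not stated in the text.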

Datasets. We test our approach on two main tasks: point cloud segmentation and classification. The ShapeNet Part dataset [42] is used to predict the semantic class (part label) for each point of the object. In addition, the synthetic ModelNet40 [38] dataset and the real-world ScanObjectNN [34] dataset are used to identify the category of the object.

Figure 3: The dilation factors learned by the ADPG algorithm. (First row: the learned dilation factors in a shallow layer of our network. Second row: in a deep layer.)
  • ShapeNet Part. In general, the dataset has 16,881 object point clouds in 16 categories. Each point is labeled as one of the 50 parts. As the primary dataset for our experiments, we follow the official data split [4]. We input the 3D coordinates of 2048 points for each point cloud and feed a one-hot class feature before FC layers during training. In terms of the metric for evaluation, we adopt Intersection-over-Union (i.e. IoU). The IoU of the shape is calculated by the mean value of IoUs of all parts in that shape. Particularly, mIoU (i.e. mean IoU) is the average of IoUs for all testing shapes.

  • ModelNet40. It is a popular dataset because of its regular and clean point clouds. There are 12,311 meshes in 40 classes, with 9,843 for training and 2,468 for testing. The corresponding point clouds are generated by uniformly sampling from the surfaces, translating to the origin, and scaling to fit within a unit sphere [26]. In our case, only the 3D coordinates of 1024 points for each point cloud are used.

  • ScanObjectNN. This real-world object dataset was recently published. Although it has 15,000 objects in only 15 categories, it is practically more challenging due to background, missing parts, and deformations.
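Two of the dataset conventions above lend themselves to a short illustration: the unit-sphere normalization used for ModelNet40 and the shape-level IoU and mIoU used for ShapeNet Part. The plain-Python sketch below rests on our own assumptions: the function names are hypothetical, the centroid is used as the origin, and a part that is absent from both prediction and ground truth is counted with an IoU of 1, which is a common convention rather than something stated in the text.

```python
import math

def normalize_unit_sphere(points):
    """Translate the centroid to the origin and scale to fit a unit sphere."""
    n = len(points)
    centroid = [sum(c) / n for c in zip(*points)]
    centered = [[x - m for x, m in zip(p, centroid)] for p in points]
    scale = max(math.sqrt(sum(x * x for x in p)) for p in centered) or 1.0
    return [[x / scale for x in p] for p in centered]

def shape_iou(pred, gt, part_ids):
    """Mean IoU over the parts of one shape (per-point part labels)."""
    ious = []
    for part in part_ids:
        inter = sum(1 for a, b in zip(pred, gt) if a == part and b == part)
        union = sum(1 for a, b in zip(pred, gt) if a == part or b == part)
        ious.append(1.0 if union == 0 else inter / union)   # assumed convention
    return sum(ious) / len(ious)

def mean_iou(shapes):
    """mIoU: average of the shape IoUs over all test shapes."""
    values = [shape_iou(pred, gt, parts) for pred, gt, parts in shapes]
    return sum(values) / len(values)

square = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 0.0)]
unit = normalize_unit_sphere(square)

shapes = [
    ([0, 0, 1, 1], [0, 1, 1, 1], [0, 1]),   # one point mislabeled
    ([2, 2, 3, 3], [2, 2, 3, 3], [2, 3]),   # perfect prediction
]
print(shape_iou(*shapes[0]))   # (1/2 + 2/3) / 2, about 0.583
print(mean_iou(shapes))        # about 0.792
```

Note that the averaging is over shapes, so categories with many test shapes contribute more to the overall mIoU than small categories.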

4.2 Results

Segmentation. Table 1 shows the results of related works reported in overall mIoU, which is the most critical evaluation metric on the ShapeNet Part dataset. In general, our network achieves 86.4% and outperforms other state-of-the-art algorithms under similar experimental settings. As for the evaluation within each class, we surpass the others in 5 out of 16 categories. In particular, in categories with a relatively large number of samples, e.g., airplane, chair, or table, we perform better than the others in two out of these three classes.

Classification. Table 2 presents the overall accuracy of the classification on both synthetic and real-world object datasets. For ModelNet40, we achieve 93.1% and exceed other state-of-the-art results with similar input. Besides, an overall accuracy of 80.3% is obtained on the ScanObjectNN dataset, which is significantly higher than all results on its official leaderboard [9]. The inference time of our model is about 19.2ms running on a single GeForce RTX 2080Ti GPU. In general, our network is effective and robust for point cloud classification.


model overall mIoU
0 - - 85.2
1 - 85.6
2 85.7
3 85.3
4 DR 86.0


Table 3: Ablation study on different modules on ShapeNet Part (%). (FR: Full-Resolution branch only, MR: Multi-Resolution branch only, DR: Dense-Resolution Network, ADPG: Adaptive Dilated Point Grouping algorithm, EM: Error-minimizing module for local point graph.)

4.3 Ablation Studies

Visualization of learned dilation factors. The color of each point corresponds to the dilation factor learned by our ADPG algorithm. From Figure 3, we can see that our algorithm tends to assign larger dilation factors to points on corners, boundaries, and edges. The reason is that the point distribution around them is relatively sparse, so larger neighborhoods are needed for local feature learning. Due to the series connection of modules, points in deep layers already have larger receptive fields, so larger dilation factors are unnecessary: points in relatively dense regions (e.g., on flat surfaces or in central areas) turn out to have smaller dilation factors as the network goes deeper. Different from regular knn/Ball Query with a limited receptive field, or dilated-knn with a fixed dilation factor for all points, our algorithm works adaptively and reasonably, as expected.

Effects of components. Here we conduct an ablation study on the effects of the network architecture, the grouping algorithm, and the error-minimizing module. We run the experiments on the ShapeNet Part dataset with the same input and classifier, and Table 3 presents the results in overall mIoU. Comparing models 1 and 2 to model 0, we observe that the error-minimizing module with ADPG applied improves the network performance for part segmentation (85.6% and 85.7% versus 85.2%). Although the multi-resolution branch (model 3) alone is not able to learn the features as comprehensively as the full-resolution branch (model 2) does, we can take advantage of both by combining them into a dense-resolution network (model 4).


model overall mIoU
0 85.7
1 85.3
2 85.8
3 85.7
4 85.7
5 (Equ 9) 86.0
6 DR (Equ 9)   86.4*


Table 4: Ablation study on different forms of merged feature on ShapeNet Part (%). ($F_{FR}$: output of FR, $F_{MR}$: output of MR, $\otimes$: element-wise multiplication, *: ten-votes strategy for evaluation.)

Merging the feature maps. Both FR and MR have the properties mentioned above, so we need an effective way to unify the advantages of both. We test simple ways of merging the output features of FR and MR, i.e. concatenating them channel-wise, adding them element-wise, and multiplying them element-wise. Comparing the results of models 2, 3, and 4 to model 0 in Table 4, we observe that these simple ways of merging may not improve performance. In contrast, channel-wise enhancement of the FR output from the MR output (model 5) improves the overall mIoU to 86.0%, for the reasons explained in Section 3.3. With ten-votes testing, the overall mIoU is further boosted to 86.4%.

5 Conclusion

In this work, we propose a Dense-Resolution Network for point cloud analysis, which leverages information from different resolutions of the point cloud. Specifically, the Adaptive Dilated Point Grouping algorithm is introduced to realize a flexible point grouping based on the density distribution. Moreover, an error-minimizing module and corresponding loss are presented to capture local information and guide the network in training. We conduct experiments and provide ablation studies on both point cloud segmentation and classification benchmarks. According to the experimental results, we outperform competing state-of-the-art methods on ShapeNet Part, ModelNet40, and ScanObjectNN datasets. The quantitative reports and qualitative visualization demonstrate the advantages of our approach.

References


  • [1] M. Atzmon, H. Maron, and Y. Lipman (2018) Point convolutional neural networks by extension operators. arXiv preprint arXiv:1803.10091. Cited by: Table 1, Table 2.
  • [2] F. Blais et al. (2004) Review of 20 years of range sensor development. Journal of electronic imaging 13 (1), pp. 231–243. Cited by: §1.
  • [3] J. Carreira, P. Agrawal, K. Fragkiadaki, and J. Malik (2016) Human pose estimation with iterative error feedback. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4733–4742. Cited by: §1, §2.
  • [4] A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, et al. (2015) Shapenet: an information-rich 3d model repository. arXiv preprint arXiv:1512.03012. Cited by: 1st item.
  • [5] F. Engelmann, T. Kontogianni, and B. Leibe (2019) Dilated point convolutions: on the receptive field size of point convolutions on 3d point clouds. External Links: 1907.12046 Cited by: §1, §2, §2, §2, §3.1, §4.1.
  • [6] Y. Guo, H. Wang, Q. Hu, H. Liu, L. Liu, and M. Bennamoun (2019) Deep learning for 3d point clouds: a survey. External Links: 1912.12033 Cited by: §1.
  • [7] M. Haris, G. Shakhnarovich, and N. Ukita (2018) Deep back-projection networks for super-resolution. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1664–1673. Cited by: §1, §2, §3.2.
  • [8] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §2.
  • [9] HKUST-VGD (2020) 3D scene understanding benchmark. Note: 2020-04-20 Cited by: §4.2.
  • [10] S. Ioffe and C. Szegedy (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167. Cited by: §3.
  • [11] M. Jaboyedoff, T. Oppikofer, A. Abellán, M. Derron, A. Loye, R. Metzger, and A. Pedrazzini (2012) Use of lidar in landslide investigations: a review. Natural hazards 61 (1), pp. 5–28. Cited by: §1.
  • [12] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §4.1.
  • [13] J. Li, B. M. Chen, and G. Hee Lee (2018) So-net: self-organizing network for point cloud analysis. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 9397–9406. Cited by: Table 1, Table 2.
  • [14] R. Li, X. Li, C. Fu, D. Cohen-Or, and P. Heng (2019-10) PU-gan: a point cloud upsampling adversarial network. In The IEEE International Conference on Computer Vision (ICCV), Cited by: §1, §2, §3.2.
  • [15] Y. Li, R. Bu, M. Sun, W. Wu, X. Di, and B. Chen (2018) Pointcnn: convolution on x-transformed points. In Advances in Neural Information Processing Systems, pp. 820–830. Cited by: Table 2.
  • [16] X. Liu, Z. Han, Y. Liu, and M. Zwicker (2019) Point2Sequence: learning the shape representation of 3d point clouds with an attention-based sequence to sequence network. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 8778–8785. Cited by: Table 1, Table 2.
  • [17] Y. Liu, B. Fan, G. Meng, J. Lu, S. Xiang, and C. Pan (2019-10) DensePoint: learning densely contextual representation for efficient point cloud processing. In The IEEE International Conference on Computer Vision (ICCV), Cited by: Table 2.
  • [18] Y. Liu, B. Fan, S. Xiang, and C. Pan (2019) Relation-shape convolutional neural network for point cloud analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8895–8904. Cited by: §1, §2, Table 1, §4.1, Table 2.
  • [19] Z. Liu, L. Wang, C. Li, and W. Siu (2019) Hierarchical back projection network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 0–0. Cited by: §1, §2, §3.2.
  • [20] J. Long, E. Shelhamer, and T. Darrell (2015) Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440. Cited by: §2.
  • [21] I. Loshchilov and F. Hutter (2016) Sgdr: stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983. Cited by: §4.1.
  • [22] D. Maturana and S. Scherer (2015) Voxnet: a 3d convolutional neural network for real-time object recognition. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 922–928. Cited by: §1.
  • [23] N. J. Mitra, N. Gelfand, H. Pottmann, and L. Guibas (2004) Registration of point cloud data from a geometric optimization perspective. In Proceedings of the 2004 Eurographics/ACM SIGGRAPH symposium on Geometry processing, pp. 22–31. Cited by: §1.
  • [24] H. Noh, S. Hong, and B. Han (2015) Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE international conference on computer vision, pp. 1520–1528. Cited by: §2.
  • [25] S. M. Omohundro (1989) Five balltree construction algorithms. International Computer Science Institute Berkeley. Cited by: §1.
  • [26] C. R. Qi, H. Su, K. Mo, and L. J. Guibas (2017) Pointnet: deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660. Cited by: §1, §2, §3.3, Table 1, §3, 2nd item, §4.1, Table 2.
  • [27] C. R. Qi, L. Yi, H. Su, and L. J. Guibas (2017) Pointnet++: deep hierarchical feature learning on point sets in a metric space. In Advances in neural information processing systems, pp. 5099–5108. Cited by: §1, §1, §2, §2, §2, §2, Table 1, §4.1, §4.1, Table 2.
  • [28] S. Qiu, S. Anwar, and N. Barnes (2019) Geometric back-projection network for point cloud classification. arXiv preprint arXiv:1911.12885. Cited by: §1, §1, §1, §2, §2, §3.2, §3.3.
  • [29] R. B. Rusu, N. Blodow, and M. Beetz (2009) Fast point feature histograms (fpfh) for 3d registration. In 2009 IEEE International Conference on Robotics and Automation, pp. 3212–3217. Cited by: §1.
  • [30] R. Schnabel, R. Wahl, and R. Klein (2007) Efficient ransac for point-cloud shape detection. In Computer graphics forum, Vol. 26, pp. 214–226. Cited by: §1.
  • [31] K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. External Links: 1409.1556 Cited by: §2.
  • [32] H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller (2015) Multi-view convolutional neural networks for 3d shape recognition. In Proceedings of the IEEE international conference on computer vision, pp. 945–953. Cited by: §1.
  • [33] H. Thomas, C. R. Qi, J. Deschaud, B. Marcotegui, F. Goulette, and L. J. Guibas (2019-10) KPConv: flexible and deformable convolution for point clouds. In The IEEE International Conference on Computer Vision (ICCV), Cited by: §1, Table 2.
  • [34] M. A. Uy, Q. Pham, B. Hua, T. Nguyen, and S. Yeung (2019-10) Revisiting point cloud classification: a new benchmark dataset and classification model on real-world data. In The IEEE International Conference on Computer Vision (ICCV), Cited by: §4.1.
  • [35] G. Vosselman, S. Dijkman, et al. (2001) 3D building model reconstruction from point clouds and ground plans. International archives of photogrammetry remote sensing and spatial information sciences 34 (3/W4), pp. 37–44. Cited by: §1.
  • [36] J. Wang, K. Sun, T. Cheng, B. Jiang, C. Deng, Y. Zhao, D. Liu, Y. Mu, M. Tan, X. Wang, et al. (2019) Deep high-resolution representation learning for visual recognition. arXiv preprint arXiv:1908.07919. Cited by: §2.
  • [37] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon (2019) Dynamic graph cnn for learning on point clouds. ACM Transactions on Graphics (TOG) 38 (5), pp. 146. Cited by: §1, §1, §2, §2, §2, §3.3, Table 1, §3, §4.1, Table 2.
  • [38] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao (2015) 3d shapenets: a deep representation for volumetric shapes. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1912–1920. Cited by: §4.1.
  • [39] S. Xie, S. Liu, Z. Chen, and Z. Tu (2018) Attentional shapecontextnet for point cloud recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4606–4615. Cited by: Table 1, Table 2.
  • [40] Y. Xu, T. Fan, M. Xu, L. Zeng, and Y. Qiao (2018) Spidercnn: deep learning on point sets with parameterized convolutional filters. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 87–102. Cited by: Table 1, Table 2.
  • [41] X. Yan, C. Zheng, Z. Li, S. Wang, and S. Cui (2020) PointASNL: robust point clouds processing using nonlocal neural networks with adaptive sampling. External Links: 2003.00492 Cited by: §2, Table 1, Table 2.
  • [42] L. Yi, V. G. Kim, D. Ceylan, I. Shen, M. Yan, H. Su, C. Lu, Q. Huang, A. Sheffer, and L. Guibas (2016) A scalable active framework for region annotation in 3d shape collections. ACM Transactions on Graphics (TOG) 35 (6), pp. 1–12. Cited by: §4.1.