Point clouds, the raw outputs of many 3D data-acquisition devices such as radars and sonars, are an important 3D data representation for computer-vision applications. Real applications such as object classification and segmentation therefore usually require high-level processing of 3D point clouds. Recent research has proposed employing Deep Neural Networks (DNNs) for high-accuracy, high-level processing of point clouds, achieving remarkable success. Representative DNN models for point-cloud classification include PointNet [Qi et al.2017a], PointNet++ [Qi et al.2017b], and DGCNN [Wang et al.2018], which successfully handle the irregularity of point clouds, achieving high classification accuracy and robustness to random point missing/dropping.
Despite the success of these DNN-based models, recent studies have also shown the vulnerability of these models to adversarial point clouds generated by point shifting/addition [Xiang, Qi, and Li2018], leading to model misclassification and thus to misbehavior of systems equipped with those models. It is worth noting that the adversarial attacks considered in [Xiang, Qi, and Li2018] only involve point shifting and addition, which we argue is not a natural way to generate adversarial point clouds in real-world situations. Shifted/added points may contain abnormal outliers, which, on the one hand, make the generated adversarial point clouds unrealistic and, on the other hand, are difficult to realize on real 3D objects. In contrast, point missing, which results in fragmented point clouds, is a more common phenomenon in practice due to the view limitations of 3D data-acquisition devices. Moreover, a maliciously designed point-dropping scheme is more practically realizable, e.g., by shading, damaging, or adjusting the poses of 3D objects. Therefore, point missing, especially when maliciously manipulated, appears to be a more serious real-world security concern and merits detailed study. Point missing/dropping was initially studied in [Qi et al.2017a, Qi et al.2017b] by dropping either the furthest or random points. However, that study was not conducted from the perspective of an adversary, so those schemes do not lead to model attacks.
In this paper, we propose a simple yet principled way of dropping points from a point cloud to generate an adversarial point cloud. We study the impact of these adversarial point clouds on several DNN-based classification models. Interestingly, contrary to what was claimed for PointNet (that with points missing, the accuracy of PointNet drops only slightly under furthest and random input sampling [Qi et al.2017a]), we show that these DNN models (including PointNet) are actually vulnerable to “malicious point dropping”. To achieve malicious point dropping, we propose to learn a saliency map that quantifies the significance of each point in a point cloud and then serves as guidance for our malicious point-dropping process. In other words, the saliency map assigns each point a saliency score, reflecting the point's contribution to the corresponding model-prediction loss. Based on the learned saliency map, adversarial point clouds are generated by simply dropping the points with the highest saliency scores. As a byproduct, our saliency map can also be used to generate non-adversarial point clouds (defined in Section 2.1) by dropping the points with the lowest scores, which, surprisingly, even leads to better recognition performance than the original point clouds.
Despite its conceptual simplicity, learning a saliency map for adversarial-point-cloud generation is nontrivial. One possible solution is based on the critical-subset theory proposed in [Qi et al.2017a]. However, that theory is only applicable to PointNet, and even for PointNet, since it does not specify the contribution of each individual point to the classification loss, it is difficult to decide which point(s) to drop. Empirically, we tried several point-dropping strategies using the critical subset and found that even the best one performs worse than our method. Furthermore, the brute-force approach of trying all possible combinations of points and recomputing the losses is impractical because its computational complexity scales exponentially with the number of points in a point cloud. Instead, we propose an efficient and effective method to learn approximate saliency maps with a single backward pass through the DNN models. The basic idea is to approximate point dropping with a continuous point-shifting procedure, i.e., moving points toward the point-cloud center. In this way, prediction-loss changes can be approximated by the gradient of the loss w.r.t. each point under a spherical coordinate system. Thus, every point in a point cloud is associated with a score proportional to the gradient of the loss w.r.t. the point, and malicious point dropping is done by simply dropping the points with the highest scores.
Our saliency-map-driven point-dropping algorithm is compared with the random point-dropping baseline on several state-of-the-art point-cloud DNN models, including PointNet, PointNet++, and DGCNN. We show that our method consistently outperforms random point dropping, especially for generating adversarial point clouds. For example, dropping points from each point cloud with our algorithm substantially reduces the accuracy of PointNet on 3D-MNIST/ModelNet40 (an adversarial attack), while the random-dropping scheme leaves the accuracies close to the originals. Moreover, the best critical-subset-based strategy (only applicable to PointNet) reduces the accuracies far less than our method does.
2.1 Definition and Notations
A point cloud is represented as $X = \{x_i \mid i = 1, \ldots, n\}$, where $x_i \in \mathbb{R}^3$ is a 3D point and $n$ is the number of points in the point cloud; $y \in \{1, \ldots, K\}$ is the ground-truth label, where $K$ is the number of classes. We denote the output of a point-cloud-based classification network as $F(X)$, whose input is a point cloud. The classification loss of the network is denoted as $L(X, y)$, which is usually defined as the cross-entropy between $F(X)$ and $y$. A fragmented point cloud, denoted as $X'$, is a subset of $X$ generated by dropping some of the points from $X$. $X'$ is called an adversarial point cloud if the model misclassifies it; otherwise, $X'$ is considered non-adversarial.
We define the contribution of a point (or set of points) in a point cloud as the difference between the prediction losses of the two point clouds excluding and including that point (or set), respectively. Formally, given a point $x_i$ in $X$, the contribution of $x_i$ is defined as $L(X', y) - L(X, y)$, where $X' = X \setminus \{x_i\}$. If this value is positive (or large), we consider the contribution of $x_i$ to the model prediction positive (or large), because in that case, if $x_i$ is added back to $X'$, the loss is reduced, leading to more accurate classification. Otherwise, we consider the contribution of $x_i$ negative (or small).
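This leave-one-out definition can be sketched in a few lines. The toy example below (a hypothetical quadratic loss standing in for the DNN's cross-entropy) computes each point's contribution as the loss of the cloud without the point minus the loss of the full cloud:

```python
import numpy as np

def toy_loss(points, center=np.zeros(3)):
    """Toy stand-in for a classification loss: mean squared distance of
    the cloud to a fixed 'class template' center (hypothetical; the
    paper uses the cross-entropy loss of a DNN)."""
    return float(np.mean(np.sum((points - center) ** 2, axis=1)))

def contribution(points, i):
    """Contribution of point i: L(X \\ {x_i}) - L(X)."""
    reduced = np.delete(points, i, axis=0)
    return toy_loss(reduced) - toy_loss(points)

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 3))
contribs = np.array([contribution(X, i) for i in range(len(X))])
# Under this toy loss, a far-away point raises the full-cloud loss, so
# removing it lowers the loss and its contribution is negative.
far = int(np.argmax(np.sum(X ** 2, axis=1)))
```

Note the brute-force cost: one loss evaluation per point, which is exactly what the gradient-based saliency map of Section 3 avoids.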
Point-Cloud Saliency Map
A point-cloud saliency map assigns each point $x_i$ a saliency score $s_i$ to reflect the contribution of $x_i$. Formally, the map can be denoted as a function $S$ that takes $X$ as input and outputs a vector of length $n$, i.e., $S(X) = (s_1, \ldots, s_n)$. We expect a higher (positive) $s_i$ to indicate a larger (positive) contribution of $x_i$, so that adversarial point clouds can be generated by dropping the points with the highest scores in the map.
2.2 3D Point-Cloud Recognition Models
There are three mainstream approaches for 3D object recognition: volume-based [Wu et al.2015, Maturana and Scherer2015], multi-view-based [Su et al.2015, Wang, Pelillo, and Siddiqi2017, Yu, Meng, and Yuan2018, Kanezaki, Matsushita, and Nishida2018], and point-cloud-based [Qi et al.2017a, Qi et al.2017b, Wang et al.2018] approaches, which rely on voxel, multi-view-image, and point-cloud representations of 3D objects, respectively. In this work, we focus on point-cloud-based models.
PointNet and PointNet++
PointNet [Qi et al.2017a] applies a composition of a single-variable function applied to each point, a max-pooling layer, and a function of the max-pooled features, which is invariant to point order, to approximate functions for point-cloud classification and segmentation. Formally, the composition can be denoted as $f = \gamma \circ \mathrm{MAX} \circ h$, with $h$ a single-variable function, $\mathrm{MAX}$ the max-pooling layer, and $\gamma$ a function of the max-pooled features (i.e., of $u = \mathrm{MAX}_{i}\{h(x_i)\}$). PointNet has played a significant role in the recent development of high-level point-cloud processing, serving as a baseline for many subsequent DNN models. PointNet++ [Qi et al.2017b] is one such extension, which applies PointNet recursively on a nested partitioning of the input point set to capture the hierarchical structures induced by the metric space in which the points live. Compared to PointNet, PointNet++ is able to learn and exploit hierarchical features w.r.t. the Euclidean distance metric, and thus typically achieves better performance.
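The composition $f = \gamma \circ \mathrm{MAX} \circ h$ can be illustrated with a short NumPy sketch; the $h$ and $\gamma$ below are arbitrary stand-ins rather than PointNet's learned MLPs, but they exhibit the same permutation invariance:

```python
import numpy as np

def h(x):
    """Per-point feature function (toy stand-in for PointNet's shared MLP)."""
    return np.array([x.sum(), (x ** 2).sum(), np.abs(x).max()])

def pointnet_like(points):
    """f(X) = gamma(MAX_i h(x_i)): per-point features, max pooling over
    points, then a function gamma of the pooled vector. The result is
    permutation-invariant because max pooling is symmetric."""
    feats = np.stack([h(x) for x in points])   # (n, d) per-point features
    pooled = feats.max(axis=0)                 # MAX over points
    return float(np.tanh(pooled).sum())        # gamma: toy readout

rng = np.random.default_rng(1)
X = rng.normal(size=(32, 3))
perm = rng.permutation(32)
# Shuffling the points leaves the output unchanged.
```

Permutation invariance is exactly what makes these models applicable to unordered point sets.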
Dynamic Graph Convolutional Neural Network (DGCNN)
DGCNN [Wang et al.2018] integrates a novel operation, EdgeConv, into PointNet to capture local geometric structures while maintaining invariance to point permutations. Specifically, EdgeConv generates features that describe neighboring relationships by constructing a local neighborhood graph and applying convolution-like operations on the edges connecting neighboring pairs of points. EdgeConv helps DGCNN achieve further performance improvements, usually surpassing PointNet and PointNet++.
2.3 Adversarial Attack
One major application of our proposed saliency map is generating adversarial point clouds. In this section, we briefly review the current research frontier on standard adversarial attacks with 2D adversarial images and clarify the differences between adversarial images and adversarial point clouds. We also briefly introduce the critical-point-subset idea and the point-dropping strategy based on it.
Most existing work on adversarial attacks focuses on image classification, where an adversarial image is defined as an image close to the original image under some distance metric but leading to misclassification by the original model. Most successful adversarial attacks on images, e.g., FGSM [Szegedy et al.2013], PGD [Kurakin, Goodfellow, and Bengio2016], the C&W attack [Carlini and Wagner2017], DeepFool [Moosavi-Dezfooli, Fawzi, and Frossard2016], DAA [Zheng, Chen, and Ren2018], and JSMA [Papernot et al.2016], are designed based on gradients of the network output/loss w.r.t. the input. For instance, the PGD attack iteratively updates the input pixel values along the directions of the gradients of the model loss to generate adversarial images, and exploits a clip function to constrain the adversarial images to a neighborhood of the original images. PGD and its variants have achieved several state-of-the-art results as first-order attacks according to many recent studies [Carlini et al.2017, Dong et al.2017, Athalye, Carlini, and Wagner2018, Athalye and Carlini2018, Cai et al.2018, Zheng, Chen, and Ren2018].
Adversarial Point Clouds
Although 2D adversarial images have been studied for several years, adversarial attacks on irregular point clouds received little attention until recently [Xiang, Qi, and Li2018]. The main difference between adversarial learning on images and on point clouds is that, apart from shifting points (analogous to modifying pixels in traditional adversarial attacks), adding new points and dropping existing points are distinctive ways of generating adversarial point clouds. [Xiang, Qi, and Li2018] mainly focuses on generating adversarial point clouds by shifting existing points or adding new points. It formulates point shifting as gradient-based optimization of the CW loss [Carlini and Wagner2017], which can be written as
$$\min_{X'} \; f_{adv}(X') + \lambda \, D(X, X'), \tag{1}$$
where $f_{adv}$ is an adversarial loss indicating the possibility of a successful attack, and $D$ is some distance metric (e.g., the Chamfer distance [Fan, Su, and Guibas2017]) measuring the difference between the original and the adversarial point cloud. [Xiang, Qi, and Li2018] handles the addition of new points by initializing a number of new points at the coordinates of existing points and also optimizing them over Eq. 1. It is experimentally shown that state-of-the-art point-cloud models are vulnerable to adversarial point clouds crafted by these two methods. However, we argue that adversarial attack via point dropping is a more practical form of attack due to the physical 3D point-cloud generation process (think of occluding parts of an object, i.e., dropping points, when generating 3D point clouds). As a result, this paper focuses on generating adversarial point clouds by dropping existing points.
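A minimal sketch of this trade-off optimization, using a toy quadratic adversarial objective and an L2 distance penalty as stand-ins for the CW loss and Chamfer distance (both have analytic gradients, so plain gradient descent suffices):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(64, 3))          # original point cloud
target = np.array([2.0, 0.0, 0.0])    # toy "wrong class" template (hypothetical)
lam = 0.5                             # trade-off weight lambda

def adv_loss(Xp):
    """Toy adversarial objective: drive the cloud centroid toward the
    target template (stand-in for the CW misclassification loss)."""
    m = Xp.mean(axis=0)
    return float(np.sum((m - target) ** 2))

Xp = X.copy()
lr = 0.05
for _ in range(200):
    n = len(Xp)
    grad_adv = 2.0 * (Xp.mean(axis=0) - target) / n   # d adv_loss / d x_i
    grad_dist = 2.0 * (Xp - X)                        # d ||X' - X||^2 / d x_i
    Xp -= lr * (grad_adv + lam * grad_dist)           # descend the joint objective
```

The penalty weight $\lambda$ balances attack success against perturbation size, exactly the tension Eq. 1 encodes.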
For any point cloud $X$, [Qi et al.2017a] proves that there exists a subset $X_C \subseteq X$, namely the critical subset, which determines all the max-pooled features (i.e., $u$), and thus the output of PointNet; this result is only applicable to network structures similar to PointNet's. Visually, $X_C$ usually distributes evenly along the skeleton of $X$. In this sense, for PointNet, it seems that dropping points in $X_C$ could also generate adversarial point clouds. Empirically, we tested several point-dropping strategies based on $X_C$ and found that the best one is to iteratively drop the points that determine the largest number of max-pooled features. As shown in our experiments, even this strategy performs worse than our method. We refer interested readers to the detailed theory and strategy in the supplementary material.
3 Point-Cloud Saliency Map
In this section, in the context of point-cloud classification, we derive our point-dropping approach from an approximately equivalent procedure: shifting points to the spherical core (center) of a point cloud. In this way, the non-differentiable point-dropping operation can be approximated by differentiable point-shifting operations, based on which a saliency map is constructed.
3.1 From Point Dropping to Point Shifting
Our idea is illustrated in Fig. 2. The intuition is that the surface points of a point cloud largely determine the classification result, because surface points encode the shape information of objects, while points near the point center (the median of the x, y, z coordinates) have almost no effect on classification performance. Consequently, dropping a point is approximately equivalent to shifting the point toward the center in terms of eliminating its effect on the classification result. To verify this hypothesis, we conduct a proof-of-concept experiment: thousands of pairs of point clouds are generated by dropping points and by shifting those same points to the point-cloud center, respectively. We use three schemes to select the points: furthest point dropping, random point dropping, and point dropping based on our saliency map. We use PointNet to classify both point clouds in every pair. For all three selection schemes, the classification results achieve high pairwise consistency (for the vast majority of pairs, the classification results of the two point clouds in a pair are the same, whether correct or wrong), indicating the applicability of our approach.
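The drop-versus-shift equivalence can be sanity-checked with a toy permutation-invariant, max-pooled model (a stand-in for PointNet): dropping the innermost points and shifting those same points onto the median center leave the pooled features unchanged, because neither the removed points nor the center point attain any feature maximum:

```python
import numpy as np

def model_score(points):
    """Toy 'classifier output': two per-point features, max-pooled over
    points (stand-in for a PointNet-style network)."""
    feats = np.stack([points.sum(axis=1), np.abs(points).max(axis=1)], axis=1)
    return feats.max(axis=0)

rng = np.random.default_rng(3)
X = rng.normal(size=(128, 3))
center = np.median(X, axis=0)          # outlier-robust spherical core
# Select the 10 points nearest the center as the points to remove/shift.
victim = np.argsort(np.linalg.norm(X - center, axis=1))[:10]

dropped = np.delete(X, victim, axis=0)  # point dropping
shifted = X.copy()
shifted[victim] = center                # point shifting to the center
```

With this seed, the pooled features of the dropped and shifted clouds coincide, mirroring the pairwise consistency observed in the proof-of-concept experiment.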
3.2 Gradient-based Saliency Map
Based on ideas from adversarial-sample generation and the intuition in Section 3.1, we approximate the contribution of a point by the gradient of the loss under the point-shifting operation. Note that measuring gradients in the original coordinate system is problematic because point coordinates are not view (angle) invariant. To overcome this issue, we consider point shifting in the spherical coordinate system, where a point is represented as $(r_i, \psi_i, \phi_i)$, with $r_i$ the distance of the point to the spherical core and $\psi_i, \phi_i$ the two angles of the point relative to the spherical core. Under this spherical coordinate system, as shown in Fig. 2, shifting a point toward the center by $\delta r_i$ changes the loss by $-\frac{\partial L}{\partial r_i}\delta r_i$. Based on the equivalence established in Section 3.1, we measure the contribution of a point by a real-valued score: the negative gradient of the loss w.r.t. $r_i$, i.e., $-\frac{\partial L}{\partial r_i}$. To calculate $r_i$ for a given point cloud, we use the medians of the axis values of all points in the point cloud as the spherical core, denoted as $x_c$, to build the spherical coordinate system, for outlier-robustness [Böhm, Faloutsos, and Plant2008]. Formally, $r_i$ can be expressed as
$$r_i = \|x_i - x_c\|_2, \tag{2}$$
where $x_i = (x_{i1}, x_{i2}, x_{i3})$ are the axis values of point $i$ in the orthogonal coordinates. Consequently, $\frac{\partial L}{\partial r_i}$ can be computed from the gradients under the original orthogonal coordinates as
$$\frac{\partial L}{\partial r_i} = \sum_{j=1}^{3} \frac{\partial L}{\partial x_{ij}} \, \frac{x_{ij} - x_{cj}}{r_i}. \tag{3}$$
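The radial distance and the chain-rule gradient can be verified numerically. The sketch below uses a toy loss with a known analytic gradient (a stand-in for the network loss and its backward pass) and checks the chain rule against a finite-difference step along the radial direction:

```python
import numpy as np

def loss(points):
    """Toy smooth loss with a known analytic gradient (stand-in for the DNN loss)."""
    return float(np.sum(np.sin(points)))

def loss_grad(points):
    return np.cos(points)              # dL/dx_ij for the toy loss

rng = np.random.default_rng(4)
X = rng.normal(size=(32, 3))
xc = np.median(X, axis=0)              # spherical core: per-axis medians
diff = X - xc
r = np.linalg.norm(diff, axis=1)       # r_i = ||x_i - x_c||_2

# Chain rule: dL/dr_i = sum_j (dL/dx_ij) * (x_ij - x_cj) / r_i
dL_dr = np.sum(loss_grad(X) * diff / r[:, None], axis=1)

# Finite-difference check: move point 0 outward by eps along its radial
# direction and compare the loss change against dL_dr[0] * eps.
eps = 1e-6
Xp = X.copy()
Xp[0] += eps * diff[0] / r[0]
fd = (loss(Xp) - loss(X)) / eps
```

In the real setting, one backward pass of the network supplies all $\frac{\partial L}{\partial x_{ij}}$ at once, so every radial gradient is obtained in a single step.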
In practice, we apply a change of variable $\rho_i = r_i^{-\alpha}$ ($\alpha > 0$) to allow more flexibility in saliency-map construction, where $\alpha$ rescales the point clouds. The gradient of $L$ w.r.t. $\rho_i$ can be calculated as
$$\frac{\partial L}{\partial \rho_i} = -\frac{1}{\alpha}\,\frac{\partial L}{\partial r_i}\, r_i^{1+\alpha}. \tag{4}$$
Define $\delta \rho_i$ / $\delta r_i$ as a differential step size along $\rho_i$ / $r_i$. Since $\rho_i = r_i^{-\alpha}$ is monotonically decreasing in $r_i$, shifting a point by $+\delta \rho_i$ (i.e., toward the center) is equivalent to shifting the point by $-\delta r_i$ if we ignore the positive factor $\frac{1}{\alpha} r_i^{1+\alpha}$. Therefore, under the $\rho$ parameterization, we approximate the loss change by $\frac{\partial L}{\partial \rho_i}\,\delta \rho_i$, which is proportional to $-\frac{\partial L}{\partial r_i}\, r_i^{1+\alpha}$. Thus, in the rescaled coordinates, we measure the contribution of a point by the gradient of the loss w.r.t. $\rho_i$. Since $\frac{1}{\alpha}$ is a constant, we simply employ
$$s_i = -\frac{\partial L}{\partial r_i}\, r_i^{1+\alpha} \tag{5}$$
as the saliency score of $x_i$ in our saliency map. Note that the additional parameter $\alpha$ gives extra flexibility for saliency-map construction, and the optimal choice of $\alpha$ is problem specific. In the following experiments on generating adversarial/non-adversarial point clouds, we simply set $\alpha$ to 1, which already achieves remarkable performance. For a better understanding of our saliency maps, several maps are visualized in Fig. 3. In the following, we describe two applications of our proposed saliency map: adversarial and non-adversarial point-cloud generation.
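Putting the pieces together, a sketch of the saliency-score computation and score-guided dropping (again with a toy analytic gradient standing in for the network's backward pass):

```python
import numpy as np

def loss_grad(points):
    """Gradient of a toy loss L = sum(sin(x)); in practice one backward
    pass through the DNN supplies this array."""
    return np.cos(points)

def saliency_scores(X, alpha=1.0):
    """s_i = -(dL/dr_i) * r_i**(1 + alpha): negative radial gradient toward
    the median center, rescaled via the change of variable rho = r**(-alpha)."""
    xc = np.median(X, axis=0)
    diff = X - xc
    r = np.linalg.norm(diff, axis=1)
    dL_dr = np.sum(loss_grad(X) * diff / r[:, None], axis=1)
    return -dL_dr * r ** (1.0 + alpha)

rng = np.random.default_rng(5)
X = rng.normal(size=(64, 3))
s = saliency_scores(X, alpha=1.0)
adversarial_drop = np.argsort(s)[-8:]        # the 8 highest-scoring points
X_adv = np.delete(X, adversarial_drop, axis=0)
```

Dropping the lowest-scoring points instead of `adversarial_drop` yields the non-adversarial variant described next.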
Adversarial point-cloud generation
Based on the saliency map, adversarial point-cloud generation is achieved by simply dropping the points with the highest scores, so that the classification loss increases significantly (dropping $x_i$ increases the loss by a value approximately proportional to $s_i$). The increased model loss leads to misclassification of the fragmented clouds, consistent with the definition of standard adversarial attacks.
Non-adversarial point-cloud generation
As a byproduct, our saliency map can also be used to generate non-adversarial point clouds, the opposite of adversarial point clouds, by dropping the points with the lowest scores. In contrast to the adversarial case, when the scores of the dropped points are negative, dropping them decreases the loss, potentially improving model performance.
Based on the above description, saliency maps are readily constructed by calculating gradients following (5), which then guide our point-dropping algorithms. Algorithm 1 describes our basic point-dropping procedure. Note that Algorithm 1 calculates all saliency scores at once, which may be suboptimal because dependencies among points are ignored. To alleviate this issue, an iterative version of Algorithm 1 is proposed in Algorithm 2. The idea is to drop points iteratively, so that point dependencies within the remaining point set are taken into account when calculating saliency scores for the next iteration. Specifically, in each iteration, a new saliency map is constructed for the remaining points, and a batch of points is dropped based on the current map. In Section 4.3, we show that a moderate number of iterations for adversarial point-cloud generation is good enough in terms of improving performance at reasonable computational cost.
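The iterative scheme of Algorithm 2 can be sketched as follows; the saliency function reuses the toy analytic gradient as a stand-in for the network's backward pass, and `total` / `iters` are illustrative parameters:

```python
import numpy as np

def saliency_scores(X, alpha=1.0):
    """Single-backward-pass saliency sketch (toy gradient cos(x) stands in
    for the real dL/dx from the network)."""
    xc = np.median(X, axis=0)
    diff = X - xc
    r = np.linalg.norm(diff, axis=1)
    dL_dr = np.sum(np.cos(X) * diff / r[:, None], axis=1)
    return -dL_dr * r ** (1.0 + alpha)

def iterative_drop(X, total=20, iters=5, alpha=1.0):
    """Algorithm-2-style loop: recompute the saliency map on the remaining
    points each iteration, then drop the total//iters highest-scoring points."""
    per_iter = total // iters
    for _ in range(iters):
        s = saliency_scores(X, alpha)
        keep = np.argsort(s)[:-per_iter]   # discard the per_iter highest scores
        X = X[keep]
    return X

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 3))
X_frag = iterative_drop(X, total=20, iters=5)
```

Setting `iters=1` recovers the one-shot Algorithm 1; more iterations cost one extra saliency computation each but account for point dependencies.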
We verify our approach by applying it to several datasets for adversarial and non-adversarial point-cloud generation.
4.1 Datasets and Models
We use two public datasets, 3D MNIST (https://www.kaggle.com/daavoo/3d-mnist/version/13) and ModelNet40 (http://modelnet.cs.princeton.edu/) [Wu et al.2015], to test our saliency map and point-dropping algorithms. 3D MNIST contains raw 3D point clouds generated from 2D MNIST images, split into a training set and a testing set. To enrich the dataset, we randomly sample points from each raw point cloud 10 times to create 10 point clouds, enlarging both the training and testing sets, with each resulting point cloud consisting of 1,024 points. ModelNet40 contains 12,311 meshed CAD models of 40 categories, of which 9,843 are used for training and 2,468 for testing. We use the point-cloud data provided by [Qi et al.2017a], sampled from the surfaces of those CAD models. Finally, our approach is evaluated on the state-of-the-art point-cloud models introduced in Section 2.2, i.e., PointNet, PointNet++, and DGCNN.
4.2 Implementation Details
Our implementation is based on the models and code provided by [Qi et al.2017a, Qi et al.2017b, Wang et al.2018] (https://github.com/charlesq34/pointnet; https://github.com/charlesq34/pointnet2; https://github.com/WangYueFt/dgcnn). Default settings are used to train these models. To enable a dynamic number of points along the second dimension of the batch-input tensor, for all three models we substitute several TensorFlow ops with equivalent ops that support dynamic inputs. We also implement a dynamic batch-gather op and its gradient op for DGCNN in C++ and CUDA. For simplicity, we set the number of votes (aggregated classification scores from multiple rotations) to 1. In all of the following cases, a small additional accuracy improvement can be obtained with more votes, e.g., 12 votes. Moreover, incorporating additional features such as face normals would further improve accuracy. We did not consider these tricks in our experiments, for simplicity.
4.3 Empirical Results
To show the effectiveness of our saliency map as guidance for point dropping, we compare our approach with the random point-dropping baseline [Qi et al.2017a], denoted rand-drop, and the critical-subset-based strategy introduced in Section 2.3, denoted critical (only applicable to PointNet). For brevity, in what follows we refer to dropping the points with the lowest scores to generate non-adversarial clouds as non-adversarial, and dropping the points with the highest scores to generate adversarial clouds as adversarial. To balance performance and cost, we use Algorithm 1 to generate non-adversarial clouds, while for adversarial clouds we use the iterative version in Algorithm 2.
Results on PointNet
The performance of PointNet on the 3D-MNIST test set is shown in Fig. 4. The overall accuracy of PointNet holds roughly steady under rand-drop as the number of dropped points varies between 0 and 200. In contrast, the adversarial point clouds generated by our point-dropping algorithm reduce PointNet's overall accuracy substantially. Furthermore, it is interesting to see that by dropping points with negative scores, the accuracy even increases slightly compared with the original point clouds. This behavior is consistent across the other models and datasets shown below. For ModelNet40, as shown in Fig. 4, the overall accuracy of PointNet likewise holds roughly steady under rand-drop (the accuracy reported in [Qi et al.2017a] requires multiple votes; we set the number of votes to 1 for simplicity, and the discrepancy between the two settings is always small). However, our point-dropping algorithm can markedly increase/reduce the accuracy.
Results on PointNet++
The results for PointNet++ are shown in Fig. 5. Its accuracy on 3D-MNIST holds roughly steady under rand-drop, while our point-dropping algorithm can substantially increase/reduce the accuracy. On the ModelNet40 test set, PointNet++ likewise maintains its overall accuracy under rand-drop (the accuracy reported in [Qi et al.2017b] additionally requires face normals as extra features and multiple votes), while our algorithm can again markedly increase/reduce the accuracy.
Results on DGCNN
The accuracies of DGCNN on the 3D-MNIST and ModelNet40 test sets are shown in Fig. 6. Similarly, DGCNN maintains its accuracy on both datasets under rand-drop. Under the same conditions, our algorithm is able to increase/reduce the accuracies on both datasets.
Several adversarial point clouds are visualized in Fig. 8. For the point clouds shown there, our iterative algorithm successfully identifies the segments that distinguish them from other clouds, e.g., the base of a lamp, and fools the DNN model by dropping those segments. It is worth pointing out that humans still seem able to recognize most of these fragmented point clouds, probably owing to human imagination. In contrast, as shown in Fig. 9, non-adversarial point-cloud generation is visually similar to a denoising process, i.e., it drops noisy/uninformative points scattered throughout the point clouds. Although the DNN model misclassifies the original point clouds in some cases, dropping those noisy points can correct the model predictions.
We employ PointNet on ModelNet40 to study the impact of the scaling factor $\alpha$, the number of dropped points, and the number of iterations on model performance. As shown in Fig. 7, a small $\alpha$ is a good setting for Algorithm 2, since as $\alpha$ increases, the number of adversarial clouds generated by our algorithm slightly decreases. Moreover, Fig. 7 (middle) makes clear that our algorithm significantly outperforms rand-drop in generating adversarial clouds: the accuracy of PointNet remains high under rand-drop even with many points dropped, while Algorithm 2 reduces the accuracy dramatically. In Fig. 7 (right), we show that Algorithm 2 generates more adversarial point clouds than Algorithm 1. For non-adversarial point-cloud generation, Algorithm 2 still slightly outperforms Algorithm 1, but at a higher computational cost. Therefore, we recommend Algorithm 2 for adversarial point-cloud generation and Algorithm 1 for non-adversarial point-cloud generation.
Among the three state-of-the-art DNN models for 3D point clouds, DGCNN appears to be the most robust to adversarial point clouds generated by our proposed algorithm. We conjecture that this robustness comes from its structures designed to capture more local information, which can compensate for the information lost by dropping a single point. In contrast, PointNet does not capture local structures [Qi et al.2017b], making it the most vulnerable model to adversarial fragmented point clouds.
In this paper, a saliency-map learning method for 3D point clouds is proposed to measure the contribution (importance) of each point in a point cloud to the model-prediction loss. By approximating point dropping with a continuous point-shifting procedure, we show that the contribution of a point is approximately proportional to, and thus can be scored by, the gradient of the loss w.r.t. the point under a scaled spherical coordinate system. Using this saliency map, we further standardize the point-dropping process to generate adversarial/non-adversarial point clouds by dropping the points with the highest/lowest scores. Extensive evaluations show that our saliency-map-driven point-dropping algorithm consistently outperforms alternatives such as random point dropping, revealing the vulnerability of state-of-the-art DNNs to adversarial point clouds generated by malicious point dropping, a more readily realizable adversarial attack in practice.
- [Athalye and Carlini2018] Athalye, A., and Carlini, N. 2018. On the robustness of the cvpr 2018 white-box adversarial example defenses. arXiv preprint arXiv:1804.03286.
- [Athalye, Carlini, and Wagner2018] Athalye, A.; Carlini, N.; and Wagner, D. 2018. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420.
- [Böhm, Faloutsos, and Plant2008] Böhm, C.; Faloutsos, C.; and Plant, C. 2008. Outlier-robust clustering using independent components. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, 185–198. ACM.
- [Cai et al.2018] Cai, Q.-Z.; Du, M.; Liu, C.; and Song, D. 2018. Curriculum adversarial training. arXiv preprint arXiv:1805.04807.
- [Carlini and Wagner2017] Carlini, N., and Wagner, D. 2017. Towards evaluating the robustness of neural networks. In Security and Privacy (SP), 2017 IEEE Symposium on, 39–57. IEEE.
- [Carlini et al.2017] Carlini, N.; Katz, G.; Barrett, C.; and Dill, D. L. 2017. Ground-truth adversarial examples. arXiv preprint arXiv:1709.10207.
- [Dong et al.2017] Dong, Y.; Liao, F.; Pang, T.; Su, H.; Hu, X.; Li, J.; and Zhu, J. 2017. Boosting adversarial attacks with momentum. arXiv preprint arXiv:1710.06081.
- [Fan, Su, and Guibas2017] Fan, H.; Su, H.; and Guibas, L. J. 2017. A point set generation network for 3d object reconstruction from a single image. In CVPR, volume 2, 6.
- [Kanezaki, Matsushita, and Nishida2018] Kanezaki, A.; Matsushita, Y.; and Nishida, Y. 2018. Rotationnet: Joint object categorization and pose estimation using multiviews from unsupervised viewpoints. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- [Kurakin, Goodfellow, and Bengio2016] Kurakin, A.; Goodfellow, I.; and Bengio, S. 2016. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236.
- [Maturana and Scherer2015] Maturana, D., and Scherer, S. 2015. Voxnet: A 3d convolutional neural network for real-time object recognition. In Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on, 922–928. IEEE.
- [Moosavi-Dezfooli, Fawzi, and Frossard2016] Moosavi-Dezfooli, S.-M.; Fawzi, A.; and Frossard, P. 2016. Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2574–2582.
- [Papernot et al.2016] Papernot, N.; McDaniel, P.; Jha, S.; Fredrikson, M.; Celik, Z. B.; and Swami, A. 2016. The limitations of deep learning in adversarial settings. In Security and Privacy (EuroS&P), 2016 IEEE European Symposium on, 372–387. IEEE.
- [Qi et al.2017a] Qi, C. R.; Su, H.; Mo, K.; and Guibas, L. J. 2017a. Pointnet: Deep learning on point sets for 3d classification and segmentation. Proc. Computer Vision and Pattern Recognition (CVPR), IEEE 1(2):4.
- [Qi et al.2017b] Qi, C. R.; Yi, L.; Su, H.; and Guibas, L. J. 2017b. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems, 5099–5108.
- [Su et al.2015] Su, H.; Maji, S.; Kalogerakis, E.; and Learned-Miller, E. 2015. Multi-view convolutional neural networks for 3d shape recognition. In Proceedings of the IEEE international conference on computer vision, 945–953.
- [Szegedy et al.2013] Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; and Fergus, R. 2013. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.
- [Wang et al.2018] Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S. E.; Bronstein, M. M.; and Solomon, J. M. 2018. Dynamic graph cnn for learning on point clouds. arXiv preprint arXiv:1801.07829.
- [Wang, Pelillo, and Siddiqi2017] Wang, C.; Pelillo, M.; and Siddiqi, K. 2017. Dominant set clustering and pooling for multi-view 3d object recognition. In Proceedings of British Machine Vision Conference (BMVC), volume 12.
- [Wu et al.2015] Wu, Z.; Song, S.; Khosla, A.; Yu, F.; Zhang, L.; Tang, X.; and Xiao, J. 2015. 3d shapenets: A deep representation for volumetric shapes. In Proceedings of the IEEE conference on computer vision and pattern recognition, 1912–1920.
- [Xiang, Qi, and Li2018] Xiang, C.; Qi, C. R.; and Li, B. 2018. Generating 3d adversarial point clouds. arXiv preprint arXiv:1809.07016.
- [Yu, Meng, and Yuan2018] Yu, T.; Meng, J.; and Yuan, J. 2018. Multi-view harmonized bilinear network for 3d object recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 186–194.
- [Zheng, Chen, and Ren2018] Zheng, T.; Chen, C.; and Ren, K. 2018. Distributionally adversarial attack. arXiv preprint arXiv:1808.05537.
Appendix A Critical-Subset Theory
We re-explain the critical-subset theory [Qi et al.2017a] for interested readers. Here $u$ represents the max-pooled features in PointNet, i.e., $u = \mathrm{MAX}_{x_i \in X}\{h(x_i)\}$. $\mathrm{MAX}$ (i.e., a special max-pooling layer) is a vector max operator that takes $n$ vectors as input and returns a new vector of the element-wise maxima. A PointNet network can be expressed as $f(X) = \gamma(u)$, where $\gamma$ is a continuous function. Apparently, $f(X)$ is determined by $u$. For the $j$-th dimension of $u$, there exists at least one $x_i$ such that $u_j = h_j(x_i)$, where $h_j$ is the $j$-th dimension of $h$. Aggregating all such $x_i$ yields a subset $X_C \subseteq X$ that determines $u$, and thus $f(X)$. [Qi et al.2017a] named $X_C$ the critical subset. As we can see, this theory applies to PointNet, where each max-pooled feature is determined by a single point, but not to networks with more complicated structures.
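The critical subset can be extracted directly from the per-point features: for every max-pooled dimension, collect the point attaining the maximum. A toy sketch (with a hypothetical 5-dimensional $h$):

```python
import numpy as np

def point_features(x):
    """Per-point feature h(x) (hypothetical 5-dim stand-in for PointNet's MLP)."""
    return np.array([x[0], x[1], x[2], x.sum(), (x ** 2).sum()])

rng = np.random.default_rng(7)
X = rng.normal(size=(50, 3))
H = np.stack([point_features(x) for x in X])   # (n, d) feature matrix
u = H.max(axis=0)                              # max-pooled features

# Critical subset: points that attain the max in at least one feature dim.
winners = H.argmax(axis=0)                     # one winning point per dim
critical = np.unique(winners)
Xc = X[critical]

# By construction, the critical subset alone determines u.
Hc = np.stack([point_features(x) for x in Xc])
```

Counting how often each index appears in `winners` gives the "number of max-pooled features determined by a point" used by the dropping strategy below.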
Appendix B Critical-Subset-based Point Dropping
We tested several point-dropping strategies based on the critical-subset theory, e.g., randomly dropping points from the critical subset (one-time or iteratively) and dropping the points that determine the largest number of max-pooled features (one-time or iteratively). Among these schemes, iteratively dropping the points that determine the largest number of max-pooled features (at least two features) performs best. The strategy is given in Algorithm 3.
Appendix C More Visualization Results
In the body of the paper, several adversarial point clouds generated by dropping a small number of points are visualized. Here, in Figure 18, we show more adversarial point clouds generated by dropping more points; as the number of dropped points grows, our saliency-map-based point-dropping scheme can generate adversarial point clouds for almost all of the data in both the 3D-MNIST and ModelNet40 test sets.