Learning Saliency Maps for Adversarial Point-Cloud Generation

11/28/2018, by Tianhang Zheng, et al.

3D point-cloud recognition with deep neural networks (DNNs) has achieved remarkable progress, obtaining both high recognition accuracy and robustness to random point missing (or dropping). However, the robustness of DNNs to maliciously manipulated point missing is still unclear. In this paper, we show that point missing can be a critical security concern by proposing a malicious point-dropping method that generates adversarial point clouds to fool DNNs. Our method is based on learning a saliency map for a whole point cloud, which assigns each point a score reflecting its contribution to the model-recognition loss, i.e., the difference between the losses with and without that point. The saliency map is learned by approximating the nondifferentiable point-dropping process with a differentiable procedure that shifts points towards the cloud center. In this way, the loss difference, i.e., the saliency score of each point in the map, can be measured by the corresponding gradient of the loss w.r.t. the point under spherical coordinates. Based on the learned saliency map, a malicious point-dropping attack can be mounted by dropping the points with the highest scores, leading to a significant increase in model loss and thus inferior classification performance. Extensive evaluations on several state-of-the-art point-cloud recognition models, including PointNet, PointNet++ and DGCNN, demonstrate the efficacy and generality of our proposed saliency-map-based point-dropping scheme. Code for the experiments is released at <https://github.com/tianzheng4/Learning-PointCloud-Saliency-Maps>.


1 Introduction

Point clouds, which comprise the raw outputs of many 3D data-acquisition devices such as radars and sonars, are an important 3D data representation for computer-vision applications. Consequently, real applications such as object classification and segmentation usually require high-level processing of 3D point clouds. Recent research has proposed to employ deep neural networks (DNNs) for high-accuracy, high-level processing of point clouds, achieving remarkable success. Representative DNN models for point-cloud classification include PointNet [Qi et al.2017a], PointNet++ [Qi et al.2017b] and DGCNN [Wang et al.2018], which successfully handle the irregularity of point clouds, achieving high classification accuracy and robustness to random point missing/dropping.

Figure 1: Automatically dropping the points with the highest saliency scores (red points) from a bench point cloud can change the prediction outcome.

Despite the success of these DNN-based models, recent work has also shown their vulnerability to adversarial point clouds generated by point shifting/addition [Xiang, Qi, and Li2018], leading to model misclassification and thus misbehavior of systems equipped with those models. It is worth noting that the adversarial attacks considered in [Xiang, Qi, and Li2018] only involve point shifting and addition, which we argue is not a natural way to generate adversarial point clouds in real-world situations. Shifted/added points may contain abnormal outliers, which, on the one hand, make the generated adversarial point clouds unrealistic and, on the other hand, are difficult to realize on real 3D objects. In contrast, point missing, which results in fragmented point clouds, is a more common phenomenon in practice due to the view limitations of 3D data-acquisition devices. Moreover, a maliciously designed point-dropping scheme is more practically realizable, e.g., by shading, damaging, or adjusting the poses of 3D objects. Therefore, point missing, especially when maliciously manipulated, appears to be a more serious real-world security concern and merits detailed study. Point missing/dropping was initially studied in [Qi et al.2017a, Qi et al.2017b] by dropping either the furthest or random points; however, that study was not from the perspective of an adversary, so those schemes do not lead to model attacks.

In this paper, we propose a simple yet principled way of dropping points from a point cloud to generate adversarial point clouds, and we study the impact of these adversarial point clouds on several DNN-based classification models. Interestingly, contrary to what was claimed for PointNet (whose accuracy was reported to drop only slightly under furthest and random input sampling when points are missing [Qi et al.2017a]), we show that these DNN models (including PointNet) are actually vulnerable to "malicious point dropping". To achieve malicious point dropping, we propose to learn a saliency map that quantifies the significance of each point in a point cloud and then guides our point-dropping process. In other words, the saliency map assigns each point a saliency score reflecting the point's contribution to the model-prediction loss. Based on the learned saliency map, adversarial point clouds are generated by simply dropping the points with the highest saliency scores. As a byproduct, our saliency map can also be used to generate non-adversarial point clouds (defined in Section 2.1) by dropping the points with the lowest scores, which, surprisingly, even leads to better recognition performance than the original point clouds.

Despite its conceptual simplicity, learning a saliency map for adversarial point-cloud generation is nontrivial. One possible solution is based on the critical-subset theory proposed in [Qi et al.2017a]. However, that theory is only applicable to PointNet, and even for PointNet, since the theory does not specify the contribution of each single point to the classification loss, it is difficult to decide which point(s) to drop. Empirically, we tried several point-dropping strategies using the critical subset and found that even the best of them performs worse than our method. Furthermore, the brute-force method of trying all possible combinations of points and recomputing the losses is impractical, because the computational complexity scales exponentially with the number of points in a point cloud. Instead, we propose an efficient and effective method to learn approximate saliency maps with a single backward pass through the DNN. The basic idea is to approximate point dropping with a continuous point-shifting procedure, i.e., moving points towards the point-cloud center. In this way, the prediction-loss change can be approximated by the gradient of the loss w.r.t. the point under a spherical coordinate system, and every point in a point cloud is assigned a score proportional to that gradient. Malicious point dropping is then done by simply dropping the points with the highest scores.

Our saliency-map-driven point-dropping algorithm is compared with the random point-dropping baseline on several state-of-the-art point-cloud DNN models, including PointNet, PointNet++, and DGCNN. Our method consistently outperforms random point dropping, especially for generating adversarial point clouds. For example, dropping points from each point cloud with our algorithm sharply reduces the accuracy of PointNet on 3D-MNIST and ModelNet40 (adversarial attack), while the random-dropping scheme leaves the accuracies close to the originals, and the best critical-subset-based strategy (only applicable to PointNet) falls in between.

2 Preliminaries

2.1 Definition and Notations

Point Cloud

A point cloud is represented as $X = \{x_i \,|\, i = 1, \dots, n\}$, where $x_i \in \mathbb{R}^3$ is a 3D point and $n$ is the number of points in the point cloud; $y \in \{1, \dots, k\}$ is the ground-truth label, where $k$ is the number of classes. We denote the output of a point-cloud-based classification network as $F(\cdot)$, whose input is a point cloud $X$ and whose output $F(X)$ is a probability vector of length $k$. The classification loss of the network is denoted as $L(X, y)$, which is usually defined as the cross-entropy between $F(X)$ and the one-hot encoding of $y$. A fragmented point cloud, denoted as $X'$, is a subset of $X$ generated by dropping part of the points from $X$. $X'$ is called an adversarial point cloud if $\arg\max_j F(X')_j \neq y$. Otherwise, $X'$ is considered non-adversarial.

Point Contribution

We define the contribution of a point (or a set of points) in a point cloud as the difference between the prediction losses of the two point clouds excluding and including the point(s), respectively. Formally, given a point $x_i$ in $X$, the contribution of $x_i$ is defined as $L(X', y) - L(X, y)$, where $X' = X \setminus \{x_i\}$. If this value is positive (or large), we consider the contribution of $x_i$ to the model prediction positive (or large): in this case, if $x_i$ is added back to $X'$, the loss will be reduced, leading to more accurate classification. Otherwise, we consider the contribution of $x_i$ negative (or small).
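To make the definition concrete, here is a minimal NumPy sketch of this brute-force contribution, assuming a hypothetical `loss_fn(points, label)` that returns the model's prediction loss for a cloud. Evaluating it for every point requires one extra forward pass per point, which is what motivates the one-backward-pass approximation developed in Section 3.

```python
import numpy as np

def point_contribution(points: np.ndarray, label: int, i: int, loss_fn) -> float:
    """Loss difference between the cloud without point i and the full cloud."""
    reduced = np.delete(points, i, axis=0)              # X \ {x_i}
    return loss_fn(reduced, label) - loss_fn(points, label)
```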

Point-Cloud Saliency Map

A point-cloud saliency map assigns each point $x_i$ a saliency score $s_i$ to reflect the contribution of $x_i$. Formally, the map can be denoted as a function with input $X$ that outputs a vector of length $n$, i.e., $s \in \mathbb{R}^n$. We expect a higher (positive) $s_i$ to indicate a larger (positive) contribution of $x_i$, so that adversarial point clouds can be generated by dropping the points with the highest scores in the map.

2.2 3D Point-Cloud Recognition Models

There are three mainstream approaches for 3D object recognition: volume-based [Wu et al.2015, Maturana and Scherer2015], multi-view-based [Su et al.2015, Wang, Pelillo, and Siddiqi2017, Yu, Meng, and Yuan2018, Kanezaki, Matsushita, and Nishida2018], and point-cloud-based [Qi et al.2017a, Qi et al.2017b, Wang et al.2018] approaches, which rely on voxel, multi-view-image, and point-cloud representations of 3D objects, respectively. In this work, we focus on point-cloud-based models.

PointNet and PointNet++

PointNet [Qi et al.2017a] applies a composition of single-variable functions, a max-pooling layer, and a function of the max-pooled features, which is invariant to point order, to approximate the functions for point-cloud classification and segmentation. Formally, the composition can be denoted as $F(X) = \gamma(\max_{x_i \in X} \{h(x_i)\})$, with $h$ a single-variable function, $\max$ the max-pooling layer, and $\gamma$ a function of the max-pooled features $u = \max_{x_i \in X}\{h(x_i)\}$. PointNet has played a significant role in the recent development of high-level point-cloud processing, serving as a baseline for many subsequent DNN models. PointNet++ [Qi et al.2017b] is one of those extensions: it applies PointNet recursively on a nested partitioning of the input point set to capture hierarchical structures induced by the metric space in which the points live. Compared to PointNet, PointNet++ is able to learn and exploit hierarchical features w.r.t. the Euclidean distance metric, and thus typically achieves better performance.
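As a rough illustration of this composition (not the authors' implementation), the following NumPy sketch shows why the architecture is permutation-invariant; `h` and `gamma` stand in for the learned per-point feature network and the classifier head.

```python
import numpy as np

def pointnet_forward(points, h, gamma):
    """Symmetric composition gamma(max_i h(x_i)); invariant to point order."""
    features = np.stack([h(x) for x in points])  # per-point features, shape (n, d)
    pooled = features.max(axis=0)                # max-pooling over points -> u
    return gamma(pooled)                         # class scores from pooled features
```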

Dynamic Graph Convolutional Neural Network (DGCNN)

DGCNN [Wang et al.2018] integrates a novel operation, EdgeConv, into PointNet to capture local geometric structures while maintaining invariance to point permutation. Specifically, EdgeConv generates features that describe neighboring relationships by constructing a local neighborhood graph and applying convolution-like operations on the edges connecting neighboring pairs of points. EdgeConv helps DGCNN achieve further performance improvement, usually surpassing PointNet and PointNet++.

2.3 Adversarial Attack

One major application of our proposed saliency map is generating adversarial point clouds. In this section, we briefly review the research frontier on standard adversarial attacks with 2D adversarial images and clarify the differences between adversarial images and adversarial point clouds. We also briefly introduce the critical-subset idea and the point-dropping strategy based on it.

Adversarial Images

Most existing work on adversarial attacks focuses on image classification, where an adversarial image is defined as an image close to the original image under a certain distance metric but leading to misclassification by the original model. Most successful adversarial attacks on images, e.g., FGSM [Goodfellow, Shlens, and Szegedy2014], PGD [Kurakin, Goodfellow, and Bengio2016], the C&W attack [Carlini and Wagner2017], DeepFool [Moosavi-Dezfooli, Fawzi, and Frossard2016], DAA [Zheng, Chen, and Ren2018] and JSMA [Papernot et al.2016], are designed based on gradients of the network output/loss w.r.t. the input. For instance, the PGD attack iteratively updates the input pixel values along the direction of the gradient of the model loss to generate adversarial images, and uses a clip function to constrain the adversarial images to a neighborhood of the original images. PGD and its variants have achieved several state-of-the-art results as first-order attacks according to many recent studies [Carlini et al.2017, Dong et al.2017, Athalye, Carlini, and Wagner2018, Athalye and Carlini2018, Cai et al.2018, Zheng, Chen, and Ren2018].

Adversarial Point Clouds

Although 2D adversarial images have been studied for several years, adversarial attacks on irregular point clouds received little attention until recently [Xiang, Qi, and Li2018]. The main difference between adversarial learning on images and on point clouds is that, apart from shifting points (analogous to modifying pixels in traditional adversarial attacks), adding new points and dropping existing points are distinctive ways of generating adversarial point clouds. [Xiang, Qi, and Li2018] mainly focuses on generating adversarial point clouds by shifting existing points or adding new ones. It solves the problem of shifting existing points by gradient-based optimization of the C&W loss [Carlini and Wagner2017], which can be formulated as

$\min_{X'} \; f(X') + \lambda \cdot \mathcal{D}(X, X')$   (1)

where $f(X')$ is an adversarial loss indicating the possibility of a successful attack, $\lambda$ is a weighting factor, and $\mathcal{D}(X, X')$ is a distance metric (e.g., the Chamfer distance [Fan, Su, and Guibas2017]) measuring the difference between the original and the adversarial point clouds. [Xiang, Qi, and Li2018] solves the problem of adding new points by initializing a number of new points at the coordinates of existing points and optimizing them over Eq. 1 as well. It is experimentally shown that state-of-the-art point-cloud models are vulnerable to adversarial point clouds crafted by these two methods. However, we argue that adversarial attack via point dropping is a more common way to attack in practice, due to the physical 3D point-cloud generation process (think of attacks that occlude parts of an object, i.e., drop points, when generating 3D point clouds). As a result, we focus on generating adversarial clouds by dropping existing points in this paper.
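For illustration only, here is a minimal sketch of gradient descent on the objective in Eq. 1, assuming hypothetical callables `adv_grad` and `dist_grad` that return the gradients of $f$ and $\mathcal{D}$ w.r.t. the perturbed cloud (e.g., obtained via a framework's autodiff); it is a conceptual stand-in for the optimization in [Xiang, Qi, and Li2018], not their released code.

```python
import numpy as np

def shift_attack(points, label, adv_grad, dist_grad, lam=1.0, lr=0.01, steps=100):
    """Gradient descent on f(X') + lam * D(X, X') from Eq. (1)."""
    adv = points.copy()
    for _ in range(steps):
        g = adv_grad(adv, label) + lam * dist_grad(adv, points)  # (n, 3) gradient
        adv -= lr * g
    return adv
```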

Critical-Subset-based Strategy

For any point cloud $X$, [Qi et al.2017a] proves that there exists a subset $X_c \subseteq X$, namely the critical subset, which determines all the max-pooled features $u$ and thus the output of PointNet; the theory is only applicable to network structures similar to PointNet's. Visually, $X_c$ is usually distributed evenly along the skeleton of $X$. In this sense, for PointNet, it seems that dropping points in $X_c$ can also generate adversarial point clouds. Empirically, we tested several point-dropping strategies based on $X_c$ and found that the best one is to iteratively drop the points that determine the largest number of max-pooled features. As shown in our experiments, even this strategy performs worse than our method. We refer interested readers to the detailed theory and strategy in the supplementary material.

3 Point-Cloud Saliency Map

In this section, we derive our proposed point-dropping approach for point-cloud classification from an approximately equivalent procedure: shifting points to the spherical core (center) of a point cloud. In this way, the nondifferentiable point-dropping operation can be approximated by differentiable point-shifting operations, based on which a saliency map is constructed.

3.1 From Point Dropping to Point Shifting

Figure 2: Approximating point dropping by shifting points toward the point-cloud center.

Our idea is illustrated in Fig. 2. The intuition is that the surface points of a point cloud should determine the classification result, because surface points encode the shape information of objects, while points near the cloud center (the median of the x, y, z coordinates) have almost no effect on classification performance. Consequently, dropping a point is approximately equivalent to shifting the point towards the center, in terms of eliminating the point's effect on the classification result. To verify this hypothesis, we conducted a proof-of-concept experiment: thousands of pairs of point clouds were generated by dropping points and by shifting those same points to the point-cloud center, respectively. We used three schemes in total to select the points: furthest point-dropping, random point-dropping, and point-dropping based on our saliency map. We then used PointNet to classify both point clouds in each pair. For all three selection schemes, the classification results exhibit high pairwise consistency, i.e., for the vast majority of pairs, the two point clouds in the pair receive the same prediction (whether correct or wrong), indicating the applicability of our approach.

3.2 Gradient-based Saliency Map

Based on ideas from adversarial-sample generation and the intuition in Section 3.1, we approximate the contribution of a point by the gradient of the loss under the point-shifting operation. Note that measuring gradients in the original coordinate system is problematic because point coordinates are not view (angle) invariant. To overcome this issue, we consider point shifting in the spherical coordinate system, where a point is represented as $(r_i, \psi_i, \varphi_i)$, with $r_i$ the distance of the point to the spherical core and $\psi_i, \varphi_i$ the two angles of the point relative to the spherical core. Under this spherical coordinate system, as shown in Fig. 2, shifting a point towards the center by $\delta$ will increase the loss by $-\frac{\partial L}{\partial r_i}\delta$. Based on the equivalence established in Section 3.1, we then measure the contribution of a point by a real-valued score: the negative gradient of the loss w.r.t. $r_i$, i.e., $-\frac{\partial L}{\partial r_i}$. To calculate $r_i$ for a given point cloud, we use the medians of the axis values of all points in the cloud as the spherical core, denoted as $x_c$, to build the spherical coordinate system, for robustness to outliers [Böhm, Faloutsos, and Plant2008]. Formally, $r_i$ can be expressed as

$r_i = \sqrt{\sum_{j=1}^{3} (x_{ij} - x_{cj})^2}$   (2)

where $x_{ij}$, $j = 1, 2, 3$, are the axis values of point $x_i$ under the original orthogonal coordinates, and $x_{cj}$ are those of the spherical core $x_c$. Consequently, $\frac{\partial L}{\partial r_i}$ can be computed from the gradients under the original orthogonal coordinates as:

$\frac{\partial L}{\partial r_i} = \sum_{j=1}^{3} \frac{\partial L}{\partial x_{ij}} \cdot \frac{x_{ij} - x_{cj}}{r_i}$   (3)

where $\frac{\partial L}{\partial x_{ij}}$ is the gradient of the loss w.r.t. the orthogonal coordinates of $x_i$, obtained by a single backward pass. In practice, we apply a change of variable $\rho_i = r_i^{-\alpha}$ (with $\alpha > 0$) to allow more flexibility in saliency-map construction, where $\alpha$ is used to rescale the point clouds. The gradient of $L$ w.r.t. $\rho_i$ can then be calculated as

$\frac{\partial L}{\partial \rho_i} = -\frac{1}{\alpha}\,\frac{\partial L}{\partial r_i}\, r_i^{1+\alpha}$   (4)

Define $\delta_r$ / $\delta_\rho$ as a differential step size along $r_i$ / $\rho_i$. Since $\rho_i = r_i^{-\alpha}$, shifting a point by $-\delta_r$ (i.e., towards the center $x_c$) is equivalent to shifting the point by $+\delta_\rho$, if we ignore the positive factor $\alpha r_i^{-(1+\alpha)}$. Therefore, under the reparameterization $\rho_i$, we approximate the loss change by $\frac{\partial L}{\partial \rho_i}\delta_\rho$, which is proportional to $-\frac{\partial L}{\partial r_i} r_i^{1+\alpha}$. Thus, in the rescaled coordinates, we measure the contribution of a point by $\frac{\partial L}{\partial \rho_i}$, i.e., $\frac{1}{\alpha}\big({-\frac{\partial L}{\partial r_i}} r_i^{1+\alpha}\big)$. Since $\frac{1}{\alpha}$ is a constant, we simply employ

$s_i = -\frac{\partial L}{\partial r_i}\, r_i^{1+\alpha}$   (5)

as the saliency score of $x_i$ in our saliency map. Note that the additional parameter $\alpha$ gives us extra flexibility for saliency-map construction, and the optimal choice of $\alpha$ is problem specific. In the following experiments on generating adversarial/non-adversarial point clouds, we simply set $\alpha = 1$, which already achieves remarkable performance. For a better understanding of our saliency maps, several maps are visualized in Fig. 3, and a minimal sketch of the score computation follows the figure. In the following, we specify two applications of our proposed saliency map: adversarial and non-adversarial point-cloud generation.

Figure 3: Visualization of several saliency maps of digits and objects (one-step): points are colored by their score rankings.
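Below is a minimal NumPy sketch of Eqs. 2-5, assuming the per-point gradients $\frac{\partial L}{\partial x_{ij}}$ have already been obtained from one backward pass through the model (framework-specific, so they are passed in as an array); the epsilon guard is our addition for numerical safety.

```python
import numpy as np

def saliency_scores(points: np.ndarray, grads: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """points: (n, 3) cloud; grads: (n, 3) dL/dx from one backward pass."""
    center = np.median(points, axis=0)            # outlier-robust spherical core x_c
    offsets = points - center                     # x_ij - x_cj
    r = np.linalg.norm(offsets, axis=1)           # Eq. (2)
    r = np.maximum(r, 1e-12)                      # avoid division by zero at the core
    dL_dr = (grads * offsets).sum(axis=1) / r     # Eq. (3): inner product / r_i
    return -dL_dr * r ** (1.0 + alpha)            # Eq. (5): saliency scores s_i
```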

Adversarial point clouds generation

Based on the saliency map, adversarial point-cloud generation is achieved by simply dropping the points with the highest scores, so that the classification loss increases significantly (i.e., dropping $x_i$ increases the loss by a value approximately proportional to $s_i$). The increased model loss leads to misclassification of the fragmented clouds, consistent with the definition of standard adversarial attacks.

Non-adversarial point clouds generation

As a byproduct, our saliency map can also be applied to generate non-adversarial point clouds. This is the opposite of adversarial generation and is achieved by dropping the points with the lowest scores. In contrast to adversarial point clouds, when the scores of the dropped points are negative, dropping them decreases the loss, potentially leading to improved model performance.

Input: a point-cloud-based model with loss function $L(X, y; W)$, 3D point-cloud input $X$, label $y$, and model weights $W$; hyper-parameter $\alpha$; total number of points to drop $T$.
  Compute the gradients $\partial L / \partial x_{ij}$ under the orthogonal coordinates
  Compute the cloud center $x_c$ as the coordinate-wise medians
  Compute $\partial L / \partial r_i = \sum_j (\partial L / \partial x_{ij}) \cdot (x_{ij} - x_{cj}) / r_i$ (inner product)
  Construct the saliency map by $s_i = -(\partial L / \partial r_i)\, r_i^{1+\alpha}$
  if generating non-adversarial clouds then
     Drop the $T$ points with the lowest $s_i$ from $X$
  else if generating adversarial clouds then
     Drop the $T$ points with the highest $s_i$ from $X$
  end if
  Output the fragmented cloud $X'$
Algorithm 1: Drop points based on the saliency map
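A possible NumPy realization of Algorithm 1, reusing the `saliency_scores` sketch from Section 3.2; the function and argument names are ours for illustration, not from the released code.

```python
import numpy as np

def drop_points(points, grads, num_drop, alpha=1.0, adversarial=True):
    """One-shot point dropping guided by the saliency map (cf. Algorithm 1)."""
    scores = saliency_scores(points, grads, alpha)   # from the Section 3.2 sketch
    order = np.argsort(scores)                       # indices sorted by ascending score
    drop = order[-num_drop:] if adversarial else order[:num_drop]
    return np.delete(points, drop, axis=0)           # the fragmented cloud X'
```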

3.3 Algorithms

Based on the above description, saliency maps are readily constructed by calculating gradients following (5), which then guide our point-dropping algorithms. Algorithm 1 describes our basic point-dropping procedure. Note that Algorithm 1 calculates all saliency scores at once, which might be suboptimal because dependencies between points are ignored. To alleviate this issue, an iterative version of Algorithm 1 is proposed in Algorithm 2. The idea is to drop points iteratively, so that point dependencies within the remaining point set are taken into account when calculating saliency scores for the next iteration. Specifically, in each iteration, a new saliency map is constructed for the remaining points, and $T/n$ points are dropped based on the current map (a sketch of this loop follows Algorithm 2). In Section 4.3, we set a small number of points to drop per iteration for adversarial point-cloud generation and show that this setting improves performance at reasonable computational cost.

Input: loss function $L(X, y; W)$; point-cloud input $X$, label $y$, and model weights $W$; hyper-parameter $\alpha$; total number of points to drop $T$; number of iterations $n$.
  for $k = 1$ to $n$ do
     Compute the gradients $\partial L / \partial x_{ij}$
     Compute the center $x_c$ as the coordinate-wise medians
     Compute $\partial L / \partial r_i$ (inner product)
     Construct the saliency map by $s_i = -(\partial L / \partial r_i)\, r_i^{1+\alpha}$
     if dropping points with negative contribution then
        Drop the $T/n$ points with the lowest $s_i$ from $X$
     else if dropping points with positive contribution then
        Drop the $T/n$ points with the highest $s_i$ from $X$
     end if
  end for
  Output the fragmented cloud $X'$
Algorithm 2: Iteratively drop points based on dynamic saliency maps
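A sketch of the iterative loop in Algorithm 2, building on the `drop_points` sketch above and assuming a hypothetical `grad_fn(points, label)` that returns $\partial L / \partial x$ for the current cloud via the model's backward pass.

```python
def iterative_drop(points, label, grad_fn, total_drop, iters, alpha=1.0, adversarial=True):
    """Iterative dropping (cf. Algorithm 2): recompute the map after each batch of drops."""
    per_iter = total_drop // iters
    for _ in range(iters):
        grads = grad_fn(points, label)   # dL/dx for the *remaining* points only
        points = drop_points(points, grads, per_iter, alpha, adversarial)
    return points
```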
Figure 4: PointNet on 3D-MNIST and ModelNet40 from left to right: averaged loss (3D-MNIST), overall accuracy (3D-MNIST), averaged loss (ModelNet40), overall accuracy (ModelNet40).
Figure 5: PointNet++ on 3D-MNIST and ModelNet40 from left to right: averaged loss (3D-MNIST), overall accuracy (3D-MNIST), averaged loss (ModelNet40), overall accuracy (ModelNet40).
Figure 6: DGCNN on 3D-MNIST and ModelNet40: averaged loss (3D-MNIST), overall accuracy (3D-MNIST), averaged loss (ModelNet40), overall accuracy (ModelNet40).
Figure 7: Impacts of hyper-parameters: scaling factor (left), number of dropped points (middle), number of iterations (right).

4 Experiments

We verify our approach by applying it to several datasets for adversarial and non-adversarial point-cloud generation.

4.1 Datasets and Models

We use two public datasets, 3D MNIST (https://www.kaggle.com/daavoo/3d-mnist/version/13) and ModelNet40 (http://modelnet.cs.princeton.edu/) [Wu et al.2015], to test our saliency map and point-dropping algorithms. 3D MNIST contains raw 3D point clouds generated from 2D MNIST images, split into a training set and a test set. To enrich the dataset, we randomly select 1,024 points from each raw point cloud 10 times, creating 10 fixed-size clouds per raw cloud and thus enlarging the training and test sets tenfold, with each point cloud consisting of 1,024 points. ModelNet40 contains 12,311 meshed CAD models from 40 categories, of which 9,843 are used for training and 2,468 for testing. We use the same point-cloud data as [Qi et al.2017a], sampled from the surfaces of the CAD models. Finally, our approach is evaluated on the state-of-the-art point-cloud models introduced in Section 2.2: PointNet, PointNet++ and DGCNN.
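As an illustration of this data preparation, the following sketch builds the 10 fixed-size clouds per raw cloud; sampling with replacement when a raw cloud has fewer than 1,024 points is our assumption, not something stated in the paper.

```python
import numpy as np

def make_fixed_size_clouds(raw_points: np.ndarray, n_points=1024, n_copies=10, seed=0):
    """Randomly subsample a raw cloud n_copies times to fixed-size clouds."""
    rng = np.random.default_rng(seed)
    replace = raw_points.shape[0] < n_points   # pad by resampling if the raw cloud is small
    return [raw_points[rng.choice(raw_points.shape[0], n_points, replace=replace)]
            for _ in range(n_copies)]
```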

Figure 8: Adversarial point clouds: original correct prediction (left), dropped points associated with highest scores by Algorithm 2 (middle), wrong prediction after point dropping (right).
Figure 9: Non-adversarial point clouds: original wrong prediction (left), dropped points associated with lowest scores (middle), correct prediction after point dropping (right).

4.2 Implementation Details

Our implementation is based on the models and code provided by [Qi et al.2017a, Qi et al.2017b, Wang et al.2018] (https://github.com/charlesq34/pointnet; https://github.com/charlesq34/pointnet2; https://github.com/WangYueFt/dgcnn). Default settings are used to train these models. To enable a dynamic number of points along the second dimension of the batch-input tensor, for all three models we substitute several TensorFlow ops with equivalent ops that support dynamic input shapes. We also implement a dynamic batch-gather op and its gradient op for DGCNN in C++ and CUDA.

For simplicity, we set the number of votes (aggregating classification scores from multiple rotations) to 1. In all of the following cases, a small additional accuracy improvement can be obtained with more votes, e.g., 12 votes. Likewise, incorporating additional features such as face normals would further improve accuracy slightly. We did not use these tricks in our experiments, for simplicity.

4.3 Empirical Results

To show the effectiveness of our saliency map as a guide for point dropping, we compare our approach with the random point-dropping baseline [Qi et al.2017a], denoted rand-drop, and the critical-subset-based strategy introduced in Section 2.3, denoted critical (only applicable to PointNet). For brevity, in what follows we refer to dropping the points with the lowest scores to generate non-adversarial clouds as non-adversarial, and dropping the points with the highest scores to generate adversarial clouds as adversarial. As explained in Section 3.3, we use Algorithm 1 to generate non-adversarial clouds, and the iterative Algorithm 2 (with a small number of points dropped per iteration) to generate adversarial clouds.

Results on PointNet

The performance of PointNet on the 3D-MNIST test set is shown in Fig. 4. The overall accuracy of PointNet barely changes under rand-drop as the number of dropped points varies from 0 to 200. In contrast, the adversarial point clouds generated by our point-dropping algorithm substantially reduce PointNet's overall accuracy. Furthermore, it is interesting to see that dropping points with negative scores even increases accuracy compared to the original point clouds; this holds for the other models and datasets as well, as shown below. For ModelNet40, as also shown in Fig. 4, the overall accuracy of PointNet is likewise maintained under rand-drop (the accuracy reported in [Qi et al.2017a] is obtained with more votes; we set the number of votes to 1 for simplicity, and the discrepancy between the two settings is small), whereas our point-dropping algorithm can noticeably increase or reduce the accuracy.

Results on PointNet++

The results for PointNet++ are shown in Fig. 5. PointNet++ maintains its accuracy on 3D-MNIST under rand-drop, while our point-dropping algorithm can noticeably increase or reduce the accuracy. On the ModelNet40 test set, PointNet++ likewise maintains its overall accuracy under rand-drop (the higher accuracy reported in [Qi et al.2017b] is achieved by incorporating face normals as additional features and using more votes), while our algorithm can again increase or reduce the accuracy substantially.

Results on DGCNN

The accuracies of DGCNN on the 3D-MNIST and ModelNet40 test sets are shown in Fig. 6. Similarly, DGCNN maintains its accuracy on both datasets under rand-drop, while under the same conditions our algorithm is able to noticeably increase or reduce the accuracies.

Visualization

Several adversarial point clouds are visualized in Fig. 8. For the point clouds shown there, our iterative algorithm successfully identifies the important segments that distinguish them from other clouds, e.g., the base of the lamp, and fools the DNN model by dropping those segments. It is worth pointing out that humans still seem able to recognize most of these fragmented point clouds, probably owing to human imagination. In contrast, as shown in Fig. 9, non-adversarial point-cloud generation is visually similar to a denoising process, i.e., it drops noisy/useless points scattered throughout the clouds. Although the DNN model misclassifies the original point clouds in some cases, dropping those noisy points can correct the model's predictions.

Parameter Study

We employ PointNet on ModelNet40 to study the impact of the scaling factor $\alpha$, the number of dropped points, and the number of iterations on model performance. As shown in Fig. 7 (left), $\alpha = 1$ is a good setting for Algorithm 2, since as $\alpha$ increases, the number of adversarial clouds generated by our algorithm slightly decreases. Fig. 7 (middle) makes clear that our algorithm significantly outperforms rand-drop at generating adversarial clouds: the accuracy of PointNet stays high under rand-drop even with many points dropped, while Algorithm 2 reduces it dramatically. Fig. 7 (right) shows that Algorithm 2 generates more adversarial point clouds than Algorithm 1. For non-adversarial point-cloud generation, Algorithm 2 still slightly outperforms Algorithm 1, but at a higher computational cost. Therefore, we recommend Algorithm 2 for adversarial point-cloud generation and Algorithm 1 for non-adversarial point-cloud generation.

Discussion

Among the three state-of-the-art DNN models for 3D point clouds, DGCNN appears to be the most robust against adversarial point clouds generated by our proposed algorithm. We conjecture this robustness comes from its structures designed to capture more local information, which presumably compensates for the information lost by dropping individual points. In contrast, PointNet does not capture local structures [Qi et al.2017b], making it the most vulnerable to adversarial fragmented point clouds.

5 Conclusion

In this paper, a saliency-map learning method for 3D point clouds is proposed to measure the contribution (importance) of each point in a point cloud to the model-prediction loss. By approximating point dropping with a continuous point-shifting procedure, we show that the contribution of a point is approximately proportional to, and thus can be scored by, the gradient of the loss w.r.t. the point under a rescaled spherical coordinate system. Using this saliency map, we further standardize the point-dropping process to generate adversarial/non-adversarial point clouds by dropping the points with the highest/lowest scores. Extensive evaluations show that our saliency-map-driven point-dropping algorithm consistently outperforms other schemes such as random point dropping, revealing the vulnerability of state-of-the-art DNNs to adversarial point clouds generated by malicious point dropping, a more practically realizable adversarial attack.

References

Appendix A Critical-Subset Theory

We re-explain the critical-subset theory [Qi et al.2017a] for interested readers. Here $u$ represents the max-pooled features in PointNet, i.e., $u = \max_{x_i \in X}\{h(x_i)\}$, where $\max$ (a special max-pooling layer) is a vector max operator that takes $n$ vectors as input and returns their element-wise maximum. A PointNet network can then be expressed as $F(X) = \gamma(u)$, where $\gamma$ is a continuous function. Apparently, $F(X)$ is determined by $u$. For the $j$-th dimension of $u$, there exists at least one point $x_i$ such that $u_j = (h(x_i))_j$. Aggregating all such $x_i$ yields a subset $X_c \subseteq X$ that determines $u$, and thus $F(X)$. [Qi et al.2017a] names $X_c$ the critical subset. As we can see, this theory applies to PointNet, where each max-pooled feature is determined by a single point, but not to networks with more complicated structures.
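The following sketch extracts the critical subset from the per-point features $h(x_i)$ before max pooling, along with the per-point feature counts used by the strategy in Appendix B; the `(n, d)` array layout is our assumption.

```python
import numpy as np

def critical_subset(features: np.ndarray) -> np.ndarray:
    """features: (n, d) per-point features h(x_i) before max pooling.
    Point i is critical if it attains the maximum on at least one feature dim."""
    winners = features.argmax(axis=0)   # point index supplying each pooled feature u_j
    return np.unique(winners)           # indices of the critical subset X_c

def feature_counts(features: np.ndarray) -> np.ndarray:
    """How many max-pooled features each point determines (used by Algorithm 3)."""
    winners = features.argmax(axis=0)
    return np.bincount(winners, minlength=features.shape[0])
```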

Appendix B Critical-Subset-based Point Dropping

We tested several point-dropping strategies based on the critical-subset theory, e.g., randomly dropping points from the critical subset one-time/iteratively, and dropping the points that determine the largest number of max-pooled features one-time/iteratively. Among these schemes, iteratively dropping the points that determine the largest number of max-pooled features (at least two) performs best. The strategy is given in Algorithm 3.

Input: PointNet network with per-point features $h(x_i)$; point-cloud input $X$, label $y$, and model weights $W$; total number of points to drop $T$; number of iterations $n$.
  for $k = 1$ to $n$ do
     Compute the indices of the critical subset by $c_j = \arg\max_i (h(x_i))_j$ for each max-pooled feature $j$
     Count $n_i = |\{j : c_j = i\}|$ (i.e., $x_i$ determines $n_i$ max-pooled features)
     Drop the $T/n$ points with the largest $n_i$ from $X$
  end for
  Output the fragmented cloud $X'$
Algorithm 3: Iteratively drop points based on the dynamic critical subset

Appendix C More Visualization Results

In the body of the paper, several adversarial point clouds generated by dropping a limited number of points are visualized. Here we show more adversarial point clouds generated by dropping additional points, in Figures 10-17. Once enough points are dropped, our saliency-map-based point-dropping scheme can generate adversarial point clouds for almost all the data in both the 3D-MNIST and ModelNet40 test sets.

Figure 10: From airplane to table by dropping points
Figure 11: From airplane to radio by dropping points
Figure 12: From airplane to stairs by dropping points
Figure 13: From airplane to laptop by dropping points
Figure 14: From chair to toilet by dropping points
Figure 15: From chair to stool by dropping points
Figure 16: From chair to stairs by dropping points
Figure 17: From chair to stool by dropping points