ART-Point: Improving Rotation Robustness of Point Cloud Classifiers via Adversarial Rotation

by Robin Wang, et al.
Peking University

Point cloud classifiers with rotation robustness have been widely discussed in the 3D deep learning community. Most proposed methods either use rotation invariant descriptors as inputs or try to design rotation equivariant networks. However, robust models generated by these methods have limited performance under clean aligned datasets due to modifications on the original classifiers or input space. In this study, for the first time, we show that the rotation robustness of point cloud classifiers can also be acquired via adversarial training with better performance on both rotated and clean datasets. Specifically, our proposed framework named ART-Point regards the rotation of the point cloud as an attack and improves rotation robustness by training the classifier on inputs with Adversarial RoTations. We contribute an axis-wise rotation attack that uses back-propagated gradients of the pre-trained model to effectively find the adversarial rotations. To avoid model over-fitting on adversarial inputs, we construct rotation pools that leverage the transferability of adversarial rotations among samples to increase the diversity of training data. Moreover, we propose a fast one-step optimization to efficiently reach the final robust model. Experiments show that our proposed rotation attack achieves a high success rate and ART-Point can be used on most existing classifiers to improve the rotation robustness while showing better performance on clean datasets than state-of-the-art methods.



1 Introduction

A basic requirement for point cloud classification is that the network produces stable predictions on inputs undergoing rigid transformations, since such transformations change neither the shape of the object nor its semantic meaning. This requirement is even more important in practical applications. For example, when a robot is identifying and picking up an object, the object is usually in an unknown pose. However, many studies [51, 7, 17] have shown that most existing point cloud classifiers can be easily attacked by simply rotating the inputs. Using these classifiers therefore requires aligning all input objects, which is an expensive and time-consuming process. Consequently, improving the robustness of point cloud classifiers to arbitrary rotations has become a popular and necessary research topic.

In order to make the network robust to rotated inputs, most existing works can be classified into three categories: (1) Rotation Augmentation Methods attempt to augment the training data using rotations and have been widely used in the earlier point cloud classifiers [30, 31, 39]. However, data augmentation can hardly be applied to improve model robustness to arbitrary rotations due to the astronomical number of rotated data [49]. (2) Rotation-Invariance Methods propose to convert the input point clouds into geometric descriptors that are invariant to rotations. Typical invariant descriptors can be the distance and angles between local point pairs [8, 4, 47, 48] or point norms [17, 49] and principal directions [47] calculated from global coordinates. (3) Rotation-Equivariance Methods try to solve the rotation problem from the perspective of model architectures. For example, [40, 5, 28, 37] use convolution with steerable kernel bases to construct rotation-equivariant networks and [7, 50, 35] modify existing networks with equivariant operations. While both methods (2) and (3) can effectively improve model robustness to arbitrary rotations, they either require time-consuming pre-processing on inputs or need complex architectural modifications, which will result in limited performance on clean aligned datasets.

In this paper, we try to explore a new technical route for the rotation robustness problem in point clouds. Our method is inspired by adversarial training [22], a typical defense method to improve model robustness to attacks. The idea of adversarial training is straightforward: it augments training data with adversarial examples in each training loop. Thus adversarially trained models behave more normally when facing adversarial examples than standardly trained models. Adversarial training has shown its great effectiveness in improving model robustness to image or text perturbations [34, 44, 11, 9, 21], while keeping a strong discriminative ability. In 3D point clouds, [36, 18] also successfully leverage adversarial training to defend against point cloud perturbations such as random point shifting or removing. However, using adversarial training to improve the rotation robustness of point cloud classifiers has rarely been studied.

To this end, by regarding rotation as an attack, we develop the ART-Point framework to improve the rotation robustness by training networks on inputs with Adversarial RoTations. Like the general framework of adversarial training, ART-Point forms a classic min-max problem, where the max step finds the most aggressive rotations, on which the min step is performed to optimize the network parameters for rotation robustness. For the max step, we propose an axis-wise rotation attack algorithm to find the most offensive rotating samples. Compared with the existing rotation attack algorithm [51] that directly optimizes the transformation matrix, our method optimizes on the rotation angles which reduces the optimization parameters, while ensuring that the attack is pure rotation to serve for the adversarial training. For the min step, we follow the training scheme of the original classifier to retrain the network on the adversarial samples. To overcome the problem of over-fitting on adversarial samples caused by label leaking [15], we construct a rotation pool that leverages the transferability of adversarial rotations among point cloud samples to increase the diversity of training data. Finally, inspired by ensemble adversarial training [38], we contribute a fast one-step optimization method to solve the min-max problems. Instead of alternately optimizing the min-max problem until the model converges, the one-step method can quickly reach the final robust model with competitive performance.

Compared with the rotation-invariant and equivariant methods, the ART-Point framework aims to optimize network parameters such that the converged model is naturally robust to both arbitrary and adversarial rotations, without requiring either geometric descriptor extraction or architectural modifications that may prevent the model from learning discriminative features. As a result, our robust model better inherits the original performance on the clean (aligned) datasets. The framework places no constraint on the model design and can be integrated into most point cloud classifiers.

In experiments, we mainly verify the effectiveness of our methods on two datasets, ModelNet40 [42] and ShapeNet16 [46]. We adopt PointNet [30], PointNet++ [31] and DGCNN [39] as the basic classifiers. Firstly, compared with the existing rotation attack method [51], our proposed attack achieves a higher attack success rate. Then, compared with existing rotation robust classifiers, our best model (ART-DGCNN) shows a more robust performance on randomly rotated datasets. Meanwhile, our methods generally show less accuracy reduction on clean aligned datasets. Beyond arbitrary rotations, the resulting models also show a solid defense against adversarial rotations. Our contributions can be summarized as follows:

  • For the first time, we successfully improve the rotation robustness of point cloud classifiers from the perspective of model attack and defense. Our proposed framework, ART-Point, enjoys fewer architectural modifications than previous rotation-equivariant methods and requires no descriptor extractions on input data.

  • We propose an axis-wise rotation attack algorithm to efficiently find the most aggressive rotated samples for adversarial training. A rotation pool is designed to avoid over-fitting of models on adversarial samples. We also contribute a fast one-step optimization to solve the min-max problem.

  • We validate our method on two datasets with three point cloud classifiers. The results show that our attack algorithm achieves a higher attack success rate than existing methods. Moreover, the proposed ART-Point framework can effectively improve model rotation robustness allowing the model to defend against both arbitrary and adversarial rotations, while hardly affecting model performance on clean data.

2 Related Work

2.1 Rotation Robust Point Cloud Classifiers

Rotation Augmentation. Early point cloud classifiers [30, 31, 39] adopt rotation augmentation during training to improve rotation robustness. Nevertheless, rotation augmentation only yields models that are robust to a small range of angles. More recently, to obtain models robust to arbitrary rotation angles, both rotation-invariance and rotation-equivariance methods have been proposed.

Rotation-invariance methods extract rotation-invariant descriptors from point clouds as model inputs. For example, [8, 29, 4, 48] cleverly construct distances and angles from local point pairs. [47, 49, 17] further extend local invariant descriptors with global invariant contexts. In addition to using invariant descriptors with a clear geometric meaning, [32, 29, 20] also design invariant convolutions to automatically learn various descriptors for processing.

Rotation-equivariance methods expect the learned features to rotate correspondingly with the input thus resulting in rotation robust models. Most of these works usually rely on rotation-equivariant convolutions [6, 40, 37, 10, 28, 5, 14] to construct equivariant networks. Other works like [7, 50, 35] attempt to modify modules in existing point cloud classifiers [30, 31, 39] to make them rotation-equivariant.

However, these methods usually require specific descriptors or network modules which will reduce the performance of the classifier on the aligned datasets. Our study differs from these methods in that we try to obtain a robust model by optimizing the parameters without changing the input space or network architectures.

2.2 Adversarial Training

Adversarial training [13, 22] has been shown to be the most effective technique against adversarial attacks [26, 23, 33], receiving considerable attention from the research community. Unlike other defense strategies, adversarial training aims to enhance the robustness of models intrinsically [1]. This property makes adversarial training widely used in various fields to improve model robustness, including image recognition [12, 34, 44, 11], text classification [24, 9, 21, 25] and relation extraction [41]. Adversarial training can also be used effectively in 3D point cloud classification. For example, [18] employs adversarial training to improve model robustness to point shifting perturbations by training on both clean and adversarially perturbed point clouds. [36] presents an in-depth study showing how adversarial training behaves in point cloud classification. However, existing works only focus on improving the model's robustness to perturbations of random point shifting or removing [43, 45, 16, 52, 19, 12].

Recently, [51] designs a rotation attack algorithm for existing point cloud classifiers. Yet it does not provide detailed strategies to defend against the rotation attack. In comparison, we design a new attack algorithm that enjoys a higher attack success rate. More importantly, it serves our adversarial training framework, which produces models that naturally defend against both arbitrary and adversarial rotations.

Figure 1: The general pipeline of our adversarial training approach. In the upper branch, the network takes a clean batch (aligned object) as inputs and finds the most aggressive attack angles by maximizing the classification loss of the eval model. The attack angles will be stored by class in the rotation pool. In the lower branch, the network samples angles from the rotation pool to produce adversarial point clouds for re-training the classifier to obtain the rotation robust model. The red and blue dashed lines respectively indicate routes of the backward gradient in two optimization tasks and point to the final optimized parameters. In the real implementations, the one-step optimization will construct the rotation pool by attacking multiple eval models, while the iterative optimization will update the parameter of the eval model by parameters of the latest re-trained model in each min-max iterations.

3 Methods

In this section, we first provide a brief review of adversarial training (Sect. 3.1). Then, we reformulate the adversarial training objective under rotation attack of point clouds (Sect. 3.2). Next, we propose attack (Sect. 3.3) and defense (Sect. 3.4) algorithms to obtain good solutions to the reformulated objective. Finally, we provide a one-step optimization to quickly reach a robust model (Sect. 3.5).

3.1 Preliminaries on Adversarial Training

Let us first consider a standard classification task with an underlying data distribution $\mathcal{D}$ over inputs $x$ and corresponding labels $y$. The goal then is to find model parameters $\theta$ that minimize the risk $\mathbb{E}_{(x,y)\sim\mathcal{D}}[L(x, y, \theta)]$, where $L$ is a suitable loss function. To improve the model robustness, we want no admissible perturbation to be able to fool the network, which gives rise to the following formulation:

$$\min_{\theta}\ \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\mathbb{E}_{\delta\in\mathcal{S}}\,L(x', y, \theta)\Big] \tag{1}$$

where $x'$ refers to the perturbed samples generated by introducing perturbations $\delta$ on input data $x$, and $\mathcal{S}$ refers to the allowed perturbation set. Eq. (1) reflects the basic idea of data augmentation.

In contrast, adversarial training improves model robustness more efficiently. Through an in-depth study of the landscape of adversarial samples, [22] finds a concentration phenomenon among different adversarial samples, which suggests that training on the most aggressive adversary yields robustness against all other concentrated adversaries. This gives rise to the formulation of adversarial training, which is a saddle point problem:

$$\min_{\theta}\ \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\max_{\delta\in\mathcal{S}} L(x', y, \theta)\Big] \tag{2}$$


The saddle point problem can be viewed as the composition of an inner maximization problem and an outer minimization problem, where the inner maximization problem is finding the worst-case samples for the given model, and the outer minimization problem is to train a model robust to adversarial samples. Compared with data augmentation, adversarial training searches for the best solution to the worst-case optimum and can improve the model robustness to perturbations in larger ranges [22].
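The contrast between averaging over perturbations and training on the worst case can be seen on a toy problem. The sketch below is illustrative only: the scalar input, the absolute-error loss and the three hand-picked shifts are assumptions made for the example, and the function names are hypothetical.

```python
def augmentation_objective(loss_fn, x, y, perturbations):
    """Average loss over a set of perturbations (the data-augmentation view)."""
    losses = [loss_fn(perturb(x), y) for perturb in perturbations]
    return sum(losses) / len(losses)


def adversarial_objective(loss_fn, x, y, perturbations):
    """Worst-case loss over the same set (the inner maximization)."""
    return max(loss_fn(perturb(x), y) for perturb in perturbations)


# Hypothetical toy setup: scalar input, absolute-error loss, three shifts.
def toy_loss(x, y):
    return abs(x - y)


shifts = [lambda x: x, lambda x: x + 0.5, lambda x: x - 2.0]

avg_loss = augmentation_objective(toy_loss, 1.0, 1.0, shifts)
worst_loss = adversarial_objective(toy_loss, 1.0, 1.0, shifts)
```

The averaged objective is dominated by the mild perturbations, while the worst-case objective is set entirely by the most damaging one, which is the sample adversarial training chooses to train on.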

3.2 Problem Formulation

Our main goal is to improve the robustness of the point cloud classifiers to rotation attacks through the adversarial training framework. We reformulate Eq. (2) by specifying the perturbation to be the point cloud rotation as follows:

$$\min_{\theta}\ \mathbb{E}_{(P,y)\sim\mathcal{D}}\Big[\max_{R\in SO(3)} L(RP, y, \theta)\Big] \tag{3}$$

where $P$ refers to an input point cloud of size $n \times 3$ and $y$ is the corresponding class label. $\theta$ denotes the parameters of point cloud classifiers such as PointNet [30] or DGCNN [39]. $RP$ refers to the adversarial samples generated by using matrix $R$ to rotate the input $P$, and $SO(3)$ is the group of all rotations around the origin of Euclidean space. We set the rotation $R \in SO(3)$ so that the objective makes the model robust to arbitrary rotations.

As discussed in [22], one key element for obtaining a good solution to Eq. (3) is using the strongest possible adversarial samples to train the networks. Following this principle, we first propose a novel rotation attack method that enjoys satisfactory attack success and thus better serves for the adversarial training to improve model robustness.

3.3 Attack—Inner Maximization

For the inner maximization problem, we expect a strong rotation attack algorithm that can find the most aggressive samples inducing high classification loss. A previous study [51] introduced two rotation attack methods, the Thompson Sampling Isometry (TSI) attack and the Combined Targeted Restricted Isometry (CTRI) attack, for generating adversarial rotations. However, they can hardly be used in adversarial training for the following reasons: (1) The TSI attack is a black-box attack, which has no direct access to the classifier parameters and thus can hardly be used to find samples inducing high loss. (2) The CTRI attack is a white-box attack, so one can use parameter information to search for the most aggressive samples. Yet, in CTRI, there is no strict constraint for the matrix to be a pure rotation, which leads to adversarial samples with non-rigid deformation. To this end, we propose a novel white-box attack that can efficiently find the most aggressive samples while guaranteeing that the attack is a pure rotation.

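Gradient Descent on Angles. Firstly, to ensure the attack is pure rotation, we propose to optimize the attack by gradient descent on rotating angles. Specifically, for an n-point cloud $P = \{p_i\}_{i=1}^{n}$ with $p_i = (x_i, y_i, z_i)$, we consider vectors $\phi = (\phi_x, \phi_y, \phi_z)$ with 3 parameters denoting rotation angles along three axes. Rotating points along the $z$ axis by $\phi_z$ will increase the loss at the rate $\partial L / \partial \phi_z$, which can then be calculated under the spherical coordinate, by the chain rule as:

$$\frac{\partial L}{\partial \phi_z} = \sum_{i=1}^{n}\left(\frac{\partial L}{\partial x_i}\frac{\partial x_i}{\partial \phi_z} + \frac{\partial L}{\partial y_i}\frac{\partial y_i}{\partial \phi_z}\right) = \sum_{i=1}^{n}\left(x_i\frac{\partial L}{\partial y_i} - y_i\frac{\partial L}{\partial x_i}\right) \tag{4}$$

where $\partial L/\partial x_i$ and $\partial L/\partial y_i$ are gradients back-propagated on point coordinates. For the rest of the rotation axes, $\partial L/\partial \phi_x$ and $\partial L/\partial \phi_y$ can also be calculated in the same way. Based on Eq. (4), we can iteratively optimize the angles by gradient descent to obtain adversarial rotations that induce high loss. Finally, the rotation matrix $R$ is generated from the optimized angles by composing $R_x(\phi_x)$, $R_y(\phi_y)$ and $R_z(\phi_z)$, where $R_a(\phi_a)$ corresponds to the rotation matrix that rotates $\phi_a$ degrees around axis $a$. More derivations about the gradient calculation and rotation matrix construction will be provided in the supplementary.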
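The chain rule behind Eq. (4) can be checked numerically. The following is a minimal sketch, assuming right-handed rotations about the coordinate axes and a hypothetical quadratic toy loss standing in for the classifier; `angle_gradients`, `rotate`, `toy_loss` and `toy_coord_grads` are names introduced here for illustration only.

```python
import math


def angle_gradients(points, coord_grads):
    """Gradients of a loss w.r.t. small rotations about x, y and z.

    points:      list of (x, y, z) coordinates of the current cloud
    coord_grads: list of (dL/dx, dL/dy, dL/dz), one triple per point
    """
    gx = gy = gz = 0.0
    for (x, y, z), (dx, dy, dz) in zip(points, coord_grads):
        gx += y * dz - z * dy  # about x: dy/da = -z, dz/da = y
        gy += z * dx - x * dz  # about y: dz/da = -x, dx/da = z
        gz += x * dy - y * dx  # about z: dx/da = -y, dy/da = x
    return gx, gy, gz


def rotate(points, axis, angle):
    """Rotate all points by `angle` radians about axis 0 (x), 1 (y) or 2 (z)."""
    c, s = math.cos(angle), math.sin(angle)
    i, j = (axis + 1) % 3, (axis + 2) % 3  # the two coordinates that move
    rotated = []
    for p in points:
        q = list(p)
        q[i] = c * p[i] - s * p[j]
        q[j] = s * p[i] + c * p[j]
        rotated.append(tuple(q))
    return rotated


# Hypothetical toy loss: sum of squared projections onto a fixed direction W.
W = (0.3, -1.2, 0.7)


def toy_loss(points):
    return sum((W[0] * x + W[1] * y + W[2] * z) ** 2 for x, y, z in points)


def toy_coord_grads(points):
    grads = []
    for x, y, z in points:
        r = 2.0 * (W[0] * x + W[1] * y + W[2] * z)
        grads.append((r * W[0], r * W[1], r * W[2]))
    return grads


cloud = [(1.0, 0.5, -0.2), (-0.3, 0.8, 0.4), (0.1, -0.7, 0.9)]
analytic = angle_gradients(cloud, toy_coord_grads(cloud))

# Central finite differences of the loss under a tiny rotation about each axis.
h = 1e-6
numeric = tuple(
    (toy_loss(rotate(cloud, a, h)) - toy_loss(rotate(cloud, a, -h))) / (2 * h)
    for a in range(3)
)
```

In a real attack the coordinate gradients would come from back-propagation through the classifier rather than from a closed-form toy loss; only the accumulation in `angle_gradients` corresponds to the chain-rule step.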

Axis-Wise Attack. In order to efficiently find the most aggressive rotations, based on the angle gradients, we further propose an axis-wise mechanism. Specifically, we subdivide a rotation in SO(3) into rotations around three axes for optimization. By doing so, each time we can choose the most aggressive axis to rotate, resulting in stronger attacks. We approximate the loss change ratio of a specific axis $a \in \{x, y, z\}$ by $|\partial L/\partial \phi_a|$, which reflects the influence of rotating around a certain axis on final losses. Next, we select the most influenced axis

$$a^{*} = \arg\max_{a\in\{x,y,z\}} \left|\frac{\partial L}{\partial \phi_a}\right| \tag{5}$$

and attack the axis by rotating one step in the opposite direction of gradient descent:

$$\phi_{a^{*}}^{t+1} = \phi_{a^{*}}^{t} + \eta\,\frac{\partial L}{\partial \phi_{a^{*}}} \tag{6}$$

where $\eta$ is the step size. Compared with simultaneously optimizing on all three axes, the axis-wise attack can specify a gentler change of the rotation angles in each attack step.
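The axis selection and single-axis update described above can be sketched as follows. This is an illustration under assumptions: the angle gradients are taken as given, the plain (unsigned, unnormalized) ascent step is one possible choice, and `select_axis` and `axis_wise_step` are hypothetical names.

```python
def select_axis(angle_grads):
    """Index (0=x, 1=y, 2=z) of the axis whose angle gradient is largest in magnitude."""
    return max(range(3), key=lambda a: abs(angle_grads[a]))


def axis_wise_step(angles, angle_grads, step_size):
    """Change only the selected angle, moving it in the loss-increasing direction."""
    target = select_axis(angle_grads)
    updated = list(angles)
    updated[target] += step_size * angle_grads[target]
    return target, tuple(updated)


# Hypothetical gradients: rotating about y influences the loss the most.
axis, new_angles = axis_wise_step((0.0, 0.0, 0.0), (0.2, -1.5, 0.4), step_size=0.1)
```

Here only the y angle changes; the other two angles are left untouched in this step, which is what keeps each update small.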

Implementation Details. In the real implementations, we adopt several other general settings to find adversarial samples. Firstly, we use Projected Gradient Descent (PGD) [22] to optimize angles. Compared with normal gradient descent, PGD ensures that the optimized angles are constrained to a certain scope:

$$\phi_{a^{*}}^{t+1} = \Pi_{\mathcal{S}}\left(\phi_{a^{*}}^{t} + \eta\,\frac{\partial L}{\partial \phi_{a^{*}}}\right) \tag{7}$$

where $\Pi_{\mathcal{S}}$ denotes projection onto the allowed scope $\mathcal{S}$. In our case, the projected scope is set to avoid the discontinuity caused by the periodicity of rotation. Then, instead of plain cross-entropy, we follow [43, 51] and adopt the CW loss [3] as a more powerful adversarial objective to generate stronger adversaries. Finally, to make sure that the generated adversaries are more evenly distributed over $SO(3)$, we adopt a random start strategy: each input point cloud is initialized with a random rotation angle, and the attack then continues from this initialization. The proposed axis-wise rotation attack algorithm is illustrated in Algorithm (1).
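The three implementation settings (projection, a margin-based objective, random start) can be sketched in a few lines. Everything specific here is an assumption for illustration: the bounds of one full period, the exact margin form (a CW-style margin written as a quantity to maximize, capped at `kappa`), and the function names are not taken from the paper.

```python
import math
import random


def project_angle(angle, low=-math.pi, high=math.pi):
    """Clip an angle back into the allowed scope (bounds are placeholders)."""
    return min(max(angle, low), high)


def random_start(rng, low=-math.pi, high=math.pi):
    """Draw one random initial angle per axis before the attack begins."""
    return tuple(rng.uniform(low, high) for _ in range(3))


def cw_margin(logits, label, kappa=0.0):
    """CW-style margin to maximize: best wrong-class logit minus true-class logit.

    The value is capped at `kappa`, so the objective stops growing once the
    sample is misclassified with that margin.
    """
    best_wrong = max(z for k, z in enumerate(logits) if k != label)
    return min(best_wrong - logits[label], kappa)


rng = random.Random(0)
start = random_start(rng)
clipped = project_angle(4.0)
margin_correct = cw_margin([2.0, 0.5, 1.0], label=0)
margin_fooled = cw_margin([2.0, 0.5, 1.0], label=1, kappa=1.0)
```

A negative margin means the sample is still classified correctly, so maximizing it pushes the classifier toward a wrong class regardless of how confident the cross-entropy already is.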

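Algorithm 1 Axis-Wise Rotation Attack
Input: Point cloud input $P$, label $y$ and model parameters $\theta$, loss function $L$, number of iterations $T$, step size $\eta$, initial rotation angles $\phi^{0}$ and corresponding rotation matrix $R^{0}$.
1:  for $t = 1$ to $T$ do
2:     Compute the gradients on coordinates:
3:        $\nabla_{P^{t-1}} L(P^{t-1}, y, \theta)$, with $P^{t-1} = R^{t-1}P$.
4:     Compute the gradients on angles by Eq. (4).
5:     Determine the target axis by Eq. (5).
6:     Attack the target axis by Eq. (7).
7:     Update the rotation matrix:
8:        $R^{t}$ from the updated angles $\phi^{t}$.
9:     Obtain the attacked point clouds: $P^{t} = R^{t}P$.
10:  end for
Output: rotation matrix $R^{T}$ and attacked point cloud $P^{T}$.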

3.4 Defense—Outer Minimization

On the defense side, we use Stochastic Gradient Descent (SGD) [2] to re-train the model on the adversarial samples. During experiments, we find that directly training on the attacked set, obtained by applying a single adversarial rotation to each sample of the original training set, can easily lead to model over-fitting. This behavior is known as label leaking [15] and stems from the fact that the gradient-based attack produces a very restricted set of adversarial examples that the network can overfit. The problem can be even worse on smaller training sets, in our case ModelNet40 [42]. To solve the over-fitting caused by label leaking, we propose to augment the training data with more kinds of adversarial rotations. A simple solution is to construct the training set with multiple attacks per sample. However, multiple attacks can be very time-consuming. To this end, we construct a rotation pool to increase the diversity of training data in a more efficient manner.

Figure 2: Transferability of adversarial rotations among samples in the same categories. The adversarial rotation found on one sample in “Bench” can be applied to other samples of the same category to induce high loss and mislead the model to classify them into a wrong category “Bookshelf”.

Rotation Pool. As shown in Fig. (2), we observe that the adversarial rotation found on one sample has a strong transferability on other samples of the same category. Based on this observation, instead of saving the rotated samples, we suggest saving the rotation angles produced on each sample by class to construct a rotation pool:

$$\mathcal{R}_c = \{R^{c}_{i}\}_{i=1}^{N_c}$$

where $R^{c}_{i}$ is the rotation found on sample $i$ of category $c$. We save the rotations corresponding to all $N_c$ samples in the category and traverse all categories to construct the final rotation pool $\mathcal{R} = \{\mathcal{R}_c\}$. During defense training, we only need to sample rotations from the rotation pool according to the category to transform the input into adversaries. Thanks to the transferability, the adversarial samples generated by the rotation pool can also induce high classification loss. Experiments in Sect. 4.5 also confirm that the rotation pool can effectively solve the over-fitting problem.
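A rotation pool of this kind is essentially a per-class store of angles that is filled during the attack and sampled during re-training. The sketch below is one possible data structure, assuming angles are stored as 3-tuples and sampled uniformly; the `RotationPool` class and its method names are hypothetical.

```python
import random
from collections import defaultdict


class RotationPool:
    """Per-class store of adversarial rotation angles."""

    def __init__(self):
        self._angles = defaultdict(list)

    def add(self, label, angles):
        """Record the angles found by attacking one sample of class `label`."""
        self._angles[label].append(tuple(angles))

    def sample(self, label, rng=random):
        """Pick a stored rotation of the same class, found on any sample of it."""
        return rng.choice(self._angles[label])

    def size(self, label):
        return len(self._angles[label])


pool = RotationPool()
pool.add("bench", (0.4, -1.1, 2.0))  # hypothetical attack results
pool.add("bench", (-0.7, 0.3, 1.5))
pool.add("chair", (1.2, 0.0, -0.9))

drawn = pool.sample("bench", rng=random.Random(0))
```

Because a training sample draws from all rotations stored for its class rather than only its own, each epoch can pair the same shape with a different adversarial rotation without running the attack again.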

Figure 3: Comparison of different optimizations. For the iterative optimization (a), the model with parameters $\theta^{0}$ will be repeatedly optimized on the min-max problem $K$ times until converging to a robust parameter $\theta^{K}$. In contrast, the proposed one-step optimization (b) constructs the rotation pool by attacking different models and requires only one step to obtain robust parameters of the targeted model.

Iterative Optimization. In order to solve the minimization problem, i.e. Eq. (3), in adversarial training to reach the final robust models, an iterative optimization scheme is usually adopted. Specifically, in the first iteration, we will attack the pre-trained classifier to initialize the rotation pool and then re-train the classifier on adversarial samples generated from the rotation pool towards a robust model. In the following iterations, we will attack the latest robust model to update the rotation pool iteratively:


$$R^{c,k}_{i} = \arg\max_{R\in SO(3)} L\big(R R_0 P_i,\ y_i,\ \theta^{k-1}\big)$$

where $\theta^{k-1}$ refers to the parameters of the robust model after $k-1$ iterations, $R_0$ is the rotation matrix of random start angles and $y_i$ is the class label corresponding to input sample $P_i$. $R^{c,k}_{i}$ refers to the rotation found on sample $i$ of category $c$ in the $k$-th iteration. We then re-train the classifier on the adversaries generated from the updated pool to reach a more robust model. The process is repeated until the model converges to the most robust state.

3.5 One-Step Optimization

The naive implementation above requires multiple iterations on both the attack and defense sides. Though obtaining robust models, the whole process is extremely time-consuming. Inspired by the ensemble adversarial training (EAT) [38], we further propose an efficient one-step optimization to reach the robust model with lower training cost.

Specifically, instead of iterating multiple times for obtaining more aggressive samples, EAT proposes to introduce the adversarial examples crafted on other stronger static pre-trained models. Intuitively, as adversarial samples transfer between models, perturbations crafted on the more robust model are good approximations for the maximization problem of the target model. We follow this principle to solve the minimization problem Eq. (3) in one step. Concretely, we not only attack the target classifier but attack more robust classifiers to construct a larger rotation pool:


$$R^{c}_{i,m} = \arg\max_{R\in SO(3)} L\big(R R_0 P_i,\ y_i,\ \theta_m\big), \quad m = 1, \dots, M$$

where $\theta_m$ refers to the parameters of model $m$ and $R^{c}_{i,m}$ is the adversarial rotation generated by attacking model $m$. By attacking $M$ models, the resulting rotation pool has $M$ times more aggressive rotations than the iterative optimization does. For defense, similar to the iterative optimization, we use the adversarial rotation sampled from the rotation pool to re-train the target model. Compared with the iterative manner, the one-step optimization achieves competitive results with faster training progress. Hence, we select the one-step optimization as the default implementation of our ART-Point framework. The comparison between the two optimization methods is shown in Fig. (3). Detailed implementations and comparison experiments will be provided in the supplementary.
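The way the pool grows with the number of attacked models can be sketched as below. The attacks are stubs that return fixed angles, standing in for real gradient-based attacks on different pre-trained classifiers; `build_one_step_pool` and `make_stub_attack` are hypothetical names.

```python
def build_one_step_pool(samples, attacks):
    """Attack every sample once per model and store the angles by class.

    samples: list of (points, label) pairs
    attacks: list of callables (points, label) -> (ax, ay, az), one per model
    """
    pool = {}
    for attack in attacks:
        for points, label in samples:
            pool.setdefault(label, []).append(attack(points, label))
    return pool


def make_stub_attack(offset):
    """Stand-in for an attack on one static model; returns fixed angles."""
    def attack(points, label):
        return (offset, 0.0, 0.0)
    return attack


samples = [
    ([(1.0, 0.0, 0.0)], "bench"),
    ([(0.0, 1.0, 0.0)], "bench"),
    ([(0.0, 0.0, 1.0)], "chair"),
]
attacks = [make_stub_attack(0.1 * m) for m in range(1, 4)]  # three "models"
pool = build_one_step_pool(samples, attacks)
```

With three attacked models, each class ends up with three times as many stored rotations as it has samples, and the pool is built in a single pass with no re-training between attacks.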

4 Experiments

4.1 Experiment Setup

Datasets. We evaluate our methods on two classification datasets ModelNet40 [42] and ShapeNet16 [46]. ModelNet40 contains 12,311 meshed CAD models from 40 categories. ShapeNet16 is a larger dataset which contains 16,881 shapes from 16 categories. For both datasets, we follow the official train and test split scheme and use the same data pre-processing as in [30, 31, 39] where each model is uniformly sampled with 1,024 points from the mesh faces and rescaled to fit into the unit sphere.

Models. We select three point cloud classifiers to evaluate our method: PointNet [30], a pioneering network that processes points individually; PointNet++ [31], a hierarchical feature extraction network; and DGCNN [39], a graph-based feature extraction network. These classifiers lack robustness to rotation. By evaluating on these classifiers, we show that ART-Point can be applied to various learning architectures to improve rotation robustness.

Evaluations. In order to comprehensively compare the rotation robustness of different models, we design three evaluation protocols: (1) Attack. The test set is adversarially rotated by the proposed attack algorithm for evaluating model defense. (2) Random. The test set is randomly rotated for evaluating model rotation robustness. (3) Clean. The test set is unchanged for evaluating the discriminative ability under aligned data. Moreover, we use the attack success rate to evaluate our attack algorithm. The attack success rate is calculated as the percentage of test samples that are correctly predicted before the attack but misclassified after it.
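One common reading of the attack success rate, counting only samples the model classified correctly before the attack, can be computed as follows. This is a sketch under that assumption; the function name and the toy predictions are hypothetical.

```python
def attack_success_rate(labels, preds_clean, preds_attacked):
    """Percentage of initially correct samples that the attack flips to a wrong class."""
    correct_before = [
        i for i, (y, p) in enumerate(zip(labels, preds_clean)) if p == y
    ]
    if not correct_before:
        return 0.0
    fooled = sum(1 for i in correct_before if preds_attacked[i] != labels[i])
    return 100.0 * fooled / len(correct_before)


# Toy predictions: four of five samples are correct before the attack,
# and two of those four are misclassified afterwards.
labels = [0, 1, 2, 1, 0]
preds_clean = [0, 1, 2, 1, 1]
preds_attacked = [1, 1, 0, 1, 1]
rate = attack_success_rate(labels, preds_clean, preds_attacked)
```

Samples that were already wrong before the attack are excluded from the denominator, so the rate measures the effect of the rotation alone.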

| Method | ModelNet40 Attack | ModelNet40 Random | ModelNet40 Clean |
| --- | --- | --- | --- |
| PointNet [30] (RA) | 55.6 | 74.4 | 76.7 |
| PointNet++ [31] (RA) | 58.9 | 80.1 | 82.3 |
| DGCNN [39] (RA) | 65.6 | 85.7 | 87.6 |
| ART-PointNet (Ours) | 85.6 (30.0) | 84.3 (9.9) | 85.5 (8.8) |
| ART-PointNet++ (Ours) | 90.1 (31.2) | 87.5 (7.4) | 88.6 (6.3) |
| ART-DGCNN (Ours) | 91.5 (25.9) | 90.5 (4.8) | 91.3 (3.7) |

| Method | ShapeNet16 Attack | ShapeNet16 Random | ShapeNet16 Clean |
| --- | --- | --- | --- |
| PointNet [30] (RA) | 66.4 | 87.3 | 89.5 |
| PointNet++ [31] (RA) | 70.5 | 89.7 | 92.1 |
| DGCNN [39] (RA) | 74.4 | 90.5 | 94.3 |
| ART-PointNet (Ours) | 96.9 (30.5) | 95.1 (7.8) | 96.2 (6.7) |
| ART-PointNet++ (Ours) | 97.8 (27.3) | 96.3 (6.6) | 97.5 (5.4) |
| ART-DGCNN (Ours) | 98.4 (24.0) | 97.7 (7.2) | 98.1 (3.8) |

Table 1: Comparing three evaluation protocols under ModelNet40 [42] and ShapeNet16 [46] for classifiers trained via rotation augmentation (RA) and adversarial rotation (ART). Values are classification accuracy (%); numbers in parentheses are the gains over the corresponding RA-trained classifier.

4.2 Comparison with Rotation Augmentation

We first compare the effectiveness of the proposed ART-Point with rotation augmentation (RA) for improving model rotation robustness. Classifiers using rotation augmentation are trained with randomly rotated inputs. Tab. (1) shows the comparison results under ModelNet40 [42] and ShapeNet16 [46]. Several observations can be made from the table. Firstly, compared with rotation augmentation, the proposed ART-Point results in models that perform better under all protocols. Such performance improvements can be consistently observed on all three classifiers under both datasets. Secondly, under the attacked test set, the classification accuracy of models trained using ART-Point is significantly higher than that of models trained with RA (maximum increase: 31.2%). This is mainly because rotation augmentation can hardly defend against adversarial rotations found using model gradient information. In contrast, our method shows a stronger defense against adversarial rotations. We will further test the defense ability of our method under different rotation attacks in Sect. 4.4. Both observations suggest that the proposed ART-Point is a more effective method to improve the rotation robustness of point cloud classifiers than rotation augmentation.

4.3 Comparison with Rotation Robust Classifiers

We further compare robust models trained by ART-Point with existing rotation robust classifiers, including [32, 48, 4, 17], which convert point clouds into rotation-invariant descriptors, and [37, 35, 7, 5], which design rotation-equivariant architectures, to further illustrate the appealing properties of our method. The rotation robust classifiers are trained on randomly rotated inputs. The comparison results based on all protocols under ModelNet40 [42] are shown in Tab. (2).

| Method | ModelNet40 Attack | ModelNet40 Random | ModelNet40 Clean |
| --- | --- | --- | --- |
| Classifiers Using Invariant Descriptors | | | |
| SFCNN [32] | 90.1 | 90.1 | 90.1 |
| RI-Conv [48] | 86.5 | 86.4 | 86.5 |
| ClusterNet [4] | 87.1 | 87.1 | 87.1 |
| RI-Framework [17] | 89.4 | 89.3 | 89.4 |
| Classifiers with Equivariant Architectures | | | |
| TFN [37] | 87.6 | 87.6 | 87.6 |
| REQNN [35] | 74.4 | 74.1 | 74.4 |
| VN-PointNet [7] | 77.2 | 77.2 | 77.2 |
| VN-DGCNN [7] | 90.2 | 90.2 | 90.2 |
| EPN [5] | 88.3 | 88.3 | 88.3 |
| Ours | | | |
| ART-PointNet | 85.6 | 84.3 | 85.5 |
| ART-PointNet++ | 90.1 | 87.5 | 88.6 |
| ART-DGCNN | 91.5 | 90.5 | 91.3 |

Table 2: Comparing three evaluation protocols under ModelNet40 [42] for various rotation robust classifiers.

Firstly, our best model ART-DGCNN outperforms all equivariant and invariant methods under all three evaluation protocols, which indicates its stronger robustness to rotations. Secondly, both equivariant and invariant methods perform nearly identically under all protocols, which is undesirable, since the clean test set should be easier for the model to classify. This is mainly because these methods obtain rotation robustness by separating the pose information from point clouds via modifications on input space or model architectures. In contrast, ART-Point uses the original classifiers for training on adversarial samples in 3D space, so the resulting model not only better inherits the performance of the original classifiers on clean sets but also shows a strong defense on the attacked test set.

4.4 Attack and Defense

Beyond rotation robustness, our method provides a complete set of tools for attack and defense on point cloud classifiers. To verify the proposed attack algorithm, we compare the attack success rate of our method with other rotation attacks proposed in [51]. Meanwhile, we also show the defense ability of classifiers trained with ART-Point. The results are illustrated in Tab. (3).

| Models | TSI [51] | CTRI [51] | Ours |
|---|---|---|---|
| PointNet [30] | 96.92 | 99.44 | 99.54 |
| PointNet++ [31] | 91.31 | 97.93 | 98.96 |
| DGCNN [39] | 89.81 | 97.99 | 98.51 |
| ART-PointNet (Ours) | 9.71 | 11.13 | 12.78 |
| ART-PointNet++ (Ours) | 4.31 | 6.60 | 7.92 |
| ART-DGCNN (Ours) | 3.14 | 5.33 | 6.62 |

Table 3: Comparing attack success rate (%) of several rotation attack algorithms (columns) on different classifiers (rows) under ModelNet40 [42].

In the first three rows, we report the attack success rate of different attack algorithms on classifiers trained using clean samples. As can be seen, compared with the other two rotation attacks, our attack achieves the highest success rate on all three classifiers. In the last three rows, we further report the attack success rate on classifiers trained using ART-Point. As can be seen, ART-Point improves model defense against rotation attacks.
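As a rough illustration of how a success rate like those in Tab. (3) could be computed, the sketch below assumes a common definition: the percentage of samples that the classifier labels correctly before the attack and incorrectly after it. This definition, the function name `attack_success_rate`, and the toy predictions are assumptions made for illustration and are not taken from the paper.

```python
def attack_success_rate(labels, clean_preds, attacked_preds):
    """Percentage of correctly classified samples that the attack flips to a wrong label.

    Assumed definition (not confirmed by the paper): samples that are already
    misclassified before the attack are excluded from the denominator.
    """
    eligible = [i for i, (y, p) in enumerate(zip(labels, clean_preds)) if y == p]
    if not eligible:
        return 0.0
    flipped = sum(1 for i in eligible if attacked_preds[i] != labels[i])
    return 100.0 * flipped / len(eligible)


# Toy data: five samples, three classes.
labels         = [0, 1, 2, 2, 1]
clean_preds    = [0, 1, 2, 0, 1]   # sample 3 is already wrong before the attack
attacked_preds = [1, 1, 0, 0, 2]   # samples 0, 2 and 4 are flipped by the attack

rate = attack_success_rate(labels, clean_preds, attacked_preds)
print(f"{rate:.1f}%")
```

Under this definition, a robust classifier corresponds to a low value, which matches the direction of the last three rows of the table.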

4.5 Ablation Study

Finally, we conduct ablation studies to prove the effectiveness of our designs in ART-Point. All ablation experiments are conducted on the PointNet [30] classifier and evaluated under randomly rotated test sets. (Footnote 1: More ablation studies on descent step, rotation angle, and attack step size can be found in the supplementary material.)

Different Attacks. We use adversarial samples generated by different rotation attacks for adversarial training and investigate the impact on the robustness of the resulting models. We adopt several attacks that generate adversarial samples inducing different loss values, including the random rotation attack, the attacks in [51], and our attack with different numbers of steps. In the left column of Tab. (4), we report the average classification loss of samples produced by each attack and the results of adversarial training on the corresponding samples. Compared with the other attacks, the proposed axis-wise rotation attack with 10-step gradient descent induces the highest loss value.

| Methods | Loss | Acc. | Methods | Loss | Acc. |
|---|---|---|---|---|---|
| Random | 5.13 | 74.4 | w/o RP | 12.72 | 55.8 |
| TSI [51] | 7.35 | 79.5 | RP(pn1) | 10.19 | 82.9 |
| CTRI [51] | 8.87 | 82.1 | RP(pn1,pn2) | 12.01 | 82.6 |
| Ours (step=1) | 7.65 | 81.5 | RP(pn1,dg) | 12.55 | 83.1 |
| Ours (step=5) | 9.57 | 82.8 | RP(pn2,dg) | 13.03 | 84.0 |
| Ours (step=10) | 13.49 | 84.3 | RP(pn1,pn2,dg) | 13.49 | 84.3 |

Table 4: The average loss of adversarial samples generated by different methods and accuracy of corresponding adversarial training. RP(pn1) refers to the rotation pool generated by attacking PointNet [30]. pn2 and dg refer to PointNet++ [31] and DGCNN [39].

Rotation Pool. We verify the necessity of constructing the rotation pool by comparing the results of adversarial training with and without rotation pools. Moreover, we investigate the impact of constructing rotation pools from different models. As shown in the right column of Tab. (4), although adversarial training without a rotation pool generates samples inducing high loss values, the final result is worse than training with a rotation pool, due to the over-fitting caused by label leaking [15].
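To make the idea of a rotation pool more concrete, here is a minimal sketch. It assumes the pool is simply the union of adversarial angle triples found by attacking several source classifiers, and that each training sample draws one rotation uniformly at random from it. The function names, the uniform sampling rule, and the angle values are all hypothetical; the paper's actual construction may differ.

```python
import random


def build_rotation_pool(adv_angles_per_model):
    """Merge adversarial (theta_x, theta_y, theta_z) triples from several classifiers."""
    pool = []
    for angle_list in adv_angles_per_model.values():
        pool.extend(angle_list)
    return pool


def sample_training_rotations(pool, num_samples, rng):
    """Draw one rotation per training sample, uniformly from the pool (assumed rule)."""
    return [rng.choice(pool) for _ in range(num_samples)]


# Hypothetical adversarial angles (radians) found by attacking three classifiers,
# keyed with the abbreviations used in Tab. (4).
adv_angles = {
    "pn1": [(0.3, 1.2, -0.7), (2.1, 0.0, 0.4)],
    "pn2": [(-1.0, 0.5, 0.9)],
    "dg": [(0.8, -2.2, 1.5), (1.1, 0.2, -0.3)],
}

pool = build_rotation_pool(adv_angles)
rng = random.Random(0)  # fixed seed so the sketch is reproducible
batch_rotations = sample_training_rotations(pool, num_samples=4, rng=rng)
print(len(pool), batch_rotations)
```

Because a sample is no longer tied to the rotation optimized for that particular sample, the training inputs become more diverse, which is the motivation given above for avoiding over-fitting.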

Axis-Wise Attack. We compare our proposed axis-wise rotation attack with the standard attack algorithm, which simultaneously optimizes all three angles in each gradient descent step. We mainly follow [22] and report the average loss value of attacked samples at each step. We restart the attack 20 times with random angle initialization. The comparison results are shown in Fig. (4). As can be seen, the axis-wise mechanism enables the attack algorithm to find more aggressive rotated samples.
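The difference between the two update rules can be sketched as follows. This sketch assumes that "axis-wise" means updating, in each step, only the angle whose gradient has the largest magnitude, while the standard attack updates all three angles together; that selection rule is an assumption, as are the toy loss `toy_loss_grad` and the step size `alpha`. Both updates move in the direction that increases the loss.

```python
import math


def toy_loss_grad(theta):
    """Hypothetical smooth loss over (theta_x, theta_y, theta_z) and its gradient."""
    tx, ty, tz = theta
    loss = math.sin(tx) + 0.5 * math.cos(ty) + 0.1 * tz
    grad = (math.cos(tx), -0.5 * math.sin(ty), 0.1)
    return loss, grad


def standard_step(theta, grad, alpha):
    """Standard attack: update all three angles at once along the gradient."""
    return tuple(t + alpha * g for t, g in zip(theta, grad))


def axis_wise_step(theta, grad, alpha):
    """Axis-wise attack (assumed rule): update only the angle with the largest |gradient|."""
    k = max(range(3), key=lambda i: abs(grad[i]))
    return tuple(t + alpha * g if i == k else t
                 for i, (t, g) in enumerate(zip(theta, grad)))


theta0 = (0.2, 0.3, 0.0)
loss0, g0 = toy_loss_grad(theta0)

theta_std = standard_step(theta0, g0, alpha=0.1)
theta_axis = axis_wise_step(theta0, g0, alpha=0.1)

# Indices of the angles that the axis-wise step actually modified.
changed_axis = [i for i in range(3) if theta_axis[i] != theta0[i]]
print(changed_axis, theta_std, theta_axis)
```

The toy loss here is separable in the three angles, so it only illustrates the mechanics of the two updates and cannot reproduce the loss gap reported in Fig. (4).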

Figure 4: Averaged loss values of attacked samples produced by standard attack and axis-wise attack under different attack steps.

4.6 Discussions of Limitations and Society Impact

Since our method is mainly based on adversarial training, one limitation is that we first need a fully trained model with accessible parameters. Meanwhile, since our method involves a rotation attack algorithm, it may be exploited to attack point cloud based 3D object detection systems, which is a potential negative societal impact.

5 Conclusion

In this paper, we propose ART-Point to improve the rotation robustness of point cloud classifiers via adversarial training. ART-Point consists of an axis-wise rotation attack and a defense method with the rotation pool mechanism. It can be adopted on most existing classifiers with fast one-step optimization to obtain rotation robust models. Experiments show that the novel rotation attack achieves a high attack success rate on most point cloud classifiers. Moreover, our best model ART-DGCNN shows great robustness to arbitrary and adversarial rotations and outperforms existing state-of-the-art rotation robust classifiers.


Appendix A Overview

This document provides technical details, additional quantitative results, and more qualitative test examples that supplement the main paper. In Sect. B we derive the gradients back-propagated to the three rotation angles and illustrate the construction of rotation matrices. In Sect. C we give more implementation details on our network architectures and training parameters. Sect. D then presents comparison experiments between different optimization strategies, while Sect. E provides further analysis experiments on our attack algorithm. Finally, we show some visualization results in Sect. F.

Appendix B Gradient Derivation and Rotation Matrix Construction (Sect. 3.3)

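Gradient Derivation. As illustrated in the main paper, rotating the points around the $z$ axis by $\Delta\theta_z$ will increase the loss by $\frac{\partial \mathcal{L}}{\partial \theta_z}\Delta\theta_z$ to first order, where $\frac{\partial \mathcal{L}}{\partial \theta_z}$ can be calculated by the chain rule as:

$$\frac{\partial \mathcal{L}}{\partial \theta_z} = \sum_i \left( \frac{\partial \mathcal{L}}{\partial x_i}\frac{\partial x_i}{\partial \theta_z} + \frac{\partial \mathcal{L}}{\partial y_i}\frac{\partial y_i}{\partial \theta_z} + \frac{\partial \mathcal{L}}{\partial z_i}\frac{\partial z_i}{\partial \theta_z} \right) \tag{11}$$

Here, $\theta_z$ refers to the rotation angle around the $z$ axis, which is the same as the azimuthal angle in the following spherical coordinate system $(r, \varphi, \theta_z)$:

$$x = r\sin\varphi\cos\theta_z, \qquad y = r\sin\varphi\sin\theta_z, \qquad z = r\cos\varphi \tag{12}$$

Then, based on Eq. (12), we have $\frac{\partial x}{\partial \theta_z} = -y$, $\frac{\partial y}{\partial \theta_z} = x$ and $\frac{\partial z}{\partial \theta_z} = 0$, so we can write Eq. (11) as follows:

$$\frac{\partial \mathcal{L}}{\partial \theta_z} = \sum_i \left( x_i\frac{\partial \mathcal{L}}{\partial y_i} - y_i\frac{\partial \mathcal{L}}{\partial x_i} \right) \tag{13}$$

Similarly, for the remaining rotation axes $x$ and $y$, we can calculate the gradients simply by rolling the coordinate system in Eq. (13) as follows:

$$\frac{\partial \mathcal{L}}{\partial \theta_x} = \sum_i \left( y_i\frac{\partial \mathcal{L}}{\partial z_i} - z_i\frac{\partial \mathcal{L}}{\partial y_i} \right), \qquad \frac{\partial \mathcal{L}}{\partial \theta_y} = \sum_i \left( z_i\frac{\partial \mathcal{L}}{\partial x_i} - x_i\frac{\partial \mathcal{L}}{\partial z_i} \right)$$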
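The closed form for the gradient with respect to the rotation angle around the z axis, namely the sum over points of x_i * dL/dy_i - y_i * dL/dx_i, can be checked numerically. The sketch below does this against a central finite difference. The loss `toy_loss` is a hypothetical smooth function standing in for the classifier loss, and all names are illustrative rather than taken from the authors' code.

```python
import math


def rotate_z(points, theta):
    """Rotate every point counter-clockwise around the z axis by theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def toy_loss(points):
    """Hypothetical smooth loss standing in for the classifier loss."""
    return sum(x * x + 2.0 * x * y + 0.5 * z + math.sin(y) for x, y, z in points)


def toy_point_grads(points):
    """Per-point gradients (dL/dx_i, dL/dy_i, dL/dz_i) of toy_loss."""
    return [(2.0 * x + 2.0 * y, 2.0 * x + math.cos(y), 0.5) for x, y, z in points]


def grad_theta_z(points, point_grads):
    """Closed form: dL/dtheta_z = sum_i (x_i * dL/dy_i - y_i * dL/dx_i)."""
    return sum(x * gy - y * gx
               for (x, y, _), (gx, gy, _) in zip(points, point_grads))


points = [(0.5, -0.2, 0.1), (-0.3, 0.8, 0.4), (0.9, 0.1, -0.6)]
analytic = grad_theta_z(points, toy_point_grads(points))

# Central finite difference of the loss with respect to the rotation angle at 0.
eps = 1e-6
numeric = (toy_loss(rotate_z(points, eps)) - toy_loss(rotate_z(points, -eps))) / (2 * eps)
print(analytic, numeric)
```

In a PyTorch implementation the per-point gradients would come from back-propagation through the classifier rather than from a hand-written function, but the reduction to a single angle gradient is the same.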


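Rotation Matrix Construction. Given the optimized rotation angles $\theta_x$, $\theta_y$ and $\theta_z$, we construct the corresponding rotation matrices as follows:

$$R_x = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x \\ 0 & \sin\theta_x & \cos\theta_x \end{bmatrix}, \quad R_y = \begin{bmatrix} \cos\theta_y & 0 & \sin\theta_y \\ 0 & 1 & 0 \\ -\sin\theta_y & 0 & \cos\theta_y \end{bmatrix}, \quad R_z = \begin{bmatrix} \cos\theta_z & -\sin\theta_z & 0 \\ \sin\theta_z & \cos\theta_z & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Based on the above equations, we compute the final rotation matrix $R$ as the matrix product of the three axis rotation matrices.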
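A small sketch of this construction is given below, using the standard right-handed elementary rotation matrices. The composition order (x first, then y, then z) is an assumption made for the example, since the order is not recoverable from the text here, and the helper names are illustrative.

```python
import math


def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """3x3 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(rot, point):
    """Apply a 3x3 rotation matrix to a single (x, y, z) point."""
    return tuple(sum(rot[i][k] * point[k] for k in range(3)) for i in range(3))


# Assumed composition order: rotate about x first, then y, then z.
theta_x, theta_y, theta_z = 0.3, -1.1, 2.0
R = matmul(rot_z(theta_z), matmul(rot_y(theta_y), rot_x(theta_x)))

p = (0.5, -0.2, 0.1)
q = apply(R, p)

# A rotation must preserve the distance of each point from the origin.
norm_p = math.sqrt(sum(v * v for v in p))
norm_q = math.sqrt(sum(v * v for v in q))
print(q, norm_p, norm_q)
```

Whatever the order, the product of the three matrices is orthogonal, so the rotated point cloud keeps its shape and only its pose changes.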

Appendix C Implementation Details

We implement ART-Point using PyTorch [27]. In detail, during attack, we set the step size of angle gradient descent , a batch size and adopt ten steps descent to obtain the adversarial rotation. During defense, we mainly use SGD to train existing point cloud classifiers following the same optimizer and learning rate schedules as used in their papers. We experiment with two optimization methods: iterative optimization and one-step optimization.

For the iterative optimization, we alternate the min-max process until the model converges. Specifically, to train a robust PointNet, in each iteration we use 10 epochs gradient descent on angles for maximization to find the most aggressive rotation angles and 50 epochs for minimization to train on adversarial datasets. We perform 10 iterations in total to obtain the final robust model.

For the one-step optimization, we construct the rotation pool by attacking multiple classifiers and reach the robust model in a single min-max iteration. Concretely, suppose that our target model is the PointNet classifier [30]. We attack not only PointNet but also more robust classifiers such as PointNet++ [31] and DGCNN [39] to construct the rotation pool. We use 10 epochs of gradient descent for maximization to find adversarial samples and 200 epochs for minimization to train on adversarial samples.

Figure 5: Adversarial training results of three classifiers under ModelNet40 [42] with different optimizations.

Appendix D Comparison of Different Optimizations

We compare the training progress of the naive iterative optimization with the proposed one-step optimization. The experiments are conducted under ModelNet40 [42] and resulting classifiers are tested under randomly rotated datasets for evaluating the rotation robustness. We record the performance of three classifiers in each iteration and compare the final results with classifiers trained via the one-step method.

Specifically, we follow the detailed implementations for both optimizations in Sect. C to reach robust models. It can be seen from Fig. (5) that the iterative optimization usually takes 8-10 iterations to reach the most robust model. In contrast, the one-step method obtains a robust model with competitive performance in one iteration. Note that, for all classifiers trained with one-step optimization, the rotation pools are constructed by attacking three models, i.e., PointNet [30], PointNet++ [31] and DGCNN [39].

Appendix E More Ablation Studies

Here, we provide more control experiments to verify our rotation attack algorithm. We mainly conduct studies based on ModelNet40 [42] with PointNet classifiers [30].

Attack Step Size. We further illustrate experiments to select the appropriate step size in angle attacks. The results are shown in Tab. (6), where we record the average loss value of attacked samples under different step size . Our attack algorithm finds the most aggressive attacked samples that induce the highest loss with .

Descent Steps and Rotation Angles. Finally, we verify the effect of different hyper-parameters on adversarial training. We adopt different descent steps during attacking and we also study the performance of our method under limited rotation ranges. The final results are shown in Tab. (5).

Steps 83.9 84.3 84.3 84.2
Angles 87.2 86.4 85.5 84.3
Table 5: Adversarial training results under different settings.

The adversarial training results tend to saturate when the number of gradient descent steps is larger than 10, so we use 10 descent steps in the attack algorithm by default. Our method obtains better results under smaller rotation ranges, which demonstrates that by restricting the range of rotation angles, ART-Point can further increase model robustness.

5.3 7.4 9.5 8.9 11.3
13.5 12.4 11.7 10.2 9.5
Table 6: Averaged loss values of attacked samples produced by attacks with different step sizes.
Figure 6: In every two rows, we compare the classification loss of DGCNN [39] (top row) and ART-DGCNN (bottom row) on the same arbitrarily rotated point clouds, which are randomly sampled from test sets of ModelNet40 [42]. From top to bottom, the categories of point clouds are “table”, “desk” and “car”.
Figure 7: In every two rows, we compare the classification loss of DGCNN [39] (top row) and ART-DGCNN (bottom row) on the same arbitrarily rotated point clouds, which are randomly sampled from test sets of ShapeNet16 [46]. From top to bottom, the categories of point clouds are “bag”, “cap” and “mug”.

Appendix F Visualization

Finally, we compare the classification loss of different models under the randomly rotated test sets of ModelNet40 [42] (Fig. 6) and ShapeNet16 [46] (Fig. 7). We show the loss value for each rotated sample and compare it between the original DGCNN [39] and our best model, ART-DGCNN. As can be seen, our method generally shows lower classification loss under both randomly rotated datasets.