The unprecedented development of 3D acquisition technologies has made 3D sensors such as LiDARs and RGB-D cameras widely affordable; these sensors directly output data termed point clouds. In theory, this data can provide rich geometric, shape, and scale information. In practice, however, capturing the same object from different locations or angles produces different data. Consequently, inducing equivariance under transformations such as permutation, translation, and rotation in deep neural network architectures is crucial to improving generalization when dealing with 3D point clouds.
Incorporating equivariant properties into deep neural network architectures has long been a powerful idea. For example, Convolutional Neural Networks (CNNs) can effectively extract local information from 2D images thanks to their translation equivariance. Accordingly, a general theory of group equivariant neural networks and their universality has recently been developed. Some research effort has been devoted to 2D and 3D rotation equivariance for grid or voxel images, with positive results. However, little attention has been paid to group equivariant neural networks for 3D point clouds.
This work seeks to improve the generalization and data efficiency of neural networks for 3D point clouds through group equivariance, since processing raw point clouds directly eliminates redundant computation and memory. In particular, we propose a general procedure, called G-PointX, to introduce group equivariance into an existing SOTA backbone that is not yet equivariant. The main idea of the procedure is inherited from the symmetrization technique known as the Reynolds operator (see sturmfels2008algorithms). For practical applications, we apply this procedure to two typical backbones, PointNet++ and PointConv, to obtain efficient group equivariant models for 3D point clouds that we term G-PointNet++ and G-PointConv, respectively.
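To make the Reynolds-operator idea concrete, here is a minimal numerical sketch in plain Python (the function `f` and all names are toy choices of our own): symmetrizing an arbitrary function by averaging it over a finite group yields a function that is invariant under that group, which is the basic mechanism the procedure builds on.

```python
from itertools import permutations

def reynolds(f, n):
    """Reynolds operator for the permutation group S_n:
    average f over all reorderings of its n arguments."""
    perms = list(permutations(range(n)))
    def f_sym(xs):
        return sum(f([xs[i] for i in p]) for p in perms) / len(perms)
    return f_sym

# A toy, non-symmetric function of three inputs.
f = lambda xs: xs[0] * 2 + xs[1]
f_sym = reynolds(f, 3)

# The symmetrized function is invariant under permutations of its inputs.
assert abs(f_sym([1.0, 2.0, 3.0]) - f_sym([3.0, 1.0, 2.0])) < 1e-12
```

Averaging over all of S_n costs a factor of n!, which is why practical constructions restrict to small finite groups and reuse permutation-invariant pooling instead.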
We demonstrate the effectiveness of G-PointX through extensive empirical experiments. On the classification benchmark, G-PointX outperforms state-of-the-art deep neural networks for 3D point clouds when dealing with rotated data. On the semantic segmentation benchmark, G-PointX outperforms the original models by a significant margin. In short, the main contributions of this paper are:
A detailed formulation of a novel group equivariant CNN and MLP, termed G-PointConv and G-PointNet++.
Comprehensive and extensive experiments demonstrating the effectiveness of our proposed method.
2 Related works
Deep learning with raw 3D point clouds: Recently, several approaches for extracting features from 3D point clouds have been presented in the literature. As a pioneer in raw point cloud processing, PointNet qi2017pointnet extracted features by simply employing a combination of MLPs and max-pooling. PointNet++ qi2017pointnet++ afterwards introduced a hierarchical structure that efficiently aggregates information from local areas. Notably, PointNet++ uses the PointNet operation to extract features, which can be considered an MLP operation for point clouds. Later papers inspired by PointNet++ often keep the hierarchical structure unchanged and only modify the operation. Typically, PointConv wu2019pointconv replaced MLPs with convolution layers, and Point Transformer zhao2021point used self-attention to collect information in local regions. As they rely only on relative positions and colors, PointNet++ qi2017pointnet++, PointConv wu2019pointconv, and Point Transformer zhao2021point are equivariant to translation and permutation.
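The PointNet-style readout described above can be sketched in a few lines: a shared per-point map followed by a channel-wise max-pool, which is invariant to any reordering of the points. The `mlp` below is a toy stand-in of our own, not the actual learned network.

```python
def pointnet_global_feature(points, mlp):
    """PointNet-style readout: shared per-point map, then channel-wise max-pool."""
    mapped = [mlp(p) for p in points]
    return tuple(max(col) for col in zip(*mapped))

# Toy stand-in for a shared per-point MLP producing a 2-channel feature.
mlp = lambda p: (p[0] + p[1], p[2] * p[2])
cloud = [(1.0, 0.0, 2.0), (0.0, 3.0, -1.0), (2.0, 2.0, 0.5)]

# Max-pooling is symmetric, so the feature is unchanged under reordering.
assert pointnet_global_feature(cloud, mlp) == pointnet_global_feature(cloud[::-1], mlp)
```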
Group equivariant neural networks: Several methods have tried to embed group equivariance into network architectures. For example, the patch-wise rotation equivariance of Harmonic Networks worrall2017harmonic was obtained by using circular harmonics. Moreover, cohen2016steerable and jacobsen2017dynamic proposed blocks named Steerable CNNs, which can be equivariant to groups of 3D rotations. This type of block was then applied to capsule networks sabour2017dynamic and to N-body networks kondor2018n. The notions of equivariance and convolution in neural networks were also generalized to actions of compact groups in kondor2018generalization. Apart from that, further abstract settings and universality results on group equivariance are described in kumagai2020universal; maron2019universality; petersen2020equivalence; ravanbakhsh2020universal; yarotsky2018universal.
Group equivariant neural networks for 3D point clouds: Despite the rapid development of group equivariant neural networks, the number of papers designing group equivariant neural networks for point clouds remains modest. Rotation equivariance nevertheless seems indispensable for architectures in this field, since point clouds depend heavily on the location and orientation of the LiDAR sensors. In previous works, that property was achieved through several approaches. For instance, while thomas2018tensor used filters built from spherical harmonics, chen2019clusternet; li2020rotation; zhang2019rotation employed rigorously rotation-invariant representations of point clouds in terms of angles and distances. Furthermore, one can also apply multi-level abstraction involving graph convolutional neural networks Kim2020advances or quaternion-based neural networks shen20193d; zhang2020quaternion; zhao2020quaternion to acquire rotation equivariance.
Our work stands out from other approaches: The method we use in G-PointX differs from previous group equivariant models. Rather than designing a new representation or architecture, we provide a plug-in for existing models, which allows us to inherit the advantages of the original model. In our method, we achieve group equivariance by applying standard MLPs and CNNs and re-arranging different group conjugations of these operations in a suitable way.
3 Groups and group actions
Let $G$ be a group with identity $e$ and $X$ a nonempty set. An action of $G$ on $X$ is a map $G \times X \to X$, $(g, x) \mapsto g \cdot x$, satisfying the properties $e \cdot x = x$ and $(gh) \cdot x = g \cdot (h \cdot x)$ for all $g, h \in G$ and $x \in X$.
If $G$ acts on $X$, then $G$ acts on every object built on $X$. For example, if $G$ acts on $\mathbb{R}^3$, then $G$ acts naturally on functions defined on $\mathbb{R}^3$, on the space of point clouds in $\mathbb{R}^3$, and on functions defined on point clouds. The following group actions are important in our consideration:
The permutation group $S_n$ of $\{1, \dots, n\}$, which acts on a point cloud containing $n$ points in $\mathbb{R}^3$ by permuting the arrangement of the points.
A subgroup $G$ of the group $SO(3)$ of 3D rotations, which acts on $\mathbb{R}^3$ by matrix-vector multiplication.
The semidirect product $\mathbb{R}^3 \rtimes G$ of the translation group $\mathbb{R}^3$ and a subgroup $G$ of $SO(3)$, which acts on $\mathbb{R}^3$ by $(t, R) \cdot x = Rx + t$ for $t \in \mathbb{R}^3$ and $R \in G$.
It is noted that the group multiplication in $\mathbb{R}^3 \rtimes G$ is determined by $(t_1, R_1)(t_2, R_2) = (t_1 + R_1 t_2, R_1 R_2)$. Thus the identity element is $(0, \mathrm{id})$ and $(t, R)^{-1} = (-R^{-1} t, R^{-1})$.
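The semidirect-product law can be checked numerically. This plain-Python sketch (all helper names are ours) verifies that composing two elements with $(t_1, R_1)(t_2, R_2) = (t_1 + R_1 t_2, R_1 R_2)$ and then acting on a point agrees with acting twice.

```python
import math

def rot_z(theta):
    """3x3 rotation matrix about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matvec(R, x):
    return [sum(R[i][j] * x[j] for j in range(3)) for i in range(3)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def act(g, x):
    """Action of g = (t, R) on a point x: x -> R x + t."""
    t, R = g
    Rx = matvec(R, x)
    return [Rx[i] + t[i] for i in range(3)]

def compose(g1, g2):
    """Semidirect-product law: (t1, R1)(t2, R2) = (t1 + R1 t2, R1 R2)."""
    t1, R1 = g1
    t2, R2 = g2
    R1t2 = matvec(R1, t2)
    return ([t1[i] + R1t2[i] for i in range(3)], matmul(R1, R2))

# Check (g1 g2) . x == g1 . (g2 . x) on a sample point.
g1 = ([1.0, 0.0, 2.0], rot_z(math.pi / 2))
g2 = ([0.0, -1.0, 3.0], rot_z(math.pi / 3))
x = [0.5, -0.25, 1.0]
lhs = act(compose(g1, g2), x)
rhs = act(g1, act(g2, x))
assert all(abs(a - b) < 1e-12 for a, b in zip(lhs, rhs))
```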
Now we consider a 3D point cloud as a function from a finite set of points in $\mathbb{R}^3$ to $\mathbb{R}^d$, where $d$ is the dimension of the feature vectors. Sometimes, we extend the domain of this function to the whole of $\mathbb{R}^3$ and view each point cloud as a continuous function (with compact support) from $\mathbb{R}^3$ to $\mathbb{R}^d$, in order to bring strong existing mathematical techniques to bear on developing suitable FFNNs for point clouds. A point cloud will then be identified with a function on the finite point set or on $\mathbb{R}^3$.
An FFNN is defined to be a sequence of transformations, where each transformation is a linear or nonlinear map from the current layer to the next layer, followed by a nonlinear point-wise activation function. MLPs and CNNs, the most widely used FFNNs in practice, are the two types of operators considered in this work.
Assume that $G$ acts on $\mathbb{R}^3$. Then $G$ also acts on functions on point clouds as follows: for each $g \in G$ and each function $f$, the action is defined by $(g \cdot f)(x) = f(g^{-1} \cdot x)$.
We can describe this action intuitively as follows: when we rotate the input by the orientation $g$, the feature vector of the new input at the coordinate $x$ is exactly the feature vector of the old input at the coordinate $g^{-1} \cdot x$.
Let $T$ be a linear (or nonlinear) transformation. We say that $T$ is $G$-equivariant if and only if $T(g \cdot f) = g \cdot T(f)$ for every $g \in G$ and every input $f$. An FFNN is called $G$-equivariant if all of its transformations are $G$-equivariant.
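As a minimal numerical check of this definition, consider the per-point distance-to-origin feature: rotations act trivially on its output, so $T(g \cdot f) = g \cdot T(f)$ reduces to rotation invariance. All names below are illustrative.

```python
import math

def rot_z(theta):
    """3x3 rotation matrix about the z-axis (an element of SO(3))."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply(R, p):
    return tuple(sum(R[i][j] * p[j] for j in range(3)) for i in range(3))

def T(cloud):
    """Per-point distance-to-origin feature; SO(3) acts trivially on this output."""
    return [math.hypot(*p) for p in cloud]

cloud = [(1.0, 2.0, 3.0), (-0.5, 0.25, 4.0)]
g = rot_z(0.7)
lhs = T([apply(g, p) for p in cloud])   # T(g . f)
rhs = T(cloud)                          # g . T(f), with g acting trivially on features
assert all(abs(a - b) < 1e-9 for a, b in zip(lhs, rhs))
```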
However, not every linear transformation is $G$-equivariant. Therefore, in the next section we give a natural approach to refine a given FFNN and produce an equivariant one. The notable PointNet++ qi2017pointnet++ and PointConv wu2019pointconv layers with rotation groups will be considered to illustrate the theory and to test the effectiveness of the proposed approach.
4 A general scheme for constructing G-equivariant NNs for 3D point clouds
4.1 Local feature extractions
Given a point cloud on $\mathbb{R}^3$, most of the state-of-the-art models for machine learning tasks on 3D point clouds are built on a local feature extraction. In general, a local feature extraction extracts important information from a group of points around a given point, and it can be formulated as $h(c) = \Gamma(\{ f(x) : x \in N(c) \})$, where $c$ is the centroid, $N(c)$ is the local neighborhood of $c$, and $\Gamma$ is a local aggregation.
In this paper, we consider the aggregation methods in PointNet++ and PointConv, whose simplified formulas can be written as $\Gamma_{\text{PointNet++}} = \max_{x \in N(c)} \mathrm{MLP}(f(x), x - c)$ and $\Gamma_{\text{PointConv}} = \sum_{x \in N(c)} W(x - c)\, f(x)$, where $W$ is a weight function learned by an MLP.
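A simplified sketch of such a local aggregation, in the PointNet++ style: gather the k nearest neighbors of a centroid, apply a shared map to each neighbor's relative coordinates and feature, then max-pool channel-wise. The `mlp` below is a toy stand-in of our own, not a learned network.

```python
import math

def knn_indices(points, center, k):
    """Indices of the k points nearest to `center` (Euclidean distance)."""
    return sorted(range(len(points)), key=lambda i: math.dist(points[i], center))[:k]

def local_aggregate(points, feats, center, k, mlp):
    """PointNet++-style aggregation: shared map on (relative coords, feature), then max-pool."""
    idx = knn_indices(points, center, k)
    mapped = [
        mlp(tuple(a - b for a, b in zip(points[i], center)), feats[i])
        for i in idx
    ]
    return tuple(max(col) for col in zip(*mapped))

# Toy stand-in for a learned MLP: (distance to centroid, raw feature).
mlp = lambda rel, f: (math.hypot(*rel), f)
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
feats = [1.0, 2.0, -1.0]
out = local_aggregate(points, feats, (0.0, 0.0, 0.0), 2, mlp)
assert out == (1.0, 2.0)  # far-away point (index 2) is excluded by kNN
```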
4.2 A general framework for introducing equivariance
For the sake of simplicity, we choose $G$ to be a finite group. In case the transformation group is infinite, a finite subgroup is chosen. We will construct a neural network in the form of a sequence of transformations, where:
is the initial point cloud.
Each point set is a subset of the previous one, chosen by using the farthest point sampling algorithm.
Each grouping layer is a -equivariant transformation.
At each level, the grouping layer maps a point cloud to a coarser point cloud. Here, the new point set is a subset of the previous one and is chosen by using the farthest point sampling algorithm. For each point in the new set, the feature vector is determined as
where the aggregation is a $G$-equivariant function built from a given local feature extraction. In Eq. (2), we need to define the local area around a point. Different ways of choosing local neighbors lead to different architectures. To simplify the formulation, we choose the local neighborhood as
Then Eq. (2) becomes
The algorithm for computing the output of the grouping layer can be separated into three steps as follows:
Algorithm 4.1 (G-PointX).
Input: a point cloud and a non-$G$-equivariant SOTA model on point clouds based on a local feature extraction as given in (1). Output: another point cloud.
Step 1 (Sampling and grouping). We determine a subset of the point cloud by using the farthest point sampling algorithm, and then determine the local area around each sampled point by using the nearest-neighbor algorithm as:
Step 2 (Local aggregation). For each element of the finite group $G$ and for each sampled centroid, we extract a feature vector from each local group by
Step 3. Return
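The three steps above can be sketched end to end in plain Python. This is a toy illustration under simplifications of our own: `base_extract` stands in for the PointNet++/PointConv extraction, and pooling over the group orbit with a channel-wise max yields features invariant (rather than fully equivariant) to rotations in the chosen finite group.

```python
import math

def rot_z(k):
    """Rotation by k*90 degrees about the z-axis; the four of them form a finite group."""
    c, s = round(math.cos(k * math.pi / 2)), round(math.sin(k * math.pi / 2))
    return ((c, -s, 0), (s, c, 0), (0, 0, 1))

def apply(R, p):
    return tuple(sum(R[i][j] * p[j] for j in range(3)) for i in range(3))

def farthest_point_sampling(points, m):
    """Step 1a: greedy farthest point sampling, returning indices of m well-spread points."""
    chosen = [0]  # arbitrary seed point
    min_d = [math.dist(p, points[0]) for p in points]
    while len(chosen) < m:
        nxt = max(range(len(points)), key=lambda i: min_d[i])
        chosen.append(nxt)
        min_d = [min(d, math.dist(p, points[nxt])) for d, p in zip(min_d, points)]
    return chosen

def knn(points, center, k):
    """Step 1b: local area around a centroid via k nearest neighbors."""
    return sorted(range(len(points)), key=lambda i: math.dist(points[i], center))[:k]

def base_extract(neighbors):
    """A non-equivariant local feature extraction (toy stand-in for PointNet++/PointConv)."""
    return (max(p[0] for p in neighbors), sum(p[1] for p in neighbors))

def g_layer(points, m, k, group):
    """Steps 1-3: sample centroids, group neighbors, aggregate over all group conjugates."""
    centroids = [points[i] for i in farthest_point_sampling(points, m)]
    out = []
    for c in centroids:
        nbrs = [points[i] for i in knn(points, c, k)]
        rel = [tuple(a - b for a, b in zip(p, c)) for p in nbrs]  # relative coordinates
        per_g = [base_extract([apply(R, p) for p in rel]) for R in group]
        out.append(tuple(max(col) for col in zip(*per_g)))  # pool over the group orbit
    return out

group = [rot_z(k) for k in range(4)]  # rotations by 0, 90, 180, 270 degrees about z
pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (3.0, 3.0, 0.0), (0.0, 0.0, 4.0)]
rotated = [apply(rot_z(1), p) for p in pts]
# Rotating the whole cloud by a group element leaves the pooled features unchanged.
assert g_layer(pts, 3, 2, group) == g_layer(rotated, 3, 2, group)
```

The invariance check works because the group is closed: rotating the input merely permutes the set of group conjugates, and the channel-wise max ignores that permutation.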
One can verify the following: the grouping layer given in Algorithm 4.1 is equivariant with respect to transformations in the semidirect product of the translation group and the rotations in $G$.
By applying the above algorithm to the typical backbones used in practice, PointNet++ and PointConv, we obtain G-PointNet++ and G-PointConv.
5 Experiments
To evaluate the performance of our technique, we run experiments on two tasks: classification on the ModelNet40 dataset and semantic segmentation on the S3DIS dataset. In particular, we first compare our methods with other equivariant models on rotated data to show the improvement in both performance and complexity. Thereafter, we perform an ablation study that highlights the benefits of our technique compared to solely employing rotation augmentation on the original models; this ablation study also reports the performance of different groups G. Finally, we conduct semantic segmentation experiments with the G-equivariant models and the original ones.
5.1 G-PointX versus other equivariant models
In this section we compare the performance of G-PointNet++ and G-PointConv with other equivariant models on the rotated ModelNet40 dataset, which contains 13,834 mesh samples from 40 labels such as table, chair, plane, and plant. Notably, the group G24 consists of 24 rotations, obtained as combinations of multiples of 90-degree rotations around the three axes. In terms of the experiment configuration, we reuse the training pipeline of the original models; the number of points and the number of epochs are 1024 and 200, respectively. However, to keep the memory requirement the same as for the original models, the batch size is reduced to 8. Moreover, we also add rotation augmentation along with the augmentations employed in the original papers. The results in Table 1 indicate that both G24-PointNet++ and G24-PointConv outperform the other models in the literature.
| Model | Accuracy (%) |
| --- | --- |
| Model in zhang2019rotation | 86.5 |
| Model in li2020rotation | 89.4 |
| G24-PointNet++ (Augmentation + ) | 90.3 |
| G24-PointConv (Augmentation + ) | 89.6 |
5.2 Ablation study on different groups
We compare the accuracy of G-PointNet++ and G-PointConv for object classification on ModelNet40 when using different finite rotation groups. Here, each group is a finite subgroup of SO(3): one contains the combinations of rotations around a single axis, while a larger one combines rotations around two axes. Note that when using the trivial group, the group equivariant models are equivalent to the original models. Table 2 shows that as the number of elements in the group increases, the results of the model on the rotated dataset also improve. Furthermore, this experiment highlights the large gain of our group equivariant technique compared to using rotation augmentation alone. To fairly compare the effect of the different groups, we fixed the batch size of all experiments to 8; for the remaining parameters, we use the same configuration as in the previous section.
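The finite rotation groups used here can be enumerated programmatically. This sketch (all names are ours) builds the 4-element cyclic group of 90-degree rotations about one axis, and the 24-element rotation group generated by 90-degree rotations about two axes; matrix entries are exact integers, so set closure works with exact equality.

```python
import math

def rot(axis, k):
    """90-degree rotation matrices (integer entries) about a coordinate axis."""
    c, s = round(math.cos(k * math.pi / 2)), round(math.sin(k * math.pi / 2))
    if axis == "z":
        return ((c, -s, 0), (s, c, 0), (0, 0, 1))
    if axis == "x":
        return ((1, 0, 0), (0, c, -s), (0, s, c))

def matmul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3))
                 for i in range(3))

def closure(gens):
    """Smallest set of matrices containing gens and closed under multiplication."""
    group = set(gens)
    frontier = list(group)
    while frontier:
        A = frontier.pop()
        for B in list(group):
            for C in (matmul(A, B), matmul(B, A)):
                if C not in group:
                    group.add(C)
                    frontier.append(C)
    return group

G4 = closure({rot("z", k) for k in range(4)})   # rotations about the z-axis only
G24 = closure({rot("z", 1), rot("x", 1)})       # rotation group of the cube
assert len(G4) == 4 and len(G24) == 24
```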
5.3 Semantic Segmentation on S3DIS
For semantic segmentation, the S3DIS dataset was used to evaluate the performance of the group equivariant and original models. This dataset contains 271 rooms, and the objects are divided into 13 classes. Here, the group size is set to 8 instead of 24, since the objects in the rooms are mostly rotated around the Oz axis. Similar to the previous section, we fixed the batch size, the number of epochs, and the number of points of the three models to 16, 32, and 4096, respectively, and the remaining parameters were kept the same as in the original papers. As observed from Table 3, there is a significant improvement in the performance of PointConv wu2019pointconv when using our equivariant method.
| Name | Original Model | Group Equivariant Version |
- (1) C. Chen, G. Li, R. Xu, T. Chen, M. Wang, and L. Lin. Clusternet: Deep hierarchical cluster network with rigorously rotation-invariant representation for point cloud analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4994–5002, 2019.
- (2) H. Chen, S. Liu, W. Chen, H. Li, and R. Hill. Equivariant point network for 3d point cloud analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 14514–14523, June 2021.
- (3) T. Cohen and M. Welling. Group equivariant convolutional networks. In International conference on machine learning, pages 2990–2999. PMLR, 2016.
- (4) T. S. Cohen and M. Welling. Steerable cnns. arXiv preprint arXiv:1612.08498, 2016.
- (5) J.-H. Jacobsen, B. De Brabandere, and A. W. Smeulders. Dynamic steerable blocks in deep residual networks. arXiv preprint arXiv:1706.00598, 2017.
- (6) S. Kim, J. Park, and B. Han. Rotation-invariant local-to-global representation learning for 3d point cloud. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 8174–8185. Curran Associates, Inc., 2020.
- (7) R. Kondor. N-body networks: a covariant hierarchical neural network architecture for learning atomic potentials. arXiv preprint arXiv:1803.01588, 2018.
- (8) R. Kondor and S. Trivedi. On the generalization of equivariance and convolution in neural networks to the action of compact groups. In International Conference on Machine Learning, pages 2747–2755. PMLR, 2018.
- (9) W. Kumagai and A. Sannai. Universal approximation theorem for equivariant maps by group cnns. arXiv preprint arXiv:2012.13882, 2020.
- (10) X. Li, R. Li, G. Chen, C.-W. Fu, D. Cohen-Or, and P.-A. Heng. A rotation-invariant framework for deep point cloud analysis. arXiv preprint arXiv:2003.07238, 2020.
- (11) H. Maron, E. Fetaya, N. Segol, and Y. Lipman. On the universality of invariant networks. In International conference on machine learning, pages 4363–4371. PMLR, 2019.
- (12) P. Petersen and F. Voigtlaender. Equivalence of approximation by convolutional neural networks and fully-connected networks. Proceedings of the American Mathematical Society, 148(4):1567–1581, 2020.
- (13) C. R. Qi, H. Su, K. Mo, and L. J. Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 652–660, 2017.
- (14) C. R. Qi, L. Yi, H. Su, and L. J. Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. arXiv preprint arXiv:1706.02413, 2017.
- (15) S. Ravanbakhsh. Universal equivariant multilayer perceptrons. In International Conference on Machine Learning, pages 7996–8006. PMLR, 2020.
- (16) S. Sabour, N. Frosst, and G. E. Hinton. Dynamic routing between capsules. arXiv preprint arXiv:1710.09829, 2017.
- (17) W. Shen, B. Zhang, S. Huang, Z. Wei, and Q. Zhang. 3d-rotation-equivariant quaternion neural networks. arXiv preprint arXiv:1911.09040, 2019.
- (18) B. Sturmfels. Algorithms in invariant theory. Springer Science & Business Media, 2008.
- (19) N. Thomas, T. Smidt, S. Kearnes, L. Yang, L. Li, K. Kohlhoff, and P. Riley. Tensor field networks: Rotation-and translation-equivariant neural networks for 3d point clouds. arXiv preprint arXiv:1802.08219, 2018.
- (20) D. E. Worrall, S. J. Garbin, D. Turmukhambetov, and G. J. Brostow. Harmonic networks: Deep translation and rotation equivariance. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5028–5037, 2017.
- (21) W. Wu, Z. Qi, and L. Fuxin. Pointconv: Deep convolutional networks on 3d point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9621–9630, 2019.
- (22) D. Yarotsky. Universal approximations of invariant maps by neural networks. arXiv preprint arXiv:1804.10306, 2018.
- (23) X. Zhang, S. Qin, Y. Xu, and H. Xu. Quaternion product units for deep learning on 3d rotation groups. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7304–7313, 2020.
- (24) Z. Zhang, B.-S. Hua, D. W. Rosen, and S.-K. Yeung. Rotation invariant convolutions for 3d point clouds deep learning. In 2019 International Conference on 3D Vision (3DV), pages 204–213. IEEE, 2019.
- (25) H. Zhao, L. Jiang, J. Jia, P. Torr, and V. Koltun. Point transformer, 2021.
- (26) Y. Zhao, T. Birdal, J. E. Lenssen, E. Menegatti, L. Guibas, and F. Tombari. Quaternion equivariant capsule networks for 3d point clouds. In European Conference on Computer Vision, pages 1–19. Springer, 2020.