Design equivariant neural networks for 3D point cloud

by Thuan N. A. Trang, et al.
Ton Duc Thang University

This work seeks to improve the generalization and robustness of existing neural networks for 3D point clouds by inducing group equivariance under general group transformations. The main challenge in designing equivariant models for point clouds is trading off model performance against complexity. Existing equivariant models are either too complicated to implement or too computationally expensive. The main aim of this study is to build a general procedure for introducing group equivariance into SOTA models for 3D point clouds. The group equivariant models built from our procedure are simple to implement, less complex than existing ones, and preserve the strengths of the original SOTA backbone. Experiments on object classification show that our methods are superior to other group equivariant models in both performance and complexity. Moreover, our method also improves the mIoU of semantic segmentation models. Overall, by combining equivariance under only a finite set of rotations with augmentation, our models can outperform existing full SO(3)-equivariant models at much lower complexity and GPU memory cost. The proposed procedure is general and forms a fundamental approach to group equivariant neural networks. We believe that it can be easily adapted to other SOTA models in the future.




1 Introduction

The unprecedented development of 3D acquisition technologies has made 3D sensors such as LiDARs and RGB-D cameras widely affordable. These sensors directly output data termed point clouds. In theory, this data provides rich information about geometry, shape, and scale. In practice, however, capturing the same object from different locations or angles produces different data. Consequently, inducing equivariance under transformations such as permutation, translation, and rotation in deep neural network architectures is crucial to improving generalization when dealing with 3D point clouds.

Incorporating equivariance into deep neural network architectures has long been a powerful idea. For example, Convolutional Neural Networks (CNNs) effectively extract local information from 2D images thanks to translation equivariance. A general theory of group equivariant neural networks and their universality has recently been developed. Some research effort has been devoted to 2D and 3D rotation equivariance for grid or voxel images, with positive results. However, little attention has been paid to group equivariant neural networks for 3D point clouds.

This work seeks to improve the generalization and data efficiency of neural networks for 3D point clouds through group equivariance, since processing raw point clouds directly can eliminate redundant computation and memory. In particular, we propose a general procedure, called G-PointX, to introduce group equivariance into an existing SOTA backbone that is not yet equivariant. The main idea of the procedure is inherited from the symmetrization technique known as the Reynolds operator (see sturmfels2008algorithms ). For practical applications, we apply this procedure to two typical backbones, PointNet++ and PointConv, to obtain efficient group equivariant models for 3D point clouds that we term G-PointNet++ and G-PointConv, respectively.

We demonstrate the effectiveness of G-PointX through extensive empirical experiments. On the classification benchmark, G-PointX outperforms state-of-the-art deep neural networks for 3D point clouds when dealing with rotated data. On the semantic segmentation benchmark, G-PointX outperforms the original models by a significant margin. In short, the main contributions of this paper are:

  • A detailed formulation of a novel group equivariant CNN and MLP, termed G-PointConv and G-PointNet++, respectively.

  • Comprehensive and extensive experiments demonstrating the effectiveness of our proposed method.

2 Related works

Deep learning with raw 3D point clouds: Recently, several approaches for extracting features from 3D point clouds have been presented in the literature. As a pioneer in raw point cloud processing, PointNet qi2017pointnet extracted features by simply employing a combination of MLPs and max-pooling. PointNet++ qi2017pointnet++ afterwards introduced a hierarchical architecture that efficiently aggregates information from local areas. Notably, PointNet++ uses the PointNet operation to extract features, which can be regarded as an MLP operation for point clouds. Later papers inspired by PointNet++ often keep the hierarchical structure and modify only the operation. Typically, PointConv wu2019pointconv replaced MLPs with convolution layers, and PointTransformer zhao2021point used self-attention to collect information in local regions. Since they rely only on relative positions and color, PointNet++, PointConv wu2019pointconv and PointTransformer zhao2021point are equivariant to translation and permutation.

Group equivariant neural networks: Several methods have tried to embed group equivariance into the architecture. For example, the patch-wise rotation equivariance in Harmonic Networks worrall2017harmonic was obtained by using circular harmonics. Moreover, cohen2016steerable and jacobsen2017dynamic proposed blocks named Steerable CNNs, which can be equivariant to the group of 3D rotations. This type of block was then applied to capsule networks sabour2017dynamic and to N-body networks kondor2018n . The notions of equivariance and convolution were also generalized in neural networks to the action of compact groups in kondor2018generalization . Apart from that, more abstract settings and universality results on group equivariance are described in kumagai2020universal ; maron2019universality ; petersen2020equivalence ; ravanbakhsh2020universal ; yarotsky2018universal .

Group equivariant neural networks for 3D point clouds: Despite the rapid development of group equivariant neural networks, the number of papers designing such networks for point clouds is modest. However, rotation equivariance seems indispensable for architectures in this field, since point clouds depend heavily on the location and orientation of the LiDAR sensors. In previous works, this property was achieved by several approaches. For instance, while thomas2018tensor used filters built from spherical harmonics, chen2019clusternet ; li2020rotation ; zhang2019rotation employed rigorously rotation-invariant representations of point clouds in terms of angles and distances. Furthermore, one can also apply a multi-level abstraction involving graph convolutional neural networks Kim2020advances or quaternion-based neural networks shen20193d ; zhang2020quaternion ; zhao2020quaternion to acquire rotation equivariance.

Our work stands out from other approaches: The method we use in G-PointX differs from previous group equivariant models. Rather than designing a new representation or architecture, we provide a plug-in for existing models, which allows us to inherit the advantages of the original model. In our method, we achieve group equivariance by applying standard MLPs and CNNs and re-arranging different group conjugations of the operations in a suitable way.

3 Groups and group actions

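Let $G$ be a group with identity $e$ and $X$ a nonempty set. An action of $G$ on $X$ is a map $G \times X \to X$, $(g, x) \mapsto g \cdot x$, satisfying the properties $e \cdot x = x$ and $g \cdot (h \cdot x) = (gh) \cdot x$ for all $g, h \in G$ and $x \in X$.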

If $G$ acts on $X$, then $G$ acts on every object built on $X$. For example, if $G$ acts on $\mathbb{R}^3$, then $G$ acts naturally on functions defined on $\mathbb{R}^3$, on the space of point clouds in $\mathbb{R}^3$, and on functions defined on point clouds. The following group actions are important in our consideration:

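  • $S_n$ is the permutation group of $\{1, \dots, n\}$, and acts on a point cloud containing $n$ points in $\mathbb{R}^3$ by permuting the arrangement of points.

  • $G$ is a subgroup of the group $SO(3)$ of 3D rotations, and acts on $\mathbb{R}^3$ by matrix-vector multiplication.

  • $\mathbb{R}^3 \rtimes G$ is the semidirect product of the translation group $\mathbb{R}^3$ and a subgroup $G$ of $SO(3)$, and acts on $\mathbb{R}^3$ by $(t, g) \cdot x = gx + t$ for $(t, g) \in \mathbb{R}^3 \rtimes G$ and $x \in \mathbb{R}^3$.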

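Note that the group multiplication in $\mathbb{R}^3 \rtimes G$ is given by $(t_1, g_1)(t_2, g_2) = (t_1 + g_1 t_2, g_1 g_2)$ for $(t_1, g_1), (t_2, g_2) \in \mathbb{R}^3 \rtimes G$. Thus the identity is $(0, I)$ and $(t, g)^{-1} = (-g^{-1} t, g^{-1})$.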
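The interplay between the action and the group multiplication can be checked numerically. The sketch below assumes the standard semidirect product convention, in which an element $(t, g)$ acts by $x \mapsto gx + t$ and the product is $(t_1, g_1)(t_2, g_2) = (t_1 + g_1 t_2, g_1 g_2)$. The helper names `rot_z`, `act`, `compose` and `inverse` are hypothetical and chosen only for illustration.

```python
import numpy as np

def rot_z(theta):
    """Rotation matrix about the z-axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def act(elem, x):
    """Apply (t, g) to a point x: rotate by g, then translate by t."""
    t, g = elem
    return g @ x + t

def compose(a, b):
    """Product (t1, g1)(t2, g2) = (t1 + g1 t2, g1 g2) (assumed convention)."""
    (t1, g1), (t2, g2) = a, b
    return (t1 + g1 @ t2, g1 @ g2)

def inverse(a):
    """Inverse (t, g)^{-1} = (-g^{-1} t, g^{-1}); g^{-1} = g^T for rotations."""
    t, g = a
    return (-g.T @ t, g.T)

a = (np.array([1.0, 0.0, 2.0]), rot_z(np.pi / 2))
b = (np.array([0.5, -1.0, 0.0]), rot_z(np.pi))
x = np.array([0.3, 0.7, -1.2])

# Acting with b and then with a agrees with acting once with the product ab.
applied_twice = act(a, act(b, x))
applied_product = act(compose(a, b), x)

# Composing a with its inverse acts as the identity.
round_trip = act(compose(a, inverse(a)), x)
```

Under these assumptions, `applied_twice` and `applied_product` coincide up to floating-point error, and `round_trip` recovers `x`.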

Now we consider a 3D point cloud as a function $f$ from a finite set $P \subset \mathbb{R}^3$ to $\mathbb{R}^d$, where $d$ is the dimension of the feature vectors. Sometimes, we extend the domain of this function from $P$ to the whole of $\mathbb{R}^3$ and view each point cloud as a continuous function (with compact support) from $\mathbb{R}^3$ to $\mathbb{R}^d$, so that existing strong mathematical techniques can be used to develop suitable feed-forward neural networks (FFNNs) for point clouds. A point cloud is then identified with a function on $P$ or $\mathbb{R}^3$.

An FFNN is defined to be a sequence

$$f_0 \xrightarrow{\phi_1} f_1 \xrightarrow{\phi_2} \cdots \xrightarrow{\phi_L} f_L$$

of transformations, where each $\phi_i$ is a linear or nonlinear transformation from the $(i-1)$-th layer to the $i$-th layer, followed by the nonlinear point-wise activation function $\sigma$. MLPs and CNNs, the most widely used FFNNs in practice, are the two types of operators considered in this work.

Assume that $G$ acts on $\mathbb{R}^3$. Then $G$ also acts on functions on point clouds as follows: for each $g \in G$ and each function $f$, the action is defined by

$$[g \cdot f](x) = f(g^{-1} \cdot x).$$

We can describe the action of $G$ on $f$ intuitively as follows: when we rotate the image by the orientation $g$, the feature vector of the new image at the coordinate $x$ is exactly the feature vector of the old image at the coordinate $g^{-1} \cdot x$.

Let $\phi$ be a linear (or nonlinear) transformation. We say that $\phi$ is $G$-equivariant if and only if $\phi(g \cdot f) = g \cdot \phi(f)$ for every $g \in G$ and every $f$. An FFNN is called $G$-equivariant if all of its transformations are $G$-equivariant.
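The equivariance condition can be tested numerically for a candidate layer. The following is a minimal sketch rather than part of the proposed method: it assumes a layer that maps an array of 3D points (one per row) to one 3D vector per point, so that the group acts on both input and output by rotating the rows. The functions `centered` and `axis_scaled` are hypothetical toy layers used only to show one case that passes and one that fails.

```python
import numpy as np

def rot_z(theta):
    """Rotation matrix about the z-axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def is_equivariant(layer, points, g, tol=1e-8):
    """Check layer(g . points) == g . layer(points) for row-vector points."""
    return np.allclose(layer(points @ g.T), layer(points) @ g.T, atol=tol)

def centered(points):
    """Subtract the centroid: commutes with every rotation."""
    return points - points.mean(axis=0, keepdims=True)

def axis_scaled(points):
    """Scale each coordinate axis differently: does not commute with rotations."""
    return points * np.array([1.0, 2.0, 3.0])

rng = np.random.default_rng(0)
points = rng.normal(size=(16, 3))
g = rot_z(np.pi / 2)

centered_ok = is_equivariant(centered, points, g)
scaled_ok = is_equivariant(axis_scaled, points, g)
```

Here `centered_ok` is expected to be true and `scaled_ok` false, which mirrors the remark that not every linear transformation is equivariant.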

Thanks to the point-wise structure, every nonlinear activation function is $G$-equivariant with respect to any group $G$ (see cohen2016group ; kondor2018generalization ). However, not every linear transformation is $G$-equivariant. Therefore, in the next section we give a natural approach to refine a given FFNN and produce an equivariant one. The notable PointNet++ qi2017pointnet++ and PointConv wu2019pointconv layers with rotation groups will be considered to illustrate the theory and to test the effectiveness of the proposed approach.

4 A general scheme for constructing G-equivariant NNs for 3D point clouds

4.1 Local feature extractions

Given a point cloud $f$ on $P$, most state-of-the-art models for machine learning tasks on 3D point clouds are built on a local feature extraction. In general, a local feature extraction extracts important information from a group of points around a given point, and it is formulated by

$$\Phi(f)(x) = \mathcal{A}\big(\{(y, f(y)) : y \in \mathcal{N}(x)\}\big), \tag{1}$$

where $x$ is the centroid, $\mathcal{N}(x)$ is the local neighbor of $x$ and $\mathcal{A}$ is a local aggregation.

In this paper, we consider the aggregation methods of PointNet++ and PointConv, whose formulas can be written as:

  • PointNet++

  • PointConv
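As a rough illustration of these two aggregation styles, the sketch below implements a PointNet++-like aggregation (a shared MLP followed by max-pooling over the neighborhood) and a PointConv-like aggregation (position-dependent weights applied to features and summed). It is a simplification under stated assumptions, not the formulation used in this work: the MLPs are reduced to a single layer, the inverse-density reweighting of PointConv is omitted, and the names `pointnet_style`, `pointconv_style`, `w` and `w_pos` are hypothetical.

```python
import numpy as np

def pointnet_style(rel_pos, feats, w):
    """Shared one-layer MLP on [relative position, feature], then max over neighbors."""
    hidden = np.maximum(np.concatenate([rel_pos, feats], axis=1) @ w, 0.0)
    return hidden.max(axis=0)

def pointconv_style(rel_pos, feats, w_pos):
    """Weights computed from relative positions, applied to features and summed."""
    k, d_in = feats.shape
    weights = np.tanh(rel_pos @ w_pos).reshape(k, d_in, -1)  # (k, d_in, d_out)
    return np.einsum("kd,kdo->o", feats, weights)

rng = np.random.default_rng(0)
k, d_in, d_out = 8, 4, 6
center = rng.normal(size=3)            # centroid x
neighbors = rng.normal(size=(k, 3))    # points in the local neighborhood of x
feats = rng.normal(size=(k, d_in))     # their feature vectors
rel_pos = neighbors - center           # relative positions y - x

w = rng.normal(size=(3 + d_in, d_out))        # toy MLP weights (assumption)
w_pos = rng.normal(size=(3, d_in * d_out))    # toy weight-net parameters (assumption)

out_pointnet = pointnet_style(rel_pos, feats, w)
out_pointconv = pointconv_style(rel_pos, feats, w_pos)
```

Both outputs depend only on relative positions and are unchanged when the neighbors are reordered, which is why such layers are translation and permutation equivariant but not rotation equivariant in general.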

4.2 A general framework for introducing equivariance

For the sake of simplicity, we choose $G$ to be a finite group. If the transformation group is infinite, a finite subgroup is chosen. We will construct a neural network in the form of a sequence of transformations


  • $f_0$ is the initial point cloud.

  • $P_i$ is a subset of $P_{i-1}$, chosen by using the farthest point sampling algorithm.

  • Each grouping layer $\phi_i$ is a $G$-equivariant transformation.
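Farthest point sampling, used above to choose the subsets, can be sketched as follows. This is a generic textbook-style implementation given for orientation only; the function name `farthest_point_sampling` and the fixed starting index are assumptions, and practical pipelines typically use a GPU implementation.

```python
import numpy as np

def farthest_point_sampling(points, m, start=0):
    """Return indices of m points, each chosen farthest from those already picked."""
    chosen = [start]
    # Distance from every point to the nearest chosen point so far.
    dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(m - 1):
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)

# Five points on a line: the sampler spreads its picks across the extent.
line = np.array([[0.0, 0, 0], [1.0, 0, 0], [2.0, 0, 0], [3.0, 0, 0], [10.0, 0, 0]])
picked = farthest_point_sampling(line, 3)
```

Because the selection depends only on pairwise distances, the same indices are selected after rotating or translating the whole cloud, given the same starting index and ignoring ties and floating-point effects.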

For each $i$, the grouping layer $\phi_i$ maps a point cloud $f_{i-1}$ on $P_{i-1}$ to a point cloud $f_i$ on $P_i$. Here, $P_i$ is a subset of $P_{i-1}$ chosen by using the farthest point sampling algorithm. For each point $x$ in $P_i$, the feature vector $f_i(x)$ is determined as


where $\Phi_G$ is a $G$-equivariant function built from a given local feature extraction $\Phi$. In Eq. (2), we need to define the local area around a point $x$. Different ways of choosing local neighbors lead to different architectures. To simplify the formulation, we choose the local neighbor as

Then Eq. (2) becomes


The algorithm for computing the output of $\phi_i$ can be separated into three steps as follows:

Algorithm 4.1 (G-PointX).

Input: a point cloud $f_{i-1}$ on $P_{i-1}$ and a non-$G$-equivariant SOTA model on point clouds based on a local feature extraction as given in (1). Output: another point cloud $f_i$ on $P_i$.

  • Step 1 (Sampling and grouping). We determine a subset $P_i$ of $P_{i-1}$ by using the farthest point sampling algorithm, and then determine the local area around each point $x \in P_i$ by using the nearest point algorithm as:

  • Step 2 (Local aggregation). For each sampled point and each group element $g \in G$, we extract a feature vector from the corresponding local group by

  • Step 3. Return the output point cloud $f_i$.

One can verify the following result.

Theorem 4.2.

The grouping layer given in Algorithm 4.1 is equivariant with respect to the transformations in the semidirect product $\mathbb{R}^3 \rtimes G$ of translations and $G$.
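To make the idea of obtaining equivariance from a non-equivariant extractor concrete, the sketch below applies a toy local extractor to every group-rotated copy of a neighborhood and stacks the results. It is an illustration in the spirit of symmetrization over a finite group, not the exact layer of Algorithm 4.1: the group is assumed to be the four rotations by multiples of 90 degrees about the z-axis, and `local_feature`, `lifted_feature` and `w` are hypothetical names.

```python
import numpy as np

def rot_z(k):
    """Rotation by k * 90 degrees about the z-axis, with exact entries."""
    c, s = [(1, 0), (0, 1), (-1, 0), (0, -1)][k % 4]
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)

GROUP = [rot_z(k) for k in range(4)]  # assumed finite rotation group

def local_feature(rel_pos, w):
    """Toy non-equivariant extractor: one-layer MLP, then max over neighbors."""
    return np.maximum(rel_pos @ w, 0.0).max(axis=0)

def lifted_feature(rel_pos, w):
    """Row k holds the extractor applied to the neighborhood rotated by g_k^{-1}."""
    # For row vectors p, p @ g equals (g^T p)^T = (g^{-1} p)^T.
    return np.stack([local_feature(rel_pos @ g, w) for g in GROUP])

rng = np.random.default_rng(0)
rel_pos = rng.normal(size=(8, 3))   # relative positions of 8 neighbors
w = rng.normal(size=(3, 5))         # toy weights (assumption)

original = lifted_feature(rel_pos, w)
rotated = lifted_feature(rel_pos @ GROUP[1].T, w)  # input rotated by 90 degrees

# Rotating the input by g_1 cyclically shifts the stacked features by one slot.
shifted = np.roll(original, 1, axis=0)
# Pooling over the group axis then gives a rotation-invariant descriptor.
pooled_original, pooled_rotated = original.max(axis=0), rotated.max(axis=0)
```

In this toy setting, rotating the input by a group element only permutes the stacked features, so `rotated` matches `shifted`, and the pooled descriptors agree.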

By applying the above algorithm to two typical backbones used in practice, PointNet++ and PointConv, we obtain G-PointNet++ and G-PointConv, which can be described in detail as follows:

  • G-PointNet++:

  • G-PointConv:

5 Experiments

To evaluate the performance of our technique, we run experiments on two tasks: classification on the ModelNet40 dataset and semantic segmentation on the S3DIS dataset. In particular, we first compare our methods with other equivariant models on the rotated dataset to show the improvement in both performance and complexity. We then conduct an ablation study that highlights the benefits of our technique compared to solely employing rotation augmentation on the original models. The ablation study also reports the performance for different groups $G$. Finally, we conduct experiments on semantic segmentation using the $G$-equivariant models and the original ones.

5.1 G-PointX versus other equivariant models

In this section, we compare the performance of G-PointNet++ and G-PointConv with other equivariant models on the rotated ModelNet40 dataset, which contains 13,834 mesh samples from 40 labels such as table, chair, plane, and plant. Notably, the group $G$ consists of 24 rotations, formed by combining rotations around the three axes. For the experiment configuration, we reuse the training pipeline of the original models; the number of points and the number of epochs are 1024 and 200, respectively. However, to keep the required memory the same as for the original models, the batch size is reduced to 8. Moreover, we also add rotation augmentation along with the augmentations employed in the original papers. The results in Table 1 indicate that both G-PointNet++ and G-PointConv outperform the other models in the literature.

Name                                     Accuracy on Rotated Dataset
QENet zhao2020quaternion                 74.4
Model in zhang2019rotation               86.5
ClusterNet chen2019clusternet            87.1
SPConv Chen2021CVPR                      88.3
Model in li2020rotation                  89.4
G24-PointNet++ (Augmentation + G24)      90.3
G24-PointConv (Augmentation + G24)       89.6
Table 1: Results of different equivariant models on Rotated ModelNet40.

Figure 1: Convergence of the test accuracy on the original test set for G-PointConv and G-PointNet++, in comparison with the standard models PointConv and PointNet++ trained with data augmentation using the same sampled rotations.
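A 24-element rotation group of the kind used in this section can be enumerated explicitly. The sketch below assumes that the group is the one generated by 90-degree rotations about the three coordinate axes (the rotation group of the cube); this is an assumption made for illustration, and the helper names `rot90` and `generate_group` are hypothetical.

```python
import numpy as np

def rot90(axis):
    """90-degree rotation about coordinate axis 0 (x), 1 (y) or 2 (z), integer entries."""
    i, j = [(1, 2), (2, 0), (0, 1)][axis]
    r = np.eye(3, dtype=int)
    r[i, i] = r[j, j] = 0
    r[j, i] = 1
    r[i, j] = -1
    return r

def generate_group(generators):
    """Close a set of integer matrices under multiplication (finite groups only)."""
    identity = np.eye(3, dtype=int)
    elements = {identity.tobytes(): identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = s @ g
            key = h.tobytes()
            if key not in elements:
                elements[key] = h
                frontier.append(h)
    return list(elements.values())

G24 = generate_group([rot90(axis) for axis in range(3)])
group_size = len(G24)
```

Each element is a signed permutation matrix with determinant 1, so applying it to a point cloud only permutes and flips coordinates, which keeps the per-element cost low.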

5.2 Ablation study on different groups

We compare the accuracy of G-PointNet++ and G-PointConv for object classification on ModelNet40 when using finite rotation groups of different sizes. Here, $G_n$ denotes a subgroup of $SO(3)$ with $n$ rotations. In particular, one group contains the combinations of rotations around a single axis, and another contains the combinations of rotations around two axes. Note that, when using the trivial group $G_1$, the group equivariant models are equivalent to the original models. Table 2 shows that as the number of elements in the group $G$ increases, the results of the model on the rotated dataset also improve. Furthermore, this experiment highlights the large improvement brought by our group equivariant technique compared to using rotation augmentation alone. To compare the effect of the different groups fairly, we fix the batch size of all experiments to 8; for the remaining parameters, we use the same configuration as in the previous section.

Name          Accuracy on Rotated Dataset (columns ordered by increasing group size)
PointNet++    86.0    88.0    89.6    90.3
PointConv     80.4    84.6    88.3    89.6
Table 2: Results of different groups on Rotated ModelNet40.

5.3 Semantic Segmentation on S3DIS

For semantic segmentation, the S3DIS dataset was used to evaluate the performance of the group equivariant and original models. This dataset contains 271 rooms, and the objects are divided into 13 classes. Here, the size of $G$ is set to 8 instead of 24, since the objects in the rooms are mostly rotated around the $z$-axis (Oz). As in the previous section, we fix the batch size, the number of epochs, and the number of points for the models to 16, 32, and 4096, respectively; the remaining parameters are kept the same as in the original papers. As observed from Table 3, the performance of PointConv wu2019pointconv improves significantly when using our equivariant method.

Name          Original Model    Group Equivariant Version
PointNet++    0.535             0.546
PointConv     0.530             0.578
Table 3: Results of PointX and G-PointX on S3DIS.

Figure 2: Semantic segmentation with S3DIS dataset.
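Segmentation quality in this work is discussed in terms of mIoU. For reference, a minimal sketch of a mean intersection-over-union computation over per-point labels is given below. It assumes one common convention, in which classes absent from both prediction and ground truth are skipped; the evaluation code behind Table 3 may differ, and the function name `mean_iou` is hypothetical.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that occur in the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (target == c))
        union = np.sum((pred == c) | (target == c))
        if union > 0:  # skip classes absent from both (assumed convention)
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.array([0, 0, 1, 1])
target = np.array([0, 1, 1, 1])
score = mean_iou(pred, target, num_classes=2)  # (1/2 + 2/3) / 2 = 7/12
```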