PointMixup: Augmentation for Point Clouds

08/14/2020 ∙ by Yunlu Chen, et al. ∙ University of Amsterdam

This paper introduces data augmentation for point clouds by interpolation between examples. Data augmentation by interpolation has been shown to be a simple and effective approach in the image domain. Such a mixup is, however, not directly transferable to point clouds, as we do not have a one-to-one correspondence between the points of two different objects. In this paper, we define data augmentation between point clouds as a shortest-path linear interpolation. To that end, we introduce PointMixup, an interpolation method that generates new examples through an optimal assignment of the path function between two point clouds. We prove that our PointMixup finds the shortest path between two point clouds and that the interpolation is assignment invariant and linear. With this definition of interpolation, PointMixup allows us to introduce strong interpolation-based regularizers, such as mixup and manifold mixup, to the point cloud domain. Experimentally, we show the potential of PointMixup for point cloud classification, especially when examples are scarce, as well as increased robustness to noise and geometric transformations of the points. The code for PointMixup and the experimental details are publicly available.


1 Introduction

The goal of this paper is to classify a cloud of points into their semantic category, be it an airplane, a bathtub or a chair. Point cloud classification is challenging, as point clouds are sets and hence invariant to point permutations. Building on the pioneering PointNet by Qi et al. [1], multiple works have proposed deep learning solutions to point cloud classification [2, 12, 29, 30, 36, 23]. Given the progress in point cloud network architectures, as well as the importance of data augmentation in improving classification accuracy and robustness, we study how data augmentation can naturally be extended to support point cloud data, especially considering the often small size of point cloud datasets (e.g. ModelNet40 [31]). In this work, we propose point cloud data augmentation by interpolation of existing training point clouds.

To perform data augmentation by interpolation, we take inspiration from augmentation in the image domain. Several works have shown that generating new training examples by interpolating images and their corresponding labels leads to improved network regularization and generalization, e.g., [8, 24, 34, 26]. Such a mixup is feasible in the image domain due to the regular structure of images and the one-to-one correspondence between pixels. This setup, however, does not generalize to the point cloud domain, since there is no one-to-one correspondence or ordering between points. We therefore seek a method that enables interpolation between permutation-invariant point sets.

In this work, we make three contributions. First, we introduce data augmentation for point clouds through interpolation and define the augmentation as a shortest-path interpolation. Second, we propose PointMixup, an interpolation between point clouds that computes the optimal assignment as a path function between two point clouds, or between their latent representations. The proposed interpolation strategy therefore allows the use of successful regularizers such as Mixup and Manifold Mixup [26] on point clouds. We prove that (i) our PointMixup indeed finds the shortest path between two point clouds; (ii) the assignment does not change for any pair of the mixed point clouds for any interpolation ratio; and (iii) our PointMixup is a linear interpolation, an important property since labels are also linearly interpolated. Figure 1 shows two pairs of point clouds, along with our interpolations. Third, we show the empirical benefits of our data augmentation across various tasks, including classification, few-shot learning, and semi-supervised learning. We furthermore show that our approach is agnostic to the network used for classification, while we also become more robust to noise and geometric transformations of the points.

Figure 1: Interpolation between point clouds. We show the interpolation between examples from different classes (airplane/chair, and monitor/bathtub) with multiple ratios λ. Each interpolant carries a correspondingly mixed label of the two classes. The interpolation is not obtained by learning, but induced by solving for the optimal bijective correspondence, which minimizes the overall distance that each point in one point cloud moves to its assigned point in the other point cloud.

2 Related Work

2.0.1 Deep learning for point clouds.

Point clouds are unordered sets, and hence early works focus on analyzing equivalent symmetric functions that ensure permutation invariance [17, 1, 33]. The pioneering PointNet work by Qi et al. [1] presented the first deep network that operates directly on unordered point sets. It learns a global feature with shared multi-layer perceptrons and a max-pooling operation to ensure permutation invariance. PointNet++ [2] extends this idea further with a hierarchical structure, relying on a heuristic of farthest point sampling and grouping to build the hierarchy. Likewise, other recent methods learn hierarchical local features by grouping points in various manners [10, 12, 29, 30, 32, 36, 23]. Li et al. [12] propose to learn a transformation from the input points to simultaneously solve the weighting of input point features and the permutation of points into a latent and potentially canonical order. Xu et al. [32] extend 2D convolution to 3D point clouds by parameterizing a family of convolution filters. Wang et al. [29] propose to leverage neighborhood structures in both point and feature spaces.

In this work, we aim to improve point cloud classification for any point-based approach. To that end, we propose a new model-agnostic data augmentation: a Mixup regularization for point clouds that can build on various architectures to obtain better classification results by reducing the generalization error. A very recent work by Li et al. [11] also considers improving point cloud classification by augmentation. They rely on auto-augmentation and a complicated adversarial training procedure, whereas in this work we propose to augment point clouds by interpolation.

2.0.2 Interpolation-based regularization.

Employing regularization approaches when training deep neural networks to improve their generalization performance has become standard practice in deep learning. Recent works consider regularization by interpolating example and label pairs, commonly known as Mixup [24, 8, 34]. Manifold Mixup [26] extends Mixup by interpolating the hidden representations at multiple layers. Recently, an effort has been made to apply Mixup to various tasks such as object detection [35] and segmentation [7]. Different from existing works, which are predominantly employed in the image domain, we propose a new optimal-assignment Mixup paradigm for point clouds, in order to deal with their permutation-invariant nature.

Recently, Mixup [34] has also been investigated from a semi-supervised learning perspective [3, 27, 2]. MixMatch [3] guesses low-entropy labels for unlabelled data-augmented examples and mixes labelled and unlabelled data using Mixup [34]. Interpolation Consistency Training [27] utilizes the consistency constraint between the interpolation of unlabelled points and the interpolation of the predictions at those points. In this work, we show that our PointMixup can be integrated in such frameworks to enable semi-supervised learning for point clouds.

3 Point cloud augmentation by interpolation

3.1 Problem setting

In our setting, we are given a training set {(S_m, c_m)}_{m=1}^{M} consisting of M point clouds. S_m = {p_1, …, p_N} is a point cloud consisting of N points, where p_i ∈ ℝ³ is a 3D point; 𝒮 denotes the set of such 3D point clouds with N elements. c_m ∈ {0, 1}^C is the one-hot class label for a total of C classes. The goal is to train a function h: 𝒮 → ℝ^C that learns to map a point cloud to a semantic label distribution. Throughout our work, we remain agnostic to the type of function used for the mapping and we focus on data augmentation to generate new examples.

Data augmentation is an integral part of training deep neural networks, especially when the size of the training data is limited compared to the number of model parameters. A popular data augmentation strategy is Mixup [34]. Mixup performs augmentation in the image domain by linearly interpolating pixels, as well as labels. Specifically, let (x_i, y_i) and (x_j, y_j) denote two images with their labels. Then a new image x̃ and its label ỹ are generated as:

x̃ = λ · x_i + (1 − λ) · x_j,  (1)
ỹ = λ · y_i + (1 − λ) · y_j,  (2)

where λ ∈ [0, 1] denotes the mixup ratio. Usually λ is sampled from a beta distribution, λ ∼ Beta(γ, γ). Such a direct interpolation is feasible for images, as the data is aligned. In point clouds, however, linear interpolation is not straightforward. The reason is that point clouds are sets of points in which the point elements are orderless and permutation-invariant. We must, therefore, seek a definition of interpolation on unordered sets.
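As a concrete reference point, the image-domain recipe of Eqs. 1 and 2 can be sketched in a few lines of NumPy; the function name and the default Beta parameter are illustrative choices, not taken from the paper:

```python
# Minimal sketch of image-domain Mixup (Eqs. 1-2).
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.4, rng=None):
    """Linearly interpolate two examples and their one-hot labels."""
    rng = rng if rng is not None else np.random.default_rng()
    lam = rng.beta(alpha, alpha)       # mixup ratio lambda ~ Beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2    # Eq. 1: interpolated input
    y = lam * y1 + (1.0 - lam) * y2    # Eq. 2: interpolated label
    return x, y, lam
```

Note that the same `lam` is used for both input and label, which is exactly the property the shortest-path analysis below aims to preserve for point clouds.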

3.2 Interpolation between point clouds

Let S₁ = {p_i}_{i=1}^{N} and S₂ = {q_i}_{i=1}^{N} denote two training examples on which we seek to perform interpolation with ratio λ to generate new training examples. Given the pair of source examples S₁ and S₂, an interpolation function f_{S₁→S₂}: [0, 1] → 𝒮 can be any continuous function that forms a curve joining S₁ and S₂ in a metric space (𝒮, d) with a proper distance function d. This means that it is up to us to define what makes an interpolation good. We define the concept of shortest-path interpolation in the context of point clouds:

Definition 1 (Shortest-path interpolation)

In a metric space (𝒮, d), a shortest-path interpolation f_{S₁→S₂}: [0, 1] → 𝒮 is an interpolation between the given pair of source examples S₁ and S₂ such that for any λ ∈ [0, 1], d(S₁, S_λ) + d(S_λ, S₂) = d(S₁, S₂) holds, where S_λ = f_{S₁→S₂}(λ) is the interpolant.

We say that Definition 1 ensures the shortest-path property because the triangle inequality holds for any properly defined distance d: d(S₁, S_λ) + d(S_λ, S₂) ≥ d(S₁, S₂). The intuition behind this definition is that the shortest-path property ensures the uniqueness of the label distribution on the interpolated data. To put it otherwise, when computing interpolants from different sources, the interpolants generated by a shortest-path interpolation are more likely to be discriminative than those generated by a non-shortest-path interpolation.

Figure 2: Intuition of shortest-path interpolation. The examples live in a metric space (𝒮, d), shown as dots in the figure. The dashed lines are the interpolation paths between different pairs of examples. When the shortest-path property is ensured (left), the interpolation paths from different pairs of source examples are unlikely to intersect in a complicated metric space. In a non-shortest-path interpolation (right), the paths can intertwine with each other with a much higher probability, making it hard to tell which pair of source examples the mixed data comes from.

To define an interpolation for point clouds, therefore, we must first select a reasonable distance metric. We then opt for the shortest-path interpolation function based on the selected distance metric. For point clouds a proper distance metric is the Earth Mover's Distance (EMD), as it captures well not only the geometry between two point clouds, but also local details and density distributions [5, 1, 13]. The EMD measures the least amount of total displacement required for each of the points in the first point cloud, S₁ = {p_i}, to match a corresponding point in the second point cloud, S₂ = {q_i}. Formally, the EMD for point clouds solves the following assignment problem:

φ* = arg min_{φ ∈ Φ} Σ_i ‖p_i − q_{φ(i)}‖₂,  (3)

where Φ is the set of possible bijective assignments, which give one-to-one correspondences between points in the two point clouds. Given the optimal assignment φ*, the EMD is then defined as the average effort to move points p_i to q_{φ*(i)}:

d_EMD(S₁, S₂) = (1/N) Σ_i ‖p_i − q_{φ*(i)}‖₂.  (4)
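For equal-size point clouds, the assignment problem of Eq. 3 can be solved exactly with the Hungarian algorithm; a minimal NumPy/SciPy sketch is given below. The paper itself adapts a faster approximate solver from [13], so this exact version is for illustration only, and the function name is our own:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def emd_assignment(X, Y):
    """Optimal bijection phi* between two (N, 3) point clouds (Eq. 3)
    and the resulting EMD (Eq. 4), via the exact Hungarian algorithm."""
    # Pairwise Euclidean distances: cost[i, j] = ||X[i] - Y[j]||_2.
    cost = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    rows, phi = linear_sum_assignment(cost)  # phi[i] is the match of X[i]
    emd = cost[rows, phi].mean()             # average moving effort (Eq. 4)
    return phi, emd
```

A sanity check on the metric: matching a point cloud against a permuted copy of itself costs nothing, since a zero-displacement bijection exists.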

3.3 PointMixup: Optimal assignment interpolation for point clouds

We propose an interpolation strategy that can be used for augmentation analogous to Mixup [34], but for point clouds. We refer to this proposed PointMixup as Optimal Assignment (OA) interpolation, as it relies on the optimal assignment on the basis of the EMD to define the interpolation between clouds. Given the source pair of point clouds S₁ = {p_i} and S₂ = {q_i}, the Optimal Assignment (OA) interpolation is a path function f_{S₁→S₂}^{OA}: [0, 1] → 𝒮. With λ ∈ [0, 1],

f_{S₁→S₂}^{OA}(λ) = {u_i}_{i=1}^{N},  (5)
u_i = (1 − λ) · p_i + λ · q_{φ*(i)},  (6)

in which φ* is the optimal assignment from S₁ to S₂ defined by Eq. 3. The interpolant S_λ^{OA} (or S_λ when there is no confusion) generated by the OA interpolation path function is then the required augmented data for point cloud Mixup:

S_λ = f_{S₁→S₂}^{OA}(λ).  (7)

Viewing f_{S₁→S₂}^{OA} as a path function in the metric space (𝒮, d_EMD), it is expected to be the shortest path joining S₁ and S₂, since the definition of the interpolation is induced from the EMD.
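Putting Eqs. 3, 5 and 6 together, the OA interpolation can be sketched as follows. The function name is illustrative and we use an exact assignment solver for clarity, whereas the paper uses an approximate EMD solver:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def point_mixup(X1, X2, lam):
    """PointMixup (Eqs. 5-7): mix two (N, 3) point clouds along the
    EMD-optimal assignment; lam in [0, 1] is the interpolation ratio."""
    # Optimal assignment X1 -> X2 (Eq. 3), solved exactly here.
    cost = np.linalg.norm(X1[:, None, :] - X2[None, :, :], axis=-1)
    _, phi = linear_sum_assignment(cost)
    # Each point moves a fraction lam along its matched straight-line path (Eq. 6).
    return (1.0 - lam) * X1 + lam * X2[phi]
```

At λ = 0 the interpolant is S₁ itself, and at λ = 1 it is S₂ up to a permutation of the points, which is the same set.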

3.4 Analysis

Intuitively we expect that PointMixup is a shortest-path linear interpolation. That is, the interpolation lies on the shortest path joining the source pairs, and the interpolation is linear with regard to λ in (𝒮, d_EMD), since the definition of the interpolation is derived from the EMD. However, it is non-trivial to show that the optimal assignment interpolation abides by a shortest-path linear interpolation, because the optimal assignment between the mixed point cloud and either of the source point clouds is unknown. It is, therefore, not obvious whether there exists a shorter path between the mixed examples and the source examples. To this end, we provide an in-depth analysis.

To ensure the uniqueness of the label distribution from the mixed data, we need to show that the shortest-path property w.r.t. the EMD is fulfilled. Moreover, we need to show that the proposed interpolation is linear w.r.t. the EMD, in order to ensure that the input interpolation has the same ratio as the label interpolation. In addition, we establish the assignment invariance property as a prerequisite for the proof of linearity. This property implies that there exists no shorter path between interpolants with different λ, i.e., the shortest path between the interpolants is part of the shortest path between the source examples. Due to space limitations, we sketch the proof for each property. The complete proofs are available in the supplementary material.

We start with the shortest-path property. Since the EMD for point clouds is a metric, the triangle inequality holds (a formal proof can be found in [19]). Thus we formalize the shortest-path property into the following proposition:

Property 1 (shortest path)

Given the source examples S₁ and S₂, d_EMD(S₁, S_λ) + d_EMD(S_λ, S₂) = d_EMD(S₁, S₂).

Sketch of Proof    From the definition of the EMD we can derive d_EMD(S₁, S_λ) ≤ λ · d_EMD(S₁, S₂) and d_EMD(S_λ, S₂) ≤ (1 − λ) · d_EMD(S₁, S₂). Then, from the triangle inequality of the EMD, only the equality remains. ∎

We then introduce the assignment invariance property of OA Mixup as an intermediate step for the proof of its linearity. The property shows that the assignment does not change for any pair of mixed point clouds with different λ. Moreover, the assignment invariance property implies that the shortest path between any two mixed point clouds is part of the shortest path between the two source point clouds.

Property 2 (assignment invariance)

Let S_{λ₁} and S_{λ₂} be two mixed point clouds from the same given source pair of examples S₁ and S₂, with mix ratios λ₁ and λ₂ such that 0 ≤ λ₁ < λ₂ ≤ 1. Let the points in S_{λ₁} and S_{λ₂} be u_i = (1 − λ₁) · p_i + λ₁ · q_{φ*(i)} and v_i = (1 − λ₂) · p_i + λ₂ · q_{φ*(i)}, where φ* is the optimal assignment from S₁ to S₂. Then the identical assignment u_i ↦ v_i is the optimal assignment from S_{λ₁} to S_{λ₂}.

Sketch of Proof    We first prove that the identical mapping is the optimal assignment from S₁ to S_{λ₁}, from the definition of the EMD. Then we prove that φ* induces the optimal assignment from S_{λ₁} to S₂. Finally, we prove that the identical mapping is the optimal assignment from S_{λ₁} to S_{λ₂}, similarly to the proof of the first intermediate argument. ∎

Given the property of assignment invariance, the linearity follows:

Property 3 (linearity)

For any mix ratios λ₁ and λ₂ such that 0 ≤ λ₁ < λ₂ ≤ 1, the mixed point clouds S_{λ₁} and S_{λ₂} satisfy d_EMD(S_{λ₁}, S_{λ₂}) = (λ₂ − λ₁) · d_EMD(S₁, S₂).

Sketch of Proof    The proof follows directly from the fact that the identical mapping is the optimal assignment between S_{λ₁} and S_{λ₂} (Property 2). ∎

The linear property of our interpolation is important, as we jointly interpolate the point clouds and the labels. By ensuring that the point cloud interpolation is linear, we ensure that the input interpolation has the same ratio as the label interpolation.

On the basis of these properties, we conclude that PointMixup is a shortest-path linear interpolation between point clouds in (𝒮, d_EMD).

3.5 Manifold PointMixup: Interpolate between latent point features

In standard PointMixup, only the inputs, i.e., the XYZ point cloud coordinates, are mixed. The input XYZ coordinates carry low-level geometric information and are sensitive to disturbances and transformations, which in turn limits the robustness of PointMixup. Inspired by Manifold Mixup [26], we can also use the proposed interpolation to mix the latent representations in the hidden layers of point cloud networks, which are trained to capture salient and high-level information that is less sensitive to transformations. PointMixup can thus be applied in the spirit of Manifold Mixup, mixing both the XYZ coordinates and different levels of latent point cloud features while maintaining their respective advantages, which is expected to be a stronger regularizer for improved performance and robustness.

We describe how to mix the latent representations. Following [26], at each batch we randomly select a layer at which to perform PointMixup from a set of layers that includes the input layer. In a point cloud network, the intermediate latent representation at a layer (before the global aggregation stage, such as the max-pooling aggregation in PointNet [1] and PointNet++ [2]) is a set of pairs {(p_i, f_i)}, in which p_i is a 3D point coordinate and f_i is the corresponding high-dimensional feature. Given the latent representations of two source examples {(p_i, f_i)} and {(q_i, g_i)}, the optimal assignment φ* is obtained from the 3D point coordinates, and the mixed latent representation then becomes {((1 − λ) · p_i + λ · q_{φ*(i)}, (1 − λ) · f_i + λ · g_{φ*(i)})}.

Specifically in PointNet++, three layers of representations are randomly selected to perform Manifold Mixup: the input, and the representations after the first and the second SA modules (See appendix of [2]).
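A sketch of the latent-feature variant, assuming each layer exposes its per-point coordinates and features as arrays; the helper name is hypothetical, the assignment is computed from the 3D coordinates only (as described above), and an exact solver stands in for the paper's approximate one:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def manifold_point_mixup(P1, F1, P2, F2, lam):
    """Mix latent point representations {(p_i, f_i)} and {(q_i, g_i)}:
    the assignment comes from the 3D coordinates only, then both
    coordinates and features are interpolated with the same ratio."""
    cost = np.linalg.norm(P1[:, None, :] - P2[None, :, :], axis=-1)
    _, phi = linear_sum_assignment(cost)        # phi* from coordinates
    P = (1.0 - lam) * P1 + lam * P2[phi]        # mixed coordinates
    F = (1.0 - lam) * F1 + lam * F2[phi]        # mixed features, same phi*
    return P, F
```

Using the same assignment and ratio for coordinates and features keeps the latent interpolation consistent with the label interpolation.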

4 Experiments

4.1 Setup

Datasets. We focus in our experiments on the ModelNet40 dataset [31]. This dataset contains 12,311 CAD models from 40 man-made object categories, split into 9,843 for training and 2,468 for testing. We furthermore perform experiments on the ScanObjectNN dataset [25]. This dataset consists of real-world point cloud objects, rather than sampled virtual point clouds. The dataset consists of 2,902 objects and 15 categories. We report on two variants of the dataset: a standard variant, OBJ_ONLY, and one with heavy perturbations from rigid transformations, PB_T50_RS [25].

Following [12], we discriminate between settings where each dataset is pre-aligned or unaligned with horizontal rotation on training and test point cloud examples. For the unaligned settings, we randomly rotate the training point clouds along the up-axis. Then, before solving the optimal assignment, we perform a simple additional alignment step to fit and align the symmetry axes between the two point clouds to be mixed. In this way, the point clouds are better aligned and we obtain more reasonable point correspondences. Last, we also perform experiments using only 20% of the training data.

Network architectures. The main network architecture used throughout the paper is PointNet++ [2]. We also report results with PointNet [1] and DGCNN [29], to show that our approach is agnostic to the architecture that is employed. PointNet learns a permutation-invariant set function, which does not capture local structures induced by the metric space the points live in. PointNet++ is a hierarchical structure, which segments a point cloud into smaller clusters and applies PointNet locally. DGCNN performs hierarchical operations by selecting a local neighbor in the feature space instead of the point space, resulting in each point having different neighborhoods in different layers.

Experimental details. We uniformly sample 1,024 points on the mesh faces according to the face area and normalize them to be contained in a unit sphere, which is a standard setting [1, 2, 12]. In case of mixing clouds with a different number of points, we can simply replicate random elements from each point set to reach the same cardinality. During training, we augment the point clouds on-the-fly with random jitter for each point using Gaussian noise with zero mean and 0.02 standard deviation. We implement our approach in PyTorch [14]. For network optimization, we use the Adam optimizer. The model is trained for 300 epochs with a batch size of 16. We follow previous work [34, 26] and draw λ from a beta distribution Beta(γ, γ). We also perform Manifold Mixup [26] in our approach, through interpolation on the transformed and pooled points in intermediate network layers. We opt to use an efficient algorithm and adapt the open-source implementation from [13] to solve the optimal assignment approximation. Training for 300 epochs takes around 17 hours without augmentation and around 19 hours with PointMixup or Manifold PointMixup on a single NVIDIA GTX 1080 Ti.
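The cardinality-equalization and jitter steps above can be sketched as follows; `equalize_and_jitter` is a hypothetical helper illustrating the described preprocessing, not a function from the paper's released code:

```python
import numpy as np

def equalize_and_jitter(X, n, sigma=0.02, rng=None):
    """Replicate random points so the (len(X), 3) cloud X has exactly
    n points, then add Gaussian jitter with std sigma (0.02 above)."""
    rng = rng if rng is not None else np.random.default_rng()
    if len(X) < n:
        # Replicate randomly chosen points to reach the target cardinality.
        extra = rng.choice(len(X), size=n - len(X), replace=True)
        X = np.concatenate([X, X[extra]], axis=0)
    # On-the-fly jitter: zero-mean Gaussian noise per point.
    return X[:n] + rng.normal(0.0, sigma, size=(n, 3))
```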

Figure 3: Baseline interpolation variants. Top: point cloud interpolation through random assignment. Bottom: interpolation through sampling.

Baseline interpolations. For our comparisons to baseline point cloud augmentations, we compare to two variants. The first variant is random assignment interpolation, where a random bijective assignment φ_R is used to connect points from both sets, yielding:

S_λ^{RA} = {(1 − λ) · p_i + λ · q_{φ_R(i)}}_{i=1}^{N}.

The second variant is point sampling interpolation, where random draws of points without replacement are made from each set according to the sampling frequency λ:

S_λ^{PS} = S₁* ∪ S₂*,

where S₁* denotes a randomly sampled subset of S₁ with ⌊(1 − λ) · N⌋ elements (⌊·⌋ is the floor function), and similarly S₂* ⊂ S₂ with N − ⌊(1 − λ) · N⌋ elements, such that S_λ^{PS} contains exactly N points. The intuition behind the point sampling variant is that for point clouds as unordered sets, one can move one point cloud towards another through a set operation that removes several random elements from set S₁ and replaces them with the same number of elements from S₂.
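Both baselines follow directly from their definitions; the function names below are illustrative:

```python
import numpy as np

def random_assignment_mixup(X1, X2, lam, rng=None):
    """Baseline 1: mix along a random bijection instead of the optimal one."""
    rng = rng if rng is not None else np.random.default_rng()
    phi = rng.permutation(len(X1))                 # random bijective assignment
    return (1.0 - lam) * X1 + lam * X2[phi]

def point_sampling_mixup(X1, X2, lam, rng=None):
    """Baseline 2: draw floor((1-lam)*N) points from X1 and the remaining
    points from X2, without replacement, so the union has exactly N points."""
    rng = rng if rng is not None else np.random.default_rng()
    n1 = int(np.floor((1.0 - lam) * len(X1)))      # points kept from X1
    idx1 = rng.choice(len(X1), size=n1, replace=False)
    idx2 = rng.choice(len(X2), size=len(X1) - n1, replace=False)
    return np.concatenate([X1[idx1], X2[idx2]], axis=0)
```

Neither baseline satisfies the shortest-path property of Definition 1, which is the point of the comparison in Table 1.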

4.2 Point cloud classification ablations

We perform four ablation studies to show the workings of our approach with respect to the interpolation ratio, comparisons to baseline interpolations and other regularizations, as well as robustness to noise.

Figure 4: Effect of interpolation ratios. MM denotes Manifold Mixup.

Effect of interpolation ratio. The first ablation study focuses on the effect of the interpolation ratio in the data augmentation for point cloud classification. We perform this study on ModelNet40 using the PointNet++ architecture. The results are shown in Fig. 4 for the pre-aligned setting. We find that regardless of the interpolation ratio used, our approach provides a boost over the setting without augmentation by interpolation. PointMixup positively influences point cloud classification. The inclusion of manifold mixup adds a further boost to the scores. Throughout further experiments, we use the best-performing ratio settings for input mixup and manifold mixup in the unaligned and pre-aligned settings respectively.

Comparison to baseline interpolations. In the second ablation study, we investigate the effectiveness of our PointMixup compared to the two interpolation baselines. We again use ModelNet40 and PointNet++. We perform the evaluation on both the pre-aligned and unaligned dataset variants, where for both we also report results with a reduced training set. The results are shown in Table 1. Across both the alignment variants and dataset sizes, our PointMixup obtains favorable results. This result highlights the effectiveness of our approach, which abides by the shortest-path linear interpolation definition, while the baselines do not.

                   No mixup | Random assignment | Point sampling | PointMixup
Manifold mixup        –     |    ✗        ✓     |   ✗       ✓    |   ✗      ✓
Full dataset
  Unaligned         90.7    |   90.8     91.1   |  90.9    91.4  |  91.3   91.7
  Pre-aligned       91.9    |   91.6     91.9   |  92.2    92.5  |  92.3   92.7
Reduced dataset
  Unaligned         84.4    |   84.8     85.4   |  85.7    86.5  |  86.1   86.6
  Pre-aligned       86.1    |   85.5     87.3   |  87.2    87.6  |  87.6   88.6
Table 1: Comparison of PointMixup to baseline interpolations on ModelNet40 using PointNet++, without (✗) and with (✓) manifold mixup. PointMixup compares favorably to excluding interpolation and to the baselines, highlighting the benefits of our shortest-path interpolation solution.

Regularizer (reduced dataset)     Accuracy
Baseline with no mixing             86.1
Mixup                               87.6
Manifold mixup                      88.6
Mix input, not labels               86.6
Mix input from same class           86.4
Mixup latent (layer 1)              86.9
Mixup latent (layer 2)              86.8
Label smoothing (0.1)               87.2
Label smoothing (0.2)               87.3

Transform              No mixup   PointMixup w/o MM   PointMixup w/ MM
Noise                    91.3           91.9                92.3
Noise                    35.1           51.5                56.5
Noise                     4.03           4.27                7.42
Z-rotation [-30,30]      74.3           70.9                77.8
X-rotation [-30,30]      73.2           70.8                76.8
Y-rotation [-30,30]      87.6           87.9                88.7
Scale (0.6)              85.8           84.5                86.3
Scale (2.0)              59.2           67.7                72.9
DropPoint (0.2)          84.9           78.1                90.9

Table 2: Evaluating our approach against other data augmentations (left) and its robustness to noise and transformations (right); the three Noise rows correspond to increasing noise magnitudes. We find that our approach with manifold mixup (MM) outperforms augmentations such as label smoothing and other variations of mixup. For the robustness evaluation, we find that our approach, with the strong regularization power of manifold mixup, provides more robustness to random noise and geometric transformations.

PointMixup with other regularizers. Third, we evaluate how well PointMixup works in comparison to multiple existing data regularizers and mixup variants, again on ModelNet40 with PointNet++. We investigate the following augmentations: (i) Mixup [34], (ii) Manifold Mixup [26], (iii) mixing the input only, without target mixup, (iv) mixing the latent representation at a fixed layer (manifold mixup does so at random layers), and (v) label smoothing [22]. Training is performed on the reduced dataset to better highlight their differences. We show the results in Table 2 on the left. Our approach with manifold mixup obtains the highest scores. The label smoothing regularizer is outperformed, while we also obtain better scores than the mixup variants. We conclude that PointMixup forms an effective data augmentation for point clouds.

Robustness to noise. By adding augmented training examples, we enrich the dataset. This enrichment comes with additional robustness to noise in the point clouds. We evaluate the robustness by adding random noise perturbations on point location, scale, translation and different rotations. Note that for the evaluation of robustness against up-axis rotation, we use the models trained in the pre-aligned setting, in order to also test the performance against rotation along the up-axis as a novel transform. The results are in Table 2 on the right. Overall, our approach including manifold mixup provides more stability across all perturbations. For example, with strong additional noise, we obtain an accuracy of 56.5, compared to 35.1 for the baseline. We observe similar trends for scaling (with a factor of two), with an accuracy of 72.9 versus 59.2. We conclude that PointMixup makes point cloud networks such as PointNet++ more stable to noise and rigid transformations.

Figure 5: Qualitative examples of PointMixup. We provide eight visualizations of our interpolation. The four examples on the left show interpolations for different configurations of cups and tables. The four examples on the right show interpolations for different chairs and cars.

Qualitative analysis. In Figure 5, we show eight examples of PointMixup for point cloud interpolation: four interpolations of cups and tables, and four interpolations of chairs and cars. Through our shortest-path interpolation, we arrive at new training examples that exhibit characteristics of both classes, making for sensible point clouds and mixed labels, which in turn indicates why PointMixup is beneficial for point cloud classification.

4.3 Evaluation on other networks and datasets

With PointMixup, new point clouds are generated by interpolating existing point clouds. As such, we are agnostic to the type of network or dataset. To highlight this ability, we perform additional experiments on extra networks and an additional point cloud dataset.

ModelNet40        PointNet              DGCNN
             No mixup  PointMixup   No mixup  PointMixup  w/ MM
Full           89.2       89.9        92.7       92.9      93.1
Reduced        81.3       83.4        88.2       88.8      89.0

ScanObjectNN   No mixup   PointMixup w/o MM   PointMixup w/ MM
Standard         86.6           87.6                88.5
Perturbed        79.3           80.2                80.6

Table 3: PointMixup on other networks (left) and another dataset (right). We find our approach is beneficial regardless of the network or dataset.

PointMixup on other network architectures. We show the effect of PointMixup on two other networks, namely PointNet [1] and DGCNN [29]. The experiments are performed on ModelNet40. For PointNet, we perform the evaluation in the unaligned setting, and for DGCNN in the pre-aligned setting, to remain consistent with the alignment choices made in the respective papers. The results are shown in Table 3 on the left. We find improvements when including PointMixup for both network architectures.

PointMixup on real-world point clouds. We also investigate PointMixup on point clouds from real-world object scans, using ScanObjectNN [25], which collects objects from 3D scenes in SceneNN [9] and ScanNet [4]. Here, we rely on PointNet++ as the network. The results in Table 3 on the right show that we can adequately deal with real-world point cloud scans; hence we are not restricted to point clouds from virtual scans. This result is in line with the experiments on point cloud perturbations.

4.4 Beyond standard classification

The fewer training examples available, the stronger the need for additional examples through augmentation. Hence, we train PointNet++ on ModelNet40 in both a few-shot and semi-supervised setting.

Semi-supervised learning. Semi-supervised learning learns from a dataset where only a small portion of the data is labeled. Here, we show how PointMixup directly enables semi-supervised learning for point clouds. We start from Interpolation Consistency Training [27], a state-of-the-art semi-supervised approach, which utilizes Mixup between unlabeled points. Here, we use our Mixup for point clouds within their semi-supervised approach. We evaluate on ModelNet40 using 400, 600, and 800 labeled point clouds. The results of semi-supervised learning are shown in Table 4 on the left. Compared to the supervised baseline, which only uses the available labelled examples, our mixup enables the use of additional unlabelled training examples, resulting in a clear boost in scores. With 800 labelled examples, the accuracy increases from 73.5% to 82.0%, highlighting the effectiveness of PointMixup in a semi-supervised setting.

Semi-supervised classification   Supervised   [27] + PointMixup
400 examples                        69.4            76.7
600 examples                        72.6            80.8
800 examples                        73.5            82.0

Few-shot classification    [3]    [3] + PointMixup
5-way 1-shot               72.3         77.2
5-way 3-shot               80.2         82.2
5-way 5-shot               84.2         85.9

Table 4: Evaluating PointMixup in the context of semi-supervised (left) and few-shot learning (right). When examples are scarce, as is the case for both settings, using our approach provides a boost to the scores.

Few-shot learning. Few-shot classification aims to learn a classifier that recognizes classes unseen during training from limited examples. We follow [28, 18, 3, 6, 21] and regard few-shot learning as a typical meta-learning problem, which learns how to learn from limited labeled data by training on a collection of tasks, i.e., episodes. In an N-way K-shot setting, in each task, N classes are selected, K examples for each class are given as a support set, and the query set consists of the examples to be predicted. We perform few-shot classification on ModelNet40, from which we select 20 classes for training, 10 for validation, and 10 for testing. We utilize PointMixup within ProtoNet [3] by constructing mixed examples from the support set and updating the model with the mixed examples before making predictions on the query set. We refer to the supplementary material for the details of our method and the settings. The results in Table 4 on the right show that incorporating our data augmentation provides a boost in scores, especially in the one-shot setting, where the accuracy increases from 72.3% to 77.2%.

5 Conclusion

This work proposes PointMixup for data augmentation on point clouds. Given the lack of data augmentation by interpolation on point clouds, we start by defining it as a shortest path linear interpolation. We show how to obtain PointMixup between two point clouds by means of an optimal assignment interpolation between their point sets. As such, we arrive at a Mixup for point clouds, or for latent point cloud representations in the sense of Manifold Mixup, that respects the permutation-invariant nature of point sets. We first prove that PointMixup abides by our shortest path linear interpolation definition. Then, we show through various experiments that PointMixup matters for point cloud classification: our approach outperforms baseline interpolations and regularizers, shows increased robustness to noise and geometric transformations, and applies generally to point-based networks and datasets. Lastly, we show the potential of our approach in both semi-supervised and few-shot settings. The generic nature of PointMixup makes it broadly applicable in point cloud classification.

Acknowledgment This research was supported in part by the SAVI/MediFor and the NWO VENI What & Where projects. We thank the anonymous reviewers for helpful comments and suggestions.

References

  • [1] Achlioptas, P., Diamanti, O., Mitliagkas, I., Guibas, L.: Learning representations and generative models for 3d point clouds. In: ICML (2018)
  • [2] Berthelot, D., Carlini, N., Cubuk, E.D., Kurakin, A., Sohn, K., Zhang, H., Raffel, C.: Remixmatch: Semi-supervised learning with distribution alignment and augmentation anchoring. In: CoRR (2019)
  • [3] Berthelot, D., Carlini, N., Goodfellow, I., Papernot, N., Oliver, A., Raffel, C.A.: Mixmatch: A holistic approach to semi-supervised learning. In: NeurIPS (2019)
  • [4] Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., Nießner, M.: Scannet: Richly-annotated 3d reconstructions of indoor scenes. In: CVPR (2017)
  • [5] Fan, H., Su, H., Guibas, L.J.: A point set generation network for 3d object reconstruction from a single image. In: CVPR (2017)
  • [6] Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: ICML (2017)
  • [7] French, G., Aila, T., Laine, S., Mackiewicz, M., Finlayson, G.: Consistency regularization and cutmix for semi-supervised semantic segmentation. In: CoRR (2019)
  • [8] Guo, H., Mao, Y., Zhang, R.: Mixup as locally linear out-of-manifold regularization. In: AAAI (2019)
  • [9] Hua, B.S., Pham, Q.H., Nguyen, D.T., Tran, M.K., Yu, L.F., Yeung, S.K.: Scenenn: A scene meshes dataset with annotations. In: 3DV (2016)
  • [10] Li, J., Chen, B.M., Hee Lee, G.: So-net: Self-organizing network for point cloud analysis. In: CVPR. pp. 9397–9406 (2018)
  • [11] Li, R., Li, X., Heng, P.A., Fu, C.W.: Pointaugment: an auto-augmentation framework for point cloud classification. In: CVPR. pp. 6378–6387 (2020)
  • [12] Li, Y., Bu, R., Sun, M., Wu, W., Di, X., Chen, B.: Pointcnn: Convolution on x-transformed points. In: NeurIPS (2018)
  • [13] Liu, M., Sheng, L., Yang, S., Shao, J., Hu, S.M.: Morphing and sampling network for dense point cloud completion. In: CoRR (2019)
  • [14] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., Chintala, S.: Pytorch: An imperative style, high-performance deep learning library. In: NeurIPS (2019)
  • [15] Qi, C.R., Su, H., Mo, K., Guibas, L.J.: Pointnet: Deep learning on point sets for 3d classification and segmentation. In: CVPR (2017)
  • [16] Qi, C.R., Yi, L., Su, H., Guibas, L.J.: Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In: NeurIPS (2017)
  • [17] Ravanbakhsh, S., Schneider, J., Poczos, B.: Deep learning with sets and point clouds. arXiv preprint arXiv:1611.04500 (2016)
  • [18] Ravi, S., Larochelle, H.: Optimization as a model for few-shot learning. In: ICLR (2016)
  • [19]

    Rubner, Y., Tomasi, C., Guibas, L.J.: The earth mover’s distance as a metric for image retrieval. IJCV (2000)

  • [20] Snell, J., Swersky, K., Zemel, R.: Prototypical networks for few-shot learning. In: NeurIPS (2017)
  • [21] Sung, F., Yang, Y., Zhang, L., Xiang, T., Torr, P.H., Hospedales, T.M.: Learning to compare: Relation network for few-shot learning. In: CVPR (2018)
  • [22]

    Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: CVPR (2016)

  • [23] Thomas, H., Qi, C.R., Deschaud, J.E., Marcotegui, B., Goulette, F., Guibas, L.J.: Kpconv: Flexible and deformable convolution for point clouds. In: ICCV. pp. 6411–6420 (2019)
  • [24] Tokozume, Y., Ushiku, Y., Harada, T.: Between-class learning for image classification. In: CVPR (2018)
  • [25] Uy, M.A., Pham, Q.H., Hua, B.S., Nguyen, T., Yeung, S.K.: Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data. In: ICCV (2019)
  • [26] Verma, V., Lamb, A., Beckham, C., Najafi, A., Mitliagkas, I., Courville, A., Lopez-Paz, D., Bengio, Y.: Manifold mixup: Better representations by interpolating hidden states. In: ICML (2019)
  • [27] Verma, V., Lamb, A., Kannala, J., Bengio, Y., Lopez-Paz, D.: Interpolation consistency training for semi-supervised learning. IJCAI (2019)
  • [28] Vinyals, O., Blundell, C., Lillicrap, T., Wierstra, D., et al.: Matching networks for one shot learning. In: NeurIPS (2016)
  • [29] Wang, Y., Sun, Y., Liu, Z., Sarma, S.E., Bronstein, M.M., Solomon, J.M.: Dynamic graph cnn for learning on point clouds. TOG (2019)
  • [30] Wu, W., Qi, Z., Fuxin, L.: Pointconv: Deep convolutional networks on 3d point clouds. In: CVPR (2019)
  • [31] Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., Xiao, J.: 3d shapenets: A deep representation for volumetric shapes. In: CVPR (2015)
  • [32] Xu, Y., Fan, T., Xu, M., Zeng, L., Qiao, Y.: Spidercnn: Deep learning on point sets with parameterized convolutional filters. In: ECCV (2018)
  • [33] Zaheer, M., Kottur, S., Ravanbakhsh, S., Poczos, B., Salakhutdinov, R.R., Smola, A.J.: Deep sets. In: NeurIPS. pp. 3391–3401 (2017)
  • [34] Zhang, H., Cisse, M., Dauphin, Y.N., Lopez-Paz, D.: mixup: Beyond empirical risk minimization. In: ICLR (2017)
  • [35] Zhang, Z., He, T., Zhang, H., Zhang, Z., Xie, J., Li, M.: Bag of freebies for training object detection neural networks. In: CoRR (2019)
  • [36]

    Zhang, Z., Hua, B.S., Yeung, S.K.: Shellnet: Efficient point cloud convolutional neural networks using concentric shells statistics. In: ICCV (2019)

Appendices

Appendix 0.A Proofs for the properties of PointMixup interpolation

We provide detailed proofs for the shortest path property, the assignment invariance property and the linearity, stated in Section 3.4.

Proof for the shortest path property    We denote by $x_i$ and $y_j$ the points in $S_1$ and $S_2$. The generated mixed cloud is $S^{\mathrm{mix}}_\lambda = \{u_i\}$ with $u_i = (1-\lambda)\, x_i + \lambda\, y_{\phi^*(i)}$, where $\phi^*$ is the optimal assignment from $S_1$ to $S_2$.

We then consider the identical one-to-one mapping $\phi_I$ such that $\phi_I(i) = i$, which assigns $x_i$ to $u_i$. By definition of the EMD as the minimum transportation distance,

$d_{\mathrm{EMD}}(S_1, S^{\mathrm{mix}}_\lambda) \le \sum_i \lVert x_i - u_i \rVert_2,$    (8)

where the right term of (8) is the transportation distance under the identical assignment $\phi_I$. Since $\lVert x_i - u_i \rVert_2 = \lambda \lVert x_i - y_{\phi^*(i)} \rVert_2$, the right term of (8) equals $\lambda\, d_{\mathrm{EMD}}(S_1, S_2)$. Thus,

$d_{\mathrm{EMD}}(S_1, S^{\mathrm{mix}}_\lambda) \le \lambda\, d_{\mathrm{EMD}}(S_1, S_2).$    (9)

Similarly as in (8) and (9), the following inequality (10) can be derived by assigning each $u_i$ to $y_{\phi^*(i)}$, i.e., the correspondence from $S^{\mathrm{mix}}_\lambda$ to $S_2$ induced by $\phi^*$:

$d_{\mathrm{EMD}}(S^{\mathrm{mix}}_\lambda, S_2) \le (1-\lambda)\, d_{\mathrm{EMD}}(S_1, S_2).$    (10)

With (9) and (10),

$d_{\mathrm{EMD}}(S_1, S^{\mathrm{mix}}_\lambda) + d_{\mathrm{EMD}}(S^{\mathrm{mix}}_\lambda, S_2) \le d_{\mathrm{EMD}}(S_1, S_2).$    (11)

However, the triangle inequality holds for the EMD, i.e.,

$d_{\mathrm{EMD}}(S_1, S^{\mathrm{mix}}_\lambda) + d_{\mathrm{EMD}}(S^{\mathrm{mix}}_\lambda, S_2) \ge d_{\mathrm{EMD}}(S_1, S_2).$    (12)

Then by combining (11) and (12), $d_{\mathrm{EMD}}(S_1, S^{\mathrm{mix}}_\lambda) + d_{\mathrm{EMD}}(S^{\mathrm{mix}}_\lambda, S_2) = d_{\mathrm{EMD}}(S_1, S_2)$ is proved. ∎
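The shortest path equality can be checked numerically: for two clouds of equal size, the EMD reduces to a linear assignment problem, which `scipy.optimize.linear_sum_assignment` solves exactly. This is an illustrative sketch, not the implementation used in the experiments.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def emd(a, b):
    """Exact EMD between two equal-size clouds via optimal assignment."""
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    r, c = linear_sum_assignment(cost)
    return cost[r, c].sum(), c

rng = np.random.default_rng(1)
S1, S2 = rng.normal(size=(2, 64, 3))
lam = 0.3

d12, phi = emd(S1, S2)
S_mix = (1 - lam) * S1 + lam * S2[phi]  # optimal-assignment interpolation

d1m, _ = emd(S1, S_mix)
dm2, _ = emd(S_mix, S2)
# Shortest path: the two legs sum to the full distance, split lam : 1-lam.
assert abs(d1m + dm2 - d12) < 1e-6
assert abs(d1m - lam * d12) < 1e-6
```

The two assertions correspond to equations (11)/(12) and the equality case of (9).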

Proof for the assignment invariance property    We introduce two intermediate arguments. We begin with proving the first intermediate argument: the identical mapping $\phi_I$ is the optimal assignment from $S_1$ to $S^{\mathrm{mix}}_\lambda$. As in (9), (10) and (12) from the proof of Proposition 1, in order to satisfy all three inequalities simultaneously, the equal sign must be taken in each of them. Since the equal sign being taken in (9) is equivalent to the equal sign being taken in (8),

$d_{\mathrm{EMD}}(S_1, S^{\mathrm{mix}}_\lambda) = \sum_i \lVert x_i - u_i \rVert_2,$    (13)

which in turn means that $\phi_I$ is the optimal assignment from $S_1$ to $S^{\mathrm{mix}}_\lambda$, by the definition of the EMD. So the first intermediate argument is proved.

The second intermediate argument is that $\phi^*$, assigning $u_i$ to $y_{\phi^*(i)}$, is the optimal assignment from $S^{\mathrm{mix}}_\lambda$ to $S_2$. This argument can be proved in the same way as the first one: the equal sign being taken in (10) is equivalent to

$d_{\mathrm{EMD}}(S^{\mathrm{mix}}_\lambda, S_2) = \sum_i \lVert u_i - y_{\phi^*(i)} \rVert_2.$    (14)

Thus, that $\phi^*$ is the optimal assignment from $S^{\mathrm{mix}}_\lambda$ to $S_2$ is proved.

Then, with the two intermediate arguments, we can reformalize the setup and regard $S^{\mathrm{mix}}_{\lambda_2}$ as interpolated from the source pair $S^{\mathrm{mix}}_{\lambda_1}$ and $S_2$ with mix ratio $\frac{\lambda_2-\lambda_1}{1-\lambda_1}$, because the optimal assignment from $S^{\mathrm{mix}}_{\lambda_1}$ to $S_2$ is the same as the optimal assignment from $S_1$ to $S_2$. This setup is then isomorphic to the first intermediate argument, so we can prove that the identical mapping is the optimal assignment from $S^{\mathrm{mix}}_{\lambda_1}$ to $S^{\mathrm{mix}}_{\lambda_2}$ in the same way as the first intermediate argument. ∎

Proof for linearity    We have shown that the identical mapping is the optimal assignment between $S^{\mathrm{mix}}_{\lambda_1}$ and $S^{\mathrm{mix}}_{\lambda_2}$. Thus, for $\lambda_2 \ge \lambda_1$, $d_{\mathrm{EMD}}(S^{\mathrm{mix}}_{\lambda_1}, S^{\mathrm{mix}}_{\lambda_2}) = \sum_i \lVert u^{\lambda_1}_i - u^{\lambda_2}_i \rVert_2 = (\lambda_2 - \lambda_1)\, d_{\mathrm{EMD}}(S_1, S_2)$. ∎
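The linearity property can likewise be verified numerically, using scipy's exact linear assignment as the EMD solver for equal-size clouds (illustrative code, not the paper's implementation):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def emd(a, b):
    """Exact EMD between two equal-size clouds via optimal assignment."""
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    r, c = linear_sum_assignment(cost)
    return cost[r, c].sum(), c

rng = np.random.default_rng(2)
S1, S2 = rng.normal(size=(2, 64, 3))
d12, phi = emd(S1, S2)

def mix(lam):
    """Optimal-assignment interpolation at mix ratio lam."""
    return (1 - lam) * S1 + lam * S2[phi]

l1, l2 = 0.2, 0.7
d_between, _ = emd(mix(l1), mix(l2))
# Linearity: the distance between two mixes scales with |l2 - l1|.
assert abs(d_between - abs(l2 - l1) * d12) < 1e-6
```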

1: Input: set of sampled episodes $\{\mathcal{E}_t\}$, where $\mathcal{E}_t = (\mathcal{S}_t, \mathcal{Q}_t)$ denotes the support and query sets
2: $f_\theta$: feature extractor network, mapping an input point cloud to its latent embedding
3: randomly initialize $\theta$
4: for episode $\mathcal{E}_t$ do
5:     for class $k$ do
6:          calculate prototype $c_k$ from $\mathcal{S}_t$, with $f_\theta$.
7:     end for
8:     Construct Mixup samples $\mathcal{S}^{\mathrm{mix}}_t$ from support set $\mathcal{S}_t$.
9:     Predict the label distributions for mixed examples in $\mathcal{S}^{\mathrm{mix}}_t$, with distance to $c_k$.
10:     Update $f_\theta$ with the prediction from mixed examples, as episode-specific weights $\theta_t$.
11:     for class $k$ do
12:          calculate new prototype $c'_k$ from $\mathcal{S}_t$, with $f_{\theta_t}$
13:     end for
14:     Predict the label distributions for query examples in $\mathcal{Q}_t$, with distance to $c'_k$.
15:     Update $\theta$ with the prediction from query examples.
16:     
17: end for
18: return $f_\theta$
Algorithm 1 Episodic training of ProtoNet with PointMixup. Lines 3 to 8 are where PointMixup plays a role in addition to the ProtoNet baseline. The testing stage is similar to the training stage, but without lines 13 and 14, which learn new weights from query examples.

Appendix 0.B Few-shot learning with PointMixUp

We test whether our PointMixup helps the point cloud few-shot classification task, where a classifier must generalize to classes not seen in the training set, given only a small number of examples of each new class. We take ProtoNet [3] as the baseline method for few-shot learning and use PointNet++ [2] as the feature extractor $f_\theta$.

0.b.0.1 Episodic learning setup

ProtoNet adopts episodic training for few-shot learning, where an episode is designed to mimic the few-shot task by subsampling classes as well as data. In an $N$-way $K$-shot setting, data from $N$ classes are sampled in each episode and $K$ examples for each class are labelled. In episode $t$, the dataset consists of example-class pairs from $N$ classes sampled from all training classes. Denote by $\mathcal{S}$ the support set, which consists of the labelled data from the $N$ classes with $K$ examples each, and by $\mathcal{Q}$ the query set, which consists of the unlabelled examples to be predicted.

0.b.0.2 Baseline method for few-shot classification: ProtoNet [3]

In each episode $t$, ProtoNet computes a prototype $c_k$ as the mean of the embedded support examples for each class $k$, from all examples of class $k$ in the support set $\mathcal{S}$. The latent embedding comes from the network $f_\theta$ (for which we use PointNet++ [2] without the last fully-connected layer). Each example $x$ from the query set is then classified into a label distribution by a softmax over the negative distances to the class prototypes:

$p(y = k \mid x) = \dfrac{\exp(-d(f_\theta(x), c_k))}{\sum_{k'} \exp(-d(f_\theta(x), c_{k'}))},$

where $d(\cdot,\cdot)$ is the Euclidean distance in the embedding space. In the training stage, the weights $\theta$ of the feature extractor are updated by the cross-entropy loss between the predicted query label distribution and the ground truth.
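The prototype computation and the softmax over negative distances can be sketched in a few lines of numpy; the random embeddings below stand in for the PointNet++ features $f_\theta(x)$:

```python
import numpy as np

rng = np.random.default_rng(3)
N, K, D = 5, 3, 32  # 5-way 3-shot episode, 32-dim embeddings

# Stand-in embeddings for the support set, grouped by class, and one query.
support = rng.normal(size=(N, K, D))
query = rng.normal(size=(D,))

prototypes = support.mean(axis=1)  # c_k: per-class mean embedding, (N, D)

# Softmax over negative Euclidean distances to the prototypes.
d = np.linalg.norm(prototypes - query, axis=1)
logits = -d
p = np.exp(logits - logits.max())
p /= p.sum()
pred = int(p.argmax())  # predicted class index for the query
```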

0.b.0.3 Few-shot point cloud classification with PointMixup

We use PointMixup to learn a better embedding space for each episode. Instead of using $f_\theta$ directly to predict examples from the query set, we learn episode-specific weights $\theta_t$ from the mixed data, and the query examples are predicted with $f_{\theta_t}$. We use PointMixup to construct a mixed set $\mathcal{S}^{\mathrm{mix}}$ from the labelled support set $\mathcal{S}$, which consists of examples from class pairs; for each class pair, mixed examples are constructed from randomly sampled support examples. The weights $\theta_t$ are then obtained by backpropagating the loss from the predictions on the mixed examples in $\mathcal{S}^{\mathrm{mix}}$. After that, the labels of the query examples from $\mathcal{Q}$ are predicted with the updated feature extractor $f_{\theta_t}$. See Algorithm 1 for an illustration of the learning scheme.

Appendix 0.C Further Discussion on Interpolation Variants

The proposed PointMixup adopts Optimal Assignment (OA) interpolation for point clouds because of its advantages both in theory and in practice. Compared with the two alternative strategies, Random Assignment (RA) interpolation and Point Sampling (PS) interpolation, PointMixup with OA interpolation is the best performing strategy, followed by PS interpolation. RA interpolation, which does not define a shortest path, does not perform well.

Here we extend the discussion on the two alternative interpolation strategies and analyze their possible advantages and limitations under certain conditions, which in turn validates our choice of Optimal Assignment interpolation for PointMixup.

0.c.0.1 Random Assignment interpolation

Under our shortest path interpolation hypothesis for Mixup, the inferiority of RA interpolation stems from the fact that it does not obey the shortest path rule, so that mixed point clouds from different source examples can easily entangle with each other. As Fig. 3 in the main paper shows, Random Assignment interpolation produces chaotic mixed examples in which features of the source class point clouds can hardly be recognized. Thus, RA interpolation fails especially under heavy Mixup (when the interpolation ratio $\lambda$ is not close to 0 or 1).

0.c.0.2 Point Sampling interpolation: yet another shortest path interpolation

Point Sampling interpolation performs relatively well with PointNet++ and is sometimes comparable to Optimal Assignment interpolation. As Fig. 3 in the main paper shows, PS interpolation produces mixed examples in which the source classes can still be recognized.

Reviewing the shortest path interpolation hypothesis, we argue that when the number of points $n$ is large enough, Point Sampling interpolation also (approximately) defines a shortest path on the metric space induced by the EMD (note that, given the initial and final points, the shortest path under the EMD is not unique). This is somewhat counter-intuitive, but reasonable.

We show the shortest path property. Recall that Point Sampling interpolation randomly draws points without replacement from each set according to the mix ratio $\lambda$: the mixed cloud is $S^{\mathrm{mix}}_\lambda = S'_1 \cup S'_2$, where $S'_1$ denotes a randomly sampled subset of $S_1$ with $\lfloor (1-\lambda)\, n \rfloor$ elements ($\lfloor\cdot\rfloor$ is the floor function), and similarly $S'_2$ is a randomly sampled subset of $S_2$ with $n - \lfloor (1-\lambda)\, n \rfloor$ elements, such that $S^{\mathrm{mix}}_\lambda$ contains exactly $n$ points. The points in $S'_1$ are identical to a subset of the points in $S_1$: the optimal assignment between $S_1$ and $S^{\mathrm{mix}}_\lambda$ returns these identical points as matched pairs, so they contribute zero to the overall EMD. Thus,

$d_{\mathrm{EMD}}(S_1, S^{\mathrm{mix}}_\lambda) \approx \lambda\, d_{\mathrm{EMD}}(S_1, S_2),$

which holds because $S'_1$ and $S_1$ are point clouds representing the same shape but with different density, and the same holds for $S'_2$ and $S_2$.

Similarly, $d_{\mathrm{EMD}}(S^{\mathrm{mix}}_\lambda, S_2) \approx (1-\lambda)\, d_{\mathrm{EMD}}(S_1, S_2)$, and thus $d_{\mathrm{EMD}}(S_1, S^{\mathrm{mix}}_\lambda) + d_{\mathrm{EMD}}(S^{\mathrm{mix}}_\lambda, S_2) \approx d_{\mathrm{EMD}}(S_1, S_2)$, which in turn proves the (approximate) shortest path property.

We note that the linearity of PS interpolation w.r.t. $\lambda$ also holds approximately, and the proof can be derived similarly. Thus, although strictly not an ideally continuous interpolation path, PS interpolation is (approximately) a shortest path linear interpolation under the EMD, which explains its good performance.
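A minimal sketch of PS interpolation, with the EMD again computed via exact optimal assignment; since the shortest path property is only approximate here, the code asserts only the triangle bound, which holds exactly:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def emd(a, b):
    """Exact EMD between two equal-size clouds via optimal assignment."""
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    r, c = linear_sum_assignment(cost)
    return cost[r, c].sum()

def point_sampling_mix(s1, s2, lam, rng):
    """Draw floor((1-lam)*n) points from s1 and the remaining from s2."""
    n = len(s1)
    k = int(np.floor((1 - lam) * n))
    idx1 = rng.choice(n, size=k, replace=False)
    idx2 = rng.choice(n, size=n - k, replace=False)
    return np.concatenate([s1[idx1], s2[idx2]])

rng = np.random.default_rng(4)
S1, S2 = rng.normal(size=(2, 256, 3))
S_mix = point_sampling_mix(S1, S2, 0.3, rng)

d12 = emd(S1, S2)
d1m, dm2 = emd(S1, S_mix), emd(S_mix, S2)
# The triangle inequality is exact; the shortest-path equality
# d1m + dm2 ≈ d12 only holds approximately, for dense enough clouds.
assert d1m + dm2 >= d12 - 1e-6
```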

0.c.0.3 Point Sampling interpolation: limitations

The limitation of PS interpolation stems from the fact that the mix ratio $\lambda$ only controls the change in local density distribution, while the underlying shape does not vary with $\lambda$. As shown in Table 5, PS interpolation therefore fails with PointNet [1], which is ideally invariant to point density, because its max pooling operation aggregates information from all points.

Baseline    PointMixup    Random Assignment    Point Sampling
89.2        89.9          88.2                 88.7
Table 5: Different interpolation strategies on PointNet [1]. Following the original paper [1], we test in the unaligned setting. PS interpolation fails with PointNet, a density-invariant model. The numbers are accuracy in percentage.

A question that may arise is how PS interpolation performs relatively well with PointNet++, which is also designed to be density-invariant. This is due to the sampling and grouping stage. PointNet++ applies the same operations as PointNet when learning features, but in order to be hierarchical it relies on a sampling and grouping stage; in particular, the farthest point sampling (fps) operation is not invariant to local density changes, since it samples different groups of farthest points, resulting in different latent point cloud feature representations. Thus, PointNet++ is invariant to global density but not to local density differences, which makes PS interpolation a working strategy for PointNet++. Still, we expect the performance of Mixup based on PS interpolation to be limited, because it does not work well with PointNet, the basic building block of PointNet++.
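For illustration, the farthest point sampling step that is sensitive to local density can be sketched as a simple greedy loop (a minimal version, not the batched implementation used in PointNet++):

```python
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Greedy FPS: each pick maximizes the distance to the picked set."""
    chosen = [seed]
    dist = np.linalg.norm(points - points[seed], axis=1)
    for _ in range(k - 1):
        nxt = int(dist.argmax())          # farthest point from current set
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)

rng = np.random.default_rng(5)
cloud = rng.normal(size=(512, 3))
idx = farthest_point_sampling(cloud, 32)  # indices of 32 sampled points
```

Because each pick depends on the distances to all previously picked points, locally densifying or thinning a region changes which points are selected, which is why PS-mixed clouds yield different PointNet++ features for different mix ratios.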

By contrast, the proposed PointMixup with the OA interpolation strategy is not limited by point density invariance. As a well-established interpolation, OA interpolation smoothly morphs the underlying shape. We therefore regard OA interpolation as the more generalizable strategy.

References

  • [1] Qi, C.R., Su, H., Mo, K., Guibas, L.J.: Pointnet: Deep learning on point sets for 3d classification and segmentation. In: CVPR (2017)
  • [2] Qi, C.R., Yi, L., Su, H., Guibas, L.J.: Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In: NeurIPS (2017)
  • [3] Snell, J., Swersky, K., Zemel, R.: Prototypical networks for few-shot learning. In: NeurIPS (2017)