Point cloud data are inevitably corrupted with noise, especially given the increasing access to scanning devices such as consumer-level depth sensors. Point cloud filtering aims to reconstruct a clean point set by removing the noise from its noisy input. The filtered point clouds can be used in various graphics applications, for example, point set resampling, surface reconstruction, point-based rendering and computer animation. Removing noise while preserving geometric features remains a challenge in point cloud filtering.
State-of-the-art techniques have made remarkable progress in point cloud filtering. Specifically, the LOP-based methods (LOP [Lipman2007TOG], WLOP [Huang2009TOG], CLOP [Preiner2014TOG]) are robust to noise and outliers, while RIMLS [Ztireli2009CGF] and GPF [Lu2018TVCG] aim to preserve geometric features. Nevertheless, these techniques still suffer from either feature smearing or limited robustness in filtering. More precisely, the LOP-based techniques [Lipman2007TOG; Huang2009TOG; Preiner2014TOG] are not designed to preserve sharp features, because of their inherently isotropic nature. RIMLS [Ztireli2009CGF] and GPF [Lu2018TVCG] can preserve geometric features to some extent; however, they depend greatly on the quality of normal filters, which become less robust under large noise or irregular sampling. Furthermore, for GPF [Lu2018TVCG] it is not easy to find a radius that balances noise removal against gaps near edges, and it is slow due to the considerable computation involved in the EM optimization. Finally, all these methods require trial-and-error parameter tuning, which is tedious and time-consuming, to achieve desired visual results. Point cloud filtering via deep learning has rarely been studied so far [Yu2018ECCV; Rakotosaona2019CGF], and existing networks still generate limited results, either smoothing out sharp features or generalizing poorly. As a result, the above limitations substantially restrict the robustness and applicability of these state-of-the-art point cloud filtering techniques. We compare the main characteristics of these techniques and our method in Table 1.
In this paper, we propose a novel point cloud filtering approach to address the above issues. Motivated by the notable successes of autoencoders and PointNet [Schmidhuber2015NN; Qi2017CVPR] in vision tasks, we design a learning framework, referred to as Pointfilter, for point cloud filtering. In particular, our framework is an encoder-decoder architecture that directly takes the raw neighboring points of each noisy point as input, and regresses a displacement vector that pushes the noisy point back to its ground-truth position. In designing the loss function, we also take geometric features into account so that the network can preserve them. Given a noisy point cloud as input, our trained model can automatically and robustly predict a corresponding high-quality point cloud by removing noise while preserving geometric features. Various experiments demonstrate that our method achieves better performance than the state-of-the-art techniques (or comparable performance to optimization-based methods such as RIMLS and GPF, which require trial-and-error parameter tuning), in terms of visual quality and error metrics. Our method is fast and avoids manual parameter tuning. The main contributions of this work are:
a novel framework that achieves point cloud filtering by encoder-decoder modeling;
a powerful loss function that takes geometric features into account;
extensive experiments and analysis in terms of both visual quality and quantitative error metrics;
the source code and the involved dataset, which will be made publicly available.
2. Related Work
We first review point cloud filtering techniques, and then revisit previous works that apply deep learning to point clouds.
2.1. Point Cloud Filtering
Point cloud filtering methods can generally be classified into two types: two-step techniques and projection-based methods. The two-step framework consists of at least two steps: normal smoothing, followed by a point position update under the guidance of the filtered normals. Avron et al. [Avron2010TOG] and Sun et al. [Sun2015CAGD] introduced L1 and L0 optimization for point set filtering, respectively. Recently, a point cloud smoothing technique based on the guided filter and sharp feature skeletons was presented [Zheng2017TVC]. Zheng et al. [Zheng2018CAGD] extended the rolling guidance filter to point set filtering and designed a new point position updating strategy to overcome sharp edge shrinkage. Lu et al. [Lu2018ArXiv] proposed a two-step geometry filtering approach for both meshes and point clouds. Most point set filtering methods achieve filtered results by projecting the input point set onto the underlying surface. One popular category of this type is moving least squares (MLS) and its variants [Levin1998MoC; Levin2004GMfSV; Alexa2001Vis; Alexa2003TVCG; Amenta2004TOG; Fleishman2005TOG; Ztireli2009CGF]. MLS was seminally formulated by Levin [Levin1998MoC; Levin2004GMfSV]. Subsequent works defined MLS surfaces and extremal surfaces [Alexa2001Vis; Alexa2003TVCG; Amenta2004TOG]. Later, two robust variants were presented for projection: statistics-based MLS and robust implicit moving least squares (RIMLS) [Fleishman2005TOG; Ztireli2009CGF]. Lange et al. [Lange2005CAGD]
developed a method for anisotropic fairing of a point-sampled surface using an anisotropic geometric mean curvature flow. Recently, LOP (locally optimal projection) based methods have become increasingly popular. Lipman et al. [Lipman2007TOG] proposed the locally optimal projection (LOP) operator, which is parameterization-free. Later, Huang et al. [Huang2009TOG] presented a weighted LOP (WLOP), which enhances the uniformity of the distribution of the projected points. A kernel LOP was also proposed to speed up the computation of LOP [Liao2013CAD]. More recently, a continuous LOP (CLOP) reformulated the data term as a continuous representation of the input point set, achieving fast performance [Preiner2014TOG]. Note that a few projection-based methods utilize smoothed normals as priors to preserve geometric features, such as EAR [Huang2013TOG] and GPF [Lu2018TVCG].
2.2. Deep Learning on Point Clouds
Qi et al. [Qi2017CVPR] proposed the pioneering network architecture PointNet, which consumes raw point clouds directly. The key ingredient of PointNet is the use of shared fully connected layers (MLPs) to extract features, instead of convolution operators, which are not suitable for irregular domains. Although PointNet achieves remarkable success in shape classification and segmentation, its point-wise features fail to characterize local structures, which are crucial for high-level semantic understanding. To address this, an improved version, PointNet++ [Qi2017NIPS], was proposed to aggregate local structures in a hierarchical way. Following PointNet and PointNet++, many network architectures operating on raw point clouds have emerged. For instance, based on a dynamic local neighborhood graph, Wang et al. [Wang2019TOG] designed an EdgeConv block to capture relationships in both spatial and feature space. Meanwhile, an alternative convolutional framework, SpiderCNN [Xu2018ECCV], was proposed to aggregate neighboring features through a special family of parameterized weight functions instead of MLPs. Inspired by the Scale-Invariant Feature Transform (SIFT) [Lowe2004IJCV], a robust 2D representation, a SIFT-like module [Jiang2018ArXiv] was developed to encode information at different orientations and scales, and can be flexibly incorporated into PointNet-style networks. Beyond shape classification and segmentation, a few point-based network architectures address upsampling [Yu2018CVPR; Yifan2019CVPR], local shape property estimation [Guerrero2018CGF] and so on. As for point cloud filtering, Roveri et al. [Roveri2018CGF] proposed a filtering network, PointProNet, designed for consolidating raw point clouds corrupted with noise. Benefiting from powerful 2D convolutions, PointProNet casts 3D point cloud consolidation as 2D height map filtering. To preserve sharp edges while filtering, Yu et al. [Yu2018ECCV] introduced a novel edge-aware network architecture, EC-Net, with a joint loss function. Recently, a two-stage network architecture, PointCleanNet (PCN) [Rakotosaona2019CGF], was developed to remove outliers and denoise points separately.
3. Method
3.1. Overview
Given a noisy point cloud, we aim to restore its clean version with our Pointfilter in a supervised learning manner. Before introducing the details of our Pointfilter framework, we first formulate a noisy point cloud as

P = P̂ + N,

where P is an observed point cloud corrupted with noise, P̂ is the corresponding clean point cloud (sampled from the underlying surface) and N is the additive noise. In this work, we address the filtering problem in a local way, which means the filtered result of a noisy point only depends on its neighboring structure. Point cloud filtering is an ill-posed problem, and it is difficult to straightforwardly regress the additive noise for each noisy point as in image filtering. As an alternative, we handle point cloud filtering by projecting each noisy point onto the underlying surface. More specifically, we treat the additive noise N as displacement vectors between the noisy point cloud P and the clean point cloud P̂, and learn the displacement vector for each noisy point. To achieve this, we propose an encoder-decoder network, named Pointfilter, to regress the additive noise N, as shown in Fig. 1. We briefly introduce a pre-processing step for the input data in Section 3.2, and then show how we model our Pointfilter in Section 3.3. We finally explain how we train our network in Section 3.4 and how we make inferences with the trained network in Section 3.5.
3.2. Pre-processing
Given a pair of point clouds P and P̂, the noisy patch P_i around a noisy point p_i and its corresponding ground-truth patch P̂_i are defined as follows:

P_i = {p ∈ P : ||p − p_i|| < r},   P̂_i = {p̂ ∈ P̂ : ||p̂ − p_i|| < r},

where p_i ∈ P and r is the patch radius. Once patches are generated, two issues need to be addressed in point cloud filtering: (1) how to avoid unnecessary degrees of freedom in the observed space; (2) how to make our Pointfilter insensitive to certain geometric transformations (e.g., rigid transformations). For the first issue, an immediate remedy is to translate each patch to the origin and scale it to unit length, i.e., (P_i − p_i)/r; the ground-truth patch is normalized in the same way, i.e., (P̂_i − p_i)/r. To be invariant to rigid transformations (e.g., rotations), a few methods [Qi2017CVPR; Qi2017NIPS] attempted to predict a rotation matrix via an additional spatial transformer network, but this has proven fragile to rotations without massive data augmentation [you2018ArXiv]. In this work, we instead align the input patches by rotating their principal PCA axis onto the Z-axis. The alignment process is illustrated in Fig. 2. To tune network parameters effectively with batch processing, the number of points in each input patch should be the same, and we empirically fix this number in our experiments. We pad patches that have too few points with the origin, and randomly subsample patches that have too many points. As for patch generation, the patch radius defaults to a fixed fraction of the model's bounding box diagonal.
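As a concrete illustration, the pre-processing above (centering, scaling to unit length, PCA alignment, and padding/subsampling to a fixed patch size) can be sketched as follows. This is a minimal sketch under our own assumptions: the patch size of 500 points and the use of the eigendecomposition of the covariance matrix for the PCA rotation are illustrative choices, not necessarily the paper's exact implementation.

```python
import numpy as np

def preprocess_patch(points, center, radius, n_points=500):
    """Normalize one noisy patch. `n_points` is an illustrative default."""
    # Translate the patch to the origin and scale it to unit length.
    patch = (points - center) / radius

    # PCA alignment: rotate so the smallest-variance axis (roughly the
    # surface normal of the patch) coincides with the Z-axis.
    cov = np.cov(patch, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    rot = eigvecs[:, ::-1].T                 # rows: largest -> smallest variance
    if np.linalg.det(rot) < 0:               # keep the rotation right-handed
        rot[2] = -rot[2]
    patch = patch @ rot.T

    # Fix the point count: pad with the origin, or randomly subsample.
    m = len(patch)
    if m < n_points:
        patch = np.vstack([patch, np.zeros((n_points - m, 3))])
    else:
        patch = patch[np.random.choice(m, n_points, replace=False)]
    return patch, rot
```

The returned rotation matrix is kept so that the predicted displacement can later be rotated back to the original space.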
3.3. The Pointfilter Framework
The architecture of our point cloud filtering framework is shown in Fig. 1. The key idea of our Pointfilter is to project each noisy point onto the underlying surface according to its neighboring structure. To achieve this, we design our Pointfilter as an encoder-decoder network. Specifically, the encoder consists of two main parts: (1) feature extractors (i.e., MLPs) that extract features at different scales; (2) a collector that aggregates these features into a latent vector. The encoder module thus obtains a compact representation of an input patch. In the decoder module, a regressor takes the latent representation vector as input and evaluates the displacement vector. We adopt the recent PointNet [Qi2017CVPR] as the backbone of our Pointfilter. In practice, the extractors and the collector are realised by shared MLPs and a max pooling layer, respectively, and the regressor is constructed from three fully connected layers. Details of our Pointfilter network are shown in Fig. 1. At the beginning of our Pointfilter, a PCA-induced rotation matrix transforms the input patch to a canonical space; accordingly, at the end of our Pointfilter, the inverse matrix is applied to the evaluated displacement vector to obtain the final displacement vector.
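To make the data flow concrete, the sketch below mimics a forward pass of such an encoder-decoder in plain NumPy: shared per-point MLPs as the extractors, max pooling as the collector, and fully connected layers as the regressor. The layer widths (64, 128, 1024, 512, 256) are illustrative choices in the spirit of PointNet, not the paper's exact configuration, and the random weights stand in for trained parameters.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def pointfilter_forward(patch, params):
    # Extractors: the same MLP is applied to every point of the patch.
    h = patch                                  # (n_points, 3)
    for W, b in params["enc"]:
        h = relu(h @ W + b)
    # Collector: max pooling aggregates per-point features into one latent vector.
    latent = h.max(axis=0)
    # Regressor: fully connected layers map the latent vector to a displacement.
    h = latent
    for W, b in params["dec"][:-1]:
        h = relu(h @ W + b)
    W, b = params["dec"][-1]
    return np.tanh(h @ W + b)                  # 3D displacement in [-1, 1]

rng = np.random.default_rng(0)
def layer(a, b):
    return rng.normal(0.0, 0.1, (a, b)), np.zeros(b)

params = {
    "enc": [layer(3, 64), layer(64, 128), layer(128, 1024)],
    "dec": [layer(1024, 512), layer(512, 256), layer(256, 3)],
}
disp = pointfilter_forward(rng.normal(size=(500, 3)), params)
```

The max pooling makes the encoder invariant to the ordering of the patch points, and the final tanh keeps the displacement inside the unit-scaled canonical space.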
Loss function. To enable the filtered point cloud to approximate the underlying surface while preserving sharp features, the loss function should be carefully designed. A simple option for measuring the filtered point cloud would be the L2 distance, as used in [Rakotosaona2019CGF]. As shown in Fig. 4, compared with the L2-based distance (Fig. 4(b)), which is sampling dependent, a more general alternative is to project noisy points onto the underlying surface (Fig. 4(c), (d)). Moreover, the L2-based distance can hardly retain sharp features in point cloud filtering. Thus, the loss should be capable of measuring the projection distance. Inspired by [Kolluri2008TOA], our projection loss is defined as

L_p = Σ_{p̂_j ∈ P̂_i} φ(||ỹ_i − p̂_j||) · |(ỹ_i − p̂_j) · n̂_j|,

where ỹ_i is the filtered point of the noisy point p_i, and n̂_j is the ground-truth normal of the point p̂_j. φ is a Gaussian function giving larger weights to points near ỹ_i, defined as

φ(||ỹ_i − p̂_j||) = exp(−||ỹ_i − p̂_j||² / σ_p²),

where σ_p is the support radius, normally defined as 4√(d_bb / m); here, d_bb is the diagonal length of the bounding box of the patch and m is the number of points in the patch [Huang2009TOG]. Besides approximating the underlying surface, we also expect the filtered point cloud to have a relatively regular distribution. To this end, a repulsion term L_r is added to mitigate point aggregation. Therefore, the whole loss function is formulated as

L = L_p + λ L_r,

where λ is a trade-off factor that controls the repulsion force in the filtering process, and we set it empirically in our training stage. However, we observed that the above projection loss would generate gaps near sharp features, since by definition the normal-based distance increases with geometric dissimilarity (see Fig. 4(c)). Although the repulsion term can alleviate such gaps to some extent, the loss still fails to preserve sharp features during filtering (see Fig. 5(b)). We address this issue by considering normal similarity in our loss function, introducing a bilateral mechanism into the projection distance formula (Eq. (3)). Specifically, we define a function ψ measuring the normal similarity between the current point and its neighboring points in the patch. For simplicity, following [Huang2013TOG], the function is

ψ(ñ_i, n̂_j) = exp(−((1 − ñ_i · n̂_j) / (1 − cos σ_n))²),

where ñ_i is the normal of the filtered point and σ_n is the support angle, with a default value of 15°. Thus, our final projection loss is defined as

L_p = Σ_{p̂_j ∈ P̂_i} φ(||ỹ_i − p̂_j||) · ψ(ñ_i, n̂_j) · |(ỹ_i − p̂_j) · n̂_j|.
For efficiency and simplicity, the normal of the filtered point is assigned the normal of the ground-truth point nearest to the filtered point. Note that our Pointfilter requires ground-truth point normals only in the training stage.
We chose the encoder-decoder structure because: (1) it is stable and mature; (2) it can learn complex yet compact representations of point clouds; (3) the learned latent representations are helpful for regressing the displacement vector from the input noisy patch.
3.4. Network Training
Our Pointfilter is implemented in PyTorch on a desktop machine with an Intel Core i7-8750H CPU (2.20 GHz, 16 GB memory) and a GeForce GTX 1060 GPU (6 GB memory, CUDA 9.0). The network is trained for a fixed number of epochs. We use SGD as our optimizer, with the learning rate decayed from 1e-4 to 1e-8. The mini-batch size is 64, and batch normalization [Ioffe2015ArXiv], ReLU [Nair2010ICML] and tanh are used in our Pointfilter framework.
3.5. Network Inference
Given a trained Pointfilter, our approach filters a noisy point cloud in a point-wise manner. First, we build a patch for each noisy point and transform it to the canonical space following Section 3.2. Second, each pre-processed patch is fed into the trained Pointfilter to infer a displacement vector. Finally, the displacement vector evaluated by our Pointfilter is mapped back to the original space. The inference can be formulated as

ỹ_i = p_i + r · R⁻¹ · f(P̄_i),

where ỹ_i and p_i are the filtered point and the noisy point, respectively, f represents our Pointfilter applied to the pre-processed patch P̄_i, R is the PCA-induced rotation matrix, and r is the patch radius. To obtain better filtered results, we run multiple iterations of inference to progressively filter the noisy point cloud, especially for point clouds corrupted with larger noise.
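A simplified sketch of this point-wise inference loop is shown below. PCA alignment and its inverse rotation are omitted here for brevity, and `model` stands for any trained patch-to-displacement callable; the local-mean "model" in the usage below is only a stand-in to make the sketch runnable, not the actual network.

```python
import numpy as np

def filter_point_cloud(noisy, radius, model, n_iters=2):
    """For each noisy point: gather its patch, translate/scale it to the
    canonical (unit) space, predict a displacement with `model`, and map
    the displacement back to the original space. Repeating the whole pass
    (n_iters > 1) progressively filters heavier noise."""
    points = noisy.copy()
    for _ in range(n_iters):
        filtered = np.empty_like(points)
        for i, p in enumerate(points):
            mask = np.linalg.norm(points - p, axis=1) < radius
            patch = (points[mask] - p) / radius   # canonical space
            disp = model(patch)                   # predicted displacement
            filtered[i] = p + radius * disp       # back to original space
        points = filtered
    return points

# Stand-in "network": move each point toward its local patch mean
# (a crude smoother, used here only to exercise the loop).
smooth_model = lambda patch: patch.mean(axis=0)
```

With the stand-in smoother, points sampled near a plane are pulled toward it, which is enough to check that the canonical-space round trip is wired correctly.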
4. Experimental Results
4.1. Dataset
As a supervised learning method, we prepare a training dataset consisting of 3D clean models (both CAD and non-CAD models), which are shown in Fig. 6. Each model is generated by randomly sampling points from its original surface. Given a clean model, its corresponding noisy models are synthesized by adding Gaussian noise with several standard deviations, each specified as a fraction of the clean model's bounding box diagonal. In sum, our training dataset consists of the clean models together with all of their synthesized noisy variants. Note that these models are our final training dataset; we do not augment any data on-the-fly during training. Besides, the normal information for clean models is required for training, as indicated in Eq. (6).
To demonstrate the generalization of the proposed Pointfilter, our test dataset includes both synthesized noisy models and raw-scan models, which are explained in the following experiments (Sections 4.4 and 4.5).
4.2. Compared Techniques
We compare our approach with the state-of-the-art point cloud filtering techniques, namely WLOP [Huang2009TOG], CLOP [Preiner2014TOG], RIMLS [Ztireli2009CGF], GPF [Lu2018TVCG], EC-Net [Yu2018ECCV] and PointCleanNet (PCN) [Rakotosaona2019CGF]. Specifically, RIMLS and GPF are designed to preserve sharp features by incorporating smoothed normals. For fair comparisons and visualization purposes, we (i) tune the main parameters of each state-of-the-art technique to achieve visual results that are as good as possible (EC-Net, PCN and our method have fixed parameters); and (ii) employ the same surface reconstruction parameters for the same model. Notice that surface reconstruction is applied directly to the filtered point sets. As for PCN, we use the source code released by the authors to train a new model on our training dataset. Since EC-Net requires manually labelled polylines of sharp edges for training, which is infeasible on our training dataset, we simply utilize the trained model released by the authors instead. We compare our method with these methods in terms of both visual quality and quantitative error.
4.3. Evaluation Metric
To analyse the performance of our Pointfilter quantitatively, we define the following evaluation metrics. The Pointfilter aims to project noisy points onto the underlying surface, so it is intuitive to evaluate the distance error by averaging the distances between each point in the ground truth and its closest points in the filtered point cloud [Lu2018TVCG]. The distance error between two models is defined as

D(p̂_i) = (1/m) Σ_{p̃_j ∈ N(p̂_i)} ||p̂_i − p̃_j||,

where p̂_i is a ground-truth point, p̃_j is one of its neighboring points in the filtered point cloud, and N(p̂_i) denotes its m nearest neighbors; m is fixed in our experiments. Inspired by [Fan2017CVPR], we also introduce the Chamfer distance (CD) to evaluate the error between the filtered point cloud and its corresponding ground truth (clean point cloud). CD is defined as

CD(P̃, P̂) = (1/|P̃|) Σ_{p̃ ∈ P̃} min_{p̂ ∈ P̂} ||p̃ − p̂||² + (1/|P̂|) Σ_{p̂ ∈ P̂} min_{p̃ ∈ P̃} ||p̂ − p̃||²,

where |P̂| and |P̃| represent the cardinalities of the clean point cloud P̂ and the filtered point cloud P̃, respectively. For each point, the CD metric finds its nearest neighbor in the other set and sums the squared distances. It can be viewed as an indicator of the "similarity" between two point sets, and it can easily be computed in parallel.
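The Chamfer distance described above can be implemented directly. The brute-force NumPy version below is quadratic in the number of points; for large clouds, a KD-tree-based nearest-neighbor query would be the practical choice.

```python
import numpy as np

def chamfer_distance(filtered, clean):
    # Pairwise squared distances between the two point sets.
    d2 = np.sum((filtered[:, None, :] - clean[None, :, :]) ** 2, axis=-1)
    # For each point, the squared distance to its nearest neighbor in the
    # other set, averaged per set and summed over both directions.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```

For identical point sets the distance is zero; shifting a set of well-separated points by a small offset δ along one axis yields 2δ², since each point's nearest neighbor remains its own counterpart.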
4.4. Visual Comparisons
Point clouds with synthetic noise. The synthetic noise level is expressed relative to the diagonal length of the bounding box; a given noise level denotes the corresponding fraction of the diagonal length. As shown in Fig. 3, we test four CAD models (Boxunion, Cube, Fandisk and Tetrahedron) with synthetic noise. Compared with the state-of-the-art point cloud filtering techniques, our Pointfilter generates visually better results in terms of noise removal and feature preservation. Note that RIMLS and GPF can also preserve sharp features to some extent; however, they depend greatly on the capability of normal filters, which become less robust under large noise. Compared to RIMLS and GPF, we elegantly sidestep the normal filtering issue, since our framework requires the easily obtained ground-truth normals for training only. As shown in Fig. 10, RIMLS and GPF produce less desirable results when handling heavier noise, for example, obvious gaps along sharp edges (Fig. 10(b)) and striking outliers (Fig. 10(c)). By contrast, our Pointfilter still preserves sharp features. Since outliers remain in the RIMLS and GPF results, their filtered point clouds suffer from shrinkage to some extent. While WLOP and CLOP are good at generating smooth results, they still fail to retain sharp features. EC-Net generates less pleasing results in terms of noise removal. Besides CAD models, we also test some non-CAD models corrupted with synthetic noise. As shown in Fig. 7, our proposed Pointfilter also outputs visually decent results while preserving geometric features. The reconstruction results of WLOP and CLOP show slight shrinkage in the arms and legs (Fig. 8(b) and 8(c)).
Point clouds with raw noise. We also evaluate our Pointfilter on raw scanned point clouds corrupted with raw noise. Since the ground-truth models of these raw scans are not available, we present visual comparisons with other methods, as suggested by previous techniques [Lu2018TVCG]. Notice that we do not re-train our Pointfilter for this type of raw noise. From Fig. 11, we can see that the result of our method is better than those of the state-of-the-art techniques. Besides noise removal, our Pointfilter is capable of retaining geometric features, which are marked by the yellow box and black arrows in Fig. 11. Fig. 13 shows that our approach leads to a greater improvement in surface reconstruction quality, in terms of preserving geometric features. Fig. 12 shows a virtually scanned point cloud model; compared with the other filtering methods, our Pointfilter again produces higher-quality results in terms of preserving sharp edges. In addition, we also test our Pointfilter on scene-level models from the Paris-rue-Madame database [Serna2014ICPRAM] (see Fig. 18 and Fig. 19). As can be seen from these figures, our Pointfilter is still able to produce better results.
Point clouds with strong outliers. Although the Pointfilter is not specifically designed for outlier removal, it still produces competitive results on point clouds with large outliers. As suggested by [Lu2018TVCG], we conduct an experiment comparing our method with the LOP-based methods (WLOP, CLOP), which are robust to outliers due to their L1-norm term. This experiment demonstrates the capability of our Pointfilter in dealing with strong outliers. From the results in Fig. 14, we can observe that the Pointfilter generates a result comparable to WLOP and CLOP.
4.5. Quantitative Comparisons
Besides the above visual comparisons, we also compare all the methods quantitatively. Specifically, we report the metrics defined above, as well as the runtime of each method.
Errors. We calculate the above metrics for all compared methods on several point clouds. These point sets are obtained by adding synthetic noise to the ground truth. To depict the distance errors, we calculate the mean squared error (MSE) for each ground-truth point via Eq. (8) and visualize the results in Fig. 9. From these visualizations, it can be seen that the Pointfilter generates comparable or superior results, especially on sharp features. To evaluate our Pointfilter comprehensively, we also use the Chamfer distance (CD) to compute the overall error between a filtered point cloud and its ground truth. As illustrated in Fig. 15, we calculate the average errors of the models appearing in Fig. 9 in terms of MSE and CD, respectively. Although the results of RIMLS are comparable to ours, it requires trial-and-error parameter tuning to obtain satisfactory results. Such parameter tuning is tedious and time-consuming, and becomes difficult for users without background knowledge. By contrast, our method is automatic, simple to use, and the most accurate among all compared approaches.
Runtime. Because surface reconstruction is an application on top of point clouds, we only measure the runtime of each point set filtering method. In particular, the optimization-based methods (RIMLS, GPF, WLOP and CLOP) involve multiple steps and require trial-and-error efforts to tune parameters to produce decent visual results, which means these methods generally require a much longer "runtime" in practice. Thus, we only consider the learning-based methods (EC-Net, PCN and ours) in terms of time consumption in the test stage. Table 2 summarizes the runtime of each learning-based method on several point clouds using the same configuration. Table 2 shows that EC-Net is the fastest among the learning-based methods, as it is an upsampling method and only requires a few patches evenly distributed on the noisy input in the test phase. For fair comparison, we only consider the runtime of the noise removal module in PCN, because its outlier removal module would introduce extra time consumption. In spite of this, PCN is still the slowest. Our approach ranks second in speed, which we attribute to its point-wise inference manner.
5. Conclusion
In this paper, we proposed the Pointfilter framework for feature-preserving point cloud filtering. Our architecture can be easily trained. Given a noisy input point cloud, our method automatically infers the involved displacement vectors and hence the filtered point cloud with sharp features preserved. Extensive experiments and comparisons showed that our method outperforms the state-of-the-art point set filtering techniques (or is comparable to optimization-based methods such as RIMLS and GPF, which need trial-and-error parameter tuning), in terms of both visual quality and evaluation errors. Our approach is automatic and also achieves impressive performance at test time. It should be noted that our Pointfilter is not designed for large-scale outlier removal; our Pointfilter and PCN [Rakotosaona2019CGF] are thus complementary, in terms of sharp feature preservation and heavy outlier removal respectively.
Our method has a few limitations. First, our Pointfilter struggles to retain sharp features when handling excessive noise (see Fig. 16). Second, our method fails to handle significant holes in point clouds (see Fig. 17). In future work, we would like to incorporate global shape information into our framework to help guide point cloud filtering.